I recently finished the course Data Analysis with Python: Zero to Pandas offered by Jovian.ml. I thoroughly enjoyed being part of this six-week course, and each module taught me something new.
In this blog I will share a few of the important tools and functions that I found useful while completing the assignments and the course project.
Loading Data — Essential :)
Loading data from an online dataset repository takes only a few basic steps. We copy the link to the data and download it with the help of urlretrieve. We then import the pandas library and create a dataframe using read_csv, passing the file name as the argument, which here is ‘italy-covid-daywise.csv’.
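The steps above can be sketched roughly as follows. The real dataset link is not given in this post, so this sketch writes a tiny stand-in CSV locally and exposes it as a file:// URL; the column names and values are made up purely for illustration, and data_url should be replaced with the link you copied.

```python
import tempfile
from pathlib import Path
from urllib.request import urlretrieve

import pandas as pd

# Stand-in for an online dataset (assumption): a tiny local CSV served via a
# file:// URL. The columns and values here are invented for illustration.
workdir = Path(tempfile.mkdtemp())
source = workdir / "source.csv"
source.write_text("date,new_cases\n2020-01-01,0\n2020-01-02,3\n")
data_url = source.as_uri()  # replace with the copied link to the real data

# Download the file under the name used in this post
local_file = workdir / "italy-covid-daywise.csv"
urlretrieve(data_url, str(local_file))

# Create a dataframe from the downloaded CSV
covid_df = pd.read_csv(local_file)
print(covid_df.head())
```

With a real link, only data_url changes; the urlretrieve and read_csv calls stay the same.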
Viewing the dataframe
Sometimes the dataset we want to study contains thousands of rows or columns, and by default pandas does not show all of them in the output. To see them all, we can use set_option and pass “max_columns” or “max_rows” (in full, “display.max_columns” and “display.max_rows”) as the option name with the value None, depending on whether we need all columns or all rows.
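A minimal sketch of this setting is shown below. It uses the full option names, since I assume the shortened forms can be ambiguous in some pandas versions.

```python
import pandas as pd

# None removes the limit, so every column and every row is displayed
pd.set_option("display.max_columns", None)
pd.set_option("display.max_rows", None)

# Check the current values of both options
print(pd.get_option("display.max_columns"))
print(pd.get_option("display.max_rows"))
```

Printing a very large dataframe in full can be slow, so pd.reset_option("display.max_rows") is handy for going back to the default afterwards.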
Another useful tool is the groupby function. groupby takes as its parameter the column names on the basis of which we want to group our data for better understanding. In the survey data, grouping by Country and Gender gives, for each country, the female and male participants in the survey, with corresponding columns showing the total count of each gender.
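The grouping described above could look something like this. The survey_df rows here are invented sample data, and counting with size followed by unstack is just one possible way to get a count column per gender, not necessarily the exact code used in the course project.

```python
import pandas as pd

# Invented sample rows standing in for the real survey data (assumption)
survey_df = pd.DataFrame({
    "Country": ["India", "India", "India", "Italy", "Italy"],
    "Gender": ["Female", "Male", "Male", "Female", "Female"],
})

# Group by country and gender, count the rows in each group, then spread
# the genders out into columns (missing combinations become 0)
gender_counts = (
    survey_df.groupby(["Country", "Gender"])
    .size()
    .unstack(fill_value=0)
)
print(gender_counts)
```

Each row of gender_counts is a country, and each column holds the number of participants of that gender.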
Visualisation Of Data
Visualisation not only makes for better presentation but also helps in spotting anomalies in the data entries.
seaborn's countplot plots the frequency of each distinct entry in a particular column of a dataframe; palette='Set1' here sets the colour scheme of the plot.
Here we have plotted the frequency of the Windows, MacOS, Linux and Unix entries in the OperatingSystem column of the survey_df dataframe.
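A rough sketch of such a plot is given below, again with invented sample rows in place of the real survey data. The Agg backend is only there so that the sketch runs without a display; in a notebook it can be left out.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no display is needed

import pandas as pd
import seaborn as sns

# Invented sample rows standing in for the real survey data (assumption)
survey_df = pd.DataFrame({
    "OperatingSystem": ["Windows", "Linux", "Windows", "MacOS",
                        "Unix", "Linux", "Windows"],
})

# The same frequencies as a table, useful for checking the plot
os_counts = survey_df["OperatingSystem"].value_counts()
print(os_counts)

# One bar per distinct entry in the column, coloured with the Set1 palette.
# Newer seaborn versions may warn about using palette without hue.
ax = sns.countplot(x="OperatingSystem", data=survey_df, palette="Set1")
ax.set_title("Operating systems used by survey participants")
```

Comparing the bars with os_counts is a quick way to catch odd or misspelled entries in the column.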
So that is it for this blog. I will be sharing more in the coming days; till then, you can go through my course project provided below.