Python is popular with developers for many good reasons:
Clear and easy syntax
Easy to read, learn and understand
Type declarations are not required
Memory management is fast and automatic
Makes it easy to write shorter code than in many other programming languages
Python has many useful packages for data analysis. One of them is Pandas, which makes importing and analyzing data much easier.
This post explains the following methods of the Pandas package:
If a CSV file has null values, they are displayed as NaN in the DataFrame. The Pandas dropna() method lets you find and drop rows or columns that contain null values in different ways.
Pandas provides the drop() method to remove rows by index label or columns by column name. Combined with a filter that selects the index labels of unwanted rows, it can also drop rows that do not satisfy a given condition. This helps data analysts delete from and filter a DataFrame.
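As a minimal sketch of dropping rows by index label and by condition, the snippet below uses a small made-up DataFrame. The column names, index labels and the `score >= 50` condition are assumptions for illustration only and are not part of the chat-history data used later in this post.

```python
import pandas as pd

# Hypothetical toy data, not the chat-history file used in this post.
df = pd.DataFrame(
    {"name": ["a", "b", "c", "d"], "score": [10, 55, 30, 80]},
    index=["r1", "r2", "r3", "r4"],
)

# Drop a single row by its index label.
without_r1 = df.drop("r1")

# Drop the rows that do not satisfy a condition (here: score >= 50)
# by passing the index labels of the failing rows to drop().
failing_labels = df[df["score"] < 50].index
passing = df.drop(failing_labels)

print(without_r1.index.tolist())
print(passing.index.tolist())
```

In practice, plain boolean indexing such as `df[df["score"] >= 50]` gives the same rows; drop() is handy when you already hold the labels to remove.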
==> drop method
For the drop() method, below is an example where a CSV file is read and stored in a DataFrame. A list is defined that contains the names of all the columns we want to drop. Next, we call the drop() function with the axis parameter set to 1.
This tells Pandas to look for the values to be dropped among the column names, using the names provided in the 'to_drop' list. The call returns a new DataFrame, which we store in 'result_df'; the original 'df' is left unchanged because inplace=True is not passed.
import pandas as pd
import numpy as np
# Read the CSV file with low_memory=False so that column types are inferred from the whole file
df = pd.read_csv('../input/chat-history/result_ED.csv', low_memory=False)
# Define a list that contains the names of all the columns we want to drop.
# Next, call drop() with axis=1, which tells Pandas to look for the labels
# in 'to_drop' among the column names (axis=0 would look in the row index).
# drop() returns a new DataFrame; 'df' itself is not modified.
to_drop = ['date','action','title','inviter','photo','width','height']
result_df = df.drop(to_drop, axis=1)
In the output below, we see that the columns 'date', 'action', 'title', 'inviter', 'photo', 'width' and 'height' are no longer displayed. The remaining columns of the DataFrame are displayed in the output.
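To make the behaviour of axis=1 concrete, here is a small sketch on a made-up DataFrame (the data and the `same_result` name are assumptions for illustration). It shows that drop() returns a new DataFrame without the listed columns, that the original is left as it was, and that the `columns=` keyword is an equivalent way to write the same call.

```python
import pandas as pd

# Hypothetical stand-in for the chat-history data.
df = pd.DataFrame(
    {"id": [1, 2], "date": ["2020-01-01", "2020-01-02"], "text": ["hi", "bye"]}
)

to_drop = ["date"]
result_df = df.drop(to_drop, axis=1)    # look up the labels among the columns
same_result = df.drop(columns=to_drop)  # equivalent keyword form

print(list(result_df.columns))  # columns left after the drop
print(list(df.columns))         # the original DataFrame still has 'date'
```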
==> dropna method
For the dropna() method, we use the same example and drop every row that has a NaN value in any of the columns ['id', 'from', 'reply_to_message_id'].
The remaining columns 'actor' and 'actor_id' are not checked, so they can still contain NaN values in the resulting DataFrame 'df_sorted_data'.
# Drop rows (axis=0) that have a NaN value in any of the columns
# ['id', 'from', 'reply_to_message_id'] listed in the subset parameter.
# The columns 'actor' and 'actor_id' are not checked, so they can still contain NaN values.
df_sorted_data = result_df.dropna(axis=0, subset=['id','from','reply_to_message_id'])
The row count changed from 14566 to 4882 between the two outputs because the dropna() method deleted every row that had a NaN value in any of the columns named in the subset parameter.
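The effect of the subset parameter can be checked on a tiny example. The sketch below uses made-up data (the values and the names `cleaned` and `strict` are assumptions for illustration); only the columns listed in subset decide which rows are removed, so NaN values in 'actor' survive.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data with NaN values spread across columns.
df = pd.DataFrame(
    {
        "id": [1.0, 2.0, np.nan, 4.0],
        "from": ["ann", np.nan, "bob", "cy"],
        "actor": [np.nan, "x", "y", np.nan],
    }
)

# Drop rows that have NaN in 'id' or 'from'; 'actor' is not checked.
cleaned = df.dropna(axis=0, subset=["id", "from"])

# Without subset, a NaN in any column is enough to drop the row.
strict = df.dropna()

print(len(df), len(cleaned), len(strict))
print(cleaned["actor"].isna().sum())  # NaN values left in 'actor'
```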
Thanks for reading!