dropna() and drop() in Python

Python is popular with developers for many good reasons:

  • Clear and easy syntax

  • Easy to read, learn and understand

  • Type declarations are not required

  • Memory management is fast and automatic

  • Code is often shorter than in other programming languages

Python has many useful packages for data analysis. One of them is Pandas, which makes importing and analyzing data much easier.


This post explains the following methods of the Pandas package:

  • dropna() method

  • drop() method

Pandas DataFrame.dropna()

  • If a csv file has null values, they are displayed as NaN in the Data Frame. The Pandas dropna() method lets the user drop rows or columns that contain null values in different ways.



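Syntax:

DataFrame.dropna(axis=0, how='any', thresh=None, subset=None, inplace=False)

Parameters:

  • axis: 0 (or 'index') drops rows that contain missing values, 1 (or 'columns') drops columns.

  • how: 'any' drops a row or column if at least one value is NaN, 'all' drops it only if every value is NaN.

  • thresh: the minimum number of non-NaN values a row or column needs in order to be kept.

  • subset: the labels along the other axis to check for NaN, for example a list of column names when dropping rows.

  • inplace: if True, the Data Frame itself is modified and None is returned.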
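To make the how, thresh and axis parameters of dropna() concrete, here is a minimal sketch on a small made-up Data Frame. The column names 'a', 'b' and 'c' and all the values are assumptions chosen only for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; column names and values are illustrative only.
df = pd.DataFrame({
    "a": [1.0, np.nan, 3.0, np.nan],
    "b": [np.nan, np.nan, 6.0, 8.0],
    "c": [9.0, np.nan, 11.0, 12.0],
})

# Drop every row that has at least one NaN (how='any' is the default behaviour).
rows_any = df.dropna()

# Drop only the rows in which every value is NaN.
rows_all = df.dropna(how="all")

# Keep only the rows that have at least 2 non-NaN values.
rows_thresh = df.dropna(thresh=2)

# Work on columns instead: keep columns with at least 3 non-NaN values.
cols_thresh = df.dropna(axis=1, thresh=3)

print(len(rows_any), len(rows_all), len(rows_thresh))
print(list(cols_thresh.columns))
```

In this toy data only one row is complete, one row is entirely NaN, and only column 'c' has three non-NaN values, so each call keeps a different part of the Data Frame.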




Pandas DataFrame.drop()


Pandas provides the drop() method to remove rows by index label or columns by column name. It is often used to drop rows that do not satisfy a given condition, which helps data analysts clean and filter a Data Frame.


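Syntax:

DataFrame.drop(labels=None, axis=0, index=None, columns=None, level=None, inplace=False, errors='raise')

Parameters:

  • labels: a single index or column label, or a list of labels, to drop.

  • axis: 0 (or 'index') to drop rows, 1 (or 'columns') to drop columns.

  • index, columns: alternatives to labels plus axis. index=labels is equivalent to labels with axis=0, and columns=labels is equivalent to labels with axis=1.

  • level: for a MultiIndex, the level from which the labels are removed.

  • inplace: if True, the Data Frame itself is modified and None is returned.

  • errors: 'raise' (the default) raises a KeyError when a label is not found, 'ignore' drops only the labels that exist.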
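The different ways of calling drop() can be sketched on a small made-up Data Frame. The index labels, column names and the score threshold of 50 are assumptions for illustration and do not come from the data set used later in this post.

```python
import pandas as pd

# Hypothetical toy data; names and values are illustrative only.
df = pd.DataFrame(
    {"name": ["ann", "bob", "cy"], "score": [90, 45, 70]},
    index=["r1", "r2", "r3"],
)

# Drop a row by its index label (axis=0 is the default).
without_r2 = df.drop("r2")

# Drop a column by name; columns="score" is equivalent to labels="score", axis=1.
without_score = df.drop(columns="score")

# Drop rows that fail a condition by passing the index of the failing rows.
passed = df.drop(df[df["score"] < 50].index)

# errors="ignore" skips labels that do not exist instead of raising a KeyError.
unchanged = df.drop("missing", errors="ignore")

print(list(without_r2.index))
print(list(without_score.columns))
print(list(passed["name"]))
```

Dropping by condition is really a two-step operation: a boolean filter selects the unwanted rows, and their index is handed to drop().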




Examples:


==> drop method

  • For the drop() method, below is an example where a csv file is read and stored in a Data Frame. A list is defined that contains the names of all the columns we want to drop. Next, we call the drop() method, passing the axis parameter as 1.

  • axis=1 tells Pandas to look for the labels to drop among the column names, here the ones provided in the ‘to_drop’ list. drop() returns a new Data Frame, so the result is assigned to ‘result_df’ and ‘df’ itself is left unchanged.


#Importing pandas


import pandas as pd

import numpy as np


#Reading the csv file keeping the low_memory parameter as false


df = pd.read_csv('../input/chat-history/result_ED.csv', low_memory=False)

df


#Define a list that contains the names of all the columns we want to drop.

#Next, call the drop() method, passing the axis parameter as 1.

#axis=1 tells Pandas to look for the labels to drop in the column names provided in the 'to_drop' list.

#drop() returns a new Data Frame; df itself is not modified.


to_drop = ['date','action','title','inviter','photo','width','height']

result_df = df.drop(to_drop, axis=1)


result_df


Output:

In the output below, the columns ‘date’, ‘action’, ‘title’, ‘inviter’, ‘photo’, ‘width’ and ‘height’ are no longer displayed. The remaining columns of the Data Frame are displayed in the output.
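Since the csv file from this post may not be available to every reader, the same column drop can be tried on a small stand-in Data Frame. This is only a sketch: the rows are made up, and only a few of the column names from the example are reused. It also shows what changes when inplace=True is passed.

```python
import pandas as pd

# Stand-in for the csv data used in this post; the values are made up.
df = pd.DataFrame({
    "id": [1, 2],
    "date": ["2020-01-01", "2020-01-02"],
    "action": ["join", None],
    "from": ["ann", "bob"],
})

to_drop = ["date", "action"]

# Without inplace, drop() returns a new Data Frame and leaves df untouched.
result_df = df.drop(to_drop, axis=1)
print(list(result_df.columns))
print(list(df.columns))

# With inplace=True the Data Frame itself is modified and None is returned.
df_copy = df.copy()
returned = df_copy.drop(to_drop, axis=1, inplace=True)
print(returned, list(df_copy.columns))
```

Assigning the result to a new name, as the post does with result_df, keeps the original data around for comparison.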




==> dropna method

  • For the dropna() method, we use the same example and drop the rows that have NaN in any of the columns [‘id’, ‘from’, ‘reply_to_message_id’].

  • The remaining columns ‘actor’ and ‘actor_id’ are not checked, so they still have NaN values in the Data Frame ‘df_sorted_data’.


#Dropping the rows (axis=0) that have NaN values

#in any of the columns ['id','from','reply_to_message_id'] listed in the subset parameter.

#The remaining columns 'actor' and 'actor_id' in the Data Frame 'df_sorted_data' still have NaN values.


df_sorted_data = result_df.dropna(axis=0, subset=['id','from','reply_to_message_id'])

df_sorted_data


Output:




  • The row count changed from 14566 in the first output to 4882 in the second because dropna() deleted every row that had a NaN value in at least one of the columns named in the subset parameter.
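The effect of the subset parameter on the row count can be checked on a small stand-in for result_df. The four rows below are made up for illustration. Only the column names come from the example, and the counts are of course different from the 14566 and 4882 seen above.

```python
import numpy as np
import pandas as pd

# Stand-in for result_df; the values are made up for illustration.
result_df = pd.DataFrame({
    "id": [1.0, 2.0, np.nan, 4.0],
    "from": ["ann", None, "cy", "dee"],
    "reply_to_message_id": [10.0, 11.0, 12.0, np.nan],
    "actor": [np.nan, "x", np.nan, np.nan],
})

subset = ["id", "from", "reply_to_message_id"]

# Drop the rows that have NaN in any of the subset columns.
df_sorted_data = result_df.dropna(axis=0, subset=subset)

rows_before = len(result_df)
rows_after = len(df_sorted_data)

# Count the NaN values that are left in each column.
remaining_nans = df_sorted_data.isna().sum()

print(rows_before, rows_after)
print(remaining_nans)
```

Only the first row survives, and its 'actor' value is still NaN, because 'actor' is not part of the subset.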



Thanks for reading!

 

© Numpy Ninja.