
Difference between loc and iloc in pandas


Photo by Emily Morter on Unsplash


In data analysis, how you select data (in other words, how you slice and dice a DataFrame) matters a great deal. For this, the pandas library for Python provides two very useful indexers, loc[] and iloc[]. Both are easy to understand and fast to use.


This article looks at these indexers in more detail: how to select data from a DataFrame with a single value, with a list of values, with a range of data, and with a condition.


Difference between loc[] and iloc[] :


The main difference between loc[] and iloc[] is:


loc[]:


loc[] is label-based: it selects rows and/or columns in pandas by their labels. It accepts a single label, a list of labels, a slice between two labels (written START:STOP:STEP), and more.



  • START is the label of the first row/column to take. If START is omitted, loc[] starts from the first row/column.

  • STOP is the label of the last row/column to take. Unlike an ordinary Python slice, STOP itself is included in the result. If STOP is omitted, loc[] takes everything through the last row/column.

  • STEP is the number of positions to advance after each extraction.
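The START, STOP, and STEP rules for loc[] can be sketched on a small frame. The frame `labelled`, its string row labels, and the placeholder venue values are assumptions made for illustration; they are not the article's data set.

```python
import pandas as pd

# Hypothetical frame with string row labels (illustration only).
labelled = pd.DataFrame(
    {
        "city": ["Bangalore", "Chandigarh", "Delhi", "Mumbai", "Kolkata"],
        "venue": ["venue_a", "venue_b", "venue_c", "venue_d", "venue_e"],
    },
    index=["a", "b", "c", "d", "e"],
)

# START and STOP are labels, and the STOP label ("d") is included.
loc_middle = labelled.loc["b":"d"]

# Omitting START begins at the first row; omitting STOP runs to the last row.
loc_head = labelled.loc[:"b"]
loc_tail = labelled.loc["d":]

# A STEP of 2 keeps every second row between the two labels.
loc_every_other = labelled.loc["a":"e":2]

print(loc_middle)
```

Printing `loc_middle` shows rows "b", "c", and "d", which is the inclusive-STOP behaviour described in the list above.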

iloc[]:


iloc[] is integer-position based: it selects rows and/or columns in pandas by their position, counting from 0. A slice is again written START:STOP:STEP.



  • START is the integer position of the first row/column to take. If START is omitted, iloc[] starts from the first row/column.

  • STOP is the integer position at which the selection stops. As with an ordinary Python slice, the row/column at STOP itself is not included. If STOP is omitted, iloc[] takes everything through the last row/column.

  • STEP is the number of positions to advance after each extraction.
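For comparison, here is a minimal sketch of the same rules for iloc[]. The frame `positional` is again a made-up example, and its row labels are there only to show that iloc[] ignores them.

```python
import pandas as pd

# Hypothetical frame; iloc[] ignores these labels and counts positions from 0.
positional = pd.DataFrame(
    {"city": ["Bangalore", "Chandigarh", "Delhi", "Mumbai", "Kolkata"]},
    index=["a", "b", "c", "d", "e"],
)

# Positions 1, 2 and 3 are taken; position 4 (the STOP) is excluded.
iloc_middle = positional.iloc[1:4]

# Omitting START begins at position 0; omitting STOP runs to the end.
iloc_head = positional.iloc[:2]
iloc_tail = positional.iloc[3:]

# A STEP of 2 keeps every second row.
iloc_every_other = positional.iloc[::2]

print(iloc_middle)
```

Note that `iloc[1:4]` returns three rows, while a label slice over the same start and end rows with loc[] would also include the end row.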



Let's look at more examples using the data set below:



This data set has 17 columns:


Index(['id', 'city', 'date', 'player_of_match', 'venue', 'neutral_venue','team1', 'team2', 'toss_winner', 'toss_decision', 'winner', 'result', 'result_margin', 'eliminator', 'method', 'umpire1', 'umpire2'],dtype='object')

Selecting data by single value :


Both loc and iloc allow input to be a single value. We can use the following syntax for data selection:

  • loc[row_label, column_label]

  • iloc[row_position, column_position]

To get a single row we use:


df.loc[0]    # or, by position: df.iloc[0]

id                                      335982
city                                 Bangalore
date                                2008-04-18
player_of_match                    BB McCullum 
venue                    M Chinnaswamy Stadium 
neutral_venue                                0
team1              Royal Challengers Bangalore
team2                    Kolkata Knight Riders
toss_winner        Royal Challengers Bangalore 
toss_decision                            field 
winner                   Kolkata Knight Riders 
result                                    runs 
result_margin                            140.0 
eliminator                                   N 
method                                     NaN 
umpire1                              Asad Rauf 
umpire2                            RE Koertzen 
Name: 0, dtype: object
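With the default index, row label 0 and row position 0 happen to coincide. The following sketch uses a made-up frame, `matches`, whose row labels are 10, 20, 30 (an assumption for illustration, with placeholder team names) to show where the two indexers part ways.

```python
import pandas as pd

# Hypothetical frame whose row labels are NOT 0, 1, 2.
matches = pd.DataFrame(
    {
        "city": ["Bangalore", "Chandigarh", "Delhi"],
        "winner": ["team_x", "team_y", "team_z"],  # placeholder values
    },
    index=[10, 20, 30],
)

# The first row: label 10 for loc[], position 0 for iloc[].
row_by_label = matches.loc[10]
row_by_position = matches.iloc[0]

# A single cell: loc[row_label, column_label] vs iloc[row_position, column_position].
cell_by_label = matches.loc[20, "city"]
cell_by_position = matches.iloc[1, 0]

# Label 0 does not exist here, so loc[0] fails even though iloc[0] works.
try:
    matches.loc[0]
    label_zero_found = True
except KeyError:
    label_zero_found = False

print(cell_by_label, cell_by_position, label_zero_found)
```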

To get all values of the 'city' column:


df.loc[:,'city']
0       Bangalore
1      Chandigarh
2           Delhi
3          Mumbai
4         Kolkata
          ...    
811         Dubai
812         Dubai
813     Abu Dhabi
814     Abu Dhabi
815         Dubai
Name: city, Length: 816, dtype: object


## The equivalent "iloc" statement 
>>> df.iloc[:,1]
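Rather than counting columns by hand, the position can be looked up from the label. This is a minimal sketch on a made-up three-column frame (`small`, with placeholder venue values); `Index.get_loc` is a standard pandas method.

```python
import pandas as pd

# Hypothetical frame with the same leading column order as the article's data set.
small = pd.DataFrame(
    {
        "id": [1, 2],
        "city": ["Bangalore", "Chandigarh"],
        "venue": ["venue_a", "venue_b"],  # placeholder values
    }
)

# Translate the column label into its integer position.
city_position = small.columns.get_loc("city")

city_by_label = small.loc[:, "city"]
city_by_position = small.iloc[:, city_position]

print(city_position)
```

Both selections return the same Series, so the lookup is a convenient way to move between the two indexers.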

Selecting data by a List :


We can pass a list of labels to loc to select multiple rows or columns:

df.loc[:,['city','player_of_match','venue']]


## The equivalent "iloc" statement 
>>>df.iloc[:,[1,3,4]]
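The positions 1, 3, and 4 can also be derived from the labels. The sketch below assumes a frame `first_five` that reuses the first five column names from the article with placeholder cell values; `Index.get_indexer` is a standard pandas method.

```python
import pandas as pd

# Hypothetical frame: the article's first five column names, placeholder values.
column_names = ["id", "city", "date", "player_of_match", "venue"]
first_five = pd.DataFrame(
    [[f"{name}_{row}" for name in column_names] for row in range(3)],
    columns=column_names,
)

wanted = ["city", "player_of_match", "venue"]

# Map each label to its integer position in the column index.
wanted_positions = first_five.columns.get_indexer(wanted)

list_by_label = first_five.loc[:, wanted]
list_by_position = first_five.iloc[:, wanted_positions]

print(wanted_positions.tolist())
```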

Selecting a Range of Data via slice :


A slice (written as start:stop:step) is a powerful technique for selecting a range of data. It is very useful when we want everything between two items, following the START, STOP, and STEP rules described earlier.


df.loc[1:4,'city':'venue']

The code above selects the rows with index labels 1 through 4 and every column from "city" through "venue". Both end points are included.




## The equivalent "iloc" statement
>>>df.iloc[1:5,1:5]

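Here STOP is 5 rather than 4 because iloc works on integer positions and excludes the stop position: the selection starts at position 1 and ends at position STOP - 1, i.e. 5 - 1 = 4. The same applies to the columns, where positions 1 to 4 correspond to "city" through "venue".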
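The equivalence of the two slices can be checked on a small stand-in frame. The frame `grid` below is an assumption for illustration: it has the default integer index, the article's first five column names, and placeholder cell values.

```python
import pandas as pd

# Hypothetical 6-row frame with default index 0..5 and placeholder values.
grid_columns = ["id", "city", "date", "player_of_match", "venue"]
grid = pd.DataFrame(
    [[f"{name}_{row}" for name in grid_columns] for row in range(6)],
    columns=grid_columns,
)

# Label slice: rows 1..4 and columns "city".."venue", both ends included.
slice_by_label = grid.loc[1:4, "city":"venue"]

# Position slice: the stop values are one higher because they are excluded.
slice_by_position = grid.iloc[1:5, 1:5]

print(slice_by_label.shape)
```

Both selections have four rows and four columns and contain the same cells.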


Selecting via conditions :


We often want to select data based on a condition. For example, suppose we want 'player_of_match', 'team1', and 'winner' for the matches played in the city "Delhi":


df.loc[df['city'] == 'Delhi',['player_of_match','team1','winner',]]

Output is:
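The effect of this kind of selection can be sketched on a small hypothetical frame. The name `played` and all of its cell values are placeholders, not rows from the article's data set.

```python
import pandas as pd

# Hypothetical frame with placeholder player and team names.
played = pd.DataFrame(
    {
        "city": ["Delhi", "Mumbai", "Delhi"],
        "player_of_match": ["player_1", "player_2", "player_3"],
        "team1": ["team_a", "team_b", "team_c"],
        "winner": ["team_a", "team_x", "team_y"],
    }
)

# The comparison yields a boolean Series aligned with the row index.
is_delhi = played["city"] == "Delhi"

# loc[] keeps the rows where the mask is True and the listed columns.
delhi_rows = played.loc[is_delhi, ["player_of_match", "team1", "winner"]]

print(delhi_rows)
```

Only the rows at index 0 and 2 survive, and only the three requested columns are returned.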



df.iloc[df['city'] == 'Delhi',[3,6,10]] 

For iloc, passing the condition straight into the statement raises an error (a ValueError or a NotImplementedError, depending on the type of the row index).


The error occurs because iloc does not accept a boolean Series. It accepts only a plain boolean list or array. We can use the list() function to convert the Series into a boolean list.


   df.iloc[list(df['city'] == 'Delhi')]

Output :
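The same pattern can be reproduced on a small hypothetical frame where the result is easy to verify by hand. The frame `games` and its values are placeholders. The `to_numpy()` variant is an addition for comparison rather than something the article uses, and the exact exception type is an assumption that may vary with the index type and pandas version.

```python
import pandas as pd

# Hypothetical frame with placeholder player and team names.
games = pd.DataFrame(
    {
        "city": ["Delhi", "Mumbai", "Delhi"],
        "player_of_match": ["player_1", "player_2", "player_3"],
        "winner": ["team_a", "team_x", "team_y"],
    }
)

delhi_mask = games["city"] == "Delhi"  # boolean Series

# Passing the Series directly is rejected by iloc[].
try:
    games.iloc[delhi_mask]
    error_name = None
except (NotImplementedError, ValueError) as err:
    error_name = type(err).__name__

# Converting to a plain list works, as in the statement above.
via_list = games.iloc[list(delhi_mask)]

# A NumPy boolean array works too, and can be combined with column positions.
via_array = games.iloc[delhi_mask.to_numpy(), [1, 2]]

print(error_name)
print(via_list)
```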



Conclusion :


In this article we saw the difference between loc[] and iloc[]: loc[] is label-based and iloc[] is integer-position based. A boolean condition cannot be passed to iloc[] directly as a Series; it first has to be converted into a list. Both are widely used for slicing and dicing data.
