
Difference between loc and iloc in pandas



In data analysis, how you select data from a DataFrame (in other words, how you slice and dice it) is very important. For that, the pandas library for Python provides two very useful indexers, loc[] and iloc[]. They are easy to understand and quick to use.


In this article we will look at these indexers in more detail: how to select data from a DataFrame with them via a single value, a list of values, a range of data, and a condition.


Difference between loc[] and iloc[] :


The main difference between loc[] and iloc[] is:


loc[]:


loc[] is label-based: it selects rows and/or columns in pandas by their labels. It accepts a single label, a list of labels, a slice between two labels (written START:STOP:STEP), and more. In a slice:



  • START is the label of the first row/column to take. If we do not provide START, loc[] takes rows/columns from the beginning.

  • STOP is the label of the last row/column to take, and it is included in the result. If we do not provide STOP, loc[] takes everything through the last row/column.

  • STEP is the number of positions to advance after each extraction.
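
The three slice parts above can be sketched on a small made-up frame. The string row labels and the five-row "city" column are assumptions for illustration, not the article's dataset. The point to notice is that loc[] includes the STOP label.

```python
import pandas as pd

# Toy frame with string row labels; illustrative only.
df = pd.DataFrame(
    {"city": ["Bangalore", "Chandigarh", "Delhi", "Mumbai", "Kolkata"]},
    index=["a", "b", "c", "d", "e"],
)

# START="b", STOP="d": both end labels are included.
middle = df.loc["b":"d"]
print(middle.index.tolist())       # ['b', 'c', 'd']

# No START (begin at the first row), STOP="e", STEP=2 (every second row).
every_other = df.loc[:"e":2]
print(every_other.index.tolist())  # ['a', 'c', 'e']
```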

iloc[]:


iloc[] is integer-position based: it selects rows and/or columns in pandas by their position, counting from 0. A slice is again written START:STOP:STEP:



  • START is the integer position of the first row/column to take. If we do not provide START, iloc[] takes from the first row/column.

  • STOP is the integer position where the selection stops, and it is not included in the result. If we do not provide STOP, iloc[] takes everything through the last row/column.

  • STEP is the number of positions to advance after each extraction.
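
A minimal sketch of the same slice parts with iloc[]. The row labels 10 to 50 are an assumption chosen so that labels and positions differ visibly. Unlike loc[], the STOP position is left out of the result.

```python
import pandas as pd

# Toy frame whose row labels (10..50) differ from the positions (0..4).
df = pd.DataFrame(
    {"city": ["Bangalore", "Chandigarh", "Delhi", "Mumbai", "Kolkata"]},
    index=[10, 20, 30, 40, 50],
)

# Positions 0, 1, 2: STOP position 3 is excluded.
first_three = df.iloc[0:3]
print(first_three.index.tolist())  # [10, 20, 30]

# No START or STOP, STEP=2: every second row by position.
every_other = df.iloc[::2]
print(every_other.index.tolist())  # [10, 30, 50]
```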



Let's work through more examples using a dataset that has 17 columns:


Index(['id', 'city', 'date', 'player_of_match', 'venue', 'neutral_venue','team1', 'team2', 'toss_winner', 'toss_decision', 'winner', 'result', 'result_margin', 'eliminator', 'method', 'umpire1', 'umpire2'],dtype='object')

Selecting data by single value :


Both loc and iloc allow input to be a single value. We can use the following syntax for data selection:

  • loc[row_label, column_label]

  • iloc[row_position, column_position]

To get a single row we use:


df.loc[0]

## The equivalent "iloc" statement 
>>> df.iloc[0]

id                                      335982
city                                 Bangalore
date                                2008-04-18
player_of_match                    BB McCullum 
venue                    M Chinnaswamy Stadium 
neutral_venue                                0
team1              Royal Challengers Bangalore
team2                    Kolkata Knight Riders
toss_winner        Royal Challengers Bangalore 
toss_decision                            field 
winner                   Kolkata Knight Riders 
result                                    runs 
result_margin                            140.0 
eliminator                                   N 
method                                     NaN 
umpire1                              Asad Rauf 
umpire2                            RE Koertzen 
Name: 0, dtype: object
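
The row_label, column_label form listed above can also pick out a single cell. Here is a small sketch on a two-row toy frame. Only the first row's values come from the output above; the second row holds placeholders.

```python
import pandas as pd

# Two-row toy frame; the second row holds placeholder values.
df = pd.DataFrame({"id": [335982, 0], "city": ["Bangalore", "placeholder"]})

row = df.loc[0]                     # a Series named after the row label
cell_by_label = df.loc[0, "city"]   # row label 0, column label "city"
cell_by_position = df.iloc[0, 1]    # row position 0, column position 1

print(row.name)                           # 0
print(cell_by_label, cell_by_position)    # Bangalore Bangalore
```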

To get all values of the 'city' column:


df.loc[:,'city']
0       Bangalore
1      Chandigarh
2           Delhi
3          Mumbai
4         Kolkata
          ...    
811         Dubai
812         Dubai
813     Abu Dhabi
814     Abu Dhabi
815         Dubai
Name: city, Length: 816, dtype: object


## The equivalent "iloc" statement 
>>> df.iloc[:,1]
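
If you would rather not count columns by hand, one possible approach is to look the position up with Index.get_loc. The sketch below uses a three-column toy frame with placeholder values, not the real dataset.

```python
import pandas as pd

# Toy frame with the first three column names from the dataset.
df = pd.DataFrame(
    [[1, "Bangalore", "d0"], [2, "Chandigarh", "d1"]],
    columns=["id", "city", "date"],
)

# get_loc returns the integer position of a uniquely named column.
pos = df.columns.get_loc("city")
print(pos)                                        # 1

# Selecting by that position matches selecting by the label.
print(df.iloc[:, pos].equals(df.loc[:, "city"]))  # True
```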

Selecting data by a List :


We can pass a list of labels to loc to select multiple rows or columns:

df.loc[:,['city','player_of_match','venue']]


## The equivalent "iloc" statement 
>>> df.iloc[:,[1,3,4]]
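
As a hedged illustration of list selection, the sketch below builds a toy frame with generated placeholder values (an assumption, not real match data) and maps a list of labels to positions with Index.get_indexer. It also shows that a list always returns a DataFrame, even when it has one element.

```python
import pandas as pd

# Toy frame: real column names, generated placeholder values.
cols = ["id", "city", "date", "player_of_match", "venue"]
df = pd.DataFrame({c: [f"{c}_{i}" for i in range(3)] for c in cols})

wanted = ["city", "player_of_match", "venue"]
positions = df.columns.get_indexer(wanted)   # labels -> integer positions
print(positions.tolist())                    # [1, 3, 4]
print(df.iloc[:, positions].equals(df.loc[:, wanted]))  # True

# A one-element list returns a DataFrame; a bare label returns a Series.
print(type(df.loc[:, ["city"]]).__name__)    # DataFrame
print(type(df.loc[:, "city"]).__name__)      # Series
```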

Selecting a Range of Data via slice :


A slice (written as start:stop:step) is a powerful technique for selecting a range of data. It is very useful when we want to select everything between two items, following the START, STOP, and STEP rules described earlier.


df.loc[1:4,'city':'venue']

In the above code snippet we select the rows labelled 1 to 4 and all the columns from "city" to "venue". With loc, both ends of each slice are included.




## The equivalent "iloc" statement
>>> df.iloc[1:5,1:5]

In the above code snippet we use 5 as our STOP because iloc works on integer positions and excludes the STOP position. The selection starts at position 1 and ends at position STOP - 1, i.e. 5 - 1 = 4, for both rows and columns.
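
The equivalence of the two slices can be checked on a toy frame. The sketch below assumes a default 0-based row index and uses generated placeholder values instead of the real match data.

```python
import pandas as pd

# Toy frame: first six column names of the dataset, placeholder values.
cols = ["id", "city", "date", "player_of_match", "venue", "neutral_venue"]
df = pd.DataFrame({c: [f"{c}_{i}" for i in range(6)] for c in cols})

by_label = df.loc[1:4, "city":"venue"]   # both ends included
by_position = df.iloc[1:5, 1:5]          # STOP positions excluded

print(by_label.shape)                    # (4, 4)
print(by_label.equals(by_position))      # True

# Reusing the loc numbers in iloc drops the last row and column.
print(df.iloc[1:4, 1:4].shape)           # (3, 3)
```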


Selecting via conditions :


We often want to select data based on a condition. For example, suppose we want the 'player_of_match', 'team1', and 'winner' columns for the matches played in the city "Delhi":


df.loc[df['city'] == 'Delhi',['player_of_match','team1','winner',]]




df.iloc[df['city'] == 'Delhi',[3,6,10]] 

With iloc, passing the condition straight into the statement like this raises an error (a ValueError, or a NotImplementedError when the row index is integer-based).


We get the error because iloc does not accept a boolean Series as a mask. It needs a plain boolean list or array. We can use the list() function to convert the Series into a boolean list:


df.iloc[list(df['city'] == 'Delhi')]
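
As an alternative to list(), the mask can be converted with Series.to_numpy(). This is a swapped-in variant, not the approach used above. The sketch below uses a four-row toy frame with placeholder winners and checks that the loc and iloc selections agree.

```python
import pandas as pd

# Toy frame; the "winner" values are placeholders.
df = pd.DataFrame(
    {
        "city": ["Bangalore", "Delhi", "Mumbai", "Delhi"],
        "winner": ["w0", "w1", "w2", "w3"],
    }
)

mask = df["city"] == "Delhi"             # boolean Series

by_label = df.loc[mask, ["winner"]]      # loc accepts the Series directly
by_position = df.iloc[mask.to_numpy(), [1]]  # iloc needs a plain array or list

print(by_position.index.tolist())        # [1, 3]
print(by_label.equals(by_position))      # True
```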




Conclusion :


So in this article we saw the difference between loc[] and iloc[]. loc[] is label-based and includes the STOP label of a slice, while iloc[] is integer-position based and excludes the STOP position. We cannot pass a boolean Series condition directly to iloc[]; we first have to convert it into a list. Both are widely used for slicing and dicing data.
