In data analysis, how you select data, or in other words how you slice and dice a DataFrame, is very important. For this, Python's pandas library provides two very useful indexers, loc[] and iloc[]. They are easy to understand and fast to use.
In this article we will look at both in more detail: how to select data from a DataFrame by a single value, by a list of values, by a range of data, and by a condition.
Difference between loc[] and iloc[] :
The main difference between loc[] and iloc[] is that loc[] selects by label while iloc[] selects by integer position.
loc[]:
loc[] is label-based: it selects rows and/or columns in pandas by their labels. It accepts a single label, a list of labels, a slice of labels written as START:STOP:STEP, and more.
START is the label of the first row/column to take. If we do not provide START, loc[] starts from the first row/column.
STOP is the label of the last row/column to take, and it is included in the result. If we do not provide STOP, loc[] takes everything up to the last row/column.
STEP is the number of rows/columns to advance after each one taken.
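As a minimal sketch of the START, STOP and STEP rules for loc[], the snippet below uses a small hypothetical frame called toy with string row labels (it is not the dataset used in this article, and the value column is made up):

```python
import pandas as pd

# Hypothetical toy frame with string row labels, for illustration only.
toy = pd.DataFrame(
    {
        "city": ["Bangalore", "Chandigarh", "Delhi", "Mumbai", "Kolkata"],
        "value": [10, 20, 30, 40, 50],
    },
    index=["a", "b", "c", "d", "e"],
)

# START and STOP are labels, and the STOP label is included.
middle = toy.loc["b":"d"]

# Omitting START begins at the first row; omitting STOP runs to the last row.
head = toy.loc[:"c"]
tail = toy.loc["c":]

# A STEP of 2 keeps every second row between the two labels.
every_other = toy.loc["a":"e":2]

print(middle.index.tolist())       # ['b', 'c', 'd']
print(every_other.index.tolist())  # ['a', 'c', 'e']
```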
iloc[]:
iloc[] is integer-position-based: it selects rows and/or columns in pandas by their position, counting from 0. Slices are written as START:STOP:STEP.
START is the integer position of the first row/column to take. If we do not provide START, iloc[] starts from the first row/column.
STOP is the integer position at which the selection stops; the row/column at STOP itself is not included. If we do not provide STOP, iloc[] takes everything up to the last row/column.
STEP is the number of positions to advance after each one taken.
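The same rules for iloc[] can be sketched as follows. The frame toy is again hypothetical, and its row labels are deliberately set to 10, 20, ... so that positions and labels cannot be confused:

```python
import pandas as pd

# Hypothetical toy frame; the row labels differ from the row positions.
toy = pd.DataFrame(
    {
        "city": ["Bangalore", "Chandigarh", "Delhi", "Mumbai", "Kolkata"],
        "value": [1, 2, 3, 4, 5],
    },
    index=[10, 20, 30, 40, 50],
)

# Positions 0, 1 and 2 are taken; position 3 (STOP) is excluded.
first_three = toy.iloc[0:3]

# Omitting STOP runs to the last row.
from_two = toy.iloc[2:]

# Omitting START and STOP with a STEP of 2 keeps every second row.
every_other = toy.iloc[::2]

print(first_three.index.tolist())  # [10, 20, 30]
print(every_other.index.tolist())  # [10, 30, 50]
```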
Let's look at more examples using the following dataset.
This dataset has 17 columns :
Index(['id', 'city', 'date', 'player_of_match', 'venue', 'neutral_venue','team1', 'team2', 'toss_winner', 'toss_decision', 'winner', 'result', 'result_margin', 'eliminator', 'method', 'umpire1', 'umpire2'],dtype='object')
Selecting data by single value :
Both loc and iloc allow input to be a single value. We can use the following syntax for data selection:
loc[row_label, column_label]
iloc[row_position, column_position]
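To make the two-argument form concrete, here is a small sketch that reads a single cell both ways. The frame toy and its toss_decision values are assumptions for illustration, not rows from the real dataset:

```python
import pandas as pd

# Hypothetical toy frame with the default 0, 1, 2 row labels.
toy = pd.DataFrame(
    {
        "city": ["Bangalore", "Chandigarh", "Delhi"],
        "toss_decision": ["field", "bat", "bat"],
    }
)

# Row label 2, column label "city".
by_label = toy.loc[2, "city"]

# Row position 2, column position 0.
by_position = toy.iloc[2, 0]

print(by_label, by_position)  # Delhi Delhi
```

With the default index the row label and the row position happen to coincide, which is why both calls return the same cell here.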
To get a single row (here the first one) we can use either of :
df.loc[0] , df.iloc[0]
id 335982
city Bangalore
date 2008-04-18
player_of_match BB McCullum
venue M Chinnaswamy Stadium
neutral_venue 0
team1 Royal Challengers Bangalore
team2 Kolkata Knight Riders
toss_winner Royal Challengers Bangalore
toss_decision field
winner Kolkata Knight Riders
result runs
result_margin 140.0
eliminator N
method NaN
umpire1 Asad Rauf
umpire2 RE Koertzen
Name: 0, dtype: object
To get all values of the 'city' column :
df.loc[:,'city']
0 Bangalore
1 Chandigarh
2 Delhi
3 Mumbai
4 Kolkata
...
811 Dubai
812 Dubai
813 Abu Dhabi
814 Abu Dhabi
815 Dubai
Name: city, Length: 816, dtype: object
## The equivalent "iloc" statement
>>> df.iloc[:,1]
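Rather than counting columns by hand, the position of a label can be looked up with the column index's get_loc method. A minimal sketch on a hypothetical three-column frame (the id and date values are placeholders):

```python
import pandas as pd

# Hypothetical toy frame with the first three column names of the dataset.
toy = pd.DataFrame(
    {
        "id": [1, 2, 3],
        "city": ["Bangalore", "Chandigarh", "Delhi"],
        "date": ["d1", "d2", "d3"],
    }
)

# Look up the integer position of the "city" column.
city_position = toy.columns.get_loc("city")

by_label = toy.loc[:, "city"]
by_position = toy.iloc[:, city_position]

print(city_position)                 # 1
print(by_label.equals(by_position))  # True
```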
Selecting data by a List :
We can pass a list of labels to loc to select multiple rows or columns:
df.loc[:,['city','player_of_match','venue']]
## The equivalent "iloc" statement
>>>df.iloc[:,[1,3,4]]
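For a list of labels, the matching positions can be obtained in one call with get_indexer on the column index. The sketch below assumes a hypothetical frame that only reuses the first five column names of the dataset, filled with placeholder values:

```python
import pandas as pd

# Hypothetical toy frame: placeholder strings such as "city_0", "venue_2".
columns = ["id", "city", "date", "player_of_match", "venue"]
toy = pd.DataFrame(
    [[f"{col}_{row}" for col in columns] for row in range(3)],
    columns=columns,
)

wanted = ["city", "player_of_match", "venue"]

# Translate the labels into integer positions.
positions = toy.columns.get_indexer(wanted)

by_label = toy.loc[:, wanted]
by_position = toy.iloc[:, positions]

print(positions.tolist())            # [1, 3, 4]
print(by_label.equals(by_position))  # True
```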
Selecting a Range of Data via slice :
A slice (written as start:stop:step) is a powerful technique for selecting a range of data. It is very useful when we want to select everything between two items, following the START, STOP and STEP rules described earlier.
df.loc[1:4,'city':'venue']
In the above code snippet we select the rows with index labels 1 to 4 and all the columns from "city" to "venue", with both endpoints included.
## The equivalent "iloc" statement
>>>df.iloc[1:5,1:5]
In the above code snippet we use 5 as STOP because iloc works on integer positions and excludes the STOP position: the selection starts at position 1 and ends at position STOP-1, i.e. 5-1=4.
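The equivalence of the two slices can be checked on a small stand-in frame. In this sketch toy is hypothetical: it reuses the first six column names of the dataset with placeholder cell values and the default integer row labels:

```python
import pandas as pd

# Hypothetical toy frame: 6 rows, placeholder strings such as "city_3".
columns = ["id", "city", "date", "player_of_match", "venue", "neutral_venue"]
toy = pd.DataFrame(
    [[f"{col}_{row}" for col in columns] for row in range(6)],
    columns=columns,
)

# Label slice: both endpoints are included on rows and on columns.
by_label = toy.loc[1:4, "city":"venue"]

# Position slice: STOP is excluded, so it is one past the last wanted position.
by_position = toy.iloc[1:5, 1:5]

print(by_label.shape)                # (4, 4)
print(by_label.equals(by_position))  # True
```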
Selecting via conditions :
We often want to select data based on a condition. For example, we want 'player_of_match', 'team1' and 'winner' for the matches played in city = "Delhi".
df.loc[df['city'] == 'Delhi',['player_of_match','team1','winner']]
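This returns only the matches played in Delhi, restricted to the three selected columns.
For iloc, we get an error if we pass the condition straight into the statement :
df.iloc[df['city'] == 'Delhi',[3,6,10]]
The error occurs because iloc does not accept a boolean Series as a mask (depending on the index type, pandas raises a ValueError or a NotImplementedError). It only accepts a boolean list or array, so we can use the list() function to convert the Series into a boolean list.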
df.iloc[list(df['city'] == 'Delhi')]
This returns all columns for the matches played in Delhi.
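The behaviour can be reproduced on a small stand-in frame. In this sketch toy and its team names are hypothetical; besides list(), it also shows to_numpy() as another way to turn the Series into a plain boolean array that iloc accepts:

```python
import pandas as pd

# Hypothetical toy frame with made-up team names.
toy = pd.DataFrame(
    {
        "city": ["Bangalore", "Delhi", "Mumbai", "Delhi"],
        "team1": ["Team A", "Team B", "Team C", "Team D"],
        "winner": ["Team A", "Team X", "Team C", "Team D"],
    }
)

mask = toy["city"] == "Delhi"  # boolean Series

# Passing the Series straight to iloc fails.
try:
    toy.iloc[mask]
    raised = False
except (ValueError, NotImplementedError) as exc:
    raised = True
    print(type(exc).__name__)

# Converting the mask to a list or a NumPy array works.
by_list = toy.iloc[list(mask), [1, 2]]
by_array = toy.iloc[mask.to_numpy(), [1, 2]]

# The label-based equivalent accepts the Series directly.
by_label = toy.loc[mask, ["team1", "winner"]]

print(by_list.equals(by_label))  # True
```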
Conclusion :
So in this article we saw the difference between loc[] and iloc[]. loc[] is label-based and iloc[] is integer-position-based. We cannot pass a boolean Series condition directly to iloc[]; we first have to convert it into a list. Both are widely used for slicing and dicing data.