In data analysis, how you select data matters a great deal; this is often called slicing and dicing a DataFrame. For this purpose, Python's pandas library provides two very useful indexers, loc and iloc. They are easy to understand and quick to use.
In this article we look at both in more detail: how to select data from a DataFrame by a single value, by a list of values, by a range of data, and so on.
Difference between loc and iloc :
The main difference between loc and iloc is :
loc is label-based: it selects rows and/or columns in pandas by their labels. It accepts a single label, a list of labels, a slice between two index labels, and more. A label slice is written as START:STOP:STEP, where:
START is the label of the first row/column to take. If START is omitted, loc starts from the first row/column.
STOP is the label of the last row/column to take; loc includes this label in the result. If STOP is omitted, loc takes everything up to the last row/column.
STEP is the number of positions to advance after each extraction.
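The START, STOP and STEP rules for loc can be tried out on a tiny frame. This is only a sketch: the `scores` frame and its row labels "a" to "e" are made up for illustration and are not part of the dataset used later.

```python
import pandas as pd

# Made-up frame with string row labels, for illustration only.
scores = pd.DataFrame(
    {"runs": [10, 20, 30, 40, 50]},
    index=["a", "b", "c", "d", "e"],
)

# START:STOP with labels; the STOP label "d" is part of the result.
middle = scores.loc["b":"d"]

# Omitting START begins at the first row; omitting STOP runs to the last row.
head = scores.loc[:"b"]
tail = scores.loc["d":]

# A STEP of 2 keeps every second row between the two labels.
every_other = scores.loc["a":"e":2]

print(middle.index.tolist())       # ['b', 'c', 'd']
print(every_other.index.tolist())  # ['a', 'c', 'e']
```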
iloc is integer-position based: it selects rows and/or columns in pandas by their integer position, counting from 0. A position slice is also written as START:STOP:STEP, where:
START is the integer position of the first row/column. If START is omitted, iloc starts from the first row/column.
STOP is the integer position at which the selection stops; the row/column at STOP itself is not included. If STOP is omitted, iloc takes everything up to the last row/column.
STEP is the number of positions to advance after each extraction.
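The same rules for iloc, sketched on a made-up frame (`scores` and its labels are hypothetical). Note that positions are used even though the rows carry string labels, and that the STOP position is left out.

```python
import pandas as pd

# Made-up frame; iloc ignores the string labels and uses positions 0..4.
scores = pd.DataFrame(
    {"runs": [10, 20, 30, 40, 50]},
    index=["a", "b", "c", "d", "e"],
)

# Positions 1, 2 and 3; position 4 (STOP) is excluded.
middle = scores.iloc[1:4]

# Omitting START begins at position 0; omitting STOP runs to the end.
head = scores.iloc[:2]
tail = scores.iloc[3:]

# A STEP of 2 keeps positions 0, 2 and 4.
every_other = scores.iloc[0:5:2]

print(middle["runs"].tolist())       # [20, 30, 40]
print(every_other["runs"].tolist())  # [10, 30, 50]
```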
Let's look at more examples using the dataset below :
This dataset has 17 columns :
Index(['id', 'city', 'date', 'player_of_match', 'venue', 'neutral_venue','team1', 'team2', 'toss_winner', 'toss_decision', 'winner', 'result', 'result_margin', 'eliminator', 'method', 'umpire1', 'umpire2'],dtype='object')
Selecting data by single value :
Both loc and iloc allow input to be a single value. We can use the following syntax for data selection:
To get a single row we use :
df.loc[0] , df.iloc[0]
id                 335982
city               Bangalore
date               2008-04-18
player_of_match    BB McCullum
venue              M Chinnaswamy Stadium
neutral_venue      0
team1              Royal Challengers Bangalore
team2              Kolkata Knight Riders
toss_winner        Royal Challengers Bangalore
toss_decision      field
winner             Kolkata Knight Riders
result             runs
result_margin      140.0
eliminator         N
method             NaN
umpire1            Asad Rauf
umpire2            RE Koertzen
Name: 0, dtype: object
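With the default index, row label 0 and row position 0 point at the same row, so loc and iloc agree. The sketch below uses a small stand-in frame (the ids and venues are placeholders, not real match data) to show that the two stop agreeing once the rows are reordered.

```python
import pandas as pd

# Toy stand-in for the match data; ids and venues are placeholders.
df = pd.DataFrame({
    "id": [1, 2, 3],
    "city": ["Bangalore", "Chandigarh", "Delhi"],
    "venue": ["Venue A", "Venue B", "Venue C"],
})

# Default index: label 0 and position 0 are the same row.
first_by_label = df.loc[0]
first_by_position = df.iloc[0]

# After sorting, labels travel with their rows but positions do not.
by_city_desc = df.sort_values("city", ascending=False)
label_zero = by_city_desc.loc[0]      # still the Bangalore row
position_zero = by_city_desc.iloc[0]  # now the Delhi row

print(first_by_label["city"], first_by_position["city"])  # Bangalore Bangalore
print(label_zero["city"], position_zero["city"])          # Bangalore Delhi
```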
To get all values of the 'city' column :
>>> df.loc[:,'city']
0       Bangalore
1       Chandigarh
2       Delhi
3       Mumbai
4       Kolkata
...
811     Dubai
812     Dubai
813     Abu Dhabi
814     Abu Dhabi
815     Dubai
Name: city, Length: 816, dtype: object
## The equivalent "iloc" statement
>>> df.iloc[:,1]
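A single value can also be given for both the row and the column at once, which returns one cell. A short sketch on a placeholder frame (not the real dataset), assuming 'city' sits at column position 1 as in the column list above:

```python
import pandas as pd

# Toy stand-in for the match data; ids and venues are placeholders.
df = pd.DataFrame({
    "id": [1, 2, 3],
    "city": ["Bangalore", "Chandigarh", "Delhi"],
    "venue": ["Venue A", "Venue B", "Venue C"],
})

# Whole column: by label with loc, by position with iloc.
city_by_label = df.loc[:, "city"]
city_by_position = df.iloc[:, 1]

# Single cell: one row selector and one column selector.
cell_by_label = df.loc[2, "city"]
cell_by_position = df.iloc[2, 1]

print(city_by_label.equals(city_by_position))  # True
print(cell_by_label, cell_by_position)         # Delhi Delhi
```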
Selecting data by a List :
We can pass a list of labels to loc to select multiple rows or columns:
>>> df.loc[:,['city','player_of_match','venue']]
## The equivalent "iloc" statement
>>> df.iloc[:,[1,3,4]]
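List selection works for rows as well as columns, and the result follows the order of the list. The sketch below assumes a reduced stand-in frame that keeps only the first five columns of the dataset in their original order; every value except the city names is a placeholder.

```python
import pandas as pd

# Toy stand-in: first five columns of the dataset, placeholder values.
df = pd.DataFrame({
    "id": [1, 2, 3, 4],
    "city": ["Bangalore", "Chandigarh", "Delhi", "Mumbai"],
    "date": ["day 0", "day 1", "day 2", "day 3"],
    "player_of_match": ["Player A", "Player B", "Player C", "Player D"],
    "venue": ["Venue A", "Venue B", "Venue C", "Venue D"],
})

# Columns picked by a list of labels, and by the matching positions.
cols_by_label = df.loc[:, ["city", "player_of_match", "venue"]]
cols_by_position = df.iloc[:, [1, 3, 4]]

# Rows picked by a list of labels, and by the matching positions.
rows_by_label = df.loc[[0, 2]]
rows_by_position = df.iloc[[0, 2]]

# The result keeps the order given in the list.
reordered = df.loc[:, ["venue", "city"]]

print(cols_by_label.equals(cols_by_position))  # True
print(rows_by_label["city"].tolist())          # ['Bangalore', 'Delhi']
print(list(reordered.columns))                 # ['venue', 'city']
```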
Selecting a Range of Data via slice :
Slice (written as start:stop:step) is a powerful technique for selecting a range of data. It is very useful when we want to select everything between two items, following the START, STOP and STEP rules described earlier.
>>> df.loc[1:4,'city':'venue']
In the above code snippet we select the rows with index labels 1 to 4 and all the columns from "city" to "venue", with both ends included.
## The equivalent "iloc" statement
>>> df.iloc[1:5,1:5]
In the above code snippet we use 5 as STOP because iloc works on integer positions and excludes the STOP position: the selection starts at position 1 and ends at position STOP - 1, i.e. 5 - 1 = 4.
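The inclusive/exclusive difference can be checked directly. A sketch on a reduced stand-in frame (first five columns of the dataset in their original order; all values other than the city names are placeholders):

```python
import pandas as pd

# Toy stand-in: six rows, first five columns of the dataset, placeholder values.
df = pd.DataFrame({
    "id": [1, 2, 3, 4, 5, 6],
    "city": ["Bangalore", "Chandigarh", "Delhi", "Mumbai", "Kolkata", "Dubai"],
    "date": ["day 0", "day 1", "day 2", "day 3", "day 4", "day 5"],
    "player_of_match": ["Player A", "Player B", "Player C",
                        "Player D", "Player E", "Player F"],
    "venue": ["Venue A", "Venue B", "Venue C", "Venue D", "Venue E", "Venue F"],
})

# loc: labels 1..4 and columns 'city'..'venue', both ends included.
by_label = df.loc[1:4, "city":"venue"]

# iloc: positions 1..4 on both axes, so STOP has to be 5.
by_position = df.iloc[1:5, 1:5]

print(by_label.shape)                   # (4, 4)
print(by_label.equals(by_position))     # True
print(df.iloc[1:4, 1:4].shape)          # (3, 3)
```

Using 4 as STOP in iloc, as one would with loc, silently drops the last row and the last column.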
Selecting via conditions :
We often want to select data based on a condition. For example, we want 'player_of_match', 'team1' and 'winner' for the matches played in city = "Delhi" :
df.loc[df['city'] == 'Delhi',['player_of_match','team1','winner']]
df.iloc[df['city'] == 'Delhi',[3,6,10]]
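With iloc, passing the condition straight into the statement, as above, raises an error (a ValueError, or a NotImplementedError when the boolean Series has an integer index).
We get the error because iloc does not accept a boolean Series, which carries index labels. It accepts a plain boolean list or array instead. We can use the list() function to convert the Series into a boolean list.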
df.iloc[list(df['city'] == 'Delhi')]
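The whole condition example can be sketched end to end on a small stand-in frame. The player and team names are placeholders, and because this toy frame has only four columns, the column positions are 1, 2 and 3 rather than 3, 6 and 10. Converting the mask with `to_numpy()` is shown as an alternative to `list()`; which exception iloc raises for a boolean Series is treated here as version- and index-dependent, so both are caught.

```python
import pandas as pd

# Toy stand-in; player and team names are placeholders.
df = pd.DataFrame({
    "city": ["Bangalore", "Delhi", "Mumbai", "Delhi"],
    "player_of_match": ["Player A", "Player B", "Player C", "Player D"],
    "team1": ["Team 1", "Team 2", "Team 3", "Team 4"],
    "winner": ["Team 1", "Team 9", "Team 3", "Team 4"],
})

mask = df["city"] == "Delhi"  # boolean Series, indexed like df

# loc accepts the boolean Series directly.
by_label = df.loc[mask, ["player_of_match", "team1", "winner"]]

# iloc rejects the boolean Series.
try:
    df.iloc[mask, [1, 2, 3]]
except (ValueError, NotImplementedError) as exc:
    print(type(exc).__name__)

# iloc accepts a plain boolean list or a NumPy array.
by_position_list = df.iloc[list(mask), [1, 2, 3]]
by_position_array = df.iloc[mask.to_numpy(), [1, 2, 3]]

print(by_label.equals(by_position_list))   # True
print(by_label.equals(by_position_array))  # True
```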
In this article we saw the difference between loc and iloc: loc is label-based and iloc is integer-position based. We cannot pass a boolean condition directly to iloc; we first have to convert it into a list. Both are widely used for slicing and dicing data.