Pandas offers a lot of functions for working with dataframes. Its current stable version, 2.2, brings many enhancements to its native functions, several of them requested by users for some time now.
Consider merging dataframes. Two or more dataframes can be merged in the SQL-standard ways: left, right, inner, outer or cross. Merging dataframes in Pandas is a breeze, but there’s more that can be done.
Inner-joining these 2 dataframes will yield:
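As a minimal sketch of an inner join, the snippet below builds two small dataframes and merges them on a shared column. The data, the column names ("obs_1", "obs_2", "count") and the choice of "count" as the join key are assumptions made for illustration only.

```python
import pandas as pd

# Hypothetical sample data; names and values are illustrative assumptions.
df1 = pd.DataFrame({"obs_1": ["a", "b", "c", "d"], "count": [10, 25, 31, 47]})
df2 = pd.DataFrame({"obs_2": ["w", "x", "y", "z"], "count": [25, 12, 47, 90]})

# An inner join keeps only the rows whose "count" appears in both dataframes.
inner = pd.merge(df1, df2, on="count", how="inner")
print(inner)
```

Only exact matches on the key survive an inner join, so counts that are merely close to each other (such as 10 and 12 here) are dropped.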
How do we know which observations have counts very close to each other? For this, merge_asof helps. But before that, both dataframes need to be sorted by the counts.
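The parameter “direction” tells how to do the merge. Three choices are available:
“backward” – look backward, i.e. match the closest key in the other dataframe that is less than or equal to the current one, within the tolerated distance
“forward” – look forward, i.e. match the closest key that is greater than or equal to the current one, within the tolerated distance
“nearest” – get the closest key within the tolerated distance, whether forward or backward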
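The three directions can be compared side by side with a small sketch. The dataframes, the column names and the tolerance of 3 are assumptions chosen for illustration; "count_2" is a hypothetical helper column that keeps the matched count visible in the result.

```python
import pandas as pd

# Hypothetical sample data, sorted by the merge key as merge_asof requires.
left = pd.DataFrame(
    {"obs_1": ["a", "b", "c", "d"], "count": [10, 25, 31, 47]}
).sort_values("count")
right = pd.DataFrame(
    {"obs_2": ["w", "x", "y", "z"], "count": [25, 12, 47, 90]}
).sort_values("count")

# Keep a copy of the right-hand count so the matched value shows up in the output.
right = right.assign(count_2=right["count"])

# Run the same merge once per direction, allowing a distance of at most 3.
results = {
    direction: pd.merge_asof(left, right, on="count", direction=direction, tolerance=3)
    for direction in ("backward", "forward", "nearest")
}

for direction, merged in results.items():
    print(direction)
    print(merged)
```

With this data, the count 10 finds no partner when looking backward, but pairs with 12 when looking forward or for the nearest value. The count 31 stays unmatched in all three cases because its neighbours are further away than the tolerance.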
Let’s add a date column to the dataframes. The “date_range” function comes in handy. Specify the start and the end dates together with either the frequency (say, every 15 hours) or the number of periods.
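Here is a short sketch of both ways of calling date_range, followed by adding a date column. The dates (February 2024), the sample counts and the column names are assumptions for illustration.

```python
import pandas as pd

# Start and end dates plus a frequency: one timestamp every 15 hours.
every_15h = pd.date_range(start="2024-02-05", end="2024-02-15", freq="15h")

# Start and end dates plus a number of periods: evenly spaced timestamps.
four_stamps = pd.date_range(start="2024-02-05", end="2024-02-15", periods=4)

# Hypothetical dataframe: give each observation a timestamp 15 hours apart.
df = pd.DataFrame({"count": [10, 25, 31, 47]})
df["date"] = pd.date_range(start="2024-02-09", periods=len(df), freq="15h")

print(every_15h)
print(four_stamps)
print(df)
```

When building a column, passing a start date, a frequency and periods=len(df) guarantees that the number of timestamps matches the number of rows.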
Now, say only the observations recorded on business days (weekdays) matter. To get the business days, Pandas provides a bdate_range function similar in structure to the date_range function.
For example, the weekend days of Feb 10th and 11th are excluded here.
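A minimal sketch of bdate_range follows. The year 2024 and the exact start and end dates are assumptions, picked so that Feb 10th and 11th fall on a weekend.

```python
import pandas as pd

# Business days (Monday to Friday) between the two dates, both inclusive.
bdays = pd.bdate_range(start="2024-02-05", end="2024-02-15")
print(bdays)

# Saturday and Sunday are not part of the resulting index.
print(pd.Timestamp("2024-02-10") in bdays)
print(pd.Timestamp("2024-02-11") in bdays)
```

Note that bdate_range skips weekends only by default; public holidays stay in the range unless a custom frequency and a holiday list are supplied.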
Now, let’s keep only those observations that have been recorded on one of these days.
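One way to do this, sketched under the assumption that the timestamps live in a column named "date", is to strip the time of day and test membership in the business-day index. The sample data is hypothetical.

```python
import pandas as pd

# Hypothetical observations, one every 15 hours starting on a Friday.
df = pd.DataFrame({"count": [10, 25, 31, 47, 52, 60]})
df["date"] = pd.date_range(start="2024-02-09", periods=len(df), freq="15h")

bdays = pd.bdate_range(start="2024-02-05", end="2024-02-15")

# normalize() resets each timestamp to midnight so it can be compared
# against the business days, which are all midnight timestamps.
on_bdays = df[df["date"].dt.normalize().isin(bdays)]
print(on_bdays)
```

The normalize step matters: without it, a timestamp such as Friday 15:00 would not match the Friday entry in the business-day index.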
Often, a column needs to be done away with, for which “drop” is a commonly used function.
A simpler way is to just “pop” it.
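The difference between the two can be seen in a small sketch. The dataframe and the column name "note" are assumptions for illustration.

```python
import pandas as pd

# Hypothetical dataframe with a column we want to get rid of.
df = pd.DataFrame(
    {"observation": ["a", "b", "c"], "count": [10, 25, 31], "note": ["x", "y", "z"]}
)

# drop returns a new dataframe and leaves the original untouched.
without_note = df.drop(columns="note")
print(list(df.columns))
print(list(without_note.columns))

# pop removes the column from df in place and hands it back as a Series.
note = df.pop("note")
print(list(df.columns))
print(note)
```

Since pop modifies the dataframe in place, it is handy when the removed column is still needed separately, for example as a target variable.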
Now, let’s add some groupings to the data.
Say we need the observations whose group mean is less than a given number. Then we need to group, take the mean, save it as a dataframe and then pick out those observations whose group mean is less than the number.
Instead, we can use “filter”.
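Both routes are sketched below for comparison. The grouping column "group", the sample counts and the threshold of 40 are assumptions for illustration.

```python
import pandas as pd

# Hypothetical grouped observations.
df = pd.DataFrame(
    {
        "group": ["A", "A", "B", "B", "C", "C"],
        "count": [10, 25, 31, 47, 52, 60],
    }
)
threshold = 40  # assumed cut-off for the group mean

# The long way: compute the group means, find the qualifying groups,
# then select the rows that belong to them.
means = df.groupby("group")["count"].mean()
keep = means[means < threshold].index
long_way = df[df["group"].isin(keep)]

# The short way: filter keeps every row of each group for which
# the lambda returns True.
short_way = df.groupby("group").filter(lambda g: g["count"].mean() < threshold)

print(short_way)
```

Here groups A and B have means below 40, so all of their rows are kept, while every row of group C is dropped. Both approaches return the same rows.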
Lambda functions are super-useful as anonymous functions that can be used freely anywhere.
Last but not least, we often need to check whether two dataframes are equal and, in the case above, whether we have measured the same number of observations in the 2 dataframes.
A plain element-wise comparison is not possible when the dataframes differ in shape or labels.
Instead, we can use the testing functions in Pandas.
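The sketch below assumes the failing comparison is an element-wise == between two dataframes of different lengths, and then shows assert_frame_equal from pandas.testing. The sample data is hypothetical.

```python
import pandas as pd

# Hypothetical dataframes with a different number of observations.
df1 = pd.DataFrame({"observation": ["a", "b", "c", "d"], "count": [10, 25, 31, 47]})
df2 = pd.DataFrame({"observation": ["a", "b", "c"], "count": [10, 25, 31]})

# Element-wise comparison raises for dataframes that are not identically labeled.
try:
    df1 == df2
    elementwise_ok = True
except ValueError as err:
    elementwise_ok = False
    print("== failed:", err)

# assert_frame_equal raises an AssertionError that describes the mismatch.
try:
    pd.testing.assert_frame_equal(df1, df2)
    frames_match = True
except AssertionError as err:
    frames_match = False
    print(err)

# For identical dataframes it passes silently.
pd.testing.assert_frame_equal(df1, df1.copy())

# Checking only the number of observations is a simple length comparison.
same_length = len(df1) == len(df2)
print(same_length)
```

The error message from assert_frame_equal points to what differs (here, the shape), which makes it more useful in tests than a bare True or False. If only a boolean is needed, df1.equals(df2) is an alternative.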