Resampling time series data in Pandas
What is Resampling?
Resampling applies to time series data: it changes the frequency at which the data is reported, for example converting yearly values into monthly ones or yearly data into daily data. In other words, it moves the data to a different level of the time hierarchy.
Resampling can make data more consistent, for example by putting observations on a regular time grid. It is particularly useful when time series data needs to be analyzed or visualized at different granularities.
DOWNSAMPLING VS UPSAMPLING
DOWNSAMPLING: Decrease frequencies
e.g. second to hour
month to quarter
So, to get fewer, coarser observations we use DOWNSAMPLING.
UPSAMPLING: Increase frequencies
e.g. month to day
hour to second
So, to get more observations we use UPSAMPLING.
HOW CAN WE DO THIS IN PANDAS
1. Change the index to datetime values (a DatetimeIndex)
2. Upsample with interpolation
3. Downsample with aggregation
Let’s understand it through an example. I will use a Jupyter Notebook for this.
First, we import the required libraries and then load the data.
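A minimal sketch of this step. The data file used here is not shown, so the CSV text below is a hypothetical stand-in: only the first price (1394.46) comes from this walkthrough, the other values are made up, and the column names `date` and `price` are assumptions.

```python
import io

import pandas as pd

# Hypothetical stand-in for the real data file. Only the first price
# (1394.46) comes from the text; the other rows are made-up values.
csv_text = """date,price
2000-01-01,1394.46
2000-01-02,1400.46
2000-01-03,1390.46
"""

# With a real file this would be something like pd.read_csv("your_file.csv").
df = pd.read_csv(io.StringIO(csv_text))

print(df.head())
print(df.dtypes)  # 'date' is still plain text at this point, not a datetime
```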
Now, we will change the index to a datetime index.
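One way to do this, sketched on hypothetical data (the `date` and `price` column names are assumptions), is to parse the dates with `pd.to_datetime` and then call `set_index`:

```python
import pandas as pd

# Hypothetical data; the 'date' and 'price' column names are assumptions.
df = pd.DataFrame(
    {
        "date": ["2000-01-01", "2000-01-02", "2000-01-03"],
        "price": [1394.46, 1400.46, 1390.46],
    }
)

# Parse the text dates into real timestamps, then move them into the index.
df["date"] = pd.to_datetime(df["date"])
df = df.set_index("date")

print(df.index)  # a DatetimeIndex, which resample() can work with
print(df.head())
```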
Here, you can see that the date column is now working as the index.
Now, let’s do the upsampling.
We have daily data here. Let’s change it to hourly.
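A small sketch of the upsampling call, using three hypothetical daily prices (only 1394.46 is taken from this walkthrough). It also shows `interpolate()` as an alternative way to fill the new rows.

```python
import pandas as pd

# Hypothetical daily prices; only 1394.46 comes from the text.
dates = pd.date_range("2000-01-01", periods=3, freq="D")
df = pd.DataFrame({"price": [1394.46, 1400.46, 1390.46]}, index=dates)

# "h" is the hourly alias (documented as "H" before pandas 2.2).
# ffill() copies the last known daily value into each new hourly row.
hourly = df.resample("h").ffill()
print(hourly.head())

# Alternative: fill the new rows by linear interpolation between known days.
hourly_interp = df.resample("h").interpolate()
print(hourly_interp.head())
```

Forward fill keeps the price flat within each day, while interpolation draws a straight line from one day's value to the next. Which one is appropriate depends on what the data means.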
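Here I used 'H' for hour. You can read more about these frequency aliases in the pandas documentation, for example 'H' for hour and 'D' for day. (pandas 2.2 and later prefer the lowercase 'h'; the uppercase 'H' is deprecated.)
ffill is forward fill: each new hourly row takes the most recent known daily value.
So, at 2000-01-01 00:00:00 the price is 1394.46,
and at 2000-01-01 01:00:00 the price is again 1394.46, and so on.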
Now, let’s do the downsampling.
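A minimal sketch of monthly downsampling with the mean, on hypothetical daily prices. It passes the `MonthEnd` offset object instead of a string alias, because the month-end alias is spelled 'M' in older pandas and 'ME' from pandas 2.2 on.

```python
import pandas as pd

# Hypothetical daily prices covering January to March 2000.
dates = pd.date_range("2000-01-01", "2000-03-31", freq="D")
df = pd.DataFrame({"price": [100.0 + i for i in range(len(dates))]}, index=dates)

# MonthEnd() is the offset behind the month-end alias ("M" in older pandas,
# "ME" from pandas 2.2 on). Passing the object avoids depending on either spelling.
monthly = df.resample(pd.offsets.MonthEnd()).mean()
print(monthly)
```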
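The result is now monthly, with each row labeled by the last calendar day of the month. That date is only the label of the bin: the value is the aggregate over the whole month (the mean here), not the closing price on that day.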
We can visualize it
Here we can see the average price for each month plotted.
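A sketch of how such a plot can be produced, assuming matplotlib is installed and again using hypothetical prices. The `Agg` backend line is only there so the snippet also runs as a plain script; in a Jupyter Notebook the figure renders inline without it.

```python
import matplotlib

matplotlib.use("Agg")  # off-screen backend so this also runs without a display

import pandas as pd

# Hypothetical daily prices covering January to March 2000.
dates = pd.date_range("2000-01-01", "2000-03-31", freq="D")
df = pd.DataFrame({"price": [100.0 + i for i in range(len(dates))]}, index=dates)

# Monthly mean, one point per month end.
monthly = df.resample(pd.offsets.MonthEnd()).mean()

# Series.plot() draws with matplotlib and returns the Axes object.
ax = monthly["price"].plot(marker="o", title="Average monthly price")
ax.set_xlabel("month")
ax.set_ylabel("price")
```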
If we want to see the quarterly price instead, we can resample by quarter.
We can plot this as well.
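The quarterly version can be sketched the same way on hypothetical data. `QuarterEnd(startingMonth=3)` closes quarters in March, June, September and December, which matches the default quarter-end alias ('Q' in older pandas, 'QE' from 2.2 on).

```python
import pandas as pd

# Hypothetical daily prices covering the first half of 2000.
dates = pd.date_range("2000-01-01", "2000-06-30", freq="D")
df = pd.DataFrame({"price": [100.0 + i for i in range(len(dates))]}, index=dates)

# Quarters ending in March, June, September and December.
quarterly = df.resample(pd.offsets.QuarterEnd(startingMonth=3)).mean()
print(quarterly)
```

Calling `quarterly.plot()` then draws it just like the monthly series.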
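The name pandas is derived from “panel data”, an econometrics term for data sets that follow the same subjects over multiple time periods, so working with time series has been one of its strengths from the start. It can help with joins too.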
Resampling in pandas is a powerful feature that allows you to change the frequency of your time series data. It can be used to aggregate data, fill missing values, or perform various operations at different time intervals.
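That is what resampling is. We can resample at many other frequencies as well, for example biweekly or twice a month; see the pandas documentation for the full list. Some of the frequency aliases are given below. Note that these are the spellings used by older pandas releases; pandas 2.2 renamed several of them, for example 'M' to 'ME', 'Q' to 'QE', 'A' to 'YE' and 'H' to 'h'.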
B - business day frequency
C - custom business day frequency (experimental)
D - calendar day frequency
W - weekly frequency
M - month end frequency
SM - semi-month end frequency (15th and end of month)
BM - business month end frequency
CBM - custom business month end frequency
MS - month start frequency
SMS - semi-month start frequency (1st and 15th)
BMS - business month start frequency
CBMS - custom business month start frequency
Q - quarter end frequency
BQ - business quarter end frequency
QS - quarter start frequency
BQS - business quarter start frequency
A - year end frequency
BA, BY - business year end frequency
AS, YS - year start frequency
BAS, BYS - business year start frequency
BH - business hour frequency
H - hourly frequency
T, min - minutely frequency
S - secondly frequency
L, ms - milliseconds
U, us - microseconds
N - nanoseconds
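A short sketch of a few of these aliases in use, on hypothetical data. It sticks to 'W', 'MS', 'B' and 'D', whose spelling has stayed the same across pandas versions.

```python
import pandas as pd

# Hypothetical daily data: 14 days starting Saturday 2000-01-01.
dates = pd.date_range("2000-01-01", periods=14, freq="D")
df = pd.DataFrame({"price": [float(i) for i in range(1, 15)]}, index=dates)

# "W": weekly bins, each labeled by the Sunday that closes the week.
weekly = df.resample("W").mean()
print(weekly)

# "MS": monthly bins labeled by the first day of the month.
month_start = df.resample("MS").sum()
print(month_start)

# Aliases also work in date_range; "B" skips Saturdays and Sundays.
business_days = pd.date_range("2000-01-03", periods=5, freq="B")
print(business_days)
```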
We can use other aggregate functions as well, for example sum, max, min and standard deviation.
Sum of the price in each month:
Maximum price in each month:
Minimum price in each month:
Standard deviation of the price in each month:
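These four aggregations can be sketched as follows on hypothetical daily prices. The `MonthEnd` offset object again stands in for the month-end alias ('M' in older pandas, 'ME' from 2.2 on), and `agg` is shown as one way to compute all four at once.

```python
import pandas as pd

# Hypothetical daily prices for January and February 2000.
dates = pd.date_range("2000-01-01", "2000-02-29", freq="D")
df = pd.DataFrame({"price": [100.0 + i for i in range(len(dates))]}, index=dates)

# Month-end bins; the offset object stands in for the "M"/"ME" alias.
by_month = df["price"].resample(pd.offsets.MonthEnd())

print(by_month.sum())  # total of the prices in each month
print(by_month.max())  # highest price in each month
print(by_month.min())  # lowest price in each month
print(by_month.std())  # sample standard deviation (ddof=1) in each month

# Or all four at once, one column per statistic.
summary = by_month.agg(["sum", "max", "min", "std"])
print(summary)
```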
That is all about resampling. I hope it helps you understand the concept.