
What is Resampling?

Resampling time series data in Pandas

Resampling applies to time series data. It means changing the frequency at which the data is reported, e.g. converting yearly values into monthly or daily values. In other words, we change the level in the time hierarchy.

Resampling can be used to put observations onto a regular, consistent time grid. It is particularly useful when dealing with time series data that needs to be analyzed or visualized at different granularities.

DOWNSAMPLING VS UPSAMPLING

DOWNSAMPLING: Decrease the frequency

e.g. second to hour, month to quarter

So, we use downsampling to get fewer observations at a lower frequency.

UPSAMPLING: Increase the frequency

e.g. month to day, hour to second

So, we use upsampling to get more observations at a higher frequency.

HOW CAN WE DO THIS IN PANDAS

1. Set the index to a datetime (time series) index

2. Upsample with interpolation

3. Downsample with aggregation

Let’s understand this with an example. I will use a Jupyter Notebook for it.

First, we import the required libraries and then load the data.
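A minimal sketch of this step. The original data file is not shown here, so the column names `date` and `price` and all values below are assumptions made up for illustration; an in-memory string stands in for the file.

```python
import io

import pandas as pd

# Synthetic stand-in for the data file (values are invented for this sketch).
csv_text = """date,price
2000-01-01,100.0
2000-01-02,101.5
2000-01-03,99.8
2000-01-04,102.2
2000-01-05,103.0
"""

# With a real file this would be pd.read_csv("path/to/your_file.csv").
df = pd.read_csv(io.StringIO(csv_text))

print(df.head())
print(df.dtypes)  # 'date' is still plain text at this point, not a datetime
```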

Now, we will set the date column as a datetime index.

Here, you can see that the date column is now working as the index.
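Setting the index might look like the sketch below. The column names `date` and `price` and the values are assumptions, not the original dataset.

```python
import pandas as pd

# Small synthetic frame with the dates stored as text.
df = pd.DataFrame({
    "date": ["2000-01-01", "2000-01-02", "2000-01-03"],
    "price": [100.0, 101.5, 99.8],
})

# Parse the text dates, then move the column into the index.
df["date"] = pd.to_datetime(df["date"])
df = df.set_index("date")

print(df.index)  # a DatetimeIndex, which resample() needs
```

When reading from a file, `pd.read_csv(..., parse_dates=["date"], index_col="date")` does both steps in one call.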

Now, let’s do the upsampling.

We have daily data here. Let’s change it to hourly.

Here I used 'H' for hour. The pandas documentation lists the other frequency aliases, such as 'D' for day. In pandas 2.2 and later the hourly alias is the lowercase 'h'.

ffill is forward fill: each new hourly row takes the most recent known value.

So at 2000-01-01 00:00:00 the price is 1394.46, at 2000-01-01 01:00:00 the price is again 1394.46, and so on.
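The upsampling step can be sketched as follows, on synthetic daily prices (the values are made up). Linear interpolation is shown next to forward fill as an alternative way to fill the new rows; it is not necessarily what the original notebook used.

```python
import pandas as pd

# Three days of synthetic daily prices on a DatetimeIndex.
daily = pd.DataFrame(
    {"price": [100.0, 102.0, 101.0]},
    index=pd.date_range("2000-01-01", periods=3, freq="D", name="date"),
)

# 'h' is the hourly alias; older pandas releases also accept 'H'.
hourly_ffill = daily.resample("h").ffill()

# Alternative: fill the new hourly rows by linear interpolation.
hourly_interp = daily.resample("h").interpolate(method="linear")

print(hourly_ffill.head(3))
print(hourly_interp.head(3))
```

Forward fill repeats the last known value, so it adds rows but no new information; interpolation draws a straight line between the known daily points.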

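Now, let’s do the downsampling.

The result is now month-wise, with each row labelled by the last date of the month. That date is only the label of the monthly bin: each value summarizes the whole month (here, the average price), so it is not the closing price on that date.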
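A sketch of monthly downsampling with the mean, on synthetic data. The offset object `pd.offsets.MonthEnd()` is used here instead of a string alias because the month-end alias changed from 'M' to 'ME' in pandas 2.2; this is a choice made for the sketch, not necessarily what the original notebook did.

```python
import numpy as np
import pandas as pd

# Synthetic daily prices for Jan to Mar 2000: 0.0, 1.0, 2.0, ...
idx = pd.date_range("2000-01-01", "2000-03-31", freq="D", name="date")
daily = pd.DataFrame({"price": np.arange(len(idx), dtype=float)}, index=idx)

# One row per month, labelled with the month's last calendar day.
monthly = daily.resample(pd.offsets.MonthEnd()).mean()

print(monthly)
```

Each row is the average of all daily values in that month, even though the label is the last date of the month.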

We can visualize it.

Here we can see the average price for each month plotted.
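A plot like this can be produced with the built-in pandas plotting, which uses matplotlib. The sketch below again uses synthetic data, and the title and axis labels are illustrative.

```python
import numpy as np
import pandas as pd

# One year of synthetic daily prices.
idx = pd.date_range("2000-01-01", "2000-12-31", freq="D", name="date")
daily = pd.DataFrame({"price": np.arange(len(idx), dtype=float)}, index=idx)

# Monthly average, labelled at month end.
monthly = daily.resample(pd.offsets.MonthEnd()).mean()

# Series.plot draws with matplotlib and returns the Axes object.
ax = monthly["price"].plot(title="Average monthly price")
ax.set_xlabel("month")
ax.set_ylabel("price")
```

In a Jupyter Notebook the figure is shown inline below the cell.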

We may also want to see the quarterly price.
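A quarterly version, sketched on synthetic data. `pd.offsets.QuarterEnd(startingMonth=12)` gives quarters ending in March, June, September and December; it is used instead of a string alias because the quarter-end alias changed from 'Q' to 'QE' in pandas 2.2.

```python
import numpy as np
import pandas as pd

# One year of synthetic daily prices: 0.0, 1.0, 2.0, ...
idx = pd.date_range("2000-01-01", "2000-12-31", freq="D", name="date")
daily = pd.DataFrame({"price": np.arange(len(idx), dtype=float)}, index=idx)

# One row per calendar quarter, labelled with the quarter's last day.
quarterly = daily.resample(pd.offsets.QuarterEnd(startingMonth=12)).mean()

print(quarterly)
```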

We can plot this as well.

pandas was originally built for working with financial time series, and its name comes from “panel data”. It also supports joins between tables.

Resampling in pandas is a powerful feature that allows you to change the frequency of your time series data. It can be used to aggregate data, fill missing values, or perform various operations at different time intervals.

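That is resampling. Many other frequencies are available, for example biweekly or twice a month; see the offset aliases in the pandas documentation. Some of them are given below. Note that pandas 2.2 renamed several aliases (for example 'M' became 'ME', 'Q' became 'QE' and 'H' became 'h'), so check the list for your version: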


B - business day frequency

C - custom business day frequency (experimental)

D - calendar day frequency

W - weekly frequency

M - month end frequency

SM - semi-month end frequency (15th and end of month)

BM - business month end frequency

CBM - custom business month end frequency

MS - month start frequency

SMS - semi-month start frequency (1st and 15th)

BMS - business month start frequency

CBMS - custom business month start frequency

Q - quarter end frequency

BQ - business quarter end frequency

QS - quarter start frequency

BQS - business quarter start frequency

A - year end frequency

BA, BY - business year end frequency

AS, YS - year start frequency

BAS, BYS - business year start frequency

BH - business hour frequency

H - hourly frequency

T, min - minutely frequency

S - secondly frequency

L, ms - milliseconds

U, us - microseconds

N - nanoseconds
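To see how a few of these aliases behave, the sketch below resamples the same synthetic daily series at weekly, biweekly ('2W', a multiple of the weekly alias), semi-month start and month start frequencies. Only aliases whose spelling has not changed across pandas versions are used; the data is made up.

```python
import numpy as np
import pandas as pd

# Synthetic daily series for Jan to Mar 2000.
idx = pd.date_range("2000-01-01", "2000-03-31", freq="D", name="date")
daily = pd.Series(np.arange(len(idx), dtype=float), index=idx, name="price")

# Weekly, biweekly, semi-month start and month start.
aliases = ["W", "2W", "SMS", "MS"]
results = {alias: daily.resample(alias).sum() for alias in aliases}

for alias, out in results.items():
    print(f"{alias:>3}: {len(out)} rows, first label {out.index[0].date()}")
```

Whatever the frequency, the resampled sums add up to the same total as the daily series; only the number of bins and their labels change.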

 

We can use other aggregate functions as well, for example sum, max, min and standard deviation.


Sum of the price by month:

Maximum price in each month:

Minimum price in each month:

Standard deviation of the price by month:
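These four aggregates can be sketched in one call with `agg`, on synthetic data. Calling `.sum()`, `.max()`, `.min()` or `.std()` directly on the resampler gives each result separately.

```python
import numpy as np
import pandas as pd

# Synthetic daily prices for Jan to Mar 2000: 0.0, 1.0, 2.0, ...
idx = pd.date_range("2000-01-01", "2000-03-31", freq="D", name="date")
daily = pd.DataFrame({"price": np.arange(len(idx), dtype=float)}, index=idx)

# One row per month, one column per aggregate function.
monthly_stats = (
    daily["price"]
    .resample(pd.offsets.MonthEnd())
    .agg(["sum", "max", "min", "std"])
)

print(monthly_stats)
```

The `std` column is the sample standard deviation (pandas uses `ddof=1` by default).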


That is all about resampling. I hope it helps you understand the concept.
