Rolling calculations are like a window through which we view our data. Instead of viewing the whole dataset at once, we look at subset of data at a time, like a fixed number of rows or a time period and moving that window along. Within each window or subset, we calculate aggregate values like sums, averages, or maximums. It is also called as rolling / sliding window calculations. This helps us see how these values change over time or across different parts of our dataset. It is very useful tool to understand trends, patterns, and fluctuations, making it easier to analyze and make data driven information.
Scenarios to use rolling Calculations in SQL:
Calculate the total sales per region ,day
Calculate the profit margin for each product
Find customers who have spent more than a certain amount
Calculate the number of days between order date and delivery date
Calculate a running total of sales, min/max sales each day, average sales each day
Classify products based on their sales performance
Calculate the running cumulative sales for each day using self-join
Calculate employee bonuses based on performance metrics
Objective: To find Rolling cummulative/Running Total, rolling sum each day, rolling sum by salesperson, rolling average sales, minimum sales each day, maximum sales each day, standard deviation each day, rolling count.
Data: 1. To Create a Table.
CREATE TABLE sales_history (
order_id INT,
sales_date DATE,
salesperson VARCHAR(20),
order_value INT
);
2.To insert the values to the table created.
INSERT INTO sales_history
(order_id, sales_date, salesperson, order_value) VALUES
(1, '20210801', 'John', 15),
(2, '20210801', 'Sarah', 8),
(3, '20210801', 'Sally', 19),
(4, '20210802', 'John', 2),
(5, '20210802', 'Mark', 18),
(6, '20210803', 'Sally', 3),
(7, '20210803', 'Mark', 21),
(8, '20210803', 'Sarah', 16),
(9, '20210804', 'John', 4);
3.To Display Sales_History_table
select * from sales_history
Simple Rolling Calculations: is to track the running total of a value over a moving window of data. It's a running tally or adding up numbers as we go along. It helps us see how the total accumulates or changes over time.
Objective: To find the rolling cumulative or rolling total
SELECT
order_id,
sales_date,
salesperson,
order_value,
SUM(order_value)
OVER (ROWS BETWEEN 2 PRECEDING AND CURRENT ROW
) AS running_total
FROM sales_history
ORDER BY order_id ASC;
Query Brief:The above query aggregates sum on 'Order_Value' field based on the the current row and the two rows preceding it from 'Sales_History' Table and ensures the rows are in ascending order of 'Order_ID'.
The Ouput: Row 1 has 0 preceding row, hence 'Running_Total' is 15. Row 2 has 1 preceding row, hence 'Running_Total' = 15+8(Current Row). Row 3 has 2 preceding rows, hence 'Running_Total' = 15+8+19(current Row) and so on.
Rolling Sum Calculation: is adding up numbers as you go along, but instead of adding everything at once, you look at small chunks of data at a time. It helps you understand trends and patterns, like whether sales are increasing over months or if expenses are fluctuating from week to week. It helps in analyzing changes and make informed decisions based on cumulative data.
Objective: 1. To find Rolling sum for each day using windows function.
SELECT
order_id,
sales_date,
salesperson,
order_value,
SUM(order_value)
OVER (
PARTITION BY Sales_date
ORDER BY order_id ASC) AS running_total
FROM sales_history
ORDER BY order_id ASC;
Query Brief: The above query aggregates sum on 'Order_Value' field based on 'Sales_date' Partition and orders the subset or window in asceding of 'Order_id' from 'Sales_History' Table and ensures over all rows are in ascending order of 'Order_ID'.
The output: Row 1 has 0 row before it with same 'Sales_date' , hence 'Running_Total' is 15. Row 2 has 1 row as same 'Sales_date' as Row 1, hence 'Running_Total' = 15+8(Current Row). Row 3 has 2row as same 'Sales_date' as Row 1 and Row2, hence 'Running_Total' = 15+8+19(current Row).Row 4 has 0 row before it with same 'Sales_date' , hence 'Running_Total' is 2. Running total is reset based on 'sales_date' Partition.
2.To find Rolling sum by Sales Person using windows function
SELECT
order_id,
sales_date,
salesperson,
order_value,
sum(order_value) OVER (
PARTITION BY salesperson
ORDER BY order_id ASC) AS running_total
FROM sales_history
ORDER BY salesperson ASC, order_id ASC;
Query Brief:Â The above query aggregates sum on 'Order_Value' field based on 'Salesperson'Â Partition and orders the subset or window in ascending order of 'order_id' from 'Sales_History' Table and ensures the overall rows are in ascending order of 'salesperson' first, then in ascending order of 'Order_ID'.
The output:Row 1 has 0 row before it with same 'Salesperson' , hence 'Running_Total' is 15. Row 2 has 1 row as same 'Salesperson' as Row 1, hence 'Running_Total' = 15+2(Current Row). Row 3 has 2row as same 'Salesperson' as Row 1 and Row2, hence 'Running_Total' = 15+2+4(current Row).Row 4 has 0 row before it with same 'Salesperson' , hence 'Running_Total' is 18. Running total is reset based on 'salesperson' Partition.
Rolling Average Calculation: is to smooth out fluctuations in data over time. It is easier to identify trend or pattern without getting distracted by individual spikes or dips. For example, if we're tracking daily temperatures over a month, a rolling average helps us focus on whether it's getting warmer or colder overall, rather than being influenced by sudden hot or cold days.
Objective: To find Rolling average each day using windows function.
SELECT
order_id,
sales_date,
salesperson,
order_value,
avg(order_value)
OVER (
PARTITION BY Sales_date
ORDER BY sales_date ASC) AS running_total
FROM sales_history
ORDER BY order_id ASC;
Query Brief:Â The above query aggregates average on 'Order_Value' field based on 'Sales_date'Â Partition and orders the subset or window in ascending order of 'Sales_date' from 'Sales_History' Table and ensures overall rows are in ascending order of 'Order_ID'.
The output:Row 1 calculates average of same 'Sales_date' hence 'Running _total' = 15(current row)+8+19/3.Row 2 calculates average of same 'Sales_date' hence 'Running _total' = 15+8(current row)+19/3.Row 3 calculates average of same 'Sales_date' hence 'Running _total' = 15+8+19(current row)/3.Row 4 calculates average of same 'Sales_date' hence 'Running total' = 2(current row)+18/2.Running total is reset based on 'sales_date' Partition.
Rolling Minimum Calculation: is to track the smallest value within a moving window of data. It's like scanning through a series of numbers and noting the lowest value in each group of a certain size. we can find minimum sales of each day.
Objective: To find Rolling minimum value of order for each day using windows function.
SELECT
order_id,
sales_date,
salesperson,
order_value,
Min(order_value)
OVER (
PARTITION BY Sales_date
ORDER BY Sales_date ASC) AS running_total
FROM sales_history
ORDER BY order_id ASC;
Query Brief:Â The above query aggregates Minimum value of order on 'Order_Value' field based on 'Sales_date'Â Partition and orders the subset or window in ascending order of 'Sales_date' from 'Sales_History' Table and ensures overall rows are in ascending order of 'Order_id'.
The output: Row 1 calculates minimum value of order of same 'Sales_date' hence 'Running _total' = 8 minimum of (15,8,19).Row 2 calculates minimum value of order of same 'Sales_date' hence Running _total' = 8 minimum of (15,8,19).Row 3 calculates minimum value of order of same 'Sales_date' hence 'Running _total' = 8 minimum of (15,8,19).Row 4 calculates minimum value of order of same 'Sales_date' hence 'Running total' = 2 minimum of (2,18) .Running total is reset based on 'sales_date' Partition.
Rolling Maximum Calculation: is to track the highest value within a moving window of data. It's like scanning through a series of numbers and noting the highest value in each group of a certain size. we can find maximum sales of each day.
Objective:To find Rolling maximum value of order for each day using windows function.
SELECT
order_id,
sales_date,
salesperson,
order_value,
Max(order_value)
OVER (
PARTITION BY Sales_date
ORDER BY Sales_date ASC) AS running_total
FROM sales_history
ORDER BY sales_date ASC;
Query Brief:Â The above query aggregates Maximum value of order on 'Order_Value' field based on 'Sales_date'Â Partition and orders the subset or window in ascending order of 'Sales_date' from 'Sales_History' Table and ensures overall rows are in ascending order of 'Sales_date'.
The output: Row 1 calculates maximum value of order of same 'Sales_date' hence 'Running _total' = 19 maximum of (15,8,19).Row 2 calculates maximum value of order of same 'Sales_date' hence Running _total' = 19 maximum of (15,8,19).Row 3 calculates maximum value of order of same 'Sales_date' hence 'Running _total' = 19 maximum of (15,8,19).Row 4 calculates maximum value of order of same 'Sales_date' hence 'Running total' = 18 maximum of (2,18) .Running total is reset based on 'sales_date' Partition.
Rolling Standard Deviation Calculation:  is to measure how spread out are the variable data is within a moving window of time or a set of data points. It's like looking at how much data points deviate from the average value over a period. It's very useful in monitoring stock prices, a rolling standard deviation helps us measure how much the prices are fluctuating day by day.
Objective: To find Rolling Standard Deviation for each day using windows function.
SELECT
order_id,
sales_date,
salesperson,
order_value,
stddev(order_value)
OVER (
PARTITION BY Sales_date
ORDER BY Sales_date ASC) AS running_total
FROM sales_history
ORDER BY sales_date ASC;
Query Brief:Â The above query aggregates standard deviation on 'Order_Value' field based on 'Sales_date'Â Partition and orders the subset or window in ascending order of 'Sales_date' from 'Sales_History' Table and ensures overall rows are in ascending order of 'Sales_date'.
The output: Row 1 calculates SD of same 'Sales_date' hence 'Running _total' = 5.67. Row 2 calculates SD of same 'Sales_date' hence Running _total' =5.67. Row 3 calculates SD of same 'Sales_date' hence 'Running _total' = 5.67. Row 4 calculates SD of same 'Sales_date' hence 'Running total' = 11.31. Running total is reset based on 'sales_date' Partition.
Rolling Count Calculation: counts how many times something happens within a specific period as you move through your dataset. For example, number of sales in each day.
Objective: To find Rolling count of order for each day using windows function.
SELECT
order_id,
sales_date,
salesperson,
order_value,
count(order_value)
OVER (
PARTITION BY Sales_date
ORDER BY Sales_date ASC) AS running_total
FROM sales_history
ORDER BY sales_date ASC;
Query Brief:Â The above query aggregates count of order on 'Order_Value' field based on 'Sales_date'Â Partition and orders the subset or window in ascending order of 'Sales_date' from 'Sales_History' Table and ensures overall rows are in ascending order of 'Sales_date'.
The output: Row 1 calculates count of order of same 'Sales_date' hence 'Running _total' = 3. Row 2 calculates count of order of same 'Sales_date' hence 'Running _total' = 3.Row 3 calculates count of order of same 'Sales_date' hence 'Running _total' = 3.Row 4 calculates count of order of same 'Sales_date' hence 'Running _total' = 2.Running total is reset based on 'sales_date' Partition.
Rolling calculations like rolling sums, averages, and other calculations, are powerful tools in SQL for understanding trends and patterns in time-series data. It helps us see the bigger picture by smoothing out fluctuations and highlighting long-term trends. With rolling calculation, we can make better predictions, monitor performance, and identify areas for improvement in our data analysis. These insights enable us to make informed decisions, drive continuous improvement, and stay ahead in our dynamic and competitive environment.
Comments