top of page
hand-businesswoman-touching-hand-artificial-intelligence-meaning-technology-connection-go-

Rolling Calculations using different functions in SQL

Rolling calculations are like a window through which we view our data. Instead of viewing the whole dataset at once, we look at subset of data at a time, like a fixed number of rows or a time period and moving that window along. Within each window or subset, we calculate aggregate values like sums, averages, or maximums. It is also called as rolling / sliding window calculations. This helps us see how these values change over time or across different parts of our dataset. It is very useful tool to understand trends, patterns, and fluctuations, making it easier to analyze and make data driven information.


Scenarios to use rolling Calculations in SQL:

  • Calculate the total sales per region ,day

  • Calculate the profit margin for each product

  • Find customers who have spent more than a certain amount

  • Calculate the number of days between order date and delivery date

  • Calculate a running total of sales, min/max sales each day, average sales each day

  • Classify products based on their sales performance

  • Calculate the running cumulative sales for each day using self-join

  • Calculate employee bonuses based on performance metrics


Objective: To find total sum of sales using different functions like windows function, self join, sub-query.


Data: 1. To Create a Table.


CREATE TABLE sales_history (

  order_id INT,

  sales_date DATE,

  salesperson VARCHAR(20),

  order_value INT

);


2.To insert the values to the table created.


INSERT INTO sales_history

(order_id, sales_date, salesperson, order_value) VALUES

(1, '20210801', 'John', 15),

(2, '20210801', 'Sarah', 8),

(3, '20210801', 'Sally', 19),

(4, '20210802', 'John', 2),

(5, '20210802', 'Mark', 18),

(6, '20210803', 'Sally', 3),

(7, '20210803', 'Mark', 21),

(8, '20210803', 'Sarah', 16),

(9, '20210804', 'John', 4);


3.To Display Sales_History_table


select * from sales_history


Lets Explore different ways of finding the running total of order value.


1.Rolling sum Calculation: The window function method is usually much faster than the self-join or subquery methods, especially for large datasets. However, it’s not supported by all SQL databases as the partition method may use some operators or aggregation functions


objective: To find Rolling sum using windows function.


SELECT

order_id,

sales_date,

salesperson,

order_value,

SUM(order_value)

OVER (order by sales_date

) AS running_total

FROM sales_history

ORDER BY order_id ASC;


Query Brief: The above query aggregates sum on 'Order_Value' field and orders the subset or window in asceding of 'sales_date' from 'Sales_History' Table and ensures over all rows are in ascending order of 'Order_ID'.


The output:Row 1 has 0 row before it with same 'Sales_date' , hence 'Running_Total' = 15(current row)+8+19. Row 2 has 1 row as same 'Sales_date' as Row 1, hence 'Running_Total' = 15+8(Current Row)+19. Row 3 has 2row as same 'Sales_date' as Row 1 and Row2, hence 'Running_Total' = 15+8+19(current Row).Row 4 has 0 row before it with same 'Sales_date' , hence 'Running_Total' =2(current row)+18. Running total is reset based on 'sales_date' Partition.

Query run time is 188.


2.Rolling sum using Self-Join :  Its very efficient for smaller data sets. it’s important to note that the self-join method can be inefficient for large datasets because it requires multiple passes over the data. For every row in the table, it needs to scan all rows again to calculate the running total. This can lead to performance issues when dealing with large volumes of data.


Objective: To find total Rolling sum of order value for each day using self-join.


SELECT

    s1.sales_date,

    s1.order_value,

    SUM(s2.order_value) AS running_total

FROM

    Sales_history s1

JOIN

    Sales_history s2 ON s1.Sales_date >= s2.Sales_date

GROUP BY

    s1.Sales_date, s1.order_value

ORDER BY

    s1.sales_date;


Query Brief: The above query aggregates total sum of 'Order_Value' field from 'Sales_history s2' table based on join  'Sales_history s1' table and  'Sales_history s2 table . Grouping the subset or window on 'Sales_date'  first, then 'order value' from  'Sales_history s1' table  and ensures over all rows are in ascending order of 'sales_date' from  'Sales_history s1' table.


The output:Row 1 calculates total sum of order of 'order_value' hence 'Running _total' = 19(current row)+8+15. Row 2 calculates total sum of order of 'order_value' hence 'Running _total' = 19+8(current row)+15.Row 3 calculates total sum of order of 'order_value' hence 'Running _total' = 19+8+15(Current value).Row 4 calculates total sum of order of 'order_value' hence 'Running _total' = 42(Previous day sale)+18(current row)+2.Running total is reset based on 'sales_date' Partition.

Query run time is 104.


3.Rolling sum using Sub-Query : For those who are new to SQL or not familiar with advanced features like window functions, using subqueries to calculate running totals may be more straightforward. The logic of a subquery is often easier to understand than a self-join or window function. If your database system doesn’t support window functions, or if you’re writing a query that needs to work across multiple database systems, subqueries can be a good choice. Subqueries allow you to control the order of operations more precisely than aggregate functions applied over a whole table.


Objective: To find total Rolling sum of order value for each day using sub-query.


SELECT

    s1.sales_date,

    s1.order_value,

    (Select SUM(s2.order_value) from sales_history s2

where s1.Sales_date >= s2.Sales_date) AS running_total

FROM

    Sales_history s1

ORDER BY

    s1.sales_date;


Query Brief: The above query aggregates total sum of 'Order_Value' field from 'Sales_history s2' table based on sub-query filter where 'Sales_date' from 'Sales_history s2' is less than or equal to 'sales_date from 'Sales_history s1'  and ensures over all rows are in ascending order of 'sales_date' from  'Sales_history s1' table.


The output:Row 1 calculates total sum of order of 'order_value' hence 'Running _total' = 15(current row)+8+19. Row 2 calculates totalsum of order of 'order_value' hence 'Running _total' = 15+8(current row)+19.Row 3 calculates total sum of order of 'order_value' hence 'Running _total' = 15+8+19(Current value).Row 4 calculates total sum of order of 'order_value' hence 'Running _total' = 42(previous day sale)+2(current row)+18.Running total is reset based on 'sales_date' Partition.

Query run time is 157.


Different SQL functions have varying run times depending on the size of the data sets. In the example above, calculating the cumulative total took different amounts of time with each method:

  • The window function took 188 milliseconds.

  • The self-join took 104 milliseconds.

  • The sub-query took 157 milliseconds.


Therefore, the self-join method was the most optimized for this example, as it had the shortest run time.


28 views0 comments

Kommentare

Mit 0 von 5 Sternen bewertet.
Noch keine Ratings

Rating hinzufügen
bottom of page