top of page
hand-businesswoman-touching-hand-artificial-intelligence-meaning-technology-connection-go-

Basics of SQL Window Functions: A Beginner's Guide

Windows Functions are one of the most powerful features of SQL which is used widely in the field of data analytics to perform operations which cannot be done easily with group by and sub-query. This concept can be a little intimidating at first but once you get a hold of the core concept, they are most fun to work with! 

If you ever wondered

   |  When to use window functions

   |   Window functions are confusing

then this post is for you. We will go over key concepts of window functions and it can help significantly to level up your SQL skills. 


What are Window Functions?


The basic definition of window function is that it performs a calculation across a set of table rows that are somehow related to the current row. It applies ranking functions over a particular window( set of rows).

This type of calculation sounds similar to that performed by aggregate functions like group by. However, the difference between the two is that the window function don't cause the row to become grouped into a single output row and the rows retain their identities.


Lets understand this with a simple example, consider the following data-

Now lets find the average price in each category using group by


SELECT category_name, ROUND(AVG(amount),2)
FROM product p
JOIN prod_category pg on pg.category_id = p.category_id
GROUP BY category_name

We get the following output -


Notice that the AVG() function reduces the number of rows returned by the query.

Now lets use a Window function to implement the same -


SELECT category_name,amount, AVG(amount)  OVER(PARTITION BY category_name)
FROM product p join prod_category pg on p.category_id = pg.category_id


In this query, the AVG() function works as a window function that operates on a set of rows specified by the OVER clause. Each set of rows is called a window. So we get category_name, amount and average amount of each category in separate rows. It helps us to analyze average amount across each product in respective categories. Window functions helps us to see the output in current rows and apply aggregate values accordingly.


Basic Syntax for Window function-

window_function_name(arg1, arg2,..) OVER (

[ PARTITION BY exp1,exp2,.. ]

[ ORDER BY sort_expression [ ASC | DESC ] [ NULLS { FIRST | LAST } ])

[ frame_clause ]


In this syntax-

window_function_name(arg1, arg2,..)  - Name of the window function along with arguments . Some window functions do not accept any argument.

PARTITION BY - Partitions rows into set of rows.

ORDER BY - determines the order in which the rows of a partition are processed by the window function.

The ORDER BY clause uses the NULLS FIRST or NULLS LAST option to specify whether null able values should be first or last in the result set. The default is NULLS LAST option.

Frame_clause - It defines a subset of rows in the current partition to which the window function is applied. 


Few important Window Functions-

ROW_NUMBER() - It assigns a unique sequential number to each row within a partition.The count starts from 1. 


RANK()- It assigns rank to all the rows within an ordered partition. Rank is assigned such that rank 1 given to the first row and rows having the same value are assigned the same rank. For the next rank after two same rank values, one rank value will be skipped.

SELECT product_id,product_name,category_name,amount,RANK() OVER (PARTITION BY category_name ORDER BY amount DESC) rank_by_amount--,DENSE_RANK () OVER ( ORDER BY amount DESC)  dense_rank_by_amount
FROM product
inner join prod_category USING(category_id)

We get the following output-

For the Category Dairy after two consecutive same ranks one value was skipped and so for product_id 1 the rank was assigned as 3.


DENSE RANK()- Just like RANK() function  assigns rank to each row within an ordered partition. The assigned rank starts from 1, the only difference between dense_rank and rank is that in dense_rank() after two same ranks for the next rank consecutive integer is used and no rank is skipped. 

SELECT product_id,product_name,category_name,amount,DENSE_RANK() OVER (PARTITION BY category_name ORDER BY amount DESC) dense_rank_by_amount--,DENSE_RANK () OVER ( ORDER BY amount DESC)  dense_rank_by_amount
FROM product
inner join prod_category USING(category_id)

Results look like this -

In case of dense_rank after two consecutive same ranks the for the next no values were skipped and product_id 1 was given rank 2.


LEAD() - The lead() function can access data from the next row. It is generally used for comparing the value of the current row with the value of the next row following the current row.


LAG() - The lag() function can access data from the previous row.  It is generally used for comparing the value of the current row with the value of the previous row.


46 views0 comments

Commentaires

Noté 0 étoile sur 5.
Pas encore de note

Ajouter une note
bottom of page