-Window functions process rows one by one and produce one output value for each input row.
-Unlike row-level functions, window functions can split input data into partitions (or “windows” or “window frames” — hence the name) where a partition includes multiple rows that are somehow related to the currently processed row.
BASIC SYNTAX USING WINDOWS FUNCTION
Types of Window Functions
Aggregate Window Functions
These are similar to regular aggregate functions but do not reduce the number of rows returned. Examples include SUM(), AVG(), MIN(), MAX(), COUNT().
SUM(): This function returns the sum of a numeric column.
AVG(): This function returns the average of a numeric column.
COUNT(): This function returns the number of rows that match a specified criterion.
MIN(): This function returns the smallest value of the selected column.
MAX(): This function returns the largest value of the selected column.
Ranking Window Functions:
These functions assign a unique rank to each row within a partition of a result set (or the overall data set). Examples are ROW_NUMBER(), RANK(), DENSE_RANK(), NTILE().
RANK(): This function assigns a unique rank to each distinct row within the partition of a result set. The ranks are assigned in the order specified in the ORDER BY clause of the OVER() clause. If two or more rows tie for a rank, each tied row receives the same rank, and the next rank(s) are skipped.
Take a look at the various Ranking functions side-by-side below to see how they might look in code.
RANK example:
Joining the film and film_actor table will give actor and total number of films acted. Using the total films acted we are going to use RANK() and DENSE_RANK()
Film Table: film_id is the primary key
Film_actor Table: film_id is the primary key
select film_actor.actor_id,count(*) as total_film_acted,rank() over(order by count(film.film_id) desc)from filmjoinfilm_actoron film.film_id=film_actor.film_idgroup by film_actor.actor_id
Refer the screenshot below highlighted in Green are ranked correctly. However for actor_id=150 we need this to be ranked as 8. As per rank function for tie result it will not use consequent numbers. Hence, you are seeing 13 for actor_id=150. To overcome this we need to use DENSE_RANK()
DENSE_RANK(): This function works similarly to RANK(), but when two or more rows tie for a rank, the next rank is not skipped. So if you have three items at rank 2, the next rank listed would be 3.
ROW_NUMBER(): This function assigns a unique row number to each row within the partition, regardless of duplicates. If there are duplicate values in the ordered set, it will still assign different row numbers to each row.
The NTILE() function is used to divide an ordered partition into a specified number of groups, or "tiles", and assign a group number to each row in the partition. This can be useful for things like dividing a dataset into quartiles, deciles, or any other set of evenly sized groups.
Take a look at the various Ranking functions side-by-side below to see how they might look in code.
Value Window Functions
These functions return specific values from each partition. These functions provide a way to access specific data from a partition, allowing you to compare or calculate differences between values in a result set.
Examples are FIRST_VALUE(), LAST_VALUE(), LEAD(), LAG().
FIRST_VALUE(): This function returns the first value in an ordered set of values from a partition. For example, you could use this function to find the initial sale made by a salesperson.
LAST_VALUE(): This function returns the last value in an ordered set of values from a partition. It can be used to find the most recent sale amount for a particular product.
LEAD(): This function allows you to access data from subsequent rows in the same result set, providing a way to compare a current value with values from following rows. It’s useful for calculating the difference in sales amounts between two consecutive days.
LAG(): Similar to LEAD(), the LAG() function lets you access data from previous rows in the result set, without the need for a self-join. This can be handy for comparing current data with historical data. These functions are powerful tools for data analysis, enabling you to navigate through your data and gain insights from specific data points in relation to others.
Window Frame Specification
This concept refers to the subset of rows used to perform the calculations for a specific row. The window frame can be specified using the ROWS or RANGE clause, and it can be unbounded (considering all rows) or limited to a specific range.
ROWS: Defines the window frame in terms of physical rows. You can specify a fixed number of rows, or use UNBOUNDED PRECEDING and UNBOUNDED FOLLOWING to include all rows.
RANGE: Defines the window frame based on logical groups of rows. Similar to ROWS, you can specify a range or use UNBOUNDED options.
Optimizing Window Functions
Window functions can often result in slow queries due to the fact they perform calculations across multiple rows. Here are some tips to optimize your window functions:
Reduce the number of rows: If you can, filter your data before applying the window function. The fewer rows the function has to work with, the faster your query will run. This is the best way to make sure that you can work more efficiently to debug and run your code, before releasing the beast on the full breadth of your data.
Use appropriate indexing: If you’re partitioning or ordering your data, ensure that appropriate indexes exist for those columns. This can significantly speed up the performance of your window function.
Avoid complex ordering: If possible, try to avoid using multiple columns in your ORDER BY clause within the window function. Each additional column can increase the computation time.
Limit the window frame: By default, window functions consider all rows in the partition. If you don’t need to consider all rows, use the ROWS or RANGE clause to limit the window frame.