A Window function performs a calculation across a set of table rows that are somehow related to the current row. These are also known as analytical functions, similar to the analytical function's calculations are performed on all the table rows in a set. Set of rows on which the window function operates is known as window.
Aggregate function returns a single row after aggerating the data from a set of rows. whereas window function which operates on a set of rows does not reduce the number of rows returned by the query.
Syntax for window function
Window functions are divided into three different categories:
Regular Aggregate functions when called with OVER clause act as Window functions. Rows retain their identity and also show an aggregated value for each row. Ranking window functions will rank the values of a specified field and categorize them according to their rank. Whereas value window functions copy values within the window from other rows to other rows.
Use cases for windows function
Window functions increase the efficiency and reduce the complexity of queries that analyze partitions (windows) of a data set by providing an alternative to more complex SQL concepts.
. The most common use of RANKING functions is to find the top (N) records based on a certain value. For example, Top 10 highest paid employees, Top 10 ranked students, Top 50 largest orders etc.
Accessing results from another row within specific windows (e.g., displaying a value associated with highest salary within a partition, displaying sales from a previous row, etc.)
Aggregation within a specific window (e.g., sales totals, running sales totals, running sales averages, etc.)
Following are the windows functions in postgress sql:
The function returns the cumulative distribution of a value within a set of values. In other words, is used to query for the relative position as a value within a set of given values.
Sometimes, you may want to create a report that shows the top or bottom x% values from a data set, for example, top 1% of products by revenue.
This returns a double precision value ≥ 0 and ≤ 1.
Row_Number: The function assigns a sequential number to each row in each partition.
Rank: The function assigns ranking within an ordered partition. The function assigns the same rank for the rows having the same values, with the next ranking(s) skipped.
Dense_Rank : The function assigns a rank to each row within an ordered partition, the same ranks are assigned to multiple rows and no ranks are skipped.
Percent_Rank : The function is like the cume_dist() function. The Percent_rank() function evaluates the relative standing of a value within a set of values.
0 < PERCENT_RANK() <= 1
First_Value: The function returns first row or value in a sorted partition of a result set.
Lag: Return a value evaluated at the row that is at a specified physical offset row before the current row within the partition.
Last_Value : Return a value evaluated against the last row within its partition.
Lead : Return a value evaluated at the row that is offset rows after the current row within the partition.
NTile: Divide rows in a partition as equally as possible and assign each row an integer starting from 1 to the argument value.
helps you to identify what percentile (or quartile, or any other subdivision) a given row falls into.
NTH_Value: Return a value evaluated against the nth row in an ordered partition.
We discuss the few of the above functions as follows. We are using below employees table to describe as an example table.
we will see the Few Aggregate functions with employees table example:
Rows retain their identity and also show an aggregated value for each row. In the example above it aggregates the data for each department and shows the sum of total for each department marked as sum, maximum value of salary in each department as max and minimum value of salary in each department marked as min in the new columns respectively.
Now we will see the difference between most commonly used ranking functions Row_number, Rank, Dense_rank with the example.
So, we can see that as mentioned in the definition of ROW_NUMBER the row numbers are sequential integers within each partition. Also, we can see difference between rank and dense rank that in dense rank there is a rank skipped if the two rows have same value in rank and while there is no rank skipped in DENSE_RANK value after repeated rank.
we will see the Few Value functions with employees table example:
The output is:
The LAG function allows to access data from the previous row in the same result set where as LEAD function allows to access data from next row in the same result without use of any SQL joins. You can see in above example using LAG , LEAD function to check the current employee salary which is lower or higher than the preivous employee.
PostgreSQL window functions are used to compare contrast data, and it is a key to analytic and various use cases in PostgreSQL. Windows functions are aggregate functions(SUM(), COUNT(), MIN(), etc.), ranking functions (ROW_NUMBER(), RANK(), and DENSE_RANK(),etc.) and value functions (LEAD() and LAG(),etc.) which returns each row of the table instead of a single row output and is used to compare the values between current rows and related to the current rows from all values of a table are explained.