What is window function?
Window function also referred as Analytic functions in some of the RDBMS. If you are planning to attend any SQL interview, then you are almost certain that there will be at least one question related to window function.
If we know how to write window function, we should be able to solve some of the most complex queries pretty easily.
For example, if there is a question asking about to display top 3 employees with maximum salary in each department, then window function is very helpful to write this type of queries.
The syntax for writing window function is slightly different from any other SQL queries. So now we can see how to write the window function and the different types of window functions in PostgreSQL.
Different types of window functions are:
RANK(): Rank the current row within its partition with gaps.
DENSE_RANK(): Rank the current row within its partition without gaps
.
ROW_NUMBER(): Number the current row within its partition starting from1
.
LEAD(): Return a value evaluated at the row that is offset rows after the current row within the partition.
LAG(): Return a value evaluated at the row that is at a physical offset row before the current row within the partition.
In this blog we can also see how we can use aggregate function as analytical function.
To better understand the window function the best way is first going through the aggregate function.
For this I created an employee table with four columns as emp_id, emp_name, dept_name, salary.
I created total 15 records within the table employee. If we want to find the maximum salary in this table, we write query as
Let’s say if we want to get maximum salary for each department, we will use group by clause as shown below.
We are able to extract the maximum salary with department wise in employee table.
Along with this if we want to display the maximum salary for each department with all the details from employee table, then we would not be able to do that just by using aggregate function or group by clause maybe we can use a with clause or write sub queries to do that. But the best way to form this kind of query is using window function. Let me show how we can write query using window function,
I wrote a query using a clause called over and gave an alias name ‘e’ to employee table. So that we could extract all the records and columns from employee table by adding a column ‘max_salary’ for each record.
Here I used aggregate function max() but also used over clause so PostgreSQL wouldn’t treat max() as aggregate along with over clause it treats as window function.
We are able to mention columns in the over clause what we need to extract also. Let’s see this with an example.
If we see this table, we came to know that we partitioned by dept_name , so that for each value in dept_name PostgreSQL will create a window and then it’s going to apply max() to each of those windows. If we consider Admin department the maximum salary was 8000, so it displays each admin department window the maximum salary. Eventually we got maximum salary for other departments also.
Here instead of max() we can use any other aggregate functions like min(), avg(), count().
There are also other window functions, let’s have a look at them.
ROW_NUMBER():
As the name suggests ROW_NUMBER() is going to assign a unique value to each of the row in the table.
The query using ROW_NUMBER is as shown below.
In the above example I took ‘e’ as alias name for employee table and rownum for new column name.
For each row in the table numbers are allocated uniquely. In the query I didn’t pass any columns in over clause, so SQL treats all the rows in the table as single window and displays row number for each record.
Now if we want to assign a row number based on different departments, then what we have to do is
In this query we used dept_name in over clause, PostgreSQL treated each department as window. So, row number assigned based on each department as shown in the above table.
What is the actual use of the row_number() is to showcase that let’s say if we had a requirement where we wanted to fetch the first two employees that joined in the company in each department. In the above table if we observe emp_id ,it came to known that they are not in the sorted order. We can sort the records using order by clause. Now we write the query to fetch first two records in each department.
We sorted the rows department wise and fetch the first two rows in each department.
RANK():
The RANK() assigns rank to each row within an ordered partition. We can see the RANK() with an example.
Let’s say we want to display the top two employee details with maxim salary in each of the department, what we have to do each ranking the employees based on salary then fetch the top two employees in each department.
The query sorted the salaries in descending order and fetched the top two salaries employee details from the employee table. So, this way we can use RANK() .
DENSE_RANK():
This function is same as RANK(), except one difference. We can see that with an example,
When we see the above table, we could see sal_rank and densal_rank column using RANK() and DENSE_RANK(). In Finance department we saw ranks like 3,3, and 5. Rank 3 repeats two times and the next rank jumped to 5 by skipping 4. But for DENSE_RANK() function rank 3 repeats two times and the next rank is 4. So, RANK() skips a value for each duplicate whereas DENSE_RANK() will not skip a value.
LAG():
The LAG() function has the ability to access data from previous row. If we want to know whether the salary of a current employee is higher or lower than the salary of the previous employee, we can use the LAG().
select e.*,
lag(salary) over(partition by dept_name order by emp_id desc) as prev_emp_sal
from employee e
When we see the above table the row with emp_id 114 is first one and there is no previous record. So, the prev_emp_sal cell shows null for emp_id 114.
For the remaining records it compared with the previous record and fill with the salary which is maximum in two records.
We can pass arguments to the LAG().
select e.*,
lag(salary,2,0) over(partition by dept_name order by emp_id desc) as prev_emp_sal
from employee e
Here the argument 2 shows the current row will compare with the previous two records and displays maximum salary. ‘0’ is the default value which will display if there is no previous row. The result is as shown below.
LEAD():
The LEAD() is opposite to LAG() , as LAG() compared the current row with previous one whereas LEAD() compares with the next row. Let’s see how it works.
select e.*,
lag(salary) over(partition by dept_name order by emp_id desc) as prev_emp_sal,
lead(salary) over(partition by dept_name order by emp_id desc) as next_emp_sal
from employee e
Here we got two columns prev_emp_sal and next_emp_sal using LAG() and LEAD().
LEAD() compares with the next employee salary and replace with it. Also LAG() displays the highest salary whereas LEAD() displays the next employee salary.
Now we will write a query to see the salary is higher or lower or equal when compared to the previous salary.
select e.*,
lag(salary) over(partition by dept_name order by emp_id desc) as prev_emp_sal,
case when e.salary>lag(salary) over(partition by dept_name order by emp_id)
then 'Higher than previous employee'
when e.salary<lag(salary) over(partition by dept_name order by emp_id)
then 'Lower than previous employee'
when e.salary=lag(salary) over(partition by dept_name order by emp_id)
then 'Equal to previous employee'
end sal_range
from employee e
The row shows null means there is no previous row.
I hope you guys can understand the window functions with all these examples.
There are also some other window functions like nth value, first value and last value etc.
FIRST_VALUE():
Returns a value evaluated against the first row within the partition.
LAST_VALUE():
Returns a value evaluated against the last row within the partition.
NTH_VALUE():
Returns a value evaluated against the nth row in an ordered partition.
NTILE():
Divides rows in a partition as equally as possible and assign each row an integer starting from 1 to the argument value.
Thanks for reading my blog.
REFERENCES:
Thank you for reading my blog.
Detailed Explanation!