In this guide, I will discuss some intermediate SQL operations! These are some fundamental knowledge that data scientists can’t live without them.
Today, I’ll show you:
Subqueries
Subqueries with SELECT, INSERT, UPDATE & DELETE
Corelated And Noncorrelated SQL Subqueries
Tips for using sub queries
These will definitely bring your SQL skillset to the next level and make you more efficient at your daily job.
In this guide, will not just discuss about fundamentals of subquery, how it works, and which problem it solves but also show you how to write subqueries in different scenarios, how to write both correlated and non-correlated sub-queries with real-world examples.
Let’s dive into this topic..
What is SubQuery
SubQuery in SQL is a query inside another query. Sometimes to get specific records from a database you may need to fire two separate SQL queries, subquery is a way to combine or join them in a single query. SQL query which is in the inner part of the main query is called the inner query while the outer part of the main query is called the outer query. In simple language query inside query, also called as Nested query.
In SubQuery
Inner query is executed first and Outer query is dependent on the result of inner query.
SubQuery is a Select query that is enclosed inside another query. The inner select query is usually used to determine the results of the outer select query.
There are some cases where your simple query isn’t enough to get solution of given scenario.
Let’s understand this phenomenon with examples .
Image shown below is an Employees table which consists of fields Employee id, First name, Last name, email, salary and other fields associated with employee’s information.
Now, Assume We want to find out which employee has a maximum salary.
So we will write a query :
SELECT first_name,salary FROM employees WHERE salary = MAX(salary);
ABOVE STATEMENT WILL THROW AN ERROR:
An aggregate may not appear in the WHERE clause unless it is in a subquery contained in a HAVING clause or a select list, and the column being aggregated is an outer reference.
I will try to write it differently
SELECT first_name, MAX(salary) FROM employees;
THIS WILL ALSO THROW AN ERROR:
So I will breakdown my solution into steps
1. Select maximum salary from table.
2. Then save that table with different name.
3. Write another query for employee table to fetch record of the employee whose salary matches with the salary in new table. And might create a new table for this too to simplify the view.
Looks bit complex and protracted right ! and on top of that we are creating new tables to store temporary data which is not very efficient way to work with databases.
Instead of that will make a single nested query which does not required to store data into new tables and that runs very efficiently with SQL server.
Let’s see how the solution looks like
Here we are selecting first name and salary from employee table where salary is equal to, and then we are writing inner query that is, select maximum salary from employee table.
Here we are selecting first name and salary from the employees table where salary is equal to, and then we are writing inner query that is , select maximum salary from employee table.
So here, Inner query “SELECT MAX(salary) FROM employees”, calculates the maximum salary of an employee to the table and then my outer query “SELECT first_name,salary FROM employees WHERE salary = ”, will be executed based on salary value returned from the inner query.
And the output looks like this:
Now we want to verify that if the result we are getting is right. Let’s break it down query into parts and see how it works.
First (inner) query is executed and gives you the output value 24000.00. Then in second (Outer) query where highest salary value is used in WHERE condition And print the name and salary of the employee with highest salary. But in this process we are hard coding the Salary value.
Now , imagine if later on there are more other employees joined the company who has salary even higher than 24000.00, so in that scenario running just a second query will give a wrong output. Also when organization has large sets of data it’s not effective. So using the Subquery in such cases are the right solution.
Subqueries used with SELECT statement:
SQL subqueries are widely used with SELECT statement.
SYNTAX:
SELECT columnname FROM table WHERE columnname operator (SELECT columnname FROM table [WHERE])
Example: Let’s find out employees who earns salary that is less than average salary.
SELECT first_name,salary FROM employees
WHERE salary < (SELECT AVG(salary) FROM employees);
Query will give an output shown in image below:
Subqueries used with INSERT statement:
The INSERT statement uses the data returned from the subquery to INSERT INTO another table.
SYNTAX:
INSERT INTO tablename SELECT * FROM table WHERE VALUE OPERATOR (Inner Query)
Example: Let’s insert all the records of employees table into Employees_BKP table
QUERY :
INSERT INTO Employees_BKP
SELECT * FROM employees
WHERE employee_id IN (SELECT employee_id
FROM employees);
OUTPUT:
Subqueries used with UPDATE statement:
The subquery can be used in conjunction with the UPDATE statement.
Either single or multiple columns in a table can be updated when using subquery with the UPDATE statement.
SYNTAX:
UPDATE table SET column name = new value WHERE OPERATOR VALUES (SELECT column name FROM tablename [WHERE])
Example:
Assuming, we have Employees_BKP table available which is backup of employees table. The following example updates SALARY by 10% in the employees table for all the employees whose Salary is less than or equal to 5000.
QUERY:
UPDATE employees
SET SALARY = SALARY + (SALARY * 10)
WHERE salary IN
(SELECT salary FROM Employees_BKP
WHERE salary <= 5000
);
OUTPUT:
This will impact rows that has salary less than 5000 And increase it to 10%.In the table below you can see that records of the left table where salary is less than 5000 has been updated in table shown at the right hand side.
Subqueries used with DELETE statement:
The subquery can be used in conjunction with the DELETE statement.
Either single or multiple columns in a table can be updated when using subquery with the UPDATE statement.
SYNTAX:
DELETE FROM table name WHERE OPERATOR VALUE (Inner Query)
EXAMPLE
Assume that we have another dataset/table for employees who joined from 1987 to 1990. Now I want to delete records of those employees from this table.
The following example deletes the records from the employees table for all the employees whose who joined from 1987 to 1990
QUERY:
DELETE FROM employees
WHERE hire_date IN (SELECT hire_date FROM employees
WHERE hire_date BETWEEN '1987-01-01' AND '1990-12-31');
OUTPUT:
This would impact few rows in my existing table. You can clearly see that the first record, where the employee was hired in year 1987 has been deleted from the table.
SQL Corelated And Non Corelated Sub Queries
Subqueries can be categorized into two types:
A noncorrelated (simple)subquery obtains its results independently of its containing (outer) statement.
A correlated subquery requires values from its outer query in order to execute.
Correlated subqueries are used for row-by-row processing . With a normal nested subquery, the inner SELECT query runs first and executes once, returning values to be used by the main query. A correlated subquery, however, executes once for each row considered by the outer query. In other words, the inner query is driven by the outer query.
Example Of Correlated Subqueries: Find all the employees who earn more than the average salary in their department.
SELECT employee_id,first_name,last_name,salary,department_id
FROM employees
WHERE salary >(SELECT AVG(salary)
FROM employees
WHERE department_id = employees.department_id)
ORDER by department_id,
first_name,
last_name;
OUTPUT:
EXAMPLE of Correlated Subqueries With exist :
SELECT employee_id, manager_id, first_name, last_name FROM employees a
WHERE exists
(SELECT employee_id FROM employees b
WHERE b.manager_id = a.employee_id);
EXAMPLE of Correlated Subqueries With exist :
SELECT employee_id, manager_id, first_name, last_name FROM employees a
WHERE not exists
(SELECT employee_id FROM employees b
WHERE b.manager_id = a.employee_id);
Other than that use of correlation can be in UPDATE and DELETE Subqueries as well.
In non-correlated subquery, the inner query doesn't depend on the outer query and can run as a stand-alone query. All the subqueries other than the corelated subqueries are noncorrelated subquery. Difference between Correlated and Noncorrelated Subquery
Correlated Sub-queries are slower than non-correlated subqueries
In correlated subquery inner query dependent on the outer query, while in the noncorrelated subquery inner query or subquery does not depends on outer query and is run on its own.
In correlated subquery, the outer query is executed before the inner query or subquery while in the noncorrelated subquery inner query executes before the outer query.
Common examples of the correlated subquery is using EXIST and NOT EXIST keyword while non-correlated query mostly uses IN or NOT IN keywords.
When to use??
When the result of a one query needs to be used inside another query to get the results that you need.
If you want to use aggregate of aggregate function in sql query. You can’t do it in single query. Aggregate function calls can not be nested.
So for example, If you want to use max(avg(cost)), or min(avg(cost))
First inner query operation is performed and executed and the result is returned to the outer query, and
then the outer query is executed.
Sometimes using sql joins are wise choice over using subquery, Any information which you retrieve from the database using subquery can be retrieved by using different types of joins also. SQL is flexible and it provides different ways of doing the same thing. Some people find SQL Joins confusing and subquery specially noncorrelated more understanding but in terms of performance SQL Joins are more efficient than subqueries.
Subquery is used where we want to filter large number of rows. So let’s say if you have to process millions of rows in tables and you can use where clause then its efficient to use subqueries rather than joins. This is because, Database would have less data to process if it’s filtered inside the subquery instead of joins.
TIPS FOR USING SUBQUERIES
1 Breakdown your queries and write it in parts.
Find Average -> Find all employees -> Find all the employees who has greater than average salary.
Write the sub query first, execute it and check if you are getting the correct results.
Write the outer query see all the data.
Then add your sub query and do any other filtering.
2 Performance Depends on Data size, Any Indexes and weather it’s executed for every row.
Subqueries can be placed in several position : SELECT clause, WHERE clause , FROM clause.
Depending on the query written or the database, This subquery can be executed once or once per every
row. This could impact the performance.
3 Mandatory Syntax Check : Inner query must be between parentheses ().
4 Subquery is good replacement of temporary table.
5 Subquery should always return either a scalar value if used with where clause or value from a column if
used with IN or NOT IN clause.
Final Thoughts
In this blog we learned how to write a correlated subquery using Microsoft SQL Server Management Studio. Be alerted that correlated subqueries can be slow. However, with proper optimizing, their speed can be increased significantly. That’s all about subquery in SQL. It's an important concept to explore and learn, as both correlated and non-correlated subquery is important to solve SQL query-related problems. They are very essential learning skills from Data Analysis point of view. Master SQL concepts like correlated subqueries. I suppose this blog help understand the concept of Subqueries.
Comentários