top of page
ektashah0440

SQL Subqueries: The Secret Ingredient to Smarter Queries

What is Subquery: 

Subquery in SQL is a query within a query. It’s asking a smaller question inside a bigger question. At times, to get specific records from a database you have to write two separate SQL queries, so subquery here can help to combine or join them into a single query. By using a subquery, you can refine the results of the main query to only include the data that you are interested in.



Why use Subqueries:

  • Subqueries are used to simplify complex queries i.e. they allow you to break down complex problems into smaller parts.

  • It lets you get results that change depending on the data in your database.



Scenario:

Let’s say we have an employees and departments table. We have to find all employees who are located in the location id 1500.









First, you would find all the departments located at the location_id 1500.





Output 1:








Second, you would find all the employees that belong to the location_id 1500 by using the department_id list of the previous query.






Output 2:


To get this output you had to first look at the departments table to check which department belonged to the location 1500.


Because of the small data volume, it was easy to get a list of departments. However, in the real world the data volume is very high and working on such queries would be difficult.






A much better solution to this problem is to use a subquery. You can rewrite combining the two queries above as follows:


The query placed within the parentheses is called a subquery/ inner query/ inner select and the query that contains the subquery is called outer query/ outer select.



Here, the database first executed the subquery and substituted the subquery between the parentheses with its result i.e. a number of department_id located at the location 1500 and then executed the outer query.



Let's look into more examples of using subqueries to understand them better and how to use them using SELECT clause, FROM clause, ANY, IN OR NOT IN, EXISTS OR NOT EXISTS.

These are the tables we would be using:
























  1. SELECT Clause

SELECT is the most basic clause used to specify which columns to retrieve data from. The following example finds the salaries of each employee along with the maximum salary within the same department they work.



Here the outer query retrieves employee_id, first_name, last_name, salary. And the subquery calculates the maximum salary in the same department. It uses e.department_id from the outer query to match the department of the current employee.


Output:













  1. FROM Clause


Let's say you want an average salary from all the departments.







Output:










  1. ANY Operator

ANY clause is used to compare a value with one or more values returned by a subquery. The ANY operator holds true if the condition holds true for at least one value from the result of the subquery. It can be used with various operators like <, >, =, <=, >=, <>.







Output:








  1. IN or NOT IN Operator

Subquery with IN operator allows checking if a value is within a set of values returned by the subquery i.e. if a column value matches any value in the result, the condition is true.


Subquery with NOT IN operator checks if a value is NOT IN the list of values returned by the subquery i.e. it does not match any value from the result, the condition is true.


Let's look at an example with the NOT IN operator.








Output:



Here the outer query retrieves the employee_id, first_name, and last_name from the employees table.

The subquery retrieves the department_id of the department where the department_name is 'sales'.

The outer query then uses NOT IN to exclude employees who belong to the Sales department as returned by the subquery.




Note: When using IN or NOT IN with subqueries, especially where subqueries return a large number of values, the query may become slower. In some scenarios, a JOIN or EXISTS/NOT EXISTS may offer better performance.



  1. EXISTS OR NOT EXISTS Operator

The EXISTS operator checks for the existence of rows returned from the subquery. The subquery typically uses SELECT 1 (or another constant) because the actual data returned is not important, just the existence of rows.

The NOT EXISTS operator is opposite, i.e. returns true if the subquery does not return any rows.


These operators are most effective with correlated subqueries, where the inner query depends on the outer query.






SELECT 1 is used as the actual value doesn't matter just the existence of rows matter.




Output:

Here, WHERE EXISTS is used to filter out rows in the departments table where the subquery returns at least one row.

The subquery checks whether there is at least one employee in the department who has a salary greater than 10,000.



Correlated vs Non-correlated Subqueries

A correlated subquery depends on the outer query for its value. It's executed once for each row.

Example:

SELECT e1.employee_id, e1.salary

FROM employees e1

WHERE e1.salary > (SELECT AVG(e2.salary)

FROM employees e2

WHERE e2.department_id = e1.department_id);


Here, e1 is the alias for the employees table in the outer query and e2 is an alias for the employees table in the subquery. First, e1 retrieves employee id, salary. Then the subquery calculates the average salary for employees in the same department. And finally the outer query checks if the salary is greater than the calculated average salary. If true, it will show up in the output.



A Non-Correlated subquery does not depend on the outer query and are executed once, regardless on the number of rows.

Example:

SELECT employee_id, salary

FROM employees

WHERE salary > (SELECT AVG(salary) FROM employees);



Types of Subqueries


  1. Scalar - when a subquery returns a single value i.e. single row and single column.

  2. Single or multiple rows - returns single or multiple rows of data.

  3. Single or multiple columns - returns single or multiple columns.

  4. Table - returns multiple rows and columns.




Tips

  • Keep subqueries simple and readable.

  • Subqueries must always be enclosed in parentheses.

  • At times using JOINS or CTEs or window functions might be more efficient.

  • When using correlated queries, make sure the condition is correctly linked to the outer query.

  • Subqueries can be used in clauses like WHERE, FROM, HAVING, and with commands like SELECT, UPDATE, DELETE, and INSERT.












23 views

Recent Posts

See All
bottom of page