top of page
hand-businesswoman-touching-hand-artificial-intelligence-meaning-technology-connection-go-

Handy SQL commands in a Data Analyst’s tool kit! Part 2


We saw 5 SQL commands in the previous blog. Let us see another 5 SQL commands in this blog. Let’s get started.

6. ORDER BY

Now we have a batch of data that is fetched. And we like to view the top or bottom records of a particular column. In this case, we can make use of the ORDER BY. We can sort the data fetched in either ascending or descending order. If the column contains numbers it is sorted numerically order and if the column contains text then it will be sorted in alphabetical order. There is one more aspect that can be noted. If we do not specify the ORDER BY option the SQL by default sorts by ascending order.

EX: Sorting the Gender table on the Gender_ID column in descending order.

QUERY: SELECT * FROM “Gender” ORDER BY “Gender_ID” DESC

7. GROUP BY

Now that we are able to sort the data in ascending and descending order, we would like to group data in a particular manner. Here in this case we can make use of the GROUP BY. It will help sort the data into groups.

Within SQL the grouping can be done when there is one of the aggregate functions that is used in the SQL query. GROUP BY always works with the aggregate functions in concurrence. Some of the aggregate functions are as follows:

  • · MAX : This will fetch the maximum value.

  • · MIN : This will fetch the minimum value.

  • · SUM : This will return the total sum of all the values.

  • · AVG : This will return the average of the values

  • · COUNT : This will fetch the total number of rows.

Here are some of the examples for the above-mentioned aggregate functions.

EX 1: MAX — To get the highest of the patients ID number. The table contains the highest patients ID as 1012.

QUERY: SELECT MAX(“Patient_ID”) FROM “Patients”;


EX 2: MIN — To get the lowest of the patients ID number. The table contains the lowest patients ID as 1

QUERY: SELECT MIN(“Patient_ID”) FROM “Patients”;


EX 3: SUM — To get the SUM of the patients ID number. Since I do not have another numeric column. I am trying to add the patients ID as a total.

QUERY: SELECT SUM(“Patient_ID”) FROM “Patients”;


EX 4: AVG — To get the AVG of the patients ID number. Since I do not have another numeric column. I am trying to get the average of the Patient ID.

QUERY: SELECT AVG(“Patient_ID”) FROM “Patients”;


EX 5.a: COUNT — To retrieve the total no of patients in the table.

QUERY: SELECT COUNT(“Patient_ID”) FROM “Patients”;


EX 5.b: GROUP BY with COUNT — To retrieve the total no of male and female patients in the table and sort by which gender is more.

QUERY: SELECT COUNT(“Patient_ID”), “Gender_ID”

FROM “Patients”

GROUP BY “Gender_ID”

ORDER BY COUNT(“Patient_ID”) DESC;


NOTE : Few more pointers are as follows. When we are using the aggregation functions we can not use the WHERE. Hence in such a situation, we have to use the HAVING clause. Also, the order of the keyword goes this way. SELECT -> AGGRE FUNC -> FROM -> GROUP BY -> HAVING -> ORDER BY


EX 6: GROUP BY with COUNT and HAVING — To retrieve the total no of male and female patients in the table and sort by which gender is more and display it if it satisfies an additional condition that the count is more than 500.

QUERY: SELECT COUNT(“Patient_ID”), “Gender_ID”

FROM “Patients”

GROUP BY “Gender_ID”

ORDER BY COUNT(“Patient_ID”) DESC;


8. JOIN

When we want to fetch data from 2 or more tables the JOIN comes in handy. There are couple of types of joins, and they are INNER JOIN, FULL JOIN, LEFT JOIN, RIGHT JOIN, and SELF JOIN.

Here is what the joins mean. We consider 2 tables to be joined here. One we think of it as the left table and the other as a right table. This is similar to the set theory.

  • · INNER JOIN : This JOIN returns records that have matching values in both tables (the intersection of 2 sets).

  • · FULL JOIN : This JOIN will return records that have matching values on either the left table or right table (the union of both sets). It is also called the FULL OUTTER JOIN.

  • · LEFT JOIN : This JOIN returns all the records from the left table and the matching records from the right table (This consists of all of the values in the left set). It is also called the LEFT OUTTER JOIN.

  • · RIGHT JOIN : This JOIN returns all the records from the right table and the matching records from the left table (This consists of all of the values in the right set). It is also called the RIGHT OUTTER JOIN.

  • · SELF JOIN : SELF JOIN works the same as the regular joins that are discussed above. Here the right and the left table is the same table. It is like joining a table with itself.

The LEFT JOIN is the most frequently used. The tables are switched in a LEFT JOIN and it is the same as a RIGHT JOIN. It is also possible to join more than 2 tables based on the same concepts.


Lets us see how the JOIN work with an example of an INNER JOIN


EX 1: Fetching the first name, last name, and patient ID of all the patients who are present in the EDVisits. PatientID is on the left table and EDvisit is on the right table and they are joined on the PatientID.

QUERY: SELECT “EDVisits”.”EDVisit_ID”, “Patients”.”Patient_ID”, “Patients”.”FirstName”, “Patients”.”LastName”

FROM “Patients”

INNER JOIN “EDVisits” ON “EDVisits”.”Patient_ID” = “Patients”.”Patient_ID”;


9. UNION

Now when we want to combine data from 2 or more tables into one, which is the data returned from 2 or more SELECT statements. We can make use of the UNION. But, one aspect to keep in mind is, all the columns and their datatype have to match among the tables we are trying to join using the UNION command.

  • · The UNION must contain the same number of columns within all the select statements

  • · These columns must be in the same order for all the select statements

  • · The columns must be of the same data types

Only the distinct values are selected in a UNION as default. If we need to allow duplicate values, we have to use UNION ALL. We can take a look at how the UNION works with this example.

EX 1: To get the list of all the patients who visited the AmbulatoryVisit and the EDVisits by their patient ID using a UNION command.

QUERY: SELECT “Patient_ID” FROM “EDVisits”

UNION

SELECT “Patient_ID” FROM “AmbulatoryVisits”

ORDER BY “Patient_ID”;


EX 2: UNION ALL returns all the duplicate values too.

QUERY: SELECT “Patient_ID” FROM “EDVisits”

UNION ALL

SELECT “Patient_ID” FROM “AmbulatoryVisits”

ORDER BY “Patient_ID”;


10. CASE

The CASE is similar to the if-then-else statement. The CASE check for conditions and it returns a value when the first condition is met. Once a condition is met, it will stop at that point and return the result. If none of the conditions under the WHEN are met, it will return the value that is in the ELSE clause.

If the CASE is missing the ELSE part and none of the conditions under the WHEN is true, it will return NULL.

Let’s look at an example of CASE

EX: Classifying the pulse recorded for a patient based on some range

QUERY: SELECT “Patient_ID”, “Pulse”,

CASE

WHEN “Pulse” > 80 THEN ‘The pluse is greater than 80’

WHEN “Pulse” = 80 THEN ‘The pluse is 80’

ELSE ‘The pluse is under 80’

END AS PluseText

FROM “AmbulatoryVisits”;




These were the top 10 SQL commands that come in handy for a data analyst. All these commands come under the data manipulation language (DML). Hoping to catch up on a different topic next time.

18 views0 comments

Recent Posts

See All
bottom of page