SQL has everything to do with database management, but at the same time it’s becoming the go-to tool for data extraction, manipulation, and analysis.
Whether you're just starting your journey as a data analyst or looking to improvise your proficiency in its tools, understanding essential SQL concepts is non-negotiable.
SQL not only aids transforming raw data into insightful reports, creating data-driven applications, and executing complex data operations, it also brings out hidden details from databases all expertise based on our query calibre !!!
Following list includes, but is not limited to, the concepts of SQL that are necessary for Data Analysis
· Basic Commands (DDL, DML, DCL, DQL and TCL commands)
· Exclusive use of SELECT statement
· CASE statement
· Joins and Unions
· Subqueries and CTEs (Common Table Expressions)
· String and Date formatting
· Window functions
Let’s explore a little more of the above listed SQL concepts.
SELECT statement
The most sought for and versatile commands of SQL is the SELECT statement which for definition purpose is said “The SELECT statement is the most basic and commonly used command in SQL, allowing you to retrieve data from one or more tables.”
Few illustrations:
· SELECT column1, column2, …
FROM table_name;
· SELECT column1, column2, ...
FROM table_name
WHERE condition
ORDER BY column1, LIMIT integer;
· SELECT column1, column2, aggregate_function(column3)
FROM table_name
WHERE condition
GROUP BY column1, column2
· SELECT column1, column2, aggregate_function(column3)
FROM table_name
WHERE condition
GROUP BY column1, column2
HAVING aggregate_function(column3) condition
ORDER BY column1, column2;
CASE statement
In simple words, the CASE expression is the way to build the IF - THEN logic into SQL.
A CASE statement must be followed by at least one WHEN…THEN expression and must end with an END keyword. Also, an ELSE is an optional in CASE statement
Syntax of CASE statement:
SELECT CASE expression
WHEN condition1 THEN result1
WHEN condition2 THEN result2
...
ELSE result
END
JOINS and UNIONS
Joins help you combine data from multiple tables based on a related column between them, while unions allow you to combine the result sets of two or more SELECT statements.
Types of Joins:
INNER JOIN: returns rows when there is a match in both tables.
LEFT JOIN: returns all rows from the left table and the matched rows from the right table.
RIGHT JOIN: returns all rows from the right table and the matched rows from the left table.
FULL JOIN: returns all rows when there is a match in one of the tables.
SELF JOIN: used when we need to join a table to itself in the scenario of finding duplicate records, comparing rows or hierarchical data structures
To illustrate the various JOIN types in SQL, consider a scenario where we want to compile the relationship between sales figures and their corresponding sales representatives across different regions.
For this purpose, we have two tables: sales_data and representatives. They are linked by the rep_id field, which serves as a foreign key in the sales_data table and a primary key in the representatives table. Here's what that looks like:
SELECT s.sales, r.sales_rep
FROM sales_data s
INNER JOIN representatives r
ON s.rep_id = r.rep_id
WHERE s.region = 'East';
Union and Union All:
· UNION : returns the distinct rows that appear in either of the two result sets. For ex.,
A = {1, 2, 3} and B = {2, 4}
A UNION B would result in {1, 2, 3, 4}
· UNION ALL : returns all the rows including duplicates
Now, A UNION ALL B results in {1, 2, 2, 3, 4}
String and Date Formatting
The manipulation of string and date values is common in data analysis. Understanding how to format these types properly is crucial for meaningful analysis.
String Functions:
CONCAT(): merges two or more strings into one.
SUBSTRING() or SUBSTR(): returns a part of a string.
LENGTH() or LEN(): returns the length of a string.
UPPER() or UCASE(): converts all characters in a string to upper case
LOWER() or LCASE(): converts all characters in a string to lower case
TRIM() : removes leading and trailing spaces from a string
REPLACE() : Replaces all occurrences of a specified substring within a string with another substring.
POSITION() : returns the position of the first occurrence of a substring
REVERSE() : reverses the characters in a string
Date Functions:
CURRENT_DATE or GETDATE() : returns the current date.
CURRENT_TIMESTAMP or NOW() : returns the current date and time
DATEADD() : add a specified time interval to a date
DATEDIFF() : calculates the difference between two dates
DATEPART() or EXTRACT() : extracts parts of a date such as (year, month, or day)
WINDOW Functions
Window functions are a powerful feature that allow you to perform calculations across a set of table rows related to the current row. This set of rows is referred to as the "window," and it can be defined by an OVER clause. This includes the capability to perform running totals, moving averages, and more.
Important Window functions include:
· ROW_NUMBER() :assigns a unique sequential number to each row in each partition in the order in which they. were sorted using the PARTITION clause. ORDER BY clause is mandatory for using the ROW_NUMBER
ROW_NUMBER ( )
OVER ( [ PARTITION BY col_1,col_2… ] ORDER BY col_3,col_4.. ASC or DESC) AS column_name
· RANK() : used to assign a rank to each row based on its value, just like ranks 1, 2, 3 … for ascending values of numbers except ranks are assigned with gaps in rank values if there are ties
RANK() OVER (
[PARTITION BY expression, ]
ORDER BY expression (ASC | DESC) )
· DENSE_RANK() : works similar to RANK() but doesn’t leave gaps between ranks in case of tied rows.
DENSE_RANK() OVER (
PARTITION BY expr1[{,expr2...}]
ORDER BY expr1 [ASC|DESC], [{,expr2...}])
Comments