top of page
hand-businesswoman-touching-hand-artificial-intelligence-meaning-technology-connection-go-

Concepts of SQL for Data Analysis

SQL has everything to do with database management, but at the same time it’s becoming the go-to tool for data extraction, manipulation, and analysis.

Whether you're just starting your journey as a data analyst or looking to improvise your proficiency in its tools, understanding essential SQL concepts is non-negotiable.

 

SQL not only aids transforming raw data into insightful reports, creating data-driven applications, and executing complex data operations, it also brings out hidden details from databases all expertise based on our query calibre !!!

 

Following list includes, but is not limited to, the concepts of SQL that are necessary for Data Analysis

·       Basic Commands (DDL, DML, DCL, DQL and TCL commands)

·       Exclusive use of SELECT statement

·       CASE statement

·       Joins and Unions

·       Subqueries and CTEs (Common Table Expressions)

·       String and Date formatting

·       Window functions

 

Let’s explore a little more of the above listed SQL concepts.


SELECT statement

The most sought for and versatile commands of SQL is the SELECT statement which for definition purpose is said “The SELECT statement is the most basic and commonly used command in SQL, allowing you to retrieve data from one or more tables.”

Few illustrations:

·       SELECT column1, column2, …

FROM table_name;

 

·       SELECT column1, column2, ...

FROM table_name

WHERE condition

ORDER BY column1, LIMIT integer;

 

·       SELECT column1, column2, aggregate_function(column3)

FROM table_name

WHERE condition

GROUP BY column1, column2

 

·       SELECT column1, column2, aggregate_function(column3)

FROM table_name

WHERE condition

GROUP BY column1, column2

HAVING aggregate_function(column3) condition

ORDER BY column1, column2;

 

CASE statement

In simple words, the CASE expression is the way to build the IF - THEN logic into SQL.

A CASE statement must be followed by at least one WHEN…THEN expression and must end with an END keyword. Also, an ELSE is an optional in CASE statement

Syntax of CASE statement:

SELECT CASE expression

  WHEN condition1 THEN result1

  WHEN condition2 THEN result2

  ...

  ELSE result

END

 

JOINS and UNIONS

Joins help you combine data from multiple tables based on a related column between them, while unions allow you to combine the result sets of two or more SELECT statements.

Types of Joins:

  • INNER JOIN: returns rows when there is a match in both tables.

  • LEFT JOIN: returns all rows from the left table and the matched rows from the right table.

  • RIGHT JOIN: returns all rows from the right table and the matched rows from the left table.

  • FULL JOIN: returns all rows when there is a match in one of the tables.

  • SELF JOIN: used when we need to join a table to itself in the scenario of finding duplicate records, comparing rows or hierarchical data structures

 

To illustrate the various JOIN types in SQL, consider a scenario where we want to compile the relationship between sales figures and their corresponding sales representatives across different regions.

 

For this purpose, we have two tables: sales_data and representatives. They are linked by the rep_id field, which serves as a foreign key in the sales_data table and a primary key in the representatives table. Here's what that looks like:

 

SELECT s.sales, r.sales_rep

FROM sales_data s

INNER JOIN representatives r

ON s.rep_id = r.rep_id

WHERE s.region = 'East';

 

Union and Union All:

·       UNION : returns the distinct rows that appear in either of the two result sets. For ex.,

A = {1, 2, 3} and B = {2, 4}

A UNION B would result in {1, 2, 3, 4}

·       UNION ALL : returns all the rows including duplicates

Now, A UNION ALL B results in {1, 2, 2, 3, 4}

 

String and Date Formatting

The manipulation of string and date values is common in data analysis. Understanding how to format these types properly is crucial for meaningful analysis.

String Functions:

  • CONCAT(): merges two or more strings into one.

  • SUBSTRING() or SUBSTR(): returns a part of a string.

  • LENGTH() or LEN(): returns the length of a string.

  • UPPER() or UCASE(): converts all characters in a string to upper case

  • LOWER() or LCASE(): converts all characters in a string to lower case

  • TRIM() : removes leading and trailing spaces from a string

  • REPLACE() : Replaces all occurrences of a specified substring within a string with another substring.

  • POSITION() : returns the position of the first occurrence of a substring

  • REVERSE() : reverses the characters in a string

 

Date Functions:

  • CURRENT_DATE or GETDATE() : returns the current date.

  • CURRENT_TIMESTAMP or NOW() : returns the current date and time

  • DATEADD() : add a specified time interval to a date

  • DATEDIFF() : calculates the difference between two dates

  • DATEPART() or EXTRACT() : extracts parts of a date such as (year, month, or day)

 

WINDOW Functions

Window functions are a powerful feature that allow you to perform calculations across a set of table rows related to the current row. This set of rows is referred to as the "window," and it can be defined by an OVER clause. This includes the capability to perform running totals, moving averages, and more.

Important Window functions include:

·       ROW_NUMBER() :assigns a unique sequential number to each row in each partition in the order in which they. were sorted using the PARTITION clause. ORDER BY clause is mandatory  for using the ROW_NUMBER

 

ROW_NUMBER ( )

OVER ( [ PARTITION BY col_1,col_2… ] ORDER BY col_3,col_4.. ASC or DESC) AS column_name

·       RANK() : used to assign a rank to each row based on its value, just like ranks 1, 2, 3 … for ascending values of numbers except ranks are assigned with gaps in rank values if there are ties

 

RANK() OVER (

      [PARTITION BY expression, ]

     ORDER BY expression (ASC | DESC) )

·       DENSE_RANK() : works similar to RANK() but doesn’t leave gaps between ranks in case of tied rows.

 

DENSE_RANK() OVER (

PARTITION BY expr1[{,expr2...}]

ORDER BY expr1 [ASC|DESC], [{,expr2...}])

 

 

 

 

                                   

 

 

 

 

24 views0 comments

Comments

Rated 0 out of 5 stars.
No ratings yet

Add a rating
bottom of page