SQL for Data Analysis-(Aggerate functions )Part 2
SQL is short for Structured Query Language. Originally, it used to be called SEQUEL (Structured English Query Language) and was used for storing and manipulating data in databases. Today SQL is used to perform all types of data operations in relational database management systems (RDBMS).
SQL is a powerful language where we can perform a wide range of operations:
insert, update, and delete records in a database (DML operations)
create new objects in a database (DDL operations)
set permissions on tables, procedures, functions, and views
and much, much more...
These SQL commands are mainly categorized into five categories as:
DDL – Data Definition Language
DQL – Data Query Language
DML – Data Manipulation Language
DCL – Data Control Language
TCL – Transaction Control Language
In the last part of the Blog post, we covered DDL,DML, TCL , DCL,DQL commands.
In this Blog we will cover the actual application of aggregations, filtering data, and combining data on
different dataset (dvdrental dataset)
Aggregate functions compute a single result from a set of input values.The built-in general-purpose aggregate functions are count, Avg, sum, min, max,ARRAY_AGG,
We want to find out the number of film listings we have. To do so we use the COUNT() function which will add up the number of rows where the specified column is not NULL.
select count(*) from film; --count all rows in table film;
select count (distinct title) from film; --count unique title from film table;
select count(first_name) from public.actor;
select count(Distinct first_name) from public.actor; --count unique names from table actor
Let’s say we want to find the total income generated from renting. Using the SUM function we can return the total sum of a numeric column.
select sum(rental_rate) as total_income from public.film
Let’s say you want to see what the average renting_rate, avg duration is. By using the AVG function the average value of the numeric column will be returned.
select AVG(rental_rate)as Avg_Rate from public.film
select round(AVG(rental_rate))as Avg_Rate from public.film
select round(AVG(length))as duration ,round(AVG(rental_rate))as Avg_Rate from public.film;
By default, SQL tries to be as precise as possible without rounding. We can make the result table easier to read using the ROUND() function.
ROUND() function takes two arguments inside the parenthesis:1. a column name 2. an integer
SELECT ROUND(AVG(rental_rate), 0) as round_value from film
0 after the comma represents numbers after the decimal point. since it is zero, it will return an integer.
4. MAXIMUM & MINIMUM
To find what is the maximum and minimum movie length or rental rate. we use the MIN and MAX functions.
Note: AS is a keyword in SQL that allows you to rename a column or table using an alias.
select min(rental_rate) as lowest_rate,max(rental_rate)as highest_rate,min(length)as short_film
you can list more than one aggregate function after a SELECT, and there is no strict limit to this.
However, as you apply more aggregate functions within a query, it can start to become more complex, and the results might not be easy to read with all the information. So, it is possible to do this, but it may not be the best choice to combine so many aggregate functions in a single query.
select count(*),round(AVG(rental_rate))as Avg_Rate from film
5. GROUP BY
GROUP BY is a clause in SQL that is used with aggregate functions. It is used in collaboration with the SELECT statement to arrange identical data into groups.
Oftentimes, we will want to calculate an aggregate for data with certain characteristics.
Group By is a clause used with aggregate functions to combine data from one or more columns.It groups rows with the same values into summary rows and is commonly used with aggregate functions like AVG.
if we want to see how many films rented for 4 times, 5 times...
GROUP BY rental_duration;
if we want to see count of movies according to their release year.
GROUP BY release_year
order by release_year desc;
It seems all movies are released in 2006 , or we have data for all movies released in 2006
If we want to see number of movies rented considering rental rate
group by rental_rate
Here in this example group by and order by is stated with column number from select statement.
( like select count(*) is first column
and rental_rate is 2nd column )
group by 2
order by 1 ;
HAVING acts in the same way as the WHERE clause with the difference being that HAVING is used for aggregate functions. Let’s suppose want to return the count of the film that has a rental rate of 4.99.
Note:from Public.film , ' public.' part is optional, it comes when you drag it from tables.
select count(*), rental_rate
group by rental_rate
having rental_rate = 4.99;
group by rating
Having rating in ('PG', 'PG-13');
Note: You use IN operator in the WHERE or HAVING clause to check if a value matches any value in a list of values.
7. ORDER BY
To order the results obtained from a query we use the ORDER BY command. By default, the order will be ascending (A -> Z / 1 -> 100) however we can change this using DESC. Let’s say for example we want to order our listings by length.
order by length desc
select title, rental_rate
order by rental_rate desc
order by title desc
Note: LIMIT is a clause that lets you specify the maximum number of rows the result set will have. This saves space on our screen and makes our queries run faster
In the example below we are sorting rental rate high to low (Descending) and length small to high (ascending)
select title, rental_rate,length
order by rental_rate desc, length asc
As explained previously we can provide column number according to select statement column sequence.
select title, rental_rate,length
order by 2 desc,3 asc
Note: I haven't provided screenshots for Queries.
If you like my blog please share it with whoever is learning SQL.