top of page
hand-businesswoman-touching-hand-artificial-intelligence-meaning-technology-connection-go-

SQL for Data Analysis-(Aggerate functions )Part 2

SQL is short for Structured Query Language. Originally, it used to be called SEQUEL (Structured English Query Language) and was used for storing and manipulating data in databases. Today SQL is used to perform all types of data operations in relational database management systems (RDBMS).


SQL is a powerful language where we can perform a wide range of operations:


  • execute queries

  • fetch data

  • insert, update, and delete records in a database (DML operations)

  • create new objects in a database (DDL operations)

  • set permissions on tables, procedures, functions, and views

  • and much, much more...

These SQL commands are mainly categorized into five categories as:

  1. DDL – Data Definition Language

  2. DQL – Data Query Language

  3. DML – Data Manipulation Language

  4. DCL – Data Control Language

  5. TCL – Transaction Control Language

In the last part of the Blog post, we covered DDL,DML, TCL , DCL,DQL commands.

In this Blog we will cover the actual application of aggregations, filtering data, and combining data on

different dataset (dvdrental dataset)

Aggregate Functions

Aggregate functions compute a single result from a set of input values.The built-in general-purpose aggregate functions are count, Avg, sum, min, max,ARRAY_AGG,

1. COUNT

We want to find out the number of film listings we have. To do so we use the COUNT() function which will add up the number of rows where the specified column is not NULL.

select count(*) from film; --count all rows in table film;


select count (distinct title) from film; --count unique title from film table;


select count(first_name) from public.actor;



select count(Distinct first_name) from public.actor; --count unique names from table actor



2. SUM

Let’s say we want to find the total income generated from renting. Using the SUM function we can return the total sum of a numeric column.


select sum(rental_rate) as total_income from public.film



3. AVERAGE

Let’s say you want to see what the average renting_rate, avg duration is. By using the AVG function the average value of the numeric column will be returned.


select AVG(rental_rate)as Avg_Rate from public.film


select round(AVG(rental_rate))as Avg_Rate from public.film


select round(AVG(length))as duration ,round(AVG(rental_rate))as Avg_Rate from public.film;



By default, SQL tries to be as precise as possible without rounding. We can make the result table easier to read using the ROUND() function.

ROUND() function takes two arguments inside the parenthesis:1. a column name 2. an integer


SELECT ROUND(AVG(rental_rate), 0) as round_value from film




0 after the comma represents numbers after the decimal point. since it is zero, it will return an integer.

4. MAXIMUM & MINIMUM

To find what is the maximum and minimum movie length or rental rate. we use the MIN and MAX functions.

Note: AS is a keyword in SQL that allows you to rename a column or table using an alias.


select min(rental_rate) as lowest_rate,max(rental_rate)as highest_rate,min(length)as short_film

,max(length)as lengthy_film

from public.film



you can list more than one aggregate function after a SELECT, and there is no strict limit to this.

However, as you apply more aggregate functions within a query, it can start to become more complex, and the results might not be easy to read with all the information. So, it is possible to do this, but it may not be the best choice to combine so many aggregate functions in a single query.


select count(*),round(AVG(rental_rate))as Avg_Rate from film




5. GROUP BY

GROUP BY is a clause in SQL that is used with aggregate functions. It is used in collaboration with the SELECT statement to arrange identical data into groups.

Oftentimes, we will want to calculate an aggregate for data with certain characteristics.

Group By is a clause used with aggregate functions to combine data from one or more columns.It groups rows with the same values into summary rows and is commonly used with aggregate functions like AVG.

if we want to see how many films rented for 4 times, 5 times...


SELECT count(*),rental_duration

FROM public.film

GROUP BY rental_duration;



if we want to see count of movies according to their release year.


SELECT count(*),release_year

FROM public.film

GROUP BY release_year

order by release_year desc;


It seems all movies are released in 2006 , or we have data for all movies released in 2006

If we want to see number of movies rented considering rental rate


select count(*),rental_rate

from public.film

group by rental_rate




Here in this example group by and order by is stated with column number from select statement.

( like select count(*) is first column

and rental_rate is 2nd column )


select count(*),rental_rate

from public.film

group by 2

order by 1 ;



6. HAVING

HAVING acts in the same way as the WHERE clause with the difference being that HAVING is used for aggregate functions. Let’s suppose want to return the count of the film that has a rental rate of 4.99.

Note:from Public.film , ' public.' part is optional, it comes when you drag it from tables.

select count(*), rental_rate

from public.film

group by rental_rate

having rental_rate = 4.99;


select count(*),rating

from public.film

group by rating

Having rating in ('PG', 'PG-13');



Note: You use IN operator in the WHERE or HAVING clause to check if a value matches any value in a list of values.

7. ORDER BY

To order the results obtained from a query we use the ORDER BY command. By default, the order will be ascending (A -> Z / 1 -> 100) however we can change this using DESC. Let’s say for example we want to order our listings by length.

select title,length

from public.film

order by length desc

limit 20;

more examples

select title, rental_rate

from public.film

order by rental_rate desc

limit 25;


select title,description

from public.film

order by title desc

limit 5;



Note: LIMIT is a clause that lets you specify the maximum number of rows the result set will have. This saves space on our screen and makes our queries run faster

In the example below we are sorting rental rate high to low (Descending) and length small to high (ascending)


select title, rental_rate,length

from public.film

order by rental_rate desc, length asc

limit 10;

As explained previously we can provide column number according to select statement column sequence.


select title, rental_rate,length

from public.film

order by 2 desc,3 asc

limit 10;


Note: I haven't provided screenshots for Queries.

If you like my blog please share it with whoever is learning SQL.

Thank you

47 views0 comments

コメント

5つ星のうち0と評価されています。
まだ評価がありません

評価を追加
bottom of page