Hello Techies,
Here I am going to explore some functions which I used during my hackathon and some more basics knowledge in PostgreSQL. I think this blog will help you learn more about unique inbuilt functions that I came across in PostgreSQL. Let’s learn together!!
Introduction:
PostgreSQL is an open-source, multi-platform relational database system that supports Windows, Linux, and Unix. Renowned for its reliability, performance, and rich feature set, PostgreSQL is designed with object-oriented capabilities. It utilizes multi-version concurrency control (MVCC) to manage concurrent operations effectively, ensuring data integrity and consistency.
What is a Query?
A query is a request for data or information from a database. It allows users to retrieve, manipulate, and manage data stored in relational database management systems (RDBMS). SQL (Structured Query Language) is the standard language used for writing queries.
Types of SQL Queries
Data Retrieval Queries:
SELECT: Used to fetch data from one or more tables. You can specify columns, filter results, and sort data.
Data Manipulation Queries:
INSERT: Adds new records to a table.
UPDATE: Modifies existing records.
DELETE: Removes records from a table.
Data Definition Queries:
CREATE: Creates new tables or databases.
ALTER: Modifies the structure of existing tables.
DROP: Deletes tables or databases.
Data Control Queries:
GRANT: Gives users access privileges to database objects.
REVOKE: Removes user access privileges.
Here are some functions with Queries:
Coalesce ():
This Function is commonly used in PostgreSQL and other programming languages to handle null values in data. Its primary purpose is to return the first non-null value from a list of arguments
You can pass multiple arguments to this function. The function evaluates the arguments in the order they are provided, and it returns the first non-null value. If all values are null, it returns null.
Simple Example:
If you have a table with a patient’s first name, middle name, and last _name , if you want to create a full name that uses the middle name if it exist.
Here in this example:
If the middle name of the patient is not null, it will be used.
If the middle name is null, it will check first name.
If both middle name and first name are null, it will return last name.
Correlation () :
This function calculates the correlation between two numeric columns. This function can be particularly useful for statistical analysis and understanding relationships between variables in your data and the function automatically ignores null values in the columns when calculating the correlation. If all values are null, it will return null.
You can also use this function with GROUP BY to analyze correlations within different groups.
Simple example:
You have a table cardiac complication with columns hospital outcomes and cardiac severity.
value1 and 2: These are the columns for which you want to compute the correlation and they should be of a numeric type.
If you want to see the correlation by different groups,
Example:
If your data looked like this:
age group | hospital outcomes | cardiac severity |
18-25 | 1 | 2 |
18-25 | 0 | 1 |
26-35 | 1 | 3 |
26-35 | 1 | 2 |
36-45 | 0 | 4 |
Query:
The result of your query might look something like this:
age group | correlation coefficient |
18-25 | 0.5 |
26-35 | 0.9 |
36-45 | -0.3 |
Row_ number ():
This function will assign a unique sequential integer to each row within a result set. It generates a unique number for each row starting from 1. You can specify an order using the ORDER BY clause.
You can use the PARTITION BY clause to reset the row number for each group of rows. This is useful for organizing data into subsets.
Simple example:
Assume you have a table called students with a column score. This query assigns a rank to each student based on their score, with the highest score receiving a rank of 1.
student name | score |
Alice | 95 |
Bob | 85 |
Charlie | 90 |
David | 70 |
After running the query:
Output:
student_ name | score | rank |
Alice | 95 | 1 |
Charlie | 90 | 2 |
Bob | 85 | 3 |
David | 70 | 4 |
Explanation of Output
Alice has the highest score (95) and is ranked 1.
Charlie follows with a score of 90, ranked 2.
Bob has a score of 85, ranked 3.
David has the lowest score (70) and is ranked 4
Crosstab ():
This crosstab function helps you turn row data into columnar format, making it easier to compare different categories side by side.
Simple Example
Let's say you have a `sales` table like this:
product month amount |
A Jan 100 |
A Feb 150 |
B Jan 200 |
B Feb 250 |
You want to see the total sales for each product by month in a more readable format.
First you need to create extension
Next create a simple query using cross tab
Output for this query:
product Jan Feb |
A 100 150 |
B 200 250 |
Count () over () :
SQL, combining the COUNT () function with the OVER () clause and PARTITION BY opens the door to a totally different way to count rows. The COUNT () function combined with the OVER () clause in SQL allows you to perform a count operation that can be used in a windowed context. This means you can count rows based on a specified partition or the entire dataset without collapsing the result into a single row.
Example
Let’s say you have a table called employees:
Employee name | department | salary |
Alice | HR | 50000 |
Bob | HR | 60000 |
Charlie | IT | 70000 |
David | IT | 80000 |
Eve | Finance | 90000 |
Create a simple query using count () over ():
Output:
Employee name | department | Department count |
Alice | HR | 2 |
Bob | HR | 2 |
Charlie | IT | 2 |
David | IT | 2 |
Eve | Finance | 1 |
Explanation of Output:
The HR department has 2 employees.
The IT department also has 2 employees.
The Finance department has 1 employee.
A quick summary of all function used above:
Coalesce: This function is very powerful for dealing with null values.
Correlation: This function is a straightforward and effective way to assess the linear relationship between two numeric variables.
Row _number: This function is a powerful tool for creating ordered sequences within your SQL queries, enabling tasks like ranking, sorting, and grouping.
Crosstab: This function allows you to pivot your data easily, turning rows into columns for better comparison and analysis.
Count () Over (): This is a powerful way to get counts alongside your detailed rows without losing the context of each individual row. It’s particularly useful for reporting and analytics tasks!
These PostgreSQL functions provide powerful tools for data manipulation and analysis. Whether you're cleaning up datasets, analyzing relationships, assigning ranks, or pivoting data, each function plays a crucial role in helping you derive insights from your data effectively. Integrating these functions into your SQL queries can significantly enhance your data analysis capabilities.
Happy querying!