While working with SQL database there are only two concepts Normalization and Denormalizations. In this blog we will drive in to what normalization and denormalizations are, along with their advantages and disadvantages.
What is normalization?
Normalization is the process of eliminating data redundancy and organizing data in database. If the same data is stored in more than one table, then normalization is done. It is a process that puts data into tabular form and removes the data from relational tables. The main goal of normalization is breaking down large and complex tables into smaller tables without loosing relationships between data.
Advantages of Normalization:
1.Data Integrity: Ensure data is accurate and consistent across the database.
2.Reduce Redundancy: We save storage space by eliminating duplicate data.
3.Improve the query performance: we can perform faster queries on smaller tables.
Types of Normalization:
There are four normalization types: First Normal Form (1NF), Second Normal Form (2NF), Third Normal Form (3NF), and Boyce-Codd Normal Form (BCNF)
First Normal Form(1NF): For normalization table removing redundancy column and fields, adding primary key.
Requirements for 1NF:
1.Atomic value: Each row in the table must contain a single value.
2.No repeating rows
3.No repeating groups
Example:
Consider a table storing customer orders:
Order ID | Customers Name | Items |
10 | John | Pen, Pencil |
20 | Tom | Book, Eraser |
In this tables Items Columns having multiple values, which breaks rule for 1NF. To bring inti 1NF we need to split it into multiple rows.
Order ID | Customers Name | Items |
10 | John | Pen |
10 | John | Pencil |
20 | Tom | Book |
20 | Tom | Eraser |
By applying the First Normal Form, you achieve atomicity, and every column has unique values.
Second Normal Form(2NF): this is process of normalization database which develops upon first normal form.
Requirements for 2 NF:
1.Table should be in first normal forms.
2. All non-prime attributes are fully functionally dependent on the primary key.
Example
Order ID | Product ID | Customers Name | Items |
10 | 100 | John | Pen |
10 | 101 | John | Pencil |
Here ‘Customer Name’ is dependent on Order ID not on Product Id, which is breaking 2NF, so for normalizing Table,
we create separate table for Orders
Order ID | Customers Name |
1 | John |
Create separate for Products
Product ID | Items |
100 | Pen |
101 | Pencil |
Create linking table
Order ID | Product ID |
1 | 100 |
1 | 101 |
Third Normal Form(3NF): It is removing dependencies, ensures that any column that is not the primary key is only dependent on the primary key.
Requirements for 3 NF:
1.It should be in the second normal form.
2.It has no transitive dependencies.
Example
Consider a table,
Employee ID | Employee Name | Department | Manager Name |
10 | John | Sales | Tom |
Here ‘Manager Name’ is dependent on ‘Department’ , not directly depend on Employee ID, which is breaking 3NF, To normalize table
Create separate table for Department
Department | Manager Name |
Sales | Tom |
Employee Table
Employee ID | Employee Name | Department |
10 | John | Sales |
Boyce-Codd Normal Form (BCNF): it is advance version of Third Normal form(3NF).
Requirements for Boyce-Codd Normal Form (BCNF)
1.A table is in BCNF if it is in 3NF.
2. For every functional dependency A→B, A is super key.
In simple terms BCNF deals in the situation where a table has more than one candidate key, and some of the candidate keys overlap with functional dependency.
Example:
Consider a table that courses taught by different teachers in various departments.
Course ID | Teacher | Department |
C100 | John | Information Technology |
C105 | Maya | Computer Science |
C100 | Tom | Mathematics |
C 110 | Lara | Chemistry |
Table Structure:
Course ID: It is the course identifier.
Teacher: Teacher teaching the course.
Department: Department offering the course.
Functional Dependency:
Course ID → Department: Each course is having single department.
Course ID, Teacher → Department: this is composite key which uniquely identifies each row. But it doesn’t make any logical sense hence teachers should be linked department through a course they teach, not independently.
Course ID | Teacher | Department |
C100 | John | Information Technology |
C105 | Maya | Computer Science |
C100 | Tom | Mathematics |
C 110 | Lara | Chemistry |
This table is not BCNF because,
Course ID is not super key, but it determines Department.
Courser ID is not super key which determines another super key i.e. Department which breaks the BCNF rule.
For normalization BCNF Table, we can create two tables here,
Course Table:
‘Course ID’ and ‘Departments ‘are attributes.
This table has the relationship between ‘Course’ and ‘Departments’.
Course ID | Department |
C100 | Information Technology |
C105 | Computer Science |
C100 | Mathematics |
C 110 | Chemistry |
Teachers Table:
‘Course ID’ and ‘Teacher’ are attributes.
This table shows teacher is teaching which course.
Course ID | Teacher |
C100 | John |
C105 | Maya |
C100 | Tom |
C 110 | Lara |
Now both tables are in BCNF.
1.for course table, ‘Course ID’ is super key, courses table correctly indicating which department offers each course.
2. For Teachers table, Teacher is teaching each course, where ‘Course ID’ is super key.
Denormalization
It is a process of reducing the complexity of database. It has redundancies but it can improve its performance by avoiding the number of joins. It is a completely reverse procedure of normalizations. In this process data is combined from multiple tables and form a single table to reduce complex joins.
Advantages of Denormalization:
1.Reduces number of joins: Data is stored in limited tables, so fewer joins are required to retrieve the data and because of this speed up the query performance.
2.Optimizing the Performance: Denormalization can improve the query performance of specific queries which can be used frequently.
3.Reduce data complexity: Combining all related data in to limited tables, Denormalization can simplify to database, and it is easier to manage.
Example:
Let’s see how a denormalization table works.
We can take the following normalization tables.
1.Customers Table
Customer ID | Customer Name | Customer Email Id |
10 | John | |
20 | Tom |
2.Orders Table
Order ID | Customer ID | Order Date |
101 | 10 | 2024-26-08 |
102 | 20 | 2024-06-09 |
3.Order Item Table
Order Item ID | Order ID | Product ID | Quantity |
500 | 101 | P200 | 2 |
501 | 101 | P203 | 1 |
502 | 102 | P202 | 1 |
4.Products Table
Product ID | Product Name | Price |
P200 | Laptop | 800 |
P203 | Phone | 400 |
P202 | Tablet | 200 |
To improve the performance of query, we might renormalize by combining these tables.
Order ID | Customer Name | Customer Email | Order Date | Product ID | Product Name | Price | Quantity |
101 | John | 2024-26-08 | P200 | Laptop | 800 | 2 | |
101 | John | 2024-26-08 | P203 | Phone | 400 | 1 | |
102 | Tom | 2024-06-09 | P202 | Tablet | 200 | 1 |
In this denormalization table,
‘Customer Name’ and ‘Customer Email’ are repeated in each product.
Disadvantages of Denormalization:
1.Redundancy increases: It increases data redundancy, which leads to more space.
2.Increase complexity: Denormalization data have more complex, which is difficult to maintain in data store.
Disadvantages of Normalization:
1.Repeated joins: In normalization, data is spread across multiple tables. For getting all the information together, we might have repeatedly joined these tables together. If we do join frequently then it can slow down database and make it work harder.
2.Difficult to adapt to changes: in normalization, when structure of data changes, the normalization database is more challenging and time consuming compared to denormalization one.
What is the difference between Normalization and Denormalization?
Normalization | Denormalization |
Normalization is used in an OLTP system | Denormalization is used in an OLAP system |
Data integrity is maintained. | Data integrity is harder to maintain. |
Redundant data is eliminated. | Increases the redundant data. |
Increases number of joins | Reduces number of joins. |
Choosing normalization and denormalization depends upon the need of application. If we want data accuracy and minimum redundancy, then we will go with normalization. If we want faster queries and ready to manage redundant data, then we will go with denormalization.