top of page
hand-businesswoman-touching-hand-artificial-intelligence-meaning-technology-connection-go-

Balancing Act: Normalization vs. Denormalization in SQL


While working with SQL database there are only two concepts Normalization and Denormalizations. In this blog we will drive in to what normalization and denormalizations are, along with their advantages and disadvantages.


What is normalization?

Normalization is the process of eliminating data redundancy and organizing data in database. If the same data is stored in more than one table, then normalization is done. It is a process that puts data into tabular form and removes the data from relational tables. The main goal of normalization is breaking down large and complex tables into smaller tables without loosing relationships between data.


Advantages of Normalization:

1.Data Integrity: Ensure data is accurate and consistent across the database.

2.Reduce Redundancy: We save storage space by eliminating duplicate data.

3.Improve the query performance: we can perform faster queries on smaller tables.


Types of Normalization:

There are four normalization types: First Normal Form (1NF), Second Normal Form (2NF), Third Normal Form (3NF), and Boyce-Codd Normal Form (BCNF)


First Normal Form(1NF): For normalization table removing redundancy column and fields, adding primary key.

Requirements for 1NF:

1.Atomic value: Each row in the table must contain a single value.

2.No repeating rows

3.No repeating groups

Example:

Consider a table storing customer orders:

Order ID

Customers Name

Items

10

John

Pen, Pencil

20

Tom

Book, Eraser

 

In this tables Items Columns having multiple values, which breaks rule for 1NF. To bring inti 1NF we need to split it into multiple rows.

Order ID

Customers Name

Items

10

John

Pen

10

John

Pencil

20

Tom

Book

20

Tom

Eraser

 

By applying the First Normal Form, you achieve atomicity, and every column has unique values.


Second Normal Form(2NF): this is process of normalization database which develops upon first normal form.

Requirements for 2 NF:

1.Table should be in first normal forms.

2. All non-prime attributes are fully functionally dependent on the primary key.

Example

Order ID

Product ID

Customers Name

Items

10

100

John

Pen

10

101

John

Pencil

 

Here ‘Customer Name’ is dependent on Order ID not on Product Id, which is breaking 2NF, so for normalizing Table,

we create separate table for Orders

Order ID

Customers Name

1

John

Create separate for Products

Product ID

Items

100

Pen

101

Pencil

 Create linking table

Order ID

Product ID

1

100

1

101

Third Normal Form(3NF): It is removing dependencies, ensures that any column that is not the primary key is only dependent on the primary key.

Requirements for 3 NF:

1.It should be in the second normal form.

2.It has no transitive dependencies.

Example

Consider a table,

Employee ID

Employee Name

Department

Manager Name

10

John

Sales

Tom

 

Here ‘Manager Name’ is dependent on ‘Department’ , not directly depend on Employee ID, which is breaking  3NF, To normalize table

Create separate table for Department

Department

Manager Name

Sales

Tom

 Employee Table

Employee ID

Employee Name

Department

10

John

Sales

 

Boyce-Codd Normal Form (BCNF): it is advance version of Third Normal form(3NF).

Requirements for Boyce-Codd Normal Form (BCNF)

1.A table is in BCNF if it is in 3NF.

2. For every functional dependency A→B, A is super key.

In simple terms BCNF deals in the situation where a table has more than one candidate key, and some of the candidate keys overlap with functional dependency.

  

Example:

Consider a table that courses taught by different teachers in various departments.

Course ID

Teacher

Department

C100

John

Information Technology

C105

Maya

Computer Science

C100

Tom

Mathematics

C 110

Lara

Chemistry

 Table Structure:

Course ID: It is the course identifier.

Teacher: Teacher teaching the course.

Department: Department offering the course.

Functional Dependency:

Course ID → Department: Each course is having single department.

Course ID, Teacher → Department: this is composite key which uniquely identifies each row. But it doesn’t make any logical sense hence teachers should be linked department through a course they teach, not independently.

 

Course ID

Teacher

Department

C100

John

Information Technology

C105

Maya

Computer Science

C100

Tom

Mathematics

C 110

Lara

Chemistry

 This table is not BCNF because,

Course ID is not super key, but it determines Department.

Courser ID is not super key which determines another super key i.e. Department which breaks the BCNF rule.

For normalization BCNF Table, we can create two tables here,

 

 Course Table:

‘Course ID’ and ‘Departments ‘are attributes.

This table has the relationship between ‘Course’ and ‘Departments’.

Course ID

Department

C100

Information Technology

C105

Computer Science

C100

Mathematics

C 110

Chemistry

 Teachers Table:

‘Course ID’ and ‘Teacher’ are attributes.

This table shows teacher is teaching which course.

Course ID

Teacher

C100

John

C105

Maya

C100

Tom

C 110

Lara

 

Now both tables are in BCNF.

1.for course table, ‘Course ID’ is super key, courses table correctly indicating which department offers each course.

2. For Teachers table, Teacher is teaching each course, where ‘Course ID’ is super key.

 

Denormalization

It is a process of reducing the complexity of database. It has redundancies but it can improve its performance by avoiding the number of joins. It is a completely reverse procedure of normalizations. In this process data is combined from multiple tables and form a single table to reduce complex joins.

Advantages of Denormalization:

1.Reduces number of joins: Data is stored in limited tables, so fewer joins are required to retrieve the data and because of this speed up the query performance.

2.Optimizing the Performance: Denormalization can improve the query performance of specific queries which can be used frequently.

3.Reduce data complexity: Combining all related data in to limited tables, Denormalization can simplify to database, and it is easier to manage.

Example:

Let’s see how a denormalization table works.

We can take the following normalization tables.

1.Customers Table

Customer ID

Customer Name

Customer Email Id

10

John

20

Tom

 2.Orders Table

Order ID

Customer ID

Order Date

101

10

2024-26-08

102

20

2024-06-09

 3.Order Item Table

Order Item ID

Order ID

Product ID

Quantity

500

101

P200

2

501

101

P203

1

502

102

P202

1

 

4.Products Table

Product ID

Product Name

Price

P200

Laptop

800

P203

Phone

400

P202

Tablet

200

 

To improve the performance of query, we might renormalize by combining these tables.

 

 

Order ID

Customer Name

Customer Email

Order Date

Product ID

Product Name

Price

Quantity

101

John

2024-26-08

P200

Laptop

800

2

101

John

2024-26-08

P203

Phone

400

1

102

Tom

2024-06-09

P202

Tablet

200

1

In this denormalization table,

‘Customer Name’ and ‘Customer Email’ are repeated in each product.


Disadvantages of Denormalization:

1.Redundancy increases: It increases data redundancy, which leads to more space.

2.Increase complexity: Denormalization data have more complex, which is difficult to maintain in data store.


Disadvantages of Normalization:

1.Repeated joins: In normalization, data is spread across multiple tables. For getting all the information together, we might have repeatedly joined these tables together. If we do join frequently then it can slow down database and make it work harder.

2.Difficult to adapt to changes: in normalization, when structure of data changes, the normalization database is more challenging and time consuming compared to denormalization one.


What is the difference between Normalization and Denormalization?

Normalization

Denormalization

Normalization is used in an OLTP system

Denormalization is used in an OLAP system

Data integrity is maintained.

Data integrity is harder to maintain.

Redundant data is eliminated.

Increases the redundant data.

Increases number of joins

Reduces number of joins.

 

Choosing normalization and denormalization depends upon the need of application. If we want data accuracy and minimum redundancy, then we will go with normalization. If we want faster queries and ready to manage redundant data, then we will go with denormalization.   

 

 

 

 

 

12 views0 comments

Comentários

Avaliado com 0 de 5 estrelas.
Ainda sem avaliações

Adicione uma avaliação
bottom of page