top of page
hand-businesswoman-touching-hand-artificial-intelligence-meaning-technology-connection-go-

Normalization in DBMS

Introduction:

  • It is a database design technique that reduces data redundancy and eliminates Insertion, Deletion, and Update anomalies.

  • It divides larger table in to smaller tables and links them using relationships.


Database Normalization is used for following Purpose:

  1. To Eliminate the redundant or useless data

  2. To Reduce the complexity of the data

  3. To Ensure the relationship between tables as well as data in the tables

  4. To Ensure data dependencies and data is logically stored.

Need Of Database Normalization :

  • If your database objects are not in structured or normalized way it is difficult to update the database without facing the database loss.

  • Insertion,Updation and deletion are very frequent if data is not normalized.

  • Normalization is part of successful database design.

  • Without Database normalization the database system can be slow,Inaccurate and inefficient.


Real Life Example of data which is not normalized:

Roll No

Student Name

Branch Name

HOD

Office Phone

101

Valli

CSE

Mr. George

9876543210

102

Nagesh

CSE

Mr. George

9876543210

103

Aadya

CSE

Mr. George

9876543210

104

Avyay

CSE

Mr. George

9876543210

In the table above, we have data of 4 Computer Sci. students. As we can see, data for the fields branch, hod(Head of Department) and office_tel is repeated for the students who are in the same branch in the college, this is Data Redundancy.


Insertion Anomaly

Suppose for a new admission, until and unless a student opts for a branch, data of the student cannot be inserted, or else we will have to set the branch information as NULL.Also, if we have to insert data of 100 students of same branch, then the branch information will be repeated for all those 100 students.

These scenarios are nothing but Insertion anomalies.


Updation Anomaly

What if Mr. George leaves the college? or is no longer the HOD of computer science department? In that case all the student records will have to be updated, and if by mistake we miss any record, it will lead to data inconsistency. This is Updation anomaly.


Deletion Anomaly In our Student table, two different informations are kept together, Student information and Branch information. Hence, at the end of the academic year, if student records are deleted, we will also lose the branch information. This is Deletion anomaly.

Normalization Rule Normalization rules are divided into the following normal forms:

  1. First Normal Form

  2. Second Normal Form

  3. Third Normal Form

  4. BCNF

  5. Fourth Normal Form

First Normal Form (1NF) For a table to be in the First Normal Form, it should follow the following 4 rules:

  1. It should only have single(atomic) valued attributes/columns.

  2. Values stored in a column should be of the same domain

  3. All the columns in a table should have unique names.

  4. And the order in which data is stored, does not matter.

Time for an Example:


Roll No

Student Name

Subject

101

Valli

OS,CN

103

Nagesh

Java

102

Aadya

C,C++

The above table satisfies 3 out of 4 rules of 1NF. So now if we want bring the above table into 1NF we have update the table as below:


Roll No

Student Name

Subject

101

Valli

OS

101

Valli

CN

102

Nagesh

Java

103

Aadya

C

103

Aadya

C++

Using the First Normal Form, data redundancy increases, as there will be many columns with same data in multiple rows but each row as a whole will be unique.

Second Normal Form (2NF) For a table to be in the Second Normal Form,

  1. It should be in the First Normal form.

  2. And, it should not have Partial Dependency.

Dependency

Let's take an example of a Student table with columns student_id, name, reg_no(registration number), branch and address(student's home address).

In this table, student_id is the primary key and will be unique for every row, hence we can use student_id to fetch any row of data from this table

Even for a case, where student names are same, if we know the student_id we can easily fetch the correct record.

Student(Table):

student_id

name

reg_no

branch

address

10

Harika

02-EQ

CSE

South Windsor

11

Siva

03-LK

IT

Manchester

12

Harshi

04-gh

IT

Vernon

Hence we can say a Primary Key for a table is the column or a group of columns(composite key) which can uniquely identify each record in the table.

So all I need is student_id and every other column depends on it, or can be fetched using it.

This is Dependency and we also call it Functional Dependency.


Partial Dependency

Let's create another table for Subject, which will have subject_id and subject_name fields and subject_id will be the primary key.


subject_id

subject_name

1

Java

2

C++

3

Php

Now we have a Student table with student information and another table Subject for storing subject information.

Let's create another table Score, to store the marks obtained by students in the respective subjects. We will also be saving name of the teacher along with marks.


score_id

student_id

subject_id

marks

name of the teacher

1

10

1

70

Java Teacher

2

10

2

75

C++ Teacher

3

11

1

80

Java Teacher

In the score table we are saving the student_id to know which student's marks are these and subject_id to know for which subject the marks are for.

Together, student_id + subject_id forms a Candidate Key for this table, which can be the Primary key, but the teacher's name only depends on subject, hence the subject_id, and has nothing to do with student_id.

This is Partial Dependency, where an attribute in a table depends on only a part of the primary key and not on the whole key.

To remove partial dependency

There can be many different solutions for this, but out objective is to remove teacher's name from Score table.

The simplest solution is to remove columns teacher from Score table and add it to the Subject table. Hence, the Subject table will become:

Subject(Table):

subject_id

subject_name

Teacher

1

Java

Java Teacher

2

C++

C++ Teacher

3

Php

Php Teacher

And our Score table is now in the second normal form, with no partial dependency.

Score(Table):

score_id

student_id

subject_id

marks

1

10

1

70

2

10

2

75

3

11

1

80

Third Normal Form (3NF) A table is said to be in the Third Normal Form when,

  1. It is in the Second Normal form.

  2. And, it doesn't have Transitive Dependency.

So Lets use the same example where we have 3 tables, Student, Subject and Score.

In the Score table, we need to store some more information, which is the exam name and total marks, so let's add 2 more columns to the Score table.


score_id

student_id

subject_id

marks

exam_name

​total_marks

Transitive Dependency:

When a non-prime attribute depends on other non-prime attributes rather than depending upon the prime attributes or primary key is called Transitive Dependency.


Our new column exam_name depends on both student and subject. For example, a mechanical engineering student will have Workshop exam but a computer science student won't. And for some subjects you have Prctical exams and for some you don't. So we can say that exam_name is dependent on both student_id and subject_id.

And the column total_marks depends on exam_name as with exam type the total score changes. For example, practicals are of less marks while theory exams are of more marks.

But, exam_name is just another column in the score table. It is not a primary key or even a part of the primary key, and total_marks depends on it.


Score(Table):


score_id

student_id

subject_id

marks

exam_id

To Remove Transitive Dependency:

Take out the columns exam_name and total_marks from Score table and put them in an Exam table and use the exam_id wherever required.

Exam(Table):

exam_id

exam_name

total_marks

1

Workshop

200

2

Mains

70

3

Practicals

20

Advantage of removing Transitive Dependency

The advantage of removing transitive dependency is,

  • Amount of data duplication is reduced.

  • Data integrity achieved.

Boyce and Codd Normal Form (BCNF) Boyce and Codd Normal Form is a higher version of the Third Normal form. This form deals with certain type of anomaly that is not handled by 3NF. A 3NF table which does not have multiple overlapping candidate keys is said to be in BCNF.

For a table to be in BCNF, following conditions must be satisfied:

  • It must be in 3rd Normal Form

  • and, for each functional dependency ( X → Y ), X should be a super Key.

Time for an Example:

Below we have a college enrolment table with columns student_id, subject and professor.


student_id

subject

professor

101

Java

P.Java

101

C++

P.Cpp

102

Java

P.Java2

103

C#

P.Chash

104

Java

P.Java

In the table above:

  • One student can enrol for multiple subjects. For example, student with student_id 101, has opted for subjects - Java & C++

  • For each subject, a professor is assigned to the student.

  • And, there can be multiple professors teaching one subject like we have for Java.

  • student_id, subject together form the primary key, because using student_id and subject, we can find all the columns of the table.

  • One professor teaches only one subject, but one subject may have two different professors.

Hence, there is a dependency between subject and professor here, where subject depends on the

professor name.

This table satisfies the 1st Normal form because all the values are atomic, column names are unique and all the values stored in a particular column are of same domain.

This table also satisfies the 2nd Normal Form as their is no Partial Dependency.

And, there is no Transitive Dependency, hence the table also satisfies the 3rd Normal Form.


In the table above, student_id, subject form primary key, which means subject column is a prime attribute.

But, there is one more dependency, professor → subject.

And while subject is a prime attribute, professor is a non-prime attribute, which is not allowed by BCNF. So we have to split the above table into two tables :

Student(Table):


student_id

p_id

101

1

101

2

102

4

103

3

104

1

Professor(Table):

p_id

professor

subject

1

P.Java

Java

2

P.Cpp

C++

3

P.Chash

C#

4

P.Java2

Java


Fourth Normal Form (4NF) A table is said to be in the Fourth Normal Form when,

  1. It is in the Boyce-Codd Normal Form.

  2. And, it doesn't have Multi-Valued Dependency.

Multi-Valued Dependency:

A table is said to have multi-valued dependency, if the following conditions are true,

  1. For a dependency A → B, if for a single value of A, multiple value of B exists, then the table may have multi-valued dependency.

  2. Also, a table should have at-least 3 columns for it to have a multi-valued dependency.

  3. And, for a relation R(A,B,C), if there is a multi-valued dependency between, A and B, then B and C should be independent of each other.

If all these conditions are true for any relation(table), it is said to have multi-valued dependency. Time for an Example:


s_id

course

hobby

1

Science

Cricket

1

Maths

Hockey

2

C#

Cricket

2

Php

Hockey

As you can see in the table above, student with s_id 1 has opted for two courses, Science and Maths, and has two hobbies, Cricket and Hockey.

The columns course and hobby are independent of each other.

So there is multi-value dependency, which leads to un-necessary repetition of data and other anomalies as well.

To remove multi-valued dependency, We need to split the above table into 2 tables.


CourseOpted(Table):


s_id

course

1

Science

1

Maths

2

C#

2

Php

Hobbies(Table):


s_id

hobby

1

Cricket

1

Hockey

2

Cricket

2

Hockey

Now this relation satisfies the fourth normal form.

A table can also have functional dependency along with multi-valued dependency. In that case, the functionally dependent columns are moved in a separate table and the multi-valued dependent columns are moved to separate tables.


Fifth Normal Form(5NF):

A table is said to be in the FifthNormal Form when,

  1. It should be in Fourth Normal Form.

  2. And, it doesn't have Join Dependency.

  3. 5NF is satisfied when all the tables are broken into as many tables as possible in order to avoid redundancy.

  4. Its also called as PJNF (Project Join Normal Form).

Time for an Example:

subject

lecturer

semester

Computer

Valli

Semester 1

Computer

Nagesh

Semester 1

Math

Nagesh

Semester 1

Math

Aadya

Semester 2

Chemistry

Avyay

Semester 1

In the above table, Nagesh takes both Computer and Math class for Semester 1 but he doesn't take Math class for Semester 2.

In this case, combination of all these fields required to identify a valid data.

Suppose we add a new Semester as Semester 3 but do not know about the subject and who will be taking that subject so we leave Lecturer and Subject as NULL.

But all three columns together acts as a primary key, so we can't leave other two columns blank.


So to make the above table into 5NF, we can decompose it into three relations P1, P2 & P3:

P1

Semester

Subject

Semester 1

Computer

Semester 1

Math

Semester 1

Chemistry

Semester 2

Math

P2:


Subject

Lecturer

Computer

Valli

Computer

Nagesh

Math

Nagesh

Math

Aadya

Chemistry

Avyay

P3:


Semester

Lecturer

Semester 1

Valli

Semester 1

Nagesh

Semester 1

Nagesh

Semester 2

Aadya

Semester 1

Avyay


38 views0 comments

Comments

Rated 0 out of 5 stars.
No ratings yet

Add a rating
bottom of page