Normalization is a method of organizing data in database in order to reduce the data redundancy and improve data integrity. Redundancy is nothing but like a data duplication.
We need to remove this duplicate data for better performance.
As we all know, there is two types of duplicity.
Row level
Column Level
1. Row level Duplicity
This data in table does not require row level duplicity. To remove Row level Duplicity
we use concept of primary key. So that we can improve the Database Structure and also maintain the data consistency.
Let’s us consider one example the Row Level Duplicity for better understanding. Here there is one table in which there are three columns.
In this table 1st and 3rd row is exactly same i.e. Row level Duplicity.
Student_id | Student_name | age |
1 | SAM | 20 |
2 | NIA | 25 |
3 | SAM | 20 |
We need to remove this duplicate data for better performance.
This Row level Duplicity is not needed data in the table. To remove Row level Duplicity, We need to use concept of primary key. Here, in this table Student_id is Primary key which is unique and not null.
Student_id | Student_name | age |
1 | SAM | 20 |
2 | NIA | 25 |
This is the result set in which there is no repeated rows, as repeated rows are removed that means row level duplicity is removed with the help of creating a student_id as primary key. In the result set two row shouldn’t be repeated.
2. Column level duplicity
In Column Level Duplicity, some columns get repeated. It could affect the performance of the table.
Let’s us consider one example the Column Level Duplicity for better understanding. Here there is one table in which there are seven columns, In which There is column Level Duplicity. Repeated data is Course Id- C1, Course Name-DBMS And Faculty IDd– F1
Student id | Student name | Course id | Course name | Faculty id | Faculty name | Salary |
1 | Sam | C1 | Dbms | F1 | Bob | 30000 |
2 | Nia | C2 | Java | F2 | Cloe | 40000 |
3 | Ricky | C1 | Dbms | F1 | Bob | 30000 |
4 | john | C1 | dbms | F1 | vish | 30000 |
While working on above like tables, we need to face some challenges while -
· Inserting
· Deleting
· Updating
Let us think from real-time perspective, Suppose new student has taken admission in university, it get easily added in above table. But university start new course like L5 and Course name is MBBS. So while updating this information we can only update two columns from above table i.e. course name and Course id. There is no such information to update the other columns like Student id, Student name, Faculty id, Faculty name, Salary.
We couldn’t update this type of incomplete information in existing table or database. We need to put many null values and also every time machine needs to excess redundant data also. But still this information needs to be store somewhere. This is just basic example, but in real-time scenarios there is huge data. That’s why we need to make multiple tables from this one table.
Here in this example, only new course can introduce in database, students will later register in the course. We need to update student id because its primary key.
In a same manner, there are Challenges with deletion and updating. We have a simple solution for this problem we divide this table in multiple table.
Student id | Student name |
1 | Sam |
2 | Nia |
3 | Ricky |
4 | John |
Course id | Course name |
C1 | Dbms |
C2 | Java |
C1 | Dbms |
C1 | dbms |
Faculty id | Faculty name | Salary |
F1 | Bob | 30000 |
F2 | Cloe | 40000 |
F1 | Bob | 30000 |
F1 | vish | 30000 |
Now you can add, delete, update any record in respected table. You can make changes accordingly; you can join two tables when it needs to be. This will increase the performance of the table/data also. It can improve database structure, you can efficiently access the data. Its very easy and quick to find information. You can avoid the redundant data and data consistency can also be maintained.
First Normal Form
According to First normal form, It should satisfied only when table doesn’t contain any multivalued attribute. That means, in one table there shouldn’t be a single attribute in which more than one value exist for single particular record.
Consider one basic example over here. Suppose we have one table which contains the columns, Roll no, name and course.
Roll no | name | Course |
1 | Seel | c/c++ |
2 | harry | java |
3 | ory | Dbms |
course column has more than one value, so this table does not satisfied 1st normal form
We need to decompose this table in two tables, in which table doesn’t contain multiple value.
Base table :
In above base table column Roll no. is Primary key.
In above table column Roll No works as a Foreign Key reference from base table.
So, it will easy to insert or update the information related to this tables. E.g.if one student want to enroll in no. of courses. We can insert this information easily in respected tables, so that it will easy to excess the information as per needed.
Second Normal Form
In second normal form the table or data must be in 1NF. That means table should not contain any multiple/repeated value attribute and second is all non-prime attribute should not contain partial dependency. That means it should be fully dependent on candidate key.
CUSTOMER ID | STORE ID | LOCATION |
1 | 1 | DELHI |
1 | 3 | MUMBAI |
2 | 1 | DELHI |
3 | 2 | BANGLORE |
4 | 3 | MUMBAI |
For example, consider one table shown as above. It contains three columns named as Customer ID, Store ID, Location. In which Customer id and Store id are candidate key or prime attribute. And Location is non-prime Attribute and Location that doesn’t depend on customer id so it shows partial dependency.
According to the Second Normal Form, all non-prime attribute should be dependent on candidate key. In above table, location depends on store id only but not depends on customer id. So, we need to decompose this table into two tables.
One table shown as above in which two columns included i.e. Store id and Customer ID. AS Store id Fully dependent on Customer id. Here Customer id is primary key and Store id is foreign key.
And one table shown as below which contains two columns i.e. Store ID and Location.
Location column is fully dependent on Store id column.
Third Normal Form
In Third Normal Form first rule is table must be in 2nd Normal Form and second rule is there should be no transitive Dependency in a table.
Let’s shed some light on Transitive Dependency, If table contains any non-prime attribute which are not involved for making of Candidate key. Here non-prime attribute depends on non-prime attribute. That means all non-prime attribute are fully dependent on each other.
ROLL NO | STATE | CITY |
1 | MAHARASHTRA | MUMBAI |
2 | BIHAR | PATNA |
3 | MAHARASHTRA | MUMBAI |
4 | BIHAR | PATNA |
5 | GOA | PANJI |
As per above example, column Roll no determines the column state which is non-prime attribute And column State determines the column city. This is the case of transitive dependency in which roll no determines city through state which is non-prime attribute.
Roll no is indirectly dependent on city is called Transitive dependency.
BCNF (Boyce Codd Normal Form)
Boyce Codd Normal Form is an advance version of third Normal Form in which it has more restrictions. As we know, in Third Normal Form there should not be a transitive dependency.
While working towards the BCNF we have to face more restriction. In BCNF if every Functional dependency A -> B then A has to be a super key.
This means that every right-hand side attribute of the functional dependency should depend on super key of particular table Super key is key that uniquely identifies rows in a table
Super key is key that uniquely identifies rows in a table.
Let’s see one example, consider above table in which three columns are there i.e. Student id, Subject and Professor.
One student can enroll for multiple subjects. For each subject a professor is assigned to the student, and it can be multiple professors who are teaching one subject. In the above table, column Student id and column subject both make a primary key. Subject is a prime attribute; professor is non-prime attribute which is not allowed by BCNF. So to resolve this issue,here we need to decompose this table into two table student table and professor table
Student id | Professor Id |
101 | 1 |
101 | 2 |
102 | 3 |
103 | 4 |
104 | 1 |
Professor | Subject |
P john | Java |
P sam | C++ |
P rockey | Java |
P sam | C# |
Advantage of Normalization
1. Better Database Organization
2. More table with smaller rows
3. Efficient data access
4. Quickly find information
5. Easier to implement security
6. More compact database
7. Allows easy modification
8. Reduction of redundant and duplicate data
9. Ensure consistent data after modification
In conclusion we understood different form of Normalization which helps us to reduce data redundancy and improve data integrity
Comments