Hi Folks,
We'll discuss normalization today. Previous article on relational model can be found at Relational Model.
Normalization
Normalization is a refinement process where it helps in removing anomalies in insert, update and delete operations. It is also called bottom up approach, because this technique requires full knowledge of every participating attribute and its dependencies on key attributes. If you try to add new attributes after normalization is done, it may change the normal form of database design itself. It is mainly used for two purposes:
- Eliminate redundant data
- Ensure data is logically stored.
Normalization rules are divided into following normal forms:
1. First Normal Form (1NF)
A relation is in 1NF if all of its attributes have domains that are atomic or indivisible. It leads to data redundancy. A table in 1NF should satisfy following five conditions:
- There is no top to bottom ordering of the rows
- There is no left to right ordering to the columns.
- There are no duplicate rows.
- Every row and column intersection contains exactly one value from application domain.
- All columns are regular (no hidden components)
Consider the table below which is not in 1NF:
Now, for it to be in 1NF, any row must not have a column with more than one values in it. Such data should be separated into multiple rows. So in 1NF, it will look like:
2. Second Normal Form (2NF)
A relation is in 2NF when it is in 1NF and there is no such non key attribute that depends on part of the candidate key (no partial dependency), but on the entire candidate key i.e a relation that has a single attribute as its candidate key is always in 2NF.
Look at the table in 1NF above, there are 2 rows for Adam to show different subjects he has opted for. This is inefficient use of space. Also the candidate key is {student, subject} but age depends only on student which doesn't follow 2NF rules. Therefore to make it in 2NF, its better to split the subjects into a different table and match it up using student's name as foreign key. Now, the new student table will look like:
Candidate key is student column on which age depends.
And new subject table looks like:
Candidate key here is {student, subject}. This way the tables above would never suffer from update anomalies.
3. Third Normal Form (3NF)
A relation is in 3NF if it is in 2NF and every attribute depends directly on primary key and not through a transitive relation when an attribute A may depend on a non key attribute B and B in turn depends on primary key C. Non key attributes are mutually independent.Through this, data duplication is removed and data integrity is achieved. Lets learn it from the example below:
Consider the table Student_Details below:
student_id is primary key here but street, city, state depends on zip. This dependency between zip and other fields is called transitive dependency. To make it in 3NF, we need to move street, city and state into a new table with zip as primary key. Therefore, the new table will look like:
Zip table
4. Boyce and Codd Normal Form (BCNF)
This is a higher version of 3NF. A 3NF table which doesn't have multiple overlapping candidate keys is said to be in BCNF. Following two conditions must be satisfied for BCNF:
- Relation (R) must be in 3NF
- For each functional dependency (A->B), A should be the super key for R.
Consider the database, Classes (course, teacher, book) where (c,t,b) Î Classes, means t is qualified to teach c and b is required textbook for that.
Logically, for any course C, database should list all teachers anyone of whom can be the course instructor and list of books to be followed for teaching.
But if Tom is new teacher who teaches database, then we might face an insertion anomaly i.e we will need to insert two tuples (database, Tom, DB concepts) and (database, Tom, Ullman). Therefore, its better to break classes into 2 tables:
Quote of the day
You have to expect things of yourself before you can do them. - Michael Jordan
Deepak A