What is Normalization in Database?
Most of the projects and websites contain lot of information. They store the data in tables. Since lot of data therefire database contains so many tables. In order to retrieve some information we must go through number of tables which leads to complex quires.
We may think that the solution to avoid these complex quires is instead of number of tables store all information in one big table and retrieve required information from it. It works fine but in this case table will contain redundant information. And also arises other anomalies. They cause problems when inserting new data, updating and deleting. i.e if we try to delete some field in database due to mixing all data along with this data, some other data may lost. Same problem when updating, other fields also may update since all are in same table. And inserting data may give, so much of redundant data.
To avoid these problems of inserting, deleting and updating data, divide the tables as small as possible. The ideal size of the table is two attributes. To make this way we should follow some rules for dividing tables. First test each table whether it’s design is good or not. If not good go for division (split the table). This process of splitting tables is called Normalization.
So, Normalization means split the tables into small tables which will contain less number of attributes in such a way that table design must not contain any problem of inserting, deleting, updating anomalies and guarantees no redundancy.
To do normalization we use concept of “Functional dependency” and “Candidate keys”. Using these concepts we check whether table design is good or not, if not good we go from Normalization (splitting the table).
Prerequisite
For better understanding of this concept you should know about:
- What is functional dependency
- How to find candidate keys
- How to find closure
- Basic knowledge about all keys
Without these also you can understand what is 1NF, 2NF, 3NF, BCNF. But to work with more complex problems you should know those concepts.
Our final aim is that, after normalization, whatever the functional dependency (FD) applicable on table (let FD is x -> y), the left hand side is always must be a key (here x). This final form is called BCNF
BCNF guarantees zero (0) % redundancy.
To go to BCNF first we have to check whether the table is in 1NF, 2NF, 3NF and then BCNF.
Note: Any Normal form aim is that to reduce redundancy and to avoid anomalies.
Normalization in DBMS
First Normal Form (1NF)
First Normal Form says that table is flat i.e in table there must no multi-valued and no composite attributes.
Example:
ID | Name | Course |
1 | Neeraj | C1 |
2 | Pankaj | C1, C2 |
3 | Pawan | C3 |
In above case Pankaj has two courses C1 and C2, so Course is multi valued. Hence it is not in 1NF.
Below table in is in 1NF.
ID | Name | Course |
1 | Neeraj | C1 |
2 | Pankaj | C1 |
2 | Pankaj | C2 |
3 | Pawan | C3 |
We need not worry about this, because while converting ER model (diagram) to relational model (table), we follow rules that they guarantees all attributes free from multi-valued and composite.
So any relational table by default is in 1NF.
Second Normal Form (2NF)
Second Normal Form says that, if candidate key containing more than one attribute then any part of that key (called partial key) should not determine anything.
Example:
RollNum | StuName | CorNum | CorName | Result |
1 | Neeraj | 3 | DBMS | A |
2 | Venkat | 2 | OS | C |
3 | Phani | 1 | FLAT | B |
Functional dependencies are:
- The attribute student name functionally depends on roll number. So RollNum -> StuName
- The attribute course name functionally depends on course number. So CorNum –> CorName
- The attribute Result depends on Roll number and Course number. So RollNum, CorNum -> Result
We can find that RollNum, CorNum combine form as candidate key.
Here we can see that a part of candidate keys are deriving other things which we called partial dependency. FD1 and FD2 are partial dependencies.
So this table is not in 2NF. To convert into 2NF find the closure where problem occurred and split the table with that result.
RollNum+ = { StuName } ;;; CorNum+ = { CorName } ;;;
These two should be separate tables and generally with candidate key other table formed.
Resultant tables which are in 2NF:
Table 1:
RollNum | StuName |
1 | Neeraj |
2 | Venkat |
3 | Phani |
Table 2:
CorNum | CorName |
3 | DBMS |
2 | OS |
1 | FLAT |
Table 3:
RoNum | CorNum | Result |
1 | 3 | A |
2 | 2 | C |
3 | 1 | B |
For table 1 candidate key is RollNum, for table 2 candidate key is CorNum since these tables has a single prime attribute we can say these two are in 2NF. Coming to the third table candidate key is RollNum and CotNum combine. But on this table there is only one functional dependency is existing. So this is also in 2NF.
i.e. 2NF is based on Full Functional Dependency. No partial keys are allowed. So in 2NF we checked for partial dependency and eliminated.
Important Note: Whenever you find a part of key on left hand side of FD, don’t confirm that it is partial dependency. Check right side also if right hand side is non-prime attribute then only it is partial dependency. If right hand side also prime attribute it is not a partial dependency.
Third Normal Form (3NF)
Third normal form says that there is no “Transitive Dependency”.
We know the rule of transitivity that, If A -> B and B -> C then A -> C. We can find the transitive dependency FD’s in such a way that, “Non-prime attribute derive something”. If any FD is like this we can say that it has Transitive dependency and we need to eliminate it to make it into 3NF.
We can check 3NF in other way also, formal definition of 3NF is:
Definition: A relational schema (table) is in 3NF if and only if every non trivial FD X -> Y
Either X is a super key or Y is a prime attribute (it is part of some candidate key). If this definition follows there is no chance of transitive dependency.
Example:
Student Table
StuID | StuName | RollNum | ClassCode | ClassName |
1 | Mishra | 12 | CS1 | Lect.Hall |
2 | Amit | 14 | CS2 | Lab |
3 | Jack | 16 | CS3 | Theorey |
Functional dependencies:
- StuID -> StuName, StuName, RollNum, ClassCode
- ClassCode -> ClassName
Here StudID is candidate key which can able to derive everything. So one and only prime attribute is StuID only. But we can see that 2nd FD i.e ClassCode -> ClassName in this ClassCode is a non-prime attribute which is deriving something. So this is not in 3NF.
To convert into 3NF find the closure where problem occurred to split the table.
ClassCode+ = { ClassName };
Resultant tables after splitting are
Student Table
StuID | StuName | RollNum | ClassCode |
1 | Mishra | 12 | CS1 |
2 | Amit | 14 | CS2 |
3 | Jack | 16 | CS3 |
In this table StuID is candidate key and only one Functional dependency existing which is StuID -> StuName, RollNum, ClassCode. So there is no problem this is in 3NF.
Class Table
ClassCode | ClassName |
CS1 | Lect.Hall |
CS2 | Lab |
CS3 | Theorey |
In this table ClassCode is candidate key and only one functional dependency existing which is ClassCode -> ClassName. So this table is also in 3NF
Boyce Codd Normal Form (BCNF)
To make sure zero % redundancy two scientists Boyce and Codd invented this BCNF. In BCNF result each FD determinants (left hand side attribute) must be a key.
Definition: A relational schema R is in BCNF if whenever a non-trivial FD X -> Y , X should be a super key.
Example:
IpAdd | PortNum | ProcessReq |
10.4.9.34 | 80 | Register Application form |
10.11.4.99 | 110 | Gmail message request |
10.1.11.111 | 25 | Remote User request |
Functional dependencies exist on this table are:
- IpAdd, PortNum -> ProcessReq
- ProcReq -> PortNum
Applying normalization means converting into BCNF. For that we first check 1NF, 2NF, 3NF.
By default every relational schema is in 1NF.
Before proceeding to next normal forms, we should find candidate keys. If we find candidate keys we get { IpAdd, PortNum } and { IpAdd, ProcessReq } are candidate keys. So prime attributes (part of candidate keys) are IpAdd, PortNum, ProcessReq. As per formal definition of 3NF, if right hand side has prime attribute, it is enough to say that it is in 3NF. Since all attributes are prime attributes we can say that table is in 3NF also. If already in 3NF, no need to check 2NF. So up to 1NF, 2NF, 3NF all are fine.
Now check for BCNF. According to the definition of BCNF left hand side should be key. So FD IpAdd, PortNum -> PorcessReq . Therefore AB is a key there is no problem.
Other FD PorcessReq -> PortNum, here this FD not deriving all attributes, since it’s not deriving everything ProcessReq is not a key. We can say that it is not in BCNF. To make it into BCNF,
ProcessReq+ = { ProcessReq, PortNum } is a separate table.
PortNum | ProcessReq |
80 | Register Application form |
110 | Gmail message request |
25 | Remote User request |
And { IpAdd, ProcReq} is other table.
IpAdd | ProcessReq |
10.4.9.34 | Register Application form |
10.11.4.99 | Gmail message request |
10.1.11.111 | Remote User request |
On table PortNum, ProcessReq, Functional Dependency is ProcReq -> PortNum, here ProcessReq is key, so satisfies BCNF. And on table IpAdd, ProcessReq, { IpAdd, ProcessReq } itself a key, so it also is in BCNF. But here we lost FD, { IpAddr, PortNum } -> ProcReq so called this is a not functional dependency preserving result even it is in BCNF.
Finally this BCNF guarantees that there is no redundancy and no problem of anomalies of inserting, updating and deleting.
Comment below if you have queries or found any information incorrect in above tutorial for normalization in dbms.
No comments:
Post a Comment