### Further Normalization

The first three normal forms were included in Codd's original formulation of relational theory, and in the vast majority of cases they're all you'll need to worry about. Just remember the jingle I learned in grad school: "The key, the whole key, and nothing but the key, so help me Codd."

The further normal formsBoyce/Codd, fourth, and fifthhave been developed to handle special cases, most of which are rare.

#### Boyce/Codd Normal Form

Boyce/Codd normal form, which is considered a variation of third normal form, handles the special case of relations with multiple candidate keys. In fact, for Boyce/Codd normal form to apply, the following conditions must hold true:

• The relation must have two or more candidate keys.

• At least two of the candidate keys must be composite.

• The candidate keys must have overlapping attributes.

The easiest way to understand Boyce/Codd normal form is to use functional dependencies. Boyce/Codd normal form states, essentially, that there must be no functional dependencies between candidate keys. Take, for example, the relation shown in Figure 2-20. The relation is in third normal form (assuming supplier names are unique), but it still contains significant redundancy.

##### Figure 2-20. This Relation Is in Third Normal Form but Not in Boyce/Codd Normal Form

The two candidate keys in this case are {SupplierID, ProductID} and {SupplierName, ProductID}, and the functional dependency diagram is shown in Figure 2-21.

##### Figure 2-21. This Is the Functional Dependency Diagram of the Relation in Figure 2-20

As you can see, there is a functional dependency {SupplierID} { SupplierName}, which is in violation of Boyce/Codd normal form. A correct model is shown in Figure 2-22.

#### Fourth Normal Form

Fourth normal form provides a theoretical foundation for a principle that is intuitively obvious: Independent repeating groups should not be combined in a single relation. By way of example, let's assume that the own-brand products sold by Northwind Traders come in multiple package sizes, that they are sourced from multiple suppliers, and that all suppliers provide all pack sizes. A completely unnormalized version of the Products relation might look like Figure 2-23.

##### Figure 2-23. This Is the Unnormalized Relation

Now, the first step in normalizing this relation is to eliminate the non-scalar PackSize attribute, resulting in the relation shown in Figure 2-24.

##### Figure 2-24. This Version of the Relation Shown in Figure 2-23 Is in Boyce/Codd Normal Form

Surprisingly, Figure 2-24 is in Boyce/Codd normal form, since it is "all key." But there are clearly redundancy problems, and maintaining data integrity could be a nightmare. The resolution to these problems lies in the concept of multi-valued dependency pairs and fourth normal form.

A multi-valued dependency pair is two mutually-independent sets of attributes. In Figure 2-23, the multi-valued dependency is {Product-Name}{PackSize}| {SupplierName}, which is read "Product multi-determines PackSize and Supplier." Fourth normal form states, informally, that multi-valued dependencies must be divided into separate relations, as shown in Figure 2-25. Formally, a relation is in fourth normal form if it is in Boyce/Codd normal form, and in addition, all the multi-valued dependencies are also functional dependencies out of the candidate keys.

#### Fifth Normal Form

Fifth normal form addresses the extremely rare case of join dependencies. A join dependency expresses the cyclical constraint "if Entity1 is linked to Entity2, and Entity2 is linked to Entity3, and Entity3 is linked back to Entity1, then all three entities must necessarily coexist in the same tuple."

To translate this into something resembling English, it would mean that if {Supplier} supplies {Product}, and {Customer} ordered {Product}, and {Supplier} supplied something to {Customer}, then {Supplier} supplied {Product} to {Customer}. Now, in the real world this is not a valid deduction. {Supplier} could have supplied anything to {Customer}, not necessarily {Product}. A join dependency exists only if there is an additional constraint that states that the deduction is valid.

It is not sufficient, in this situation, to use a single relation with the attributes {Supplier, Product, Customer} because of the resulting update problems. Given the relationship shown in Figure 2-26, for example, inserting the tuple {"Ma Maison", "Aniseed Syrup", "Berglunds snabbkop"} requires the insertion of a second tuple, {"Exotic Liquids", "Aniseed Syrup", "Berglunds snabbkop"}, since a new link, "Aniseed Syrup" to "Berglunds snabbkop", has been added to the model.

##### Figure 2-26. This Relation Is Not in Fifth Normal Form

Decomposing the relation into three distinct relations (SupplierProduct, ProductCustomer, and SupplierCustomer) eliminates this problem but causes problems of its own; in re-creating the original relation, all three relations must be joined. Interim joins of only two relations will result in invalid information.

From a system designer's point of view, this is a terrifying situation, since there's no intrinsic method for enforcing the three-table join except through security restrictions. Further, if a user should create an interim result set, the results will seem perfectly reasonable, and it's unlikely that the user will be able to detect the error by inspection.

Fortunately, this cyclical join dependency is so rare that the complications can be safely ignored in most situations. Where they can't, your only resource is to build the database system in such a way as to ensure the integrity of the multi-relation join.