The Structure of a Fact Table

Figure 7-1 shows a fairly simple example of a fact table associated with a customer order. The basic structure is the same as a table developed using the relational modelthe relation consists of a set. The attributes in a fact table can be divided into two types: dimension keys, which link the fact table to the dimension tables we'll discuss in the next chapter; and facts, which are the actual values to be measured.

Figure 7-1. A Simple Fact Table

Like dimensional sets, the set of attributes in a fact table can, in theory, be empty. In practice, however, it will almost always contain at least two attributes: either the combination of a dimension key and a fact, or, in some rare cases, two dimension keys.

Dimension keys are essentially the same as foreign keys in the relational model in that they are used to relate the fact table to dimensional tables, but their primary purpose is somewhat different. As we saw in Part I, one of the primary goals of relational design is to eliminate redundant data and the update anomalies that it causes.

But this requirement simply doesn't apply to dimensional databases because they are rarely updated. Dimensional tables do reduce redundancy in the fact table, but that's a side effect; their primary purpose is to provide as many descriptive attributes as possible to the user, who will use them as selection criteria when analyzing the data.

This "irrelevant redundancy" principle applies to fact attributes as well. Note that the OrderNumber attribute is a fact, not a dimension key. In a normalized database, the attributes shown in this single fact table would be split into two separate tables, one representing the order itself, and a separate table representing each line item. In a typical implementation, there would be a single row in the order table for each order, and a row in the line item table for each product that was part of the order. The CustomerID, EmployeeID and OrderDate attributes would be part of the order table, and would thus only be recorded once for each order. In the example fact table, each of these will be repeated for each product ordered as part of the order. (Note, by the way, that the OrderDate of the normalized database has become a dimensional key in the fact table. We'll examine this issue in more detail in the next chapter.)

It is typical of fact tables to collapse normalized tables in this way, and doesn't represent a design flaw. Remember that the tables in a dimensional database are rarely updated, so the issue of changing a particular data item in multiple places simply doesn't apply. There is some storage overheadin this example, the size of a date field times the average number of products per orderbut that's far less important than the cost of doing an extra table join if the user wants to specify a date or date range as a reporting criterion.