Chapter 4. Data Integrity

Creating a model of the entities in the problem space and the relationships between them is only part of the data modeling process. You must also capture the rules that the database system will use to ensure that the actual physical data stored in it is, if not correct, at least plausible. In other words, you must model the data integrity.

It's important to understand that the chances of being able to guarantee the literal correctness of the data are vanishingly small. Take, for example, an order record showing that Mary Smith purchased 17 hacksaws on July 15, 1999. The database system can ensure that Mary Smith is a customer known to the system, that the company does indeed sell hacksaws, and that it was taking orders on July 15, 1999. It can even check that Mary Smith has sufficient credit to pay for the 17 hacksaws. What it can't do is verify that Ms. Smith actually ordered 17 hacksaws and not 7 or 1, or 17 screwdrivers instead. The best the system might do is notice that 17 is rather a lot of hacksaws for an individual to purchase and warn the person entering the order. Even having the system do this much is likely to be expensive to implement, probably more expensive than its value warrants.

My point is that the system can never verify that Mary Smith did place the order as it's recorded; it can verify only that she could have done so. Of course, that's all any record-keeping system can do, and a well-designed database system can certainly do a better job than the average manual system, if for no other reason than its consistency in applying the rules. But no database system, and no database system designer, can guarantee that the data in the database is true, only that it could be true. The system provides this weaker guarantee by ensuring that the data complies with the integrity constraints that have been defined for it.
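
To make the distinction concrete, here is a minimal sketch of the hacksaw example using Python's built-in sqlite3 module. The table names, column names, and the helper functions build_db and try_order are assumptions made for illustration only; they are not part of any schema discussed in this chapter, and a real system would carry far more detail (credit limits, trading dates, and so on).

```python
import sqlite3


def build_db():
    """Create an in-memory database with a few illustrative constraints."""
    conn = sqlite3.connect(":memory:")
    # SQLite does not enforce foreign keys unless asked to.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript("""
        CREATE TABLE customers (name TEXT PRIMARY KEY);
        CREATE TABLE products  (name TEXT PRIMARY KEY);
        CREATE TABLE orders (
            customer   TEXT    NOT NULL REFERENCES customers(name),
            product    TEXT    NOT NULL REFERENCES products(name),
            quantity   INTEGER NOT NULL CHECK (quantity > 0),
            order_date TEXT    NOT NULL
        );
        INSERT INTO customers VALUES ('Mary Smith');
        INSERT INTO products  VALUES ('hacksaw'), ('screwdriver');
    """)
    return conn


def try_order(conn, customer, product, quantity, order_date):
    """Return True if the order satisfies the constraints, False otherwise."""
    try:
        with conn:  # commit on success, roll back on error
            conn.execute(
                "INSERT INTO orders VALUES (?, ?, ?, ?)",
                (customer, product, quantity, order_date),
            )
        return True
    except sqlite3.IntegrityError:
        return False


if __name__ == "__main__":
    db = build_db()
    # Plausible: a known customer, a known product, a positive quantity.
    print(try_order(db, "Mary Smith", "hacksaw", 17, "1999-07-15"))
    # Equally plausible, so equally acceptable, whatever Ms. Smith really ordered.
    print(try_order(db, "Mary Smith", "screwdriver", 17, "1999-07-15"))
    # Implausible: the customer is unknown to the system.
    print(try_order(db, "John Doe", "hacksaw", 1, "1999-07-15"))
```

Under these assumptions the first two orders are accepted and the third is rejected. The constraints cannot tell the hacksaw order from the screwdriver order; both could be true, and that is all the database is in a position to check.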