Reviewing Entities

Now that you've begun to get a picture of the entities in the system and how they're related, it's time to start analyzing each entity in detail. For each entity, you'll want to identify the following information:

The relationship between the entity and the problem space
The work processes that create, modify, use, and delete the entity
Any other entities it might interact with or depend on
The business rules and constraints pertaining to the entity
The attributes of the entity

The Relationship between the Entity and the Problem Space

Identifying the relationship between the entity and the problem space is usually straightforward. "The Customer entity models the individuals and organizations who buy our products." The biggest problem here, I've found, is thinking up a sentence that isn't hopelessly tautological. "The Employees entity models the organization's employees" seems hardly worth saying.

If a relationship is modeled as an entity, things can be trickier since the entity won't map directly to the problem space. "A supplier can provide multiple products, and any given product might be provided by any number of suppliers. The Product Supplier entity models this relationship, as well as the Preferred Supplier status of any given Supplier for a specific Product."

Some things in the problem spacea Sales Order is probably the best exampleare modeled using one or more logical entities in the data model. I refer to these as composite entities. The sales order document is represented by both the Sales Order and Order Detail entities.

I have generally found it cleaner to handle composite entities as a single object for documentation purposes. For example, "The Sales Order and Order Detail entities represent a single order placed by a Customer. The Sales Order entity models the order itself, while the Order Detail items represent each product being ordered."

Work Processes that Affect the Entity

Although you might have already identified where data items are used during the work process analysis you performed earlier, it's useful to include that information in the entity documentation as well. That way, if it's necessary at some point to make a change to the structure of the entity, such as adding an attribute, there's a single place to identify all the processes that can be affected.

Identifying the processes that operate directly on an entity is also usually a straightforward process. Identifying those that indirectly interact with the entity might require more work. It might not be immediately obvious, for example, that the order entry process can modify a customer's default shipping method, or that a "Special Bonus" identified for a product category can affect the discount and thus the total value of a sales order. And yet these are exactly the kinds of interactions that are a maintenance programmer's nightmare if they're not carefully documented.

Most analysts document these interactions in the work processes analysis, which is obviously useful if your changes are to the processes themselves. Sometimes, however, the changes are to the model itself, either directly because of a change in the business environment or indirectly because a change to an existing process requires alterations in the model. In this case, it's far easier to scan the entity documentation for the specific entity you're changing than to sift through all the work processes to determine which might be affected by the change.

Think of including work process information in your entity documentation as cross-referencing, which it is. Like all cross-referencing, it can be tedious to implement and maintain, but it will make your life a lot easier in the long run.

Interactions between Entities

E/R diagrams are wonderful tools, but they're capable of showing only so much information. If the entities in your system have complex interactions that can't easily be represented on the diagram, it's important to document them in the entity descriptions. Even if you've added annotations to the diagram, you should expand on any interactions that aren't immediately apparent in the notation.

If the model is complex enough to warrant multiple entity diagrams and a given entity appears in multiple diagrams, it might be useful to list all of the entities to which it is related in the entity description. This tends to be the case with entities that provide look-up values in multiple places. A Courtesy Title entity, for example, that contains entries such as "Mr.", "Mrs.", "Dr.", and "Ms." might be referenced in a dozen places. If you need to make changes to the entity, it's useful to be able to find all the referencing entities in a single spot.

As a general rule, however, the E/R diagram provides adequate documentation for the interaction between entities. Only exceptional cases like those cited previously warrant additional information.

Business Rules and Constraints

The next bit of documentation required for an entity is to note any entity-level constraints that pertain to it. Any constraints that reference multiple attributes, such as the "Shipping Method and Special Instructions attributes cannot both be empty" rule in our example, should also be documented, and this is the time to do so.

Attributes

The final bit of documentation required for the entities is a list of attributes and their domains. In compiling the list, you'll want to start with the list of attributes you identified when going through the source documents, and then be sure to add any foreign keys that are required for referential integrity.

You'll also want to check that each entity has at least one candidate key that can be used to uniquely identify each instance. This will become the primary key for the table in the database schema. Remember that primary keys cannot contain Null values. Because of this, it might not always be possible to use an existing attribute or combination of attributes as a key. If this is the case, you'll need to add an arbitrary, system-generated identifier.

In our example, the Customer entity probably needs an artificial identifier like this. If we assume that a customer might be either a company or an individual, you might have an initial attribute list like that shown in Figure 12-6.

Figure 12-6. The Customer Entity Attribute List

Even if we set aside for the moment the issue that names are not unique, we still have a problem. If the customer is an individual, the Company Name will be null. If the customer is a company, the Individual Name will be null. Something is always going to be null. So these fields cannot be used as the candidate key even if we could assume that they uniquely identify the record, which we can't.

This leads me to the next problem with the Customer entity. Names are not unique. In our example, the entire list of attributes doesn't even guarantee uniqueness, since it's possible for two individuals with the same name to live at the same address. John Smith is not guaranteed to tell you that he's actually John Smith Jr. and not to be confused with his papa, who lives in the same house.

There is certainly no question that John Smith Sr. and John Smith Jr. are different people and should be represented by different records. However, the attributes that uniquely identify them are none of our business. Can you imagine being asked about your living arrangements when you place an order for groceries? "Pardon me, sir, but do you happen to have a relative of the same name residing with you? Just checking for our computer system." Not in the best tradition of customer service, really.

Fortunately, there's a simple answer: the Customer Number. And if the organization doesn't already have a method of assigning these numbers, both Microsoft Jet and Microsoft SQL Server provide a mechanism for assigning them automatically (the AutoNumber and Identity data types, respectively).

If you do use an arbitrary identifier, though, be sure to provide an alternate form of identification. You don't want to put a user in the situation of having to refuse an order because the customer has forgotten his or her customer number. Asking a customer, "Are you the John Smith who lives in Oakridge or the one who lives in Cincinnati?" is one thing, asking him to call back after he's found an old invoice is quite another.

The Customer entity is also an example of the second reason to use an arbitrary system identifier. Even if we could assume that the combination of name and address were sufficiently unique for our purposes, that still leaves way too many fields to copy everywhere. Remember that the primary key of an entity will be used as a foreign key in any entities that reference it. It's obviously much more efficient to duplicate a single attribute than five or six.