The other goal of the design phase is to ensure that the database is flexiblethat it can answer every question that might reasonably be asked of it. Flexibility is primarily determined by its completeness (obviously, no database system can provide data it doesn't contain) and only secondarily by its structure. But the ease with which questions can be answered is almost exclusively the result of the structure. The principle here is that it is easy to combine attributes and relations, but very difficult to take them apart.
For example, given the two relations shown in Figure 2-5, the FullName can be easily derived from the top relation with the statement:
TitleOfCourtesy & " " & GivenName & " " & Surname & _ ", " & Title
Figure 2-5. Concatenating Information Is Easy, but Extracting It from Composite Fields Is Hard
But retrieving only the LastName from the FullName field shown in the bottom relation would require manipulating the string itself:
Function GetLastname(FullName) As String Dim lastname As String 'strip off the Title lastname = Left(FullName, InStr(FullName, ",") - 1) 'strip off the TitleOfCourtesy lastname = Right(lastname, Len(lastname) _ _ InStr(lastname, " ")) 'strip off FirstName lastname = Right(lastname , Len(lastname ) - _ InStr(lastname , " ")) GetLastname = lastname End Function
This technique is also vulnerable to variations in the contents of the FullName field. For example, the name "Billy Rae Jones" is going to return "Rae Jones", when what you probably wanted was "Jones". Producing a list in the format LastName, FirstName could get very ugly.
The second principle involved in creating a data model that can effectively answer the questions asked of it is to avoid situations where answering the question requires evaluating the same information from multiple fields. Take the relations shown in Figure 2-6, for example, both of which model student enrollments.
Figure 2-6. These Two Recordsets Contain the Same Data
To answer the question "Which students are studying Biology this year?" using the first relation, you would have to search for the value "Biology" in three fields. The SQL SELECT statement would look like this:
SELECT StudentID FROM Enrollments WHERE Period1 = "Biology" OR Period2 = "Biology" OR Period3 = "Biology"
Using the second structure requires searching only a single field, Class:
SELECT StudentID FROM Enrollments WHERE Class = "Biology";
They both work, but the second one is obviously easier and less error-prone to code, not to mention easier to think about.
Avoiding redundancy and making it easy to retrieve the data are all you need to know about data modeling; the rest is just an attempt to formalize these two basic principles. But if you've ever done much (or any) data modeling, you'll know that, simple as they might be, these principles can be very slippery to apply. They're like a paper clip: The answer is perfectly obvious once you've seen it, but a little hard to come up with the first time you're faced with a bunch of loose paper and a bit of wire.