Preface

Relational databases are tricky beasts. Other kinds of commercial software are infinitely easier to understand. Word processors are really just high-tech typewriters, and it's pretty clear that the backspace key beats that little jar of white stuff cold. Spreadsheets present a familiar enough paradigm, even to non-accountants, and email is close enough to the postal system for the model to be comprehensible.

Databases are different. Other kinds of software have a real-world analogy. Sometimes, as in the Windows desktop, the analogy is a little tenuous, but the analogies are close enough; you can get there from here. But relational databases are completely artificial. They're like geometry: They can be used to build models of the real world, but they don't exist in the real world. When was the last time you poured some wine for you and your sweetie and went out on the front porch to watch the geometry frolic on the lake?

Now, I'm talking about databases here, not tables. Tables exist aplenty, from the telephone book to the dictionary. But relational databases? Nope. Unh-unh. You're not going to find them frolicking on the lake, either. The card files at the library, which contain author, title, and subject files, come close to being a database but they're still separate sets of data that are only correlated by the good graces of the local librarian.

This book is about designing database systems. My intention is to give you the knowledge you need to take a messy, complex, real-world situation and turn it into an effective database design. I assume that you have some development experience and generally know your way around a computer, but I don't assume that you have any background in databases.

After reading the book you still won't be able to watch the databases frolic on the lake, but if I've done my job well you'll be able to design and implement a relational model of the fish, the seagulls, and the effects of the plankton on them both.

The book is divided into four parts. Part I, "Relational Database Theory," covers the fundamental principles of the relational model. This is where the really ugly, theoretical stuff is. But don't worry; it will get easier. Part II, "Dimensional Database Theory," covers the same information for dimensional databases, a special type of relational database used for analysis. Part III, "Designing Database Systems," examines the analysis and design process: what you should do to get from the real world to a reliable database system design. Finally, Part IV, "Designing the User Interface," discusses the most important aspect of a database system from a user's point of view: the user interface.

Although we'll talk about implementation issues in the next few hundred pages, this isn't a "how to program" book. There are a few coding examples, but I've kept them to a minimum, and you should be able to follow them even if you've never seen a programming language before. The database examples are based on the Northwind sample database that comes with Microsoft Access. (The version of Northwind that comes with SQL Server is very similar.) By the time you're finished reading this book, you'll have picked up most of what you need to get started building database systems, and you'll be ready to turn to one of the sources listed in the Bibliography for the finer points of programming style. And you'll be confident that your data architecture is sound and unlikely to get you into trouble later in your project.

A note on English usage: As you'll discover as you read this book, I'm a stickler for terminology. But that said, I don't think syntax ought to draw attention to itself. If an author writes "he or she" or (heavens forefend) "s/he," I'm busy thinking about gender politics and no longer paying attention to the text. If I read "the data are," I'm just as likely to be thinking about the nature of the English language as whatever the author is trying to say.

Now, the pronoun issue is fairly simple to work around. You'll find a great many repetitions of "the user" in this text. But the adoption of Latin terms into English is a more complex issue, particularly in a book about data.

For the record, I had a classical education, and I'm perfectly aware that in Latin, "data" is a plural noun, and ought to take a plural verb. I'm also aware that in the field of statistics, one still refers to a "datum," a single data point. But this isn't statistics, and I'm not writing in Latin. In English, we have a long history of adopting plural Latin nouns as corporate nouns, and in American English, those nouns take a singular verb. It's what we do when we speak, and it's what I've done in the text.

We say "the data is reliable" not "the data are reliable." (I have actually heard "the datums are reliable," but that's just sad.) This usage has been adopted by several influential publications, and I have adopted it here. Not because I don't know how Latin works, but because I've carefully considered the issue and decided to write American English as I, as a well-educated native American English speaker, speak it. Now ain't that just about enough on the subject?