Foreword

My life would be vastly simpler if I were not awash in data.

My PDA is bursting at the seams with lists, spreadsheets, schedules, and documents that collectively enumerate, tabulate, quantify, and qualify my entire life (it's quite humbling to realize just how much of my life can fit in my shirt pocket). In turn, that PDA gets synchronized with my laptop, whose 16-gig drive is packed to capacity not just with applications but also with yet more lists, spreadsheets, schedules, and documents. My laptop is but one node in a local network of about a dozen machines that run my home and my office, and in each of them, no surprise, you'll find more lists, spreadsheets, schedules, and documents. Some of this data is unstructured (I'm not so driven that I maintain an XML representation of my grocery list; well, not yet), but much of that data is structured, semantically deep, and intermingled: tax records, code and related artifacts, catalogs of books and journals, reference materials for my in-progress manuscripts and presentations, catalogs of my CDs and associated playlists, and so on.

Step beyond this network, over which I have some degree of control, and it's frightening to think about all the data that people and organizations maintain about me: orders from Web sites; my college, military, and employment records; the government's tax records; and so on. Conservatively speaking, my digital life can probably be compressed into just a few gigabytes of memory. Multiply me by a few billion people, add data for everything other than personal information, and it's easy to see how some systems must cope with terabytes or even petabytes of data. At the other end of the spectrum of complexity, even embedded devices must typically manage semantically rich data: for example, you'd be surprised to realize how much structured data something as truly embedded as a pacemaker must manipulate.

In short, virtually every interesting software-intensive system surrounds or manipulates a set of persistent data.

But therein lie a number of challenges: how do I craft a data-intensive system so that I can grow quality software around it? How do I architect that system so that it is resilient to change, realizing that, in many domains, kinds of data change relatively slowly but particular instances of information and applications that manipulate that information change much more frequently? Also, how do I organize my development team so that different stakeholders can work together, since some will be more skilled at designing the data-intensive parts of my system and others will be more skilled at crafting the applications that surround that data?

Eric and Robert together have deep experience in building data-intensive systems, and that experience is evident in their writing. They have written a soundly pragmatic book that addresses these issues and many more head on. Speaking of head on, there is often an unfortunate collision of worlds between the traditional database designers and application designers on a development team, but as the authors demonstrate, the use of the Unified Modeling Language permits these otherwise disparate groups to communicate with one another. Development is indeed a team sport, and integrating the data-intensive parts of a system with its application-intensive parts is critical.

The presence (or absence) of a well-crafted software architecture as well as an intentional process are clear predictors of the success (or failure) of many complex systems. Eric and Robert have thus organized this book along a life-cycle flow, leading you from conceptual to logical and then to physical database design. By focusing on a single, very rich case study, coupled with important observations for the database team highlighted in the sidebars, the authors offer guidance that can help your software development team succeed.

Grady Booch
Chief Scientist
Rational Software Corporation
April 2001