The Workflow

The use of modeling to design databases far exceeds the use of modeling for applications and is generally done within most organizations with which we have worked and visited. The issue is that modeling a database is generally just that—modeling the database tables, columns, and relationships but not the entire database design. In this book, we look at many other parts of database design that can be modeled and how using the UML helps to model the entire database design. One part of database design that has not been covered very well in the past by the UML is the actual modeling of the database. Designers have primarily focused on using the UML for object-oriented application design and not for database design. With the use of the Profile for Database Design, this has helped open the UML to database design and database designers to the UML. Some tools like Rational Rose automate many of the stereotypes that we discuss so that the database designer does not need to understand the entire UML or even the stereotypes; designers can just take advantage of the UML without having to be experts.

We will model the database so that we can visualize and understand how the tables are structured and how they relate to other tables and views. Using the UML for the database design, we will also model constraints, triggers, schemas, indexes, stored procedures, and more. In the next chapter we will cover how to model the storage of the data, including tablespaces, databases, and partitioning.

Having a model of the database is important for many users. The database designer uses the data model in order to ensure database rules, including normalization, key migration, and others. The application developer can use the data model, especially if designed in the UML and mapped to the application model, to understand how the application accesses the database. This will help when the application developer is charged with building the data access software. A user interface (UI) developer also takes advantage of the data model to be sure that the UI picks up the available columns and uses the proper lengths, precision, and scale when designing the look and feel of the application. The analysts and end users (customers) use the data model to ensure that the data they believe is needed is captured in the database, in the correct format, to ensure fewer iterations.

During database design we begin to look at the specifics of the database so that we can decide what is being built and how to optimize it fully. This includes the type of server, operating system, database management software (DBMS) and version, and possibly other considerations. The selection of the DBMS will affect how optimization of the database is accomplished once the physical model is designed. Different DBMS systems consider different options for optimization, and the database administrator (DBA) works differently with their schemas depending on the DBMS chosen. For example, DB2 and Oracle are very sophisticated when it comes to storage, while SQL Server has been made more simplistic and doesn't offer nearly as many options, requiring fewer decisions to be made. Because of the different storage mechanisms used, the indexing specifically is treated differently, but there are many other options to consider, such as view support, triggers, stored procedure languages, and more.

Working as a Team

We have already discussed many ways that using the UML to do all of your modeling in one language is helpful in bringing the many teams involved in the development process together, but using the UML for modeling the database itself is helpful in its own right. The database is generally built by a team of data analysts, database designers, and database administrators along with others, and there must be a way for them to work together while building the database. The database teams themselves need to work together as a team, sharing artifacts, corporate and group standards, templates, diagrams, domains, and descriptions.

The standard entity-relationship (ER) notations do support the needs of the database team at this level, but the UML supports what ER notations support and beyond. The UML was built with ER in mind. ER modeling has existed for a long time, and when Booch, Rumbaugh, and Jacobson created the UML, they built it as a superset of ER notations. Since their intentions were somewhat different from those of database designers, supporting object-oriented development and not database design, some key needs of the database team were left out. With the addition of the UML Profile for Database Design, the UML supports the business models, the requirements models, and the logical and physical application and data models all in one language.

UML Packages

The constructs of the UML, beginning with the concept of packages, encourage sharing of information. By using packages a modeler can logically group information and break up the model artifacts in different ways for different uses. Often teams break up the model into standards packages and packages that different people or teams work on for different parts of the project, different phases (for example, development, testing, and production), and many other groupings that you can use to help make your job and communication among your team members easier (see Figures 7-1 and 7-2).

Figure 7-1. Tree browser showing a package structure

Figure 7-2. A diagram to visualize the elements residing inside the package

Packages can include different types of UML objects as well. A package can contain tables and classes, so that you can see visually and organizationally which tables map to which classes. Some modelers have logical design and database design packages so that they can easily see what is used during each phase of the project. Packages can be used as a good way to organize the models for version control, based on the package rather than on individual tables, for example. This way when you start using the model, you know exactly which elements are needed based on the package with which you begin. Having standard elements, such as domains, tables, classes, use cases, and others, stored in standards packages gives a way to understand the standards that can be used in new and existing models. Also, if you are using some sort of configuration management, you can version packages and be sure that they are maintained and not changed on a whim. Packages should be used to store standard modeling elements. The UML supports a derived relationship, which can be created from those standards to the elements that use them, helping to ensure that the elements are updated by changing the standard package. Some tools even support the changes automatically based on the association between the standard elements and the elements to which they are related.

UML Diagrams

The database design model represents the physical database design as it will be implemented in the DBMS. The database design is modeled in a new diagram type, called the database diagram, provided by the UML Profile for Database Design. Another advantage to the UML is that diagrams do not have to be typed; this means you can have elements of many different types on a diagram, or you can stereotype the diagram to a specific type and allow only those types of elements. You will often want a diagram that shows tables and to what table spaces they belong. Having them both on the same diagram allows for the ability to understand which tables are in which tablespaces and whether they are partitioned across multiple tablespaces as well.

A typed diagram also can be helpful. Having a diagram with only specific items allowed makes it easier for the modelers to see which elements can be used on the diagram. Some tools supply typed diagrams that give you only the elements that apply to that diagram according to the UML. For example, a diagram typed as <<Database Diagram>> would include tables, relationships, views, stored procedures, and domains on that particular diagram but would not allow for things like classes, objects, and other nondatabase elements.

Database Design

Designing the database through data modeling in the UML gives you the ability to capture many more items on the diagram visually than with traditional ER notations. You can model elements like domains, stored procedures, triggers, and constraints as well as the traditional tables, columns, and relationships. There are several reasons to model the database, including creation of a good design, enforcement of referential integrity, management of standards reuse, and communication of the database structures. With the models used for these many areas and more, it is a tremendous advantage to describe visually on the diagram as many things as possible.

Database design is an iterative process; using models makes the constant change easier to manage and understand, allowing you to make changes to the models prior to code generation so you can understand what the implications are based on each change and analyze whether the change should be made or something else should be chosen. For example, a change to one relationship or key column could affect several tables based on that one single change. Be cause of key migration and the cascading of changes based on the key migration, one small change can cause data loss and incompatibility.

In a perfect world, there would always be what is called model-driven development. In model-driven development, all changes are made to the model prior to being implemented directly in the database. By using model-driven development, you can eliminate most surprises. You can see what the changes will affect prior to making the changes in the database directly. The problem is in reality; say the DBA gets a call in the middle of the night about a problem in the database or a call in the afternoon about something that isn't working or data that is missing. What does the DBA do? Go to the model, make the change, and then implement it? That's what we would like to see, but that isn't always possible. The DBA has to get the job done and doesn't have the time or desire to work in the model first, so he or she just changes the code and worries about the model later. This can cause problems that won't be uncovered until later when the database and model are resynchronized, but this is something that really can't be avoided.