Chapter 5. Relational Algebra

In the previous chapters, we have looked at defining a particular kind of relation, called a base relation, which will be given a physical representation in the database. The relational model also supports the creation of several kinds of derived relations. A derived relation is a relation that is defined in terms of other relations rather than in terms of attributes. These named relations can be base relations or other derived relations, in any combination.

In the database schema, a base relation is represented by a table. Derived relations are represented by views in Microsoft SQL Server and queries in the Microsoft Jet database engine. For the sake of linguistic simplicity, I'll use the term "view," since it's the standard relational term and avoids any possible confusion with the process of querying, which is just close enough to be confusing. I'll also use the term "recordset" generically when I mean either a table (base relation) or a view (derived relation).

Views are defined in terms of the relational operations that are the subject of this chapter. Microsoft Access and the SQL Server Enterprise Manager provide a graphical interface for defining views, which are more commonly defined in terms of SQL SELECT statements.

SQL (usually pronounced "sequel") stands for Structured Query Language. It is a standard language for expressing relational operations. Both the Jet database engine and SQL Server support a dialect of SQL. Not, of course, the same dialectthat would be entirely too easy. Fortunately, the differences between the two implementations don't often affect the relational algebra that we'll be discussing in this chapter. Where the syntax differs, I'll give examples from both dialects.

The SQL SELECT statement is extremely powerful and more than a little complex, and a detailed examination of it is outside the scope of this book. There are some general references listed in the Bibliography. For our purposes, we can restrict ourselves to the basic structure, which has the following syntax:

SELECT <fieldList>
FROM <recordsetList>
    <joinType> JOIN <joinCondition>
WHERE <selectionCriteria>
GROUP BY <groupByFieldList>
HAVING <selectionCriteria>
ORDER BY <orderByFieldList>

The <fieldList> in the SELECT clause is a list of one or more fields to be included in the recordset resulting from the statement. The fields can be physically present in the underlying recordsets, or they can be calculated based on other fields. The <recordsetList> in the FROM clause is, as one might expect, a list of tables and views on which the SELECT statement is based. These are the only two clauses of the SELECT statement that must be included; all others are optional.

The JOIN clause defines the relationship between the recordsets listed in <recordsetList>. We'll be looking at joins in detail later in this chapter. The WHERE clause defines a logical expression, <selectionCriteria>, that restricts the data to be included in the resulting recordset. We'll look at restriction in detail later, as well.

The GROUP BY clause combines records having the same values in the specified list of fields to be combined into a single record. The HAVING clause is used to further restrict the rows returned after they've been combined by the GROUP BY clause. Finally, the ORDER BY clause causes the recordset to be sorted according to the fields listed in <orderByFieldList>.