Special Relational Operators

Various extensions to relational algebra have been proposed since the relational model was first formulated. We'll look at three that have been generally accepted: summarize, extend, and rename. We'll also look at three extensions provided by Microsoft: transform, rollup, and cube.

Summarize

The summarize operator does precisely what one would expect it to do: It produces records containing summary data grouped according to the specified fields. Summarization is extremely useful in any number of situations in which you want to examine data at a higher level of abstraction than is stored in the database.

The summarize operation is implemented using the GROUP BY clause of the SELECT statement. There will be one record returned for each distinct value in the specified field or fields. If more than one field is listed, groups will be nested. For example, consider the following statement:

SELECT Categories.CategoryName, Products.ProductName,
  SUM([Order Details].Quantity) AS SumOfQuantity
FROM Categories
INNER JOIN Products
  ON Categories.CategoryID = Products.CategoryID)
INNER JOIN [Order Details]
  ON Products.ProductID = [Order Details].ProductID
GROUP BY Categories.CategoryName, Products.ProductName;

This statement will return one record for each product in the Northwind database, grouped by category and containing three fields: CategoryName, ProductName, and SumOfQuantitythe total number of each product soldas shown in Figure 5-12.

Figure 5-12. The GROUP BY Clause Returns Summary Data

The fields listed in <fieldList> in the SELECT statement must be either part of the <groupFieldList> or an argument to a SQL aggregate function. SQL aggregate functions calculate summary values for each record. The most common aggregate functions are AVERAGE, COUNT, SUM, MAXIMUM, and MINIMUM.

Aggregates are another place where Nulls can bite you. Null values are included in the summarize operationthey form a group. They are, however, ignored by aggregate functions. This is usually only a problem if you're using one of the fields in the <groupFieldList> as the parameter to an aggregate function.

Extend

The extend operator allows you to define virtual fields that are calculated based on constants and values stored in the database. You create virtual fields simply by defining them in the <fieldList> of the SELECT statement, as follows:

SELECT [UnitPrice]*[Qty] AS ExtendedPrice
FROM [Order Details];

The calculations defining the virtual fields can be of arbitrary complexity. This process is so simple and fast, that there is rarely any justification for storing a calculated field in a table.

Rename

The final common operator is rename. The rename operation can be performed on either a recordset in <recordsetList> or on individual fields in <fieldList>. In the Jet database engine, a recordset renaming uses the following syntax:

SELECT <fieldName> AS <fieldAlias>
FROM <tableName> AS <tableAlias>

In SQL Server, the "AS" keyword is not necessary, as shown here:

SELECT <fieldName> <fieldAlias> FROM <recordsetName> <recordsetAlias>

Renaming is particularly useful when you're defining a view with a self-join, as shown in the following code:

SELECT Manager.Name, Employee.Name
FROM Employees AS Employee
INNER JOIN Employees AS Manager
  ON Employee.EmployeeID = Manager.EmployeeID;

This syntax allows you to keep each usage logically distinct.

Transform

The TRANSFORM statement is the first of the Microsoft extensions to the relational algebra that we'll examine. TRANSFORM takes the results of a summarize (GROUP BY) operation and rotates them 90 degrees. More often referred to as a crosstab query, this incredibly useful operation is only supported by the Jet database engine; it has not (yet) been implemented in SQL Server.

Yukon Note

Transact-SQL, the dialect supported by SQL Server, has been extended to support a version of the transform operation in Yukon.

The TRANSFORM statement has the following basic syntax:

TRANSFORM <aggregateFunction>
SELECT <fieldList>
FROM <recordsetList>
GROUP BY <groupByList>
PIVOT <columnHeading> [IN (<valueList>)]

The TRANSFORM <aggregateFunction> clause defines the summary data that will populate the recordset. The SELECT statement must include a GROUP BY clause and cannot include a HAVING clause. As with any GROUP BY clause, the <groupByList> can contain multiple fields. (In a TRANSFORM statement, the <fieldList> and <groupByList> expressions are almost always identical.)

The PIVOT clause identifies the field whose values will be used as column headings. By default, the Jet database engine will include the columns in the recordset alphabetically from left to right. The optional IN statement, however, allows you to specify column names, which will be listed in the order in which they're included in <valueList>.

The following TRANSFORM statement provides essentially the same information as the summarize example given previously, the results of which are shown in Figure 5-12.

TRANSFORM Count(Products.ProductID) AS CountOfProductID
SELECT Suppliers.CompanyName
FROM Suppliers
INNER JOIN (Categories INNER JOIN Products
    ON Categories.CategoryID = Products.CategoryID)
ON Suppliers.SupplierID = Products.SupplierID
GROUP BY Suppliers.CompanyName
PIVOT Categories.CategoryName;

The results of this TRANSFORM operation are shown in Figure 5-13.

Figure 5-13. The TRANSFORM Statement Rotates Results by 90 Degrees

Rollup

The summarize operator implemented using the GROUP BY clause generates records containing summary data. The ROLLUP clause, supported only by SQL Server, provides a logical extension to this operation by providing total values.

The ROLLUP clause is only available in SQL Server. It is implemented as an extension to the GROUP BY clause:

SELECT Categories.CategoryName, Products.ProductName,
  SUM([Order Details].Quantity) AS SumOfQuantity
FROM (Categories INNER JOIN Products
  ON Categories.CategoryID = Products.CategoryID)
INNER JOIN [Order Details]
  ON Products.ProductID = [Order Details].ProductID
GROUP BY Categories.CategoryName, Products.ProductName WITH ROLLUP;

Cube

The CUBE operator is also available only in SQL Server and is implemented as an extension to the GROUP BY clause. Essentially, the CUBE clause summarizes every column in the <groupByList> by every other column. It is conceptually similar to the ROLLUP operator, but whereas ROLLUP produces totals for each column specified in the <groupByList>, CUBE creates summary data for additional groups.

For example, if you have three fields in the <groupByList>A, B, and Cthe CUBE operator will return the following seven aggregates:

The total number of Cs.
The total number of Cs, grouped by A.
The total number of Cs, grouped by C within A.
The total number of Cs, grouped by B within A.
The total number of Cs, grouped by B.
The total number of Cs, grouped by A within B.
The total number of Cs, grouped by C within B.