Special Relational Operators
Various extensions to relational algebra have been proposed since the relational model was first formulated. We'll look at three that have been generally accepted: summarize, extend, and rename. We'll also look at three extensions provided by Microsoft: transform, rollup, and cube.
The summarize operator does precisely what one would expect it to do: It produces records containing summary data grouped according to the specified fields. Summarization is extremely useful in any number of situations in which you want to examine data at a higher level of abstraction than is stored in the database.
The summarize operation is implemented using the GROUP BY clause of the SELECT statement. There will be one record returned for each distinct value in the specified field or fields. If more than one field is listed, groups will be nested. For example, consider the following statement:
SELECT Categories.CategoryName, Products.ProductName, SUM([Order Details].Quantity) AS SumOfQuantity FROM Categories INNER JOIN Products ON Categories.CategoryID = Products.CategoryID) INNER JOIN [Order Details] ON Products.ProductID = [Order Details].ProductID GROUP BY Categories.CategoryName, Products.ProductName;
This statement will return one record for each product in the Northwind database, grouped by category and containing three fields: CategoryName, ProductName, and SumOfQuantitythe total number of each product soldas shown in Figure 5-12.
Figure 5-12. The GROUP BY Clause Returns Summary Data
The fields listed in <fieldList> in the SELECT statement must be either part of the <groupFieldList> or an argument to a SQL aggregate function. SQL aggregate functions calculate summary values for each record. The most common aggregate functions are AVERAGE, COUNT, SUM, MAXIMUM, and MINIMUM.
Aggregates are another place where Nulls can bite you. Null values are included in the summarize operationthey form a group. They are, however, ignored by aggregate functions. This is usually only a problem if you're using one of the fields in the <groupFieldList> as the parameter to an aggregate function.
The extend operator allows you to define virtual fields that are calculated based on constants and values stored in the database. You create virtual fields simply by defining them in the <fieldList> of the SELECT statement, as follows:
SELECT [UnitPrice]*[Qty] AS ExtendedPrice FROM [Order Details];
The calculations defining the virtual fields can be of arbitrary complexity. This process is so simple and fast, that there is rarely any justification for storing a calculated field in a table.
The final common operator is rename. The rename operation can be performed on either a recordset in <recordsetList> or on individual fields in <fieldList>. In the Jet database engine, a recordset renaming uses the following syntax:
SELECT <fieldName> AS <fieldAlias> FROM <tableName> AS <tableAlias>
In SQL Server, the "AS" keyword is not necessary, as shown here:
SELECT <fieldName> <fieldAlias> FROM <recordsetName> <recordsetAlias>
Renaming is particularly useful when you're defining a view with a self-join, as shown in the following code:
SELECT Manager.Name, Employee.Name FROM Employees AS Employee INNER JOIN Employees AS Manager ON Employee.EmployeeID = Manager.EmployeeID;
This syntax allows you to keep each usage logically distinct.
The TRANSFORM statement is the first of the Microsoft extensions to the relational algebra that we'll examine. TRANSFORM takes the results of a summarize (GROUP BY) operation and rotates them 90 degrees. More often referred to as a crosstab query, this incredibly useful operation is only supported by the Jet database engine; it has not (yet) been implemented in SQL Server.
Transact-SQL, the dialect supported by SQL Server, has been extended to support a version of the transform operation in Yukon.
The TRANSFORM statement has the following basic syntax:
TRANSFORM <aggregateFunction> SELECT <fieldList> FROM <recordsetList> GROUP BY <groupByList> PIVOT <columnHeading> [IN (<valueList>)]
The TRANSFORM <aggregateFunction> clause defines the summary data that will populate the recordset. The SELECT statement must include a GROUP BY clause and cannot include a HAVING clause. As with any GROUP BY clause, the <groupByList> can contain multiple fields. (In a TRANSFORM statement, the <fieldList> and <groupByList> expressions are almost always identical.)
The PIVOT clause identifies the field whose values will be used as column headings. By default, the Jet database engine will include the columns in the recordset alphabetically from left to right. The optional IN statement, however, allows you to specify column names, which will be listed in the order in which they're included in <valueList>.
The following TRANSFORM statement provides essentially the same information as the summarize example given previously, the results of which are shown in Figure 5-12.
TRANSFORM Count(Products.ProductID) AS CountOfProductID SELECT Suppliers.CompanyName FROM Suppliers INNER JOIN (Categories INNER JOIN Products ON Categories.CategoryID = Products.CategoryID) ON Suppliers.SupplierID = Products.SupplierID GROUP BY Suppliers.CompanyName PIVOT Categories.CategoryName;
The results of this TRANSFORM operation are shown in Figure 5-13.
Figure 5-13. The TRANSFORM Statement Rotates Results by 90 Degrees
The summarize operator implemented using the GROUP BY clause generates records containing summary data. The ROLLUP clause, supported only by SQL Server, provides a logical extension to this operation by providing total values.
The ROLLUP clause is only available in SQL Server. It is implemented as an extension to the GROUP BY clause:
SELECT Categories.CategoryName, Products.ProductName, SUM([Order Details].Quantity) AS SumOfQuantity FROM (Categories INNER JOIN Products ON Categories.CategoryID = Products.CategoryID) INNER JOIN [Order Details] ON Products.ProductID = [Order Details].ProductID GROUP BY Categories.CategoryName, Products.ProductName WITH ROLLUP;
The CUBE operator is also available only in SQL Server and is implemented as an extension to the GROUP BY clause. Essentially, the CUBE clause summarizes every column in the <groupByList> by every other column. It is conceptually similar to the ROLLUP operator, but whereas ROLLUP produces totals for each column specified in the <groupByList>, CUBE creates summary data for additional groups.
For example, if you have three fields in the <groupByList>A, B, and Cthe CUBE operator will return the following seven aggregates: