Next Chapter Return to Table of Contents Previous Chapter

CHAPTER 7: VECTORS AND MATRICES

Vectors and matrices are important components in mathematical computations. These are fixed-length 1-d and 2-d arrays, respectively, usually composed of numbers, with defined operations such as matrix multiplication, transposition, determinants, and so on. Vectors and matrices figure prominently in applications like coordinate transformations and in solving simultaneous equations. In this chapter, we'll look at how to create user-defined versions of these useful data structures.

Vectors and matrices have a special relationship. It's common to take 1-d slices of a matrix and call them vectors. For example, a row of a matrix is called a row vector. A column is known as a column vector. If the matrix is square (with the number of rows equal to the number of columns), you can think in terms of diagonal vectors, which span the diagonal of a matrix. Figure 7.1 shows a matrix with an example of each type of vector.

Figure 7.1 Vector types.

Since matrices are arrays, you might be tempted to implement them using the array-of-arrays technique described in Chapter 4. If you do, however, the result will be less than satisfactory--for several reasons. First, it's not possible to allow both dimensions to be determined at runtime. Second, even if you use the most compact statically-sized form, you will still have some unnecessary overhead. For example, each row of a 2-d array implemented with Sarray objects has its own length variable, virtual function table pointer, and pointer to its array elements, as illustrated in Figure 7.2. Most of these variables are redundant. You could keep just one of each for the entire matrix.

A worse problem lies in figuring out how to represent vectors associated with a matrix. Row vectors are easy, since each row is already an array. But what about column vectors, or diagonal vectors? There is no easy way to treat the elements of a column or diagonal as an array in its own right, since each element is stored in a separate row array. To further explore this fundam ental problem, we'll step back and take a look at how two-dimensional arrays are organized using the built-in C++ array types.

ROW-MAJOR AND COLUMN-MAJOR MATRICES

Consider the following two-dimensional 3x5 matrix:

float m[3][5];

In C++, this matrix is stored, row by row, as a 1-d array. Matrices stored in this way are called row-major matrices. Scanning down the 1-d representation, the column index varies the fastest, while the row index varies the slowest. Some languages, such as Fortran, use column-major matrices, where the matrices are laid out column by column. Here, the row index varies fastest, while the column index varies the slowest. Figure 7.3 shows the difference between these two types of organization.

Figure 7.2 A matrix implemented as an array of arrays.

The way a matrix is stored affects the way subscripting works. To find the ith row and jth column of a row-major matrix, the following subscript calculation is used:

m[i][j] <==> *(m + i*ncols + j); // For row-major matrices

For a column-major matrix, the calculation is:

m[i][j] <==> *(m + i + j*nrows); // For column-major matrices

In both cases, we need the size of only one dimension to compute the subscript. For row-major matrices, we need the number of columns, and for column-major matrices, we must know the number of rows.

The memory layout of a matrix directly affects what type of vectors can be easily obtained from the matrix. For a row-major matrix, it's easy to define row vectors, since the elements of each row are contiguous. Column vectors are more difficult to define, since the elements are non-contiguous. For column-major matrices, the opposite is true, with row-vectors being difficult to define and column vectors easy to define.

Actually, it's not all that difficult to define non-contiguous vectors. The secret lies in usng vector strides, our next topic.

ROW-MAJOR AND COLUMN-MAJOR MATRICES

Consider the following two-dimensional 3x5 matrix:

float m[3][5];

In C++, this matrix is stored, row by row, as a 1-d array. Matrices stored in this way are called row-major matrices. Scanning down the 1-d representation, the column index varies the fastest, while the row index varies the slowest. Some languages, such as Fortran, use column-major matrices, where the matrices are laid out column by column. Here, the row index varies fastest, while the column index varies the slowest. Figure 7.3 shows the difference between these two types of organization.

Figure 7.2 A matrix implemented as an array of arrays.

The way a matrix is stored affects the way subscripting works. To find the ith row and jth column of a row-major matrix, the following subscript calculation is used:

m[i][j] <==> *(m + i*ncols + j); // For row-major matrices

For a column-major matrix, the calculation is:

m[i][j] <==> *(m + i + j*nrows); // For column-major matrices

In both cases, we need the size of only one dimension to compute the subscript. For row-major matrices, we need the number of columns, and for column-major matrices, we must know the number of rows.

The memory layout of a matrix directly affects what type of vectors can be easily obtained from the matrix. For a row-major matrix, it's easy to define row vectors, since the elements of each row are contiguous. Column vectors are more difficult to define, since the elements are non-contiguous. For column-major matrices, the opposite is true, with row-vectors being difficult to define and column vectors easy to define.

Actually, it's not all that difficult to define non-contiguous vectors. The secret lies in usng vector strides, our next topic.

VECTOR POINTERS

It's possible to incorporate the ideas in the previous section into a new type of pointer, which we'll call vector pointers. These pointers are generalizations of ordinary pointers. An ordinary pointer always has a stride of one. That is, it's assumed the array elements are contiguous. Vectors may be non-contiguous (with evenly spaced elements, though), so a vector pointer can have a stride different than one. The following VecPtr class encapsulates these notions:

template<class TYPE>

class VecPtr {

private:

TYPE *p;               // Ordinary pointer

const unsigned stride; // Pointer stride

public:

VecPtr(TYPE *t, unsigned s=1)       : p(t), stride(s) { }

VecPtr(const VecPtr<TYPE> &s)       : p(s.p), stride(s.stride) { }

void operator=(const VecPtr<TYPE> &s) { p = s.p; }

void operator=(TYPE *t)               { p = t; }

void operator++()                     { p += stride; }

void operator++(int)                  { p += stride; }

void operator+=(unsigned n)           { p += stride * n; }

TYPE &operator[](unsigned i) const    { return p[stride*i]; }

operator TYPE *() const               { return p; }

TYPE &operator *() const              { return *p; }

#ifdef TYPE_IS_STRUCTURE

TYPE *operator->() const              { return p; }

#endif

};

This class exemplifies everything that's right about C++. It turns vector pointers into concrete data types that look virtually like ordinary pointers. The class is flexible in that, being a template, it can be used for any vector element type. Yet because the class is completely inlined, there is very little overhead (if any). This class can be as efficient as any hand-coded, low-level pointer implementation.

The basic purpose of the VecPtr class is to hide the stride calculations. For example, the overloaded '++' operator turns vector pointer incrementing into stride additions. This allows us to write code like:

for(int i = 0; i<n; i++) { *v = 0; v++; } 

rather than

for(int i = 0; i<n; i++) { *v = 0; v += stride; }

The pointer de-referencing operators `*' and '->' are overloaded to make vector pointers look even more like ordinary pointers. For instance, the expression *v returns the vector element that is currently being referenced. Likewise, v->m returns the member m of the element being referenced. For this to work, TYPE must be a structure; otherwise, the class won't compile. We've commented out the operator->( ) function using ifdef statements in the class declaration.

The subscript operator is overloaded to allow vector pointers to be used like the name of an array. For example:

for(int i = 0; i<n; i++) v[i] = 0;

Of course, a loop like this won't be as efficient as one that uses pointer incrementing, since it takes an extra multiplication for the subscript operation.

Optimizing Vector Pointers

Optimizing Vector Pointers

Vector pointers can be further optimized. Note that, when a value is added to a pointer, more is going on than just the addition. In fact, a multiplication takes place! This is because the compiler must turn pointers into machine addresses, which work in byte units. Thus, given some pointer p, which points to an object of type T, we have the following equivalence:

p + i <==> ((char *)p) + i * sizeof(T)

Vector pointers can be optimized by pre-multiplying the vector stride by sizeof(T). The stride then becomes a byte stride. In pointer arithmetic, the pointer can be cast as a char *, which saves the extra multiplication. For example, the following functions of the VecPtr class can be optimized this way:

template<class TYPE>

class VecPtr {

...

VecPtr (TYPE *t, unsigned s=1) : p(t), stride(s*sizeof(TYPE)) { }

void operator++()                   { (char *)p += stride; }

void operator++(int)                { (char *)p += stride; }

void operator+=(unsigned n)         { (char *)p += stride * n; }

TYPE &operator[](unsigned i) const  {

return *((TYPE *)((char *)p + stride*i));

}

...

};

This change may speed up vector pointers by as much as 20 percent. Note that the optimization is completely encapsulated in the VecPtr class. Outside the class, you continue to work with pointer strides in their normal vector element units.

The following example shows how to use vector pointers with matrices. A matrix is created using built-in arrays and is then initialized to be a unit matrix (with all elements zero except on the diagonal, where ones are stored.)

float m[3][3]; // Declare a 3x3 row-major matrix

// Set all values to zero by pointing to matrix with

// a vector that can access all elements. A stride of

// 1 will do the trick.

VecPtr<float> va(m, 1);

for (int i = 0; i<9; i++) va[i] = 0;

// Set diagonal elements to 1 with another vector pointer

VecPtr<float> vd(m, 4); // Stride is ncols + 1

for (int j = 0; j<3; j++) vd[j] = 1;

The resulting matrix would be:

100

010

001

Pointer Increment Problems

One unfortunate limitation with vector pointers is the inability to use expressions like *p++. Why can't you use this? Consider what the expression translates into in terms of operation function calls:

*p++ <==> (v.operator++()).operator*()

First, the post-increment operator is called, and the pointer--as it existed before the increment--is returned. This pointer is then de-referenced to return the element. However, there is no way to return the original pointer before it has been incremented. Only a copy can be returned. On the surface, the following operator++( ) function seems to work:

VecPtr<TYPE> operator++(int)

{

VecPtr before (*this);  // Make copy of object

p += stride;            // Do increment

return before;          // Return copy

}

However, what happen in the following (admittedly concocted) expression?

(v++)++; // Undefined result

Such constructs are actually undefined. That's because the result of a post-increment operation is not an 1-Value (it can be assigned to). So it's really illegal to try and increment the result again. Note that pre-increment operation are 1-Values and don't have this problem.

We've forced both forms of operator++( ) to return void. That way, expressions like (v++)++ and *p++ won't compile, so there's no danger of receiving erroneous results. Due to the void returns, both the prefix and suffix forms of operator++( ) can be implemented using the same prefix semantics. There's no way to use the operators in illegal ways.

DYNAMIC MATRICES

When you use the built-in array types, it's not possible to directly define matrices that have both their dimensions determined at runtime. Now that you know about pointers and strides, we can show you why this is true. In Chapter 4, we mentioned that, although the number of columns must be fixed, there is a way to make the number of rows variable. Here is one way to do it:

typedef float ArrNx5[5]; // Declare array-of-5-columns type

void f(unsigned n)

{

// Dynamically allocate nx5 matrix, referenced by p

ArrNx5 *p = new ArrNx5[n];

// Set lower-right element to 99

p[n - 1][4] = 99;

}

The variable p represents a pointer to the first element in the first row of the respective matrix. Even though this variable points to a floating point element, its type isn't float * or even float ** as you first might think. Instead, p is a pointer to floating point array of five columns. This can easily be seen by p's declaration.

A stride is implicit with a pointer to an array. In this case, p has the constant 5 as its stride. With this in mind, we have the following equivalence:

P[i] <==> *(p + i*5)

An ordinary pointer gets returned by subscript of a pointer to array. This pointer points to the first element of the ith row. The pointer can then be used in a subscript calculation to find the jth column of the row. Hence, double subscripts such as p[i][j] can be used.

Interestingly enough, the subscripting equation just shown is the same one used for vector pointer strides! The crucial difference is that, here, the stride is constant. Because of this, the compiler is able to easily plug in the constant stride in subscript and pointer increment calculations.

Contrast this with variable strides (such as in a dynamic matrix with the number of columns determined at runtime). Here, the compiler could not simply plug in values for the stride, because it doesn't know what the stride should be! The compiler would be forced to create a special data structure that carries the stride around with the pointer. This approach ought to sound familiar to you, since that's exactly what vector pointers do!

The conclusion is this: if you want to have dynamic matrices with both dimensions determined at runtime, you need to have vector pointers. Since C++ doesn't have vector pointers built in, it isn't able to handle dynamic matrices directly. However, nothing prevents you from creating your own dynamic matrices, as you'll now see.

A Simple Dynamic Matrix Class

From the preceding discussions, two basic techniques emerge for defining dynamically sized matrices:

1. Represent the matrix as a one-dimensional array, using row-major or column-major order (row-major ordering is easiest). Allocate this array dynamically.

2. Use vector pointers to conveniently access rows, columns, and diagonals.

The following SimpleMatrix class uses these techniques:

template<class TYPE>

class SimpleMatrix {

private:

TYPE *data;

unsigned nrows, ncols;

public:

SimpleMatrix(unsigned nr, unsigned nc);

SimpleMatrix(const SimpleMatrix<TYPE> &m);

~SimpleMatrix();

void Copy(const SimpleMatrix<TYPE> &m);

SimpleMatrix<TYPE> &operator=(const SimpleMatrix<TYPE> &m);

TYPE *operator[](unsigned r);

VecPtr<TYPE> RowPtr(unsigned r=0);

VecPtr<TYPE> ColPtr(unsigned c=0);

VecPtr<TYPE> DiagPtr();

VecPtr<TYPE> PtrToAll();

const TYPE *operator[](unsigned r) const;

VecPtr<const TYPE> RowPtr(unsigned r=0) const;

VecPtr<const TYPE> ColPtr(unsigned c=0) const;

VecPtr<const TYPE> DiagPtr() const;

VecPtr<const TYPE> PtrToAll() const;

unsigned NRows() const;

unsigned NCols() const;

unsigned NElems() const;

};

Note The complete code for the SimpleMatrix class is given on disk in the files simpmat.h and simpmat.mth. Test programs for this class are given in the files tstsmat.cpp and tstsmat2.cpp.

This class isn't a complete implementation of matrices. Its purpose is to show you the basic design and to point out some performance and safety problems. Later in this chapter, we'll create complete Vector and Matrix classes that are more robust.

The SimpleMatrix class stores the matrix data in a row-major 1-d array, pointed to by the member data, as shown in the following constructor:

template<class TYPE>

SimpleMatrix<TYPE>::SimpleMatrix(unsigned nr, unsigned nc)

// General constructor

{

nrows = nr; ncols = nc;

data = new TYPE[nrows*ncols];

}

The size of the matrix is stored in the variables nrows and ncols, which are also used in calculating vector strides. The functions RowPtr( ), ColPtr( ), and DiagPtr( ) all return vector pointers of their respective types:

template<class TYPE>

VecPtr<TYPE> SimpleMatrix<TYPE>::RowPtr(unsigned r)

// Return a row vector pointer to row r

{

return VecPtr<TYPE>(data + r*ncols, 1);

}

template<class TYPE>

VecPtr<TYPE> SimpleMatrix<TYPE>::ColPtr(unsigned c)

// Return a column vector pointer to column c

{

return VecPtr<TYPE>(data+c, nrows);

}

template<class TYPE>

VecPtr<TYPE> SimpleMatrix<TYPE>::DiagPtr()

// Return a diagonal vector pointer. In case matrix

// isn't square, the smallest dimension is used to

// determine the diagonal stride.

{

unsigned dstride = ((nrows > ncols) ? ncols : nrows) + 1;

return VecPtr<TYPE>(data, dstride);

}

Also, the PtrToAll( ) function returns a vector pointer, with a stride of one, which allows you to access the matrix as though it were a one-dimensional array. This is useful for operations that apply to all elements, such as zeroing the matrix, scaling the matrix, and so on.

template<class TYPE>

VecPtr<TYPE> SimpleMatrix<TYPE>::PtrToAll()

// Return pointer to all elements of the matrix

{

return VecPtr<TYPE>(data, 1);

}

Here is an example that uses these functions:

SimpleMatrix<int> m(3, 3);

int i;

// Set all elements to '4'

VecPtr<int> va = m.PtrToAll();

for(i = 0; i<m.NElems(); i++) va[i] = 4;

// Set row 1 elements to '2'

VecPtr<int> vr = m.RowPtr(1);

for(i = 0; i<m.NCols(); i++) vr[i] = 2;

// Set col 1 elements to '3'

VecPtr<int> vc = m.ColPtr(1);

for(i = 0; i<m.NRows(); i++) vc[i] = 3;

// Set diagonal elements to '1'

VecPtr<int> vd = m.DiagPtr();

for(i = 0; i<m.NRows(); i++) vd[i] = 1;

The resulting matrix would be:

134

212

431

Two-Dimensional Subscripting

The SimpleMatrix class overloads the subscript operator [ ] to return an ordinary pointer to the first element in row r.

template<class TYPE>

TYPE *SimpleMatrix<TYPE>::operator[](unsigned r)

{

return data + r*ncols;

}

By using a subscript on the returned pointer as well, two-dimensional subscripting can be achieved using a double subscript, as illustrated in Figure 7.5, and in the following code:

SimpleMatrix<float> m(3, 4); // Declare 3x4 matrix

m[1][2] = 99;

Figure 7.5 Two-dimensional subscripting.

You might wonder why the operator[ ]( ) function doesn't return a row vector pointer. For instance, we could have written:

// Alternative definition for []

template<class TYPE>

VecPtr<TYPE> SimpleMatrix<TYPE>::operator[](unsigned r)

{

return VecPtr(data + r*ncols, 1);

}

This function is identical to RowPtr( ) (except for its name). Since row vectors themselves have subscripting defined, we can use the same double subscripting as we used earlier:

m[2][4] = 99; // Set lower-right element to 99

However, the result won't be as efficient as using the original form of the subscript function. The statement translates into:

// First call SimpleMatrix::operator[], then VecPtr::operator[]

(m.operator[](2)).operator[](4) = 99;

By substituting the bodies of the two different operator functions, we can transform the statement into the following equivalent pointer arithmetic:

*((data + 2*ncols) + 1*4) = 99;

Two multiplications take place: one to compute the start of row 2 and one to find column 4 of that row. Note that the multiplication 1*4 has to take place, even though just using the number 4 would suffice. That's because the number 1 represents the value stored in the stride variable for the row vector. Because a general vector pointer type is being used, we can't optimize for some vectors having a stride of 1. In the case where an ordinary pointer is returned, this optimization automatically takes place. A more efficient calculation results:

*((data + 2*ncols) + 4) = 99; // Optimized version

Tests show that returning an ordinary pointer from the matrix subscript function, rather than returning a vector pointer, yields speeds almost twice as fast for operations like matrix multiplication. Next, we'll show you how to define matrix multiplication.

Matrix Multiplication

Matrix multiplication is perhaps the most common operation done on matrices, and can be defined as follows: given two matrices a and b, one of size mxn and the other of size nxp, the result of multiplying the two matrices is a new matrix, c, of size mxp, where the elements of c are computed by the equation:

In other words, each row of a is paired with each column of b, and the corresponding elements of these vectors are multiplied and then summed. The sum becomes one element of the result matrix. Figure 7.6 illustrates this process.

The following function implements matrix multiplication:

void MatMult(SimpleMatrix<Number> &c,

const SimpleMatrix<Number> &a, const SimpleMatrix<Number> &b)

{

unsigned i, j, k;

for (i = 0; i < a.NRows(); i++) {

for (j = 0; j < b.NCols(); j++) {

Number sum = 0;

for (k = 0; k < b.NRows(); k++) sum += a[i][k] * b[k][j];

c[i][j] = sum;

}

}

}

We can also overload the '*' operator for matrix multiplication, as follows:

template<class TYPE>

SimpleMatrix<TYPE>

operator*(const SimpleMatrix<TYPE> &a, const SimpleMatrix<TYPE> &b)

{

SimpleMatrix<TYPE> r(a.NRows(), b.NCols());

MatMult(r, a, b);

return r; // Copy constructor called

}

Figure 7.6 Matrix multiplication.

This example multiplies two matrices:

SimpleMatrix<float> a(3, 2), b(2, 4), c(3, 4);

...

c = a * b;

Unfortunately, the operator "*" function leads to a lot of copying. First, the result matrix is created, the matrix multiply is performed by MatMult( ), and a copy of the result is returned. (We can't safely return a reference to the result.) Then, the assignment to c takes place, where yet another copy takes place. The Matrix class given later in this chapter avoids this problem.

As shown in MatMult( ), double subscripting is the obvious way to implement matrix multiplication. However, this implies a multiplication for each subscripting operation. Since we are accessing array elements in sequence, it might be faster to use pointer incrementing. Here's an alternative matrix multiplication function that uses vector pointers--and is roughly 25 percent faster than the double-subscripting method--for double-precision floating-point matrices:

template<class TYPE>

void MatMult1(SimpleMatrix<TYPE> &c,

const SimpleMatrix<TYPE> &a, const SimpleMatrix<TYPE> &b)

{

unsigned i, j, k;

VecPtr<const TYPE> ar = a.RowPtr();

VecPtr<const TYPE> ac = a.ColPtr();

VecPtr<const TYPE>br = b.RowPtr();

VecPtr<const TYPE>bc = b.ColPtr();

VecPtr<TYPE> cr = c.RowPtr();

VecPtr<TYPE> cc = c.ColPtr();

TYPE const *bstart = br;

for (i = 0; i < a.NRows(); i++) {

cr = cc; // Point to row i. Note: stride not copied

br = bstart;

for (j = 0; j < b.NCols(); j++) {

TYPE sum = 0;

ar = ac; // Point row i. Note: stride not copied.

bc = br; // Point column j. Note: stride not copied.

for (k = 0; k < b.NRows(); k++) {

sum += *ar * *bc;

ar++; // Next column

bc++; // Next row

}

br++; // Next column

*cr = sum;

cr++; // Next column

}

ac++; // Next row

cc++; // Next row

}

}

This function uses six vector pointers, two for each matrix. By incrementing a column vector pointer, we can advance to the next row of a matrix. Incrementing a row vector pointer advances us to the next column of the row. To understand MatMult1( ), it's important to realize that when vector pointer assignments take place only the underlying ordinary pointers are copied--not the strides. (You can verify this by looking at the definition for VecPtr given earlier.) Thus, we can safely assign a column vector pointer to a row vector pointer without confusing the strides.

There's a way to be even more efficient. Note that the row vectors always have a stride of one, so we can replace them by ordinary pointers. You may see speed improvements of 5 to 10 percent by doing this. The changes involved are:

Original Statement                   New Statement

VecPtr<const TYPE> ar = a.RowPtr();  const TYPE *ar = a.RowPtr();

VecPtr<const TYPE> br = b.RowPtr();  const TYPE *br = b.RowPtr();

VecPtr<TYPE> cr = c.RowPtr();        TYPE *cr = c.RowPtr();

You may wonder about the use of the const keywords in these statements. We'll look at these keywords next.

Constant Vectors and Matrices

True to the design principles explained in Chapters 2 and 3, we've made the VecPtr and SimpleMatrix classes work for constant objects. You'll notice that we supply two versions of the[ ]operator for SimpleMatrix:

template<class TYPE>

TYPE *SimpleMatrix<TYPE>::operator [](unsigned r)

// Called for non-const matrices

{

return data + r*ncols;

template<class TYPE>

const TYPE *SimpleMatrix<TYPE>::operator[](unsigned i) const

// Called for const matrices

{

return data + r*ncols;

}

The latter is a constant function, and is the one that's called for matrices declared to be constant. In this case, we shouldn't be able to modify any elements in the row that's returned, so a const TYPE * is returned (meaning a pointer to a constant), instead of just a TYPE*. We also use this strategy when returning vector pointers. For example, two ColPtr( ) functions are defined:

template<class TYPE>

VecPtr<TYPE> SimpleMatrix<TYPE>::ColPtr(unsigned c)

// Called for non-const matrices

{

return VecPtr<TYPE>(data+c, nrows);

}

template<class TYPE>

VecPtr<const TYPE> SimpleMatrix<TYPE>::ColPtr(unsigned c) const

// Called for const matrices

{

return VecPtr<const TYPE>(data+c, nrows);

}

When returning a column vector from a constant matrix, we must somehow prevent the elements of the vector from being modified. To do this, we type the vector pointer to reference const TYPE elements, rather than TYPE elements. Note this is not the same as making the vector pointer itself constant. For instance, had we written const VecPtr<TYPE> instead of VecPtr<const TYPE>, we would have created a pointer that couldn't be modified. Thus, we couldn't do things like pointer incrementing, so the pointer would be somewhat useless.

Matrix Transposition

Another common matrix operation is to take the transpose of a matrix. The B transposition is accomplished by interchanging the rows and the columns. Figure 7.7 shows an example.

The following Transpose( ) function shows the obvious implementation of a matrix transposition.

template<class TYPE>

SimpleMatrix<TYPE> Transpose(const SimpleMatrix<TYPE> &m)

{

SimpleMatrix<TYPE> t(m.NCols(), m.NRows());

for (unsigned i = 0; i<m.NRows(); i++)

for (unsigned j = 0; j<m.NCols(); j++) t[j][i] = m[i][j];

return t; // Copy constructor called.

}

Figure 7.7 Transpose of a matrix.

With this implementation, though, an extra copy takes place when the transposed version is returned. We could circumvent this problem by modifying Transpose( ) to assume that the result matrix has already been built, and to pass it as a parameter:

template<class TYPE>

void SimpleMatrix<TYPE>

Transpose(SimpleMatrix<TYPE> &t, const SimpleMatrix<TYPE> &m)

{

for (unsigned i = 0; i<m.NRows(); i++)

for (unsigned j = 0; j<m.NCols(); j++) t[j][i] = m[i][j];

}

Of course, this design assumes that you have created the result matrix with the proper size, an error-prone activity.

When it's acceptable to modify the matrix in place, there is another way to accomplish matrix transposition. Transposing a matrix is like changing it from row-major to column-major ordering. Instead of actually moving elements around, you can simply view the matrix as though it were column major. Here are the steps needed to make this work:

1. Generalize the vector operations--such as RowPtr( ), ColPtr( ), and operator[ ]( )--to use row and column strides, rather than the number of rows and columns directly. Maintain the row and column strides in separate variables.

2. Interchange the number of rows and columns, and interchange the row and column strides.

The basic idea here is to change what used to be row vectors into column vectors, and vice versa. In a row-major matrix, the row stride is one and the column stride is the number of columns in the matrix. If the matrix is transposed in place, the row stride becomes equal to the number of rows in the new matrix (which is the same as the number of columns in the original matrix), and the column stride becomes one. Assuming the variables rowstride and colstride were added to the SimpleMatrix class, here is a new in-place Transpose( ) function, along with modified RowPtr( ), ColPtr( ), and operator[ ]( ) functions:

template<class TYPE>

void SimpleMatrix<TYPE>::Transpose()

// Transpose a matrix in place

{

// Interchange nrows and ncols

unsigned temp = nrows;

nrows = ncols;

ncols = temp;

// Interchange row stride and column stride

temp = rowstride;

rowstride = colstride;

colstride = temp;

}

template<class TYPE>

VecPtr<TYPE> SimpleMatrix<TYPE>::RowPtr(unsigned r)

{

return VecPtr<TYPE>(data + r*colstride, rowstride);

}

template<class TYPE>

VecPtr<TYPE> SimpleMatrix<TYPE>::ColPtr(unsigned r)

{

return VecPtr<TYPE>(data + c*rowstride, colstride);

}

template<class TYPE>

VecPtr<TYPE> SimpleMatrix<TYPE>::operator[](unsigned r)

{

return VecPtr<TYPE>(data + r*colstride, rowstride);

}

Notice the generalized calculations used to determine the row and column vectors. Remember that, in these functions, either rowstride or colstride is equal to one. The other variable is equal to the number of columns in the original, non-transposed matrix. Another important point is that the operator[ ]( ) function can no longer return an ordinary pointer because there's no guarantee that a row vector will have a stride of one. The matrix might be transposed.

This technique eliminates the need to actually move the matrix elements around. Of course, this only works if you want to transpose a matrix in place. If you need to leave the original matrix alone, you must make a copy. In this case, doing the actual interchange between rows and columns is just as efficient. However, as we'll discuss next, a more sophisticated matrix class can be defined that eliminates the data movement, even when the original matrix is to be left untouched.

MORE ROBUST VECTORS AND MATRICES

The SimpleMatrix class introduced you to techniques for defining dynamic matrices. However, some aspects of SimpleMatrix are not entirely safe, and can lead to a lot of redundant copying.

The SimpleMatrix class uses vector pointers as a fundamental part of its design. While vector pointers are fast and efficient, there is no way to do range checking on subscripts, since vector pointers do not keep track of the bounds of the vectors. This isn't necessarily a problem. Since range checking can easily reduce the performance of your code by 50 percent, you don't always want to use it. However, the capability should still be provided.

A more insidious problem is that, by their very nature, vector pointers induce aliasing. In the SmartPtr and String classes, aliasing was made safe by using reference counting. The same can be done for vectors and matrices. By allowing safe sharing, we can also eliminate a lot of the redundant copying that takes place.

Along these lines, we'll next present two classes, Vector and Matrix, that are extensions of the VecPtr and SimpleMatrix classes and that incorporate the techniques of range checking and reference counting. Fundamental to the design of these two classes is that, strangely enough, each matrix object is actually composed of a 1-d vector object! The purpose of the Matrix class is to interpret the vector data as though it were two-dimensional.

Because vector objects will use a shared representation, it follows that matrices will, as well. This shared representation allows us to easily support efficient submatrix and matrix transpose operations. As you study the following design, note how much overhead is caused by the shared representation, and how much that overhead is compensated for by more efficient matrix operations.

THE VECTOR CLASS

The Vector class is similar in many ways to the String class described in Chapter 6. (You should review that chapter now if you haven't already read it.) The Vector class uses the same letter-envelope reference counting scheme, having an underlying vector representation class VecRep:

template<class TYPE> class Vector; // Forward declaration

template<class TYPE>

class VecRep {

private:

friend class Vector<TYPE>;

unsigned alloclen;

unsigned refcnt;

TYPE data[1];

VecRep(unsigned d);

void *operator new(size_t n, unsigned d);

void operator delete(void *p);

public: // So we can set up null_rep

static VecRep<TYPE> null_rep;

VecRep();

~VecRep();

};

The complete code for the VecRep class is given on disk in the files vector.h and vector.mth.

The VecRep class dynamically allocates the elements of the vector in the same way that the StrRep class allocates the text for strings: by overloading the new and delete operators (not shown here). The difference is that vector elements can be of any type, not just characters, and these types might have constructors and destructors. As is true in StrRep, the vector elements are initially allocated as characters so that we can easily keep the reference count next to the elements in memory. Thus, we must construct each vector element by hand from this array of characters. This task is handled by the VecRep constructor, using the same in-place construction technique covered in Chapter 4:

template<class TYPE>

VecRep<TYPE>::VecRep(unsigned d)

// Constructor to initialize a shared vector rep

{

if (this != &null_rep) {

// We only want to do the following if we're

// not referencing null_rep

alloclen = d;

refcnt = 1;

// We must call the default constructor for

// all but the first element

TYPE *q = data + 1;

for(unsigned i = 1; i<d; i++, q++) new(q) TYPE;

}

}

Note that the first element of the vector already has its constructor called implicitly by the VecRep constructor, because VecRep is defined with one element allocated. The VecRep destructor works exactly in reverse of the constructor:

template<class TYPE>

void VecRep<TYPE>::~VecRep()

// Destructor to explicitly destroy all elements

// of the vector, except the first

{

TYPE *q = data + 1;

for(unsigned i = 1; i<alloclen; i++, q++) q->TYPE::~TYPE();

}

With VecRep objects, we must track the number of elements in the vector so that we know how many times to call the constructor and destructor. To do this, the alloclen member is stored with each VecRep object.

As is true with strings, we use a null-vector representation called null_rep, which allows us to elegantly handle empty vectors--particularly those caused by memory allocation problems. Unlike all other VecRep objects that must be allocated dynamically, null_rep must be allocated statically. You must allocate one null_rep object per vector element type somewhere in your program. A macro is supplied to make this easier:

#define INITNULLVEC(TYPE) VecRep<TYPE> VecRep<TYPE>::null_rep;

With the VecRep class in hand, the Vector class can be defined:

template<class TYPE> class Matrix; // Forward declaration

template<class TYPE>

class Vector {

protected:

friend class Matrix<TYPE>;

VecRep<TYPE> *rep; // Pointer to shared vector data

TYPE *start;       // Pointer to logical start of data

unsigned len;      // Number of logical elements

unsigned stride;   // Stride (offset to next logical element)

int Alloc(unsigned n);

void Bind(const Vector<TYPE> &v);

void Unbind();

void NewBinding(const Vector<TYPE> &v);

public:

Vector(unsigned n = 0, const TYPE *s = 0);

Vector(const Vector<TYPE> &v);

Vector(const Vector<TYPE> &v, SliceType styp,

unsigned n=0, unsigned str=1, unsigned ofs=0);

~Vector();

#ifndef NO_RANGE_CHECK

unsigned CheckIndx(unsigned i) const;

#endif

TYPE &operator[](unsigned i);

const TYPE &operator[](unsigned i) const;

void CopyN(const TYPE *src, unsigned n);

void Copy(const Vector<TYPE> &v);

void Share(const Vector<TYPE> &v);

void SetElements(const TYPE &x);

Vector<TYPE> &operator=(const Vector<TYPE> &v); 

Vector<TYPE> &operator=(const TYPE &x);

int IsNull() const;

int IsUnique() const;

Vector<TYPE> Clone() const;

int EnsureUnique();

unsigned Length() const;

unsigned Stride() const;

// Low-level hooks

VecPtr<TYPE> All();

VecPtr<const TYPE> All() const;

};

The complete code for the Vector class is given on disk in the files vector.h and vector.mth. Test programs for vectors are supplied in tstvec.cpp, tstvec2.cpp, and tstvec3.cpp.

The Vector class has four data members: rep, a pointer to the vector representation object; start, a pointer to the logical start of the vector elements; len, the logical length of the vector; and stride, the vector's logical stride. The start member is analogous to the text member of the String class. Normally, start points to rep->data, but in case the vector is a subvector, it may point at some offset from rep->data.

The Vector class has functions virtually identical to those in the String class to support lazy copying--such as Bind( ), Unbind( ), NewBinding( ), Share( ), EnsureUnique( ), Clone( ), and IsUnique( ). The main difference is that lazy copying isn't automatically performed in the Vector class. With the String class, we want to ensure that, when modifications are made to a string, the string is unique in order not to affect any other strings sharing the same text. With vectors, the design is just the opposite. Because vectors are going to be used to access shared slices of matrices, we want vector modifications to affect the underlying shared matrix data. Functions like EnsureUnique( ) are not used automatically, but are provided in case you do want to ensure a unique copy before modifying data.

The Vector class has three constructors. The first allocates a new vector representation of a specified length, and can optionally copy data from a low-level array into it:

template<class TYPE>

Vector<TYPE>::Vector(unsigned n, const TYPE *s)

// Constructor to allocate space for an n element vector,

// and, if s != 0, to copy low-level array into it.

// If n == 0 or allocation fails, we bind to null rep,

// and do no copying.

{

if (Alloc(n) && s) CopyN(s, n);

}

This constructor is also used as the default, with n and s defaulting to 0. In this mode, the vector created will reference the null vector representation null_rep, as shown in the following Alloc( ) function called by the constructor:

template<class TYPE>

int Vector<TYPE>::Alloc(unsigned n)

// Allocates a block of n elements for this vector.

// If n == 0 or allocation fails, we bind to the null rep.

// ASSUMES TYPE has a default constructor.

// Returns 0 if we've bound to null_rep otherwise, returns 1.

{

if (n) {

rep = new(n) VecRep<TYPE>(n);

}

else {

rep = &VecRep<TYPE>::null_rep;

rep->refcnt++;

}

start = rep->data;

if (rep != &VecRep<TYPE>::null_rep) {

len = n;

stride = 1;

return 1;

}

else {

len = 0;

stride = 1;

return 0;

}

}

The Alloc( ) function is similar to the one used in the String class, with the overloaded new operator for VecRep handling the dynamic allocation chores. Also, to handle any allocation problems, we use a zero-length vector that references null_rep. Note that new vectors initially have a stride of one, and that the vectors are fixed length (with the logical length set to the allocated dimensioned length). Here are some ways to create vectors:

Vector<float> v(10); // Create a new vector of 10 elements

Vector<int> w;       // Create a null vector

Like the String copy constructor, the Vector copy constructor actually uses share semantics:

template<class TYPE>

Vector<TYPE>::Vector(const Vector<TYPE> &v)

// Copy constructor that shares all of the vector v

{

Bind(v);

len = v.len;

stride = v.stride;

start = v.start;

}

The copy constructor's use of share semantics is all-important, since this allows us to return vectors from functions by value, without having to copy all of the vector elements.

The assignment operator also uses share semantics:

template<class TYPE>

Vector<TYPE> &Vector<TYPE>::operator=(const Vector<TYPE> &v)

// Share semantics used for assignment

{

if (this != &v) Share(v); // Note trap for assignment to self

return *this;

}

The Share( ) function is very similar to the copy constructor, except it calls NewBinding( ) instead of Bind( ) because the vector is presumably already bound to a vector representation:

template<class TYPE>

void Vector<TYPE>::Share(const Vector<TYPE> &v)

// Used to shares data with vector v

{

NewBinding(v);

len = v.len;

stride = v.stride;

start = v.start;

}

Subvectors

Copying Vectors

Accessing Vector Elements

THE MATRIX CLASS

The extended Matrix class adds safe, efficient, matrix transposition and submatrices to the basic SimpleMatrix design, and is defined as follows:

template<class TYPE>

class Matrix {

protected:

Vector<TYPE> data;

unsigned nrows, ncols, rowstride, colstride;

public:

Matrix(unsigned nr=0, unsigned nc=0, const TYPE *s = 0);

Matrix(const Matrix<TYPE> &m);

Matrix(const Matrix<TYPE> &m, SliceType styp,

unsigned sr=0, unsigned sc=0,

unsigned nr=0, unsigned nc=0);

void Copy(const Matrix<TYPE> &m);

void Share(const Matrix<TYPE> &m);

Matrix<TYPE> Transpose();

Matrix<TYPE> &operator=(const Matrix<TYPE> &m);

Matrix<TYPE> &operator=(const TYPE &x);

#ifndef NO_RANGE_CHECK

unsigned CheckRow(unsigned i) const;

unsigned CheckCol(unsigned i) const;

#endif

Vector<TYPE> operator[](unsigned r);

const Vector<TYPE> operator[](unsigned r) const;

TYPE &operator()(unsigned r, unsigned c);

const TYPE &operator()(unsigned r, unsigned c) const;

Vector<TYPE> Row(unsigned r, SliceType styp=SHARED);

Vector<TYPE> Col(unsigned c, SliceType styp=SHARED);

Vector<TYPE> Diag(SliceType styp=SHARED);

Vector<TYPE> All(SliceType styp=SHARED);

const Vector<TYPE> Row(unsigned r, SliceType styp=SHARED) const;

const Vector<TYPE> Col(unsigned c, SliceType styp=SHARED) const;

const Vector<TYPE> Diag(SliceType styp=SHARED) const;

const Vector<TYPE> All(SliceType styp=SHARED) const;

// Mid-level hooks. Use these at your own risk:

VecPtr<TYPE> RowPtr(unsigned r=0);

VecPtr<TYPE> ColPtr(unsigned c=0);

VecPtr<TYPE> DiagPtr();

VecPtr<TYPE> PtrToAll();

VecPtr<const TYPE> RowPtr(unsigned r=0) const;

VecPtr<const TYPE> ColPtr(unsigned c=0) const;

VecPtr<const TYPE> DiagPtr() const;

VecPtr<const TYPE> PtrToAll() const;

int IsNull() const;

int IsUnique() const;

Matrix<TYPE> Clone() const;

int EnsureUnique();

unsigned NCols() const;

unsigned NRows() const;

unsigned RowStride() const;

unsigned ColStride() const;

int IsRowMajor() const;

int IsColMajor() const;

int IsSquare() const;

};

The complete code for the Matrix class is given on disk in the files matrix.h and matrix.mth. Test programs are provided in the files tstmat.cpp, tstmat2.cpp, tstmat3.cpp, and tstmat4.cpp.

The Matrix class has five data members. The elements of a matrix are stored in the 1-d Vector object data, which of course means the matrix data can be shared. This data is interpreted as a 2-d array, with nrows number of rows and ncols number of columns. The two other members, rowstride and colstride, keep track of the current row and column stride of the matrix. These members are useful when the matrix is transposed and also for submatrices, as you'll see. Figure 7.9 shows the layout of a typical matrix.

Figure 7.9 Layout of a typical matrix.

Constructing and Assigning Matrices

Creating Submatrices

Transposing Matrix Object

Extracting Vectors from a Matrix

Subscripting with the Matrix Class

Non-Contiguous Submatrices

Constructing and Assigning Matrices

The first Matrix constructor allocates and constructs a brand new matrix by calling the Vector constructor for data and then setting up the matrix size and strides:

template<class TYPE>

Matrix<TYPE>::Matrix(unsigned nr, unsigned nc, const TYPE *s)

: data(nr * nc, s)

{

if (data.IsNull()) {

nrows = 0; ncols = 0; colstride = 1; rowstride = 1;

}

else {

nrows = nr; ncols = nc; colstride = nc; rowstride = 1;

}

}

Note how the possibility of null matrices (due either to passing zero sizes or memory allocation problems) is handled properly.

The copy constructor and overloaded assignment operators use share semantics:

template<class TYPE>

Matrix<TYPE>::Matrix(const Matrix<TYPE> &m)

: data(m.data)

{

if (data.IsNull()) {

nrows = 0; ncols = 0; colstride = 1; rowstride = 1;

}

else {

nrows = m.nrows; ncols = m.ncols;

colstride = m.colstride; rowstride = m.rowstride;

}

}

template<class TYPE>

Matrix<TYPE> &Matrix<TYPE>::operator=(const Matrix<TYPE> &m)

// Share semantics used for assignment.

{

if (this != &m) Share(m); // Note trap for assignment to self

return *this;

}

template<class TYPE>

void Matrix<TYPE>::Share(const Matrix<TYPE> &m)

{

data.Share(m.data);

nrows = m.nrows; ncols = m.ncols;

colstride = m.colstride; rowstride = m.rowstride;

}

Here's an example of these functions:

float one_d_arr[4] = {0, 1, 2, 3};

Matrix<float> m(3, 5);            // Elements left uninitialized

Matrix<float> w(2, 2, one_d_arr); // Elements set to {{0, 1},{2, 3}}

Matrix<float> z(w);               // Copy constructor called

m = z;                            // m now reassigned to z's data

Creating Submatrices

The Matrix class can support submatrices, which SimpleMatrix couldn't do. These submatrices can either overlay an existing matrix (with the data being shared) or be separate copies. Figure 7.10 shows an example of overlaying a 3x5 submatrix onto a 6x7 matrix.

One of the Matrix constructors is used to construct submatrices. This constructor is in a sense the two-dimensional analog of the subvector constructor of the Vector class. The submatrix can either be shared or copied, depending on the value of styp:

template<class TYPE>

Matrix<TYPE>::Matrix(const Matrix<TYPE> &m, SliceType styp,

unsigned sr, unsigned sc,

unsigned nr, unsigned nc)

// Constructor that constructs a submatrix of matrix m

// If styp==SHARED, it means to share the data with m

// NOTE: If sharing, we initially create a null vector, and

// then we immediately rebind to shared vector. If copying,

// a row-major submatrix is created, and if nr == 0,

// then m.nrows is used. If nc == 0, then m.ncols is used.

: data((styp == SHARED) ?

0 : ((nr ? nr : m.nrows) * (nc ? nc : m.ncols)))

{

unsigned start_ofs, n;

if (nr) nrows = nr; else { nrows = m.nrows; sr = 0; }

if (nc) ncols = nc; else { ncols = m.ncols; sc = 0; }

if (styp == SHARED) { // Sharing

// When sharing, strides are always the same as parent strides

colstride = m.colstride;

rowstride = m.rowstride;

// Calculate length of underlying vector and offset.

// Remember: either colstride or rowstride == 1.

if (rowstride == 1) {

n = colstride * (nrows-1) + ncols;

}

else {

n = rowstride * (ncols-1) + nrows;

}

start_ofs = sr * colstride + rowstride * sc;

data.Share(Vector<TYPE>(m.data, SHARED, n, 1, start_ofs));

}

else {

if (data.IsNull()) {

nrows = 0; ncols = 0; colstride = 1; rowstride = 1;

}

else {

// Set up as a row-major matrix

colstride = ncols; rowstride = 1;

// Copy data from shared submatrix into allocated space.

// Note that the constructor is called recursively here

// to create a shared submatrix used for the copy.

Copy(Matrix<TYPE>(m, SHARED, sr, sc, nrows, ncols));

}

}

}

Figure 7.10 A submatrix of a matrix.

When making a submatrix copy, we use the same type of recursive call to the constructor that we used in the constructor for subvectors in the Vector class.

Here's an example of a 3x5 submatrix that shares its data with a 6x7 matrix, starting at row and column (2, 2), as illustrated earlier in Figure 7.10 :

Matrix<float> m(6, 7);

...

Matrix<float> subm(m, SHARED, 2, 2, 3, 5);

The last four parameters of the submatrix constructor, which define the origin and size of the submatrix, default to zero. In this case, the submatrix encompasses all of the original matrix. This approach can be used to create an actual copy of the matrix--which the copy constructor can't do--since it uses share semantics exclusively. For example:

Matrix<float> copy_of_m(m, COPIED);

The two variables rowstride and colstride provide the key to allowing a submatrix to share data with its parent. For normal row-major matrices, the column stride is always equal to the number of columns in the matrix. However, that isn't necessarily true for submatrices. Consider, for instance, the 3x5 submatrix shown earlier in Figure 7.10. You can see that its column stride should be the number of columns of the parent 6x7 matrix, and thus should be 7 instead of 5.

If one of the matrices is transposed or yet another submatrix is taken of a submatrix, the matter is further complicated. However, by judicious use of rowstride and colstride, the strides are never confused. Here's the basic rule: the row and column strides of a submatrix are always the same as their parents' strides and must always reflect the physical layout of the data. The real data lies somewhere up the chain of parents and is organized in row-major order. Of course, if one parent is transposed, the row and column strides will be interchanged.

Transposing Matrix Object

Data sharing allows us to define an efficient matrix transposition function for Matrix objects by merely interchanging nrows and ncols and interchanging rowstride and colstride:

template<class TYPE>

Matrix<TYPE> Matrix<TYPE>::Transpose()

// Returns a transposition of this matrix. Does not have to move

// the elements of the matrix around, just merely changes

// interpretation between row-major and column-major ordering.

{

Matrix<TYPE> t(*this); // Remember, copy constructor shares

// Interchange number of rows and cols

unsigned temp = t.nrows;

t.nrows = t.ncols;

t.ncols = temp;

// Interchange row stride and column stride

temp = t.colstride;

t.colstride = t.rowstride;

t.rowstride = temp;

return t;

}

Unlike the similar optimized Transpose( ) we showed for the SimpleMatrix class, this function does not need to assume that the matrix is to be transposed in place. Thus, the original matrix is untouched. Here's an example of this function:

Matrix<float> m(3, 5); // Create a 3x5 matrix

...

Matrix<float> t;       // Create a null matrix

t = m.Transpose();

Because both assignment and copy construction use share semantics, we can write statements like the last one without worrying about undue copying taking place. If the matrix being transposed is large, we can indeed avoid a lot of data movement.

Of course, you don't get something for nothing. Clever use of rowstride and colstride for matrix transposition does have its price. The vector extraction functions, to be given next, aren't as efficient as they could be; they can't assume anything about the vector strides. For instance, a row vector may or may not have a stride of one, so we can't use an ordinary pointer to it.

Extracting Vectors from a Matrix

As is true with the SimpleMatrix class, the functions RowPtr( ), ColPtr( ), DiagPtr( ), and PtrToAll( ) all are defined for the Matrix class. We'll refer to these as vector extraction functions. These functions use the modifications we suggested earlier--to handle the potential transposition of the matrix--by using rowstride and colstride rather than nrows and ncols directly. By using rowstride and colstride, the extraction functions also work for submatrices.

Here are the RowPtr( ) and ColPtr( ) functions for the Matrix class. The other extraction functions are defined similarly. Also note that const versions can be defined for constant matrices (not shown):

template<class TYPE>

VecPtr<TYPE> Matrix<TYPE>::RowPtr(unsigned r)

// Returns vector pointer to row r

{

return VecPtr<TYPE>(data.start+CHECKROW(r)*colstride, rowstride);

}

template<class TYPE>

VecPtr<TYPE> Matrix<TYPE>::ColPtr(unsigned c)

// Returns vector pointer to column c

{

return VecPtr<TYPE>(data.start+CHECKCOL(c)*rowstride, colstride);

}

In these functions, range checking is handled by calls to the macros CHECKROW( ) and CHECKCOL( ). These macros are the two-dimensional counterparts to the CHECK( ) macro used in the Array classes (described in Chapter 4), and are defined in the range.h header file. The macros do nothing if NO_RANGE_CHECK is undefined.

The range checking is done when the vector pointers are created. But remember that, when you use the vector pointers, nothing prevents you from going out of bounds. For this reason, another set of vector extraction functions are defined. These functions, which return fully range-checked Vector objects, are Row( ), Col( ), Diag( ), and All( ). Here are examples of Row( )and Col( ):

template<class TYPE>

Vector<TYPE> Matrix<TYPE>::Row(unsigned r, SliceType styp)

// Return a row slice of a matrix

{

return Vector<TYPE>(data, styp,

ncols, rowstride, CHECKROW(r)*colstride);

}

template<class TYPE>

Vector<TYPE> Matrix<TYPE>::Col(unsigned c, SliceType styp)

// Return a column slice of a matrix

{

return Vector<TYPE>(data, styp,

nrows, colstride, CHECKCOL(c)*rowstride);

}

Notice that, by using the styp parameter, the vectors returned can either share their data with the original matrix or can be separate copies. The default action is to share the data.

Subscripting with the Matrix Class

To return a row vector, the subscript operator is overloaded for the Matrix class. As such, operator[ ]( ) is the same as Row( ), except the returned vector always shares its data with the matrix:

template<class TYPE>

Vector<TYPE> Matrix<TYPE>::operator[](unsigned r)

// Row subscripting operator for non-const matrices.

// This routine does the same thing as Row(r, SHARED).

{

return Vector<TYPE>(data, SHARED,

ncols, rowstride, CHECKROW(r)*colstride);

}

Since vectors have their own subscript operator function defined, you can write code like the following:

Matrix<float> m(3, 5);

m[2][4] = 99; // Set lower-right element to 99

Thus, you can use the same double-subscripting syntax that's used for built-in arrays. There's only one problem. The SimpleMatrix class returned an ordinary pointer from the first subscript. Here, though, a full vector object is returned, and this object must be constructed. The construction process, while fairly efficient due to share semantics, still takes time. In fact, a matrix multiply routine that uses Matrix objects with double-subscripting is more than six times slower than its SimpleMatrix counterpart! And you'll only get this performance level when range checking is turned off. If you turn on range checking, the performance could be more than ten times slower.

You can circumvent this problem by defining operator[ ]( ) to return a vector pointer rather than a full vector. This will yield a more efficient subscript operation (perhaps only twice as slow as the fastest version), but you lose the ability to do range checking on the row vector. Of course, you can eliminate the subscripting altogether and instead use vector pointer incrementing. Doing so will yield speeds as fast as those for the SimpleMatrix class. This is an important benefit because you can obtain improved the performance even though you now have generalized matrices (such as transposed submatrices).

Another alternative is to define a different style of subscript operator. For example, you can overload the ( ) operator (normally used for function call syntax) to do two-dimensional subscripting:

template<class TYPE>

TYPE &Matrix<TYPE>::operator()(unsigned r, unsigned c)

{

return data.start[CHECKROW(r)*colstride + CHECKCOL(c)*rowstride];

}

...

Matrix<float> m(3, 5); // Construct matrix

...

m(2, 4) = 99;          // Set lower-right element to 99

The equation used in the operator( )( ) function is a generalization of the two-dimensional subscript equation used for row-major matrices. Normally, you multiply the row index by the number of columns and then add the column index, as in:

m(r, c) = *(m + r*ncols + c)

However, in the general case, the matrix might be transposed and might be a submatrix. Thus, we have to multiply both the row index and the column index by the appropriate row and column strides. Note that one of the strides will be one, so in effect we'll have either

m(r, c) = *(m + r*colstride + c) // If row major

or

m(r, c) = *(m + r + c*rowstride) // If column major

Thus, one of the multiplications in operator( )( ) is redundant. That's the price we pay for generality. Even so, expressions like m(i,j) have been timed to be two to three times faster than their m[i][j] counterparts.

The expression m(i,j) is actually the conventional notation used in mathematics for subscripting. Because of this, some people prefer to use the two-dimensional operator over its double-subscripting m[i][j] counterpart. Others prefer the more conventional C++ syntax for subscripting. Regardless of your personal preference, the mathematical notation will be significantly faster unless you define the [ ] operator for matrices to return a vector pointer rather than a full vector. In that case, the results are fairly even.

Non-Contiguous Submatrices

A matrix is said to be contiguous if you can access all of the elements using a single 1-d vector. Normal matrices created by the Matrix class will always be contiguous, regardless of whether they are transposed. However, its possible for submatrices that are shared with a parent to be non-contiguous. Figure 7.11 shows some examples of contiguous and non-contiguous submatrices, assuming a row-major organization.

From Figure 7.11, the following rule emerges for determining whether a submatrix is contiguous: either the number of columns must equal the column stride or the number of rows must be one. For column-major matrices (not shown in Figure 7.11), the number of rows must equal the row stride or the number of columns must be one. The function All( ), which attempts to return a 1-d vector to all the elements in a matrix, makes these tests for contiguousness. If the matrix isn't contiguous, All( ) returns a null vector because it isn't possible to access all the elements with a single vector:

Figure 7.11 Some non-contigous submatrices.

template<class TYPE>

Vector<TYPE> Matrix<TYPE>::All(SliceType styp)

// Returns a 1-d vector with all of the data for the matrix.

// If it isn't possible for the 1-d data to be contiguous

// (as will be the case for some submatrices), then a

// null vector is returned.

// ASSUMES either colstride or rowstride == 1.

{

if ((rowstride == 1 && nrows > 1 && ncols != colstride) ||

(colstride == 1 && ncols > 1 && nrows != rowstride))

return Vector<TYPE>(); // Return a null vector

return Vector<TYPE>(data, styp, nrows * ncols, 1, 0);

}

As an example of handling non-contiguous submatrics, here is an overloaded assignment operator that sets all the elements of the matrix to a specified value. This is done by doing the assignments row by row, using vector pointers:

template<class TYPE>

Matrix<TYPE> &Matrix<TYPE>::operator=(const TYPE &x)

// Set all elements in the matrix to the value x.

// Assumes a worst case non-contiguous submatrix.

{

// tc = column vector pointer to rows of t

// tr = row vector pointer to columns of t

VecPtr<TYPE> tc(data.start, colstride);

VecPtr<TYPE> tr(data.start, rowstride);

for (unsigned i = 0; i<nrows; i++) {

tr = tc; // Remember, strides aren't assigned!

for (unsigned j = 0; j<ncols; j++) {

*tr = x;

tr++;  // Next column

}

tc++;       // Next row

}

return *this;

}

DESIGN CRITIQUE

In this chapter, you've seen in detail how to define generalized dynamic vectors and matrices. In our design, we were careful to remove all potential inefficiencies, without sacrificing generality. However, any time you make something general, performance is bound to suffer. This is manifested in basically one form: vector pointers must be used rather than ordinary pointers. Because of their generality, vector pointers with strides of one may be less efficient than ordinary pointers (depending on how well your compiler optimizes), even though the two are logically equivalent.

However, by using such tricks as vector pointers and reference counting, we can define certain operations to be much more efficient than the obvious ways would suggest. Consider, for instance, matrix transposition, which can be done by merely swapping some variables. Returning matrices from functions also becomes very efficient, making the code less clumsy. Never does an unnecessary copy of matrix elements take place.

There are some overriding questions: What is the cost of generality? Do the potential speed improvements offset the potential slowdowns? The answers, of course, depend on the particular application (and your compiler). However, in most non-trivial applications, the Vector and Matrix classes will probably be just as as efficient as simple, static built-in matrices.

An interesting test is to compare matrix multiply times for built-in statically sized matrices, SimpleMatrix objects, and Matrix objects. Table 7.1 shows the fastest times possible for all three types (using all the tricks described in this chapter). The results were obtained by multiplying two 30x30 double precision floating-point matrices together, 100 times, and taking the average. The tests were performed using 32-bit pointers on a 486 33-MHz machine.

As you might expect, statically sized matrices yielded the fastest results. Using a generalized matrix resulted in a 32 percent drop in speed, which isn't bad considering the flexibility that we gain. Interestingly enough, the general Matrix class performed slightly better than the SimpleMatrix class (probably due to some compiler optimization quirk). Their performance is close enough to suggest that the main performance penalty is due mostly to the use of dynamically sized matrices and not the fact that, in the case of the Matrix class, submatrices and efficient transposition are supported.

What is the space cost of generalized matrices? Figure 7.9 given earlier shows the memory layout of a Matrix object. Note that, in addition to the matrix elements themselves, a Matrix object has four data members plus a Vector object with four data members--one of which points to a VecRep object using two data members. Thus, an independent, unique matrix has an overhead of 10 data members (probably in the range of 20 to 40 bytes, depending on the implementation). This may be significant if you have a lot of small 3x3 or 4x4 matrices. However, for larger matrices, the overhead becomes less important, especially in light of the flexibility gained. And, of course, if the matrix data is shared, the overhead is reduced by two members plus the elements shared.

Table 7.1 Relative performance of different matrix types.

Type                      Time     Performance Relative to Fastest

of Matrix              in Seconds       (Smaller Is Better)

Statically sized

 built-in 2-d arrays      0.0470               1.0

SimpleMatrix objects      0.0660               1.40

Matrix objects            0.0621               1.32

Go to Chapter 8 Return to Table of Contents