CHAPTER 4: BASIC ARRAY DESIGN

In Chapters 4 through 7, we'll take an in-depth look at arrays. You might find it surprising that, even though an array is basically a simple data structure, there are many issues involved in it's design. C++ has built-in support for arrays, but this support is fairly rudimentary and is often more low-level than you might like. It can be advantageous to create user-defined arrays, so that's what we'll be doing in this set of chapters. Chapter 4 focuses on basic one-dimensional, homogenous arrays, with some discussion of multi-dimensional arrays. Other kinds of arrays will be covered in the next three chapters.

We begin by providing a brief review showing how arrays can be implemented using the built-in facilities of C++. Along the way, we'll explain some of the terminology associated wth arrays. Although some of this review may seem elementary, it provides an important foundation for the the design of the array classes that we'll show later in this and other chapters.

AN ARRAY CLASS HIERARCHY

The design for arrays given in this chapter centers around a hierarchy of classes, as shown in Figure 4.2. All of these classes are templates so that we can use them for any type of array element. The Array class is at he top of the hierarchy. This class serves as the common interface to all the array types, which are as follows:

Fixed-length arrays, including those that are statically and dynamically sized.

Variable-length arrays, where internal indexes keep track of current versus dimensioned lengths. Both statically and dynamically sized arrays are supported.

Dynamic, resizable arrays, whose dimensions can grow and shrink on demand.

BASE ARRAY CLASS

The Array class has the following basic features:

Storage for the array elements is allocated elsewhere.

All copying is done via element-by-element assignment.

The logical length and dimensioned length are differentiated, although only the logical length is actually stored.

Subscript checking is controllable at compile time.

To make functions that use arrays the most general, Array-typed pointers or references can be used for the parameters. This allows any type of array in the hierarchy to be passed to the functions.

Figure 4.2 The Array Class Hierarchy.

Here is the Array class definition:

template<class TYPE>

class Array {

protected:

TYPE *data;    //  Pointer to actual data

unsigned len;  //  Logical length

public:

Array(TYPE *m, unsigned n);

virtual ~Array();

virtual void CopyN(const TYPE *s, unsigned n);

void Copy(const Array<TYPE> &s);

Array<TYPE> & operator=(const Array<TYPE> &s):

#ifndef NO_RANGE_CHECK

unsigned CheckIndx(unsigned i) const;

#endif

TYPE &operator[] (unsigned i);

const TYPE &operator[] (unsigned i) const;

unsigned Length() const;

virtual unsigned DimLength() const;

TYPE *Data();

const TYPE *Data() const;

}:

The code for the Array class is given on disk in the files array.h and array.mth.

Aliased Data

An Array object doesn't actually hold array elements; it stores a pointer to them. We assume this data is allocated elsewhere. For example, we might wrap an Array object around a low-level built-in array so that we can provide subscript checking for the elements. Here's how to declare such a "wrapper array":

int low_level_array[3] = {10, 2, 4};

Array<int> wrap_array(low_level_array, 3);

The Array( ) constructor is used to initialize the wrapper:

template<class TYPE>

Array<TYPE>::Array(TYPE *m, unsigned n)

: data(m), len(n)

// Nothing more to do

Note that we don't provide a default constructor for Array. We chose not to because you shouldn't really declare Array objects directly (unless you are trying to create a wrapper array). Instead, you should be declaring objects from classes derived from Array. True to form, however, we do provide a virtual destructor, even though it doesn't do anything. Some of the derived array classes will use dynamic allocation for the array data, and we'll need to override the destructor to de-allocate this memory.

Copying Arrays

One important member function is CopyN( ), which allows us to copy data from a low-level array into an Array object:

template<class TYPE>

void Array<TYPE>::CopyN(const TYPE *s, unsigned slen)

unsigned clen = DimLength ( );

if (clen > slen) clen = slen;

for (unsigned i = 0; i<clen; i++) data[i] = s[i];

In CopyN( ), we only copy as many elements as the source has, and never more than the target array is dimensioned to handle. The length of the array never changes, as illustrated in Figure 4.3. Also, the copying is carefully done, element-by-element (rather than by using something like memcpy( )). This approach allows any overloaded assignment operator that TYPE might have to be called correctly. For example, the statement data[i]=s[i] is actually translated into:

data[i].operator=(s[i]);

Figure 4.3 Copying fixed-length arrays.

It is extremely important to strictly adhere to element-by-element assignment during copying, because it allows the appropriate assignment operator to be called, which may be necessary to avoid aliasing.

The CopyN( ) function is used by the Copy( ) function, which in turn is used by the overloaded assignment operator:

template<class TYPE>

void Array<TYPE>::Copy(const Array<TYPE> &s)

CopyN(s.data, s.Length());

template<class TYPE>

Array<TYPE> &Array<TYPE>::operator=(const Array<TYPE> &s)

if (this != &s) Copy(s);

return *this;

Note that the assignment operator traps the case where we try to assign an array to itself; thus, we prevent redundant copying.

Subscript Checking

Array objects, unlike low-level arrays, allow us to provide range checking on subscripts. This can be accomplished by overloading the subscript operator. For performance reasons, it's also useful to allow range checking to be turned on and off at will. How can we accomplish this? Three methods are possible:

1. Allow range checking to be controlled at runtime, either for all arrays at once, or for individual arrays. The checking can be controlled by using internal flags inside the arrays or by using virtual functions.

2. Allow range checking to be controlled only at compile time by using conditional macros.

3. Provide direct access to the array data through an access function, allowing the overloaded subscript operator to be bypassed (and thus the range checking).

With runtime control of range checking, either a flag must be tested or a virtual function call must take place each time a subscript is used. The flag test or the function call allow us to determine whether the index should be checked. Here's one way this could be done:

TYPE &Array::operator[](unsigned i)

if (range_check) i = CheckIndx(i);

return data[i]:  //  Use low-level subscripting

This approach leads to runtime overhead--even when range_check is false--and the overhead can be significant for tight loops. By using a conditional macro at compile time instead, we can turn off range checking and thus have no overhead at runtime. In conjunction with the macro approach, we can also allow direct access to the array elements in order to bypass the overloaded subscript operator. This approach allows us to turn off range checking locally in code where speed is critical (and where the code has been thoroughly tested). The two Data( ) functions (one for non-const arrays, one for const arrays) are supplied for this purpose. Here are the Data( ) functions, followed by an example of their use:

template<class TYPE>

TYPE *Array<TYPE>::Data()

return data;

template<class TYPE>

const TYPE *Array<TYPE>::Data() const

return data;

void f(Array<int> &arr)

// Some function that's been thoroughly tested for

// proper bounds on array indexes

int *a = arr.Data( );  //  Get to the low-level array

...

a[17] = 42;

...

void g(const Array<int> &arr)

int *a = arr.Data( );  //  Const form of Data() is used

int x = a[17];         //  Legal access

a[0] = 1;              //  Illegal access

...

In this chapter, we'll use the conditional macro approach combined with the raw data access functions. This combination allows us to use range checking if we need it, but there will be no residual overhead if we don't need range checking. Although this approach isn't as flexible as allowing runtime control of range checking, it meets one of our main design goals: we shouldn't have to pay for something that we're not going to use.

We'll now show how to implement the conditional macro approach. The overloaded subscript operator looks like the following:

template<class TYPE>

TYPE &Array<TYPE>::operator[ ](unsigned i)

return data [CHECK (i)];

The operator function checks the incoming index by calling the macro CHECK( ), which is conditionally defined as follows:

#ifdef NO_RANGE_CHECK

#define CHECK(i) i

#else

#define CHECK(i) CheckIndx(i)

#endif

The macro symbol NO_RANGE_CHECK is used to control range checking. If this symbol is defined, then Check( ) becomes a no-op. Otherwise, a call to the function CheckIndx( ) is made, which actually does the range checking:

template<class TYPE>

unsigned Array<TYPE>::CheckIndx(unsigned i) const

if (i >= len) i = HandleRangeErr(i, len);

return i;

HandleRangeErr is a pointer to a function that's designed to handle out-of- range subscripts. We've provided a default error handler that prints an error message and exits the program:

unsigned (*HandleRangeErr)(unsigned i, unsigned sz)

= DefaultRangeErrHandler;

unsigned DefaultRangeErrHandler(unsigned i, unsigned sz)

cout << "Subscript " << i << " out of range (0. "

<< (sz-1) << ")\n";

exit(1);

return 0; //  Here to prevent compiler warnings

We could also define alternate handlers. All we would need to do is point to them via the HandleRangeErr function pointer.

Note Future versions of C++ promise more elegant error handling facilities that could be used in place of the function pointer technique.

There is also an alternate subscript operator function, which is used to provide read-only subscripting of const arrays:

template<class TYPE>

const TYPE &Array<TYPE>::operator[](unsigned i) const

// Subscripting operator for constant arrays.

return data[CHECK(i)];

Note carefully the two uses of the const keyword. These keywords allow us to read the value of an element of a constant array, but they don't allow us to write to that element. For example:

int low_level_array[3] = {10, 2, 4};

const Array<int> myarray(low_level_array, 3);

int x = myarray[2]; //  OK

myarray[2] = 17;    //  Illegal

Without the const version of the subscript operator, even reading from a constant array element would be disallowed.

Logical versus Dimensioned Length

The functions Length( ) and Dimlength( ), respectively, return the logical and dimensioned length of an array:

template<class TYPE>

unsigned Array<TYPE>::Length() const

return len;

template<class TYPE>

unsigned Array<TYPE>::DimLength( ) const

return len;

Notice that the dimensioned length by default is the same as the logical length, but a derived class can override the virtual DimLength( ) function to change this. To save space, we avoid actually storing the dimensioned length in the Array object--in case both the logical and dimensioned length are the same (as it will be for fixed-length arrays). Again, our arrays have no more overhead than is absolutely necessary.

FIXED-LENGTH ARRAYS

With the base Array class defined, it's easy to derive fixed-length array classes. These are arrays that have the same logical length and dimensioned length. We'll define the two classes Darray and Sarray, which support dynamically and statically sized arrays, respectively. Here is the Darray class:

template<class TYPE>

class Darray : public Array<TYPE> {

public:

Darray(unsigned n);

Darray(const Darray<TYPE> &s);

Darray(const Array<TYPE> &s);

Darray(const TYPE *s, unsigned n);

virtual ~Darray();

};

The constructors of Darray work by allocating the array elements using new, and by passing a pointer to these allocated elements to the base class constructor. Here are two constructors that illustrate this:

template<class TYPE>

Darray<TYPE>::Darray(unsigned n)

//  Constructs an array of n dynamically allocated elements

: Array<TYPE>(new TYPE[n], n)

//  Nothing else to do

template<class TYPE>

Darray<TYPE>::Darray(const Darray<TYPE> &s)

// Copy constructor. Allocates enough room, and

// copies data from the source.

: Array<TYPE>(new TYPE[s.Length()], s.Length())

Copy(s);

The destructor de-allocates the array elements:

template<class TYPE>

Darray<TYPE>::~Darray()

// De-allocates dynamically allocated array data

delete[] data;

In contrast, the Sarray class uses a low-level array internally, which is sized and allocated at compile time:

template<class TYPE, unsigned N>

class Sarray : public Array<TYPE>  {

protected:

TYPE storage[N]; //  Statically sized array data

public:

Sarray( );

Sarray(const Sarray<TYPE, N> &s);

Sarray(const Array<TYPE> &s);

Sarray(const TYPE ^*s, unsigned n);

Sarray<TYPE, N> &operator=(const Array<TYPE> &s);

};

In addition to the usual TYPE parameter, Sarray has the parameter N, which determines the size of the array. This parameter is used to declare an embedded low-level array of N elements that we've called storage. At construction time, the data pointer in the base Array class is linked to storage, as shown in the Sarray default constructor:

template<class TYPE, unsigned N>

Sarray<TYPE, N>::Sarray( )

// Default constructor creates an array of length N

: Array<TYPE>(storage, N) // Link data to storage

//  Nothing more to do

The other Sarray constructors are defined similarly. Note that no destructor is defined for the Sarray class, since we don't need to delete the statically allocated elements. (A default destructor is created by the compiler, though, which calls the TYPE::~TYPE( ) destructor for each element in the array.)

The code for the Darray and Sarray classes is given on disk in the files darray.h, darray.mth, sarray.h, and sarray.mth. Two test programs are supplied: tstarr.cpp and tstarr2.cpp.

Fixed-Length Array Assignments

Static versus Dynamic Arrays

Fixed-Length Array Assignments

The Sarray class has an overloaded assignment operator function:

template<class TYPE>

Sarray<TYPE.> &Sarray<TYPE>::operator=(const Array<TYPE> &s)

if (this != &s) Copy(s);

return *this;

We need to make two important points about this operator function. First, by passing a base class Array reference for the source of the assignment, we allow any object from the Array class hierarchy to be assigned to a Sarray object, such as a Darray object. Second, notice that the body of this function is identical to the Array class version. So why did we need to redefine it?

Recall how assignments in derived classes work: without an overloaded assignment function, the additional derived class members are copied member by member. Then, the base class members are copied either memberwise or with an overloaded assignment in the base class. Figure 4.4 illustrates this scenario. If we had no assignment function in Sarray, we would create a problem for the the storage member of Sarray. We want storage to be copied only once. Without the Sarray assignment function, storage would be copied as part of the default derived class assignment behavior, and then copied again via the Copy( ) function inside the Array base class assignment function. Defining an assignment function in Sarray prevents this double copying.

Figure 4.4 A derived class assignment scenario.

class Base   {                     class Derived : public Base  {

protected :                        protected :

char *p, *q;                       char *r;

public :                           public :

Base() :                          Derived();

~Base();                          ~Derived();

Base &operator=(const Base &x);  };

};

You may have noticed that the Darray class doesn't have an assignment operator function. Why not? The Darray class has no additional data members defined, so the base Array class assignment function will handle everything properly. This is a subtle point. So, for consistency, you may want to define a Darray assignment function. (It would look identical to the Sarray version.) We haven't done so here because we want to minimize the amount of code generated by the Darray class. Also, to save space, Darray was not given a default constructor. It's not clear what the default constructor should do anyway. Should it make an array with no elements, or an array with a default length of, say, 10 elements?

Static versus Dynamic Arrays

Dynamic arrays are versatile because their dimensions can be determined at runtime. This same versatility makes dynamic arrays less efficient than those sized statically, since time is spent at runtime allocating and de-allocating the array data. This level of efficiency may be critical when arrays are used inside functions that are called frequently. For instance:

void f()

// Some function called frequently

Darray<int>  myarr(10);  //  Allocated on heap

...

// Destructor called automatically to de-allocate array

Contrast this with the static allocation of the array:

void g()

// Some function called frequently

Sarray<int, 10> myarr;  //  Allocated on stack

...

// Array vanishes as stack is unwound

In the static array version, the array is allocated on the stack, which involves a simple manipulation of the stack pointer. The dynamic array version involves a much more involved heap management process. In addition, space overhead is needed for heap management. And don't forget the simple fact that a heap is required. If you want the fastest, most efficient arrays, and you aren't using a heap in your program for other purposes, you might want to use statically sized arrays.

Of course, a tradeoff is involved here. Since the size of the static array is a template parameter, a new class is generated every time you declare a static array of a different size. For instance, the following code causes three classes to be generated (one base class for character arrays in general, and two derived classes for different-sized arrays):

typedef  Sarray<char, 10> tenchar;

typedef  Sarray<char, 20> twentychar;

For this reason, you should limit the number of different-sized static arrays that you declare in your application (and limit the number of arrays of different element types). Note that dynamic arrays do not have this problem, since the array length is passed as a parameter to the constructor.

It's important to understand the difference between sizing an array and allocating it. For instance, statically sized arrays don't have to be statically allocated. You can allocate them dynamically, as in:

Sarray<int, 10>  *p = new Sarray<int, 10>;

We might do this, for instance, to free up memory in the static data segments of our program, without having to implement both the static and dynamic array classes.

VARIABLE-LENGTH ARRAYS

A variable-length array is one where the logical length of the array is different from the dimensioned length. With a variable-length array, we can add elements to the array (thereby increasing the logical length) and remove elements (thereby decreasing the logical length).

As is true with fixed-length arrays, variable-length arrays can be either dynamic or static. Both of these types of arrays have much in common--in particular, the code that handles adding and removing elements. So it makes sense to make a general variable-length array class, and then derive specific dynamic and static versions. For instance, we've created the Varray class (which is derived from the Array class) and two classes derived from Varray--DVarray and SVarray. The Varray class is defined as follows:

template<class TYPE>

class Varray : public Array<TYPE>  {

protected :

unsigned dimlen;

public :

Varray(TYPE *m, unsigned dim_sz);

void CleanUpElements(unsigned s, unsigned f);

virtual void CopyN(const TYPE *s, unsigned slen);

Varray<TYPE> &operator=(const Array<TYPE> &s);

virtual unsigned Dimlength() const;

virtual unsigned InsertNAt(unsigned p, const TYPE *s, unsigned n);

unsigned InsertAt (unsigned p, const TYPE &s);

unsigned InsertAt (unsigned p, const Array<TYPE> &s);

unsigned Concatenate (const TYPE *s, unsigned n=1);

unsigned Concatenate (const Array<TYPE> &a);

unsigned Add (const TYPE &s);

unsigned DeleteAt (unsigned p, unsigned n=1);

unsigned Shrink (unsigned n);

};

The code for the Varray class is given on disk in the files varray.h and varray.mth. Two test programs are supplied: tstvarr.cpp and tstvarr2.cpp.

As is the case with the Array class, the Varray class doesn't dictate how the array elements are allocated. Instead, it aliases the elements through the following constructor:

template<class TYPE>

Varray<TYPE>: :Varray(TYPE *m, unsigned dim_sz)

: Array<TYPE>(m, 0)

dimlen = dim_sz;

The base Array constructor is called to do the aliasing and, surpisingly, a length of zero is passed. This represents the logical length of the newly constructed array. The dimensioned length is recorded in the additional data member dimlen. To differentiate between the logical length and dimensioned length, the DimLength( ) virtual function is overridden to return dimlen:

template<class TYPE>

unsigned Varray<TYPE>: :DimLength() const

return dimlen;

It may seem backwards to have the base class keep track of the logical length and the derived class keep track of the dimensioned length. But this arrangement gives the system of array classes its flexibility and economy.

With the Varray class, you can add single or multiple elements either to the end of the array (that is, after the current logical length) or somewhere in the middle. The InsertNAt( ), InsertAt( ), Concatenate( ), and Add( ) functions handle these chores. The DeleteAt( ) and Shrink( ) functions allow you to delete one or more elements from the middle or end of the array, respectively. All of these functions manipulate the logical length of the array. We'll take a look at these functions later. First we need to discuss an important consideration: user-controlled allocation.

In-Place Construction

Some subtle issues are involved in adding and removing elements from a variable-length array. These issues revolve around two questions: when and where does an array element get constructed? When and where does it get destructed? To have safe, efficient, robust code, we must ensure that the constructors and destructors get called the proper number of times, in the right order, and only when necessary.

With fixed-length arrays, the construction and destruction of elements is quite simple: when the array elements are allocated--either by using new or through a static declaration--the default constructor of the element type is called for each element. When the array is to be destroyed, the destructor is called for each element. For example:

void f( )

MyType *darr = new MyType[17];  //  Mytype constructor called 17 times

MyType sarr[42];  //  MyType constructor called 42 times

. . .

delete[] darr;  //  MyType destructor called 17 times

//  At end of function, MyType destructor called 42

//  times for the elements of sarr

If you inspect the Darray and Sarray classes, you'll see that their array elements are constructed and destructed in this way.

For variable-length arrays, things get trickier. When a variable-length array is first constructed, it has a logical length of zero. Although the space for the array has been allocated, no constructed elements are stored there. That is, the array storage consists initially of unconstructed elements. As elements are added to the array, the constructor must be called for those elements. Figure 4.5 shows a variable-length array with four constructed elements. When elements are removed, the destructor must be called for each removed element. When it is time to destroy the array itself, if there are remaining elements in the array, the destructor should be called for each one.

Thus, the problem boils down to your ability to construct and destruct elements at will. You must be able to do the construction and destruction at the place where the element occurs in the array. This is the tricky part. Normally, when you construct an object, either the compiler and linker control the address of the object (for statically allocated objects), or the heap manager controls the address (via the new operator). Fortunately, C++ provides the tools to let you control the address of the object yourself. For instance, you can override the new operator and pass in the address of the object to be constructed:

Figure 4.5 A variable-length array.

void *operator new(size_t, void *p)

return p;

Actually, the new operator itself won't cause the constructor for the object to be called, but the compiler will insert the constructor call when it sees the new operator. The address of the object to be passed to the constructor (the this pointer) is returned by the new operator. Now you can see why the simple overloaded new operator works: it merely has to return the same address passed in. Note that the first parameter of new (the size of the object, which is required by the C++ syntax) isn't needed by the function. Normally, the size is used to tell the heap manager how many bytes to allocate. But, in this case, the space is assumed to be allocated already.

Note An overloaded new operator function--such as the one we've just given--is sometimes called a placement new operator; it places a new object at a user-defined address. Using a placement new operator to construct an object at a specified address is called in-place construction.

Figure 4.6 illustrates how we could construct the fourth element of an array of Compound numbers (see Chapter 3), and initialize it to the value <9 0/1> using the placement new operator. The address of the second element is passed to new, and then the default constructor is called, as in the following code:

// Declare array of 6 compound numbers, initially unconstructed:

char myarr[6*sizeof(Compound)];

...

new(&myarr[3]) Compound(9);  //  Construct fourth element

Figure 4.6 In-Place construction of an array element.

Note how the array elements are initially allocated as characters, rather than Compound numbers. That's done to make the array have unconstructed bits initially. If we had used

Compound myarr[6];

then the Compound default constructor would have been called for all six elements, which isn't what we want.

You can also call the default constructor (or any other type of constructor, for that matter) with the placement new operator. For example:

new(&myarr[3]) Compound;  //  Default constructor called

With these ideas in mind, the following helper function templates are used by the Varray class to construct the elements of a variable-length array:

template<class TYPE>

void DefConstruct(TYPE *p)

// This function constructs an element at place p.

// Calls the default constructor.

new(p) TYPE;

template<class TYPE>

void DefConstructElements(TYPE *t, unsigned len)

// Default constructs the elements of low- level array t,

// assumed to have been properly allocated

for (unsigned i = 0: i<len; i++) DefConstruct(t++):

template<class TYPE>

void CopyConstruct(TYPE *p, const TYPE &x)

// Copy constructs object at address p, using

// object x for the copy

new(p) TYPE(x):

template<class TYPE>

void CopyConstructElements(TYPE *t, const TYPE *s, unsigned len)

// Here, we assume raw space for t has been properly allocated,

// and we're ready to copy construct the elements of s into them

for (unsigned i = 0; i<len; i++) CopyConstruct(t++, *s++):

The CopyConstruct( ) function template assumes TYPE has a copy constructor. If that's not the case, you can override the function template for a specific type. The overriding function can call the default constructor, and then do an assignmenv. For example:

void CopyConstruct(MyType *p, const MyType &x)

new(p) MyType;  //  Call default constructor

*p = x;         //  Do assignment

In cases where TYPE doesn't have a constructor at all, such as for built-in types, you can simply do an assignment, as shown in the following overridden CopyConstruct( ) function for integers:

void CopyConstruct(int *p, const int &x)

*p = x;  //  Do assignment

Explicit Destructor Calls

Calls to destructors are normally handled automatically by the compiler. You can control when a destructor is called if you use the delete operator. However, this assumes that the object to be destroyed resides on the heap. But what do you do if the object resides as an element of an array? The solution is to make an explicit call to the destructor. Here's an example:

Compound *p = (Compound *) (&myarr[1]);  //  Point to element

...

p->Compound::~Compound( );  //  Explicit destructor call

The explicit destructor call requires that you know the address of the object being destructed, and you can only call the destructor through a pointer. You should never call the destructor for the same object twice, and never for an object that resides on the heap (since it won't be freed properly unless delete is called). The Varray class uses an explicit destructor call in the function CleanUpElements( ):

template<class TYPE>

void Varray<TYPE>::CleanUpElements(unsigned s, unsigned f)

// Cleans up elements from start s to finish f-1, by calling

// destructors explicitly

TYPE *p = data+s;

for (unsigned i=s; i<f; i++, p++) p->TYPE::~TYPE( );

This template function does have a problem, though. What if TYPE has no destructor? In theory, the compiler can simply ignore the explicit destructor call, since nothing needs to be done. In practice, if TYPE is a built-in type--such as int or float--rather than a user-defined class, many compilers will give you an error. For example:

int *p;

p->int::~int( );  //  Some earlier compilers can't handle this

The ability to use "user-defined class syntax" for built-in types--such as in explicit calls to destructors--was added to C++ 3.0. Many C++ 2.1 compilers will have troubles in this area. To solve this problem, a special Destruct( ) function template can be defined, as follows:

template<class TYPE>

void Destruct(TYPE *p)

// Calls the destructor for the object pointed to by p.

// ASSUMES the object was constructed using the placement

// new operator.

p->TYPE::~TYPE( );

Then, if the array you are using is for a built-in type, you can override the function template to do nothing. For example, here's how you can override Destruct( ) for ints:

void Destruct(int *p)

// Do nothing

By using appropriate versions of Destruct( ), we can rewrite CleanUpElements( ) to work for any type, as follows:

template<class TYPE>

void Varray<TYPE>::CleanUpElements(unsigned s, unsigned f)

TYPE *p = data+s;

for (unsigned i=s; i<f; i++) ::Destruct(p++);

Note The code for the placement new operator (and for associated special constructor and destructor helper functions) is given on the disk in the files placenew.h and placenew.mth.

Inserting Elements into an Array

Elements are inserted into a variable-length array by first shifting array elements down to make room for the new elements (unless the elements are added to the end), and then copy-constructing the new elements in place. Figure 4.7 illustrates the process. The workhorse function is InsertNAt( ):

template<class TYPE>

unsigned Varray<TYPE>::InsertNAt(unsigned p, const TYPE *s, unsigned n)

// Inserts the data pointed to by s into the array. Up to n

// elements are inserted (truncating if necesary), starting at

// position p (counted by zero). If p >= len, then the data

// is concatenated to the end.

// Returns number of elements inserted.

unsigned i;

if (n > dimlen - len)  { // Keep things in range

n = dimlen - len;

if (p >= len) { // We're concatenating

p = len;

else {

// Make room for inserted data somewhere in the middle

memmove(data+p+n, data+p, (len-p)*sizeof(TYPE));

// Copy in source

len += n;

for(i = 0; i<n; i++) {

// Copy-construct the elements in place (they're presumed

// to be unconstructed at this time.)

::CopyConstruct(data+p+i, s[i]);

return n;

Figure 4.7 Inserting elements into a variable-length array.

The important feature of this function is that the newly inserted elements are created element-by-element, using a copy-construction process. The other insert functions use InsertNAt( ) to do most of the work:

template<class TYPE>

unsigned Varray<TYPE>::Add(const TYPE &s)

// Add an element onto the end of the array

return InsertNAt(len, &s, 1);

template<class TYPE>

unsigned Varray<TYPE>::InsertAt(unsigned p, const Array<TYPE> &a)

// Insert an array of elements into position p of the array

return InsertNAt(p, a.Data( ), a.Length( ));

template<class TYPE>

unsigned Varray<TYPE>::Concatenate(const Array<TYPE> &a)

// Add array a onto the end of this array

return InsertAt(len, a);

template<class TYPE>

unsigned Varray <TYPE>::Concatenate(const TYPE *s, unsigned n)

// Concatenate an array of n elements, (from low-level memory),

// onto the end of the array.

return InsertNAt(len, s, n);

Deleting Elements from an Array

Deleting elements from an array is the opposite of inserting elements, as illustrated in Figure 4.8. First, the destructor is called for each element to be deleted, and the remaining elements are then shifted up to take the p place of the deleted elements. The DeleteAt( ) function does all the work:

template<class TYPE>

unsigned Varray<TYPE>::DeleteAt(unsigned p, unsigned n)

// Deletes up to n elements from array at position p. If p

// is out of range nothing happens. If n == 0, nothing happens.

// Returns number of elements deleted.

long pel;  //  long is used to prevent overflows during adds

unsigned pe;

if (p < len && n != 0) {

// We may be chopping off the end, so keep in range

pel = p + n;

if (pel > len) pel = len;

pe = unsigned(pel); // Guaranteed to be in range

n = pe - p;

// Now, clean up each element deleted

CleanUpElements(p, pe);

// Now, move elements up to take their place

memmove(data+p, data+pe, (len-pe)*sizeof(TYPE));

len -= n;

} else n = 0;

return n;

Figure 4.8 Deleting elements from a variable-length array.

A special purpose function to delete elements from the end of the array, called Shrink( ), uses DeleteAt( ):

template<class TYPE>

unsigned Varray<TYPE>::Shrink(unsigned n)

// Shrink the array by n elements.

// Return number of elements we've shrunk.

if (n > len) n = len;

return DeleteAt(len - n, n);

Copying Variable-Length Arrays

As is true with all the array classes, an overloaded assignment operator is provided for the variable-length arrays. This assignment operator can use all the different types of arrays as the source:

Varray<TYPE> &Varray<TYPE>::operator=(const Array<TYPE> &s)

// Note that source of assignment can be any array type.

if (this != &s) Copy(s);

return *this;

The assignment operator calls the Copy( ) function, which in turn calls the virtual CopyN( ) function to do the actual copying. The details for copying into a variable-length array are different from those for copying into a fixed-length array. For this reason, the Varray class overrides CopyN( ).

When you copy data into a variable-length array, you might not only be adding elements to the array, but you might also be overwriting existing elements. A copy-construction process is used for adding new elements, whereas an assignment operation takes place for existing elements. Also, if the source array is smaller than the target array, elements must be deleted from the target array. Figure 4.9 illustrates these cases. The following CopyN( ) function for Varray handles the details:

template<class TYPE>

void Varray<TYPE>::Copy N(const TYPE *s, unsigned slen)

// Copies as much data as possible from s into this

// array, truncating if need be. Copying is done via

// element-to-element assignment.

unsigned i, tlen;

// First, copy into the constructed portion

tlen = len;

if (tlen > slen) {

Shrink(tlen-slen); // We might be getting smaller

tlen = slen;

for (i = 0; i<tlen; i++) data[i] = s[i];

len = tlen;

// Now, copy into the unconstructed portion via concatenation

if (tlen < slen) Concatenate(s+i, slen-tlen);

The Concatenate( ) function is cleverly called to handle the elements that must be copy constructed.

Dynamic Variable-Length Arrays

Now that we've created the Varray class, we can derive more specific dynamic and static versions. Here is the DVarray class for dynamic, variable-length arrays:

template<class TYPE>

class DVarray : public Varray<TYPE> {

public:

DVarray (unsigned n);

DVarray (const DVarray<TYPE> &s);

DVarray (const Array<TYPE> &s);

DVarray (const TYPE *s, unsigned n);

virtual ~DVarray();

DVarray<TYPE> &operator=(const Array<TYPE> &s);

};

Figure 4.9 Copying variable-length arrays.

(a) Source longer than target

(b) Source shorter than target

The DVarray class allocates the array elements from the heap, but keeps those elements unconstructed until needed. This is done by creating an array of characters, which is then typecast into an array of TYPE. For the copy constructor, the array elements from the source array are constructed and copied, again by clever use of the concatenate function. (Remember that the array initially has a logical length of zero.)

template<class TYPE>

DVarray<TYPE>::DVarray(const DVarray<TYPE> &s)

//  Copy constructor. Note that we first allocate an

//  array of unconstructed bits. Using concatenate

//  here effectively does the element-by-element copy

//  construction for us.

: Varray<TYPE> ((TYPE *)new char[s.DimLength( )*sizeof(TYPE)],

s.DimLength( ))

Concatenate(s);

The other constructors work in a similar manner. The destructor performs the reverse process. First, any constructed elements remaining in the array are destroyed, and then the storage is freed from the heap:

template<class TYPE>

DVarray<TYPE>::~ DVarray()

//  Clean up after all constructed members

CleanUpElements(0, len);

//  Delete the storage

delete[] (char *)data;

Note the typecast to char* in the delete call. Had we not used this typecast, the destructor for the array elements would be erroneously called--not only for unconstructed elements, but also for elements that had just been destroyed.

The code for the DVarray class is given on disk in the files dvarray.h and dvarray.mth.

Static Variable-Length Arrays

The SVarray class, used for static variable-length arrays, is similar to the DVarray class:

template<class TYPE, unsigned N>

class SVarray : public Varray<TYPE>  {

protected:

char storage[N*sizeof(TYPE)];

public:

SVarray( );

SVarray(const SVarray<TYPE, N> &s);

SVarray(const Array<TYPE> &s);

SVarray(const TYPE *s, unsigned n);

virtual ~SVarray( );

SVarray<TYPE, N> &operator=(const Array<TYPE> &s);

};

As was true with the DVarray class, the storage is allocated as an array of characters and then typecast to an array of TYPE, as the following copy constructor shows:

template<class TYPE, unsigned N>

SVarray<TYPE, N>::SVarray(const SVarray<TYPE, N> &s)

//  Copy constructor. Note that storage is presumed

//  to be unconstructed bits coming in. Using concatenate

//  here effectively does the element-by-element copy

//  construction for us.

: Varray<TYPE>((TYPE *)storage, N)  //  Link to storage

Concatenate(s);

The destructor must clean up any remaining constructed elements, but of course no call to delete is needed:

template<class TYPE, unsigned N>

SVarray<TYPE, N>::~SVarray( )

//  Clean up all constructed members

CleanUpElements(0, len);

The code for the SVarray class is given on disk in the files svarray.h and svarray.mth.

RESIZABLE ARRAYS

A resizable array is a more sophisticated type of array. Not only is a resizable array variable length (in terms of logical length), but its allocated, dimensioned length can also vary. By definition, resizable arrays are dynamically sized. In fact, we can derive a resizable array class, RVarray, from DVarray:

template<class TYPE.>

class RVarray : public DVarray<TYPE> {

protected:

int grow_by;

public:

RVarray(unsigned n, int gb);

RVarray(const RVarray<TYPE> &s);

RVarray(const Array<TYPE> &s);

RVarray(const TYPE *s, unsigned n);

virtual void CopyN(const TYPE *s, unsigned n);

RVarray<TYPE> &operator=(const Array<TYPE> &s);

virtual unsigned InsertNAt(unsigned p, const TYPE *s, unsigned n=1);

int Realloc(unsigned new_dimlen, int keep=1);

int GrowBy(unsigned amt);

int FreeExtra( );

void ChgGrowByInc(int gb);

};

Resizable arrays work as follows: if the array runs out of room while elements are being added, the array grows to a larger size. This is accomplished by allocating a new, larger space, for the array, copying all of the existing elements into the new space, and then deleting the old space. To avoid frequent resizing, the increment for the new size is always above a certain threshold, determined by the grow_by variable, which can be controlled by the user at construction time and through the ChgGrowByInc( ) function. If grow_by is zero, then the array can't be resized.

Realloc( ) is the workhorse function of the RVarray class:

template<class TYPE>

int RVarray<TYPE>::Realloc(unsigned new_dimlen, int keep)

//  NOTE: If an error occurs, the data is left intact, and a

//  0 is returned. Otherwise it returns a 1 to indicate success.

if (grow_by == 0) return 0;

char *new_data = new char[new_dimlen * sizeof(TYPE)];

if (new_data == 0) return 0;

dimlen = new_dimlen;

if (keep) {

if (new_dimlen < len) {

//  Array is getting shorter. We must call the

//  destructor on all orphaned elements.

CleanUpElements(new_dimlen, len);

len = new_dimlen;

//  Copy old data into new space

memmove(new_data, data, len*sizeof(TYPE));

else {

//  We're not keeping any of the data, so clean it up

CleanUpElements(0, len);

len = 0;

delete[] (char *)data;  //  Delete old storage place

data = (TYPE *)new_data;

return 1;

Realloc( ) redimensions the array to the new length of new_dimlen. The keep option flag determines whether the existing elements are to be kept (when the array grows because we're adding elements) or whether the elements can be destroyed (when we're about to replace the contents of the array with data from another array). If the new size is to be smaller, the elements to be chopped off must be properly destroyed as well. Elements to be destroyed are handled through the CleanUpElements( ) function, which makes sure the appropriate destructor is called for each element.

The RVarray class overrides both the CopyN( ) and InsertNAt( ) functions of the Varray class, in order to allow resizing. Unlike all the other types of arrays, copying to a resizable array may change that array's dimensioned length if the source array is longer. Figure 4.10 illustrates the process. After the potential resizing has taken place, the Varray version of CopyN( ) is called, as shown in the following function:

template<class TYPE>

void RVarray<TYPE>::CopyN(const TYPE *s, unsigned n)

if (n > dimlen) Realloc(n, 0);

Varray<TYPE>::CopyN(s, n);

The new InsertNAt( ) function works in a similar fashion, first by handling any necessary resizing, and then calling the Varray version of InsertNAt( ):

template<class TYPE>

unsigned RVarray<TYPE>::InsertNAt(unsigned p,const TYPE *s,unsigned n)

//  Inserts the data pointed to by s into the array. Up to n

//  elements are inserted (truncating if necesary), starting at

//  position p (counted by 0). The array will grow in size

//  if needed and if grow_by != 0. The size it will grow to

//  will be the larger of grow_by and the room_needed.

//  If p >= len, then the data is concatenated onto the end.

//  Returns number of elements inserted, or 0 if error.

unsigned room, needed;

room = dimlen - len;

if (n > room && grow_by != 0) {

needed = n - room;

if (needed < grow_by) needed = grow_by;

//  Grow the array by needed amount. Note that the

//  growth may fail, in which case the array stays

//  the same size, with data intact.

GrowBy(needed);

return Varray<TYPE>::InsertNAt(p, s, n);

Figure 4.10 Copying resizable arrays.

Although the resizing of RVarrays takes place automatically, you can manually resize the array through the GrowBy( ) and FreeExtra( ) functions.

template<class TYPE>

int RVarray<TYPE>::GrowBy(unsigned amt)

//  Grows the dimensions of the array by the specifed amt.

return Realloc(dimlen+amt);

template<class TYPE>

int RVarray<TYPE>::FreeExtra( )

//  Frees all unused space in the array.

return Realloc(len);

The code for the RVarray class is given on disk in the files rvarray.h and rvarray.mth. Two test programs are supplied: tstrva.cpp and tstrva2.cpp.

MULTI-DIMENSIONAL ARRAYS

The Array classes allow you to use any type for the elements. So what will happen if you use an array itself as the element type? The result is an array of arrays, also called a two-dimensional array. Continuing the process further, you can use a two-dimensional array as the element type for a three-dimensional array, and so on. We'll now take a detailed look at the ways we can accomplish this. Note that the techniques shown in this chapter represent just one way to create user-defined, multi-dimensional arrays. In Chapter 7, we'll look at alternative methods--in the context of two-dimensional matrices.

Note In this book, we'll denote a multi-dimensional array of n dimensions as an n-d array.

Static Multi-Dimensional Arrays

Dynamic Multi-Dimensional Arrays

Other Approaches to Multi-Dimensional Arrays

Static Multi-Dimensional Arrays

It is particularly easy to declare user-defined n-d arrays if we fix the sizes statically. This approach involves the use of nested template instantiations of the Sarray class. (Realize that we are not nesting the template definitions themselves, only instantiations of them.) In this example, we define an array type that holds three rows by five columns of integers:

typedef Sarray<int, 5> IntArr5;

typedef Sarray<IntArr5, 3> IntArr35;

Note how the IntArr5 type, used for each row of the array, is used as a parameter to the IntArr3x5 type. The layout of the resulting 2-d array is given in Figure 4.11. We've shown all of the memory used, including the virtual function table pointer for Sarray objects. Note that all of the memory cells shown are actually laid out in memory linearly, contrary to what's shown.

Figure 4.11 A 2-d array implemented with sarray objects.

In this example, we'll declare and use such an array:

IntArr3x5 myarr;

myarr[2][4] = 17;  //  Set the lower-right corner element

Take a look at the double subscripting. The first subscript a[2] causes element 2 of the array myarr to be returned. This element happens to be an array itself, and the second subscript [4] causes element 4 of this secondary array to be returned. The double-subscripting results in the following code, where we call appropriate subscript operator functions from the IntArr5 and IntArr3x5 classes:

//  First call IntArr3x5: :operator[] , then IntArr5: :operator[ ]

(a.operator[] (2) ).operator[] (4) = 17;

With this arrangement, we are able to use the same double-subscripting syntax that we would use with a built-in two-dimensional array. Here, though, we can have range checking done on the subscripts of our user-defined array--by virtue of the overloaded subscript operators.

We can take this approach one step further and define 3-d arrays from 2-d arrays. Here's a 2x3x5 integer array type and an example of using it:

typedef Sarray<lntArr3x5, 2> IntArr2x3x5;

IntArr2x3x5 b;

b [1][2][4] = 17;  //  Set lower-right corner element

Defining even higher-dimensional arrays works in a similar fashion.

You can get more sophisticated and create 2-d static arrays whose logical dimensions vary at runtime by using SVarray instead of Sarray, as in the following code:

typedef SVarray<int, 5> IntArr5;

typedef SVarray<IntArr5, 3> IntArr3x5;

IntArr3x5 a;  //  Logical dimensions start out 0x0

a.Concatenate( IntArr5 () );  //  Add a row

a[0].Concatenate (42);        //  Add a column to row 0

a[0][0] = 17;  //  Change top-left corner element

a[0][1] = 25;  //  Error : second subscript out of bounds

a[1][0] = 17;  //  Error : first subscript out of bounds

This code illustrates the difference between the allocated dimensionality and the logical dimensionality of a 2-d array. Each IntArr3x5 array has 3x5 elements allocated, but starts out with a logical dimensionality of 0x0. In effect, the array has no rows (and no columns for that matter). To use the array, we must add rows, and to each row we must add columns. Vhe total number of rows that can be added is 3, and the total number of columns per row is 5. Each row can actually have a different number of logical columns (but only up to 5), as shown in Figure 4.12. (Again, the actual linearity of the data isn't shown.) In any case, each array actually has 3 rows of 5 columns allocated for its use.

Although there is a considerable amount of power here, the allocated dimensions of the array must be determined at compile time. But what if we want to determine the dimensions at runtime? For instance, suppose we sometimes need 5x100 arrays, but at other times only 3x5 arrays? In cases like this, we may want to defer the sizing until runtime, and only allocate space for 5x100 arrays when necessary.

Dynamic Multi-Dimensional Arrays

By using a technique similar to the one used in the previous section, we can define n-d arrays that have their allocated sizes determined in part at runtime. It's quite easy to size the first dimension at runtime, but unfortunately the other sizes must be set at compile time. For example, with a 2-d array, we can determine the number of rows to be allocated at runtime, but we must fix the number of columns. This method involves using a Darray object instead of a Sarray object for the final dimension. For each row, we can use either Sarray or SVarray objects:

Figure 4.12 A jagged 2-d array implemented with SVarray objects.

typedef SVarray<int, 5> IntArr5;

typedef Darray<IntArr5> IntArrNx5;  //  Note use of Darray

IntArrNx5 a(10);  // Make a 10x5 array

IntArrNx5 b(3);  //  Make a 3x5 array

We can get even more exotic by allowing the number of logical rows to vary after the array has been constructed. For example, we could use DVarray instead of Darray:

typedef SVarray<int, 5>  IntArr5;

typedef DVarray<IntArr5> IntArrNx5;  //  Note use of DVarray

IntArrNx5 a(10);  //  Logical dimensions start out 0x0

a.Concatenate(IntArr5( ));  //  Add a row

a[0].Concatenate(42);       //  Add a column

a[1][4] = 17;  //  Subscript error!

Getting even fancier, we can use an RVarray instead of a DVarray so that we can change the allocated number of rows at runtime:

typedef Sarray<int, 5> IntArr5;

typedef RVarray<IntArr5> IntArrNx5;  //  Note use of RVarray

//  Make an array that's initially allocated 3x5, but that

//  can grow by 2 row increments.

IntArrNx5 a(3, 2);         //  Logical size is 0x0

a.Concatenate(IntArr5());  //  Add three rows

a.Concatenate(IntArr5());

a.Concatenate(IntArr5());  //  Logical size is now 3x0

a.Grow();  //  Allocate room for two more rows

Figure 4.13 shows what the memory layout of such an array might look like after adding some elements.

One thing we can't do, even using the RVarray class, is determine the number of columns to be allocated at runtime. That is, we can't implement the rows themselves with RVarray. (We could use SVarray, though.) Why not? Let's try it:

typedef RVarray<int> IntArrM;

typedef RVarray<IntArrM> IntArrNxM;

//  Starting with 10 rows, but how many columns?

IntArrNxM a(10);  //  Compiler error!

Figure 4.13 A resizable 2-d array declared with the RVarray class.

We could never declare objects of type IntArrNxM. RVarray doesn't have a default constructor, but one would be needed to set up each row. This default constructor would have to specify a default number of columns. But what should the default be? Even if we do supply a default, we're in essence fixing one dimension of the array, which puts us right back where we started.

It's interesting to note that built-in 2-d arrays have a similar problem. For example, it's quite common to pass a 2-d array to a function, where the number of rows is variable, but the number of columns is fixed:

void f(int (*a)[10], int nr)

//  Pass in an array of nr rows and 10 columns

for (int i = 0; i<nr; i++) a[i][0] = 1;  / /  Set first col to 1's

Unfortunately, it's not possible to pass a 2-d array to a function where the columns dimension is also specified by a parameter. The reason involves the way built-in 2-d arrays are implemented using pointers, a topic we'll discuss in Chapter 7.

Other Approaches to Multi-Dimensional Arrays

The inability to specify the number of columns dynamically is a major limitation to the nested template approach to declaring 2-d arrays. And even when we allow the number of rows to vary, using the RVarray class, a lot of pointers and length variables are redundant and add overhead. Fortunately, there are ways around these problems. In Chapter 7, you'll see an alternative design to 2-d arrays that uses a more compact representation, which also allows both dimensions to be determined at runtime.

USER-DEFINED ARRAYS VERSUS BUILT-IN ARRAYS

Let's step back and see how efficient the Array class hierarchy is compared to using built-in arrays. At a minimum, each array object requires space for a pointer to the array data, the array data itself, the length of the array, and a pointer to a virtual function table. Thus, the minimum space overhead, per array, is two pointers plus an integer. If you want variable-length arrays, an extra integer is required to store the dimensioned length. For resizable arrays, add another integer to store the grow_by increment. In any case, only when you need more sophistication do you pay for it.

Since each class in the Array hierarchy is actually a template, class methods will be replicated for each type of array element used. A specific Array class will be generated for any given array element type, plus one or more derived classes. With resizable arrays, four classes are generated per element type: Array<TYPE>, Varray<TYPE>, DVarray<TYPE>, and RVarray<TYPE>. The potential for code explosion does exist here, but it is mitigated by three facts:

You may not have very many different types of objects being stored in arrays for a single application.

Many of the class methods are short enough to be inlined, and often do nothing more than refer to other methods or return a data member value.

You only pay for classes you need. For instance, no RVarray<TYPE> class will be generated if you're not using resizable arrays.

Subscripting will be slower with range checking, and in tight loops the reduction in speed may be significant. For instance, sorting an array with range checking turned on is typically twice as slow as it would be with range checking off. Even with range checking off, subscripting will be slightly slower than it would be if we used low-level arrays directly. The reason: we must access the array elements indirectly through the this pointer of the Array object. A clever compiler might preload a pointer to the elements into a register before a loop is performed. Thus, the subscripting in the loop would be as efficient as possible. Unfortunately, many compilers aren't that clever.

Even though subscripting with range checking is slower, we've made it as efficient as possible by carefully designing the arrays so that the overloaded subscript operator functions are not virtual. Also, the subscript operator functions need only be specified once, in the Array class. The result is minimum speed and space overhead.

ALTERNATIVE ARRAY DESIGNS

We chose to make the base Array class a template. That means whenever we instantiate a derived class array template, at least two classes will be generated: the base class and the derived class. We could eliminate generating a base class every time by using a non-template Array class. This can be done by using a void pointer to point to the array elements, rather than a TYPE pointer. For example:

class Array  {  //  Note, not a template

protected:

void *data;    //  Point to any type of array data

unsigned len;  //  Length of array

int size;      //  Size of an array element

. . .

};

The trouble with this approach lies in array subscripting. Because the type of data isn't known, we must store the size of the array element and use that size during subscripting operations. Worse, we must use a typecast to get the actual type of the array element. This leads both to performance and maintenance problems. We might decide not to define subscripting in the Array class, instead leaving that up to the derived classes (which are templates with specified element types). But then we couldn't pass generic Array pointers or references, rendering functions that use arrays less versatile.

Go to Chapter 5 Return to Table of Contents