An array is an *aggregate *object, which is composed of other objects, its *elements. *The *dimension *of an array is the number of objects that it contains. Its *size *is given by multiplying the dimension by the size of each element.

*Topics assumed from previous book: *arrays and subscripting [3.14], array arguments [5.4].

In C, the declaration of one variable, such as line in

static char line[10] = "msg";

consists of five components:

static storage class (optional)

char type specifier

line[10] declarator

= "msg" initialization (optional)

; semicolon (of course)

Strictly speaking, the storage class and the type specifier can appear in any order, but declarations are clearer in a conventional order:

**Rule 3-1: **Storage class (if any) should precede the type specifier.

The examples will henceforth assume this rule.

For another variation, several variables can be declared in one declaration:

static char line_a[10], line_b[10];

static char line_a[10];

static char line_b[10];

It is a common error to assume that the following initialization applies to both line_a and line_b:

static char line_a[10], line_b[10] = "msg";misleading

**Rule 3-2: **If a variable has an initialization, its declaration should have a source line to itself.

Setting aside the above complications, we will consider a declaration to consist of an optional storage class, a type specifier, a declarator, and an optional initialization. Now let us restrict our consideration to the non-optional parts, the type specifier and the declarator. For the rest of this section, when we refer to a "declaration," we will be referring only to the mandatory parts.

short i;

short m[2];

the declarator m[2] contains the variable name m plus a part of the type. The type consists of what is left when the variable name is "scratched out" of the declaration, namely short [2]. In this sense, we can say that a declaration is a "sandwich" composed of a variable name and a type. The type is the "bread" wrapped around the variable name "meat" in the middle:

short m[2] is the declaration

m is the name

short [2] is the type

In C, the representation of an array consists of a sequence of contiguous objects (the *elements *of the array), running from low addresses to high addresses.

The meaning of an array declaration depends upon context. A parameter (typically declared with no stated dimension) is understood by C to be a pointer rather than an array (see Section 3.4). And a non-defining declaration of an external array, such as

extern short em[];

is an allowable form which declares an array of unknown size. (The size is determined by the defining instance, but a separately compiled function has no built-in way of determining that size.)

For an array whose size *is *known from the declaration (externally defined, or declared to be auto or static), we can define a useful macro for the dimension:

#define DIM(a) (sizeof(a)/sizeof(a[0]))

This book's system of properties uses different names for aggregate properties than for scalar properties; this will allow us to say some simpler things about pointers later. An array is either *complete *(all scalar elements contained within the array are defined) or *incomplete *(one or more elements are not defined).

Some arrays have other properties that are more important than "complete" or "incomplete." A "string," for example, requires defined characters only up to a nul (i.e., '\0') terminator. The characters after the nul terminator can be total garbage, and the array still has its defining property satisfied, namely being a string.

char s[10]; /* : string */

For any array, there are boundary values to be considered at the lowest and highest subscripts. These will always be important in determining valid access ("avoiding out-of-bounds reference"). Every array algorithm requires verification of the proper handling of the subscript boundaries.

A second issue concerns the "one-too-far" value required to terminate an ordinary for loop that walks over an array. As discussed in Section 2.7, if subscripts can be unsigned integers, there is always a valid "one-too-far" subscript value at the high end, but none at the low end.

In many cases, the defining property for an array is more restrictive than mere completeness. Let us now consider an array as an object to be *sorted. *In doing so, we will need a notation for a *subarray, *i.e. a contiguous sequence of elements of an array. The notation a[low:high] will be used to mean the sequence of elements from a[low] to a[hi]. And the notation a[*] will mean "the entire array a." There are, of course, no such notations in C itself, but the notations will help us to speak more clearly about array algorithms.

To focus for now upon the general problem of sorting the elements of an array, the basic invariant property of such an array is that it contains individual-element values, the set of which remains unchanged from the start to the finish of the sorting process. In other words, the set of values remains invariant, although the positions of those values will be altered as the sort progresses. This is analogous to the sorting of a deck of cards; the values written on the cards do not change, but the ordering does change. To be precise, the array is a "multiset," which is like a set but individual values can occur more than one time. A deck of bridge cards is a proper set; each card appears only once (considering suits as distinct, as of course they are in bridge). A deck of Canasta cards is a multiset; each card appears several times in the deck.

We need to define one more property of an array. An array a is said to be *sorted in ascending order *if, for each i from 1 to DIM(a)-1, a[i] is greater than (or equal to) each element of a[0:i-1]. (An array containing only one element will be considered to be sorted, trivially.) More generally, a[low:high] is sorted (in ascending order) if, for each i from low+1 to high, a[i] is greater than (or equal to) each element of a[low:i-1].

The only allowable operations upon multiset arrays are those that perform a re-ordering without altering the actual values in the array. We have seen one such operation, in the form of the SWAP macro. Now we will examine the operation of *sorting an array*. For simplicity, we will restrict ourselves to an array of *short *integers. The "sorting" operation will take the form of a function with two arguments: an array parameter a designating the array to be sorted, and an integer n designating the number of elements in the array to be sorted. The function is supposed to arrange the n elements of a such that a[0:n-1] is sorted.

Assertions and run-time validation

Assertions and run-time validation

In designing an algorithm to accomplish an operation such as sorting, the key design principle is to formulate a loop invariant which is true at each step of the computation, and which guarantees that the loop has attained its goal when the loop terminates.

The first sorting algorithm we will consider is called *insertion sort. *Using i as the loop subscript, the invariant of the insertion sort is "the subarray a[0:i-1] is sorted." The loop terminates when i reaches the "one-too-far" value, n. When it does, the invariant becomes "the subarray a[0:n-1] is sorted," which is the goal of the loop.

We will use the notation "*name => property*" to mean "*name *now has this *property.*" Describing the loop invariants in program comments is often very useful documentation; we could, for example, write

for (i = 1; i < n; ++i)

{

/* a[0:i-1] => sorted */

inserta[i]into its correct place

}

/* a[0:n-1] => sorted */

To insert a[i] into the correct place, we copy it into a temporary (t), thus freeing up the location a[i]. Then, for each element of a[0:i-1] which is greater than t, we move it one position to the right, then insert t into its ordered position. (This is the insertion sort algorithm from Bentley [1984a].) Implemented as a function, this is what our algorithm looks like:

isort:

/* isort - sort array a (dimension n) using insertion sort

*/

#include "local.h"

void isort(a, n)

short a[]; /* array to be sorted; a[0:n-1]: multiset */

index_t n; /* dimension of a */

{

index_t i;

index_t j;

short t;

for (i = 1; i < n; ++i)

{

/* a[0:i-1] => sorted */

t = a[i];

for (j = i; j > 0 && a[j-1] > t; --j)

a[j] = a[j-1]; /* a[0:n-1] => !multiset */

a[j] = t; /* a[0:n-1] => multiset */

}

/* a[0:n-1] => sorted */

}

We have placed comments in our code that indicate the intended invariant conditions, but unfortunately in real life, errors in creating and maintaining the program may cause such comments to be false and misleading. A more reliable way of stating such properties is to insert *executable assertions* in the code.

tst_sort:

/* tst_sort - returns YES if array a is sorted

* specialized "short" version

*/

#include "local.h"

bool tst_sort(a, n)

short a[];

index_t n;

{

index_t i;

if (n <= 0)

return (NO);

for (i = 1; i < n; ++i)

{

/* a[0:i-1] => sorted */

if (a[i] < a[i-1]) /* compare adjacent elements */

return (NO);

}

/* a[0:n-1] => sorted */

return (YES);

}

According to our guidelines, we would not test this assertion inside the loop of isort. The time to test the assertions would become many times greater than the time to execute the code itself. But we could replace the final comment in isort.c

/* a[0:n-1] => sorted */

with this executable assertion:

asserts(tst_sort(a, n-1), "a is sorted");

Now we will consider the same underlying data structure, with a new algorithm, "quicksort."

For consistency, we will create a function with a calling sequence similar to that above:

void qksort(a, n)

short a[];

index_t n;

Given an array or subarray, we pick the middle element and call it the *pivot.* Then the array is rearranged: those elements less than the pivot are moved to the left of the pivot, and those greater than the pivot are moved to its right. For convenience, we start by swapping the pivot with the initial element. The lowest subscript is lo, the highest subscript is hi, the subscript of the rightmost "less-than-pivot" element is lastlo, and the subscript of the element currently being tested is i. (This quicksort is somewhat simpler than most; it is adapted from Bentley [1984a].) During the loop of the sort, the invariant looks like this:

When the loop terminates, this is the picture:

Swapping a[lo] with a[lastlo] gives

Thus we end up with two smaller subarrays, a[lo:lastlo-1] (all of whose elements are strictly less than a[lastlo]) and a[lastlo+1:hi] (all of whose elements are greater than or equal to a[lastlo]). If the array is trivial (only one element), then both subarrays are empty; there is no work to do, and the function returns immediately. If there is more than one element, at least one of the subarrays is non-empty. The function calls itself recursively for both subarrays. When it returns, the array a[lo:hi] is sorted.

There is little to add here except to assert tst_sort after completion of the function. The function is so short that assertions during the algorithm would be needlessly complex in comparison.

qksort:

/* qksort - sort array a (dimension n) using quicksort */

#include "local.h"

/* iqksort - internal function to partition a[lo:hi] */

static void iqksort(a, lo, hi)

short a[]; /* array to be sorted; a[lo:hi]: multiset */

index_t lo; /* lowest subscript */

index_t hi; /* highest subscript */

{

index_t mid = lo + (hi-lo)/2; /* : {lo:hi} */

index_t i; /* : {lo+1:hi+1} */

index_t lastlo; /* : {lo:hi-1} */

short tmp;

SWAP(a[lo], a[mid], tmp);

lastlo = lo;

for (i = lo + 1; i <= hi; ++i)

{

if (a[lo] > a[i])

{

++lastlo;

SWAP(a[lastlo], a[i], tmp);

}

}

SWAP(a[lo], a[lastlo], tmp);

if (lastlo != 0 && lo < lastlo - 1)

iqksort(a, lo, lastlo - 1);

if (lastlo + 1 < hi)

iqksort(a, lastlo + 1, hi);

}

/* qksort - external entry point */

void qksort(a, n)

short a[]; /* array to be sorted; a[0:n-1]: multiset */

index_t n; /* number of elements */

{

if (n > 1)

iqksort(a, 0, n - 1);

asserts(tst_sort(a, n), "a is sorted");

}

The computation of mid is chosen to avoid a possible overflow. In the simpler expression

(lo + hi) / 2

the sum is susceptible to overflow for large values.

In Plum and Brodie [1985], optimization of this function is discussed at length. It is important to note that this recursive version of quicksort is vulnerable to some array values that cause the depth of the recursion stack to almost reach the number of elements in the array.

*Topics assumed* *from previous book*: string [2.4].

In C, a *string* is an array of char objects, containing a nul terminator '\0' that marks the end of the current string value. A *string constant* is written with double-quotes, like

"hello, world\n"

C does not require that every array of char'S must be a string; indeed, the only built-in knowledge of "string" semantics is that string constants receive a nul terminator. Nonetheless, a char array is so frequently used to hold a string that we will identify a special property to signify a nul terminated string: an array has the *string* property if all the characters up to the first nul terminator are defined.

char a1[2] = {'a', 'b'};

is not a string, because there is no nul terminator within a1. On the other hand,

char a2[2] = "a";and

char a3[2] = {'a'};

are both strings, according to the properties of string constants and of static array initializers.

*Topics assumed from first book:* pointers as function parameters [7.3], pointers and arrays [7.4].

In this book, we will follow the usage of declaring pointer parameters to scalars with the "asterisk" notation and declaring pointer parameters to arrays with the "empty brackets" notation. Thus the function definition

fn(a, b)

char *a;

char b[];

fn(table)

char table[] [20];

fn(table)

char (*table)[20];

Relying upon the foregoing context, we will henceforth refer to *array parameters* when talking about pointer parameters which point to the initial elements of arrays. Similarly, *string parameters* are pointer parameters which point to the initial char in a string.

Further generalizing, we will use the term *array pointer* for any pointer used to point to array elements. (The term "pointer into array" might be more precise, but the brevity of "array pointer" is worthwhile because the term will be used so often in discussing array algorithms.) Remember, it would be inaccurate to call such a pointer a "pointer to array" -- an array pointer points to single elements, not to an entire array.

In order to talk sensibly about the object being referenced we will need some new terminology. Because of the way C references arrays through pointers, the array being accessed is not simply the "indirect object" of the pointer (*p), because *p is just the individual array element pointed to by the pointer. We define the *target* of an array pointer to be the array object being accessed through the pointer. We will use the notation p[*] to mean "the entire array to which the pointer provides access."

int i;

int *p = &i;

printf("%d\n", p[12000]);wild and dangerous usage

short a[10];

short *pa = a;

the indirect object of p (*p) is the element a[0]. The target of (p[*]) is the entire array a.

For single-dimension arrays, there is no ambiguity; in the example

short a[10];

short *pa = a;

it is clear that a is intended to be the target of pa. But what of this example:

short b[10] [10];

short *pb2 = b[2];

One concession to common usage of C: The assignment

p = &a[n];

is taken to mean the same thing as

p = a + n;

which means that the entire array a is the target of the pointer.

One further notational convenience: The properties of the target can be attributed to the pointer itself. For example, "pointer p is a string" means that the target of p is an array with the "string" property. This allows more conventional and convenient documentation:

/* p: string */

/* p[*]: string */

What are the *practical* implications of all these distinctions? There are two: determining the properties of the target of the pointer, and providing the possibility of range-checking the use of the pointer (i.e., avoiding "run-away pointers").

char *strcpy(s1, s2)

char s1[]; /* s1:!NULL, sizeof(s1[*]) > strlen(s2) */

char s2[]; /* s2:string */

Section 2.6 discussed how scalar data objects could have a range associated with their declaration; the range lasts throughout the lifetime of the variable. For pointers, the situation is different; a pointer can be used to point into several different arrays as the computation progresses. Therefore, we will consider that the range of a pointer variable is dynamically set by each operation upon the pointer.

When a pointer is assigned an expression value involving an array target, the address range of the target determines the range of the pointer. When a pointer is assigned an expression value involving a pointer, the left-hand-side pointer acquires the range of the right-hand-side pointer. (By the rules of C, the right-hand-side can have only one array or pointer operand, so the source of range information is unambiguous.)

Before we say exactly what the range of the pointer is, we have to consider the possible "one-too-far" value of an array pointer. In a loop like

for (p = a; p < a + DIM(a); ++p)

process*p;

If a pointer receives a value outside its range, we will consider that pointer to be *undefined*.

static char m[10][10] = {0}; /* hypothetical address range: 2000:2099 */

char *p;

p = m[0]; /* p == 2000; range {2000:2010} */

/* p can access all of m[0], plus "one-too-far" */

p = (char *)m; /* p == 2000; range {2000:2100} */

/* p can access all of m, plus "one-too-far" */

p = m[2]; /* p == 2020; range {2020:2030} */

/* p can access all of m[2], plus "one-too-far" */

p = p + 1; /* p == 2021; range {2020:2030} */

/* value of p changes, but not its range */

++p; /* p == 2022; range {2020:2030} */

/* again, value changes, but not range */

p += 8; /* p == 2030; range {2020:2030} */

/* p is now "one-too-far"; valid value, */

/* but cannot be used for indirection */

++p; /* p is now out-of-range; "undefined," strictly */

These rules for the range of pointers are, at the least, useful guidelines for the writing of reliable programs. They also can be enforced by compilers and interpreters which do dynamic bounds-checking [3-1]. They reflect the actual practice of working C programmers who write reliable programs that do not make out-of-bounds references in their use of pointers. The existence of precise rules such as these means that C is a language which can have both the expressive freedom of pointers plus the reliability of bounds-checking.

strfit:

/* strfit - copy s2 to s1, chopping to n chars

* return YES if sufficient space, no if not

*/

#include "local.h"

bool strfit(s1, s2, n)

char s1[];

char s2[];

size_t n;

{

size_t i;

for (i = 0; i = n-1 && s2[i] != '\0'; ++i)

s1[i] = s2[i];

s1[i] = '\0';

return (s2[i] == '\0');

}

An array of strings can be a handy format for the storage of simple tables. Consider

static char meas_codes[][6] =

{"each", "box", "lb");

If we have a string to look up in this table, we can use a simple loop like this:

for (i = 0; i = DIM(meas_codes); ++i)

if (strcmp(s, meas_codes[i]) == 0)

found a match