Pointers are both powerful and dangerous in C. Programs can perform arbitrary manipulations of their data space in C. This is essential for many of the system-level uses of C. Because C has this expressive power, a style of programming oriented toward portability and reliability must place restrictions on the uses of pointers.
Topics assumed from first book: Pointers [7].
The first rule for reliable use of pointers requires that the programmer fully understand the declaration syntax of C:
Rule 4-1: In each pointer assignment, the right-hand-side value must have exactly the same ("converted") pointer type as the left-hand-side.
Recall from Section 2.9 that the "declared" type of the right-hand-side may differ from the "converted" type. In the assignment
the "declared" type of s is char [10] and the "converted" type is char *, so the assignment is proper. Here, by contrast, is an improper assignment:
On some systems, pointers to char have a different representation than pointers to int. On many systems, the address of s could be an odd number which would cause a hardware trap if we attempted to reference an int at that location. These are some of the reasons why compilers enforce the agreement of pointer types.
We will, therefore, take some time now to attend to the exact form of pointer declarations in C.
As we saw in Section 3.1, a declaration in C is a "sandwich" of variable name and type; the declaration
says that the variable m has type short [2] (array of 2 short's).
Turning now to pointers, the asterisk (*) when used as a declarator-symbol means "pointer-to." Thus,
declares pc to be a pointer to objects of type char, or as it is more commonly said, a pointer to char. (The phrase "pointer to type" really means "pointer to an object of type type.") According to the "sandwich rule," the declaration is composed of the variable name (pc) and the type (char *). For another example,
declares the name ppc to have the type char ** (pointer to pointer to char):
Another way of stating the "sandwich rule" is to list the possibilities:
One important aspect of the declaration rules of C is that each declarator-symbol (* [] ()) has a "precedence" that is strictly analogous to the precedence of the same symbol used as an operator. In other words, the parentheses and square brackets "bind tighter" in declarations than the asterisk does. As these examples get more complicated, we expand our interpretation of the "sandwich rule" to consider the type of expressions, not just names, like this:
Since the square-brackets binds more tightly than the asterisk, the sandwich rule shows that a[i] is a char *. Or, in other words, a is an array of char *'s. Here is the contrasting case:
Here the parentheses cause the declarator *pa to be taken as a unit; the declaration says that *pa is an array of N char's. That is, pa is a pointer to an array of N char's.
Question [4-1] Just to be sure that you understand the "sandwich rule," jot down the variable name and type for the following declarations.
Having discussed the declaration of pointers, let us turn to some of the operators associated with them. Taking first the "address-of" operator (unary &), we note that it can be applied only to things that have an address, namely lvalues that are not declared with register storage class. (Bit-fields, which we have not yet discussed, are also not addressable.) If c is a char variable, then the expression &c evaluates to the address of c. The type of the expression &c is char * (pointer to char). To be more general, if an lvalue x has type T, then the expression &x has the type T *.
Thus, &c is a pointer expression, producing a value that is suitable for assignment to a pointer of type char *, such as pc.
Let us turn back to initializations for a moment. In any initialized declaration, the variable name designates the object that is being initialized. Thus in the declaration
what gets initialized is the variable pc, not the declarator *pc. We could emphasize the point by writing the declaration as
which is fine as far as the compiler is concerned, but rather nonstandard for human readers. The types are correct for the initialization; the name pc and the expression &c have the same type, char *. It would, however, be a type mismatch (and a serious bug) to write
The type of c is just char, and a char expression should not be assigned to a char * variable.
For a more complicated example, consider
Here the pointer pc is being initialized to point to the character named c. And ppc is being initialized to point to the pointer named pc. A picture may help; hypothetical address values are being used for clarity:
What is the type of the expression &pc? The type is char ** (pointer to pointer to char), by the rules above. And this is an appropriate type to assign to ppc, because its type also is char **.
Now we will consider the indirection operator (unary *). This is, of course, the same character (asterisk) which means "pointer to" in a declaration. It has a different, but very compatible, meaning when used as "indirection" in an expression. Using the same example above, the expression *pc means "the thing that pc points to," or "indirect pc," for brevity. Assuming the initialization given above, *pc is an lvalue which could be used interchangeably with the variable c.
If an expression x has the type T *, then the expression *x has the type T. (Adding an asterisk to an expression removes an asterisk from its type.) Therefore, the type of *pc is simply char; *pc is a char lvalue. We can properly assign a char value to *pc, just as we could assign *pc to a char variable.
Similarly, *ppc has the type char *, and (with the initialization above) could be used interchangeably with the variable pc.
One reason why agreement of pointer types is so important is that in several implementations of C, pointers to different types have different internal representations, sometimes even different sizes. In these environments, it is especially important to be sure that all uses of pointer types follow the agreement rules strictly.
Sometimes we have a need to print the numeric value of a pointer itself, e.g. for debugging purposes. ANSI C envisions a new printf format for printing pointer values [4-2]. In the meantime, we suggest a macro PR_PTR(p) which will print pointer p in a fashion which will work in almost all current implementations.
If we were to use this macro to print the two pointers declared above, we could do it like this:
And the output would look like this:
Consider once again
Assuming the same addresses as before, the memory looks like this after the initialization:
We emphasized earlier that it is vital to be clear about the type of pointer variables and expressions. It is also important to be clear about their values.
In high-level languages like C, determination of the value of an expression depends upon whether the expression is an lvalue (something with storage). The value of an lvalue is found by going to the storage and loading the value that is found there. Thus in the example above, the value of c (assuming ASCII) is 49, the numeric value of the constant '1'. Similarly, the value of pc is the value that is found in the storage for pc, namely 104. And the value of ppc is currently l12.
On the other hand, an rvalue (an expression that is not an lvalue) simply is a value: the value of '1' is 49, assuming ASCII. This concept of value is regular and straightforward. It is the the concept that is followed in most high-level languages, including Basic, Fortran, and Cobol. It is, however, different from the behavior of assemblers, in which the value of a symbol is its address.
Consider the expression &x, where x is some lvalue expression. The expression &x is an rvalue: it has a value, but no storage for itself. Thus the value of &pc is (in this example) l 12, the address of pc. The value of &c is 104.
If x is an expression of type "pointer to something," the expression *x is an lvalue, and its value is found in the storage that x is pointing to. Thus, *pc has the value 49 (assuming ASCII).
Question [4-3] Not all of the following "expressions" are valid. For each valid expression, write its value. Write "INVALID" otherwise.
Question [4-4] The function call scanf( "%c", &c) will read one character into c. What is the value being passed as the second argument to scanf?
______
Which of these expressions would accomplish the same result?
In the Question above, the scanf function was told where to store something by being passed a pointer value (i.e. an expression of type "pointer to char"). We now look at short_order, a function which also expects pointer values for arguments:
If the main program contained these declarations
we could call short_order in several ways:
All of these would accomplish the same result.
But short_order(i1, i2) is incorrect, because the values passed to short_order must be pointer values (specifically addresses of short integers).
C is indeed a "call by value" language, but the arguments may be pointer values which tell the function where something is located.
Having discussed the types and values of simple pointers, we now turn to their semantic properties. A NULL pointer is one which contains the unique NULL value. A pointer becomes NULL when the integer zero is assigned to it. A NULL pointer always compares equal to integer zero. An undefined pointer is one that is not NULL, and yet is not pointing to an object of a compatible type (e.g., "garbage," uninitialized). Thus, a defined pointer is either pointing to valid storage, or else is NULL. And when we say that a pointer is non-NULL, we imply that it is defined. So a non-NULL pointer is one that points to valid storage. We can illustrate the importance of these concepts by considering various errors in calling short_order.
Undefined pointers: Suppose we invoked short_order like this:
Then, without initializations or assignments, the values of ps1 and ps2 are undefined (uninitialized, "garbage"). We have, in effect, asked short_order to rearrange the contents of two random memory locations!
pointers: Suppose instead that we invoked short_order like this:
In the short_order function, the local variables pi and pj will receive the value NULL from the calling function. We then encounter a problem evaluating *pi and *pj. The problem is that NULL is not a proper value for indirection, because a NULL pointer does not point to any valid object. A strict checking environment could complain; most will attempt to access memory location zero, producing unpredictable results. So NULL pointers are also no good for the arguments to short_order.
The pointer parameters of short_order are used both to access already-existing values and also to modify the storage that is pointed to. We can refer to such pointer parameters as in-out-pointer parameters. (An in-pointer would be used only to access existing values, but not to modify anything. An out-pointer would be used only to modify something, but not to examine any existing value.) Let us define an incomplete pointer as a non-NULL pointer whose target contains one or more undefined scalars. In other words, an incomplete pointer is being used to access an object which is not totally defined. A complete pointer, then, is one whose target is totally defined. It is all right for an outpointer to be incomplete, since the called function does not care what is already in the pointer's target. The safest default assumption regarding in-pointers and in-out-pointers is that they must be complete, since they must access the values that are in their target. In the short_order example, both pointers must be complete in order for the function call to be reliable.
It is an important part of the definition and documentation of a function to indicate what it expects of its pointer arguments.
Rule 4-2: The default requirement for pointer parameters is that they must point to storage that is entirely defined. Whenever a pointer parameter can accept something else, this should be explicitly stated on that parameter's declaration comment.
There are further requirements for reliable programming with pointers. A pointer can become undefined if the object that it is pointing to should "disappear" during the lifetime of the pointer. Here is a simple example.
The event which causes the pointer pi to become "undefined" is the return from function f1. The specific term for a pointer in this undefined state is a dangling pointer; the pointer points to storage no longer allocated. It is the programmer's responsibility to be sure that the program contains no instances of dangling pointers. (There are, however, techniques by which a strict-checking environment can detect dangling pointers [4-5].)
Rule 4-3: A function in which the address of an automatic variable is assigned to a non-automatic pointer must contain a comment to that effect. In any function with such a comment, each return from the function is an event requiring verification that no dangling pointers are left.
Further consideration of dangling pointers with respect to the free function will be found in Section 7.1.
Remember that an "array pointer" (i.e., "pointer into array") is one used to access an array, not a "pointer to array." When declared as a parameter, an array pointer can be declared either with an asterisk or with empty brackets: char *s and char s[] are equivalent from the compiler's point of view. To make distinctions for the human reader, however, we can use char s[] to indicate an array pointer as opposed to a pointer to scalar. (However, a non-parameter pointer must be declared with the asterisk.)
Several languages provide string variables, and the compiler automatically keeps track of their current length. C does not. Loosely speaking, a string in C is a nul-terminated array of chars. More strictly, string is a particular property for a character array or character array pointer, namely the property of being nul-terminated. And a string constant, of course, always has the property string. Henceforth, when we use the noun "string," we are referring to a character array pointer, a character array, or a string constant, each with the property string.
There are advantages and disadvantages from this scheme in C. The main disadvantage is that more responsibility lies with the programmer for correct handling of strings. One advantage is that the compiler and the generated code are both simpler than in languages with built-in strings. Also, any appropriate region of memory can be treated as a string (which is useful in low-level programming).
Let us look at the string functions in the standard library, with an eye to the reliability requirements of each function.
The parameter s must be a string. If it is not, most implementations of strlen will wander through the memory looking for a nul terminator. The "length" of a string, as always in C, is the number of characters prior to the nul terminator; thus,
returns the value 5.
The parameters s1 and s2 must both be strings. The result is less than zero, zero, or greater than zero according to whether string s1 compares less than, equal to, or greater than string s2. Thus
returns a positive number.
The result cannot be guaranteed to be portable if either string contains a character that is not in the machine's native character set, since the comparison may be a signed comparison on some machines. For example,
will yield a positive result if characters are signed, and a negative result if characters are not signed. Even for characters in the native character set, the collating sequence may be different on different machines.
The s2 parameter must be a string. The s1 parameter must specify some storage into which the string can be copied, so it must be non-NULL. In general, strcpy should be used only when the program makes certain that the string length of s2 is less than the size of the array being referenced by s1. Otherwise, an out-of-bounds assignment will result.
The function strcpy returns the (original) value of s1, the address where the characters have been stored.
The parameter s2 must be a string. The parameter s1 must be a string and its target (the storage that it accesses) must be larger than the sum of the string lengths of both parameters.
The parameter s1 is returned (the address of the new catenated string).
Invoking
will return a pointer to the second character in the string. If there is no occurrence of character c in string s, strchr returns a NULL pointer. Thus, the returned value is either a string (pointer) or a NULL pointer.
The behavior is the same as for strchr, except that the a pointer to the last occurrence (or ) is returned. Thus,
will return a pointer to the third character of the string.
Parameters s1 and s2 must both be strings. The function returns a pointer to the first occurrence in s1 of any character in s2. Thus
will return a pointer to the fourth character of the first string. If no characters from s2 are in s1, strpbrk returns a NULL pointer.
Both parameters must be strings. The function returns the length of the initial segment of string s1 which consists entirely of characters from string s2. Thus
will return the number 2.
Both parameters must be strings. The function returns the length of the initial segment of string s1 which consists entirely of characters not from s2. Thus
will return the number 3.
This function splits string s1 into "tokens," more specifically, strings of characters delimited by occurrences of the characters in string s2. Calling strtok with a non-NULL argument will produce a pointer to the first token. Calling strtok with a NULL first argument will cause it to point to the next token from s1. Each token receives a nul terminator. Here is an example:
Three output lines will be printed:
The returned value is a string pointer, or else NULL.
One uses string comparisons so often in C that macros are sometimes useful. The local.h header defines three:
Besides being easier to type, the macros allow the possibility of various optimizations of the comparison. For example, in many classes of problems, strings usually differ in their first character. Each macro could be revised, to optimize speed at the expense of code space, something like this:
An array can be treated with semantics other than "string," if that is more convenient. For example, an alternative property for an array of char's is a property known as nul-padded: either the array is completely filled with (defined) non-nul characters, or else all characters after the first nul are also nul.
Similarly, one can maintain a blank-padded array: the array is completely filled with (defined) non-nul characters, of which the final sequence of blanks is considered "padding." Blank-padded arrays are often used in data-processing contexts where C is interfaced with other business-oriented software.
The string library contains the functions strncat, strncmp, and strncpy, which involve yet another array property. This one lacks a conventional name, but we can call it the strn property: An array with this property is either complete up to a nul-terminator, or complete up to the end of the array. In other words, if there is space for a nul-terminator, then one is present. Otherwise, the entire array is filled with (defined) non-nul characters. Here are the descriptions of the strn functions:
strncat copies at most n characters of the array s2 onto the end of string s1. Fewer than n characters are copied if a nul terminator is encountered first in s2. The resulting contents of s1 are always nul terminated, and the address of the resulting string is returned.
The arrays s1 and s2 are compared. The comparison stops when corresponding positions differ, or a nul terminator is reached, or all n characters of both arrays are equal. The returned value is less than zero, zero, or greater than zero according to the sense of the comparison.
The contents of array s2 are copied into array s1. At most n bytes are copied. If the first n bytes of s2 include a nul terminator, the remainder of s1[0:n-1] is nul-padded. Otherwise, s1[0:n-1] is complete but not a string (no nul terminator).
Recent versions of C also include some new functions called the "memory" functions. These operate upon arrays of bytes with no regard for any special meaning of nul. In some environments this allows a more efficient implementation, since some machines have "block move" and "block compare" instructions.
These functions make use of a new concept in recent C, the generic pointer. This is a pointer which is much like a char * in that it can reference any particular byte, but its type allows it to be a "pointer to any data." Thus, you can assign any kind of data pointer to a generic pointer, or assign a generic pointer to any kind of data pointer. No pointer operations are allowed upon generic pointers, however. They are mostly useful for passing pointers to functions that can appropriately accept "pointer to any data." Inside the function, it is necessary to copy the generic pointer into some other pointer before doing pointer operations. In (draft) ANSI C, the generic pointer is a void *, but there are good reasons to use a defined type name instead. We will use the name data_ptr for the generic pointer.
One important reason to use a defined name such as data_ptr is that non-ANSI compilers will not know what a void * is. In these environments, we will need to define data_ptr as char *.
For maximum portability in non-ANSI environments, we should also cast each assignment (or argument passing) when going to or from generic pointers. This, of course, is unfortunate, since the original reason for generic pointers was to avoid so many nuisance casts. (If your compiler does not have lint-like checking of function arguments versus parameters, and it provides the same representation form for all varieties of pointers, you could safely eliminate the casts.) As long as we are still putting casts on function arguments, they might as well be (char *) casts, since this will work fine in both old and new environments.
Here, then, is the description of the memcpy function, using generic pointer notation. (In order to refer to the contents of the target of a generic pointer, we will refer to it as if it pointed to an array of char'S.)
The array s2[0:n-1] is copied into s1[0:n-1]. The pointer s1 is returned.
All the "memory" functions are easy to implement in portable C, if your implementation does not already have them in the library. Here, for example, is a version of the memcpy function:
For an example using memcpy, suppose that your compiler does not have structure assignments. You can efficiently copy one structure into another like this:
An even handier way to handle structure assignments is to define a macro:
This way, if your compiler does support structure assignments, you can change the definition like this:
The macro STRUCTASST is defined in the local.h file described in Chapter 1.
Here are the definitions of the other "memory" functions:
The array s[0:n-1] is searched for a match to the byte specified by c. If one is found, a pointer to the matching byte is returned; otherwise, NULL is returned.
The arrays s1[0:n-1] and s2[0:n-1] are compared byte-by-byte. If they are identical, zero is returned. Otherwise, a number less than zero or greater than zero is returned, according to the sense of the comparison.
All the bytes of s[0:n-1] are filled with the byte value specified by c. The pointer s is returned.
One often has the problem of converting an algorithm that uses subscripts into one that uses pointers (usually for greater efficiency). This works best when each subscript is used sequentially (with increments or decrements). Each subscripted reference like a[i] becomes a pointer reference like *p_a. Each increment or decrement of a subscript becomes an increment or decrement of a pointer instead. Instead of comparing subscripts, we will compare pointers.
If, for example, we converted the qksort function of Section 3.2 to use pointer references instead of subscript references, it would look something like this:
Nothing has changed in the algorithm. All the invariants are the same as before. We have just replaced the subscripts with pointers.
In C, a multi-dimensional array is really an array of arrays: if scores is declared via
scores consists of two elements, each of which is an array of three short integers. If, for instance, we were to typedef the name trio to stand for an array of three shorts, then scores could equivalently be declared to be an array of two trios.
As we saw earlier for pointers, there is a complementary relationship between array operations and array declarations. The "sandwich" rule tells us that the type of scores is short [2][3]. The same rule says that the type of the array scores[0] is short [3].
This becomes of particular importance when passing multidimensional arrays to functions, because what actually gets passed to the function is the address of the initial subarray. Consider a function weight which accepts n-by-three arrays of scores, and weights each column by one, two, and three.
The weight function should be passed the address of an array of three-element arrays. By C pointer parameter rules, the parameter declaration
is understood as
In other words, a is actually a pointer to three-element arrays. (The first form indicates the usage more clearly, but the second form reveals more clearly the nature of the parameter a.) Using the array scores above, we could call the function like this:
or like this:
The two forms are equivalent, and the address of the array scores[0] is what gets passed. As discussed in Section 3.4, the range of pointer values that is associated with the argument is {&scores[0][0]:&scores[2][0]}, that is, all the elements of scores, plus the "one-too-far" address.
Note that in the called function, the compiler must be told the bounds of all dimensions except the major dimension. If the function needs to know the major dimension, it must be passed explicitly, as in weight above. Just as with single-dimensional arrays, there is no way to induce C to reveal the major dimension to the called function. It is certainly a mistake to use sizeof(a) to infer the dimension; sizeof(a) is only the number of bytes in a pointer, and has no relation to the size of the actual array.
An array of pointers is a compact way to store a table of strings. In Section 3.5 we saw a table that looked like this when stored in a two-dimensional array:
We could store the same information using an array of string pointers:
The ("declared") type of meas_codes[i] is char *, i.e., pointer to char. Each pointer meas_codes[i] is thus a proper argument to strcmp in the same loop that we used before:
A very simple lookup operation like this one sometimes has its practical uses. Recent C compilers provide a function called getenv ("get environment") which searches in a system-specific environment for a given name and returns a pointer to the definition associated with that name. One of the most common uses is as a way for a program to determine what type of terminal is being used:
will give us the answer, if the information is available. And if we were porting a program to a machine with no operating system, and hence no "environment," the getenv function would be one of the easiest ones to write, especially if all the "environment" definitions were constant:
The function assumes that the array env_names will contain a NULL pointer as its last entry. This is a very common technique, since it eliminates the need for a separate variable to communicate the dimension of the array.
Topics assumed from previous book: command line arguments [7.8].
The arguments to the main function are passed as an array of string pointers, along with a count which tells the number of non-NULL pointers in the array. Both the count and the array pointer are parameters of main, so their names can be whatever you like. The count is conventionally named argc or ac, and the array pointer is conventionally argv or av.
As with any array, the array accessed through av can be traversed either with a subscript or with a pointer. However, most programmers seem to make fewer mistakes when they use a subscript, as in the following function:
If we were to invoke args from the command line like this
most systems would pass the quoted string as one argument, giving this printout:
In most implementations, each function has an address, just as data objects do. Thus a pointer-to-function (or "function pointer") is typically an address-sized variable which can contain the address of a function. Declaring that pfn points to a function is equivalent to declaring that (*pfn) (the thing that pfn points to) is a function:
says that (*pfn) is a function returning double, or that pfn is a pointer to a function returning double. The parentheses around *pfn are needed; the declaration
says that fnpd returns a pointer to double.
No operators are defined for function pointers, except for assignment, comparison, and (of course) indirection. And the only things that can be assigned to function pointers are the addresses of functions having the appropriate type:
The address-of operator can be omitted; indeed, some current compilers prohibit it. We can write
This is the more familiar style, once you understand that it is the address of strcmp that is being assigned to cmpfn.
The only useful thing to do with a function pointer is to call the function that it is pointing to. To call (*cmpfn) (the function that cmpfn is pointing to), with arguments s1 and s2, we write
Function pointers are useful for dispatch tables, i.e. tables of different functions to call. Instead of implementing a dispatch table like this
we could put the addresses of all the functions into an array:
In other words, dispatch[code] is a function pointer (of the defined-type pvoidfn), so *dispatch[code] is the function that it points to.
The typedef for pvoidfn ("pointer to function returning void'') makes the code more readable and less error-prone.
Another common use for function pointers is to provide a function as a parameter of another function. For example, the short-integer version of qksort could be made into a general-purpose sort function by passing a pointer to the comparison function, rather than programming the comparison operation into the function as we did in qksort. Thus, we will add a cmpfn parameter. The comparison function will be called with the addresses of two array elements, and it must return a value less than, equal to, or greater than zero, according to whether the first array element sorts before, equal to, or later than the second array element. (If the array elements are strings, the strcmp function would behave exactly in the right way.)
We will also need a new parameter to tell the size in bytes of the elements in the array, because the general-purpose version will not know the specific data type of the array elements. Accordingly, the interface will now look like this:
Where earlier we had to re-compile a new sort function for each different array type, now one function will work for all array types. To sort an array of strings, call
That is, we pass the starting address a, the number of array elements DIM(a), the size of each element sizeof(a[0]), and the function to use for comparisons strcmp. Again, what gets passed to qsort is the address of the strcmp function.
Or to go back to the sorting of an array of shorts, we would need a little function to accomplish the short comparison.
It is tempting to use a subtraction
as a quick way to produce the three-way (positive, zero, negative) result, but this would be prone to overflow, as we discussed in Section 2.3. Then to sort an array of shorts, we would invoke
Because of this ability to parameterize the sort algorithm into a single function, it can become part of a general-purpose function library. (The specification, in fact, is taken directly from the UNIX library manuals.)
Thus, in almost all respects, we would consider the callable general-purpose library function (suitably parameterized) to be the more reliable implementation of the algorithm -- there are fewer degrees of freedom for making mistakes.
Here is the C source for the qsort function:
We started in Section 3.2 with the qksort function, which accepted an array of short integers and sorted it with subscripts. Then in Section 4.4 we transformed it into qksortp, which still worked upon short integers but used pointers. Finally, we transformed qksortp into qsort, shown just above. The next section will describe these transformations.
The transformations that took qksort into qsort are typical of the messiest aspects of programming with pointers. It will be helpful to consider the transformations step-by-step.
First transformation: an ordinary array reference (such as a[lastlo])is to be performed using a pointer. A pointer p_lastlo is created which will point to a[lastlo]. Then each reference to a[lastlo] becomes a reference to *p_lastlo. Finally, if any storage was dynamically allocated, it must be freed.
Second transformation: Pointers to known type are generalized into pointers to varying-size storage. Here, the rules are much trickier. The type (and therefore the size) that a pointer points to is fixed at compiletime; there is no direct way to obtain a pointer to varying-size storage. The best we can do in C is to use a "byte" pointer (char *) and perform any scaling directly. (In the qsort example, the pointers are all scaled by elt_size, the size of each element in the argument array.) Thus the declaration (from qksortp)
becomes (in qsort)
And ++p_lastlo becomes p_lastlo += elt_size.
The difference between two pointers must be divided by the pointed-to size: (p_hi - p_lo) becomes (p_hi - p_lo) / elt_size. The difference between two pointers is an integer result, so multiplying and dividing both take place in transforming
into
It is tempting to cut corners on the apparently redundant multiply-and-divide, but what gets added to p_lo must be an multiple of elt_size.
Third transformation: No operations can take place directly upon the pointed-to values; functions must be called for all operations. Thus
must be rewritten as
Swapping two pointed-to values becomes too complicated for our simple SWAP macro; a new function (swapfn) is now used.
To summarize these transformation rules in a table:
Such transformations must be done carefully, but it is testimony to the expressive power of C that they are possible at all. Certainly, in many languages it is difficult, if not impossible, to write a function which will accept an array of any dimension, composed of elements of any size, and then sort the elements according to an arbitrary ordering function. The C implementation of qsort does all this, in strictly portable code.
char s[10], *p;
p = s;
char s[10];
int *pi;
pi = s; improper mixed-type pointer assignment
short m[2];
char *pc;
char **ppc;
char **pc declaration
pc name
char ** type
if x() has the type T, then x has the type "function returning T";
if x[] has the type T, then x has the type "array of T";
if *x has the type T, then x has the type "pointer to T";
char *a[N]; declaration
a[i] expression
char * type
char (*pa)[N]; declaration
(*pa) expression
char [N] type
DECLARATION VARIABLE NAME TYPE ("DECLARED")
short **ap[5]; _____________ __________________
long (*pf)[2]; _____________ __________________
double **pps; _____________ __________________
char *pc = &c;
char* pc = &c;
char *pc = c;
char c = '1'; /* ASCII value is 49 decimal */
char *pc = &c;
char **ppc = &pc;
#define PR_PTR(p) (printf("X%10ld", (long)(p))
PR_PTR(pc);
PR_PTR (ppc);
printf("\n");
104 112
4.2 Pointers to Scalars
char c = '1';
char *pc = &c;
char **ppc = &pc;
&c __________ c __________ *c __________
&pc __________ pc __________ *pc __________
&ppc __________ ppc __________ *ppc __________
_____ scanf("%c", c)
_____ scanf( "%c", pc)
_____ scanf("%c", &pc)
_____ scanf("%c", *pc)
_____ scanf("%c ", *ppc)
short_order:
/* short_order - put two short integers into order
*/
# include "local.h"
void short_order(pi, pj)
short *pi, *pj;
{
short t;
if (*pj < *pi)
t = *pi, *pi = *pj, *pj = t;
}
short i1 = 1111;
short i2 = 22;
short *pi1 = &i1;
short *pi2 = &i2;
short_order(&i1, &i2); short_order(pi1, &i2);
short_order(&i1, pi2); short_order(pi1, pi2);
short *ps1, *ps2;
short_order(ps1, ps2);
short *ps1 = NULL;
short *ps2 = NULL;
short_order(ps1, ps2);
4.3 Dangling Pointers
dangling.c:
/* dangling - example of dangling pointer
*/
# include "local.h"
static short *pi = NULL;
main( )
{
void f1( );
/* pi => NULL initially */
f1( ); /* pi => undefined suddenly! */
}
void f1( )
{
short i;
pi = &i; /* pi => complete, momentarily */
}
4.4 Array Pointers and Strings
size_t strlen(s)
char s[]; /* : string */
strlen("hello")
int strcmp(s1, s2)
char s1[]; /* : string */
char s2[]; /* : string */
strcmp("ab", "aa")
strcmp("\377", "a")
char *strcpy(s1, s2) /* at return, s1 => string */
char s1[]; /* : !NULL */
char s2[]; /* : string */
char *strcat(s1, s2) /* at return, s1 => string */
char s1[]; /* : string */
char s2[]; /* : string */
char *strchr(s, c) /* returned value => string|NULL */
char s[]; /* : string */
int c; /* : {CHAR_MIN:CHAR_MAX} */
strchr("abc", 'b')
char *strrchr(s, c) /* returned value => string|NULL */
char s[]; /* : string */
int c; /* : {CHAR_MIN:CHAR_MAX} */
strrchr("aba", 'a')
char *strpbrk(s1, s2) /* returned value => string|NULL */
char s1[]; /* : string */
char s2[]; /* : string */
strpbrk("abc123", "0123456789")
size_t strspn(s1, s2)
char s1[]; /* : string */
char s2[]; /* : string */
strspn("\t\tabc", " \t\n")
size_t strcspn(s1, s2)
char s1[]; /* : string */
char s2[]; /* : string */
strcspn("abc\t", " \t\n")
char *strtok(s1, s2) /* returned value => string|NULL */
char s1[]; /* : string */
char s2[]; /* : string */
static char buf[] = " abc 123\t\txyz\n";
static char seps[] = " \t\n";
char *p;
printf("first token=<%s>\n", strtok(buf, seps));
while ((p = strtok(NULL, seps)) != NULL)
printf("next token=<%s>\n", p);
first token=<abc>
next token=<123>
next token=<xyz>
#define STREQ(s, t) (strcmp(s, t) == 0)
#define STRLT(s, t) (strcmp(s, t) < 0)
#define STRGT(s, t) (strcmp(s, t) > 0)
#define STREQ(s, t) ((*s) == (*t) && strcmp(s, t) == 0)
char *strncat(s1, s2, n) /* at return, s1 => string */
char s1[]; /* : string */
char s2[]; /* s2[0:n-1] : strn */
size_t n; /* : {0:DIM(s2[*])} */
int strncmp(s1, s2, n)
char s1[]; /* s1[0:n-1] : strn */
char s2[]; /* s2[0:n-1] : strn */
size_t n; /* : {0:DIM(s2[*])} */
char *strncpy(s1, s2, n) /* at return, s1 => nul-padded */
char s1[]; /*: !NULL, DIM(s1[*]) >= n */
char s2[]; /*: strn */
size_t n; /*: {0:DIM(s2[*])} */
data_ptr memcpy(s1, s2, n)
data_ptr s1; /* : !NULL */
data_ptr s2; /* : !NULL */
size_t n; /* : {0:DIM(s1[*])-1 */
memcpy:
/* memcpy - copy n bytes from b to a */
#include "local.h"
data_ptr memcpy(s1, s2, n)
data_ptr s1;
data_ptr s2;
register size_t n;
{
register char *a = s1;
register char *b = s2;
for (; n > 0; --n, ++a, ++b)
*a = *b;
return (s1);
}
struct xxx a;
struct xxx b;
memcpy((char *)&a, (char *)&b, sizeof(a));
#define STRUCTASST(a, b) memcpy((char *)&(a), (char *)&(b), sizeof(a))
#define STRUCTASST(a, b) ((a) = (b))
data_ptr memchr(s, c, n)
data_ptr s; /* : ! NULL */
int c; /* : {CHAR_MIN:CHAR_MAX} */
size_t n; /* : {0:DIM(s[*])-1} */
int memcmp(s1, s2, n)
data_ptr s1; /* : ! NULL */
data_ptr s2; /* : ! NULL */
size_t n; /* : {0:DIM(s1 [*] )-1 */
data_ptr memset(s, c, n)
data_ptr s; /* : ! NULL */
int c; /* : {CHAR_MIN:CHAR_MAX} */
size_t n; /* : {0:DIM(s[*])-1 */
4.5 Changing Subscripts into Pointers
qksortp:
/* qksort - sort array a (dimension n) using quicksort */
#include "local.h"
/* iqksort - internal function to partition the array [*lo:*hi] */
static void iqksort(p_lo, p_hi)
short *p_lo, *p_hi;
{
short *p_mid = p_lo + (p_hi - p_lo) / 2; /* : {p_lo:p_hi} */
short *p_i; /* : {p_lo+1:p_hi+1} */
short *p_lastlo; /* : {p_lo:p_hi-1} */
short tmp;
SWAP(*p_lo, *p_mid, tmp);
p_lastlo = p_lo;
for (p_i = p_lo+1; p_i <= p_hi; ++p_i)
{
if (*p_lo > *p_i)
{
++p lastlo;
SWAP(*p_lastlo, *p_i, tmp);
}
}
SWAP(*p_lo, *p_lastlo, tmp);
if (p_lo < p_lastlo && p_lo < p_lastlo - 1)
iqksort(p_to, p_lastlo - 1);
if (p_lastlo + 1 < p_hi)
iqksort(p_lastlo + 1, p_hi);
}
/* qksort - external entry point */
void qksort(a, n)
short a[]; /* array to be sorted; a[0:n-1]: multiset */
size_t n; /* number of elements */
{
if (n > 1)
iqksort(a, &a[n-1]);
}
4.6 Multi-dimensional Arrays
typedef short trio[3];
trio scores[2];
long weight(a, n)
short a[][3];
int n;
{
index_t i, j;
long sum = 0;
for (i = 0; i < n; ++i)
for (j = 0; j < DIM(a[0]); ++j)
sum += (j + 1) * a[i][j];
return (sum);
}
short a[][3];
short (*a)[3];
weight(scores, 2)
weight(&scores[0], 2)
4.7 Arrays of Pointers
static char meas_codes[][6] =
{"each", "box", "lb"};
static char *meas_codes[] =
{"each", "box", "lb"};
for (i = 0; i < DlM(meas_codes); ++i)
if (strcmp(s, meas_codes[i]) == 0) or if (STREQ(s, meas_codes[i]))
found a match
char *termtype;
termtype = getenv("TERM");
getenv(dummy):
/* getenv - get "environment" definition (dummy standal one version) */
# include "local.h"
char *getenv(s)
char s[];
{
extern char *env_names[]; /* : array of string ptrs, last one NULL */
extern char *env_defs[]; /* : array of string ptrs */
int i;
for (i = 0; env_names[i] != NULL; ++i)
if (STREQ(s, env_names[i]))
return (env_defs[i]);
return (NULL);
}
4.8 Command Line Arguments
args.c:
/* args - show command-line arguments, one line each
*/
# include "local.h"
main(ac, av)
int ac;
char *av[];
{
int i;
for (i = 0; i < ac; ++i)
printf("av[%d]=<%s>\n", i, av[i]);
}
args one two "quoted string for three"
av[0]=<args>
av[1]=<one>
av[2]=<two>
av[3]=<quoted string for three>
4.9 Pointers to Functions
double (*pfn)();
double *fnpd();
extern int strcmp();
int (*cmpfn)() = &strcmp;
int (*cmpfn)() = strcmp;
(*cmpfn)(s1, s2)
switch (code) /* layout to highlight structure */
{
case 0: initdb(); break;
case 1: opendb(); break;
case 2: readdb(); break;
...
case 7: closdb(); break;
}
typedef void (*pvoidfn)(); /* pvoidfn is ptr to fn ret void */
static pvoidfn dispatch[] = {initdb, opendb, readdb, ..., closdb};
...
if (0 <= code && code < DlM(dispatch))
(*dispatch[code])();
else
process invalid code
void qsort(a, n, size, cmpfn)
data_ptr a;
size_t n;
size_t size;
int (*cmpfn)();
qsort((char *)a, DIM(a), sizeof(a[0]), strcmp);
short shortcmp(pi, pj)
short *pi;
short *pj;
{
if (*pi > *pj)
return (1);
else if (*pi == *pj)
return (0);
else
return (-1);
}
return (*pi - *pj)
qsort((char *)a, DIM(a), sizeof(a[0]), shortcmp);
qsort:
/* qsort - sort array a (dimension n) using quicksort
* based on Bentley, CACM April 84
* Comments use notation A[i], a (fictitious) array of things
* that are of size elt_size.
*/
#include "local.h"
/* swapfn - swap elt_size bytes a <--> b (internal routine)
*/
static void swapfn(a, b, elt_size)
register char *a; /* pointer to one element of A */
register char *b; /* pointer to another element of A */
size_t elt_size; /* size of each element */
{
register size_t i;
char tmp;
for (i = 0 i < elt_size; ++i, ++a, ++b)
SWAP(*a, *b, tmp);
}
/* iqsort - internal quicksort routine */
static void iqsort(p_lo, p_hi, elt_size, cmpfn)
char *p_lo; /* ptr to low element of (sub)array */
char *p_hi; /* ptr to high element of (sub)array */
size_t elt_size; /* size of each element */
int (*cmpfn)(); /* comparison function ptr */
{
char *p_mid = p_lo + ((((p_hi - p_lo) / elt_size) / 2) * elt_size);
register char *p_i; /* pointer to A[i] */
register char *p_lastlo; /* pointer to A[lastlo] */
swapfn(p_lo, p_mid, elt_size); /* pick the middle element as pivot */
p_lastlo = p_lo;
for (p_i = p_lo + elt_size; p_i <= p_hi; p_i += elt_size)
{
if ((*cmpfn)(p_lo, p_i) > 0)
{
p_lastlo += elt_size;
swapfn(p_lastlo, p_i, elt_size);
}
}
swapfn(p_lo, p_lastlo, elt_size);
if (p_lo < p_lastlo && p_lo < p lastlo - elt_size)
iqsort(p_lo, p-lastlo - elt_size, elt_size, cmpfn); /* lower */
if (p_lastlo + elt_size < p_hi)
iqsort(p_lastlo + elt_size, p_hi, elt_size, cmpfn); /* upper */
}
/* qsort - the callable entry point */
void qsort(a, n, size, pf)
data_ptr a; /* address of array A to be sorted */
size_t n; /* number of elements in A */
size_t size; /* size of each element */
int (*pf)(); /* comparison function ptr */
{
if (n > 1)
iqsort((char *)a, (char *)a + (n-1) * size, size, pf);
}
4.10 Pointer Transformations
short *p_lastlo;
char *p_lastlo;
short *p_mid;
p_mid = p_lo + (p_hi - p_lo) / 2;
char *p_mid;
p_mid = p_lo + ((p_hi - p_lo) / elt_size) / 2) * elt_size;
if (*p_lo > *p_i)
if ((*cmpfn)(p_lo, p_i) > 0)
Original Transformed
(p_hi - p_lo) (p_hi - p_lo) / elt_size
p_lo + expr p_lo + expr * elt_size
++p_lastlo p_lastlo += elt_size
SWAP(*p_lo, *p_mid, tmp) swapfn(p_lo, p_mid, elt_size)