
CHAPTER 1: PREPROCESSING

The "definition" capabilities of C provide techniques for writing programs that are more portable, more readable and easier to modify reliably. For a simple example,

#define SCR_CLR_ALL "\33[2J"

defines a string which will clear the entire screen (on "ANSI" terminals). A program that contained numerous instances of printf(SCR_CLR_ALL) would be considerably more readable than one containing lots of printf("\33[2J") uses.

In addition, definitions can aid in modifiability. Using the SCR_CLR_ALL example, the program that uses the defined name can be more easily modified than the one which contains instances of the actual string sprinkled throughout the program.

In this chapter, we will see a variety of uses of definitions and header files which are helpful in achieving reliable programs.

1.1 Defined Constants

Topics assumed from previous book: #include [2.10], #define [2.10], ctype.h [3.5], stdio.h [3.21], defined constant [3.21].

A file of definitions to be #include'd is known (in ANSI C terminology) as a header. (In other Plum Hall books, it is also referred to as an include-file.) In the standard header stdio.h are found some definitions such as

#define EOF (-1)

This definition causes any subsequent appearance of the name EOF to be replaced by the characters (-1). Since the definition contains an operator (the "minus"), we enclose the definition in parentheses. If we left the parentheses off the definition, as in

#define EOF -1

it is possible that the name could generate an expression that would be misunderstood by the compiler. For example, this erroneous code

if (c EOF)      /* WRONG - should be (c != EOF) */

would be interpreted as

if (c - 1)      /* syntactically correct, but unintended */

which would not generate any compile errors, even though it is certainly not what the programmer intended. This suggests a rule for reliable programming (the first of many that will be presented):

Rule 1-1: Any macro definition containing operators needs parentheses around the entire definition. Each appearance of a macro argument in the definition also needs to be parenthesized if an embedded operator in the argument could cause a precedence problem.

All the reliability rules are summarized in the appendix [1-1].
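A short sketch may make the precedence hazard behind Rule 1-1 concrete. The macro and function names below are hypothetical and do not come from any standard header; the unparenthesized versions are shown only to expose the problem.

```c
/* Hypothetical macros illustrating Rule 1-1. */
#define SUM_BAD   2 + 3        /* contains an operator, no parentheses */
#define SUM_OK    (2 + 3)      /* whole definition parenthesized */

#define SQ_BAD(x) (x * x)      /* argument not parenthesized */
#define SQ_OK(x)  ((x) * (x))  /* argument and definition parenthesized */

/* SUM_BAD * 4 expands to 2 + 3 * 4, so the multiplication binds first. */
int scaled_bad(void) { return SUM_BAD * 4; }

/* SUM_OK * 4 expands to (2 + 3) * 4. */
int scaled_ok(void)  { return SUM_OK * 4; }

/* SQ_BAD(1 + 2) expands to (1 + 2 * 1 + 2). */
int square_bad(void) { return SQ_BAD(1 + 2); }

/* SQ_OK(1 + 2) expands to ((1 + 2) * (1 + 2)). */
int square_ok(void)  { return SQ_OK(1 + 2); }
```

Neither of the "bad" forms draws a diagnostic from the compiler; each simply computes a different value from the one the reader expects.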

There are two main reasons for creating defined constants. The first is readability: the defined constant clarifies the meaning of the constant. The name EOF tells the reader that it is a special indication for "end-of-file." For another example from <stdio.h>, consider

#define BUFSIZ 512

Again, the name clarifies the meaning: BUFSIZ is the "buffer size" used in efficient I/O transfers.

The second reason for defined constants is modifiability: the defined constant shows how the program can be modified for a different value. Thus, on systems where 1024 is a better value for the size of disk I/O transfers, the standard header stdio.h could specify the value 1024 for BUFSIZ.

Rule 1-2: Reliable modification of defined constants requires an environmental capability: there must be a means for ensuring that all files comprising a program have been compiled using the same set of headers. (The UNIX make command is one such capability.)

EOF, however, exemplifies a defined constant which cannot be modified arbitrarily: many implementations of the character-type tests (ctype.h) assume that its value is -1. The UNIX System V manual page for ctype(3C), for example, says explicitly that EOF equals -1. Thus the following usage rule:

Rule 1-3: If there are limitations on the modifiability of a defined constant, indicate the limitations with a comment:

#define EOF (-1)  /* DO NOT MODIFY: ctype.h expects -1 value */

Another application of this rule is the explicit indication of minimum and maximum values:

#define NBUFS 5   /* min 2, max 30 */

(to give a hypothetical example).

Rule 1-4: If one definition affects another, embody the relationship in the definition; do not give two independent definitions.

This pair of definitions

#define XX 5

#define XX2 (XX + 2)

follows the rule by showing the relationship, whereas

#define XX 5

#define XX2 7      /* misleading, no indication of relationship */

does not. In the former case, we could reliably modify the program by changing the definition of XX; in the latter case, we would have to make a guess that XX2 should probably be changed to equal XX plus two.

Just giving the constant a name is not enough to ensure modifiability; you must be careful always to use the name, and remember that the value could change. One project had difficulties changing the value of BUFSIZ because some programmers had written

nblocks = nbytes >> 9;      /* hard to modify, uses "magic number" */

in a number of places where

nblocks = nbytes / BUFSIZ;

was needed. The programmers figured that "everyone knows that BUFSIZ equals 512," and right-shifting nine bits is the same (for positive numbers) as dividing by 512. But when BUFSIZ changed to 1024 on some systems, modifications were difficult. Hence, this rule:

Rule 1-5: If a value is given for a #defined name, do not defeat its modifiability by assuming its value in expressions.
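As a minimal sketch of Rule 1-5, the block computation can be wrapped in a small function written purely in terms of the name. The function name full_blocks is hypothetical; BUFSIZ is taken from whatever <stdio.h> the local compiler supplies.

```c
#include <stdio.h>   /* BUFSIZ */

/* Number of complete BUFSIZ-byte blocks contained in nbytes.
 * Because the expression uses the name rather than an assumed value,
 * it stays correct whether BUFSIZ is 512, 1024, or anything else. */
long full_blocks(long nbytes)
{
    return nbytes / BUFSIZ;
}
```

A compiler that knows BUFSIZ is a power of two is free to generate a shift for the division, so the readable form need not cost anything at run time.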

In most compilers, it is also possible to define a constant on the invocation of the compiler, rather than putting a #define in the code itself. Suppose, for example, that we wish to have the constant NBUFS specified at compilation time, rather than putting the value into the program. Using the UNIX compiler cc, we accomplish the definition like this:

cc -DNBUFS=10 pgm.c

This technique allows a program to be targeted to different environments without changing the source code for the program.
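One way to combine a command-line definition with Rules 1-2 and 1-3 is sketched below. The source supplies a default only when the compile command gives no value, and it rejects values outside the documented range. The limits (2 and 30), the name BUF_BYTES, and the function pool_count are hypothetical, and the sketch assumes a compiler that supports the #error directive.

```c
/* Default used only when the compile command does not supply a value,
 * for example with: cc -DNBUFS=10 pgm.c */
#ifndef NBUFS
#define NBUFS 5   /* min 2, max 30 */
#endif

/* Reject out-of-range values at compile time. */
#if NBUFS < 2 || NBUFS > 30
#error "NBUFS must be between 2 and 30"
#endif

#define BUF_BYTES 512   /* hypothetical per-buffer size for this sketch */

static char pool[NBUFS][BUF_BYTES];

/* Number of buffers actually compiled into the pool. */
int pool_count(void)
{
    return (int)(sizeof(pool) / sizeof(pool[0]));
}
```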

The UNIX function library has recently been standardized by /usr/group, the UNIX industry association. This standard specifies a new header named <limits.h>, wherein are defined environment-dependent limits. (ANSI C will probably standardize a similar capability.)

A sample version of <limits.h> for C on an IBM PC-type machine might contain definitions like these:

#define CHAR_BIT            8    /* number of bits in a char */

#define CHAR_MAX          255    /* largest char value */

#define CHAR_MIN            0    /* smallest char value */

#define INT_MAX         32767    /* largest int value */

#define INT_MIN      (-32768)    /* smallest int value */

#define LONG_MAX   2147483647    /* largest long value */

#define LONG_MIN (-2147483648)   /* smallest long value */

#define SHRT_MAX        32767    /* largest short value */

#define SHRT_MIN     (-32768)    /* smallest short value */

#define UCHAR_MAX         255    /* largest unsigned char value */

#define UINT_MAX        65535    /* largest unsigned int value */

#define ULONG_MAX  4294967295    /* largest unsigned long value */

#define USHRT_MAX       65535    /* largest unsigned short value */

Note that no extra definitions are needed for numbers like "number of bits in an int;" we can use sizeof(int) * CHAR_BIT, since every data type occupies an integral number of bytes.
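The derived value can itself be given a name, as in the sketch below. INT_BITS and counted_uint_bits are hypothetical names. The counting function is included only as a cross-check, and it assumes an implementation in which unsigned int has no padding bits (true of the common environments, though not guaranteed everywhere).

```c
#include <limits.h>   /* CHAR_BIT */

/* Width of an int in bits, derived rather than separately defined. */
#define INT_BITS ((int)(sizeof(int) * CHAR_BIT))

/* Count how many right shifts it takes to clear an all-ones
 * unsigned value; this is the number of value bits in unsigned int. */
int counted_uint_bits(void)
{
    unsigned u = ~0u;
    int n = 0;

    while (u != 0) {
        u >>= 1;
        ++n;
    }
    return n;
}
```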

Rule 1-6: Use limits.h for environment-dependent values.

1.2 Defined Types

Topics assumed from previous book: defined type [3.21], void [3.21], typedef [3.21].

It will be very important for purposes of reliability to make more precise definitions of data types than the basic types supplied by C. In this book, we will describe special semantic rules for various of these defined types. Here is a list of some useful defined types:

bits        /* an unsigned short integer used for bitwise operations */

ushort      /* an unsigned short integer used for arithmetic */

tbool       /* a char (one byte) to be tested for zero or non-zero */

metachar    /* a short integer holding a char value or EOF */

bool        /* an integer to be tested for zero or non-zero */

void        /* the "return type" for a function that returns no value */

These types can be defined using #define or typedef; here is a typedef version:

typedef unsigned short ushort, bits;

typedef char tbool;

typedef short metachar;

typedef int bool;

typedef int void;   /* delete if compiler supports void */

The important difference between #define and typedef is that #define replaces the name with its definition during preprocessing, whereas typedef is handled by the C syntax analysis. Thus, a defined type created via typedef cannot be "undefined" or re-defined, so its usage is more reliable. Also, if the name is mistakenly used as a variable name, the diagnostics are more intelligible.
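One consequence of textual replacement shows up when a defined type involves a pointer. The sketch below is an illustration of the general point rather than an example from the Plum Hall headers; PCHAR_DEF and pchar_t are hypothetical names.

```c
#include <stddef.h>   /* size_t */

#define PCHAR_DEF char *       /* hypothetical #define'd "type" */
typedef char *pchar_t;         /* hypothetical typedef'd type */

/* After preprocessing, the declaration reads "char * p = 0, q = 0;",
 * so only p is a pointer; q is a plain char. */
size_t second_size_with_define(void)
{
    PCHAR_DEF p = 0, q = 0;

    (void)p;
    (void)q;
    return sizeof q;
}

/* The typedef names a complete type, so both p and q are pointers. */
size_t second_size_with_typedef(void)
{
    pchar_t p = 0, q = 0;

    (void)p;
    (void)q;
    return sizeof q;
}
```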

This particular set of defined types is consistently used in the various Plum Hall books on C. In "publication" code, where one desires the fewest possible augmentations to standard C, an alternative approach is to use comments:

short status;   /* interface status: bits */

char is_open;   /* is device open?: bool */

int input_c;    /* most recent input character: metachar */

Even in publication code, however, the symbols ushort and void are important for portability reasons. Not all compilers currently accept the unsigned short type, but most that do not accept it are targeted for small machines where ushort can simply be translated into unsigned int.

Regarding void, any function that does not return a value should be indicated as being a void function. The following scr_beep function outputs the "bell" character, but does not return any value to the calling function, so we give it the void type:

void scr_beep( )

{

putchar('\7');  /* "bell" - ASCII version */

}

Recent C compilers implement void as a keyword. If your compiler supports void, you should not try to #define or typedef a definition for it; in this case, you should remove any such definitions from your headers. If your compiler does not support void, you should define it as int.

Rule 1-7: Use a consistent set of project-wide defined types.

1.3 Standard Headers

Topics assumed from previous book: #include [2.10], stdio.h [3,4], math.h [3.17], ctype.h [3.5].

Headers are used for several purposes: creating defined constants, creating defined types, and declaring the type of various functions.

The modern treatment of libraries is to create a header for each related group of functions, along with any special symbols that are useful with those functions. Thus the math functions (and also the symbol HUGE_VAL) are declared in <math.h>:

acos      cos       fmod     modf     tan

asin      cosh      frexp    pow      tanh

atan      exp       ldexp    sin

atan2     fabs      log      sinh

ceil      floor     log10    sqrt

The character-test functions are declared in <ctype.h>:

isalnum   isdigit   isprint  isupper  toupper

isalpha   isgraph   ispunct  isxdigit

iscntrl   islower   isspace  tolower

And the standard I/O functions (plus some useful symbols) are declared in <stdio.h>:

clearerr  fopen     fseek    printf   scanf    ungetc

fclose    fprintf   ftell    putc     setbuf

feof      fputc     fwrite   putchar  setvbuf

ferror    fputs     getc     puts     sprintf

fflush    fread     getchar  remove   sscanf

fgetc     freopen   gets     rename   tmpfile

fgets     fscanf    perror   rewind   tmpnam

BUFSIZ    EOF       FILE     NULL

stdin     stdout    stderr

Another standard header is <assert.h>, which provides an assert macro for debugging. This header will be covered in Section 1.6.

The logical extension of this approach is to partition each library entirely into related groups of functions, with each header declaring all functions in its group. ANSI C envisions two more headers, <string.h> and <stdlib.h>. (<string.h> is already found in UNIX libraries, beginning with System V. System V has another header named <memory.h>, which ANSI C will probably merge into <string.h>.)

In <string.h>, there should be declarations for these functions:

memchr    memset    strcoll  strncat  strrchr

memcmp    strcat    strcpy   strncmp  strspn

memcpy    strchr    strcspn  strncpy  strtok

memmove   strcmp    strlen   strpbrk

In <stdlib.h> there should be declarations for several miscellaneous functions:

abort     atof     bsearch   exit     ldiv    rand     system

abs       atoi    calloc     free     malloc  realloc

atexit    atol    div        getenv   qsort   srand

Most of these names will be described in this book. Each header will be described when we need its functions; the names are given here just for completeness.

The variety of libraries available with existing compilers does create a portability problem for your code. Until the time that all your compilers conform to an ANSI standard, the following approach will help shelter your programs from library differences: The headers <math.h>, <ctype.h>, and <stdio.h> are well-nigh universal, and can simply be #included. For each of the other headers limits.h, string.h, and stdlib.h, create a header of your own which can be included without using the "angle-bracket" <xxx.h> form. In that header, provide declarations for all the functions that are supported on the environment that you are using. You might, of course, provide any missing functions yourself, and include their declarations in the header.

One other ANSI header needs to be mentioned. It is called <stddef.h> ("standard definitions") and it includes a few type definitions and named constants. The only one that we will need in this book is called size_t, and it is defined to be an integer that is big enough to hold the size of any object (unsigned int or unsigned long int).

The minimal versions of these "pre-ANSI" headers required for the programs in this book are as follows:

stddef.h:

/* stddef.h - standard definitions (partial listing) */

/* ENVIRONMENT-DEPENDENT - ADJUST TO LOCAL SYSTEM */

#ifndef STDDEF_H

#define STDDEF_H

typedef unsigned size_t;   /* use unsigned long in large-object model */

#ifndef NULL

#define NULL 0             /* use 0L if int-size < long-size == ptr-size */

#endif

extern int errno;

#endif

limits.h:

/* limits.h - environment limits (partial listing) */

#ifndef LIMITS_H

#define LIMITS_H

#define CHAR_BIT    8

#endif

string.h:

/* string.h - string functions (partial listing) */

#ifndef STRING_H

#define STRING_H

#include "stddef.h"

data_ptr memcpy( );  /* PARMS(data_ptr s1, data_ptr s2, size_t n) */

char *strcat( );     /* PARMS(char *s1, char *s2) */

char *strchr( );     /* PARMS(char *s1, int c) */

int strcmp( );       /* PARMS(char *s1, char *s2) */

char *strcpy( );     /* PARMS(char *s1, char *s2) */

size_t strlen( );    /* PARMS(char *s1) */

char *strncat( );    /* PARMS(char *s1, char *s2, size_t n) */

int strncmp( );      /* PARMS(char *s1, char *s2, size_t n) */

char *strncpy( );    /* PARMS(char *s1, char *s2, size_t n) */

#endif

stdlib.h:

/* stdlib.h - miscellaneous library functions (partial listing) */

#ifndef STDLIB_H

#define STDLIB_H

#include "stddef.h"

double atof( );      /* PARMS(char *s) */

int atoi( );         /* PARMS(char *s) */

long atol( );        /* PARMS(char *s) */

data_ptr calloc( );  /* PARMS(unsigned int n, size_t size) */

void exit( );        /* PARMS(int status) */

void free( );        /* PARMS(data_ptr ptr) */

data_ptr malloc( );  /* PARMS(size_t size) */

int rand( );         /* PARMS(void) */

void srand( );       /* PARMS(unsigned int seed) */

#endif

Each header declares the returned type of various functions. The types of the function parameters are also specified in comments [1-2].

Assuming that you have all these headers, or have created your own versions, when and why should you use them? First of all, notice that several problems can arise from not declaring the returned type of a library function. For example, most of the functions in the math library return double values. If you use one of these functions, such as

x = sqrt(y);

without including the math.h header (and without declaring the function in your own program), the compiler assumes that sqrt returns an int value, which will give a garbage value to x in this example.

A more subtle problem arises in using the string functions. A number of these functions return a value of type char * ("pointer to character"), and can be embedded into expressions:

strcat(strcat(s, ".1"), ".a");

will end up catenating ".1.a" onto the end of s. In order for this to work reliably, the compiler must know that strcat returns a char * value. Otherwise, the compiler assumes that it returns int, and the program will not work in some environments where int and char * are different sizes. (The lint checker will complain in any case, if you have lint.)
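For illustration, here is the nested call wrapped in a small function, with the declaration of strcat supplied by the header. The function name add_suffixes is hypothetical, and the sketch assumes the caller has left room in s for the added characters.

```c
#include <string.h>   /* declares strcat as returning char * */

/* Append ".1.a" to s by nesting two strcat calls.
 * The caller must ensure that s has room for four more characters
 * plus the terminating null. */
char *add_suffixes(char *s)
{
    return strcat(strcat(s, ".1"), ".a");
}
```

With the header included, the compiler knows that the inner call yields a char *, so the value is passed along to the outer call at its full width.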

For these reasons, it is important to declare the library functions. Should you use the headers, or declare each function yourself? In general, the headers are the more reliable way to be sure that you have all the correct declarations, but there is one (temporary) problem. Some compilers will link the object code for every library function that you declare, whether or not your program actually calls it. The ANSI C standard will probably say that functions should not be linked unless they are used, though it may be a while before all compilers behave this way. I recommend that you use the standard headers for the library, unless your compiler links everything.

If you are going to use the standard headers, I also recommend that you create your own local header file that brings in all the headers needed for your project. I have used the generic name local.h for this header; your own local version will presumably have its own name. The simplest course is to have your local.h include all the standard library headers, as the safest way to be sure that they are all included.

Rule 1-8: Be sure that all functions are declared before use; headers are the most reliable way.

Rule 1-9: Create a project-wide "local" header for standard definitions and inclusions.

One of the headers that we will use is called portdefs.h ("portability definitions header"). Into it are collected various defined types, defined constants, and macros which will be important in producing portable code. By including it in our local.h we ensure that all programs compile with these "portability definitions." The following listings of local.h and portdefs.h show the definitions that are needed for the Plum Hall books:

local.h:

/* local.h - Definitions for use with Reliable Data Structures in C */

#ifndef LOCAL_H

#define LOCAL_H

#include <stdio.h>

#include <ctype.h>

#include <math.h>

#define FALSE           0          /* Boolean value */

#define FOREVER         for(;;)    /* endless loop */

#define NO              0          /* Boolean value */

#define TRUE            1          /* Boolean value */

#define YES             1          /* Boolean value */

#define getln(s, n)     ((fgets(s, n, stdin)==NULL) ? EOF : strlen(s))

#define ABS(x)          (((x) < 0) ? -(x) : (x))

#define MAX(x, y)       (((x) < (y)) ? (y) : (x))

#define MIN(x, y)       (((x) < (y)) ? (x) : (y))

#define DIM(a)          (sizeof(a) / sizeof(a[0]))

#define IN_RANGE(n, lo, hi) ((lo) <= (n) && (n) <= (hi))

#ifndef NDEBUG

#define asserts(cond, str) \
{if (!(cond)) fprintf(stderr, "Assertion '%s' failed\n", str);}

#else

#define asserts(cond, str)

#endif

#define SWAP(a, b, t)   ((t) = (a), (a) = (b), (b) = (t))

#define LOOPDN(r, n)    for ((r) = (n)+1; --(r) > 0; )

#define STREQ(s, t)     (strcmp(s, t) == 0)

#define STRLT(s, t)     (strcmp(s, t) < 0)

#define STRGT(s, t)     (strcmp(s, t) > 0)

#include "portdefs.h"   /* portability definitions */

#include "stddef.h"     /* (ANSI) standard definitions */

#include "limits.h"     /* (ANSI) machine parameters */

#include "string.h"     /* (ANSI) string functions */

#include "stdlib.h"     /* (ANSI) miscellaneous standard functions */

#include "rdslib.h"     /* functions from Reliable Data Structures in C */

#endif

portdefs.h:

/* portdefs.h - definitions for portability */

/* ENVIRONMENT-DEPENDENT - ADJUST TO LOCAL SYSTEM */

#ifndef PORTDEFS_H

#define PORTDEFS_H

/* adjust these names to local machine/compiler environment */

typedef unsigned short ushort;  /* or "unsigned" if short-size == int-size */

typedef unsigned char utiny;    /* to get unsigned byte */

typedef int void;               /* delete if compiler supports void */

typedef unsigned index_t;       /* may be chosen ad-lib locally */

typedef char *data_ptr;         /* use ANSI "generic ptr" if available */

/* next 5 names require no local changes, will work anywhere */

typedef char tbits;             /* one byte, for bitwise uses */

typedef char tbool;             /* one byte: {0:1} */

typedef ushort bits;            /* 16 bits (or more), for bitwise uses */

typedef int bool;               /* for function returns: {0:1} */

typedef short metachar;         /* return from getchar: {EOF,0:UCHAR_MAX} */

/* modulo function giving non-negative result */

#define IMOD(i, j) (((i) % (j)) < 0 ? ((i) % (j)) + (j) : ((i) % (j)))

/* if i % j is never negative, replace with the following line: */

/* #define IMOD(i, j) ((i) % (j)) */

/* portably convert unsigned number to signed */

#define UI_TO_I(ui) ((int)(ui)) /* more complicated on ones complement */

/* structure offsets and bounds; adjust to local system */

#define STRICT_ALIGN int        /* adjust to local alignment requirement */

#define OFFSET(st, m) \
((char *)&((st *)&struct_addr)->m - (char *)&struct_addr)

#define BOUNDOF(t) \
((char *)&((struct {char byte0; t byten; } *)&struct_addr)->byten - \
(char *)&struct_addr)

static STRICT_ALIGN struct_addr = 0;

#define STRUCTASST(a, b) memcpy(&(a), &(b), sizeof(a))

/* defined constants */

#define FAIL            1          /* failure exit */

#define SUCCEED         0          /* normal exit */

#define STDIN           0          /* standard input */

#define STDOUT          1          /* standard output */

#define STDERR          2          /* standard error output */

#define SEEK_SET        0          /* seek relative to start of file */

#define SEEK_CUR        1          /* seek relative to current position */

#define SEEK_END        2          /* seek relative to end */

#endif

To be consistent with the approach of giving headers for all function declarations, we will also provide a header for all the utility functions that will be presented in this book. We will call it rdslib.h ("Reliable Data Structures library header") and it looks like this:

rdslib.h:

#ifndef RDSLIB_H

#define RDSLIB_H

bool itoa( );      /* PARMS(int n, char *str, int ndigits) */

int fgetsnn( );    /* PARMS(char *str, int size, FILE *fp) */

int getsnn( );     /* PARMS(char *str, int size) */

int getreply( );   /* PARMS(char *prompt, char *reply, int size) */

bool getpstr( );   /* PARMS(char *p, char *s, size_t n) */

bool getplin( );   /* PARMS(char *p, char *s, size_t n) */

void plot_trk( );  /* PARMS(int n, char c) */

void reverse( );   /* PARMS(char *s) */

bool strfit( );    /* PARMS(char *s1, char *s2, size_t n) */

#endif

On any sizeable project that you work on, a significant part of the early work will be involved with choosing the project-wide set of headers to be used and putting them into the appropriate places in the system. Once the groundwork is done, however, the shared set of definitions makes everything else much easier.

1.4 Macro Functions

Topics assumed from previous book: macros with parameters [5.17].

A macro with parameters will be referred to as a macro function, for brevity. Of course, it is not really a function, but it is easier to talk about this way.

Our local.h specifies three macro functions:

#define ABS(x)     (((x) < 0) ? -(x) : (x))

#define MAX(x, y)  (((x) < (y)) ? (y) : (x))

#define MIN(x, y)  (((x) < (y)) ? (x) : (y))

Since each argument can contain operators, each parameter is parenthesized in the definition to avoid precedence conflicts. And since the entire result is an expression usable with other operators, the entire definition is also parenthesized. These are two reliability techniques totally under control of the programmer who writes the macro.

Unfortunately, one reliability problem is beyond the macro writer's control: side-effects on macro arguments. A typical example is

ABS(++n)      /* bug! */

which increments n twice. To introduce some terminology, an unsafe macro function is one which evaluates a parameter more than once in the code expansion. Stated positively, a safe macro function evaluates each parameter only once in the code expansion. By this definition, all three macro functions (ABS, MAX, and MIN) are unsafe. As things stand now, the documentation for such macros must warn about putting side-effects on the invocation, and the responsibility is upon the programmer using the macro. (Some new compilers provide assistance in locating such bugs; see the appendix [1-3].)
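The double evaluation can be observed directly. In the sketch below, the function names are hypothetical; abs_int is a true function included for comparison. The macro case is well defined because the ?: operator evaluates its test completely before choosing a branch, but the result is still not what the caller intended.

```c
#define ABS(x) (((x) < 0) ? -(x) : (x))   /* unsafe: x is evaluated twice */

/* A true function evaluates its argument exactly once. */
int abs_int(int x)
{
    return (x < 0) ? -x : x;
}

/* The argument text ++*n is evaluated once in the test and once
 * more in whichever branch is selected. */
int abs_incr_macro(int *n)
{
    return ABS(++*n);
}

/* Here ++*n is evaluated once, before the call. */
int abs_incr_func(int *n)
{
    return abs_int(++*n);
}
```

Starting from -3, the macro version leaves the variable at -1 and yields 1, whereas the function version leaves it at -2 and yields 2.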

Rule 1-10: Use UPPERCASE names for unsafe macro functions, to emphasize the restrictions on their usage.

For safe macros, there are some advantages to using lowercase names. Each safe macro could be replaced by an actual function call, and at different times during project development, one might want the macro version or the function version. For an example from the library, the character-type tests in ctype.h are usually implemented as safe macros; they could therefore be replaced with actual function versions.

Rule 1-11: Never invoke an unsafe macro with arguments containing assignment, increment/decrement, or function call.

Rule 1-12: Whenever possible, use safe macro functions.

To be more precise about our usage, all the macros described so far are what we will call expression macros: after replacement of arguments, they produce a valid C expression value. Thus, an expression macro (such as MAX(i, j)) can be used any place that C allows an expression.

An expression macro can be used purely for its side-effects, just like calling a void function. One such macro is useful for swapping two data objects using a third temporary:

#define SWAP(a, b, t) ((t) = (a), (a) = (b), (b) = (t))

Since the macro produces a single expression (three assignments connected with the comma operator), an invocation of the macro can be made into a statement just by adding a semicolon:

/* reverse the characters in a string */

for (i = 0, j = strlen(s) - 1; i < j; ++i, --j)

SWAP(s[i], s[j], t);
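Placed in a complete function, the fragment might look like the following sketch. The name reverse_str is hypothetical (it is not claimed to match the reverse function declared in rdslib.h), and the indexes are kept as int so that an empty string, for which j starts at -1, falls straight through the loop.

```c
#include <string.h>   /* strlen */

#define SWAP(a, b, t) ((t) = (a), (a) = (b), (b) = (t))

/* Reverse the characters of s in place. */
void reverse_str(char *s)
{
    int i, j;
    char t;   /* temporary required by SWAP */

    for (i = 0, j = (int)strlen(s) - 1; i < j; ++i, --j)
        SWAP(s[i], s[j], t);
}
```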

Another form of macro is the statement macro. For example, the "swap" operation can also be written as a statement macro, like this:

#define SWAP_SHORT(a, b) {short _t; _t = (a); (a) = (b); (b) = _t;}

Because the definition constitutes a block (compound statement), it can have local variables, such as _t. This eliminates the need for an explicit parameter, but it limits the type of data to which the macro can be applied.

A statement macro like SWAP_SHORT, which contains code enclosed in braces, has one small syntactic restriction on its usage. We would like to put a semicolon onto each invocation of the macro, just because this looks more familiar:

if (a[i] < a[j])

SWAP_SHORT(a[i], a[j]);

However, the macro replacement itself already is a statement, so the semicolon becomes an extra null statement tacked on. Usually, there is no harm, except for this one situation:

if (a[i] < a[j])

SWAP_SHORT(a[i], a[j]);      /* syntax error */

else

/* ... */

Now we have a syntax error, because there are two statements between the if and the else. The unfortunate conclusion is that we need to be sure that the statement body following the if is enclosed in braces:

if (a[i] < a[j])

{

SWAP_SHORT(a[i], a[j]);

}

else

/* ... */

To keep things in perspective, statement macros comprise a small minority of the macros that you will typically use. The troublesome context is fairly uncommon, and the compiler will give a clear indication of any problem. Therefore, no special rules are necessary.

It would be nice to have a form for a statement macro which becomes a valid C statement when a semicolon is appended. One interesting way to construct it is to terminate the macro with a trailing else:

#define SWAP_SHORT(a, b) if (1) {short _t; _t = a; a = b; b = _t;} else

This does ensure that the macro will be syntactically valid when embedded into other C statements like this:

if (a[i] < a[j])

SWAP_SHORT(a[i], a[j]);

else

/* ... */

Unfortunately, the error of forgetting a semicolon is fairly common, and it will silently produce a very unreliable result in a common context like this:

SWAP_SHORT(i, j)        /* well-hidden bug */

++n;

This generates

if (1) {short _t; _t = i; i = j; j = _t;} else

++n;

Re-arranged for readability, it looks like this:

if (1)

{ short _t; _t = i; i = j; j = _t; }

else

++n;

The increment statement following the macro has become the body of the (never executed) else, producing a mysterious result with no chance of syntax diagnosis (unless the compiler complains about "unreachable code").

The "trailing else" form for statement macros is therefore not recommended.
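A different construction, which is not part of the discussion above but is widely used for this purpose, wraps the block in do ... while (0). The sketch below is offered as one possible alternative under that assumption; the function name order_pair is hypothetical. The macro needs a trailing semicolon to become a complete statement, so it fits between if and else, and a forgotten semicolon produces a syntax error rather than silently capturing the next statement.

```c
/* Statement macro wrapped in do { ... } while (0); the invocation
 * supplies the terminating semicolon. */
#define SWAP_SHORT(a, b) \
    do { short _t; _t = (a); (a) = (b); (b) = _t; } while (0)

/* Put the smaller value in *lo and the larger in *hi.
 * Returns 1 if the values were exchanged, 0 otherwise. */
int order_pair(short *lo, short *hi)
{
    if (*hi < *lo)
        SWAP_SHORT(*lo, *hi);   /* no braces needed before the else */
    else
        return 0;
    return 1;
}
```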

A control macro is one which resembles a control structure. One simple example is the FOREVER macro:

#define FOREVER  for (;;)

which is sometimes used in implementing an "N+1/2-time loop":

FOREVER

{

perform some action

if (time to quit)

break;

perform further actions

}

A more involved example is useful in constructing highly optimized inner loops; our local.h header includes a macro named LOOPDN, which counts a variable downwards toward zero:

#define LOOPDN(r, n) for ((r) = (n)+1; --(r) > 0; )

In most environments, using LOOPDN (with a register variable r) is the most time-efficient way of specifying n iterations. (The expression is compared versus zero, and the decrement is part of an expression evaluation; see Plum and Brodie [1985].) Using the macro is preferable to writing the actual in-line code, because the code itself is rather obscure and ugly -- the macro hides the messy details. If some environment allows a more efficient loop than this one, using the macro allows us to change the definition in just one place.
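A brief usage sketch follows; the function name sum_first is hypothetical. Inside the loop body the control variable takes the values n, n-1, ..., 1, so an array index must be formed as r - 1, and a count of zero produces no iterations at all.

```c
#define LOOPDN(r, n) for ((r) = (n) + 1; --(r) > 0; )

/* Sum the first n elements of a, visiting them from last to first. */
long sum_first(const int *a, int n)
{
    register int r;
    long total = 0;

    LOOPDN(r, n)
        total += a[r - 1];
    return total;
}
```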

1.5 Undefining

Once a symbol has been #define'd, its definition can be deleted via #undef. For a dangerous example, consider

#define NBUFS 5

char b1[NBUFS][BUFSIZ];

/* ... one page of code here ... */

#undef NBUFS

/* ... another page of code here ... */

#define NBUFS 7

char b2[NBUFS][BUFSIZ];

This example is deliberately chosen to illustrate the reliability problems of #undef. The declarations for b1 and b2 appear to the reader to be of equal size, but the definition of NBUFS has changed underfoot. Thus, as a general rule, the re-definition of symbols is unreliable.

To put it in a positive way, we want each symbol to have an invariant meaning, so that each instance of the symbol denotes the same thing.

One usage of #undef may become more common, as a certain discipline of library construction becomes standard. Namely: each name in a library is declared in its associated header (as we saw in Section 1.3). The header can #define a function name as a macro function (provided that it is a safe macro). Then, if the programmer wishes to be sure that a given name is a true function, it will be adequate to #undef that name.

ANSI C will probably specify that it is always allowable to #undef a name, even if it has not been #define'd. Some existing compilers, however, will complain about #undef'ing a non-existent name. For the time being, the safest course is to use this pattern:

#ifdef NAME

#undef NAME

#endif
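Applied to a library name, the pattern might be used as in this sketch, which assumes a <ctype.h> that declares toupper as a function and may also define it as a macro. The names to_upper_fn and upcase are hypothetical.

```c
#include <ctype.h>

/* Remove any macro definition of toupper, so that the name can only
 * refer to the true library function from here on. */
#ifdef toupper
#undef toupper
#endif

/* The name now designates the function itself, so its address can
 * be stored and called through a pointer. */
static int (*const to_upper_fn)(int) = toupper;

int upcase(int c)
{
    return to_upper_fn(c);
}
```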

1.6 Conditional Compilation

The preprocessor provides for conditional compilation, whereby some lines of code may be selectively excluded from compilation, depending upon the outcome of some test. This is the general pattern:

#if-line

source lines

#else

other source lines

#endif

The #else part can be omitted:

#if-line

source lines

#endif

The #if-line can have any of these forms:

#if constant-expression

True if constant-expression is non-zero.

#ifdef identifier

True if identifier has been #define'd.

#ifndef identifier

True if identifier has not been #define'd.

Recent C compilers provide two more capabilities:

#if defined(identifier)

or

#if defined identifier

means the same as

#ifdef identifier

And "else-if" logic is provided:

#if defined(A)

n = A_LIMIT;

#elif defined(B)

n = B_LIMIT;

#endif

means the same as

#ifdef A

n = A_LIMIT;

#else

#ifdef B

n = B_LIMIT;

#endif

#endif
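The "else-if" form can be sketched as a small compilable fragment. Here B is defined in the source only to stand in for a definition that would normally come from the compile procedure; the limit values, the DEFAULT_LIMIT fallback, and the function get_limit are all assumptions made for this illustration.

```c
/* Stand-ins for definitions normally supplied elsewhere (assumed values). */
#define B                   /* pretend the compile procedure defined B */
#define A_LIMIT 100
#define B_LIMIT 200
#define DEFAULT_LIMIT 50    /* hypothetical fallback, not in the text */

/* get_limit (hypothetical): return the limit chosen at compile time */
int get_limit(void)
{
    int n;

#if defined(A)
    n = A_LIMIT;
#elif defined(B)
    n = B_LIMIT;
#else
    n = DEFAULT_LIMIT;      /* neither A nor B defined */
#endif
    return n;
}
```

Adding a final #else is one way to make sure that n receives a value even when neither symbol is defined.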

Now we will look at some examples of the application of conditional compilation to practical programming situations.

Creating portable defined types

Tuning defined types for size

Commenting-out sections of code

Inclusion Sandwich

Test drivers for library functions

Conditional debugging

Creating portable defined types

It is often useful to have a defined type to represent an unsigned short integer. Many compilers support this type directly, but some others (mostly 16-bit environments) do not. Suppose that our compile procedure ("shell script," "batch file," etc.) defines a constant named USHORT if the compiler supports the unsigned short data type. Then, we can specify the defined type ushort like this:

#ifdef USHORT

typedef unsigned short ushort;

#else

typedef unsigned ushort;   /* assumes 16-bit machine */

#endif

Using this definition in a local standard header, we can write our programs using ushort in declarations. Provided that our compilation procedure defines USHORT appropriately for the compilation environment, we will get the appropriate definition for each environment.
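A short sketch of the definition in use follows. In practice USHORT would be defined by the compile procedure (for example, with a -DUSHORT option); it is defined in the source here only so that the fragment stands alone. The function ushort_max is a hypothetical helper.

```c
#include <limits.h>

#define USHORT   /* assumption: normally supplied by the compile procedure */

#ifdef USHORT
typedef unsigned short ushort;
#else
typedef unsigned ushort;   /* assumes 16-bit machine */
#endif

/* ushort_max (hypothetical): largest value that a ushort can hold */
unsigned long ushort_max(void)
{
    ushort all_ones = (ushort)~0u;   /* conversion keeps only the low-order bits */

    return (unsigned long)all_ones;
}
```

When USHORT is defined, ushort_max() agrees with USHRT_MAX from <limits.h>.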

Tuning defined types for size

Suppose we need a defined type (to be called a TOKEN) which can represent any of a specified number of alternatives. (For simplicity, assume that the number of distinct "tokens" is never more than 64K.) We could define TOKEN like this:

#if MAXTOKEN <= 127

typedef char TOKEN;

#else

typedef ushort TOKEN;

#endif

Thus, for compilations in which the number of "tokens" is small, we can store each of them in a single char object (which can reliably hold a number between 0 and 127, in any environment).
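The space saving can be illustrated with a small table of tokens. The value chosen for MAXTOKEN, the array token_table, and the function table_bytes are hypothetical; the typedef for ushort stands in for the portable definition discussed earlier.

```c
#include <stddef.h>

#define MAXTOKEN 100                /* hypothetical number of distinct tokens */

typedef unsigned short ushort;      /* stand-in for the portable definition */

#if MAXTOKEN <= 127
typedef char TOKEN;                 /* small token set: one byte per token */
#else
typedef ushort TOKEN;               /* larger token set */
#endif

TOKEN token_table[MAXTOKEN];        /* hypothetical table, one entry per token */

/* table_bytes (hypothetical): storage occupied by the table */
size_t table_bytes(void)
{
    return sizeof token_table;
}
```

With MAXTOKEN set to 100, each entry occupies one byte, so the table occupies 100 bytes. Raising MAXTOKEN above 127 changes the type of every entry without any other change to the source.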

Commenting-out sections of code

Sometimes one needs to "comment-out" several lines of code. Most compilers (and ANSI C) do not nest comments, so ordinary comments cannot be used around code that itself contains comments. But the construct

#if 0

/* ... code to be commented-out ... */

#endif

will reliably delete the enclosed lines, even if they contain other #if constructs.

Rule 1-13: Use #if 0 if there is a need to comment-out sections of code.
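A brief sketch shows why #if 0 is more reliable than an ordinary comment: the disabled lines below contain both a comment and a nested #if, and neither disturbs the construct. The function scale and the symbol MAXVAL are hypothetical.

```c
/* scale (hypothetical): double x; the clamping logic is disabled */
int scale(int x)
{
    int result = x * 2;

#if 0   /* disabled: experimental clamping */
    /* this inner comment would have ended an enclosing comment early */
#if MAXVAL > 100
    if (result > MAXVAL)
        result = MAXVAL;
#endif
#endif

    return result;
}
```

The inner #if is examined only to keep track of nesting; its expression is not evaluated, so it does not matter that MAXVAL is undefined.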

Inclusion Sandwich

Until the early 1980's, large projects had a continual problem with the inclusion of headers. One group might have produced a graphics.h, for example, which started by including <stdio.h>. Another group might have produced keyboard.h, which also included <stdio.h>. And if <stdio.h> could not safely be included several times, arguments would break out about which header should include it. Sometimes an agreement was reached that each header should include no other headers, and therefore some application programs started with dozens of #include lines -- and sometimes they got the ordering wrong, or forgot a header that was needed.

All these complications disappeared with the discovery of a simple technique: each header should #define a symbol which means "I have already been included." Then the entire header should be enclosed in a "sandwich":

#ifndef HEADER_H

#define HEADER_H

/* ... contents of the header ... */

#endif

Thus, the first time that header.h is #include'd, all of its contents will be included. If it should subsequently be #include'd again, its contents will be by-passed.

Rule 1-14: Enclose each header in an "inclusion sandwich."
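The effect of the sandwich can be seen by writing out, in one file, what the compiler sees when a header is included twice. The header point.h, its symbol POINT_H, and the function point_sum are hypothetical names used only for this sketch.

```c
/* ---- first inclusion of the hypothetical header point.h ---- */
#ifndef POINT_H
#define POINT_H

struct point { int x, y; };

#endif

/* ---- second inclusion, perhaps by way of another header ---- */
#ifndef POINT_H             /* POINT_H is now defined, so this copy is skipped */
#define POINT_H

struct point { int x, y; }; /* without the sandwich: a redefinition error */

#endif

/* point_sum (hypothetical): uses the single surviving declaration */
int point_sum(struct point p)
{
    return p.x + p.y;
}
```

If the #ifndef and #endif lines were removed, the second declaration of struct point would be diagnosed as a redefinition.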

Test drivers for library functions

When writing a library function -- a general-purpose callable function, which has a source file to itself -- it can be useful to provide a simple test harness within the same source file. The following pattern can be used:

code for the function

#ifdef TRYMAIN

main()

{

code for the test harness

}

#endif

Now, when we want to compile the function with the test harness, we define the symbol TRYMAIN in the compilation procedure. When we want just the object code for the function, we do not define TRYMAIN, and compile for object-code only. Keeping the harness in the same source file as the function makes later modification easier to test, and provides useful documentation of how the function is supposed to be used.
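Here is a minimal sketch of the pattern, assuming a source file named imax.c that contains a hypothetical library function imax.

```c
#include <stdio.h>

/* imax (hypothetical library function): return the larger of a and b */
int imax(int a, int b)
{
    return (a > b) ? a : b;
}

#ifdef TRYMAIN
/* Test harness: compiled only when TRYMAIN is defined */
int main(void)
{
    printf("imax(3, 7) = %d\n", imax(3, 7));
    printf("imax(-1, -5) = %d\n", imax(-1, -5));
    return 0;
}
#endif
```

On a UNIX-style compiler, something like cc -DTRYMAIN imax.c would build the harness, while cc -c imax.c would produce only the object code for the function.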

Conditional debugging

One very important application of conditional compilation is the use of conditional debugging code. The macro name NDEBUG is conventionally used to turn off any conditional debugging code:

cc -DNDEBUG pgm.c

turns off debugging, while an ordinary compilation with

cc pgm.c

will enable debugging. For a specific example, the local.h header contains the following definition:

#ifndef NDEBUG

#define asserts(cond, str) \
{ if (!(cond)) fprintf(stderr, "Assertion '%s' failed\n", str); }

#else

#define asserts(cond, str)

#endif

If the program being compiled contains the line

asserts(x >= 0, "x is non-negative");

an ordinary compilation will generate a test for the non-negativity of x, while compilation with NDEBUG defined will turn off the checking.
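The definition and one use can be put together in a compilable sketch. The function half is hypothetical; the macro is the one shown above.

```c
#include <stdio.h>

#ifndef NDEBUG
#define asserts(cond, str) \
    { if (!(cond)) fprintf(stderr, "Assertion '%s' failed\n", str); }
#else
#define asserts(cond, str)
#endif

/* half (hypothetical): halve a value that is expected to be non-negative */
int half(int x)
{
    asserts(x >= 0, "x is non-negative");
    return x / 2;
}
```

Note that asserts only reports the failure and then continues. Because the macro expands to a brace-enclosed block, it is safest to write each use as a statement of its own, as here, rather than as the unbraced body of an if that has an else.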

Recent C compilers provide a more streamlined version of this facility, in the header <assert.h>. The assert macro takes only one argument, the condition being tested:

assert(x >= 0);

If the assertion fails during execution, a message of this form will be printed:

Assertion failed: "x >= 0", file pgm.c, line 17

Furthermore, the macro calls the function abort, which terminates program execution. (Under UNIX, a "core-image" file is produced, for further assistance in debugging.)

The assert macro is an excellent tool for putting executable assertions into a program, but its streamlined implementation requires several preprocessor features which are not yet universal. The preprocessor must provide the symbols __FILE__ and __LINE__ (which identify the source file name and current line number), the preprocessor must be able to make a string constant out of the macro argument, and the library must support the abort function. The asserts macro is intended to provide a more universally-portable substitute until these features are provided by all compilers. In the following chapters, there will be many uses for the asserts macro.
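To show how these three features fit together, here is one possible sketch of an assert-like macro. It is not the definition found in any particular <assert.h>; the names my_assert, my_assert_fail, and checked_index are hypothetical, and a compiler supporting __FILE__, __LINE__, and the # (stringizing) operator is assumed.

```c
#include <stdio.h>
#include <stdlib.h>

/* my_assert_fail (hypothetical): report the failed assertion, then abort */
static void my_assert_fail(const char *expr, const char *file, int line)
{
    fprintf(stderr, "Assertion failed: \"%s\", file %s, line %d\n",
            expr, file, line);
    abort();
}

/* my_assert (hypothetical): #cond makes a string constant out of the
   argument; __FILE__ and __LINE__ identify the place of use */
#ifndef NDEBUG
#define my_assert(cond) \
    ((cond) ? (void)0 : my_assert_fail(#cond, __FILE__, __LINE__))
#else
#define my_assert(cond) ((void)0)
#endif

/* checked_index (hypothetical): return i after checking 0 <= i < n */
int checked_index(int i, int n)
{
    my_assert(0 <= i && i < n);
    return i;
}
```

Because the macro expands to an expression rather than a brace-enclosed block, it can be used anywhere that a function call returning void could appear.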
