# CHAPTER 2: SCALARS

Throughout this chapter, we will address the reliable use of scalars. As always, our desire is that the program actually does what it appears to do. Our motto for reliable programming continues to be "No surprises!"

# 2.1 The Math Library, with <math.h>

```
double ceil(x)     smallest integer not less than x
double cos(x)      cosine of x
double exp(x)      exponential function of x
double floor(x)    largest integer not greater than x
double log(x)      natural log of x
double log10(x)    base-10 log of x
double pow(x, y)   raise x to the power y
double sin(x)      sine of x
double sqrt(x)     square root of x
double tan(x)      tangent of x
```

sqrtx.c:

```c
/* sqrttst - test accuracy of sqrt
 */
#include "local.h"
main()
{
    double x, y;

    for (x = 1.; x <= 25.; x = x + 1.)
    {
        y = sqrt(x) * sqrt(x);
        if (y != x)
            printf("%5.1f %21.17f %10.2e\n", x, y, x - y);
    }
}
```

sqrtx.out:

```
  2.0   2.00000000000000010  -5.55e-17
  3.0   3.00000000000000010  -5.55e-17
  7.0   6.99999999999999990   2.22e-16
  8.0   8.00000000000000020  -2.22e-16
 10.0  10.00000000000000000  -2.22e-16
 12.0  12.00000000000000000  -2.22e-16
 15.0  15.00000000000000000  -2.22e-16
 17.0  17.00000000000000100  -4.44e-16
 18.0  18.00000000000000000   4.44e-16
 21.0  21.00000000000000000   4.44e-16
 22.0  22.00000000000000000   4.44e-16
 23.0  22.99999999999999900   8.88e-16
```

Another aspect of floating representation concerns conversion from floating-point to integer. When a positive floating-point value is converted to an integer value, the fractional digits are lost, i.e., truncated [2-2]. When a negative floating-point value is converted, the truncation may be toward or away from zero, depending on implementation.

Rule 2-1: When exactness counts in converting floating-point to integer, be sure the value being converted is non-negative.
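As a sketch of why the rule matters, the following helper makes the sign explicit before converting, so the truncation direction never depends on the implementation (the function name is our own, for illustration):

```c
/* trunc_nonneg - convert floating to int with a predictable
 * truncation direction. Positive values simply lose their
 * fraction; for negative values we convert the non-negative
 * magnitude (per Rule 2-1) and restore the sign afterward.
 */
int trunc_nonneg(double x)
{
    if (x >= 0.)
        return (int)x;        /* well-defined: 3.9 becomes 3 */
    else
        return -(int)(-x);    /* truncate the non-negative magnitude */
}
```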

For example, consider the standard loan-payment computation:

`pmt = bal * (intmo / (1. - pow(1. + intmo, (double)-npmts)));`

With balance (bal) equal to \$82,500.00, monthly interest (intmo) equal to 16.25% / 12, and number of payments (npmts) equal to 25 times 12 (i.e., 300), the C computation produces a result (pmt) equal to 1137.29654996 (showing just eight places). Since this result is about 35/100 of a penny away from an exact number of pennies, it will not take many additions or subtractions before the intermediate results start to differ noticeably from true dollars-and-cents figures.

round:

```c
/* round - adjust floating dollars to even pennies */
#include "local.h"
double round(dollars)
double dollars;
{
    double pennies;

    pennies = floor(dollars * 100. + .5);
    return (pennies / 100.);
}
```

which will give a "fair" or "symmetric" rounding. Thus, round(1137.2965) produces the result 1137.30, but round(1137.2925) would produce 1137.29.

The floor function produces a double result equal to the largest integer that is not larger than the argument; i.e., it "truncates" the argument's fraction. If we wanted a function that would round all fractional pennies upward (in favor of the bank), we could use the ceil function, which produces a double result equal to the smallest integer that is not smaller than the argument.

roundup:

```c
/* roundup - adjust floating dollars to even pennies
 * round all fractional pennies upward
 */
#include "local.h"
double roundup(dollars)
double dollars;
{
    double pennies;

    pennies = ceil(dollars * 100.);
    return (pennies / 100.);
}
```

Using roundup(1137.2925) produces 1137.30.

We turn now from errors of representation to errors of computation. Most of the math functions can fail, if they are given arguments which do not produce a proper result. For example, the square root of a negative number, such as sqrt(-1.0), has no meaning in "real" arithmetic. This is known as a domain error; the argument is outside the domain over which the function is defined. For another example, ten raised to the one-millionth power, pow(10., 1e6), cannot be represented in any (current) floating representation. This is known as a range error; the result cannot be represented as a proper double number. In all such error cases, the function will return some value, but the value returned will not be the "correct" value [2-3]. The sqrt function returns zero, when given a negative argument. The pow function returns a very large number (appropriately positive or negative) on an "overflow" range error. This "very large number" is defined in <math.h> as HUGE_VAL (or in some versions, HUGE).

I would advise you against testing for errors by comparing the returned value against HUGE_VAL (or -HUGE_VAL, or zero), for several reasons. First, these are in general valid (albeit unlikely) data values. Second, making such tests requires detailed knowledge of the various error returns for each math function, knowledge which is otherwise useless. Also, there are three different possibilities (-HUGE_VAL, 0, and HUGE_VAL), and you must know which are possible in each case. Finally, different versions of the library have differed in their error-return behavior.

The portable alternative is the errno protocol:

```
errno = 0;
perform computation
if (errno != 0)
    handle the error situation: report, compute differently, etc.
else
    computation was error-free
```

Rule 2-2: Test errno before using results from the math functions.

# 2.2 Error-handling

Diagnose and terminate: This is one of the simplest approaches, suitable to interactive use only. The restart point is re-execution of the entire program.

Diagnose and re-prompt: In this approach, computation does not begin until the program determines that input values will not cause errors in processing. If inputs are invalid, the program can prompt for replacement values. The restart point is the prompting for re-entry.

Re-try the operation: If the error is of a transient nature (e.g., a read error from disk), the program can re-try the operation, up to some pre-set maximum number of re-tries. The restart point is the re-try.

Terminate one transaction and proceed to the next: In this approach, the input is a series of independent transactions, such that one transaction can be diagnosed and terminated. This approach is common on batch systems in which the user is not assumed to be interactively monitoring the execution, but it may also be appropriate in some standalone systems. The restart point is the beginning of the next transaction.

Re-execute the computation with different algorithm: Sometimes one has a fast algorithm which sometimes fails, and a slower algorithm which always works. In this case, the restart point is the execution of the slower, more robust algorithm.

Ignore errors completely: This approach is only suitable when the error will be obvious to the user from the output of the program. Since no action is taken, there is no restart point. This is usually not suitable for reliable programming.
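The "re-try" strategy above can be sketched in a few lines. The operation here (`flaky_read`) is hypothetical, standing in for something like a transient disk-read failure:

```c
/* with_retries - attempt an operation up to a pre-set maximum
 * number of tries; the restart point is the re-try itself.
 */
static int calls = 0;

static int flaky_read(void)
{
    ++calls;
    return (calls < 3) ? -1 : 0;   /* fails twice, then succeeds */
}

int with_retries(int (*op)(void), int max_tries)
{
    int tries;

    for (tries = 0; tries < max_tries; ++tries)
        if (op() == 0)
            return 0;   /* success */
    return -1;          /* give up; caller handles the error */
}
```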

# 2.3 Character Tests, with <ctype.h>

Topics assumed from previous book: <ctype.h> [3.5].

In Chapter 1, we defined a safe macro function as one which evaluates each argument exactly once, and we observed that safe macro functions can reliably be used as replacements for function calls. This approach to the macro-versus-function tradeoff is most simply illustrated by the "character-tests" in <ctype.h>. These have usually been implemented as safe macros. ANSI C requires that they also have true function versions in the library. As mentioned in Chapter 1, the true function version of, for example, isdigit can be obtained by using

`#undef isdigit`

(See the appendix for a list of the character-test functions from the library of recent C compilers [2-5].)

These functions are the most portable way of testing and converting characters, so their use is encouraged.

Rule 2-3: Use the <ctype.h> facilities for character tests and upper-lower conversions.
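A small sketch of Rule 2-3 (the function name is ours): count the digits in a string using isdigit rather than arithmetic on character codes. The cast to unsigned char keeps the argument within the domain the <ctype.h> functions expect:

```c
#include <ctype.h>

/* count_digits - count decimal digits in s, using the portable
 * <ctype.h> test rather than comparisons like '0' <= c.
 */
int count_digits(const char *s)
{
    int n = 0;

    for (; *s != '\0'; ++s)
        if (isdigit((unsigned char)*s))
            ++n;
    return n;
}
```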

# 2.4 Boolean Data

Topics assumed from previous book: Semi-Boolean tests [3.2], bool [3.21], tbool [3.21].

As described in the introductory book, the logic of test conditions (if, while, for, do) is "semi-Boolean," in that zero means false and non-zero means true. In other words, any test expression is implicitly compared against zero, and many existing C programs employ this implicit comparison in shorthand uses. One often sees

`if (c)` instead of `if (c != '\0')`, and

`if (p)` instead of `if (p != NULL)`

Unfortunately, the same shorthand means that

`if (i = j)  /* either wrong or misleading */`

is syntactically valid; it means the same as

`if ((i = j) != 0)`

but the mistaken use of the "single-equal" assignment instead of the "double-equal" comparison is a common mistake. A new generation of C interpreters has the potential for eliminating this type of mistake, but the only simple way to achieve it is to require that all test conditions must clearly be "Boolean" expressions. Any expression whose top-level operators are relational and logical is obviously a "Boolean" expression, but what about Boolean variables? From the interpreter's point of view, any value other than zero or one might be assumed to be non-Boolean.

From the human reader's point of view, the program will be easier to understand if Boolean variables are clearly indicated as such. In the type scheme of C Programming Guidelines [Plum, 1984], two defined-types are provided for Boolean variables:

`bool` designates an int-sized Boolean variable, and

`tbool` designates a char-sized Boolean variable.

With an eye toward future automated assistance, we propose the following reliability rules:

Rule 2-4: Make sure that Boolean variables are assigned the values zero and one. This means that the type tbool is always adequate, and if this rule is part of local standards, the types bool and tbool could be made synonymous.

Rule 2-5: Make sure that each test condition is Boolean, involving only Boolean types or relational and logical operators.
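Following Rules 2-4 and 2-5, Boolean values can be produced directly from relational expressions, which always yield exactly zero or one. A sketch using the text's tbool (the function names are ours):

```c
typedef char tbool;   /* char-sized Boolean, as in the text */

tbool is_positive(int n)
{
    return (n > 0);       /* relational result is always 0 or 1 */
}

tbool to_bool(int n)
{
    return (n != 0);      /* normalizes any nonzero value to 1 */
}
```

Because only zero and one are ever assigned, the char-sized tbool is always adequate, per Rule 2-4.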

In most modern compilers, there is no efficiency penalty for writing

`while (*s1++ != '\0')`

instead of the shorthand

`while (*s1++)`

which is not strictly Boolean.

# 2.5 Enumeration Types

`enum stoplight {red, yellow, green};`

This establishes enum stoplight as a type which can be used in further declarations, such as

`enum stoplight main_st, front_st;`

And the enumeration constants (red, yellow, and green) become symbolic names for the integer constants 0, 1, and 2. Each of the enumeration constants can be explicitly initialized, as in

`enum stoplight {red = 'r', yellow = 'y', green = 'g'};`

If an uninitialized enumeration constant follows one which is initialized, its value is one greater than the previous constant. To avoid surprises, we suggest this rule:

Rule 2-6: An enumeration's constants should all be initialized, or else none of them should be initialized.

There are some reliability problems with the semantics of enum. In a strict usage of enum data, the variables main_st and front_st would be assigned no values other than the named enumeration constants, red, yellow, and green, like this:

`main_st = red;`

That is not, however, what C requires. Instead, an enumeration variable is treated just like an int variable, except that its actual storage size might be optimized according to the range of the enumeration constants. The declaration of an enumeration type is therefore more or less equivalent to a listing of #define'd names, with a few minor advantages: The scope of the names is limited (just like ordinary identifiers), and it is marginally easier to add new constants to the list without having to manually change their values.

Future implementations could invoke stricter checking if programs were written according to stricter rules:

Rule 2-7: Write programs as if enumeration variables could receive no values other than the associated enumeration constants. Treat the enumeration types as if they were unique types, not for any arithmetic int usages. Convert between enumeration variables and integer values only by use of an explicit cast.

This allows, at least, easier inspection of the code by reviewers, and creates the possibility of stricter lint-like checking.
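A sketch of Rule 2-7 in practice (the function is ours, for illustration): arithmetic is performed on an int copy obtained by explicit cast, and the result is cast back to the enumeration type:

```c
/* next_light - advance a stoplight, treating enum stoplight as
 * a distinct type: conversions to and from int use explicit
 * casts only (Rule 2-7).
 */
enum stoplight { red, yellow, green };

enum stoplight next_light(enum stoplight s)
{
    int i = (int)s;                         /* explicit cast out  */
    return (enum stoplight)((i + 1) % 3);   /* explicit cast back */
}
```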

# 2.6 Range-Checking

`double heading;  /* degrees: {0:360} */`

If we follow the notational convention that the colon is taken to mean "must be," then the comment can be understood to mean "heading is measured in degrees, and must be between 0 and 360, inclusive."

Executable assertions can be useful for putting teeth into assumptions about ranges of data. In this case, an appropriate assertion on entry to a function using heading would look like this:

`asserts(0 <= heading && heading <= 360, "heading range ok");`

Tests of this form are common enough to deserve a macro:

`#define IN_RANGE(n, lo, hi)  ((lo) <= (n) && (n) <= (hi))`

which would allow the assertion to be given in this form:

`asserts(IN_RANGE(heading, 0, 360), "heading range ok");`

```
for (month_no = 0; month_no < 12; ++month_no)
    process array[month_no];
```

The for loop always involves a "one-too-far" value at the end of the loop; month_no must be incremented to 12 in order for the loop test (month_no < 12) to terminate the loop.

Rule 2-8: Include "one-too-far" values in the ranges for variables, if they are needed for loop terminations or other testing purposes.

Rule 2-9: Function parameters accepting the size of an arbitrarily large object should be declared with size_t type.

reverse:

```c
/* reverse - reverse the order of a string */
#include "local.h"
void reverse(s)
char s[];
{
    char t;
    size_t i, j;

    if ((j = strlen(s)) == 0)
        return;
    for (i = 0, j = j - 1; i < j; ++i, --j)
        SWAP(s[i], s[j], t);
}
```

A related question concerns how a subscript variable should be declared. The lowest subscript of an array is always zero in C, but regarding the highest subscript there are new complications with recent versions of C. Some implementations allow declaration of arrays with more elements than the largest (signed) int value, whereas in K&R [Kernighan and Ritchie, 1978], subscripts were always representable as int values. Recent compilers for C allow for larger subscripts. For one example, the "large memory model" of the 8086 family of machines allows for objects larger than 32K elements, but the int is still 16 bits. Thus, in general, portable code (especially library functions callable in any context) must assume that a subscript could require an unsigned representation. Fortunately for portability, there is always a valid "one-too-far" value greater than the largest possible subscript, because in zero-origin indexing, the largest subscript is always one less than the largest dimension value for the array. However, a variable capable of holding the largest subscript must be an unsigned variety of integer, which eliminates the possibility of a "one-too-far" value at the low end.
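Since an unsigned subscript has no "one-too-far" value at the low end, a downward loop must be arranged to test before decrementing, never after. One common idiom, sketched here with a function name of our own:

```c
#include <stddef.h>

/* sum_down - sum array elements from the top subscript down to
 * zero. Because i is unsigned (size_t), a test like "i >= 0"
 * would always be true; the test-then-decrement idiom below
 * exits before i can wrap past zero.
 */
long sum_down(const int a[], size_t n)
{
    long sum = 0;
    size_t i;

    for (i = n; i-- > 0; )
        sum += a[i];
    return sum;
}
```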

# 2.7 Signed and Unsigned Arithmetic

In this section, all our examples will be easier to describe if we assume that int has 16 bits, but the principles are the same for any size of int. Most C machines use twos complement arithmetic, in which the smallest int has the value -32768, and the largest int has the value 32767. (These values are named INT_MIN and INT_MAX, respectively, in <limits.h>, as described in Section 1.1.) On a hypothetical 16-bit ones complement machine, INT_MIN would be -32767 and INT_MAX would be 32767. Notice that the very smallest twos-complement integer has no counterpart in ones complement, nor does it have a corresponding positive integer in either system.

While twos complement is the most popular usage in the C world, programmers aiming for the widest portability must consider various problems regarding ones complement, even if their current machine does not use it. More precisely, these are reliability problems for portable code, in that they can cause nasty surprises if not attended to. We shall see several such problems in this section.

```
+-- ... --+--+--+--+-- ... --+
INT_MIN      0  1  2      INT_MAX
```

Subtraction of two ordinary int values produces a (signed) int result, as in the subtraction of two temperatures:

```c
int temp0, temp1, delta_temp;

delta_temp = temp1 - temp0;
```

This gives an algebraically correct result as long as the true difference lies between INT_MIN and INT_MAX, inclusive. But if temp1 and temp0 are too far apart, an overflow takes place in computing delta_temp, and the result is undefined.

Unsigned arithmetic is based on a "modulus" arithmetic, like an odometer or a timer counter. Consider two unsigned integers timer0 and timer1 which represent successive samples from a timer counter:

```c
unsigned int delta_time, timer0, timer1;

delta_time = timer1 - timer0;
```

The unsigned subtraction yields the correct elapsed count even if the counter has wrapped around between the two samples. Contrast this with two unsigned weights whose difference is properly a signed quantity:

```c
unsigned int weight0, weight1;
int delta_weight;

delta_weight = weight1 - weight0;
```

In either case, there are surprises, but the "twos-complement mimic" behavior is slightly less surprising than the "raw-bits" behavior. If a ones-complement environment uses the "raw-bits" treatment of conversion from unsigned to signed integers, we must do some work in order to reverse the "modulus" behavior of signed-to-unsigned conversion.

Here is an example. If the (signed) integer -1 is converted to an unsigned integer, the unsigned result consists of all 1-bits. When these "raw bits" are converted back to signed integer, the result is zero. Here, then, is a 36-bit ones-complement version of an "unsigned int to int" conversion macro, called UI_TO_I, which can be found in the header portdefs.h:

```c
#define UI_TO_I(n) (((n) & 0x800000000) == 0 ? (n) : -(n) - 1)
```

Be aware that this is a non-unique mapping. Both 0 and 0x800000000 (2 to the 35-th power) are mapped into 0. Since 2 to the 35-th power could not occur in the typical 32-bit twos-complement environment, this is not likely to cause any portability problem.

In a twos-complement environment, all that UI_TO_I has to do is to cast the result to int:

```c
#define UI_TO_I(n)  (int)(n)
```

```
unsigned:  0  1  2  ...  INT_MAX  INT_MAX+1  ...  UINT_MAX   [unsigned interpretation]
           +--+--+-- ... ---+---------+------ ... ----+
signed:    0  1  2  ...  INT_MAX   INT_MIN   ...     -1      [signed (2s-comp) interpretation]
```

Any computations involving qsigned values are performed in the same (twos-complement) arithmetic that is used for unsigned values. (If the machine's signed arithmetic is ones-complement, the INT_MIN limit is actually equal to the smallest qsigned value plus one; this corresponds to the conversions that should take place when going from unsigned values to signed values in such environments.) When a qsigned int value is assigned to an unsigned int variable, the unsigned interpretation is followed. When a qsigned int value is converted to a (signed) int value (using the UI_TO_I macro), the signed (twos-complement) interpretation is followed.

The programmer may believe that

```c
long delta_timer;

delta_timer = timer1 - timer0;
```

produces a non-negative value, and that

```c
long delta_weight;

delta_weight = weight1 - weight0;
```

produces a signed value. Of course, both cannot be true; the difference between them is only in the mind of the programmer, not in the semantics of C. (The second example is the incorrect usage -- the unsigned difference becomes a positive long integer.) Accordingly, we suggest that the conversion of a qsigned int result to a larger long int type should (for maximum reliability) be considered an undefined operation. In other words, before assigning or casting a qsigned value to a larger int type, one should convert the value explicitly either with UI_TO_I or with (unsigned). Thus, we should write

```c
delta_timer = (unsigned)(timer1 - timer0);
```

and

```c
delta_weight = UI_TO_I(weight1 - weight0);
```

This can all be summarized by a reliability rule:

Rule 2-10: When two unsigned int's are subtracted, convert the result using either (unsigned) or UI_TO_I.
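Here is Rule 2-10 sketched for a twos-complement environment, where UI_TO_I is simply a cast. The function names are ours; they show the two intended interpretations of an unsigned difference:

```c
#include <limits.h>

/* In a twos-complement environment, UI_TO_I need only cast. */
#define UI_TO_I(n)  ((int)(n))

/* elapsed - timer difference, kept unsigned: the modulus
 * arithmetic gives the right count even across wraparound.
 */
unsigned elapsed(unsigned timer1, unsigned timer0)
{
    return timer1 - timer0;
}

/* weight_change - weight difference, properly signed: the
 * unsigned result is explicitly reinterpreted with UI_TO_I.
 */
int weight_change(unsigned weight1, unsigned weight0)
{
    return UI_TO_I(weight1 - weight0);
}
```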

```c
/* modulo macro giving non-negative result */
#define IMOD(i, j) (((i) % (j)) < 0 ? ((i) % (j)) + (j) : ((i) % (j)))
/* if i % j is never negative, replace with the following line: */
/* #define IMOD(i, j) ((i) % (j)) */
```

Rule 2-11: Use IMOD (or some similar mechanism) to ensure that a nonnegative modulo result is produced.
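A quick check of IMOD's behavior: in C, a remainder such as -7 % 3 may be negative (on twos-complement machines it is typically -1), and IMOD adjusts it to the mathematical residue:

```c
/* IMOD - the macro from the text: forces the remainder into
 * {0 : j-1} for positive j, whichever sign i % j produces.
 */
#define IMOD(i, j) (((i) % (j)) < 0 ? ((i) % (j)) + (j) : ((i) % (j)))
```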

# 2.8 Overflow

Many C programmers have assumed that (on a twos-complement machine) overflows can be ignored in most cases. Consider the following (simplified) version of the atoi function:

atopi(#1):

```c
/* atopi - convert string to positive integer */
int atopi(s)
char s[];
{
    int n = 0;
    int i;

    for (i = 0; isdigit(s[i]); ++i)
        n = 10 * n + s[i] - '0';
    return (n);
}
```

Consider converting the string "32767" on a 16-bit machine: when the final digit is reached, n is 3276, and the left-to-right evaluation momentarily exceeds INT_MAX:

```
3276 * 10  =  32760
  + '7'    +     55
           --------
              32815   momentarily greater than INT_MAX
  - '0'    -     48
           --------
              32767   now within proper range
```

A "true" overflow takes place when the correct answer is not capable of being produced, as in atopi("999999999999"). The momentary overflow shown above, on the other hand, can be eliminated simply by parenthesizing so that each intermediate result stays in range:

`n = 10 * n + (s[i] - '0');`

Existing conversion functions in C libraries have tended to ignore overflows. In recent C libraries, there are conversion functions with a more constrained behavior. On overflow, they return a maximum value, and use errno to indicate that an error has taken place. In the case of atopi, we could more simply return a special value, such as -1, to indicate the overflow error. No overflows will take place during the computation, and an excessively-large argument value is diagnosed. Applying this philosophy to our atopi function would produce something like this:

atopi(#2):

```c
/* atopi - convert string to positive integer {0:INT_MAX} */
#include <limits.h>
int atopi(s)
char s[];
{
    unsigned n = 0;   /* the converted number: {0:INT_MAX} */
    int i;

    for (i = 0; isdigit(s[i]) && n <= INT_MAX / 10; ++i)
        n = 10 * n + (s[i] - '0');
    if (n > INT_MAX || isdigit(s[i]))
        return (-1);
    else
        return (n);
}
```
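The constrained library functions mentioned earlier behave the same way in outline. strtol, for example, returns LONG_MAX (or LONG_MIN) on overflow and sets errno to ERANGE; here that protocol is folded into the same -1 error convention used by atopi (a convention of ours, for illustration):

```c
#include <stdlib.h>
#include <errno.h>

/* checked_atol - convert decimal string via strtol, testing
 * errno per Rule 2-2; returns -1 if the value overflowed.
 */
long checked_atol(const char *s)
{
    long n;

    errno = 0;
    n = strtol(s, NULL, 10);
    if (errno == ERANGE)
        return -1;
    return n;
}
```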

Rule 2-12: In (signed) integer arithmetic, assume that overflow is illegal, may be detected (hence should never be programmed), and cannot be trapped or ignored. As shown earlier, using unsigned arithmetic can at least prevent "fussy" momentary overflows.
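Because a signed overflow cannot be trapped after the fact, a program must test in advance whether an operation would overflow. A sketch for addition (the function name is ours):

```c
#include <limits.h>

/* add_would_overflow - return 1 if a + b would overflow int,
 * 0 otherwise. The tests themselves stay in range, so the
 * questionable addition is never actually performed.
 */
int add_would_overflow(int a, int b)
{
    return (b > 0 && a > INT_MAX - b) ||
           (b < 0 && a < INT_MIN - b);
}
```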

# 2.9 Data Properties

Throughout the book, we will refer to three aspects of reliable use of data: type, properties, and allowable operations.

## Type

By "type," we refer both to the way the data is represented in memory, and to the syntactic type on the declaration. Early C compilers were rather loose about the agreement of data types; code like this was allowable:

```c
char *p;
int i;

i = p + 2;    /* loose and dangerous */
```

C performs a number of conversions upon expression values, so there is an important distinction to make between the "declared" type and the "converted" type. For example, in

```c
int i, j;
char c;

i = j + c;
```

the declared type of c is char, but when c appears in an expression like this, its converted type is int.
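The converted type can actually be observed with sizeof, since sizeof reports the type of an expression without evaluating it. A small sketch (function names ours):

```c
#include <stddef.h>

/* The declared type of c is char, but in the arithmetic
 * expression c + 0 its converted type is int; sizeof makes
 * the promotion visible.
 */
size_t declared_size(void)  { char c = 'A'; return sizeof c; }
size_t converted_size(void) { char c = 'A'; return sizeof(c + 0); }
```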

The main point is that C programmers must know the exact type of each variable and each expression in which the variable is used.

## Properties

```c
long sum;
int i;

for (i = 0; i < N; ++i)
    sum += a[i];    /* error - sum not initialized */
```

In C, automatic variables are undefined until they are initialized; static variables are initialized by default, so they are initially defined. A variable can acquire an undefined value as the result of an invalid operation;

`n = m / 0;`

causes n to become undefined, possibly terminating execution as well.

Similarly, when an overflow takes place in a computation, the result of the expression is undefined. And when a value is assigned to an object that is too small to hold the value, the object's value becomes undefined.

Besides this fundamental distinction between defined and undefined values, a variable may have other properties ascribed to it by the programmer. In Section 2.6, we discussed programmer-defined ranges for variables, documented like this:

`double heading;   /* degrees: {0:360} */`

By this comment, the programmer indicates that the property of being between 0 and 360 is an important property for this variable. If heading receives the value -1, the variable is "defined" from the point of view of C, but it is not properly defined according to the programmer's intention.

In previous sections, we saw other properties that the programmer can ascribe to a variable, by using an informative defined-type: metachar (a valid char-sized number or EOF), bool and tbool (zero or one), and bits (an unsigned number for bit manipulation).

We will use the term well-defined to refer to a value that conforms to the programmer's stated defining property. If, for example, a metachar variable were to receive the value 5000, the variable would not be "well-defined," because it exceeds the range of proper char-sized numbers.

Besides stating desired properties on the declaration of variables, the programmer can also document properties of function returned values. A function declared like this

`bool is_empty();   /* is node empty? */`

or like this

`int is_empty();    /* is node empty?: bool */`

is indicated to return only Boolean values. This documentation is important for determining that the function is producing a value that is appropriate for the context in which it is used.

We will be following a convention that a comment with a colon followed by a property name means "must have this property."

Rule 2-13: Document the defining properties of declared names in a comment on the declaration, using a convention such as "colon followed by property."

## Allowable operations

The type and properties of an expression determine the operations that are allowable on that expression. For example, a bool variable should not be the operand of arithmetic operations like multiply or divide; the operations just do not make sense for the intended properties of the variable. Sometimes the type itself restricts the operations. For example, bitwise operations are not allowed upon any floating-point values.

In later chapters, we will see more complicated objects such as arrays, structures, stacks, and trees, each of which has its own repertoire of allowable operations.