1.6.2 Data Representation

Data types must be specified. An example will show why. Suppose a compiler represents integer and real variables (floats) using seven decimal digits. Then in storage that is designated type integer, the value 1230045 represents the integer number 1,230,045. In storage that is designated type float, the value 1230045 represents the real number 0.30045 x 10^12. In general, the compiler interprets

d1 d2 . . . d7 of type integer as the integer number d1 d2 . . . d7

and

d1 d2 . . . d7 of type float as the real number 0.d3 d4 . . . d7 x 10^(d1 d2), where the two digits d1 d2 form the exponent.

Given this situation, suppose the compiler must cause the contents of sum = 1230045 to be added to the contents of increment = 1250047, where both are of type integer. The compiler might apply the usual column-by-column addition algorithm:

sum           1230045
increment   + 1250047
              -------
              2480092

Here, 2480092, when stored in sum, represents the correct result. If sum and increment are instead of type float, then the content of sum represents the real number 0.30045 x 10^12 and the content of increment represents the real number 0.50047 x 10^12. Their sum is 0.80092 x 10^12. However, if the above column-by-column addition algorithm is applied and the result stored in sum, then its contents will still be 2480092, but this now represents the real number 0.80092 x 10^24. This is incorrect by a factor of 10^12. To represent the correct result, sum should contain 1280092. The difficulty is, of course, that the same algorithm cannot be used to add numbers of type integer and numbers of type float. Two distinct algorithms are needed, because two distinct representations are used. In general, the algorithms used must depend on the data types of the variables involved. The compiler can choose the correct algorithm only when it knows the data types.

Thus it is essential to declare the data type of each variable so that the compiler can choose the proper algorithm to achieve the intended result. Type declarations also allow high-level language compilers to build in checks for validity. Such checks may generate error messages during compilation or during execution of the program. Languages with more restrictive rules for the manipulation of different data types, called strongly typed languages, allow more checking. For example, Pascal's strong typing prevents, with few exceptions, the contents of a variable of one type from being copied into a variable of a different type. C, FORTRAN, and COBOL allow this to be done in some cases but not in others.

The C data types int, float, and char, or their equivalents, are now standard in most high-level languages. The values they may assume, and the operations that may be performed on variables of these types, are specified in each language.