2.1.4 Representation of Records in Arrays

In C the record, or structure, is a basic data structure. In some languages the record is not a built-in feature but must be constructed from the data structures (such as arrays) that are available. To see how this construction might be done, assume C designers had provided the integer, float and char data types, and only one-dimensional arrays. Not only would records have to be constructed, but so would arrays of more dimensions if they were needed. There are two reasons for looking at ways to build data structures.

1. It affords insight into how compilers implement records and higher-dimensional arrays.

2. It demonstrates how programmers build new structures from those that are given in order to tailor data structures to specific needs.

Moreover, even though records and higher-dimensional arrays are available in C, there are times when programmers may find it preferable to use these construction techniques.

Let us consider how to represent n records, each composed of m fields of the same fixed type. Suppose m is 3. A natural solution is to take an array of length 3?/FONT>n, of the same type as the fields, and store the records in the array. The first record takes up the first three entries, the second record the next three entries, and so on, as in Figure 2.5. Each field of a record will be stored in the corresponding one of its three entries.

The ith record begins in position 3*(i - 1) of the array. Thus we may select the ith record. To access a specific field of the ith record (say, the third) requires that position 3*(i - 1) + 2 of the array be accessed. Although this is the (3*(i - 1) + 2)th array element, it contains the third field value of the ith record. Thus we may select any field of any record too.

It is also easy to traverse the records, using a loop. For example, the following function traverses the n records stored in the array a and prints the value in their kth field, where k may be 1, 2, or 3.

>Comment

typedef int fieldtype;

array is the name for the type: array of fieldtype of size 30

typedef fieldtype array [30];

printfield(a,n,k)

/* Prints the contents of the

kth field of each of the first

n records stored in array a.

*/

>Comment

array a;

int n,k;

{

>Comment

int length = 3, i;

for(i=0; i<n; i++)

printf("\n %d\n",a[length * i + (k-1)]);

}

In C, since arrays must be homogeneous (consisting of entries of the same type), this method is applicable only to homogeneous records?/FONT>those whose fields are all of the same type. How then to handle the more general case of records whose fields are not homogeneous? The following example demonstrates the general concept and then how it is achieved in C.

Example 2.4

Consider Figure 2.6, which represents an array cars containing a record for each car of an automobile rental agency. The record for each car has an entry for the make (characters), monthly rate (a real number), current mileage (an integer), and the rentee (characters). As noted, this array, with entries of different types, could not be constructed in C. For the sake of simplicity, the following discussion does not deal with the actual storage taken by a character, float, or integer variable, but assumes that each takes one element of memory. All the records in the array are 42 elements long, using 15 elements for the make, 1 element each for the monthly rate and current mileage, and 25 elements for the rentee field. n

Figure 2.5 An Array of Records (of Length 3)

Figure 2.6 Records of a Car Rental Agency Stored in an Array CARS

An offset gives the position of a data item with respect to a given reference point. When you say, "the third house from the corner," you are using an offset reference point. Each field has a field offset from the beginning of the record. In this case, the field offsets are 0, 15, 16, and 17, respectively, for make, monthly rate, current mileage, and rentee. The ith record has a record offset of 42?/FONT>(i - 1) from the beginning of the array. The offset for its rentee field from the beginning of the array is 42?/FONT>(i - 1) + 17, the sum of the record offset for the ith record and the field offset for the rentee.

In order to reference, for example, the current mileage field of the ith record, the programmer may refer to cars[42*(i-1)+16], assuming the first record starts in element zero of the array. Only the fields may be referred to by the programmer; the record itself cannot be dealt with directly as a unit. In C the task is simplified for the programmer by the use of structures and arrays of structures. Carrecord is defined as a structure and cars[100] as an array of 100 carrecords.

struct carrecord

{

char make[15];

float rate;

int mileage;

char rentee[25];

}cars[100];

Then the C programmer may refer to the (i + 1)th record as a unit as cars[i], and to its mileage field by cars[i].mileage. Treating a record as a unit allows intentions to be stated more clearly in a program. It also allows for the contents of one record to be assigned to the storage for another record more efficiently. This is analogous to being able to refer to an automobile and say "Move that automobile from here to there," rather than specifying the automobile or the move by referring to each of its parts individually. Languages that do not provide the record data structure prevent the programmer from referring to a record as a unit in this way.

Note that the record referred to during the execution of a program may be changed merely by changing the value of i, but the field referred to cannot be changed, since field names may not be changed during execution. The variable names cars[i].mileage shows this. The value of array index i can be changed, but the name cars cannot be. This means that code referring to the ith position of an array, such as cars[i], may refer to different array entries as aprogram executes, by computing a new value for i. Code using a field name will always refer to that particular field.

This example gives a technique for sorting records in languages without strong typing constraints. Even in languages with such constraints, the compiler might implement cars in the actual computer memory as illustrated in Figure 2.6. In this case, if the initial array address in memory is 100 rather than 0, then the offset (or relative position) for the (i + 1)th record is added to 100 instead of to 0. The compiler can produce commands that calculate the actual position in memory of any field of the ith record by adding the record offset and field offset to 100. Then a programmer's reference to cars[1] is interpreted as a reference to the 42 consecutive locations 142 to 183. Similarly, cars [1].rate refers to the entry in location 157. This assumes the array starts at memory location 100. The offsets are independent of the starting location. When applied to homogeneous records, this method requires that cars be defined to be the same type as the record's fields.

Observe that the cars array preserves the homogeneity of type that is required of all arrays in C. Its entries are now all of the same specific record type. The array entries just happen to be of type carrecord, even though a record itself is not homogeneous. C allows operations to be performed directly on structures, treating them as a unit. Of course, the compiler hides the details of the records' implementation from readers of the program and provides facility in handling records.

The method just discussed is a single-array implementation. The second method of representing the n records is a multiarray implementation, shown in Figure 2.7. The method simply uses a different array for each field. Each array then contains information of the appropriate type only. There is no direct way to refer to the collective notion cars[2]. Instead, its data are given by make[2],rate[2], mileage[2], and rentee[2]. Selection of the ith record and traversal through the records is also possible for this method.

The single-array technique may cause problems when all entries are not of the same type. As noted, however, Example 2.4 shows offset calculations of the type that may be used by a compiler but are invisible to the programmer. The multiarray technique is useful in any language that provides arrays. Both techniques fail to give the programmer means to refer directly to, or to manipulate as a unit, an entire record composed of more than one field. The C approach provides the record data structure and makes the underlying details transparent to the C programmer, thus providing an elegant solution to the problem of treating information of different types as a unit.

Figure 2.7 A Multiarray Storage of Records of a Car Rental Agency