CHAPTER 8: FILES

8.1 Standard (Default) Files

One of the virtues that C inherited from UNIX systems is a uniform concept of data transfer through files. A file, to a C program, is a transmission stream of bytes to or from the main memory. The ultimate source or sink for the bytes may be of many forms: disk file, tape, printer, communication channel to other programs, etc. With a few exceptions, the C program can be written independent of the ultimate source or sink

One important distinction about files is present in most operating systems except for UNIX: the distinction between binary files and text files.

A binary file is a one-for-one transmission stream of bytes. If a program writes a binary file onto an external medium, that file will read byte-for-byte the same when read back in as a binary file (provided that the same compiler and machine are used both times). However, some implementations may append extra nul characters to conform to binary block-size conventions.

A text file is a mapping between internal machine bytes and external conventions for representation of text. For example, some implementations map the newline ('\n') into a two-character sequence (carriage-return, newline). Some append a ctrl-Z character to mark the end of file. Some systems delete some nonprinting characters from text files.

On UNIX systems, of course, there are no such distinctions. On other systems, binary files might seem simpler to use and understand, but on such systems most of the utility programs that deal with text will expect to see files that conform to the text-file conventions. Therefore, on non-UNIX systems, the default type for files will generally be the text file.

The most portable scheme for accessing files originated with UNIX 7th Edition. Each file is represented by a FILE structure, whose contents are specified by the header <stdio.h>. (A FILE object is, strictly speaking, not required to be a C structure, but most implementations do define it that way.) Such files have generally been implemented with buffers associated with the structure, so they have sometimes been referred to as "buffered" files. Since each operating system may have its own treatment of I/O buffers, we will follow ANSI C in adopting the more neutral term stream file to designate the type of file represented by a FILE structure.

8.2 Processing Named Files

To perform I/O on files opened by name, we use the function

fopen:

FILE *fp;

fp = fopen("input.dat", "r");

will open the file input.dat for reading ("r"). The variable fp was declared as a pointer to FILE. After the fopen call, fp will point to a FILE properly initialized for reading. If the open fails, fp will contain a NULL pointer; hence this rule:

Rule 8-1: Always test the returned value from fopen to be sure that the open succeeded.

The reasons why an open might fail are implementation-defined; some possible reasons are that no file exists by the specified name, or the file may exist but be unreadable to the program, or the program has already opened the maximum allowable number of files.

The file-name argument must specify a name in a format acceptable to the host system, and systems differ in their naming rules for files. In general, a name in the form xxxxxx.xxx, where x is a letter or a digit, will be acceptable to most C environments. Name formats such as /tmp/t11111, a:infile, or [70,40]datfil will work only on some systems.

Note that the environment may be allowed some liberty in mapping the name string into a system's format for file names, for example by mapping lower case into upper case, or by mapping the "'dot" (.) into some other character.

Note also that the file name may specify a device (terminal, printer, tape drive, etc.) if the host system provides the appropriate name mapping. However, there are no standard names for any devices, so such usage is potentially non-portable.

A file thus opened for reading can be used with functions analogous to the default-input functions shown earlier:

Named-file function  Default-file function

getc(fp)              getchar()

fscanf(fp, fmt, ...)  scanf(fmt, ...)

fgets(s, n, fp)       gets(s)

Note, however, that fgets (unlike getc and fscanf) is not functionally equivalent to its default-file version: fgets takes an argument specifying the size of the available storage in s, and fgets does not delete the newline from the input characters (whereas gets does delete the newline).

A file can be opened for output by specifying "w" as its usage. If the file existed previously, it will be truncated to zero length. If it did not exist, it is created with zero length. Output functions analogous to the default-file versions are available:

Named-file function   Default-file function

putc(c, fp)            putchar(c)

fprintf(fp, fmt, ...)  printf(fmt, ...)

fputs(s, fp)           puts(s)

As with gets, there is a functional difference between fputs and puts: fputs does not tack a newline onto its output, whereas puts does.

A third option available from fopen is "a" ("append") mode. Theprevious contents of the file (if any) are left undisturbed; each write operation will take place at the end of the file.

Besides the functions above which have default-file analogs, there are a few functions available only in their named-file versions, notably fread, and fwrite. The call

n = fread(s, elt_size, num_elts, fp);

will read a designated number of elements (each of a designated size) into the object pointed to by s, using the file specified by fp. The returned value tells how many elements were successfully read. A return of zero is ambiguous; it might indicate end-of-file, or it might reflect the occurrence of an I/O error before any elements were read.

A call upon feof(fp) will return YES if the file fp is at end-of-file, and NO otherwise. Similarly, ferror(fp) will tell whether any I/O errors have taken place on the file. A program can call clearerr(fp) to clear both the end-of-file and the I/O error indicators for the file.

When fread is called upon to read an array of char's, with elt_size specified as 1, the usage is portable onto any system. If the elements are multi-byte data, the byte-ordering is environment-dependent. Similarly for writing files with fwrite:

n = fwrite(s, elt_size, num_elts, fp);

will write a designated number of elements (each of a designated size) onto the file specified by fp.

When the program is finished with a file, the file should be closed via fclose(fp).

As mentioned in Section 5.2, the buffering behavior of a file can be changed by calling the setbuf function after opening the file but prior to any file accesses. Calling setbuf(fp, NULL) turns off the buffering; calling setbuf(fp, mybuf) causes the array mybuf (which should be of size BUFSIZ) to be used as the buffer for file fp. One use for the latter option might be an environment in which dynamic "heap" memory is limited; a statically-declared buffer can be used instead of one that would be dynamically allocated by the I/O functions.

8.3 Direct Access and Fseek

If a stream file is accessing a "seekable" medium (such as a disk or diskette), a certain measure of direct access is provided by the functions fseek and ftell. For now, we will continue to discuss "text" files, which are the type of file that is specified by the "r", "w", and "a" modes of fopen. The number of char's actually read by a C program, or the number of char's written by the program, may be different from the number of bytes recorded in the file (in a non-UNIX environment). For example, some systems record two bytes (carriage-return, newline) at the end of each text line, but the C program will only see the '\n' (newline) character.

During the reading or writing of a file, the program can call

position = ftell(fp);

which sets position to a long integer that represents the current position of the file. At some later time, the program could call

success = fseek(fp, position, SEEK_SET);

which would re-position the file to the same place where the original ftell call was performed. A success return of zero indicates no errors; anything else is an error. (The name SEEK_SET is defined in various places in recent C libraries: <unistd.h> for systems conforming to the /usr/group standard, <stdio.h> and also <stddef.h> according to draft ANSI C. If neither definition is available, put one in your stddef.h header. The conventional definition for SEEK_SET is zero.)

It is possible to test whether a file is on a "seekable" device by using

seekable = (fseek(fp, 0L, SEEK_CUR) == 0);

where SEEK_CUR is conventionally equal to l. This call of fseek specifies a displacement of zero bytes relative to the current position. In other words, it has no effect except to test the returned value from fseek. (If fp turns out not to specify a seekable file, its error indicator will be set as a result of the call to fseek. Invoking clearerr(fp) will clear the error indicator.)

A text file can be positioned to the end of the file, by calling

success = fseek(fp, 0L, SEEK_END);

where SEEK_END is conventionally equal to 2.

And a file can be re-positioned to the very beginning by calling

success = fseek(fp, 0L, SEEK_SET);

Because of the text-mapping problem, these four usages of fseek are the only ones that can be portably used on text files.

It is possible to fopen a file for both reading and writing, but because of the text-mapping problem, these read/write modes are difficult to use on text files. It is time to discuss binary I/O.

8.4 Binary l/O

In the library proposed for ANSI C, a trailing b can be appended to the fopen mode string to indicate a binary file. For example,

fopen("payroll.dat", " rb" )

would open the binary file payroll.dat for reading. Here, then, is the complete list of mode options for fopen:

Text  Binary  Action

"r"   "rb"   Open for reading

"w"   "wb"   Truncate or create for writing

"a"   "ab"   Open or create for writing at end

"r+"  "r+b"  Open for reading and writing

"w+"  "w+b"  Truncate or create for reading and writing

"a+"  "a+b"  Open or create for reading and writing at end

The "reading and writing" modes are most generally useful in binary mode, since over-writing a text file can give unpredictable results; the new text may take more or less space than the previous file contents. In binary mode, however, the number of bytes written into the file is exactly the number of characters specified by the program. (Some environments may provide binary treatment for r+, w+, and a+ modes even without the trailing b on the mode.)

We could, for example, treat a binary file as consisting of a number of fixed-size records, each of which can be read, modified in some way, and then written back into its previous space:

FILE *fp;

fp = fopen("records.dat", "r+b");

if (fp == NULL)

error("can' t open", "records.dat");

/* ... */

if (fseek(fp, (long)recno * recsize, SEEK_SET) != 0) /* seek the record */

can't find specified record

if (fread(rec, recsize, 1, fp) != 1)  /* read the record */

can't read the record

modify the record contents

if (fseek(fp, (long)recno * recsize, SEEK_SET) != 0) /* seek record again */

can't find record

if (fwrite(rec, recsize, 1, fp) != 1)  /* re-write the record */

can't rewrite record

To allow reasonably efficient treatment of buffered files, it is required that an fseek must be performed when switching from reading to writing, or vice versa.

There are some problems involved with this form of binary I/O. For one, many versions of fread and fwrite do not perform very efficiently when used in this manner. And many existing compilers will complain about the b in a mode string. For these reasons, the next section will present a different approach to binary I/O.

8.5 Record I/O

Most existing implementations of C provide another set of functions for performing record-oriented I/O, namely the functions open, close, read, write, and lseek. (These functions are sometimes known as the "file-descriptor library," because they refer to files by numbers known as file descriptors, rather than by the "pointer to FILE" mechanism provided for stream files.) In many implementations they are the most efficient functions for the purpose. They are, however, implemented with a variety of system-dependent options and behaviors, and they will probably not be part of the ANSI C standard. If we restrict ourselves to the problem of doing record-oriented I/O on seekable devices like disks, we can still make effective use of this type of functions.

Our approach will be to provide a header to give access to a set of functions to be known as bin_open, bin_close, bin_read, bin_write, and bin_lseek. On systems with direct equivalents in the underlying library, these names can simply be mapped into the corresponding library name. On other systems, some new C functions must be provided, possibly making use of the (ANSI-style) stream library. Thus, the actual implementation of bin_open will be system-dependent, but the program that calls bin_open will itself be portable [8-1].

We must define a new header, bin_io.h, which will provide the necessary mappings. This version of the bin_io.h header will work on MS-DOS (with Lattice), Idris, and UNIX systems:

bin_io.h:

/* bin_io.h - header for binary file I/O functions

* SYSTEM DEPENDENT - MUST BE CONFIGURED FOR EACH TARGET SYSTEM

*/

#ifndef BIN_IO_H

#define BIN_IO_H

#include "local.h"

#include "fcntl.h"    /* provide your own if not standard on system */

typedef int bin_fd;   /* "binary file descriptor" {0:BIN_NFILE-1} */

#define BIN_NFILE 20  /* adjust to local system */

#define O_RWMODE (O_RDONLY|O_WRONLY|O_RDWR) /* uses symbols from fcntl.h */

#ifndef IDRIS

#define bin_open(s, m)      open(s, m)

#endif

#define bin_close(f)        close(f)

#define bin_lseek(f, o, w)  lseek(f, o, w)

#define bin_read(f, b, n)   read(f, b, n)

#define bin_write(f, b, n)  write(f, b, n)

#endif

The header bin_io.h includes another header via

#include "fcntl.h"

Many systems already have this header in one of the standard places for headers. If yours does not, you will need to create one that is compatible with the open options on your system, using this model:

fcntl.h:

/* fcntl.h - definitions for binary open

* Compatible with UNIX Sys V, ...

*/

#ifndef FCNTL_H

#define FCNTL_H

#define O_RDONLY  0      /* delete or change to conform to local */

#define O_WRONLY  1      /* delete or change to conform to local */

#define O_RDWR    2      /* delete or change to conform to local */

#define O_NDELAY  4      /* NOT USED BY bin_io FUNCTIONS */

#define O_APPEND  8      /* delete or change to conform to local */

#define O_CREAT   0x100  /* delete or change to conform to local */

#define O_TRUNC   0x200  /* delete or change to conform to local */

#define O_EXCL    0x400  /* delete or change to conform to local */

#endif

Each of the binary I/O functions sets the global variable errno to a non- zero value if an error is encountered, and then returns an error indication in the returned value. Here are the descriptions of the binary I/O functions:

int bin_open(fname, mode)

char fname[ ];  /* : string */

int mode;       /* as described in fcntl.h */

opens the file specified by the string fname, according to the specified mode (which is OR'ed together from "fcntl.h " options). The returned value is an int file descriptor if successful; otherwise, -l is returned. For the uses of this chapter, the file should be a seekable disk file.

int bin_close(fd)

int fd;      /* : {0:BIN_NFILE-1} */

closes the file associated with file descriptor fd. The returned value is zero if successful, otherwise -l.

int bin_read(fd, s, n)

int fd;      /* : {0:BIN_NFILE-1} */

dats_ptr s;  /* : !NULL */

int n;       /* : {0:INT_MAX} */

reads n bytes into the char array s from the file specified by fd. The returned value is the number of bytes read, or -l for an I/O error, or 0 for end-of-file.

int bin_write(fd, s, n)

int fd;      /* : {0:BIN_NFILE-1} */

data_ptr s;  /* : !NULL */

int n;       /* : {0:INT_MAX} */

writes n bytes from the char array s onto the file specified by fd. The returned value is the number of bytes written; anything less than n indicates an error.

long bin_lseek(fd, n, whence)

int fd;      /* : {0:BIN_NFILE-01} */

long n;

int whence;  /* : {SEEK_SET, SEEK_CUR, SEEK_END} */

seeks n bytes relative to beginning-of-file, current position, or end-of-file, according to this scheme:

SEEK_SET  relative to beginning-of-file  (usually 0)

SEEK_CUR  relative to current position   (usually 1)

SEEK_END  relative to end-of-file        (usually 2)

The returned value is the new file position, relative to beginning-of-file, or -1 if any error occurred. Seeking past end-of-file is not portable, and gives different results on different systems. Note that, unlike the stream file functions, the binary I/O functions allow alternation between reading and writing with no intervening seeks.

Now we need to take note of an important problem about binary I/O on some systems. Sometimes, binary files are padded with nul characters at the end of the file, to make the file conform to a local fixed block-size. If we want to have record files that can grow at the end, the only portable approach is to record the file size at the start of the file. To keep this detail out of the application program, and to make access to fixed-size records easier, we can make up a second level of functions, the "record I/O" functions.

The header rec_io.h defines a rec_fd as an int. (In early printings of this book, it was a short integer, but int gives more consistent calling sequences for these functions.) The following functions are provided for record I/O:

rec_fd rec_open(fname, mode, recsize)

char fname[ ];  /* : string */

int mode;       /* as specified in fcntl.h */

int recsize;    /* : {0:SHORT_MAX} */

opens the file named by fname for mode (as in fcntl.h), with the record size specified by recsize. Note that the record size is limited to SHORT_MAX, for maximum portability of the functions.

int rec_close(fd)

rec_fd fd;      /* : {0:REC_NFILE-1} */

closes the file specified by fd. The return values are 0 for success and -l for errors.

bool rec_get(fd, rec, recno)

rec_fd fd;     /* : {0:REC_NFILE-1} */

data_ptr rec;  /* : !NULL */

long recno;    /* : {REC_NEXT,0:rec_nrecs(fd)-1} */

reads from fd into rec the record whose record number is recno. If recno is specified to be REC_NEXT, the next sequential record is read. The return is YES for success and NO for failure. More detailed (but system-dependent) error information is available in errno.

bool rec_put(fd, rec, recno)

rec_fd fd;     /* : {0:REC_NFILE-1} */

data_ptr rec;  /* : !NULL */

long recno;    /* : {REC_W_END,REC_NEXT,0:rec_nrecs(fd)} */

writes onto file fd the contents of rec in the position specified by recno. If recno is specified as REC_NEXT, the record is written in the current file position; if it is specified as REC_W_END, the record is appended at the end of the file, thus extending the file by one record. The return is YES if success, NO if failure.

long rec_nrecs(fd)

rec_fd fd;  /* : {0:REC_NFILE-1} */

returns the current number of records in file fd.

int rec_recsize(fd)

rec_fd fd;  /* : {0:REC_NFILE-1} */

returns the record size associated with file fd.

Here are the listings of the header and the functions in the rec_io package.

rec_io.h:

/* rec_io.h - header for record file I/O functions */

#ifndef REC_IO_H

#define REC_IO_H

#include "local.h"

#include "bin_io.h"

#define REC_NFILE      BIN_NFILE  /*  adjust to local system */

#define REC_NEXT       (-1L)

#define REC_W_END      (-2L)

#define REC_NOT_FOUND  (-3L)  /* codes for hashed file access */

#define REC_FULL       (-4L)  /* codes for hashed file access */

#define REC_ERROR      (-5L)  /* codes for hashed file access */

#define REC_AVAIL      'a'    /* codes for hashed file access */

#define REC_OCCUP      'o'    /* codes for hashed file access */

#define REC_DELET      'd'    /* codes for hashed file access */

#define REC_FILE struct rec_file

REC_FILE

short _byte0;    /* offset of the first data byte in file */

short _recsize;  /* size (in bytes) of each record */

long_nrecs;      /* size (in records) of current file */

bits_status;     /* open-modes of file */

};

#define REC_BYTE0 BUFSIZ

#define rec_nrecs(fd)    (rec_files[fd]._nrecs + 0)

#define rec_recsize(fd)  (rec_files[fd]._recsize + 0)

typedef int rec_fd;

extern int errno;

extern REC_FILE rec_files[REC_NFILE];

int rec_close();     /* PARMS(rec_fd fd) */

rec_fd rec_open();   /* PARMS(char fname[], int type, int recsize) */

bool rec_get();      /* PARMS(rec_fd fd, data_ptr buf, long i) */

bool rec_put();      /* PARMS(rec_fd fd, data_ptr buf, long i) */

long rec_hfind();

/* PARMS(rec fd fd, data_ptr keybuf, data_ptr buf, int (*cmp)(), long (*hash)()) */

long rec_havail();

/* PARMS(rec_fd fd, data_ptr keybuf, data_ptr buf, long (*hash)()) */

#endif

The "record file descriptor" (rec_fd) is used as an index into the array rec_files. The rec_open function gets a file descriptor from bin_open. If the file existed previously, its REC_FILE information is read. Otherwise, the REC_FILE information is written out at the head of the new file.

rec_open:

/* rec_open- open a record-binary file */

#include "rec_io.h"

REC_FILE rec_files[REC_NFILE] = {0};

rec_fd rec_open(fname, type, recsize)

char fname[ ];

int type;

int recsize;

rec_fd fd;

REC_FILE *rfp;          /* pointer to a REC_FILE entry */

int old_errno = errno;  /* save system error indicator */

short i;                /* counter to clear first record */

errno = 0;              /* clear error indicator */

fd = bin_open(fname, type);

if (fd < 0 || fd >= REC_NFILE)  /* validate fd */

return (-1);

rfp = &rec_files[fd];

bin_lseek(fd, (long) 0, SEEK_SET);  /*seek to initial byte of file */

if ((type & O_RWMODE) == O_WRONLY || /* new file: opened write-only, or */

bin_read(fd, rfp, sizeof(*rfp)) != sizeof(*rfp))  /* can't be read */

bin_lseek(fd, (long)0, SEEK_SET);

rfp->_byte0 = REC_BYTE0;

rfp->_recsize = recsize;

rfp->_nrecs = 0;

bin_write(fd, rfp, sizeof(*rfp));

for (i = 1; i -< REC_BYTE0 - sizeof(*rfp); ++i)

bin_write(fd, "\0", 1);

else if (recsize != rfp->_recsize) /* file exists, but bad recsize */

bin_close(fd);

return (-1);

if (errno == 0)

errno = old_errno;    /* restore saved value */

bin_lseek(fd, (long)rfp->_byte0, 0);

rfp->_status = type;  /* save the open-mode of file */

return (fd);

else /* error was returned from some bin_io function */

bin_close(fd);

return (-1);

rec_close:

/* rec_close - close the REC_FILE fd */

#include "rec_io.h"

int rec_close(fd)

rec_fd fd;

REC_FILE *rfp;

int old_errno = errno;

errno = 0;

if (fd < 0 || fd >= REC_NFILE)    /* validate fd */

return (-1);

rfp = &rec_files[fd];

if ((rfp->_status & O_RWMODE) == O_RDONLY)

return (bin_close(fd));        /* if read-only, all done */

bin_lseek(fd, (long)0, SEEK_SET);

bin_write(fd, rfp, sizeof(*rfp));  /* write new REC_FILE info */

bin_close(fd);

if (errno == 0)

errno = old_errno;

return (0);

else

return (-1);

If the file was opened read-only, rec_close needs only to close the file. Otherwise, the rec_close function has to write the REC_FILE information back to the disk. If any errors took place in the low-level I/O functions, rec_close returns -1.

rec_get:

/* rec_get - read one record from record-binary file */

#include "rec_io.h"

bool rec_get(fd, buf, recno)

rec_fd fd;     /* which file */

data_ptr buf;  /* where to put the data */

long recno;    /* {REC_NEXT,0:rec_nrecs(fd)-1} */

REC_FILE *rfp;

short n;  /* size of each record */

if (fd < 0 || fd >= REC_NFILE)  /* validate fd */

return (NO);

rfp = &rec_files[fd];

n = rfp->_recsize;

if (recno != REC_NEXT && recno < 0 || recno >= rfp->_nrecs)

return (NO);

else if (recno == REC_NEXT)

;  /* no seek, ready for next record */

else if (bin_lseek(fd, rfp->_byte0 + recno * n, SEEK_SET) < 0)

return (NO);

return (bin_read(fd, buf, n) == n);

Aside from the necessary validation of fd and recno, all that rec_get needs to do is seek to the proper byte and read the record. If either the seek or the read fails, rec_get returns NO.

rec_put:

/* rec_put - write one record to record-binary file

*/

#include "rec_io.h"

bool rec_put(fd, buf, recno)

rec_fd fd;

data_ptr buf;  /* where to get the data */

long recno;  /* {REC_W_END,REC_NEXT,0:rec_nrecs(fd)} */

REC_FILE *rfp;

short n;  /* size of each record */

if (fd < 0 || fd >= REC_NFILE)

return (NO);

rfp = &rec_files[fd];

n = rfp->_recsize;

if (recno != REC_NEXT && recno != REC_W_END && recno < 0 ||

recno > rfp->_nrecs)

return (NO);

else if (recno == REC_W_END || recno == rfp->_nrecs)

recno = rfp->_nrecs;  /* block will be added at end */

++rfp->_nrecs;  /* extend the file size */

if (bin_lseek(fd, rfp->_byte0 + recno * n, SEEK_SET) < 0)

return (NO);

return (bin_write(fd, buf, n) == n);

The parameter recno can be a proper record number, or the coded value REC_W_END (requesting to write onto the end of the file), or the coded value REC_NEXT (requesting to write the next sequential record). After validating fd and recno, rec_put seeks to the appropriate record and writes the new record. The return is YES if success, NO if failure.

The following demonstration program (rec_main) writes a record file consisting of ten records, each containing one long integer. Then it reads the entire file sequentially and prints each number to the standard output. Next it opens the file for reading and writing, and writes new data into the records, using direct access by record numbers. Finally it reads the file with direct access by record numbers and prints each number to the standard output.

rec_main.c:

/* rec_main - simple test harness for rec_io */

#include "local.h"

#include "rec_io.h"

main( )

long lnum;

long rec_no;

rec_fd rfd;

rfd = rec_open("lnum.dat", O_WRONLY | O_CREAT | O_TRUNC, sizeof(lnum));

if (rfd < 0)

error("can't open (output)", "lnum.dat");

for (lnum = 0; lnum <= 9; ++lnum)

if (!rec_get(rfd, (data_ptr)&lnum, REC_W_END))

error("rec_put (END) error", "");

rec_close(rfd);

rfd = rec_open("lnum.dat", O_RDONLY, sizeof(lnum));

if (rfd < 0)

error("can't open (input)", "lnum.dat");

while (rec_get(rfd, (data_ptr)&lnum, REC_NEXT))

printf("%4ld\n", lnum);

rec_close(rfd);

rfd = rec_open("lnum.dat", O_RDWR, sizeof(lnum)):

if (rfd < 0)

error("can't open (update-output)", "lnum.dat");

for (lnum = 109; lnum >= 100; --lnum)

if (!rec_put(rfd, (data_ptr)&lnum, lnum - 100))

error("rec_put (direct) error", "");

rec_close(rfd);

rfd = rec_open("lnum.dat", O_RDWR, sizeof(lnum));

if (rfd < 0)

error("can't open (update-input)", "lnum.dat");

for (rec_no = 0; rec_no <= 9; ++rec_no)

if (!rec_get(rfd, (data_ptr)&lnum, rec_no))

error("rec_get (direct) error", "");

else if (lnum != rec_no + 100)

error("bad data on re-read", "");

else

printf("%4ld %4ld\n", rec_no, lnum);

rec_close(rfd);

exit(SUCCEED);

8.6 A Record File with Hashed Access

The simple scheme of integer-numbered records is convenient only when there is an easy mapping from the data to be stored and an integer record number. Very few databases are keyed with such simple integer keys. One way to use record-numbered files with arbitrary keys is to "hash" each key into an integer:

for (p = part.part_no; *p != '\0'; ++p)

sum += *p;

rec_ no = IMOD(sum, rec_nrecs(fd));

Here the bytes of a "part-number" string are simply being added together to form a "hash-sum." Taking the remainder upon division by the number of records in the file produces a record-number specifying the place to start looking for this particular record.

Each record on the file contains an extra byte to indicate its status. The possible statuses are "available," "occupied," or "deleted."

When creating a hashed file for the first time, all the records must be marked as "available." The program crhash ("create hash file") will create a file of a specified number of records of specified size.

crhash.c:

/* crhash -- create and initialize an empty hash file

* used to hold part database records

*/

#include "local.h"

#include "part.h"

#include "rec_io.h"

main(ac, av)

int ac;

char *av[];

rec_fd fd;   /* record file descriptor */

short size;  /* size of one record in created file */

long nrecs;  /* maximum numbers of records in created file */

char *buf;   /* data to init. each record */

long i;      /* counter to write records in created file */

if (ac != 4)

error("usage: crhash filename nrecs recsize", "");

nrecs = atol(av[2]);

size = atoi(av[3]);

fd = rec_open(av[1], O_RDONLY, size);

if (fd > 0)

rec_close(fd);

error("file already exists:", av[1]);

fd = rec_open(av[1], O_RDWR|O_CREAT, size);

if (fd < 0)

error("can't create file", av[1]);

buf = (char *)calloc(size, 1);

if (buf ==NULL)

error("Need more heap space to allow size =", av[3]);

*buf = REC_AVAIL;

/* file was created ok, initialize file */

for (i = 0; i < nrecs; ++i)

if (!rec_put(fd, buf, REC_W_END))

error("I/O error writing new file", "");

rec_close(fd);

exit (SUCCEED );

The function rec_hfind starts reading at a specified record-number. If the record contains the desired key, the search is finished. Otherwise, successive records are read (wrapping from end-of-file to beginning-of-file if necessary) until the desired record is found or an unoccupied record space is found. rec_hfind needs two pointer-to-function arguments: one to specify the key-comparison function, and one to specify the hashing function. The keybuf argument points to a record that contains the key being searched for. The buf argument points to the storage into which the record will be read. Here is the rec_hfind function:

rec_hfind:

/* rec_hfind.c -- find a record in a hashed file */

#include "rec_io.h"

long rec_hfind(fd, keybuf, buf, cmpfn, hashfn)

rec_fd fd;         /* hashed file */

data_ptr keybuf;   /* record key to find */

data_ptr buf;      /* buffer for found record */

int (*cmpfn)();    /* record key comparison function */

long (*hashfn)();  /* hash lookup function */

long nrecs;        /* number of records in file */

long ntry;

long i;

nrecs = rec_nrecs(fd);

i = (*hashfn)(keybuf);

for (ntry = 0; ntry < nrecs; ++ntry)

if (!rec_get(fd, buf, i))

return (REC_ERROR);          /* i/o error during rec_get */

if (*(char *)buf == REC AVAIL)  /* examine first byte of buf */

return (REC_NOT_FOUND);

if (*(char *)buf == REC_OCCUP)

if ((*cmpfn)(keybuf, buf) == 0)

return (i);

i = IMOD(i + 1, nrecs);

return (REC_FULL);

When a new record is to be added to the file, a different function (rec_havail) must be used to search for an available space to store the record. This function requires that the new record's key not be present in the file already, so rec_hfind must have been called first. The rec_hfind function skips over deleted records, because the record it is searching for might still be found further down in the file. On the other hand, rec_havail can stop when it sees the deleted record (or an "available" record), because a new record could be added here.

rec_havail:

/* rec_havail.c -- find where a record belongs in a hashed file */

#include "rec_io.h"

long rec_havail(fd, keybuf, buf, hashfn)

rec_fd fd;         /* hashed file */

data_ptr keybuf;   /* record key to find */

data_ptr buf;      /* buffer for found record */

long (*hashfn)();  /* hash lookup function */

long nrecs;        /* number of records in file */

long ntry;

long i;

nrecs = rec_nrecs(fd);

i = (*hashfn)(keybuf);

for (ntry = 0; ntry < nrecs; ++ntry)

if (!rec_get(fd, buf, i))

return (REC_ERROR);

if (*(char *)buf == REC_AVAIL || *(char *)buf == REC_DELET)

return (i);

i = IMOD(i + 1, nrecs);

return (REC_FULL);

Now we have all we need to implement a simple database such as the "parts" file. Each of the "part database" functions now makes use of the "hash file" functions to find, add, change, and delete records.

part_hash.c:

/* part_hash.c -- hash file database access to parts */

#include "rec_io.h"

#include "part.h"

#define HPART struct hash_part

HPART

char hcode;

PART part;

};

static HPART hpart = {0};

static HPART kpart = {0};

static rec_fd part_fd = 0;

/* db_cmp_part - compare function for database access */

static int db_cmp_part(p1, p2)

HPART *pl, *p2;

return (strcmp(p1->part.part_no, p2->part.part_no));

/* db_hash_part - hash the part_no field */

static long db_hash_part(p1)

HPART *p1;

char *p;

long sum = 0;

for (p = p1->part.part_no; *p != '\0'; ++p)

sum += *p;

return (IMOD(sum, rec_nrecs(part_fd)));

/* db_open_part - open the parts database */

void db_open_part()

part_fd = rec_open("part.dat", O_RDWR, sizeof(HPART));

if (part_fd < 0)

error("can't open ", "part.dat");

/* db_close_part - close the parts database */

void db_close_part()

if (rec_close(part_fd) < 0)

error("can't close ", "part.dat");

/* db_add_part - add a part to the database */

bool db_add_part(partp)

PART *partp;

long i;

STRUCTASST(hpart.part, *partp);

hpart.hcode = REC_OCCUP;

i = rec_havail(part_fd, &hpart, &kpart, db_hash_part);

if (i == REC_FULL)

remark("Part storage is full", "");

return (NO);

else if (i == REC_ERROR || !rec_put(part_fd, &hpart, i))

remark("I/O error", "");

return (NO);

return (YES);

/* db_chg_part - change a part record on database */

bool db_chg part(partp)

PART *partp;

long this_rec;

STRUCTASST(kpart.part, *partp);

this_rec = rec_hfind(part_fd, &kpart, &hpart, db_cmp_part, db_hash_part);

kpart.hcode = REC_OCCUP;

return (rec_put(part_fd, &kpart, this_rec));

/* db_del_part - delete a part record on database */

bool db_del_part(partp)

PART *partp;

long this_rec;

STRUCTASST(kpart.part, *partp);

this_rec = rec_hfind(part_fd, &kpart, &hpart, db_cmp_part, db_hash_part);

kpart.hcode = REC_DELET;

return (rec_put(part_fd, &kpart, this_rec));

/* db_find_part - find a part on the database */

bool db_find_part(partp)

PART *partp;

long i;

STRUCTASST(kpart.part, *partp);

this_rec = rec_hfind(fd, &kpart, &hpart, db_cmp_part, db_hash_part);

if (this_rec >= 0)

STRUCTASST(*partp, hpart.part);

return (YES);

else if (this_rec == REC_ERROR)

remark("I/O error", "");

return (NO);

else

return (NO);

This simple database concludes our discussion of file handling. The "hash file" is adequate for very simple applications, with only one key field and a generous amount of disk available for the record storage. Its handling of deleted records is rudimentary. It lacks the ability to find records sequentially. For more sophisticated requirements, there are now numerous database and file management packages available for use with C.

My main purpose throughout the book has been to introduce you to the use of C for programming reliable data structures. Wherever possible, I have tried to show examples which may also be useful in their own right, but ultimately the techniques are the most important thing. You now have all the tools of C language at your disposal, and the necessary concepts to use them reliably.

Go to Appendix Return to Table of Contents