10.5: Sorting Tape Files

Once again, one picture?/FONT>or in this case, a good example?/FONT>is worth a thousand words. The different merging techniques demonstrated for the following example illustrate some of the complexities of sorting files stored on magnetic tape, when all records of the file cannot fit into internal memory at the same time. Tape sorts are used here, since they highlight the concepts of sorting, but these concepts are also applicable to disk sorts. Detailed analysis of the sorting algorithms discussed here, and others, may be found in Knuth [1973b]. We begin with an extreme case.

Example 10.6

Sort an original file of twenty-one records that are stored sequentially on a tape file. The records have integer key values. Assume that only two records can be kept in internal memory at any one time. Storage for the two records is in addition to input and output buffers. n

Suppose the sequence of records in the original file is as follows:

2 12 17 16 14 30 17 2 50 65 20 32 48 58 16 20 15 10 30 45 16

It is not possible simply to read all the records into internal memory and then apply one of the internal sort algorithms of Chapter 8, since available internal storage is assumed to be only enough to hold two records. Instead, the general strategy, given that two sorted files can always be merged as shown in Section 10.3, is to create sorted subfiles repeatedly and merge them repeatedly until a final merged file containing all the original records is produced. The literature contains many ingenious algorithms for external sorting. This chapter gives only the flavor of the solutions. One major constraint on a solution is the number of tapes available for use in the sort. Assume three available tapes.

10.5.1 Straight Merge

10.5.2 Natural Merge

10.5.3 Replacement-Selection

10.5.4 Polyphase Sort