10.5.3 Replacement-Selection

Since there is typically room for a reasonable number of records in internal memory, both the straight and natural merge sorts can be speeded up. For instance, if m records can be sorted in internal memory, the straight merge sort can start with the distribution of runs of length m on each of the tapes a and b.

There is a clever way to produce runs for distribution (for example, with the natural merge sort) without actually storing the m records. This is the replacement-selection sort, which uses a heap of size m and works as follows. Start with the first m keys of the original file of Example 10.6 in internal memory and create a heap to obtain Figure 10.6(a) (m = 7).

Remove the root key from the heap and output it to an output file. The next input file key, 2, is inserted at the root of the heap, and the heap is reheaped, giving Figure 10.6(b).

Figure 10.6 Replacement-Selection Sort

Again, the root key is removed and written to the output file. The next record key of the input file, 50, is inserted at the root and reheaping occurs. Figure10.6(c) results.

Continuing in this way generates a run on the output file of length 8, which would be terminated when 16 from the input file is inserted at the root of the heap. The situation at that point is shown in Figure 10.6(d). Instead of terminating the run at length 8, treat 16 as associated with the next run, since 16 is less than the last output value, 20. This means that 16 must be distinguished from the heap entries for the current run; it is shaded in Figure 10.6(d). In reheaping, current run entries are considered smaller than next run entries. Figure 10.6(e) shows the heap after reheaping. Each input smaller than the last output is handled in this way. Eventually the heap becomes filled with entries of the next run, and the process repeats.

Notice that the length of the output run will always be at least m (with the possible exception of the last run generated). Continuing this process, called replacement-selection, we obtain the output file,

2 2 12 14 16 17 17 20 30 32 48 50 58 65|10 15 16 16 20 30 45

The original file consisted of ten runs; the output file after replacement-selection consists of two runs. With randomly ordered input files, this replacement-selection generates runs whose average length is 2m. The example exhibits this behavior in an exemplary fashion.

Notice that both the straight and natural merges spend half their time accessing records for the distribution phase. Spending this time can be avoided by using another tape. By this technique, each time two runs are merged, they can be written, alternately, to tapes a and b. This eliminates the need for extra, basically unproductive, time spent in distribution from tape c.