10.5.4 Polyphase Sort

Rather than using more tapes, it is possible to incorporate the distribution phase (except for the initial distribution) into the merge phase. Such a sort is called a polyphase sort. In this type of sort, the tapes being merged, and the tape to which the merged subfiles are written, vary continuously throughout the sort. In this technique, the concept of a pass through records is not as clear-cut as in the straight or the natural merge. A distribution, a merge, and a pass all blend together. This might have more properly been called an amorphous sort, except that it has considerable character and is one of the better sorts.

The polyphase sort starts with an initial distribution of runs on tapes a and b. The initial distribution is critical to its proper execution. For the case of the original file of twenty-one records in Example 10.6, the initial distribution should be thirteen runs to tape a, and 8 runs to tape b, each of length 1. It is the number, not the length, of the runs on each tape that is critical. The number of records for the example is twenty-one because the first few Fibonacci numbers are 0, 1, 1, 2, 3, 5, 8, 13, 21. The Fibonacci numbers are crucial to the polyphase sort because the number of records must be a Fibonacci number to make the polyphase sort work.

When a three-tape polyphase sort is used, the original number of runs should be a Fibonacci number, with the distribution of runs between tapes a and b the two preceding Fibonacci numbers, which add to the original number of runs. In the present case, 8 + 13 = 21. When more tapes are used, generalized Fibonacci numbers, which specify the required number of records, become the basis for the initial run distribution among the tapes. No programs will be shown for the implementation of the details of the external sorting algorithms discussed in this section. Therefore it is unnecessary to resolve in detail the question of how to proceed when the actual number of runs is not a Fibonacci number. The remedy, though, involves adding an appropriate number of "dummy" runs [Knuth, 1973b] .

Getting back to the workings of the polyphase sort, the initial situation after distribution will be an initial distribution.

a   2?/FONT>12?/FONT>17?/FONT>16?/FONT>14?/FONT>30?/FONT>17?/FONT>  2?/FONT>50?/FONT>65?/FONT>20?/FONT>32?/FONT>48  13 runs of length 1

b  58?/FONT>16?/FONT>20?/FONT>15?/FONT>10?/FONT>30?/FONT>45?/FONT> 16?/FONT>                 8 runs of length 1

c

This distribution required twenty-one record accesses.

We now proceed by merging the first eight runs of a and the eight runs of b to c to obtain the merge of the first eight runs of a and b to c.

a  50?/FONT>65?/FONT>20?/FONT>32?/FONT>48                                 5 runs

b

c   2 58?/FONT>12 16?/FONT>17 20?/FONT>15 16?/FONT>10 14?/FONT>30 30?/FONT>17 45?/FONT>2 16  8 runs

Notice that b is now empty, a total of 8 + 8 = 16 records have been accessed, and a and c contain 5 + 8 = 13 runs. All three tapes had to be rewound before the initial distribution, but only b need be rewound now. It is possible to continue merging the first five runs of c and the five runs of a to b. This results in a merge of the first five runs of a and c to b.

a

b  2 50 58?/FONT>12 16 65?/FONT>17 20 20?/FONT>15 16 32?/FONT>10 14 48  5 runs

c    30 30?/FONT>17 45   ?/FONT> 2 16   ?/FONT>                   3 runs

Notice that a is empty, a total of 5 + 5 ?/FONT> 2 = 15 records have been accessed, and b and c contain 5 + 3 = 8 runs. Tape a is now rewound, and the first three runs of b are merged with the three runs of c. The result is the merge of the first three runs of b and c to a.

a  2 30 50 58?/FONT>12 16 17 45 65?/FONT>2 16 17 20 20  3 runs

b 15 16 32?/FONT>10 14 48                         2 runs

c

Tape c is now empty and must be rewound. A total of (3 ?/FONT> 3) + (3 ?/FONT> 2) = 15 records were accessed, and a total of 2 + 3 = 5 runs remain. Merging the first two runs of a and b to c yields

a  2 16 17 20 20                                   1 run

b

c  2 15 16 30 30 32 50 58?/FONT>10 12 14 16 17 45 48 65  2 runs

This leaves b empty, so we rewind it. A total of (2 ?/FONT> 3) + (2 ?/FONT> 5) = 16 records were accessed, and 1 + 2 = 3 runs remain. Merging the run of a and the first run of c to b yields.

a

b  2 2 15 16 16 17 20 20 30 30 32 50 58  1 run

c 10 12 14 16 17 45 48 65                1 run

A total of (1 ?/FONT> 5) + (1 ?/FONT> 8) = 13 record accesses occurred, and 1 + 1 = 2 runs are left. Tape a is rewound. Then a final merge produces

a  2 2 10 12 14 15 16 16 16 17 17 20 20 30 30 32 45 48 50 58 65

b

c

This final merge of b and c to a took 13 + 8 = 21 record accesses. A total of 117 ( = 21 + 16 + 15 + 15 + 16 + 13 + 21) record accesses is required. These 117 record accesses represent an effective 117/21, or 5 4/7, passes. The total time for this algorithm is 117 x Ta + 8 x Tr. The straight merge sort and the natural merge sort required 210 ?/FONT> Ta + 15 ?/FONT> Tr and 126 ?/FONT> Ta + 9 ?/FONT> Tr time, respectively.

The polyphase sort, until completion, results in exactly one empty tape, so that the remaining tapes may be partially merged to it. After each merge, the distribution of runs is made up of two consecutive Fibonacci numbers (13 + 8, 8 + 5, 5 + 3, 3 + 2, 2 + 1, 1 + 1, 1 + 0) . With more tapes , this is also true , but generalized Fibonacci numbers appear.