10.5.2 Natural Merge

The straight merge sort does not take advantage of any natural ordering that may exist in the original file. Even if that file were given in sorted order, the straight merge would proceed as before. A natural merge sort distributes the records of the original file to a and b, so that sorted subfiles or runs are preserved and treated as a unit. Such a distribution would assign 2 12 17 to a, since they form a run. The 16 terminates this run and goes to b. The 14 terminates this run and goes to a as the beginning of its second run, followed by 30. So far the result is

a  2 12 17|14 30

b 16

The 17 now terminates the 14 30 run and goes to b. Note that this extends the first run of b to 16 17. At this point, 2 terminates this run with the following result.

a  2 12 17|14 30

b 16 17

If balance is to be maintained, each file must end up with a number of runs differing by no more than 1 from the other files. Therefore the next run (starting with 2) must go to b. Subtleties of this kind must be carefully monitored in the actual implementation of merge sorting algorithms. The resultant distribution will be the initial distribution.

a  2 12 17|14 30   |20 32 48 58|15 16

b 16 17   | 2 50 65|16 20      |10 30 45

Merging these pairs of runs and writing the merged results to c yields the first merge.

c 2 12 16 17 17|2 14 30 50 65|16 20 20 32 48 58|10 15 16 30 45

Another distribution and merge gives the second distribution,

a 2 12 16 17 17|16 20 20 32 48 48

b 2 14 30 50 65|10 15 16 30 45

which is followed by the second merge.

c 2 2 12 14 16 17 17 30 50 65|10 15 16 16 20 20 30 32 45 48 58

A third and final distribution and merge complete the sort. The third distribution yields

a  2  2 12 14 16 17 17 30 50 65

b 10 15 16 16 20 20 30 32 45 48 58

The result of the third merge is as follows:

c 2 2 10 12 14 15 16 16 16 17 17 20 20 30 30 32 45 48 50 58 65

This is a three-pass, two-phase sort, using three tapes. The total time is 3 ?/FONT> (2 ?/FONT> 21 ?/FONT> Ta + 3 x Tr). Any inherent ordering of the initial data allows the number of passes to be reduced, as compared to the straight merge sort.

Note that to merge runs embedded in a file, the procedure of Section 10.3.1 requires modification.