# SECTION S: FILE MERGING AND MERGESORT

The StraightMerge procedure of section 9.2.1 has two variations of interest, the merging of linked lists and binary merging. The discussion of these forms of merging in this section is based on the assumption that the number of items to be merged is large, but will fit into memory. (Lists too large to fit into memory are discussed in section T.) For large lists, it is worthwhile to spend some effort on preparation before plunging into the heart of an algorithm. In both cases, it is assumed for the sake of simplicity that key values are numerical and that the lists to be merged are sorted in nondecreasing order.

## S.1 A Linked List Merge

The two lists a and b to be merged are accessed by records containing Head and Tail pointers. The merging operation is a function that produces a similar list c.

#### Figure S.2

Preparation for the merge is done by:

procedure Setup                                  {O(1)

if a.Tail  .Key  b.Tail  .Key

then Sentinel  a.Tail  .Key + 1

else Sentinel  b.Tail  .Key + 1

endif

NEW(Pivot)

Pivot  .Key  Sentinel

c.Tail  Pivot

a.Tail  Pivot

b.Tail  Pivot

end {Setup

After Setup has been executed, the leading node of smallest value in either list can be shifted by:

endif

c.Tail Temp

With these tools, the merge is:

function ListMerge(a,b)                          {O(n)

Setup

endwhile

return c

end {ListMerge

In practice, GetLeader would not be a called procedure; it is a shorthand reference to code that is incorporated at the point mentioned.

## S.2 Binary Merge

Sometimes one list (say b) is much smaller than the other (a) with which it is to be merged. Suppose the lists are stored in arrays a[1 . . N] and b[1 . . M], sorted in nondecreasing order, and they are to be merged into an array c[1 . . NM], where NM = N + M.

The general idea of binary merge is that since N is much larger than M, entire segments of a tend to fall between values of b. The search for the possible position of a b-value is localized to a small segment of a that is searched by the method of binary search to determine an interval of a that can be moved to c. It is convenient to extend a with a sentinel value in a[0] and then to load c from right to left.

A value b[i] is compared with a segment of a of length Size: a[Left .. Right]. If b[i] < a[Left], then the entire segment of a can be moved to c. If not, the segment is searched with a tailored version of binary search for the largest index s such that: a[s] < b[i]. The subsequence b[i],a[s + 1] ,a[s + 2] ,...,a[Right] can then be moved into c. In either case, a new segment of a is chosen for testing against the (possibly) new value b[i].

The move operation is:

procedure Move(x,y)                              {O(y - x)

for z = y downto x do

c[k]  a[z]

k  k - 1

next z

end {Move

An effective search segment size is one less than the largest power of 2 less than (N/M). It can be determined by Size Power(N/M) - 1, using:

function Power(x)                                {O(ln x)

PX  1

while PX  x do

PX  PX X 2

endwhile

return (PX DIV 2)

end {Power

The initialization procedure is:

procedure Setup                                  {O(1)

if a[1] < b[1]

then Sentinel  a[1] - 1

else Sentinel  b[1] - 1

endif

a[0]  Sentinel

Size  Power(N / M) - 1

i  M

k  N X M

end {Setup

The merge operation is then:

procedure BinaryMerge                            {O(n 1n n)

Setup

while i  1 do

if b[i] < a[Left]

then Move(Left,Right)

Right  Left - 1

else s  BinSearch(b[i],Left,Right)

Move(s + 1,Right)

Right  s

c[k]  b[i]

k  k - 1

i  i - 1

endif

Left  Right - Size

if Left < 0 then Left  0 endif

endwhile

end {BinaryMerge

## S.3 Merge Sorting

The simplest form of sorting routine based on merging sorts an array, a, with the help of an array, b, of the same size. Segments in one array are merged into the other. The segments are initially of length one and are increased in size from one pass of the routine to another until one of them contains the entire file. It is convenient to make passes back and forth between the arrays. The merging procedures of the previous sections do not work unmodified, partly because sentinel records would need to be placed in the middle of a file.

A utility routine used in the merge that copies a segment of one array into another is:

procedure Copy, (a,x,y,b,z)                      {O(y - x)

for dex = x to y do

b[z]  a[dex]

z  z +1

next dex

end {Copy

With the help of Copy, the basic merge operation is:

procedure TailMerge(a,L,M,R,b)                   {O(R - L)

i  L

k  L

j  M + 1

while i  M AND j  R do

if a[i]  a[j]

then b[k]  a[i]

i  i + 1

else b[k]  a[j]

j  j + 1

endif

k  k + 1

endwhile

if i > M

then Copy(a,j,R,b,k)

else Copy(a,i,M,b,k)

endif

end {TailMerge

Now one phase (pass) is of the form:

procedure Phase(a,b,Size)                        {O(N)

s  1

while s  (N - 2 * Size - 1) do

TailMerge(a,s,s + Size - 1,s + 2 * Size - 1,b)

s  s + 2 * Size

endwhile

if (s + Size - 1) < N

then TailMerge(a,s,s + Size - 1,N,b)

else Copy(a,s,N,b,s)

endif

end {Phase

The sort then becomes a repetition of phases:

procedure MergeSort                              {O(N 1n N)

Size  1

while Size < N do

Phase(a,b,Size)

Size  2 * Size

Phase(b,a,Size)

Size  2 * Size

endwhile

end {MergeSort

## Programs

Programs for trying it yourself