Next Chapter Return to Table of Contents Previous Chapter

SECTION T: PRIORITY QUEUE EXTENSION OF MERGE RUNS

The merge operations of section S work within memory, which may not be large enough to hold a file of interest. A common way to handle large files is to write them in segments of sorted values, called runs. Pairs of runs are merged to form larger runs--the essential idea of MergeSort. The access to external storage is expensive, however, and so larger runs and fewer phases increase efficiency.

One way to form the initial runs for a file is to simply make one pass through the file, writing values found to be in order onto a run file and the others onto a reject file for a second pass. Consider the integer file:

2 1 6 3 8 9 4 4 10 5 2 8   

The first pass would produce the following run and reject:

run     2 6 8 9 10   

reject  1 3 4 4 5 2 8   

The straightforward version of this scheme, acting on a source file, whose termination can be recognized by the flag EOF(Source), is:

Read(Source, Cutoff)

while NOT EOF(Source) do

Read(Source,x)

if x  Cutoff

then Write(RunFile,x)

Cutoff  x

else Write(Reject,x)

endif

endwhile

Unfortunately, if the largest item is first, the run will be only one item long. A second approach is to input M items, sort them internally, and write them out as a run. It is this value M that can be stretched with the help of a priority queue, specifically a heap, although other forms of priority queues would serve.

Suppose the items from Source are tagged with a run number R, and placed in a heap, PQ, with room for M elements. PQ is taken to be an array of records of the form:

RECORD

Run : INTEGER

Key : {comparable type

END

The heap is maintained in lexicographic order:

PQ[i] < PQ[j] iff:

PQ[i].Run < PQ[j].Run, or

PQ[i].Run = PQ[j].Run and PQ[i].Key < PQ[j].Key

For consistency with section S, the highest priority in the heap is the smallest value. The heap is formed with:

procedure BuildPQ

for i = 1 to M do

Read(Source,x)

PQ[i].Run  1

PQ[i].Key  x

UpHeap(i)

next i

HN  m

end {BuildPQ

The stretched run is then created with:

procedure PQruns

BuildPQ

R  1

while NOT Empty(PQ) do

Top  PQ[1]

if Top.Run = R

then Write(RunFile(R),Top.Key)

Swap(PQ[1],PQ[HN])

HN  HN - 1

DownHeap(1)

else R  R + 1

endif

if NOT EOF(Source)

then Read(Source,x)

if x  Top.Key

then xRun  R

else xRun  R + 1

endif

HN  HN + 1

PQ[HN].Run  xRun

PQ[HN].Key  x

UpHeap(HN)

endif

endwhile

end {PQruns

It is possible to derive additional enhancement by replacing R R + 1 by a procedure DumpPQ that stores the queue on external storage so that the pass through the source file can continue. The dumps are retrieved for the next run.

Program

Program

Program for trying it yourself

PGT.1 Implement PQruns.

Go to Section U Return to Table of Contents