A file is a collection of records, each record having one or more fields The fields used to distinguish among the records are known as *keys*. Since the same file may be used for several different applications, the key fields for record identification will depend on the particular application. For instance, we may regard a telephone directory as a file, each record having three fields: name, address, and phone number. The key is usually the person's name. However, one may wish to locate the record corresponding to a given number, in which case the phone number field would be the key. In yet another application one may desire the phone number at a particular address, so this field too could be the key. Once we have a collection of records there are at least two ways in which to store them: sequentially or non-sequentially. For the time being let us assume we have a sequential file *F* and we wish to retrieve a record with a certain key value *K*. If *F* has *n *records with *K _{i}* the key value for record

procedureSEQSRCH(F,n,i,K)

//Search a fileFwith key valuesK_{1}, ...,Kfor a record_{n}Rsuch_{i}

thatK=_{i}K. If there is no such record,iis set to 0//

K_{0}K;in

whileK_{i}Kdo

ii- 1

end

endSEQSRCH

Note that the introduction of the dummy record *R*_{0} with key *K*_{0} = *K* simplifies the search by eliminating the need for an end of file test (*i* < 1) in the **while** loop. While this might appear to be a minor improvement, it actually reduces the running time by 50% for large *n* (see table 7.1). If no record in the file has key value *K*, then *i* = 0, and the above algorithm requires *n* + 1 comparisons. The number of key comparisons made in case of a successful search, depends on the position of the key in the file. If all keys are distinct and key *K _{i}* is being searched for, then

One of the better known methods for searching an ordered sequential file is called binary search. In this method, the search begins by examining the record in the middle of the file rather than the one at one of the ends as in sequential search. Let us assume that the file being searched is ordered by nondecreasing values of the key (i.e., in alphabetical order for strings) Then, based on the results of the comparison with the middle key, *K _{m}*, one can draw one of the following conclusions:

(ii) if *K* = *K _{m}* then the middle record is the one being searched for;

procedureBINSRCH(F,n,i,K)

//Search an ordered sequential fileFwith recordsR_{1}, ...,Rand_{n}

the keysK_{1}K_{2}Kfor a record_{n}Rsuch that_{i}K=_{i}K;

i= 0 if there is no such record elseK=_{i}K. Throughout the

algorithm,lis the smallest index such thatKmay be_{l}Kandu

the largest index such thatKmay be_{u}K//

l1;un

whileludo

m(l+u)/2 //compute index of middle record//

case

:K>K:_{m}lm+ 1 //look in upper half//

:K=K:_{m}im;return

:K<K:_{m}um- 1 //look in lower half//

end

end

i0 //no record with keyK//

endBINSRCH

It is possible to consider other criteria than equal splitting for dividing the remainig file. An alternate method is Fibonacci search, which splits the subfile according to the Fibonacci sequence,

0, 1, 1, 2, 3, 5, 8, 13, 21, 34, ...

which is defined as *F*_{0} = 0, *F*_{1} = 1 and

*F _{i}* =

(ii) *K* = *K _{Fk-}*1 in which case the search terminates successfully;

The following algorithm implements this Fibonacci splitting idea.

procedureFlBSRCH(G,n,i,K)

//search a sequential fileGwith keys ordered in nondecreasing order,

for a recordRwith key_{i}K=_{i}K. Assume thatF+_{k}m=n+ 1,

m 0 andF+ 1 >_{k}n+ 1.nis the number of records in G.F_{k}

andF+1 are consecutive Fibonacci numbers. If_{k}Kis not present,

iis set to zero.//

iF1;_{k-}pF-2;_{k}qF-3_{k}

ifK>K_{i}then[//setiso that size of the right subfile isF2//_{k-}

ii+m]

whilei0do

case

:K<K:_{i}ifq= 0theni0

else[ii; -q;tp;pq;qt-q]

:K=K:_{i}return

:K>K:_{i}ifp= 1theni0

else[ii+q;pp-q;qq-p]

end

end

endFIBSRCH

First let us formally state the problem we are about to consider.

(i) *K*_{s(i)} *K*_{s(i+1)},1 *i* *n* - 1.

(ii) If *i* < *j* and *K _{i}* =

A sorting method generating the permutation * _{s}* will be said to be

To begin with we characterize sorting methods into two broad categories: (i) internal methods, i.e., methods to be used when the file to be sorted is small enough so that the entire sort can be carried out in main memory; and (ii) external methods, i.e., methods to be used on larger files. In this chapter we shall study the following internal sorting methods:

External sorting methods will be studied in Chapter 8.

The basic step in this method is to insert a record *R* into a sequence of ordered records, *R*_{1},*R*_{2}, ...,*R _{i}*, (

procedureINSERT(R,i)

//Insert recordRwith keyKinto the ordered sequenceR, ...,_{o}R_{i}

in such a way that the resulting sequence is also ordered on key

K. We assume thatRis a record (maybe a dummy) such that_{o}

KK//_{o}

ji

whileK<K_{j}do

//moveRone space up as_{j}Ris to be inserted left ofR//_{j}

R+1_{j}R;_{j}jj- 1

end

R+1_{j}R

endINSERT

procedureINSORT(R,n)

//sort the recordsR_{1}, ...,Rin nondecreasing value of the key_{n}K.

Assumen> 1//

K_{o}- //Create a dummy recordR_{o}such thatK_{o}<K,_{i}

1in//

forj2tondo

TR_{j}

callINSERT(T, j- 1) //insert recordsR_{2}toR//_{n}

end

endINSORT

resulting in an overall worst case time of .

**Example 7.1:** Assume *n* = 5 and the input sequence is (5,4,3,2,1) [note the records have only one field which also happens to be the key]. Then, after each insertion we have the following:

-, 5, 4, 3, 2, 1 [initial sequence]

-, 4, 5, 3, 2, 1i= 2

-, 3, 4, 5, 2, 1i= 3

-, 2, 3, 4, 5, 1i =4

-, 1, 2, 3, 4, 5i= 5

Note that this is an example of the worst case behavior.

**Example 7.2:** *n* = 5 and the input sequence is (2, 3, 4, 5, 1). After each execution of INSERT we have:

-, 2, 3, 4, 5, 1 [initial]

-, 2, 3, 4, 5, li= 2

-, 2, 3, 4, 5, 1i= 3

-, 2. 3, 4, 5, 1i= 4

-, 1, 2, 3, 4, 5i= 5

We now turn our attention to a sorting scheme with a very good average behavior. The quicksort scheme developed by C. A. R. Hoare has the best average behavior among all the sorting methods we shall be studying. In Insertion Sort the key *K _{i}* currently controlling the insertion is placed into the right spot with respect to the sorted subfile (

procedureQSORT(m,n)

//sort recordsR, ...,_{m}Rinto nondecreasing order on key_{n}K. Key

Kis arbitrarily chosen as the control key. Pointers_{m}iandjare

used to partition the subfile so that at any timeK_{l}K,l<i

andK_{l}K,l>j. It is assumed thatK_{m}K1//_{n+}

ifm<n

then[im;jn+ 1;KK_{m}

loop

repeatii+ 1untilK_{i}K;

repeatjj- 1untilK_{j}K;

ifi<j

then callINTERCHANGE(R(i),R(j))

else exit

forever

callINTERCHANGE(R(m),R(j))

callQSORT(m,j- 1)

callQSORT(j+ 1,n)]

endQSORT

**Example 7.3:** The input file has 10 records with keys (26, 5, 37, 1, 61, 11, 59, 15, 48, 19). The table below gives the status of the file at each call of QSORT. Square brackets are used to demarcate subfiles yet to be sorted.

RR_{1}R_{2}R_{3}R_{4}R_{5}R_{6}R_{7}R_{8}R_{9}m n_{l0}

[26 5 37 1 61 11 59 15 48 19] 1 10

[11 5 19 1 15] 26 [59 61 48 37] 1 5

[ 1 5] 11 [19 15] 26 [59 61 48 37] 1 2

1 5 11 [19 15] 26 [59 61 48 37] 4 5

1 5 11 15 19 26 [59 61 48 37] 7 10

1 5 11 15 19 26 [48 37] 59 [61] 7 8

1 5 11 15 19 26 37 48 59 [61] 10 10

1 5 11 l5 l9 26 37 48 59 6l

T(n)cn+ 2T(n/2) , for some constantc

cn+ 2(cn/2 + 2T(n/4))

2cn+ 4T(n/4)

:

cnlog_{2}n+nT(1) =O(nlog_{2}n)

**Lemma 7.1:** Let *T*_{avg}(*n*) be the expected time for QSORT to sort a file with *n* records. Then there exists a constant *k* such that *T*_{avg}(*n*) *k* *n* 1og_{e} *n* for *n* 2.

**Induction Base:** For *n* = 2 we have from eq. (7.1):

*T*_{avg}(2) 2*c* + 2*b* k*n *log_{e}2

**Induction Hypothesis:** Assume *T*_{avg}(*n*) *kn* 1og_{e}*n* for 1 *n* < *m*

**Induction Step:** From eq. (7.1) and the induction hypothesis we have:

Since *j* log* _{e} j* is an increasing function of

Both of the sorting methods we have seen have a worst case behavior of *O*(*n*^{2}) It is natural at this point to ask the question: "What is the best computing time for sorting that we can hope for?" The theorem we shall prove shows that if we restrict our question to algorithms for which the only operations permitted on keys are comparisons and interchanges then *O*(*n* log_{2} *n*) is the best possible time.

SAMPLE INPUT KEY VALUES WHICH

LEAF PERMUTATION GIVE THE PERMUTATION

I 1 2 3 (7,9,10)

II 1 3 2 (7,10,9)

III 3 1 2 (9,10,7)

IV 2 1 3 (9,7,10)

V 2 3 1 (10,7,9)

VI 3 2 1 (10,9,7)

**Theorem 7.1:** Any decision tree that sorts *n* distinct elements has a height of at least log_{2}(*n*!)+ 1.

n! =n(n- 1)(n- 2) ... (3)(2)(1)

(n/2)/2,^{n}

so log_{2} *n*! (*n*/2) log_{2} (*n*/2) = *O*(*n* log_{2} *n*).

Before looking at the merge sort algorithm to sort *n* records let us see how one may merge two files (*X _{l}*, ...,

procedureMERGE(X,l,m,n,Z)

//(X, ...,_{l}X) and (_{m}X+1, ...,_{m}X) are two sorted files with keys_{n}

x_{l}_{... x}m_{ and X}m+1_{}...x. They are merged to obtain the_{n}

sorted file (Z, ...,_{l}Z_{n}) such thatz..._{l}z//_{n}

ikl;jm+ 1 //i,jandkare position in the three files//

whileimandjndo

ifxx_{i }_{j }Zthen[_{k}X_{i};ii+ 1]

else[Z_{k}X;_{j}jj+ 1]

kk+ 1

end

ifi>mthen(Z, ...,_{k}Z) (_{n}X, ...,_{j}X)_{n}

else(Z, ...,_{k}Z) (_{n}X...,_{i}X)_{m}

endMERGE

**Example 7.4.1: ** The input file is (26, 5, 77, 1, 61, 11, 59, 15, 15, 48, 19). The tree below illustrates the subfiles being merged at each pass:

procedureMPASS(X,Y,n,l)

//This algorithm performs one pass of merge sort. It merges adjacent

pairs of subfiles of lengthlfrom fileXto fileY.nis the number

of records inX//

i1

whilein- 2l+ 1do

callMERGE(X,i,i+l- 1,i+ 2l- 1,Y)

ii+ 2l

end

//merge remaining file of length <2l//

ifi+l- 1 <nthen callMERGE(X,i,i+l- 1,n,Y)

else(Y, ..._{i}Y) (_{n}X, ...,_{i}X)_{n}

endMPASS

The merge sort algorithm then takes the form:

procedureMSORT(X,n)

//Sort the fileX= (X_{1}, ...,X) into nondecreasing order of the keys_{n}

x_{1}, ...,x//_{n}

declareX(n),Y(n) //Yis an auxilliary array//

//lis the size of subfiles currently being merged//

l1

whilel<ndo

callMPASS(X,Y,n,l)

l2*l

callMPASS(Y,X,n,l) //interchange role ofXandY//

l2*l

end

endMSORT

**Example 7.4.2: ** The input file (26, 5, 77, 1, 61, 11, 59, 15, 49, 19) is to be sorted using the recursive formulation of 2-way merge sort. If the subfile from l to u is currently to be sorted then its two subfiles are indexed from *l* to (*l *+ *u*)/2 and from (*l *+ *u*)/2 + 1 to *u*. The subfile partitioning that takes place is described by the following binary tree. Note that the subfiles being merged are different from those being merged in algorithm MSORT.

procedureRMSORT(X,l,u,P)

//The fileX= (X, ...,_{l}X) is to be sorted on the field KEY. LINK_{u}

is a link field in each record and is initially set to 0. The sorted

file is a chain beginning atP//

ifluthenPl

else[mid (l+u)/2

callRMSORT(X,l,mid,Q)

callRMSORT(X,mid + 1,u,R)

callRMERGE(Q,R,P)]

endRMSORT

procedureRMERGE(X,Y,Z)

//The linked filesXandYare merged to obtainZ. KEY(i) denotes

the key field and LINK(i) the link field of recordi. InX, Yand

Zthe records are linked in order of nondecreasing KEY values.

A dummy record with indexdis made use of.dis not a valid

index inXorY//

iX;jY;zd

whilei0andj0do

ifKEY(i)KEY(j)then[LINK(z)i

zi; iLINK(i)]

else[LINK(z) j

zj; jLINK(j)]

end

//move remainder//

ifi= 0thenLINK(z)j

elseLINK(z)i

ZLINK(d)

endRMERGE

While the Merge Sort scheme discussed in the previous section has a computing time of *O*(*n* log *n*) both in the worst case and as average behavior, it requires additional storage proportional to the number of records in the file being sorted. The sorting method we are about to study will require only a fixed amount of additional storage and at the same time will have as its worst case and average computing time *O*(*n* log *n*). In this method we shall interpret the file to be sorted *R* = (*R*_{1}, ...,*R _{n}*) as a binary tree. (Recall that in the sequential representation of a binary tree discussed in Chapter 5 the parent of the node at location i is at

procedureADJUST(i,n)

//Adjust the binary tree with rootito satisfy the heap property.

The left and right subtrees ofi, i.e., with roots 2iand 2i+ 1, already

satisfy the heap property. The nodes of the trees contain records,

R, with keysK. No node has index greater thann//

RR;_{i}KK;_{i}j2i

whilejndo

ifj<nandK<_{j}K+1_{j}thenjj+ 1 //find max of left

and right child//

//compare max. child withK. IfKis max. then done//

ifKK_{j}then exit

Rj_{}/2R_{}_{j}; j2j//moveR_{j}up the tree//

end

Rj_{}/2R_{ }

endADJUST

The heap sort algorithm may now be stated.

procedureHSORT(R,n)

//The fileR= (R_{1}, ...,R) is sorted into nondecreasing order of_{n}

the keyK//

forin/2to1by-1do//convertRinto a heap//

callADJUST(i,n)

end

forin- 1to1by-1do//sortR//

TR+1;_{i}R+1_{i}R_{1};R_{1}T//interchangeR_{1}andR+1//_{i}

callADJUST(1,i) //recreate heap//

end

endHSORT

**Example 7.5:** The input file is (26, 5, 77, 1, 61, 11, 59, 15, 48, 19). Interpreting this as a binary tree we have the following transformations:

Let us now look at the problem of sorting records on several keys, *K*^{1},*K*^{2}, ..., *K ^{r}* (

and face values: 2 < 3 < 4 ... < 10 < *J* < *Q* < *K* < *A*.

procedureLRSORT(R,n,d)

//recordsR= (R_{1}, ...,R) are sorted on the keys_{n}K^{1}, ...,K. The^{d}

range of each key is 0K. Sorting within a key is done using^{i}< r

bin sort. All records are assumed to be initially linked together

such thatLINK(i) points toR+1, 1_{i}inandLINK(n) = 0.//

declareE(0:r- 1),F(0:r- 1) //queue pointers//

p1 //pointer to start of list of records//

foridto1by- 1do//sort on keyK//^{i}

forj0tor- 1do//initialize bins to be empty queues//

F(j) 0

end

whilep0 do //put records into queues//

k//kis thei-th key ofp-th record//

ifF(k) = 0thenF(k)p//attach recordpinto bink//

elseLINK(E(k))p

E(k)p

pLINK(p) //get next record//

end

j0;whileF(j) = 0dojj+ 1end//find first nonempty

queue//

pF(j); tE(j)

forkj+ 1 tor- 1do//concatenate remaining queues//

ifF(k) 0then[LINK(t)F(k);tE(k)]

end

LINK(t) 0

end

endLRSORT

**Example 7.6:** We shall illustrate the operation of algorithm LRSORT while sorting a file of 10 numbers in the range [0,999]. Each decimal digit in the key will be regarded as a subkey. So, the value of *d* is 3 and that of *r* is 10. The input file is linked and has the form given on page 365 labeled *R*_{1}, ...,*R*_{10}. The figures on pages 365-367 illustrates the *r* = 10 case and the list after the queues have been collected from the 10 bins at the end of each phase. By using essentially the method above but by varying the radix, one can obtain (see exercises 13 and 14) linear time algorithms to sort *n* record files when the keys are in the range 0 *K _{i}* <

Apart from radix sort, all the sorting methods we have looked at require excessive data movement; i.e., as the result of a comparison, records may be physically moved. This tends to slow down the sorting process when records are large. In sorting files in which the records are large it is necessary to modify the sorting methods so as to minimize data movement. Methods such as Insertion Sort and Merge Sort can be easily modified to work with a linked file rather than a sequential file. In this case each record will require an additional link field. Instead of physically moving the record, its link field will be changed to reflect the change in position of that record in the file (see exercises 4 and 8). At the end of the sorting process, the records are linked together in the required order. In many applications (e.g., when we just want to sort files and then output them record by record on some external media in the sorted order) this is sufficient. However, in some applications it is necessary to physically rearrange the records *in place* so that they are in the required order. Even in such cases considerable savings can be achieved by first performing a linked list sort and then physically rearranging the records according to the order specified in the list. This rearranging can be accomplished in linear time using some additional space.

procedureLIST1(R,n,P)

//Pis a pointer to a list ofnsorted records linked together by the

fieldLINK. A second link field,LINKB, is assumed to be present

in each record. The records are rearranged so that the resulting

recordsR_{1}, ...,Rare consecutive and sorted//_{n}

u0;sP

whiles0do//convertPinto a doubly linked list usingLINKB//

LINKB(s)u//ufollowss//

us;sLINK(s)

end

fori1 ton- 1do//moveRto position_{P}iwhile//

ifPi//maintaining the list//

then[ifLINK(i) 0 thenLINKB(LINK(i))P

LINK(LINKB(i))P; AR_{P}

R_{P}R;_{i}R_{i}A]

PLINK(i) //examine the next record//

end

endLIST1

**Example 7.7:** After a list sort on the input file (35, 18, 12, 42, 26, 14) has been made the file is linked as below (only three fields of each record are shown):

RR_{1}R_{2}R_{3}R_{4}R_{5}_{6}

Key 35 18 12 42 26 14

LINK 4 5 6 0 1 2 P = 3

LINK B

R1 R2 R3 R4 R5 R6

35 18 12 42 26 14

4 5 6 0 1 2 P = 3

5 6 0 1 2 3

The configuration at the end of each execution of the **for** loop is:

12 18 35 42 26 14

6 5 4 0 3 2 P = 6

0 6 5 3 2 3

i = 1

12 14 35 42 26 18

6 6 4 0 3 5 P = 6

0 3 5 3 6 6

LS: R_{6}. R_{5}, R_{3}, R_{4 }i = 2

12 14 18 42 26 35

6 6 5 0 6 4 P = 5

0 3 6 6 6 5

LS: R5, R6, R4 i = 3

12 14 18 26 42 35

6 6 5 6 0 5 P = 6

0 3 6 6 6 5

LS: R6, R5, i = 4

12 14 18 26 35 42

6 6 5 6 6 0 P = 6

0 3 6 6 5 6

LS: R6, i = 5

procedureLIST2(R,n,P)

//Same function asLIST1 except that a second link fieldLINKB

is not reguired//

fori1ton-1do

//find correct record to place intoi'th position. The index of this

record must beias records in positions 1,2, ...,i-1 are already

correctly positioned//

whileP<ido

PLINK(P)

end

QLINK(P) //R_{Q}is next record with largest key//

ifPiRthen[//interchange_{i}andR_{p}movingR_{p}to its correct

spot asRhas_{p}i'th smallest key. Also set link from

old position of Rto new one//_{i}TR_{i }R_{i}RT_{p}; R_{p}

LINK(i)P]

PQ

end

endLIST2

**Example 7.8:** The data is the same as in Example 7.7. After the list sort we have:

RR_{1}R_{2}R_{3}R_{4}R_{5}_{6}

Key 35 18 12 42 26 14

LINK 4 5 6 0 1 2 P = 3

After each iteration of the **for** loop, we have:

12 18 35 42 26 14

3 5 4 0 3 2 P = 6

i = 1

12 14 35 42 26 18

3 6 4 0 1 5 P = 2

i = 2

Now *P* < 3 and so it is advanced to LINK(*P*) = 6.

12 14 18 42 26 35

3 6 6 0 1 4 P = 5

i = 3

12 14 18 26 42 35

3 6 6 5 0 4 P = 1

i = 4

Again P < 5 and following links from *R _{p}* we find

12 14 18 26 35 42

3 6 6 5 6 0 P = 4

i = 5

The list sort technique discussed above does not appear to be well suited for use with sort methods such as Quick Sort and Heap Sort. The sequential representation of the heap is essential to Heap Sort. In such cases as well as in the methods suited to List Sort, one can maintain an auxiliary table with one entry per record. The entries in this table serve as an indirect reference to the records. Let this table be T(l), T(2), ...,T(n). At the start of the sort T(i) = i, 1 i n. If the sorting algorithm requires an interchange of R_{i} and R_{j}, then only the table entries need be interchanged, i.e., T(i) and T(j). At the end of the sort, the record with the smallest key is R_{T(1) }and that with the largest R_{T(n)}. In general, following a table sort R_{T(i)} is the record with the i'th smallest key. The required permutation on the records is therefore R_{T(1)}, R_{T(2)}, ...,R_{T(n)} (see Figure 7.2). This table is adequate even in situations such as binary search, where a sequentially ordered file is needed. In other situations, it may be necessary to physically rearrange the records according to the permutation specified by T. The algorithm to rearrange records corresponding to the permutation T(1), T(2), ...,T(n) is a rather interesting application of a theorem from mathematics: viz, every permutation is made up of disjoint cycles. The cycle for any element i is made up of i, T(i), T^{2}(i), ...,T^{k}(i), (where T^{j}(i) = T(T^{j-1 }(i)) and T^{o}(i) = i) such that T^{k}(i) = i. Thus, the permutation T of figure 7.2 has two cycles, the first involving R_{1} and R_{5} and the second involving R_{4}, R_{3} and R_{2}. Algorithm TABLE utilizes this cyclic decomposition of a permutation. First, the cycle containing R_{1} is followed and all records moved to their correct positions. The cycle containing R_{2} is the next one examined unless this cycle has already been examined. The cycles for R_{3}, R_{4}, ...,R_{n-1} are followed in that order, achieving a reordering of all the records. While processing a trivial cycle for R_{i} (i.e. T(i) = i), no rearrangement involving record R_{i} is required since the condition T(i) = i means that the record with the i'th smallest key is R_{i}. In processing a nontrivial cycle for record R_{i} (i.e. T(i) i), R_{i} is moved to a temporary position P, then the record at T(i) is moved to (i); next the record at T(T(i)) is moved to T(i); and so on until the end of the cycle T^{k}(i) is reached and the record at P is moved to T^{k-1} (i).

procedureTABLE(R,n,T)

//The recordsR_{1}, ...,Rare rearranged to correspond to the sequence_{n}

R(1), ...,_{T}R(_{T}n),n1//

fori1ton- 1do

ifT(i) ithen//there is a nontrivial cycle starting ati//

[PR;_{i}ji//moveRto a temporary spot_{i}Pand follow//

repeat//cyclei,T(i),T(T(i)), ... until the correct spot//

kT(j)

RjR_{k}

T(j)j

jk

untilT(j) =i

R_{j}P//jis position for recordP//

T(j)j]

end

endTABLE

**Example 7.9: **: Following a table sort on the file *F* we have the following values for *T* (only the key values for the 8 records of *F* are shown):

RR_{1 }R_{2 }R_{3 }R_{4}R_{5 }R_{6 }R_{7 }_{8}

F 35 14 12 42 26 50 31 18

T 3 2 8 5 7 1 4 6

F 12 14 18 42 26 35 31 50

T 1 2 3 5 6 7 4 8

F 12 14 18 26 31 35 42 50

T 1 2 3 5 7 6 4 8

For the remaining values of *i* (*i *= 5, 6 and 7), *T*(*i*)* *=* i*, and no more nontrivial cycles are found.

n10 20 50 100 250 500 1000

Quicksort [with median of 3]

(File: [N,1,2,3, ...,

N - 2, N - 1]) .499 1.26 4.05 12.9 68.7 257. 1018.

Quicksort [without median

of 3] (File: [1,2,3, ...,

N - 1, N]) .580 1.92 9.92 38.2 226. 856. 3472.

Insertion Sort

[with K(0) = -]

(File: [N,N - 1, ...,2,1]) .384 1.38 8.20 31.7 203. 788. --

Insertion Sort

[without K(0) = -]

(File: [N,N - 1, ...,2,1]) .382 1.48 9.00 35.6 214. 861. --

Heap Sort .583 1.52 4.96 11.9 36.2 80.5 177.

Merge Sort .726 1.66 4.76 11.3 35.3 73.8 151

n10 20 50 100 250 500 1000

Radix Sort

(L.S.D.)

(R^{D}1.82 3.05 5.68 9.04 20.1 32.5 58.0

100,000;

optimal

D & R) R = 18; R = 18; R = 47; R = 47; R = 317; R = 317; R = 317;

D = 4 D = 4 D = 3 D = 3 D = 2 D = 2 D = 2

Radix Sort

(L.S.D.)

R = 10,

D = 5) 1.95 3.23 6.97 13.2 32.1 66.4 129.

Quicksort

[with

median

of 3] .448 1.06 3.17 7.00 20.0 43.1 94.9

Quicksort

[without

median

of 3] .372 .918 2.89 6.45 20.0 43.6 94.0

Insertion

Sort .232 .813 4.28 16.6 97.1 385. --

Insertion

Sort

(without

K(0) = -) .243 .885 4.72 18.5 111. 437. --

Heap Sort .512 1.37 4.52 10.9 33.2 76.1 166.

Merge Sort .642 1.56 4.55 10.5 29.6 68.2 144.

A comprehensive discussion of sorting and searching may be found in:

Two other useful references on sorting are:

*Sorting and Sort Systems* by H. Lorin, Addison-Wesley, Reading, Massachusetts, 1975.

For an in depth study of quicksort and stable merge sort see:

**1.** Work through algorithms BINSRCH and FIBSRCH on an ordered file with keys (1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16) and determine the number of key comparisons made while searching for the keys 2, 10 and 15. For FIBSRCH we need the three Fibonnaci numbers *F*_{5 }= 5, *F*_{6 }=* *8, *F*_{7 }= 13.

**2.** [Count sort] About the simplest known sorting method arises from the observation that the position of a record in a sorted file depends on the number of records with smaller keys. Associated with each record there is a COUNT field used to determine the number of records which must precede this one in the sorted file. Write an algorithm to determine the COUNT of each record in an unordered file. Show that if the file has *n *records then all the COUNTs can be determined by making at most *n*(*n* - 1)/2 key comparisons.

**3.** The insertion of algorithm INSERT was carried out by (a) searching for the spot at which the insertion is to be made and (b) making the insertion. If as a result of the search it was decided that the insertion had to be made between *R _{i}* and

**4.** Phase (b) (see exercise 3) can be sped up by maintaining the sorted file as a linked list. In this case the insertion can be made without any accompanying movement of the other records. However, now (a) must be carried out sequentially as before. Such an insertion scheme is known as list insertion. Write an algorithm for list insertion. Note that the insertion algorithms of exercises 3 and 4 can be used for a sort without making any changes in INSORT.

**5.** a) Show that algorithm QSORT takes *O*(*n*^{2}) time when the input file is already in sorted order.

b) Why is *K _{m }*

**6.** (a) The quicksort algorithm QSORT presented in section 7.3 always fixes the position of the first record in the subfile currently being sorted. A better choice for this record is to choose the record with key value which is the median of the keys of the first, middle and last record in the subfile. Thus, using this median of three rule we correctly fix the position of the record *R _{i} *with

**7.** Quicksort is an unstable sorting method. Give an example of an input file in which the order of records with equal keys is not preserved.

**8.** a) Write a nonrecursive merge sort algorithm using linked lists to represent sorte subfiles. Show that if *n* records each of size *m *are being sorted then the time required is only* O *(*n *log *n*) as no records are physically moved.

**9.** (i) Prove that algorithm MSORT is stable.

**10.** In the 2-way merge sort scheme discussed in section 7.5 the sort was started with *n *sorted files each of size 1. Another approach would be to first make one pass over the data determining sequences of records that are in order and then using these as the initially sorted files. In this case, a left to right pass over the data of example 7.4 would result in the following partitioning of the data file into sorted subfiles. This would be followed by pairwise merging of the files until only one file remains.

**11.** Does algorithm LRSORT result in a stable sort when used to sort numbers as in Example 7.6?

**12.** Write a sort algorithm to sort records *R*_{1}, ...,*R _{n}* lexically on keys (

**13.** If we have *n* records with integer keys in the range [0, *n*^{2}), then they may be sorted in *O*(*n *log *n*) time using heap or merge sort. Radix sort on a single key, i.e.,* d *= 1 and *r* = *n*^{2 }takes *O*(*n*^{2}) time. Show how to interpret the keys as 2 subkeys so that radix sort will take only O(*n*) time to sort *n *records. (Hint: each key, *K _{i}*, may be written as with and integers in the range [0,

**14.** Generalize the method of the previous exercise to the case of integer keys in the range [0, *n ^{p}*) obtaining

**15.** Write the status of the following file *F *at the end of each phase of the following algorithms;

*F *= (12, 2, 16, 30, 8, 28, 4, 10, 20, 6, 18)

**16.** Write a table sort version of quicksort. Now during the sort, records are not physically moved. Instead, *T*(*i*) is the index of the record that would have been in position *i* if records were physically moved around as in algorithm QSORT. To begin with *T*(*i*) = *i*, 1 *i* *n*. At the end of the sort *T*(*i*) is the index of the record that should be in the *i*'th position in the sorted file. So now algorithm TABLE of section 7.8 may be sued to rearrange the records into the sorted order specified by *T*. Note that this reduces the amount of data movement taking place when compared to QSORT for the case of large records.

**17.** Write an algorithm similar to algorithm TABLE to rearrange the records of a file if with each record we have a COUNT of the number of records preceding it in the sorted file (see Exercise 2).

**18.** Under what conditions would a MSD Radix sort be more efficient than an LSD Radix sort?

**19.** Assume you are given a list of five-letter English words and are faced with the problem of lisitng out these words in sequences such that the words in each sequence are anagrams, i.e., if *x *and *y *are in the same sequence, then word *x *is a permutation of word *y*. You are required to list out the fewest such sequences. With this restriction show that no word can appear in more than one sequence. How would you go about solving this problem?

**20.** Assume you are working in the census department of a small town where the number of records, about 3,000, is small enough to fit into the internal memory of a computer. All the people currently living in this town were born in the United States. There is one record for each person in this town. Each record contains

i) the state in which the person was born;