In this chapter we discuss the problem of sorting an array of elements. To simplify matters, we will assume in our examples that the array contains only integers, although, obviously, more complicated structures are possible. For most of this chapter, we will also assume that the entire sort can be done in main memory, so that the number of elements is relatively small (less than a million). Sorts that cannot be performed in main memory and must be done on disk or tape are also quite important. This type of sorting, known as external sorting, will be discussed at the end of the chapter.

Our investigation of internal sorting will show that

- There are several easy algorithms to sort in *O*(*n*^{2}), such as insertion sort.

- There are slightly more complicated *O*(*n* log *n*) sorting algorithms.

- Any general-purpose sorting algorithm requires Ω(*n* log *n*) comparisons.

We will also assume the existence of the "<" and ">" operators, which can be used to place a consistent ordering on the input. Besides the assignment operator, these are the only operations allowed on the input data. Sorting under these conditions is known as *comparison-based sorting*.

One of the simplest sorting algorithms is the *insertion sort.* Insertion sort consists of *n* - 1 *passes.* For pass *p* = 2 through *n*, insertion sort ensures that the elements in positions 1 through *p* are in sorted order. Insertion sort makes use of the fact that elements in positions 1 through *p* - 1 are already known to be in sorted order. Figure 7.1 shows a sample file after each pass of insertion sort.

Figure 7.1 shows the general strategy. In pass *p*, we move the *p*th element left until its correct place is found among the first *p* elements. The code in Figure 7.2 implements this strategy. The sentinel in *a*[0] terminates the *while* loop in the event that in some pass an element is moved all the way to the front. Lines 3 through 6 implement that data movement without the explicit use of swaps. The element in position *p* is saved in *tmp,* and all larger elements (prior to position *p*) are moved one spot to the right. Then *tmp* is placed in the correct spot. This is the same technique that was used in the implementation of binary heaps.

```
Original      34  8 64 51 32 21   Positions Moved
------------------------------------------------
After p = 2    8 34 64 51 32 21   1
After p = 3    8 34 64 51 32 21   0
After p = 4    8 34 51 64 32 21   1
After p = 5    8 32 34 51 64 21   3
After p = 6    8 21 32 34 51 64   4
```

```c
void
insertion_sort( input_type a[ ], unsigned int n )
{
    unsigned int j, p;
    input_type tmp;

/*1*/   a[0] = MIN_DATA;                  /* sentinel */
/*2*/   for( p=2; p<=n; p++ )
        {
/*3*/       tmp = a[p];
/*4*/       for( j=p; tmp<a[j-1]; j-- )
/*5*/           a[j] = a[j-1];
/*6*/       a[j] = tmp;
        }
}
```

Because of the nested loops, each of which can take *n* iterations, insertion sort is *O*(*n*^{2}). Furthermore, this bound is tight, because input in reverse order actually achieves it. A precise calculation shows that the test at line 4 can be executed at most *p* times for each value of *p*. Summing over all *p* gives a total of

2 + 3 + 4 + . . . + *n* = Θ(*n*^{2})

An *inversion* in an array of numbers is any ordered pair (*i, j*) having the property that *i* < *j* but *a*[*i*] > *a*[*j*]. In the example of the last section, the input list 34, 8, 64, 51, 32, 21 had nine inversions, namely (34,8), (34,32), (34,21), (64,51), (64,32), (64,21), (51,32), (51,21) and (32,21). Notice that this is exactly the number of swaps that needed to be (implicitly) performed by insertion sort. This is always the case, because swapping two adjacent elements that are out of place removes exactly one inversion, and a sorted file has no inversions. Since there is *O*(*n*) other work involved in the algorithm, the running time of insertion sort is *O*(*I + n*), where *I* is the number of inversions in the original file. Thus, insertion sort runs in linear time if the number of inversions is *O*(*n*).

*The average number of inversions in an array of n distinct numbers is n(n - 1)/4.*

This theorem implies that insertion sort is quadratic on average. It also provides a very strong lower bound on any algorithm that only exchanges adjacent elements.

*Any algorithm that sorts by exchanging adjacent elements requires* Ω(*n*^{2}) *time on average.*

This follows because each exchange of adjacent elements removes at most one inversion, and there are *n*(*n* - 1)/4 inversions on average.

Shellsort, named after its inventor, Donald Shell, was one of the first algorithms to break the quadratic time barrier, although it was not until several years after its initial discovery that a subquadratic time bound was proven. As suggested in the previous section, it works by comparing elements that are distant; the distance between comparisons decreases as the algorithm runs until the last phase, in which adjacent elements are compared. For this reason, Shellsort is sometimes referred to as *diminishing increment sort*.

Shellsort uses a sequence, *h*_{1}, *h*_{2}, . . . , *h*_{t}, called the *increment sequence*. Any increment sequence will do as long as *h*_{1} = 1, but some choices are better than others.

```
Original       81 94 11 96 12 35 17 95 28 58 41 75 15
------------------------------------------------------
After 5-sort   35 17 11 28 12 41 75 15 96 58 81 94 95
After 3-sort   28 12 11 35 15 41 58 17 94 75 81 96 95
After 1-sort   11 12 15 17 28 35 41 58 75 81 94 95 96
```

The general strategy to *h*_{k}-sort is, for each position, *i*, in *h*_{k} + 1, *h*_{k} + 2, . . . , *n*, place the element in the correct spot among positions *i*, *i* - *h*_{k}, *i* - 2*h*_{k}, and so on. The action of an *h*_{k}-sort is thus to perform an insertion sort on *h*_{k} independent subfiles.

A popular (but poor) choice for increment sequence is to use the sequence suggested by Shell: *h*_{t} = ⌊*n*/2⌋, and *h*_{k} = ⌊*h*_{k+1}/2⌋.

The program in Figure 7.4 avoids the explicit use of swaps in the same manner as our implementation of insertion sort. Unfortunately, for Shellsort it is not possible to use a sentinel, and so the code in lines 3 through 7 is not quite as clean as the corresponding code in insertion sort (lines 3 through 5).

```c
void
shellsort( input_type a[ ], unsigned int n )
{
    unsigned int i, j, increment;
    input_type tmp;

/*1*/   for( increment = n/2; increment > 0; increment /= 2 )
/*2*/       for( i = increment+1; i<=n; i++ )
            {
/*3*/           tmp = a[i];
/*4*/           for( j = i; j > increment; j -= increment )
/*5*/               if( tmp < a[j-increment] )
/*6*/                   a[j] = a[j-increment];
                    else
/*7*/                   break;
/*8*/           a[j] = tmp;
            }
}
```

Although Shellsort is simple to code, the analysis of its running time is quite another story. The running time of Shellsort depends on the choice of increment sequence, and the proofs can be rather involved. The average-case analysis of Shellsort is a long-standing open problem, except for the most trivial increment sequences. We will prove tight worst-case bounds for two particular increment sequences.

*The worst-case running time of Shellsort, using Shell's increments, is* Θ(*n*^{2}).

The proof requires showing not only an upper bound on the worst-case running time but also showing that there exists some input that actually takes Ω(*n*^{2}) time to run. We prove the lower bound first, by constructing a bad case. First, we choose *n* to be a power of 2. This makes all the increments even, except for the last increment, which is 1. Now, we will give as input an array, *input_data*, with the *n*/2 largest numbers in the even positions and the *n*/2 smallest numbers in the odd positions. As all the increments except the last are even, when we come to the last pass, the *n*/2 largest numbers are still all in even positions and the *n*/2 smallest numbers are still all in odd positions. The *i*th smallest number (*i* ≤ *n*/2) is thus in position 2*i* - 1 before the beginning of the last pass. Restoring the *i*th element to its correct place requires moving it *i* - 1 spaces in the array. Thus, to merely place the *n*/2 smallest elements in the correct place requires at least 0 + 1 + . . . + (*n*/2 - 1) = Ω(*n*^{2}) work. As an example, Figure 7.5 shows a bad (but not the worst) input when *n* = 16. The number of inversions remaining after the 2-sort is exactly 1 + 2 + 3 + 4 + 5 + 6 + 7 = 28; thus, the last pass will take considerable time.

To finish the proof, we show the upper bound of *O*(*n*^{2}). As we have observed before, a pass with increment *h*_{k} consists of *h*_{k} insertion sorts of about *n*/*h*_{k} elements. Since insertion sort is quadratic, the total cost of a pass is *O*(*h*_{k}(*n*/*h*_{k})^{2}) = *O*(*n*^{2}/*h*_{k}). Summing over all passes gives a bound of *O*(*n*^{2} Σ 1/*h*_{k}); because the increments form a geometric series with common ratio 2, Σ 1/*h*_{k} < 2, so the total bound is *O*(*n*^{2}).

The problem with Shell's increments is that pairs of increments are not necessarily relatively prime, and thus the smaller increment can have little effect. Hibbard suggested a slightly different increment sequence, which gives better results in practice (and theoretically). His increments are of the form 1, 3, 7, . . . , 2^{k} - 1. Although these increments are almost identical to Shell's, the key difference is that consecutive increments have no common factors. We now analyze the worst-case running time of Shellsort for this increment sequence. The proof is rather complicated.

```
Start          1  9  2 10  3 11  4 12  5 13  6 14  7 15  8 16
--------------------------------------------------------------
After 8-sort   1  9  2 10  3 11  4 12  5 13  6 14  7 15  8 16
After 4-sort   1  9  2 10  3 11  4 12  5 13  6 14  7 15  8 16
After 2-sort   1  9  2 10  3 11  4 12  5 13  6 14  7 15  8 16
After 1-sort   1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16
```

*The worst-case running time of Shellsort using Hibbard's increments is* Θ(*n*^{3/2}).

Because both sums are geometric series, this simplifies to *O*(*n*^{3/2}).

The average-case running time of Shellsort, using Hibbard's increments, is thought to be *O*(*n*^{5/4}), based on simulations, but nobody has been able to prove this. Pratt has shown that the Θ(*n*^{3/2}) bound applies to a wide range of increment sequences.

As mentioned in Chapter 6, priority queues can be used to sort in *O*(*n* log *n*) time. The algorithm based on this idea is known as *heapsort* and gives the best Big-Oh running time we have seen so far. In practice, however, it is slower than a version of Shellsort that uses Sedgewick's increment sequence.

Recall, from Chapter 6, that the basic strategy is to build a binary heap of *n* elements. This stage takes *O(n)* time. We then perform *n delete_min* operations. The elements leave the heap smallest first, in sorted order. By recording these elements in a second array and then copying the array back, we sort *n* elements. Since each *delete_min* takes *O*(log *n*) time, the total running time is *O*(*n* log *n*).

The main problem with this algorithm is that it uses an extra array. Thus, the memory requirement is doubled. This could be a problem in some instances. Notice that the extra time spent copying the second array back to the first is only *O*(*n*), so that this is not likely to affect the running *time* significantly. The problem is space.

A clever way to avoid using a second array makes use of the fact that after each *delete_min*, the heap shrinks by 1. Thus the cell that was last in the heap can be used to store the element that was just deleted. As an example, suppose we have a heap with six elements. The first *delete_min* produces *a*_{1}. Now the heap has only five elements, so we can place *a*_{1} in position 6. The next *delete_min* produces *a*_{2}. Since the heap will now only have four elements, we can place *a*_{2} in position 5.

Using this strategy, after the last *delete_min* the array will contain the elements in *decreasing* sorted order. If we want the elements in the more typical *increasing* sorted order, we can change the ordering property so that the parent has a larger key than the child. Thus we have a (*max*)heap.

In our implementation, we will use a (*max*)heap, but avoid the actual ADT for the purposes of speed. As usual, everything is done in an array. The first step builds the heap in linear time. We then perform *n* - 1 *delete_maxes* by swapping the last element in the heap with the first, decrementing the heap size, and percolating down. When the algorithm terminates, the array contains the elements in sorted order. For instance, consider the input sequence 31, 41, 59, 26, 53, 58, 97. The resulting heap is shown in Figure 7.6.

Figure 7.7 shows the heap that results after the first *delete_max*. As the figures imply, the last element in the heap is 31; 97 has been placed in a part of the heap array that is technically no longer part of the heap. After 5 more *delete_max* operations, the heap will actually have only one element, but the elements left in the heap array will be in sorted order.

The code to perform heapsort is given in Figure 7.8.

```c
void
heapsort( input_type a[], unsigned int n )
{
    int i;

/*1*/   for( i=n/2; i>0; i-- )            /* build_heap */
/*2*/       perc_down( a, i, n );
/*3*/   for( i=n; i>=2; i-- )
        {
/*4*/       swap( &a[1], &a[i] );         /* delete_max */
/*5*/       perc_down( a, 1, i-1 );
        }
}

void
perc_down( input_type a[], unsigned int i, unsigned int n )
{
    unsigned int child;
    input_type tmp;

/*1*/   for( tmp=a[i]; i*2<=n; i=child )
        {
/*2*/       child = i*2;
/*3*/       if( ( child != n ) && ( a[child+1] > a[child] ) )
/*4*/           child++;
/*5*/       if( tmp < a[child] )
/*6*/           a[i] = a[child];
            else
/*7*/           break;
        }
/*8*/   a[i] = tmp;
}
```

We now turn our attention to *mergesort*. Mergesort runs in *O*(*n* log *n*) worst-case running time, and the number of comparisons used is nearly optimal. It is a fine example of a recursive algorithm.

The fundamental operation in this algorithm is merging two sorted lists. Suppose the first input array, *a*, contains 1, 13, 24, 26, the second input array, *b*, contains 2, 15, 27, 38, and the output goes to a third array, *c*. The smallest front elements are compared first: 1 is added to *c*, and then 13 and 2 are compared.

2 is added to *c*, and then 13 and 15 are compared.

13 is added to *c*, and then 24 and 15 are compared. This proceeds until 26 and 27 are compared.

26 is added to *c*, and the *a* array is exhausted.

The remainder of the *b* array is then copied to *c*.

The mergesort algorithm is therefore easy to describe. If *n* = 1, there is only one element to sort, and the answer is at hand. Otherwise, recursively mergesort the first half and the second half. This gives two sorted halves, which can then be merged together using the merging algorithm described above. For instance, to sort the eight-element array 24, 13, 26, 1, 2, 27, 38, 15, we recursively sort the first four and last four elements, obtaining 1, 13, 24, 26, 2, 15, 27, 38. Then we merge the two halves as above, obtaining the final list 1, 2, 13, 15, 24, 26, 27, 38. This algorithm is a classic divide-and-conquer strategy. The problem is *divided* into smaller problems and solved recursively. The *conquering* phase consists of patching together the answers. Divide-and-conquer is a very powerful use of recursion that we will see many times.

An implementation of mergesort is provided in Figure 7.9. The procedure called *mergesort* is just a driver for the recursive routine *m_sort*.

The *merge* routine is subtle. If a temporary array is declared locally for each recursive call of *merge*, then there could be log *n* temporary arrays active at any point. This could be fatal on a machine with small memory. On the other hand, if the merge routine dynamically allocates and frees the minimum amount of temporary memory, considerable time will be used by *malloc*. A close examination shows that since *merge* is the last line of *m_sort*, there only needs to be one temporary array active at any point. Further, we can use any part of the temporary array; we will use the same portion as the input array *a*. This allows the improvement described at the end of this section. Figure 7.10 implements the *merge* routine.

Mergesort is a classic example of the techniques used to analyze recursive routines. It is not obvious that mergesort can easily be rewritten without recursion (it can), so we have to write a recurrence relation for the running time. We will assume that *n* is a power of 2, so that we always split into even halves. For *n* = 1, the time to mergesort is constant, which we will denote by 1. Otherwise, the time to mergesort *n* numbers is equal to the time to do two recursive mergesorts of size *n*/2, plus the time to merge, which is linear. The equations below say this exactly:

*T*(1) = 1

*T*(*n*) = 2*T*(*n*/2) + *n*

```c
void
mergesort( input_type a[], unsigned int n )
{
    input_type *tmp_array;

    tmp_array = (input_type *) malloc( (n+1) * sizeof (input_type) );
    if( tmp_array != NULL )
    {
        m_sort( a, tmp_array, 1, n );
        free( tmp_array );
    }
    else
        fatal_error("No space for tmp array!!!");
}

void
m_sort( input_type a[], input_type tmp_array[ ],
        int left, int right )
{
    int center;

    if( left < right )
    {
        center = (left + right) / 2;
        m_sort( a, tmp_array, left, center );
        m_sort( a, tmp_array, center+1, right );
        merge( a, tmp_array, left, center+1, right );
    }
}
```

This is a standard recurrence relation, which can be solved several ways. We will show two methods. The first idea is to divide the recurrence relation through by *n*. The reason for doing this will become apparent soon. This yields

*T*(*n*)/*n* = *T*(*n*/2)/(*n*/2) + 1

This equation is valid for any *n* that is a power of 2, so we may also write

*T*(*n*/2)/(*n*/2) = *T*(*n*/4)/(*n*/4) + 1

*T*(*n*/4)/(*n*/4) = *T*(*n*/8)/(*n*/8) + 1

. . .

*T*(2)/2 = *T*(1)/1 + 1

```c
/* l_pos = start of left half, r_pos = start of right half */

void
merge( input_type a[ ], input_type tmp_array[ ],
       int l_pos, int r_pos, int right_end )
{
    int i, left_end, num_elements, tmp_pos;

    left_end = r_pos - 1;
    tmp_pos = l_pos;
    num_elements = right_end - l_pos + 1;

    /* main loop */
    while( ( l_pos <= left_end ) && ( r_pos <= right_end ) )
        if( a[l_pos] <= a[r_pos] )
            tmp_array[tmp_pos++] = a[l_pos++];
        else
            tmp_array[tmp_pos++] = a[r_pos++];

    while( l_pos <= left_end )      /* copy rest of first half */
        tmp_array[tmp_pos++] = a[l_pos++];
    while( r_pos <= right_end )     /* copy rest of second half */
        tmp_array[tmp_pos++] = a[r_pos++];

    /* copy tmp_array back */
    for( i=1; i <= num_elements; i++, right_end-- )
        a[right_end] = tmp_array[right_end];
}
```

Now add up all the equations. This means that we add all of the terms on the left-hand side and set the result equal to the sum of all of the terms on the right-hand side. Observe that the term *T*(*n*/2)/(*n*/2) appears on both sides and thus cancels. In fact, virtually all the terms appear on both sides and cancel. This is called *telescoping* a sum. After everything is added, the final result is

*T*(*n*)/*n* = *T*(1)/1 + log *n*

because all of the other terms cancel, and there are log *n* equations, so the 1s at the end of those equations add up to log *n*. Multiplying through by *n* gives

*T*(*n*) = *n* log *n* + *n* = *O*(*n* log *n*)

An alternative method is to substitute the recurrence into itself repeatedly. We begin with the main equation,

*T*(*n*) = 2*T*(*n*/2) + *n*

Since we can substitute *n*/2 into the main equation,

2*T*(*n*/2) = 2(2*T*(*n*/4) + *n*/2) = 4*T*(*n*/4) + *n*

*T*(*n*) = 4*T*(*n*/4) + 2*n*

Again, by substituting *n*/4 into the main equation, we see that

4*T*(*n*/4) = 4(2*T*(*n*/8) + *n*/4) = 8*T*(*n*/8) + *n*

*T*(*n*) = 8*T*(*n*/8) + 3*n*

Continuing in this manner, we obtain

*T*(*n*) = 2^{k}*T*(*n*/2^{k}) + *kn*

Using *k* = log *n*, we obtain

*T*(*n*) = *nT*(1) + *n* log *n* = *n* log *n* + *n*

Although mergesort's running time is *O*(*n* log *n*), it is hardly ever used for main memory sorts. The main problem is that merging two sorted lists requires linear extra memory, and the additional work spent copying to the temporary array and back, throughout the algorithm, has the effect of slowing down the sort considerably. This copying can be avoided by judiciously switching the roles of *a* and *tmp_array* at alternate levels of the recursion. A variant of mergesort can also be implemented nonrecursively (Exercise 7.13), but even so, for serious internal sorting applications, the algorithm of choice is quicksort, which is described in the next section. Nevertheless, as we will see later in this chapter, the merging routine is the cornerstone of most external sorting algorithms.

As its name implies, *quicksort* is the fastest known sorting algorithm in practice. Its average running time is *O*(*n* log *n*). It is very fast, mainly due to a very tight and highly optimized inner loop. It has *O*(*n*^{2}) worst-case performance, but this can be made exponentially unlikely with a little effort. The quicksort algorithm is simple to understand and prove correct, although for many years it had the reputation of being an algorithm that could in theory be highly optimized but in practice was impossible to code correctly (no doubt because of FORTRAN). Like mergesort, quicksort is a divide-and-conquer recursive algorithm. The basic algorithm to sort an array *S* consists of the following four easy steps:

1. If the number of elements in *S* is 0 or 1, then return.

2. Pick any element *v* in *S*. This is called the *pivot*.

3. Partition *S* - {*v*} into two disjoint groups: *S*_{1} = {*x* ∈ *S* - {*v*} | *x* ≤ *v*} and *S*_{2} = {*x* ∈ *S* - {*v*} | *x* ≥ *v*}.

4. Return { quicksort(*S*_{1}) followed by *v* followed by quicksort(*S*_{2})}.

Figure 7.11 shows the action of quicksort on a set of numbers. The pivot is chosen (by chance) to be 65. The remaining elements in the set are partitioned into two smaller sets. Recursively sorting the set of smaller numbers yields 0, 13, 26, 31, 43, 57 (by rule 3 of recursion). The set of large numbers is similarly sorted. The sorted arrangement of the entire set is then trivially obtained.

It should be clear that this algorithm works, but it is not clear why it is any faster than mergesort. Like mergesort, it recursively solves two subproblems and requires linear additional work (step 3), but, unlike mergesort, the subproblems are not guaranteed to be of equal size, which is potentially bad. The reason that quicksort is faster is that the partitioning step can actually be performed in place and very efficiently. This efficiency more than makes up for the lack of equal-sized recursive calls.

The algorithm as described so far lacks quite a few details, which we now fill in. There are many ways to implement steps 2 and 3; the method presented here is the result of extensive analysis and empirical study and represents a very efficient way to implement quicksort. Even the slightest deviations from this method can cause surprisingly bad results.


Although the algorithm as described works no matter which element is chosen as pivot, some choices are obviously better than others.

The popular, uninformed choice is to use the first element as the pivot. This is acceptable if the input is random, but if the input is presorted or in reverse order, then the pivot provides a poor partition, because virtually all the elements go into *S*_{1} or *S*_{2}. Worse, this happens consistently throughout the recursive calls. The practical effect is that if the first element is used as the pivot and the input is presorted, then quicksort will take quadratic time to do essentially nothing at all, which is quite embarrassing. Moreover, presorted input (or input with a large presorted section) is quite frequent, so using the first element as pivot *is an absolutely horrible idea* and should be discarded immediately. An alternative is choosing the larger of the first two distinct keys as pivot, but this has the same bad properties as merely choosing the first key. Do not use that pivoting strategy either.

The median of a group of *n* numbers is the ⌈*n*/2⌉th largest number. The best choice of pivot would be the median of the file. Unfortunately, this is hard to calculate and would slow down quicksort considerably. A good estimate can be obtained by picking three elements randomly and using the median of these three as pivot. The randomness turns out not to help much, so the common course is to use as pivot the median of the left, right and center elements. For instance, with input 8, 1, 4, 9, 6, 3, 5, 2, 7, 0 as before, the left element is 8, the right element is 0 and the center (in position (*left* + *right*)/2) element is 6. Thus, the pivot would be *v* = 6. Using median-of-three partitioning clearly eliminates the bad case for sorted input (the partitions become equal in this case) and actually reduces the running time of quicksort by about 5 percent.

There are several partitioning strategies used in practice, but the one described here is known to give good results. It is very easy, as we shall see, to do this wrong or inefficiently, but it is safe to use a known method. The first step is to get the pivot element out of the way by swapping it with the last element. *i* starts at the first element and *j* starts at the next-to-last element. If the original input was the same as before, the following figure shows the current situation.

```
8 1 4 9 0 3 5 2 7 6
i               j
```

For now we will assume that all the elements are distinct. Later on we will worry about what to do in the presence of duplicates. As a limiting case, our algorithm must do the proper thing if *all* of the elements are identical. It is surprising how easy it is to do the *wrong* thing.

Our partitioning strategy moves *i* right, skipping over elements that are smaller than the pivot, and moves *j* left, skipping over elements that are larger than the pivot. When *i* and *j* have stopped, *i* is pointing at a large element and *j* is pointing at a small element. Here, *i* stops immediately at 8 and *j* stops at 2:

```
8 1 4 9 0 3 5 2 7 6
i             j
```

We then swap the elements pointed to by *i* and *j* and repeat the process until *i* and *j* cross.

```
After First Swap
--------------------
2 1 4 9 0 3 5 8 7 6
i             j

Before Second Swap
--------------------
2 1 4 9 0 3 5 8 7 6
      i     j

After Second Swap
--------------------
2 1 4 5 0 3 9 8 7 6
      i     j

Before Third Swap
--------------------
2 1 4 5 0 3 9 8 7 6
          j i
```

At this point *i* and *j* have crossed, so no swap is performed; the final part of the partitioning is to swap the pivot element with the element pointed to by *i*:

```
After Swap with Pivot
---------------------
2 1 4 5 0 3 6 8 7 9
            i pivot
```

For very small files (*n* ≤ 20), quicksort does not perform as well as insertion sort. Furthermore, because quicksort is recursive, these cases will occur frequently. A common solution is not to use quicksort recursively for small files, but instead use a sorting algorithm that is efficient for small files, such as insertion sort. An even better idea is to leave the file slightly unsorted and finish up with insertion sort. This works well, because insertion sort is efficient for nearly sorted files. Using this strategy can actually save about 15 percent in the running time (over doing no cutoff at all). A good cutoff range is *n* = 10, although any cutoff between 5 and 20 is likely to produce similar results. This also avoids nasty degenerate cases, such as taking the median of three elements when there are only one or two. Of course, if there is a bug in the basic quicksort routine, then the insertion sort will be very, very slow.

The driver for quicksort is shown in Figure 7.12.

The general form of the routines will be to pass the array and the range of the array (*left* and *right*) to be sorted. The first routine to deal with is pivot selection. The easiest way to do this is to sort *a*[*left*], *a*[*right*], and *a*[*center*] in place. This has the extra advantage that the smallest of the three winds up in *a*[*left*], which is where the partitioning step would put it anyway. The largest winds up in a[*right*], which is also the correct place, since it is larger than the pivot. Therefore, we can place the pivot in *a*[*right* - 1] and initialize *i* and *j* to *left* + 1 and *right* - 2 in the partition phase. Yet another benefit is that because *a*[*left*] is smaller than the pivot, it will act as a sentinel for *j*. Thus, we do not need to worry about *j* running past the end. Since *i* will stop on keys equal to the pivot, storing the pivot in *a*[*right* - 1] provides a sentinel for *i*. The code in Figure 7.13 does the median-of-three partitioning with all the side effects described. It may seem that it is only slightly inefficient to compute the pivot by a method that does not actually sort *a*[*left*], *a*[*center*], and *a*[*right*], but, surprisingly, this produces bad results (see Exercise 7.37).

The real heart of the quicksort routine is in Figure 7.14. It includes the partitioning and recursive calls. There are several things worth noting in this implementation. Line 3 initializes *i* and *j* to 1 past their correct values, so that there are no special cases to consider. This initialization depends on the fact that median-of-three partitioning has some side effects; this program will not work if you try to use it without change with a simple pivoting strategy, because *i* and *j* start in the wrong place and there is no longer a sentinel for *j*.

```c
void
quick_sort( input_type a[ ], unsigned int n )
{
    q_sort( a, 1, n );
    insertion_sort( a, n );
}
```

```c
/* Return median of left, center, and right. */
/* Order these and hide pivot. */

input_type
median3( input_type a[], int left, int right )
{
    int center;

    center = (left + right) / 2;
    if( a[left] > a[center] )
        swap( &a[left], &a[center] );
    if( a[left] > a[right] )
        swap( &a[left], &a[right] );
    if( a[center] > a[right] )
        swap( &a[center], &a[right] );

    /* invariant: a[left] <= a[center] <= a[right] */
    swap( &a[center], &a[right-1] );    /* hide pivot */
    return a[right-1];                  /* return pivot */
}
```

Finally, lines 5 and 6 show why quicksort is so fast. The inner loop of the algorithm consists of an increment/decrement (by 1, which is fast), a test, and a jump. There is no extra juggling as there is in mergesort. This code is still surprisingly tricky. It is tempting to replace lines 3 through 9 with the statements in Figure 7.15. This does not work, because there would be an infinite loop if *a*[*i*] = *a*[*j*] = *pivot*.

Like mergesort, quicksort is recursive, and hence, its analysis requires solving a recurrence formula. We will do the analysis for a quicksort, assuming a random pivot (no median-of-three partitioning) and no cutoff for small files. We will take *T*(0) = *T*(1) = 1, as in mergesort. The running time of quicksort is equal to the running time of the two recursive calls plus the linear time spent in the partition (the pivot selection takes only constant time). This gives the basic quicksort relation

*T*(*n*) = *T*(*i*) + *T*(*n* - *i* - 1) + *cn*

where *i* = |*S*_{1}| is the number of elements in *S*_{1}. We will look at three cases.

```c
void
q_sort( input_type a[], int left, int right )
{
    int i, j;
    input_type pivot;

/*1*/   if( left + CUTOFF <= right )
        {
/*2*/       pivot = median3( a, left, right );
/*3*/       i=left; j=right-1;
/*4*/       for(;;)
            {
/*5*/           while( a[++i] < pivot );
/*6*/           while( a[--j] > pivot );
/*7*/           if( i < j )
/*8*/               swap( &a[i], &a[j] );
                else
/*9*/               break;
            }
/*10*/      swap( &a[i], &a[right-1] ); /* restore pivot */
/*11*/      q_sort( a, left, i-1 );
/*12*/      q_sort( a, i+1, right );
        }
}
```

```c
/*3*/   i=left+1; j=right-2;
/*4*/   for(;;)
        {
/*5*/       while( a[i] < pivot ) i++;
/*6*/       while( a[j] > pivot ) j--;
/*7*/       if( i < j )
/*8*/           swap( &a[i], &a[j] );
            else
/*9*/           break;
        }
```

In the worst case, the pivot is the smallest element, all the time. Then *i* = 0, and if we ignore *T*(0) = 1, which is insignificant, the recurrence is

*T*(*n*) = *T*(*n* - 1) + *cn*,  *n* > 1   (7.2)

We telescope, using Equation (7.2) repeatedly. Thus

*T*(*n* - 1) = *T*(*n* - 2) + *c*(*n* - 1)

*T*(*n* - 2) = *T*(*n* - 3) + *c*(*n* - 2)

. . .

*T*(2) = *T*(1) + *c*(2)

Adding up all these equations yields

*T*(*n*) = *T*(1) + *c*(2 + 3 + . . . + *n*) = *O*(*n*^{2})

In the best case, the pivot is in the middle. To simplify the math, we assume that the two subfiles are each exactly half the size of the original, giving

*T*(*n*) = 2*T*(*n*/2) + *cn*   (7.7)

Divide both sides of Equation (7.7) by *n*:

*T*(*n*)/*n* = *T*(*n*/2)/(*n*/2) + *c*   (7.8)

We will telescope using this equation:

*T*(*n*/2)/(*n*/2) = *T*(*n*/4)/(*n*/4) + *c*   (7.9)

*T*(*n*/4)/(*n*/4) = *T*(*n*/8)/(*n*/8) + *c*   (7.10)

. . .

*T*(2)/2 = *T*(1)/1 + *c*   (7.11)

We add all the equations from (7.7) to (7.11) and note that there are log *n* of them:

*T*(*n*)/*n* = *T*(1)/1 + *c* log *n*

which yields

*T*(*n*) = *cn* log *n* + *n* = *O*(*n* log *n*)

Notice that this is the exact same analysis as mergesort; hence, we get the same answer.

This is the most difficult part. For the average case, we assume that each of the file sizes for *S*_{1} is equally likely, and hence has probability 1/*n*. This assumption is actually valid for our pivoting and partitioning strategy, but it is not valid for some others. Partitioning strategies that do not preserve the randomness of the subfiles cannot use this analysis. Interestingly, these strategies seem to result in programs that take longer to run in practice.

With this assumption, the average value of *T*(*i*), and hence *T*(*n* - *i* - 1), is (1/*n*)[*T*(0) + *T*(1) + . . . + *T*(*n* - 1)]. Equation (7.1) then becomes

*T*(*n*) = (2/*n*)[*T*(0) + *T*(1) + . . . + *T*(*n* - 1)] + *cn*   (7.14)

If Equation (7.14) is multiplied by *n*, it becomes

*nT*(*n*) = 2[*T*(0) + *T*(1) + . . . + *T*(*n* - 1)] + *cn*^{2}   (7.15)

The same equation with *n* - 1 in place of *n* is

(*n* - 1)*T*(*n* - 1) = 2[*T*(0) + *T*(1) + . . . + *T*(*n* - 2)] + *c*(*n* - 1)^{2}   (7.16)

If we subtract (7.16) from (7.15), we obtain

*nT*(*n*) - (*n* - 1)*T*(*n* - 1) = 2*T*(*n* - 1) + 2*cn* - *c*

We rearrange terms and drop the insignificant -*c* on the right, obtaining

*nT*(*n*) = (*n* + 1)*T*(*n* - 1) + 2*cn*   (7.18)

We now have a formula for *T*(*n*) in terms of *T*(*n* - 1) only. Again the idea is to telescope, but Equation (7.18) is in the wrong form. Divide (7.18) by *n*(*n* + 1):

*T*(*n*)/(*n* + 1) = *T*(*n* - 1)/*n* + 2*c*/(*n* + 1)   (7.19)

Now we can telescope:

*T*(*n* - 1)/*n* = *T*(*n* - 2)/(*n* - 1) + 2*c*/*n*   (7.20)

*T*(*n* - 2)/(*n* - 1) = *T*(*n* - 3)/(*n* - 2) + 2*c*/(*n* - 1)   (7.21)

. . .

*T*(2)/3 = *T*(1)/2 + 2*c*/3   (7.22)

Adding equations (7.19) through (7.22) yields

*T*(*n*)/(*n* + 1) = *T*(1)/2 + 2*c*[1/3 + 1/4 + . . . + 1/(*n* + 1)]

The sum is about log_{e}(*n* + 1) + γ - 3/2, where γ ≈ 0.577 is known as Euler's constant, so

*T*(*n*)/(*n* + 1) = *O*(log *n*)

and

*T*(*n*) = *O*(*n* log *n*)

Although this analysis seems complicated, it really is not -- the steps are natural once you have seen some recurrence relations. The analysis can actually be taken further. The highly optimized version that was described above has also been analyzed, and this result gets extremely difficult, involving complicated recurrences and advanced mathematics. The effects of equal keys have also been analyzed in detail, and it turns out that the code presented does the right thing.

Quicksort can be modified to solve the *selection problem*, which we have seen in Chapters 1 and 6. Recall that by using a priority queue, we can find the *k*th largest (or smallest) element in *O*(*n* + *k* log *n*) time. For the special case of finding the median, this gives an *O*(*n* log *n*) algorithm.

Since we can sort the file in *O*(*n* log *n*) time, one might expect to obtain a better time bound for selection. The algorithm we present to find the *k*th smallest element in a set *S* is almost identical to quicksort. In fact, the first three steps are the same. We will call this algorithm *quickselect*. Let |*S*_{i}| denote the number of elements in *S*_{i}. The steps of quickselect are:

1. If |*S*| = 1, then *k* = 1 and return the element in *S* as the answer. If a cutoff for small files is being used and |*S*| ≤ CUTOFF, then sort *S* and return the *k*th smallest element.

2. Pick a pivot element, *v* ∈ *S*.

3. Partition *S* - {*v*} into *S*_{1} and *S*_{2}, as was done with quicksort.

4. If *k* ≤ |*S*_{1}|, then the *k*th smallest element must be in *S*_{1}, so return quickselect(*S*_{1}, *k*). If *k* = 1 + |*S*_{1}|, then the pivot is the *k*th smallest element and we can return it as the answer. Otherwise, the *k*th smallest element lies in *S*_{2}, and it is the (*k* - |*S*_{1}| - 1)st smallest element in *S*_{2}, so we make a recursive call and return quickselect(*S*_{2}, *k* - |*S*_{1}| - 1).

The implementation of quickselect is even simpler than the abstract description might imply. The code to do this is shown in Figure 7.16. When the algorithm terminates, the *k*th smallest element is in position *k*. This destroys the original ordering; if this is not desirable, then a copy must be made.

/* q_select places the kth smallest element in a[k] */

void
q_select( input_type a[], int k, int left, int right )
{
    int i, j;
    input_type pivot;

/*1*/   if( left + CUTOFF <= right )
    {
/*2*/       pivot = median3( a, left, right );
/*3*/       i = left; j = right - 1;
/*4*/       for( ; ; )
        {
/*5*/           while( a[++i] < pivot );
/*6*/           while( a[--j] > pivot );
/*7*/           if( i < j )
/*8*/               swap( &a[i], &a[j] );
            else
/*9*/               break;
        }
/*10*/      swap( &a[i], &a[right-1] );  /* restore pivot */
/*11*/      if( k < i )
/*12*/          q_select( a, k, left, i-1 );
        else
/*13*/      if( k > i )
/*14*/          q_select( a, k, i+1, right );
    }
    else
/*15*/      insert_sort( a, left, right );
}

Using a median-of-three pivoting strategy makes the chance of the worst case occurring almost negligible. By carefully choosing the pivot, however, we can eliminate the quadratic worst case and ensure an *O*(*n*) algorithm. The overhead involved in doing this is considerable, so the resulting algorithm is mostly of theoretical interest. In Chapter 10, we will examine the linear-time worst-case algorithm for selection, and we shall also see an interesting technique of choosing the pivot that results in a somewhat faster selection algorithm in practice.

Throughout our discussion of sorting, we have assumed that the elements to be sorted are simply integers. Frequently, we need to sort large structures by a certain key. For instance, we might have payroll records, with each record consisting of a name, address, phone number, financial information such as salary, and tax information. We might want to sort this information by one particular field, such as the name. For all of our algorithms, the fundamental operation is the swap, but here swapping two structures can be a very expensive operation, because the structures are potentially large. If this is the case, a practical solution is to have the input array contain pointers to the structures. We sort by comparing the keys the pointers point to, swapping pointers when necessary. This means that all the data movement is essentially the same as if we were sorting integers. This is known as *indirect sorting*; we can use this technique for most of the data structures we have described. This justifies our assumption that complex structures can be handled without tremendous loss of efficiency.

Although we have *O*(*n* log *n*) algorithms for sorting, it is not clear that this is as good as we can do. In this section, we prove that any algorithm for sorting that uses only comparisons requires Ω(*n* log *n*) comparisons (and hence time) in the worst case, so that mergesort and heapsort are optimal to within a constant factor. The proof can be extended to show that Ω(*n* log *n*) comparisons are required, even on average, for any sorting algorithm that uses only comparisons, which means that quicksort is optimal on average to within a constant factor.

Specifically, we will prove the following result: Any sorting algorithm that uses only comparisons requires ⌈log(*n*!)⌉ comparisons in the worst case and log(*n*!) comparisons on average. We will assume that all *n* elements are distinct, since any sorting algorithm must work for this case.

A *decision tree* is an abstraction used to prove lower bounds. In our context, a decision tree is a binary tree. Each node represents a set of possible orderings, consistent with comparisons that have been made, among the elements. The results of the comparisons are the tree edges.

The decision tree in Figure 7.17 represents an algorithm that sorts the three elements *a, b,* and *c*. The initial state of the algorithm is at the root. (We will use the terms *state* and *node* interchangeably.) No comparisons have been done, so all orderings are legal. The first comparison that *this particular* algorithm performs compares *a* and *b*. The two results lead to two possible states. If *a* < *b*, then only three possibilities remain. If the algorithm reaches node 2, then it will compare *a* and *c*. Other algorithms might do different things; a different algorithm would have a different decision tree. If *a* > *c*, the algorithm enters state 5. Since there is only one ordering that is consistent, the algorithm can terminate and report that it has completed the sort. If *a* < *c*, the algorithm cannot do this, because there are two possible orderings and it cannot possibly be sure which is correct. In this case, the algorithm will require one more comparison.

Every algorithm that sorts by using only comparisons can be represented by a decision tree. Of course, it is only feasible to draw the tree for extremely small input sizes. The number of comparisons used by the sorting algorithm is equal to the depth of the deepest leaf. In our case, this algorithm uses three comparisons in the worst case. The average number of comparisons used is equal to the average depth of the leaves. Since a decision tree is large, it follows that there must be some long paths. To prove the lower bounds, all that needs to be shown are some basic tree properties.

Let *T* be a binary tree of depth *d*. Then *T* has at most 2^{d} leaves.

*A binary tree with L leaves must have depth at least* ⌈log *L*⌉.

Immediate from the preceding lemma.

Any sorting algorithm that uses only comparisons between elements requires Ω(*n* log *n*) comparisons.

From the previous theorem, log(*n*!) comparisons are required, and log(*n*!) = Θ(*n* log *n*).

This type of lower-bound argument, when used to prove a worst-case result, is sometimes known as an *information-theoretic* lower bound. The general theorem says that if there are *P* different possible cases to distinguish, and the questions are of the form YES/NO, then ⌈log *P*⌉ questions are always required in some case by any algorithm to solve the problem. It is possible to prove a similar result for the average-case running time of any comparison-based sorting algorithm. This result is implied by the following lemma, which is left as an exercise: Any binary tree with *L* leaves has an average depth of at least log *L*.

Although we proved in the previous section that any general sorting algorithm that uses only comparisons requires Ω(*n* log *n*) time in the worst case, recall that it is still possible to sort in linear time in some special cases.

So far, all the algorithms we have examined require that the input fit into main memory. There are, however, applications where the input is much too large to fit into memory. This section will discuss *external sorting* algorithms, which are designed to handle very large inputs.

The basic external sorting algorithm uses the *merge* routine from mergesort. Suppose we have four tapes, *T*_{a1}, *T*_{a2}, *T*_{b1}, *T*_{b2}, which are two input and two output tapes; depending on the point in the algorithm, the *a* and *b* tapes are either input tapes or output tapes.

If *m* = 3, then after the runs are constructed, the tapes will contain the data indicated in the following figure.

If we have extra tapes, then we can expect to reduce the number of passes required to sort our input. We do this by extending the basic (two-way) merge to a *k*-way merge.

We then need two more passes of three-way merging to complete the sort.

The *k*-way merging strategy developed in the last section requires the use of 2*k *tapes. This could be prohibitive for some applications. It is possible to get by with only *k *+ 1 tapes. As an example, we will show how to perform two-way merging using only three tapes.

       Run     After    After    After    After    After    After    After
       Const.  T3+T2    T1+T2    T1+T3    T2+T3    T1+T2    T1+T3    T2+T3
-------------------------------------------------------------------------
T1       0       13       5        0        3        1        0        1
T2      21        8       0        5        2        0        1        0
T3      13        0       8        3        0        2        1        0

It turns out that the first distribution we gave is optimal. If the number of runs is a Fibonacci number *F*_{n}, then the best way to distribute them is to split them into the two Fibonacci numbers *F*_{n-1} and *F*_{n-2}. Otherwise, it is necessary to pad the tape with dummy runs in order to get the number of runs up to a Fibonacci number.

We can extend this to a *k*-way merge, in which case we need *k*th order Fibonacci numbers for the distribution, where the *k*th order Fibonacci number is defined as *F*^{(k)}(*n*) = *F*^{(k)}(*n* - 1) + *F*^{(k)}(*n* - 2) + ··· + *F*^{(k)}(*n* - *k*), with the appropriate initial conditions *F*^{(k)}(*n*) = 0 for 0 ≤ *n* ≤ *k* - 2, and *F*^{(k)}(*k* - 1) = 1.

The last item we will consider is construction of the runs. The strategy we have used so far is the simplest possible: We read as many records as possible and sort them, writing the result to some tape. This seems like the best approach possible, until one realizes that as soon as the first record is written to an output tape, the memory it used becomes available for another record. If the next record on the input tape is larger than the record we have just output, then it can be included in the run.

Using this observation, we can give an algorithm for producing runs. This technique is commonly referred to as *replacement selection*. Initially, *m* records are read into memory and placed in a priority queue. We perform a *delete_min*, writing the smallest record to the output tape. We read the next record from the input tape. If it is larger than the record we have just written, we can add it to the priority queue. Otherwise, it cannot go into the current run. Since the priority queue is smaller by one element, we can store this new element in the dead space of the priority queue until the run is completed and use the element for the next run. Storing an element in the dead space is similar to what is done in heapsort. We continue doing this until the size of the priority queue is zero, at which point the run is over. We start a new run by building a new priority queue, using all the elements in the dead space. Figure 7.18 shows the run construction for the small example we have been using, with *m *= 3. Dead elements are indicated by an asterisk.

In this example, replacement selection produces only three runs, compared with the five runs obtained by sorting. Because of this, a three-way merge finishes in one pass instead of two. If the input is randomly distributed, replacement selection can be shown to produce runs of average length 2*m*. For our large example, we would expect 160 runs instead of 320 runs, so a five-way merge would require four passes. In this case, we have not saved a pass, although we might if we get lucky and have 125 runs or less. Since external sorts take so long, every pass saved can make a significant difference in the running time.

        3 Elements in Heap Array    Output    Next Element Read
        H[1]    H[2]    H[3]
---------------------------------------------------------------
Run 1     11      94      81          11            96
          81      94      96          81            12*
          94      96      12*         94            35*
          96      35*     12*         96            17*
          17*     35*     12*     End of Run.  Rebuild Heap
---------------------------------------------------------------
Run 2     12      35      17          12            99
          17      35      99          17            28
          28      99      35          28            58
          35      99      58          35            41
          41      99      58          41            75*
          58      99      75*         58        End of Tape
          99              75*         99
          75*                     End of Run.  Rebuild Heap
---------------------------------------------------------------
Run 3     75                          75

For most general internal sorting applications, either insertion sort, Shellsort, or quicksort will be the method of choice, and the decision of which to use will depend mostly on the size of the input. Figure 7.19 shows the running time obtained for each algorithm on various file sizes.

The data was chosen to be random permutations of *n* integers, and the times given include only the actual time to sort. The code given in Figure 7.2 was used for insertion sort. Shellsort used the code in Section 7.4 modified to run with Sedgewick's increments. Based on literally millions of sorts, ranging in size from 100 to 25 million, the expected running time of Shellsort with these increments is conjectured to be *O*(*n*^{7/6}). The heapsort routine is the same as in Section 7.5. Two versions of quicksort are given. The first uses a simple pivoting strategy and does not do a cutoff. Fortunately, the input files were random. The second uses median-of-three partitioning and a cutoff of ten. Further optimizations were possible. We could have coded the median-of-three routine in-line instead of using a function, and we could have written quicksort nonrecursively. There are some other optimizations to the code that are fairly tricky to implement, and of course we could have used assembly language. We have made an honest attempt to code all routines efficiently, but of course the performance can vary somewhat from machine to machine.

The highly optimized version of quicksort is as fast as Shellsort even for very small input sizes. The improved version of quicksort still has an *O*(*n*^{2}) worst case (one exercise asks you to construct a small example), but the chances of this worst case appearing are so negligible as to not be a factor. If you need to sort large files, quicksort is the method of choice. But never, ever, take the easy way out and use the first element as pivot. It is just not safe to assume that the input will be random. If you do not want to worry about this, use Shellsort. Shellsort carries a small performance penalty but is otherwise acceptable, especially if simplicity is required. Its worst case is only *O*(*n*^{4/3}); the chance of that worst case occurring is likewise negligible.

Heapsort, although an *O *(*n *log *n*) algorithm with an apparently tight inner loop, is slower than Shellsort. A close examination of the algorithm reveals that in order to move data, heapsort does two comparisons. Carlsson has analyzed an improvement suggested by Floyd that moves data with essentially only one comparison, but implementing this improvement makes the code somewhat longer. We leave it to the reader to decide whether the extra coding effort is worth the increased speed (Exercise 7.39).

      n    Insertion Sort   Shellsort    Heapsort     Quicksort   Quicksort(opt.)
           O(n^{2})         O(n^{7/6})   O(n log n)   O(n log n)  O(n log n)
---------------------------------------------------------------------------
     10        0.00044       0.00041      0.00057      0.00052       0.00046
    100        0.00675       0.00171      0.00420      0.00284       0.00244
   1000        0.59564       0.02927      0.05565      0.03153       0.02587
  10000       58.864         0.42998      0.71650      0.36765       0.31532
 100000       NA             5.7298       8.8591       4.2298        3.5882
1000000       NA            71.164      104.68        47.065        41.282

7.1 Sort the sequence 3, 1, 4, 1, 5, 9, 2, 6, 5 using insertion sort.

7.2 What is the running time of insertion sort if all keys are equal?

7.3 Suppose we exchange elements *a*[*i*] and *a*[*i *+ *k*], which were originally out of order. Prove that at least 1 and at most 2*k *- 1 inversions are removed.

7.4 Show the result of running Shellsort on the input 9, 8, 7, 6, 5, 4, 3, 2, 1 using the increments { 1, 3, 7 }.

7.5 What is the running time of Shellsort using the two-increment sequence { 1, 2 }?

7.6 *a. Prove that the running time of Shellsort is Ω(*n*^{2}) using increments of the form 1, *c*, *c*^{2}, ..., *c*^{i} for any integer *c*.

**b. Prove that for these increments, the average running time is Θ(*n*^{3/2}).

*7.7 Prove that if a *k*-sorted file is then *h*-sorted, it remains *k*-sorted.

**7.8 Prove that the running time of Shellsort, using the increment sequence suggested by Hibbard, is Θ(*n*^{3/2}) in the worst case. *Hint:* You can prove the bound by considering the special case of what Shellsort does when all elements are either 0 or 1. Set *input_data*[*i*] = 1 if *i* is expressible as a linear combination of *h*_{t}, *h*_{t-1}, ..., *h*_{⌈t/2⌉+1}, and 0 otherwise.

7.9 Determine the running time of Shellsort for

a. sorted input

*b. reverse-ordered input

7.10 Show how heapsort processes the input 142, 543, 123, 65, 453, 879, 572, 434, 111, 242, 811, 102.

7.11 a. What is the running time of heapsort for presorted input?

*b. Is there any input for which heapsort runs in *o*(*n* log *n*) time (in other words, are there particularly good inputs for heapsort)?

7.12 Sort 3, 1, 4, 1, 5, 9, 2, 6 using mergesort.

7.13 How would you implement mergesort without using recursion?

7.14 Determine the running time of mergesort for

a. sorted input

b. reverse-ordered input

c. random input

7.15 In the analysis of mergesort, constants have been disregarded. Prove that the number of comparisons used in the worst case by mergesort is *n*⌈log *n*⌉ - 2^{⌈log *n*⌉} + 1.

7.16 Sort 3, 1, 4, 1, 5, 9, 2, 6, 5, 3, 5 using quicksort with median-of-three partitioning and a cutoff of 3.

7.17 Using the quicksort implementation in this chapter, determine the running time of quicksort for

a. sorted input

b. reverse-ordered input

c. random input

7.18 Repeat Exercise 7.17 when the pivot is chosen as

a. the first element

b. the larger of the first two distinct keys

c. a random element

*d. the average of all keys in the set

7.19 a. For the quicksort implementation in this chapter, what is the running time when all keys are equal?

7.20 Suppose we choose the middle key as pivot. Does this make it unlikely that quicksort will require quadratic time?

7.21 Construct a permutation of 20 elements that is as bad as possible for quicksort using median-of-three partitioning and a cutoff of 3.

7.22 Write a program to implement the selection algorithm.

7.23 Solve the following recurrence: *T*(*n*) = (1/*n*) Σ_{i=0}^{n-1} *T*(*i*) + *cn*, *T*(0) = 0.

7.24 A sorting algorithm is *stable* if elements with equal keys are left in the same order as they occur in the input. Which of the sorting algorithms in this chapter are stable and which are not? Why?

7.25 Suppose you are given a sorted list of *n* elements followed by *f*(*n*) randomly ordered elements. How would you sort the entire list if

a. *f*(*n*) = *O*(1)?

b. *f*(*n*) = *O*(log *n*)?

c. *f*(*n*) = *O*(√*n*)?

*d. How large can *f*(*n*) be for the entire list still to be sortable in *O*(*n*) time?

7.26 Prove that any algorithm that finds an element *x* in a sorted list of *n* elements requires Ω(log *n*) comparisons.

7.27 Using Stirling's formula, *n*! ≈ (*n*/*e*)^{n} √(2π*n*), give a precise estimate for log(*n*!).

7.28 *a. In how many ways can two sorted arrays of *n* elements be merged?

*b. Give a nontrivial lower bound on the number of comparisons required to merge two sorted lists of *n* elements.

7.29 Prove that sorting *n* elements with integer keys in the range 1 ≤ *key* ≤ *m* takes *O*(*m* + *n*) time using bucket sort.

7.30 Suppose you have an array of *n* elements containing only two distinct keys, *true *and *false.* Give an *O(n*) algorithm to rearrange the list so that all *false *elements precede the *true* elements. You may use only constant extra space.

7.31 Suppose you have an array of *n* elements, containing three distinct keys, *true*, *false*, and *maybe*. Give an *O*(*n*) algorithm to rearrange the list so that all *false* elements precede *maybe* elements, which in turn precede *true* elements. You may use only constant extra space.

7.32 a. Prove that any comparison-based algorithm to sort 4 elements requires 5 comparisons.

b. Give an algorithm to sort 4 elements in 5 comparisons.

7.33 a. Prove that 7 comparisons are required to sort 5 elements using any comparison-based algorithm.

*b. Give an algorithm to sort 5 elements with 7 comparisons.

7.34 Write an efficient version of Shellsort and compare performance when the following increment sequences are used:

7.35 Implement an optimized version of quicksort and experiment with combinations of the following:

a. Pivot: first element, middle element, random element, median of three, median of five.

b. Cutoff values from 0 to 20.

7.36 Write a routine that reads in two alphabetized files and merges them together, forming a third, alphabetized, file.

7.37 Suppose we implement the median-of-three routine as follows: Find the median of *a*[*left*], *a*[*center*], *a*[*right*], and swap it with *a*[*right*]. Proceed with the normal partitioning step starting *i* at *left* and *j* at *right* - 1 (instead of *left* + 1 and *right* - 2). Assume that *a*[0] = *MIN_DATA*, so that sentinels are present.

a. Suppose the input is 2, 3, 4, ..., *n* - 1, *n*, 1. For this input, what is the running time of this version of quicksort?

b. Suppose the input is in reverse order. What is the running time of this version of quicksort?

7.38 Prove that any comparison-based sorting algorithm requires (*n* log *n*) comparisons on average.

7.39 Consider the following strategy for *percolate_down*. We have a hole at node *X*. The normal routine is to compare *X*'s children and then move the child up to *X* if it is larger (in the case of a (max)heap) than the element we are trying to place, thereby pushing the hole down; we stop when it is safe to place the new element in the hole. The alternate strategy is to move elements up and the hole down as far as possible, without testing whether the new cell can be inserted. This would place the new cell in a leaf and probably violate the heap order; to fix the heap order, percolate the new cell up in the normal manner. Write a routine to include this idea, and compare the running time with a standard implementation of heapsort.

7.40 Propose an algorithm to sort a large file using only two tapes.

Knuth's book [10] is a comprehensive, though somewhat dated, reference for sorting. Gonnet and Baeza-Yates [4] has some more recent results, as well as a huge bibliography.

The original paper detailing Shellsort is [21]. The paper by Hibbard [5] suggested the use of the increments 2^{k} - 1 and tightened the code by avoiding swaps. Theorem 7.4 is from [12]. Pratt's lower bound, which uses a more complex method than that suggested in the text, can be found in [14]. Improved increment sequences and upper bounds appear in [9], [20], and [23]; matching lower bounds have been shown in [24]. A recent unpublished result by Poonen shows that no increment sequence gives an *O*(*n* log *n*) worst-case running time.

Heapsort was invented by Williams [25]; Floyd [1] provided the linear-time algorithm for heap construction. The analysis of its average case has only recently been obtained [15].

An exact average-case analysis of mergesort has been claimed in [3]; the paper detailing the results is forthcoming. An algorithm to perform merging in linear time without extra space is described in [8].

Quicksort is from Hoare [6]. This paper analyzes the basic algorithm, describes most of the improvements, and includes the selection algorithm. A detailed analysis and empirical study was the subject of Sedgewick's dissertation [19]. Many of the important results appear in the three papers [16], [17], and [18].

Decision trees and sorting optimality are discussed in Ford and Johnson [2]. This paper also provides an algorithm that almost meets the lower bound in terms of number of comparisons (but not other operations). This algorithm was eventually shown to be slightly suboptimal by Manacher [11].

External sorting is covered in detail in [10]. Stable sorting, described in Exercise 7.24, has been addressed by Horvath [7].

1. R. W. Floyd, "Algorithm 245: Treesort 3," *Communications of the ACM* 7 (1964), 701.

2. L. R. Ford and S. M. Johnson, "A Tournament Problem," *American Mathematical Monthly* 66 (1959), 387-389.

3. M. Golin and R. Sedgewick, "Exact Analysis of Mergesort," *Fourth SIAM Conference on Discrete Mathematics*, 1988.

4. G. H. Gonnet and R. Baeza-Yates, *Handbook of Algorithms and Data Structures*, second edition, Addison-Wesley, Reading, MA, 1991.

5. T. H. Hibbard, "An Empirical Study of Minimal Storage Sorting," *Communications of the ACM* 6 (1963), 206-213.

6. C. A. R. Hoare, "Quicksort," *Computer Journal* 5 (1962), 10-15.

7. E. C. Horvath, "Stable Sorting in Asymptotically Optimal Time and Extra Space," *Journal of the ACM* 25 (1978), 177-199.

8. B. Huang and M. Langston, "Practical In-place Merging," *Communications of the ACM* 31 (1988), 348-352.

9. J. Incerpi and R. Sedgewick, "Improved Upper Bounds on Shellsort," *Journal of Computer and System Sciences* 31 (1985), 210-224.

10. D. E. Knuth, *The Art of Computer Programming, Volume 3: Sorting and Searching*, second printing, Addison-Wesley, Reading, MA, 1975.

11. G. K. Manacher, "The Ford-Johnson Sorting Algorithm Is Not Optimal," *Journal of the ACM* 26 (1979), 441-456.

12. A. A. Papernov and G. V. Stasevich, "A Method of Information Sorting in Computer Memories," *Problems of Information Transmission* 1 (1965), 63-75.

13. C. G. Plaxton and T. Suel, "Improved Lower Bounds for Shellsort," *Proceedings of the Thirty-third Annual IEEE Symposium on the Foundations of Computer Science* (1992).

14. V. R. Pratt, *Shellsort and Sorting Networks*, Garland Publishing, New York, 1979. (Originally presented as the author's Ph.D. thesis, Stanford University, 1971.)

15. R. Schaffer and R. Sedgewick, "The Analysis of Heapsort," *Journal of Algorithms*, to appear.

16. R. Sedgewick, "Quicksort with Equal Keys," *SIAM Journal on Computing* 6 (1977), 240-267.

17. R. Sedgewick, "The Analysis of Quicksort Programs," *Acta Informatica* 7 (1977), 327-355.

18. R. Sedgewick, "Implementing Quicksort Programs," *Communications of the ACM* 21 (1978), 847-857.

19. R. Sedgewick, *Quicksort*, Garland Publishing, New York, 1978. (Originally presented as the author's Ph.D. thesis, Stanford University, 1975.)

20. R. Sedgewick, "A New Upper Bound for Shellsort," *Journal of Algorithms* 2 (1986), 159-173.

21. D. L. Shell, "A High-Speed Sorting Procedure," *Communications of the ACM* 2 (1959), 30-32.

22. M. A. Weiss, "Empirical Results on the Running Time of Shellsort," *Computer Journal* 34 (1991), 88-91.

23. M. A. Weiss and R. Sedgewick, "More On Shellsort Increment Sequences," *Information Processing Letters* 34 (1990), 267-270.

24. M. A. Weiss and R. Sedgewick, "Tight Lower Bounds for Shellsort," *Journal of Algorithms* 11 (1990), 242-251.

25. J. W. J. Williams, "Algorithm 232: Heapsort," *Communications of the ACM* 7 (1964), 347-348.

26. A. C. Yao, "An Analysis of (*h*, *k*, 1) Shellsort," *Journal of Algorithms* 1 (1980), 14-50.