Data Structures and Algorithms: CHAPTER 8: Sorting

Exercises

8.1	Here are eight integers: 1, 7, 3, 2, 0, 5, 0, 8. Sort them using (a) bubblesort, (b) insertion sort, and (c) selection sort.
8.2	Here are sixteen integers: 22, 36, 6, 79, 26, 45, 75, 13, 31, 62, 27, 76, 33, 16, 62, 47. Sort them using (a) quicksort, (b) insertion sort, (c) heapsort, and (d) bin sort, treating them as pairs of digits in the range 0-9.
8.3	The procedure Shellsort of Fig. 8.26, sometimes called diminishing-increment sort, sorts an array A[1..n] of integers by sorting n/2 pairs (A[i], A[n/2 + i]) for 1 ≤ i ≤ n/2 in the first pass, n/4 four-tuples (A[i], A[n/4 + i], A[n/2 + i], A[3n/4 + i]) for 1 ≤ i ≤ n/4 in the second pass, n/8 eight-tuples in the third pass, and so on. In each pass the sorting is done using insertion sort, in which we stop sorting once we encounter two elements in the proper order.

    procedure Shellsort ( var A: array[1..n] of integer );
        var
            i, j, incr: integer;
        begin
            incr := n div 2;
            while incr > 0 do begin
                for i := incr + 1 to n do begin
                    j := i - incr;
                    while j > 0 do
                        if A[j] > A[j + incr] then begin
                            swap(A[j], A[j + incr]);
                            j := j - incr
                        end
                        else
                            j := 0 { break }
                end;
                incr := incr div 2
            end
        end; { Shellsort }

    Fig. 8.26. Shellsort.

    (a) Sort the sequences of integers in Exercises 8.1 and 8.2 using Shellsort.
    (b) Show that if A[i] and A[n/2^k + i] became sorted in pass k (i.e., they were swapped), then these two elements remain sorted in pass k + 1.
    (c) The distances between elements compared and swapped in a pass diminish as n/2, n/4, . . . , 2, 1 in Fig. 8.26. Show that Shellsort will work with any sequence of distances as long as the last distance is 1.
    (d) Show that Shellsort works in O(n^1.5) time.
*8.4	Suppose you are to sort a list L consisting of a sorted list followed by a few "random" elements. Which of the sorting methods discussed in this chapter would be especially suitable for such a task?
*8.5	A sorting algorithm is stable if it preserves the original order of records with equal keys. Which of the sorting methods in this chapter are stable?
*8.6	Suppose we use a variant of quicksort where we always choose as the pivot the first element in the subarray being sorted.
    (a) What modifications to the quicksort algorithm of Fig. 8.14 do we have to make to avoid infinite loops when there is a sequence of equal elements?
    (b) Show that the modified quicksort has average-case running time O(n log n).
8.7	Show that any sorting algorithm that moves elements only one position at a time must have time complexity at least Ω(n²).
8.8	In heapsort the procedure pushdown of Fig. 8.17 establishes the partially ordered tree property in time O(n). Instead of starting at the leaves and pushing elements down to form a heap, we could start at the root and push elements up. What is the time complexity of this method?
*8.9	Suppose we have a set of words, i.e., strings of the letters a - z, whose total length is n. Show how to sort these words in O(n) time. Note that if the maximum length of a word is constant, binsort will work. However, you must consider the case where some of the words are very long.
*8.10	Show that the average-case running time of insertion sort is Ω(n²).
*8.11	Consider the following algorithm randomsort to sort an array A[1..n] of integers: If the elements A[1], A[2], . . . , A[n] are in sorted order, stop; otherwise, choose a random number i between 1 and n, swap A[1] and A[i], and repeat. What is the expected running time of randomsort?
*8.12	We showed that sorting by comparisons takes Ω(n log n) comparisons in the worst case. Prove that this lower bound holds in the average case as well.
*8.13	Prove that the procedure select, described informally at the beginning of Section 8.7, has an average-case running time of O(n).
*8.14	Implement CONCATENATE for the data structure of Fig. 8.19.
*8.15	Write a program to find the k smallest elements in an array of length n. What is the time complexity of your program? For what value of k does it become advantageous to sort the array?
*8.16	Write a program to find the largest and smallest elements in an array. Can this be done in fewer than 2n - 3 comparisons?
*8.17	Write a program to find the mode (the most frequently occurring element) of a list of elements. What is the time complexity of your program?
*8.18	Show that any algorithm to purge duplicates from a list requires at least Ω(n log n) time under the decision tree model of computation of Section 8.6.
*8.19	Suppose we have k sets, S₁, S₂, . . . , S_k, each containing n real numbers. Write a program to list all sums of the form s₁ + s₂ + . . . + s_k, where s_i is in S_i, in sorted order. What is the time complexity of your program?
*8.20	Suppose we have a sorted array of strings s₁, s₂, . . . , s_n. Write a program to determine whether a given string x is a member of this sequence. What is the time complexity of your program as a function of n and the length of x?
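For checking answers to the first part of Exercise 8.3, here is a minimal Python sketch of the procedure in Fig. 8.26. It assumes 0-based Python lists instead of A[1..n], so the loop bounds shift by one; the function name shellsort is ours, and this is a translation for experimentation rather than the book's program.

```python
def shellsort(a):
    """Sort list a in place using the distances n/2, n/4, ..., 1 of Fig. 8.26; return a."""
    n = len(a)
    incr = n // 2
    while incr > 0:
        # One pass: insertion sort on each chain of elements that are incr apart.
        for i in range(incr, n):
            j = i - incr
            # Swap backwards along the chain until two elements are in order.
            while j >= 0 and a[j] > a[j + incr]:
                a[j], a[j + incr] = a[j + incr], a[j]
                j -= incr
        incr //= 2
    return a
```

For example, shellsort([1, 7, 3, 2, 0, 5, 0, 8]) returns [0, 0, 1, 2, 3, 5, 7, 8]. Printing the list at the end of each pass shows the intermediate arrangements the exercise asks for.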

Bibliographic Notes

Knuth [1973] is a comprehensive reference on sorting methods. Quicksort is due to Hoare [1962] and subsequent improvements to it were published by Singleton [1969] and Frazer and McKellar [1970]. Heapsort was discovered by Williams [1964] and improved by Floyd [1964]. The decision tree complexity of sorting was studied by Ford and Johnson [1959]. The linear selection algorithm in Section 8.7 is from Blum, Floyd, Pratt, Rivest, and Tarjan [1972].

Shellsort is due to Shell [1959] and its performance has been analyzed by Pratt [1979]. See Aho, Hopcroft, and Ullman [1974] for one solution to Exercise 8.9.

† Technically, quicksort is only O(n log n) in the average case; it is O(n²) in the worst case.

‡ We could copy A[i], . . . , A[j] and arrange them as we do so, finally copying the result back into A[i], . . . , A[j]. We choose not to, because that approach would waste space and take longer than the in-place arrangement method we use.
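An in-place arrangement of this kind can be sketched with two cursors that move toward each other and exchange keys that are on the wrong side. The Python below is a hypothetical illustration: the function partition, its 0-based inclusive bounds, and the cursor names left and right are our assumptions, not the program in the text. It assumes some key in a[i..j] is at least the pivot (for instance, the pivot is one of those keys), which keeps the left cursor in range.

```python
def partition(a, i, j, pivot):
    """Rearrange a[i..j] (inclusive) in place so that keys < pivot precede keys >= pivot.

    Returns the index of the first key >= pivot. Assumes some key in a[i..j] is
    >= pivot, e.g. because pivot is itself one of those keys.
    """
    left, right = i, j
    while True:
        # Skip keys that are already on the correct side.
        while a[left] < pivot:
            left += 1
        while right >= i and a[right] >= pivot:
            right -= 1
        if left > right:
            return left
        # Here a[left] >= pivot and a[right] < pivot, so exchange them.
        a[left], a[right] = a[right], a[left]
```

No auxiliary array is used; only the two cursors. A quicksort built on this must make both parts nonempty to guarantee progress, and choosing the pivot carelessly can violate that, which is the issue Exercise 8.6 raises.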

† If there is reason to believe nonrandom orders of elements might make quicksort run slower than expected, the quicksort program should permute the elements of the array at random before sorting.
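The effect of random permutation can be observed by counting comparisons. The sketch below is a simplified quicksort of our own (not Fig. 8.14): the pivot is always the first key of the subarray, and keys equal to the pivot go to the right-hand part. The function name quicksort_count and its shuffle_first and rng parameters are hypothetical.

```python
import random

def quicksort_count(a, shuffle_first=False, rng=None):
    """Sort list a in place with quicksort and return the number of key comparisons.

    Simplified variant for illustration: the pivot is always the first key of the
    subarray, and keys equal to the pivot go to the right-hand part.
    """
    if shuffle_first:
        (rng or random).shuffle(a)  # random permutation before sorting
    comparisons = 0
    stack = [(0, len(a) - 1)]  # pending subarrays, inclusive bounds
    while stack:
        low, high = stack.pop()
        if low >= high:
            continue
        pivot = a[low]
        boundary = low  # a[low+1..boundary] holds the keys smaller than pivot
        for k in range(low + 1, high + 1):
            comparisons += 1
            if a[k] < pivot:
                boundary += 1
                a[boundary], a[k] = a[k], a[boundary]
        a[low], a[boundary] = a[boundary], a[low]  # pivot lands in its final place
        stack.append((low, boundary - 1))
        stack.append((boundary + 1, high))
    return comparisons
```

On the already sorted list range(200), this variant makes 200 · 199 / 2 = 19,900 comparisons, the O(n²) worst case. With shuffle_first=True the same input typically needs far fewer, because every input order becomes equally likely.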

† Since we only want the median, not the entire sorted list of k elements, it may be better to use one of the fast median-finding algorithms of Section 8.7.
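Choosing the pivot as the median of a small sample can be sketched as follows. The helper name sample_median_pivot, the default k = 3, and random sampling are illustrative assumptions. Here statistics.median_low sorts the sample, which is cheap for small k; for larger k the footnote's suggestion of a fast median-finding method would apply.

```python
import random
import statistics

def sample_median_pivot(a, low, high, k=3, rng=random):
    """Return a pivot for a[low..high]: the median of k keys sampled at random.

    k and random sampling are illustrative assumptions. statistics.median_low
    sorts the sample and returns one of its elements; that is cheap for small k,
    but for large k a linear-time selection method would be preferable.
    """
    segment = a[low:high + 1]
    sample = rng.sample(segment, min(k, len(segment)))
    return statistics.median_low(sample)
```

Because median_low always returns an actual sample element, the pivot is guaranteed to be one of the keys being sorted.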

† We could reverse array A at the end, but if we wish A to end up sorted lowest first, it is simpler to apply a DELETEMAX operator in place of DELETEMIN and to partially order A so that a parent has a key no smaller (rather than no larger) than its children.
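The DELETEMAX variant can be sketched in Python. This is our own 0-based version, not the book's program: the nested pushdown is a hypothetical analogue of the text's procedure, written for a tree in which each parent's key is no smaller than its children's.

```python
def heapsort(a):
    """Sort list a in place, lowest first, using a max-heap (parent key >= child keys)."""
    n = len(a)

    def pushdown(first, last):
        # Sift a[first] down within a[first..last] until the max-heap property holds.
        r = first
        while 2 * r + 1 <= last:
            child = 2 * r + 1
            if child < last and a[child + 1] > a[child]:
                child += 1  # pick the larger child
            if a[r] >= a[child]:
                break
            a[r], a[child] = a[child], a[r]
            r = child

    # Build the partially ordered tree bottom-up.
    for i in range(n // 2 - 1, -1, -1):
        pushdown(i, n - 1)
    # Repeated DELETEMAX: move the largest remaining key to the end of the unsorted part.
    for last in range(n - 1, 0, -1):
        a[0], a[last] = a[last], a[0]
        pushdown(0, last - 1)
    return a
```

Each DELETEMAX places the current maximum just after the shrinking heap, so the array ends up in ascending order with no final reversal.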

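† In fact, this time is O(n), by a more careful argument. For j in the range n/2 to n/4 + 1, (8.8) says only one iteration of pushdown's while-loop is needed. For j between n/4 and n/8 + 1, only two iterations are needed, and so on. Since the i-th such range contains about n/2^{i+1} values of j, each needing at most i iterations, the total number of iterations as j ranges between n/2 and 1 is bounded by

$$\sum_{i=1}^{\log n} i \cdot \frac{n}{2^{i+1}} \;\le\; \frac{n}{2}\sum_{i=1}^{\infty} \frac{i}{2^{i}} \;=\; n.$$
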
Note that the improved bound for lines (1)-(2) does not yield an improved bound for heapsort as a whole; the O(n log n) time is spent in lines (3)-(5).
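The counting argument in this footnote can be checked empirically. The sketch below is ours: it uses 0-based indexing and a min-heap (a parent's key is no larger than its children's, as in the text's partially ordered tree), and it counts one iteration per pass of the sift-down loop, which we take to correspond to an iteration of pushdown's while-loop. The function name build_heap_iterations is hypothetical.

```python
def build_heap_iterations(a):
    """Build a min-heap bottom-up in list a and return the total number of
    sift-down loop iterations (at most one per level a key descends)."""
    n = len(a)
    iterations = 0
    for i in range(n // 2 - 1, -1, -1):
        r = i
        while 2 * r + 1 < n:
            iterations += 1
            child = 2 * r + 1
            if child + 1 < n and a[child + 1] < a[child]:
                child += 1  # smaller child, since this is a min-heap
            if a[r] <= a[child]:
                break
            a[r], a[child] = a[child], a[r]
            r = child
    return iterations
```

Even on descending input, where every key sifts as far as it can, the returned count stays at or below n, consistent with the O(n) bound.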

† Note that a sequence ranging from 1 to 0 (or more generally, from x to y, where y < x) is deemed to be an empty sequence.

† But in this case, if log n-bit integers fit in one word, we are better off treating keys as consisting of a single field of type 1..n and using ordinary binsort.
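Ordinary binsort on keys of type 1..n uses one bin per possible key value. A minimal sketch, in which the function name binsort and the key parameter are our assumptions and Python lists stand in for the bins:

```python
def binsort(records, n, key=lambda r: r):
    """Distribute records whose keys are integers in 1..n into n bins, then concatenate.

    Runs in O(n + m) time for m records; records with equal keys keep their input order.
    """
    bins = [[] for _ in range(n + 1)]  # bins[0] is unused so that bin index equals key
    for r in records:
        bins[key(r)].append(r)
    return [r for b in bins for r in b]
```

Appending to the end of each bin is what makes the sort stable, the property radix sorting relies on when it makes one binsort pass per field.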

† We may as well assume all keys are different, since if we can sort a collection of distinct keys we shall surely produce a correct order when some keys are the same.

† In the extreme case, when all keys are equal, the pivot provides no separation at all. Obviously the pivot is then the kth element for any k, and another approach is needed.
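One common way to handle equal keys, not necessarily the approach the text develops, is to split the keys into those smaller than, equal to, and larger than the pivot. If k falls in the middle group, the pivot is the answer, so a run of equal keys is settled in one step. The function name select_kth and the random pivot choice are our assumptions.

```python
import random

def select_kth(a, k, rng=random):
    """Return the k-th smallest element (1-based) of a, in expected O(n) time.

    Swapped-in technique: three-way partitioning into keys <, =, > the pivot.
    """
    if not 1 <= k <= len(a):
        raise ValueError("k out of range")
    items = list(a)
    while True:
        pivot = rng.choice(items)
        smaller = [x for x in items if x < pivot]
        equal_count = sum(1 for x in items if x == pivot)
        if k <= len(smaller):
            items = smaller
        elif k <= len(smaller) + equal_count:
            return pivot  # the k-th smallest lies among the keys equal to the pivot
        else:
            k -= len(smaller) + equal_count
            items = [x for x in items if x > pivot]
```

When all keys are equal, the first iteration finds equal_count = n and returns immediately, which is exactly the case where a two-way split gives no separation.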
