CHAPTER 8: INTRODUCTION TO SEARCHING AND SORTING

Considers how collections of information may be stored to allow

efficient searches

efficient traversals

Develops efficient in-memory sorting algorithms

to show how sorting is used to facilitate searching

Worst-case and average times are developed for

linear search

binary search

maximum entry sort

bubble sort

insertion sort

heapsort

quicksort

Illustrates the use of simulation in analyzing the execution time of a program

8.1: Overview

So far in this book, arrays, lists, binary trees, and trees--the basic data structures--have been introduced and applied to a variety of problems. Selecting the appropriate data structure requires knowing what operations are to be performed on it. The frequency with which these operations are to be performed, the characteristics of the data to be stored, and the average and worst-case time and storage requirements all combine to determine this choice. Other considerations, such as clarity and adaptability, may also serve as criteria for deciding which data structures to use. It is rarely possible to pick a data structure that is best for all criteria. Instead, trade-offs must be made.

In the case study of family relationships in Section 6.5, one of the important operations was a search of the nametable data base. At that time no decisions were made as to how the nametable data base would be structured to facilitate the search. This chapter and the next will show that many implementations are possible and that the appropriate choice is determined by the kinds of operations to be performed on the data base. Nametable might be organized in many ways. It can be implemented as an array, a list, a binary search tree, a balanced binary search tree, or a hash table. The techniques used to create and search these structures are introduced in this and the next chapter.

Dear Ann Landers:

Why is it that whenever I lose anything, it is always

in the last place I look?

--Dizzy Lizzy

Dear Liz:

Because when you find it you stop looking--and

that's the last place you looked.

Although searching and sorting may appear mundane, the speed with which they can be carried out largely determines whether a particular application of the computer is practical (and "cost-effective") or not. Estimates indicate that about one-fourth of the processing time in computer centers is spent just sorting. The relation between sorting and searching will become apparent in this chapter. If one can be done faster, so can the other. The choice of data structures is always important in algorithm design and is the key to the evolution of good (that is, efficient) searching and sorting procedures. The basic data structures studied in the preceding chapters (arrays, lists, and binary trees) are used in the design of efficient procedures.

The chapter first explores simple linear, binary, and interpolation searches and a number of elementary sorting algorithms, such as the bubble sort and the insertion sort. These lead to more advanced methods for sorting based on a special binary tree called a heap. Next, an application of recursion yields another fast sort, quicksort.

Some of the algorithms developed in the text are fairly complex and difficult, if not impossible, to analyze mathematically. For this reason the last topic considered in the chapter is the use of simulation as a tool in discovering the behavior and efficiency of such complex algorithms.

8.2: Elementary Searches

8.2.1 Linear Search

* To simplify the explanation in this chapter, the 0th array position will be ignored, and data stored in arrays will start in position 1 unless otherwise noted.

Figure 8.1 An Array of Integers to Be Searched for the Key Value 25

`#define MAX 100`

`typedef struct`

`{`

`int key;`

`}record;`

`typedef record recordarray[MAX];`

`linearsearch(data,n,key,pfound,ploc)`

`/* Searches the n records stored in`

`the array data for the search key.`

`Found will be true only if a record`

`is present with key field equal to`

`the search key value and then loc`

`will contain its array index.`

`*/`

`recordarray data;`

`int n,key,*pfound,*ploc;`

`{`

`*ploc = 1;`

`while ((*ploc <= n) && (data[*ploc].key != key))`

`(*ploc)++;`

`*pfound = (*ploc < n + 1);`

`}`

A special but common case is when the search is always for a key known to be stored in the array, and each key stored is distinct and equally likely to be the search key. This is like standing in front of a locked door with a bunch of keys. You need to find the key that opens the door from among n keys in arbitrary order on your key-ring. On the average, you would look at half the keys, or n/2 elements of the array, before finding the right one, so the average search time is O(n).

Thus the procedure called linear search is so named, not only because it proceeds linearly through an array, but also because its processing time increases linearly with n; when n is doubled, the time required to complete the search is doubled.

8.2.2 Saving Time: A Neat Trick

Occasionally programming tricks can be used to enhance efficiency. In this case paying attention to some details produces a trick that can pay off by saving processing time. The straightforward implementation of the linear search requires a test (loc <= n) for the end of the array before checking the next element. This is true whether the implementation is a for loop or a while loop. However, it is possible to avoid performing this test.

Place the search key in element n + 1 of data. Then, in implementing the linear search, there is no need to test for the end of the array before checking the next element. Since the array is now considered to be (n + 1) elements in length, we are certain to find the search key before we "fall off the end." After the loop is exited, note the element in which key was found, and test to see if it was the (n + 1)th element, which means that the search was unsuccessful. Thus only one test (data[loc].key != key) must be performed. This saves n tests in unsuccessful searches, and i tests in successful searches, when the search key appears in element i of data. This simple modification of the basic linear search can easily produce significant time savings (20 to 50 percent). In general, whenever loops are modified to reduce the tests or processing they perform, execution time is saved each time through the loop. The more efficient linear search implementation appears as follows:

`#define MAX 100`

`typedef struct`

`{`

`int key;`

`}record;`

`typedef record recordarray[MAX];`

`linearsearch(data,n,key,pfound,ploc)`

`/* Searches the n records stored in`

`the array data for the search key.`

`Found will be true only if a record`

`is present with key field equal to`

`the search key value and then loc`

`will contain its array index.`

`*/`

`recordarray data;`

`int n,key,*pfound,*ploc;`

`{`

`*ploc = 1;`

`data[n + 1].key = key;`

`while (data[*ploc].key != key)`

`(*ploc)++;`

`*pfound = (*ploc < n + 1);`

`}`

Another way to save time in a linear search is to order the records in the array. One way to do this is to store them in decreasing order of key value. The linear search can then be modified by terminating the search as soon as a record is reached whose key value is less than the search key. The desired record cannot be in the array past that point. Although this results in extra comparisons for successful searches, time is saved overall when unsuccessful searches occur frequently enough.
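This early-exit variant can be sketched as follows. It is an illustration, not the book's code: it uses 0-based indexing and a plain int array rather than the chapter's recordarray, and the function name is invented.

```c
#include <stddef.h>

/* Linear search over an array sorted in decreasing order.
   Stops as soon as an entry smaller than the search key is
   seen, since no later entry can match. Returns the index
   of the match, or -1 if the key is absent. */
int ordered_linear_search(const int *data, size_t n, int key)
{
    for (size_t i = 0; i < n; i++) {
        if (data[i] == key)
            return (int)i;
        if (data[i] < key)   /* every later entry is smaller still */
            return -1;       /* early exit: key cannot appear */
    }
    return -1;
}
```

The early return is what saves time on unsuccessful searches: on average only the entries larger than the search key are examined.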

It is also possible, if the frequency with which records will be retrieved is known, to store the records in decreasing order of retrieval frequency. The most frequently retrieved record will then be at the top of the array. This approach significantly improves search efficiency when the desired record is likely to be near the top of the array. In fact, if the frequencies decrease rapidly enough, the average search time for the linear search will be a small constant.

8.2.3 Binary Search

A binary search is also a good strategy for winning the game of twenty questions, which requires guessing the identity of an object after finding out if it is in the animal, the mineral, or the vegetable category. Twenty questions with "yes" or "no" answers are allowed. Selecting each question so that about half the remaining possibilities are eliminated, no matter what the answer, is best. (For example, given the animal category, "Is the object male?") In this way 2^20, or over a million, objects can be distinguished.

In twenty questions it is difficult to select a question that makes close to a 50-50 split of the remaining possibilities. However, if you are asked to determine a number known to lie between 1 and 1024 by asking yes-no questions, it is easy to do so. Ask, "Is it greater than 512?" and continue splitting the remaining possibilities in half. No more than 10 such questions are needed to find the number from among the 2^10 (1024) initial possibilities. Similarly, when searching for a specific key value among 2^n records, at most n key comparisons are needed to determine whether it is present, as long as each comparison splits the remaining collection of records into two parts that differ in size by at most one.

Figure 8.2 Array of Integers Sorted in Decreasing Order

When searching data stored in arbitrary order in an array, there is no reason to expect the search key to be in any particular region of the array. However, when the data have been stored in sorted order, the situation is different. It is possible to check the middle element to see if the search key is there. If it is, the program would note its location and terminate the search. If the key is not there, then if the search key is greater than the key in the middle element, eliminate the middle element and all elements below it from consideration. If the desired record is in data at all, it must be above the middle element. Similarly, if the search key is less than the key in the middle element, eliminate the middle element and all elements above it. In any event, at most half the elements remain to be considered, and the same procedure can then be applied to the appropriate half: either elements 1 to [mid-1] or elements [mid+1] to n. Mid is a variable that points to the middle element.

For example, if mid is 10, then the search key value, 25, is not greater than data[mid], 78. Hence, if it is in data at all, it must be between positions mid+1 and n--11 and 20. Taking 15 as the middle element of these, the new mid value is 15. Key is less than data[mid], 32. Hence we need to search between mid+1 and 20--16 and 20. With a new mid value of 18, key is greater than data[mid], 13. Now search only between 16 and 17. The next mid value is 16, and the key is found.

The basic structure of the binary search algorithm is a loop, as shown in the following function.

`#define TRUE 1`

`#define FALSE 0`

`#define MAX 100`

`typedef struct`

`{`

`int key;`

`}record;`

`typedef record recordarray[MAX];`

`binarysearch(data,n,key,pfound,ploc)`

`/* Searches the n records stored in order by`

`key field value(largest first) in the array`

`data for the search key. Found will be true`

`only if a record is present with key field`

`equal to the search key value and then loc`

`will contain its array index.`

`*/`

`recordarray data;`

`int n,key,*pfound,*ploc;`

`{`

`int top, mid,bottom;`

`top = 1;`

`bottom = n;`

`*pfound = FALSE;`

`*ploc = 0;`

`while ((top <= bottom) && !(*pfound))`

`{`

`mid = (top + bottom)/2;`

`if (data[mid].key == key)`

`{`

`*pfound = TRUE;`

`*ploc = mid;`

`}`

`else if (data[mid].key < key)`

`bottom = mid - 1;`

`else`

`top = mid + 1;`

`}`

`}`

Top is initialized to 1, bottom to n, and mid is set on each pass to the middle element of the consecutive elements left to be searched. Top and bottom are variables that point to the top and bottom of that range, all other elements having been eliminated from consideration. In the loop body, test for key equal to data[mid]. If the equality is true, the program exits the loop and the search is done, since key has been found at element mid. If not, top and bottom must be updated to reflect the new sequence of elements that are to be searched. This updating must set bottom to mid - 1 whenever key is greater than data[mid], and top to mid + 1 otherwise. The test for continuing the loop involves checking whether key was found, and whether any elements are left to be searched between top and bottom. The search is terminated as soon as the desired entry is found or as soon as it is concluded that the entry is not present. Loc will be zero if the key value was not found and will index its location in data if found.

8.2.4 Timing the Binary Search

In each pass through the loop, the search involves at most half the elements that were under consideration during the preceding pass through the loop. Starting with n elements, after one pass through the loop at most (1/2)n elements are left, after two passes at most (1/2)^2 n elements, after three passes at most (1/2)^3 n elements, and after k passes at most (1/2)^k n elements. This number, (1/2)^k n, is less than 1 when lg n - k < 0, that is, when k > lg n. This means that the loop cannot be executed more than ⌈lg n⌉ times. If n is 20, for example, then lg 20 = 4.32, and ⌈4.32⌉ is 5. Thus in the worst case, the binary search is O(lg n). If each record stored is searched with equal frequency, then the average time to find a stored record is actually about (lg n) - 1. (The proof is not given here.)
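The lg n bound can be checked empirically. The sketch below is an illustration (0-based indexing, plain ints, invented names; not the book's code): it runs the binary search over a descending array while counting passes through the loop, so a caller can confirm that no search takes more than about lg n passes.

```c
#include <stddef.h>

/* Binary search over a decreasing array that also counts how
   many times the loop body runs. Returns the index of key, or
   -1 if absent; *passes receives the loop count. */
int binary_search_count(const int *data, size_t n, int key, int *passes)
{
    int top = 0, bottom = (int)n - 1;
    *passes = 0;
    while (top <= bottom) {
        int mid = (top + bottom) / 2;
        (*passes)++;
        if (data[mid] == key)
            return mid;
        else if (data[mid] < key)
            bottom = mid - 1;   /* key, if present, lies above mid */
        else
            top = mid + 1;      /* key, if present, lies below mid */
    }
    return -1;
}
```

For n = 1000, no search (successful or not) should need more than 10 passes, since 2^10 = 1024.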

8.2.6 Efficiency Comparisons

Of course the binary and interpolation searches require sorted data. When only a few searches are to be made or when n is small, it may not pay to spend time to sort the data first. For repeated searching or large n, the efficiencies of the binary and interpolation searches easily overcome the cost of sorting.

8.3: Elementary Sorts

8.3.1 Maximum Entry Sort

Perhaps the most straightforward way to sort numbers in an array into decreasing* order is to begin by placing the largest number in element 1. The next largest number is then placed into element 2, and so on. The maximum entry sort repeatedly finds the next largest array entry and places it in its proper position in the array. Suppose the point has been reached in this procedure where the k largest numbers have been properly placed into the first k elements of the array data. Assume a variable next points to the element of data into which the next largest element is to be placed. Next must now be updated to next + 1 so that it contains the value k + 1, as indicated in Figure 8.3.

* The sorting programs assume that decreasing order is desired. They can easily be modified to sort in increasing order.

Assume that a function maxloc exists, with parameters data, next, and n, the array length. Maxloc returns with its value indexing the largest element between the nextth and the nth elements of data. The sort may then be obtained by a loop in which maxloc is invoked n - 1 times, with next taking on values 1, 2, . . . , n - 1, respectively, on each call. Within the loop, after each call to maxloc, an interchange occurs between data[next] and the element whose index maxloc returned.

Maxloc might be implemented by traversing the elements next to n of data. Each time a new element is accessed, it is compared to a variable, largest, containing the largest entry seen so far in this traversal. If the new element is greater than largest, largest is set to the new element value, and maxloc is set to point to the location of the new element in data.

This algorithm is probably similar to what most people do instinctively when confronted with a small sorting problem. It may be implemented as in the procedure maxentrysort.

Figure 8.3 Ready to Determine the Next Largest

`#define MAX 100`

`typedef struct`

`{`

`int key;`

`}record;`

`typedef record recordarray[MAX];`

`maxloc (data,next,n)`

`/* Returns the array index of the`

`largest entry in data between the`

`entries indexed by next and n.`

`*/`

`recordarray data;`

`int next,n;`

`{`

`int largest,i,tmaxloc;`

`tmaxloc = next;`

`largest = data[next].key;`

`for (i = next + 1; i <= n; i++)`

`if (data[i].key > largest)`

`{`

`tmaxloc = i;`

`largest = data[i].key;`

`}`

`return (tmaxloc);`

`}`

`maxentrysort (data, n)`

`/* Sorts the n records stored in array data`

`in descending order.`

`*/`

`recordarray data;`

`int n;`

`{`

`int next, loc;`

`for (next=1; next<=n - 1; next++)`

`{`

`loc = maxloc (data,next,n);`

`interchange (data, next, loc);`

`}`

`}`

`interchange (data, i, j)`

`/* Interchanges the entries of`

`array data indexed by i and j.`

`*/`

`recordarray data;`

`int i,j;`

`{`

`record tempdata;`

`tempdata.key = data[i].key;`

`data[i].key = data[j].key;`

`data[j].key = tempdata.key;`

`}`

How much time does this implementation of the maximum entry sort take? One interchange is made on each call to maxloc, for a total of n - 1 interchanges. The number of comparisons on the call to maxloc, when next has the value k, is n - k. The total number of comparisons is (n - 1) + (n - 2) + . . . + 2 + 1, which sums to n(n - 1)/2. Thus the time is determined by the time of the n - 1 interchanges plus the time of the n(n - 1)/2 comparisons. Each interchange takes a constant amount of time, and each comparison takes at most a constant amount of time. The result is O(n^2) time. This means that when n is large, doubling or tripling n increases the time by a factor of about 2^2, or 4, and 3^2, or 9, respectively. For n = 10^5, the time is roughly proportional to (1/2) x 10^10, which is still well below 10^13, the guide for feasibility presented in Chapter 1. Quicker algorithms are obviously needed for large n, and even perhaps for small n. The next elementary sort technique offers some improvement.
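The n(n - 1)/2 comparison count is easy to confirm by instrumenting the sort. The following sketch (0-based indexing, plain ints, invented names; not the book's code) performs the maximum entry sort and returns the number of comparisons made, which is the same regardless of the input's initial order.

```c
#include <stddef.h>

/* Maximum entry (selection) sort on a plain int array,
   descending order, instrumented to count key comparisons.
   Returns the comparison count, which is always n(n-1)/2. */
long maxentry_sort_count(int *data, size_t n)
{
    long comparisons = 0;
    for (size_t next = 0; next + 1 < n; next++) {
        size_t loc = next;
        for (size_t i = next + 1; i < n; i++) {
            comparisons++;
            if (data[i] > data[loc])
                loc = i;            /* index of largest seen so far */
        }
        int tmp = data[next];       /* one interchange per pass */
        data[next] = data[loc];
        data[loc] = tmp;
    }
    return comparisons;
}
```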

8.3.2 Bubble Sort

In the example, after pass 1, the array looks like Figure 8.4(b). Since at least one interchange was made, another pass is made, yielding Figure 8.4(c). Again, at least one interchange was made, so the program must make another pass.

Will this process continue making passes forever, or eventually stop? Clearly, once it stops--because no interchanges were made on the last pass--the array is sorted. If any number were out of order, at least one interchange would have been made during that pass. And it must stop: as the next paragraph shows, each pass moves at least the next smallest remaining entry into its final position, so at most n - 1 passes are needed.

Notice that after the first pass the smallest entry, 3, was at the bottom of the array. After the second pass the two smallest entries, 10 and 3, were in order at the bottom of the array. This was not accidental. Notice that the two smallest entries will never be moved, and the third smallest element (13 in this case) will be encountered on the next pass. Once it is encountered, it will always be interchanged with the adjacent lower record to which it is compared. It will eventually come to rest just above the second smallest entry.

Figure 8.4 Bubble Sort

`#define TRUE 1`

`#define FALSE 0`

`#define MAX 100`

`typedef struct`

`{`

`int key;`

`}record;`

`typedef record recordarray[MAX];`

`bubblesort (data, n)`

`/* Sorts the n records stored in array data`

`in descending order.`

`*/`

`recordarray data;`

`int n;`

`{`

`int i,done;`

`done = FALSE;`

`while (!done)`

`{`

`done = TRUE;`

`for (i=1;i<=n-1;i++)`

`if (data[i + 1].key > data[i].key)`

`{`

`interchange (data, i, i + 1);`

`done = FALSE;`

`}`

`}`

`}`

8.3.4 Insertion Sort

In this example the same array, data, is used as for the bubble sort. Each record is processed in turn, starting from the second. Figure 8.5(a) represents the situation after processing of the first six records. The seventh is yet to be processed. It is processed, as are all the other records, by assuming the preceding records are already in sorted order. Notice that records 1-6 are in order. To insert the seventh record, 75, into its proper place among the first seven records, compare record 7 with its predecessor, record 6. Since 75 exceeds 10, it should go somewhere higher up in the array, so compare it to the predecessor of record 6, record 5. Since 75 exceeds 32, compare it to the predecessor of record 5, record 4.

Figure 8.5 Insertion Sort

Again, 75 exceeds that element's value of 32, so compare it to its predecessor's value, 65. The last comparison is with 78. Since 75 does not exceed 78, record 7 must be inserted between record 2, where 78 is now, and record 3. To do this, move records 3 to 6 down one element, respectively, and place 75 in record 3. The result is Figure 8.5(b).

K is then incremented by 1, so that it points to the next record to be inserted. This algorithm for an insertion sort can be described by a loop structure in which k varies from 2 to n, as in the following function.

`#define TRUE 1`

`#define FALSE 0`

`#define MAX 100`

`typedef struct`

`{`

`int key;`

`}record;`

`typedef record recordarray [MAX];`

`insertionsort (data, n)`

`/* Sorts the n records stored in array data`

`in descending order.`

`*/`

`recordarray data;`

`int n;`

`{`

`int k, i, inserted;`

`for (k=2;k<=n; k++)`

`{`

`i = k;`

`inserted = FALSE;`

`while ((i > 1) && !inserted)`

`if (data[i - 1].key < data[i].key)`

`{`

`interchange(data,i,i - 1);`

`i--;`

`}`

`else`

`inserted = TRUE;`

`}`

`}`

8.3.5 Timing the Insertion Sort

In the three implementations of the sorting algorithms, it was assumed that the interchanges taking place were actually interchanges of records. When records are large, it takes appreciably more time to move a record than it does to move a single pointer to the record. Consequently, significant amounts of time can be saved by keeping an array, p, of indexes or pointers to the records. The entries of this array can then be kept in order, so that p[i] indexes or points to the ith record in sorted order. Interchanging indexes or pointers takes less time than interchanging records but uses more memory (which is an added expense). This is another example of trading time for storage. In the procedures shown for maximum entry sort, bubble sort, and insertion sort, the interchange operations would then apply to the array p, and the comparison operations would apply to the actual records indexed or pointed to by the entries of p.
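As a sketch of this indirection (again 0-based, plain ints, invented names; not the book's code), an insertion sort can order an index array p while leaving the records themselves untouched:

```c
#include <stddef.h>

/* Sorts an index array p so that data[p[0]] >= data[p[1]] >= ...
   without moving the records themselves; only the small index
   values are moved. An insertion sort is used here, but the same
   indirection works for any of the three sorts. */
void index_insertion_sort(const int *data, size_t *p, size_t n)
{
    for (size_t i = 0; i < n; i++)
        p[i] = i;                    /* start with identity order */
    for (size_t k = 1; k < n; k++) {
        size_t saved = p[k];
        size_t i = k;
        /* compare the records, but shift only the indices */
        while (i > 0 && data[p[i - 1]] < data[saved]) {
            p[i] = p[i - 1];
            i--;
        }
        p[i] = saved;
    }
}
```

With large records, each shift here moves one machine word instead of a whole record.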

8.3.6 Attempted Improvements

What else can be tried to improve the insertion sort? Aha, lists! Recall that lists are convenient for insertions, so simply represent the records in a list rather than an array. Then, once you know where to insert a record, merely change two pointers. Thus the 1 + 2 + . . . + (n - 1), or n(n - 1)/2, insertion time is reduced to O(n). The list, coupled with the binary search reduction in comparison time to n lg n, seems to give a new implementation of the insertion sort with worst-case time n lg n. The fly in this ointment is that the ability to do a binary search has disappeared. It is no longer possible to locate the middle record by selection, which is what makes the binary search with sorted arrays so fast. Just the first record's location is known; the middle record can be found only by traversing the list and counting records until the middle one is reached.

Figure 8.6 Selection of Records by Means of a Pointer Array

Wait! We are not yet defeated in our quest for a faster insertion sort. Use a pointer array, assuming enough storage is available, to tell where all the records are on the list. This allows selection of the list records as in Chapter 2. The situation is then as shown in Figure 8.6.

Numberlist is the head of the list of records in the illustration. Suppose the first six records have been properly processed as in the insertion sort, and the seventh is to be processed next. A binary search can now be done, since the middle record, among 1 to 6, will be indexed by the middle entry of elements 1 to 6 of the pointer array. Thus it is possible, in lg 6 time, to determine where the seventh record is to be inserted. The insertion will require the changing of two pointers. But here's the rub! To maintain the pointer array, the pointer to the seventh record must be inserted into its proper position among the first six entries of the pointer array. We have merely transferred the insertion problem to the pointer array; we have not eliminated it!

Although these attempts to find a faster sorting algorithm have so far not been successful, they provide insight into the binary search and sorting problems. First, notice that except for the binary search, all the searching and sorting algorithms so far considered may be applied to records stored in lists as well as in arrays. Observe that to achieve both a search in binary search time (lg n) and fast insertions and deletions, a linked type structure is needed, as well as some way to eliminate from consideration a significant fraction of the records. A second observation is that the original insertion sort would be faster if the length of the array in which each insertion is made were shorter. We must somehow break up the array of records to be sorted, so that the individual insertions do not involve all of the previous records. These observations lead to the ideas behind heapsort and quicksort, two faster sorting algorithms.

8.4: Heapsort: A Faster Sort

8.4.1 Heapsorts

Figure 8.8(a) is a heap storing the same information. It is one of many heaps that can be created to store the information. Since the path from every node to the root is in sorted order, the record at the root of a heap must be as great as any other in the tree. The same holds for all subtrees. The record at the root of any subtree must be at least as great as any other record in the subtree. The root record may then be taken as the first record in a sorted output of the information stored in the tree.

Figure 8.8 Heaps

To perform a heapsort:

1. Create a heap.

2. While the heap is not empty

a. output the record at the root of the heap,

b. remove it from the heap, and

c. reheap.

As stated, the algorithm is not detailed enough. It does not specify how to create a heap (step 1) or how to reheap (step 2).

8.4.2 Creating a Heap

All other paths remain sorted, as before. Now compare 425 at node 10 with its predecessor. Since it exceeds 123, interchange again to obtain Figure 8.9(b). Finally, compare 425 with its predecessor's value, and interchange to obtain Figure 8.9(c).

At this point 425 has reached its proper place. All paths must now be in order, since the path from node 10 to the root is in order, and the interchanges have not disturbed the order along any other path.

This gives us the means to create a heap:

Start with records stored in the complete binary tree and process each node in turn, starting with node 2 and ending with node n.

The processing of the ith node is exactly the same processing as was done on the new record 425. Since nodes 1 to i - 1 will already have been processed prior to processing node i, the binary tree consisting of nodes 1 to i - 1 will already be a heap. Processing node i then, in effect, treats the record at node i as a new record and creates a new heap. For the example of twenty records, this procedure leads to the heap of Figure 8.8(a).
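The node-by-node creation just described might be sketched as follows, keeping the chapter's 1-based positions (element 0 unused) but using plain ints and an invented function name; node i's predecessor is node i/2.

```c
#include <stddef.h>

/* First heap creation method: process nodes 2..n in turn,
   letting each record work its way up toward the root by
   repeated interchange with its predecessor (node i/2).
   1-based positions; a[0] is unused. */
void heap_create_siftup(int *a, size_t n)
{
    for (size_t node = 2; node <= n; node++) {
        size_t i = node;
        /* interchange with the predecessor while out of order */
        while (i > 1 && a[i] > a[i / 2]) {
            int tmp = a[i]; a[i] = a[i / 2]; a[i / 2] = tmp;
            i /= 2;
        }
    }
}
```

After the call, every path from a node to the root is in sorted order, which is the heap property.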

8.4.3 Timing the Heap Creation

`d = ⌈lg(n + 1)⌉`

This is important. Since the heaps dealt with initially in our heapsort algorithm are complete, their depth is O(lg n).

We now analyze the time required to create a heap by this algorithm. Starting from a complete binary tree is important. For instance, if we started with the binary tree shown in Figure 8.10, this algorithm would be equivalent to the original insertion sort of Section 8.3.4.

The heap creation algorithm does a total amount of work determined by the sum of the number of comparisons and interchanges needed in processing each node. A node at depth k cannot require more than k comparisons and k interchanges in its processing. The worst-case time will be achieved for any initial tree in which the numbers stored at nodes 1, 2, 3, . . . , n form an increasing sequence. For instance, if the integer i is stored at node i, for i = 1, 2, . . . , n, we obtain such an increasing sequence. If n is 15, the tree will be as shown in Figure 8.11.

Figure 8.10 An Incomplete Binary Tree

It can be shown that the total time for such a tree to be turned into a heap by the algorithm is proportional to the sum, over all depths k from 1 to d, of k x n_k, where n_k is the number of nodes at depth k and d is the depth of the tree; a node at depth k requires up to k comparisons and k interchanges. This is because every record will work its way to the root of the tree when it is processed. Since n_k = 2^(k-1), this sum will be proportional to n lg n. The efficiency of this method can be increased somewhat by first determining where a record to be processed will end up along its path to the root, and then shifting everything down before inserting it. This efficiency could also have been achieved with the insertion sort.

8.4.4 Better Heap Creation

For a tree of depth d, a better procedure for creating a heap is to start at node number 2^(d-1) - 1, the rightmost node at depth d - 1. Even better, start at ⌊n/2⌋, the integer part of n/2, the highest-numbered node that has a successor. Process each node in turn, from node ⌊n/2⌋ down to node 1. The processing at each node i is to turn the tree with node i as its root into a heap. This can be done, assuming that its left and right subtrees are already heaps, as in the following example.

Consider the initial complete binary tree in Figure 8.12(a). Start at node 5 (= ⌊11/2⌋), where 83 is stored, and compare its successors 94 and 7, determining that 94 is the larger. Compare 94 with 83; since 94 exceeds 83, interchange 94 and 83 to obtain Figure 8.12(b).

The subtree with node 5 (94) as root is now a heap. Move to node 4, where similar comparisons reveal that it is already the root of a heap. Move to node 3, compare its successors 29 and 30, and then compare the larger to 16. Since 30 exceeds 16, interchange 30 and 16 to obtain Figure 8.12(c).

The subtree of node 3 is a heap, so move to node 2. Compare its successors 57 and 94, and then compare the larger to 14. Interchange 94 and 14 to obtain Figure 8.12(d).

At this point, notice that all nodes processed so far must be the roots of subtrees that remain heaps, except perhaps for the root of the subtree, where 14 now appears after the interchange. This subtree need not be a heap and, in fact, is not in this case. You can make it a heap by applying the same process to the node where 14 is currently stored, node 5. Compare its two successors and then compare the larger of those with 14. Then interchange 14 and 83 to obtain Figure 8.12(e).

Figure 8.12

The subtree with root node 2 is now a heap, and the processing of node 2 is complete. Notice that all nodes processed so far are roots of subtrees that are heaps. The last node to be processed is now node 1. Compare its successors, 94 and 30. Since the larger, 94, exceeds 18, interchange to obtain Figure 8.12(f).

The 18 from the node being processed, node 1, has now moved down to node 2. Hence its subtree may no longer be a heap. Apply the process to node 2. The 18 can move down again and will continue to move down until it either becomes situated at a terminal node or becomes the root of a subtree that remains a heap. In this case, it should be compared with 83, the larger of its two successors, and interchanged to get Figure 8.12(g).

At this point, you can see that 18 is the root of a heap by comparing it to the larger of its two successors. Processing of node 1 has been completed, and the heap has been created. The heap obtained from the original example with this algorithm is Figure 8.12(h). Notice that this heap differs from that in Figure 8.8(a). As mentioned earlier, more than one heap can be constructed to store the same data.

Without analyzing this heap creation algorithm in detail, the result can be stated. This procedure is actually only O(n). This heap creation algorithm is faster than that of the last section because few nodes must move down long paths. In the other algorithm, many nodes may move up long paths. A use for this linear time heap creation algorithm will be demonstrated in another context in Section 9.2, Priority Queues.
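Although the text does not work out the O(n) bound, it can be sketched with a standard height-counting argument (this derivation is supplied here, not taken from the text). Measuring a node's height h as its distance from the deepest level, a node at height h shifts down at most h levels, and a complete tree with n nodes has at most n/2^h nodes at height h, so the total work is bounded by

```latex
\sum_{h=0}^{\lfloor \lg n \rfloor} \frac{n}{2^{h}}\, h
\;\le\; n \sum_{h=0}^{\infty} \frac{h}{2^{h}}
\;=\; 2n \;=\; O(n).
```

Most nodes sit near the bottom of the tree, where h is small; that is why shifting down is cheap overall, while the sift-up method lets many nodes travel long paths toward the root.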

8.4.5 Some Details of Heap Implementation

Notice that the first heap creation algorithm requires testing the record being inserted along a path to the root against its current predecessor. Before this can be done, the program must test for whether or not there is a predecessor--that is, whether the record has already reached the root. This test must precede every comparison. Similarly, in the second heap creation algorithm, the record being shifted down must be tested against its successors. Before doing this, it is necessary to test whether there are successors.

8.4.6 Reheaping

There are two ways to reheap after the root's record has been removed. One is to allow the "gap" created at the root by removal of its record to shift down by comparing its two successors and interchanging the larger of the two successors with the gap, then repeating this process until the gap is at a terminal node. A convenient way to do this is to replace the record removed from the root by a "negative infinity" and invoke the shift down function referred to earlier for the second algorithm.

A second way to reheap is to replace the gap with the rightmost record at greatest depth. (Actually, any record with no successors will do.) The same shift function may be invoked to reheap.

Since the length of the longest path in a complete binary tree is O(lg n), the reheaping after each record is removed can take time at most O(lg n). Because n records will ultimately be removed, the total time required for phase II will be at most O(n lg n). For either implementation of phase I, the total heapsort time is O(n lg n). This was achieved by "spreading out" the records along paths of a complete binary tree, and, in effect, doing insertions only along the path from a record to the root.

8.4.7 An Implementation for Heapsort

A detailed version of the heapsort algorithm can now be developed. You have now seen a number of ways to create a heap and to reheap. We will use the second heap creation algorithm for phase I and the second reheaping algorithm for phase II. Both algorithms use the shift function.

To proceed we must decide on an implementation of the binary tree data structure. The binary tree used in heapsort is initially complete. The required processing involves predecessors and successors of nodes. Sequential representation for the binary tree is ideal, so we use it. The binary tree is stored in the data array and consists of n records.

Storage is needed for the sorted version of data. Although another array may seem necessary for this purpose, we will use only the data array. The algorithm, after creating a heap (in data), continually takes the root record as the next record in the sorted output and then reheaps. Before reheaping, the rightmost record at the lowest level of the heap is placed at the root. The storage vacated by that rightmost record will no longer be needed. We choose to store the output record there. You should convince yourself that this will result in the records appearing in data in reverse sorted order upon termination. To remedy this, we write shift so that it produces min heaps. The records will then appear in correct sorted order upon termination.

Shift has parameters data, root, and last. Root points to the root of a subtree, and last points to the node of the original heap that is currently the rightmost node at greatest depth. The nodes last+1 to n are currently storing the already sorted n-last output records. Whenever shift is invoked, the left and right subtrees of root (excluding nodes last+1 to n) are assumed to be heaps. Shift returns after reheaping, by allowing the root record to move down to its final position. This is done by first storing the key value and record of the initial root record in keyvalue and recordvalue, respectively. Ptr points to the node currently being considered as the final position for the initial root record. This will be the final position if either of the following two conditions is met.

Figure 8.13 (a) Initial Heap and (b) Heap after First Three Entries Have Been Output

1. The successor record of ptr with minimum key value has a key value greater than or equal to keyvalue, or

2. Ptr points to a terminal node or to a node whose successors are nodes among last+1 to n.

Condition 2 can be recognized by the failure of the test (succ <= last), where succ points to the left successor node of ptr. Condition 1 requires finding the minimum key value among the successors of ptr. Ptr may not have a right successor, or that successor may represent an output record; the test (succ < last) checks for this.

`#define MAX 100`

`typedef struct`

`{`

`int key;`

`}record;`

`typedef record recordarray[MAX];`

`heapsort (data,n)`

`/* Sorts the n records stored in array data`

`in descending order.`

`*/`

`recordarray data;`

`int n;`

`{`

`int root,last;`

`root = n/2;`

`last = n;`

`while (root > 0)`

`{`

`shift(data,root,last);`

`root--;`

`}`

`root = 1;`

`while (last > 1)`

`{`

`interchange(data,1,last);`

`last--;`

`shift(data,root,last);`

`}`

`}`

`shift (data,root,last)`

`/* Makes the subtree of root into`

`a heap (min heap). The left and`

`right subtrees of root must be`

`heaps. All subtrees are stored`

`in array data in sequential`

`representation. Only the nodes`

`between root and last are`

`considered to be in the subtrees.`

`*/`

`recordarray data;`

`int root,last;`

`{`

`int ptr,succ;`

`int keyvalue;`

`record recordvalue;`

`ptr = root;`

`succ = 2 * ptr;`

`if (succ < last)`

`if (data[succ + 1].key < data[succ].key)`

`succ++;`

`/* at this point, succ points to the smallest of ptr's successors */`

`keyvalue = data[root].key;`

`copy(&recordvalue,&data[root]);`

`/* the purpose of the while loop is to determine the proper place`

`for the original root record; ptr will point to that location when the loop is exited */`

`while((succ <= last) && (data[succ].key < keyvalue))`

`{`

`copy(&data[ptr],&data[succ]);`

`ptr = succ;`

`succ = 2 * ptr;`

`if (succ < last)`

`if (data[succ + 1].key < data[succ].key)`

`succ++;`

`}`

`copy(&data[ptr],&recordvalue);`

`}`

8.4.8 Heapsort is Uniformly Fast

As presented, heapsort is efficient. Unlike the other sorts considered, which are array- or list-oriented, it is tree-oriented. Those sorts are O(n2) in the worst case, whereas heapsort is O(n lg n). Moreover, we shall see in Section 8.7 that it gives uniformly good sort times, no matter how ordered or unordered the original data. Although times are somewhat faster when the data is initially either fairly well ordered or reverse ordered, the range of these times is small compared to the other sorts. This is to be expected, since the construction of the heap takes little time and the reheaping tends to destroy any order present in the data; in effect the data appears to be in random order. Actually, during reheaping, when shift works on larger entries at the root, it takes O(lg n) time, since they tend toward the bottom of the tree, while the opposite is true for smaller entries. Because most entries are near the bottom of the tree and are larger, their shifting makes up the bulk of the sorting time. Thus this time should be relatively insensitive to initial order in the data. This uniformity of sort times is one of heapsort's main virtues, since it eliminates the great variation in time, from O(n) to O(n2), of the other sorts. In addition, it is fast.

8.5: Quicksort: Another Fast Sort

One way to develop quicksort is to attempt a recursive solution to the problem of sorting an array data of n records. The general recursive method immediately prescribes guidelines--break up the original problem into smaller versions of the original problem and then achieve the solution from their solutions. Here, this dictates that the original array be broken down into component arrays, which can be sorted independently and from which the final sorted array can be derived.

Merely splitting data into two components and then sorting each component independently is not sufficient to guarantee that the resultant array is sorted, with no further processing required. This is because some entries in the upper component may be smaller than some entries in the lower component. For example, suppose the initial array is data, shown in Figure 8.14(a), and the upper and lower components consist, respectively, of entries 1 to 4 and 5 to 8. Sorting these components independently yields Figure 8.14(b), which is clearly not sorted.

Figure 8.14 Sorting an Array by Component

Notice, however, that if all entries in the upper component were no less than all entries in the lower component, then sorting each component independently would always lead to a result that is sorted. For example, suppose the initial array is Figure 8.14(c). Taking the upper and lower components as entries 1 to 4, and 5 to 8, respectively, and independently sorting them produces Figure 8.14(d). This array is now sorted.

To carry out this procedure for an arbitrary array requires finding a partitioning entry. Normally a partitioning entry, such as 13 in position 5, will not occur in the initial array to be sorted. For example, no such partitioning entry (separating larger entries above from smaller entries below) occurs in Figure 8.14(a). This means that the initial array must be rearranged so that a partitioning entry may be found and subsequently used to separate the components.

In keeping with the top-down approach, assume that this task (rearranging the initial array and determining a partitioning entry) is carried out by the function partition(i,j). The arguments i and j contain pointers to entries of the array, as shown in Figure 8.15. Partition(i,j) returns an index pointer to a partitioning entry of the array consisting of all entries between i and j. Thus, if partition(i,j) returns a value p, then all entries between i and p-1 are at least as great as data[p].key, and all entries between p+1 and j are no greater than data[p].key. The pointer p separates the upper and lower components of i,j.

8.5.1 Two Quicksort Procedures

Recursive Version

`quicksort(i,j)`

`/* Sorts the records stored in array data`

`between positions indexed by i and j in`

`descending order.`

`*/`

`int i,j;`

`{`

`int p;`

`if(i < j)`

`{`

`p = partition(i,j);`

`quicksort(i,p-1);`

`quicksort(p+1,j);`

`}`

`}`

To sort a global array of n records, quicksort(1,n) is invoked. The crux of the algorithm is the partition function, which will be discussed shortly.

Every time a call is made to quicksort, two new obligations are incurred, and a current obligation is postponed. For instance, the initial call to quicksort generates the initial obligation: sort the n entries of data. During the execution of this call, partition determines a partitioning entry, and two new obligations are incurred. These are represented by the two recursive calls to quicksort, one for each component. The current obligation must then be postponed to carry out these new obligations. Each recursive call also generates two new obligations, requiring the execution of the current recursive call to be postponed.

As we have seen previously, what can be done recursively can also be done iteratively. Quicksort can be written nonrecursively by using a stack to store one of the new obligations, while the other can be processed immediately. It turns out that the choice of which of the two obligations to stack, and which to deal with immediately, is critical to the algorithm's storage requirements. The partition function does not necessarily split an array into two equal parts, as evidenced by our earlier example. Selecting the larger of the two components to stack results in a worst-case stack depth of O(lg n). An arbitrary choice instead leads to a worst-case stack depth of O(n). The difference in storage needed for the stack can be significant.

Nonrecursive Version

`quicksort(i,j)`

`/* Sorts the records stored in array data`

`between positions indexed by i and j in`

`descending order.`

`*/`

`int i,j;`

`{`

`int p,top,bottom;`

`stack s;`

`setstack(&s);`

`push(j,&s);`

`push(i,&s);`

`while(!empty(&s))`

`{`

`pop(&s,&top);`

`pop(&s,&bottom);`

`while(top < bottom)`

`{`

`p = partition(top,bottom);`

`if((p - top) > (bottom - p))`

`{`

`push(p-1,&s);`

`push(top,&s);`

`top = p + 1;`

`}`

`else`

`{`

`push(bottom,&s);`

`push(p+1,&s);`

`bottom = p - 1;`

`}`

`}`

`}`

`}`

8.5.2 The Partition Function

A clever way to do this requires two pointers, upper and lower. Initially, upper is set to i, the first array index, and lower to j, the last array index. The initial situation is shown in Figure 8.16(a).

Lower is now moved upward until an entry larger than data [upper].key is encountered. In this case, 15 will be that entry. It is then interchanged with data[upper]; see Figure 8.16(b). At this point, attention is shifted to upper, which is moved downward until an entry smaller than data[lower].key is encountered. This entry will be 10, and it is interchanged with data[lower], yielding Figure 8.16(c).

Figure 8.16 Sorting an Array by Means of a Partitioning Function

Attention now shifts back to lower, which is again moved upward until an entry greater than data[upper].key is encountered. This will be 30, and it is interchanged with data[upper], resulting in Figure 8.16(d).

Upper is then moved downward until an entry smaller than data[lower].key is encountered. In this case no such entry appears, and when upper reaches lower, no such entry can appear. This is so because, at all times, any entries above upper are greater than the original first entry (13 in the example), and any entries below lower are less than the original first entry. Hence, whichever pointer is being moved, only entries between upper and lower can result in an interchange. When upper and lower coincide, no entries are left to interchange. Furthermore, when upper and lower coincide, they must be pointing to the original first entry or basis, since at all times at least one of them points to it. The final situation, then, is as shown in Figure 8.16(e).

This procedure clearly carries out the task of partition(i,j), which is to return the value of upper (or lower). Notice that, whether upper or lower is the focus of attention, it is always the original first entry (13 here) that is the basis for comparison.

Figure 8.17 Partitioning an Array by Means of a Temporary Variable, SAVERECORD

`partition(i,j)`

`/* Rearranges the entries in array`

`data between positions indexed by`

`i and j and returns an array index`

`which guarantees that all entries`

`above it are larger than the entry`

`it indexes, and all entries below it`

`are smaller.`

`*/`

`int i,j;`

`{`

`int upper,lower;`

`record saverecord;`

`upper = i;`

`lower = j;`

`copy(&saverecord,&data[i]);`

`while(upper != lower)`

`{`

`while((upper < lower) && (saverecord.key >=`

`data[lower].key))`

`lower--;`

`if(upper != lower)`

`copy(&data[upper],&data[lower]);`

`while((upper < lower) && (saverecord.key <=`

`data[upper].key))`

`upper++;`

`if(upper != lower)`

`copy(&data[lower],&data[upper]);`

`}`

`copy(&data[upper],&saverecord);`

`return(upper);`

`}`

It is important to recognize that whenever partition(i,j) is invoked, it processes only entries between i and j of the data array and deals with each of these entries exactly once. Each entry is compared with saverecord, and perhaps copied into data[upper] or data[lower]. Consequently, a call to partition(i,j) takes time O(j - i + 1), since there are j - i + 1 entries between i and j.

8.5.3 Analyzing Iterative Quicksort

Initially, the stack is empty, and the array is as shown in Figure 8.18(a). When quicksort (1,8) is invoked, it calls partition (1,8), which returns with p pointing to the partitioning entry and the rearranged array of Figure 8.18(b).

The upper component array is the larger, so (1,4) is placed on the stack, and the lower component array is processed. This leads to a call to partition (6,8). When partition returns, the situation is Figure 8.18(c).

Figure 8.18 Iterative Quicksort

The current upper component array is larger, so (6,7) is placed on the stack, and the lower component array is processed. Since the lower component contains no entries, (6,7) is removed from the stack and partition(6,7) is invoked, returning the situation depicted in Figure 8.18(d).

Now (6,6) is placed on the stack, and the lower component is processed. Again, this component has no entries, so (6,6) is removed from the stack. Since top (= 6) is not less than bottom (= 6), the outer while loop body is now executed, causing (1,4) to be removed from the stack. At this time the stack is empty, and the array is as shown in Figure 8.18(e). Partition(1,4) is now invoked, and returns, yielding Figure 8.18(f).

Now (1,3) is placed on the stack. The lower component again contains no entries, so (1,3) is removed from the stack, and partition(1,3) is invoked. The result is Figure 8.18(g).

At this point (2,3) is placed on the stack. Since the upper component contains no entries, (2,3) is removed from the stack, and partition(2,3) is invoked. It returns with Figure 8.18(h).

Now (2,2) is placed on the stack. The lower component has no entries, so (2,2) is removed from the stack. Since top (= 2) is not less than bottom (= 2), the inner loop is not entered, and the outer while loop tests for an empty stack. As the stack is empty, quicksort terminates.

It should now be clear that each time the inner loop body of quicksort is executed with top < bottom, p is set to point to an entry of the array that has been correctly placed in its final sorted position. If top = bottom, then the one entry is already in its proper sorted position. The inner loop body is thus executed at most n times. Ignoring the time taken by partition, the inner loop body takes a constant time to execute. Each time the inner loop body executes, it generates one new problem that is placed on the stack and another new problem that is processed immediately. As soon as this latter problem has fewer than two entries (top >= bottom), the inner loop is exited. Unless the stack is empty, the outer loop removes the next problem from the stack, and the inner loop then functions again. Consequently, each entry that is placed in its proper position requires at most an execution of the inner loop body, or an execution of the inner loop body plus the additional constant time required by the outer loop test plus removal of a problem from the stack. This means that the total time is, at most, c1 + c2 x n plus the total time taken by partition, for some constants c1 and c2.

8.5.4 The Worst and Best Cases

To analyze the total time of partition, recall that each execution takes time O(length of the component array it is called to work on). Since the length of a component is at most n, with partition being invoked at most n times, total time is at most O(n2). Hence quicksort has a worst-case time O(n2). You should convince yourself that the worst case occurs whenever the initial array is in sorted or reverse sorted order.

The best case (fastest execution) actually occurs when the partitioning entry returned by partition splits the array in half. In this case, partition is called to work on one component array of length n, two component arrays of length n/2, four component arrays of length n/4, . . . , 2k component arrays of length n/2k, etc. By an argument similar to the worst-case analysis of the binary search, about lg n such levels of calls will be made to partition. Its total time will then be O(n lg n).

While the worst-case time for quicksort is poor, its best-case time is fast--in fact, better than heapsort. The average time for quicksort is also O(n lg n). This assumes that each ordering of the n entries is equally likely to appear in the initial array. Evidently, cases differing significantly from the best occur rarely, and this is why quicksort is aptly named.

The storage requirements for quicksort, beyond the initial array, are a few variables plus the stack depth. Since the larger of each generated component array problem is stacked, the next problem to be stacked must have length at most one-half the current top stack entry length. For example, in the preceding section, note that when (6,7) was placed on the stack, its length was no greater than half the length of the preceding stack component (1,4). Again, by an argument similar to the worst-case binary search analysis, at most O(lg n) such problems can appear on the stack at any time, so the storage requirements for quicksort are O(lg n). Thus quicksort, while faster on the average than heapsort, does not have the guaranteed worst-case O(n lg n) time of heapsort. Moreover, quicksort requires additional O(lg n) storage to achieve its faster average time.

For smaller values of n, the simpler sorts, like the insertion sort, will be even faster than quicksort. Of course, if the initial array is known to be nearly sorted, the insertion sort will be faster anyway. Clearly, one way to speed up quicksort is to modify it by using a faster sort, such as insertion sort, when small component arrays are encountered. It is also possible to enhance the performance of quicksort by taking more care in the selection of the entry to be used as a basis by partition. Instead of taking the first entry of a component as the basis, the median of three entries may be selected, or an entry may be selected at random.

8.6: Simulation of an Algorithm

The simulation program to be written must explore the time requirements, as measured by int1, int2, comp1, and comp2, for various values of n, such as 20, 40, 60, 80, and 100. This will suggest how the execution time depends on n. Alternatively, the actual execution times could be measured. Intuitively, the algorithm will have the fastest execution time when the input is nearly sorted and the slowest when the input is nearly reverse sorted. For random initial orderings, the time can be expected to fall between these two extremes. This may not be the case, but it seems likely a priori. Consequently, we want to generate a number of samples from each of these three distinct classes of input, execute the treesort routine for each sample, and collect statistics on the numbers of interchanges and comparisons based on the samples taken.

We will choose twenty-five samples of each type of input, for a total of twenty-five samples x 3 input classes x 5 values of n, or 375 executions of treesort. One way to see if twenty-five samples is enough to achieve good statistics is to run an additional twenty-five. If the results do not vary significantly, then the twenty-five samples, or perhaps even fewer, are sufficient. If necessary, we can use a larger number of samples. Remember, the twenty-five samples are taken from a possible number of n! samples. It may seem the height of foolishness to base estimates on such a ridiculously small fraction of the possibilities. Thank goodness for statistics (when properly applied).

`                          n = 40`

`                       -------------`

`  Input Class          Min  Ave  Max`

`------------------------------------`

`"Nearly sorted"  int1   *    *    *`

`                 int2   *    *    *`

`                 comp1  *    *    *`

`                 comp2  *    *    *`

`"Random"         int1   *    *    *`

`                 int2   *    *    *`

`                 comp1  *    *    *`

`                 comp2  *    *    *`

`"Nearly reverse  int1   *    *    *`

`  sorted"        int2   *    *    *`

`                 comp1  *    *    *`

`                 comp2  *    *    *`

For each of the twenty-five samples for a given input class, the task is to find the following statistics for int1, int2, comp1, and comp2: their minimum, average, and maximum over all twenty-five sample values. Other statistics, such as the range and standard deviation, could be calculated if needed. We will have the simulation program print this information for each value of n, as indicated for n = 40 in the chart on the previous page. The *'s represent actual numeric values that will be output. A total of thirty-six values will be printed for each n; 5 x 36 = 180 statistics in all.

The program will keep statistics tables, one table for each input class for the current value of n. An algorithm for the simulation program is as follows:

1. Set n to 20.

2. While n <= 100,

a. Set ns to 1.

b. Initialize statistics tables.

c. While ns <= 25,

i. Generate a sample from each input class, call treesort to work on each input, and then update the appropriate statistics tables.

ii. Set ns to ns+1.

d. Output the statistics tables for the current value of n and for each input class.

e. Set n to n+20.

Task (i) may be carried out as follows:

a. Generate a "random" sample for a of size n and call treesort(n, a, sa, int1, int2, comp1, comp2).

b. Update the statistics table for "random" input.

c. Generate a "nearly sorted" input for a of size n by making 20 percent of n random interchanges on sa. Copy sa into a.

d. Generate a "nearly reverse sorted" input for a of size n by copying sa in reverse order in ra. Call treesort(n, a, sa, int1, int2, comp1, comp2).

e. Update the statistics tables for "nearly sorted" input.

f. Call treesort(n, ra, sa, int1, int2, comp1, comp2).

g. Update the statistics table for "nearly reverse sorted" input.

h. Set ns to ns+1.

At this point we can explain in more detail what is meant by the three classes of input. Assume that all input arrays, a, will contain integers between 1 and 1,000. Random inputs for a of size n are generated by placing the n integers into a in such a way that all orderings of the n numbers are equally likely. Consider the two arrays shown in Figure 8.19.

Array (b) has been obtained from array (a), which is sorted, by making two interchanges. These were interchanging a1[2] with a1[6], and interchanging a1[4] with a1[9]. These two interchanges represent 20 percent of n interchanges, since n is 10. A nearly sorted input for a of size n is obtained by making 20 percent of n such interchanges in an initially sorted array. The interchanges are to be selected at random. This means that each interchange is obtained by picking a value for i at random, so that each of the locations 1 to n of sa is equally likely to be selected. An integer j may be similarly obtained. Then sa[i] and sa[j] are interchanged. The only details still left to implement in the simulation algorithm are how to generate the required random integers to be placed in a and used for interchange locations, how to update the statistics tables, and how to finalize the tables before printing them.

Figure 8.19 Array (b) Obtained from Sorted Array (a) by Two Interchanges

To generate an integer between 1 and 1,000, take the value rng returns, multiply it by 1,000, take the integer part of the result, and add 1 to it. For example, if the value returned is 0.2134, multiplying by 1,000 gives 213.4. Its integer part is 213; adding 1 yields 214. In fact, if any number strictly between 0.213 and 0.214 had been returned, 214 would have been generated. Hence 214 would be generated with probability 1/1,000, the length of the interval from 0.213 to 0.214. The same is true for each integer between 1 and 1,000. Carrying out this procedure n times will fill the array a to generate a "random" input (See Exercise 23 for a more exact approach). An alternative method in C to generate an integer at random between integers low and high uses the library function rand. Evaluating rand()%(high-low+1) + low yields the desired integer between low and high.

The statistics tables can keep current running minimums, maximums, and the accumulated sum of the appropriate statistic. When each call to treesort returns, the minimum, maximum, and accumulated sum may be updated for each statistic, int1,int2,comp1, and comp2. When the inner loop of the algorithm is exited, and task d is to be carried out, the statistics tables are ready to be printed, except for the twelve averages. They must be finalized in task d by dividing the accumulated sums by 25.

The algorithm presented in this section can be used directly to obtain simulation results for treesort. Simply replacing the call to treesort in task (i) by a call to suitably modified versions of the other sort programs will produce their simulation results as well. To use simulation in the analysis of other programs, not necessarily sorts, requires some insight into what input is appropriate to generate and what statistics are meaningful to collect. This is not always so straightforward as in our example, but the general procedure and goal for the simulation of an algorithm should now be clear.

8.7: Simulation Results for the Sorts

Each entry in the table is the average of 20 runs on a VAX 730 under VMS Version 4.0. For each n the five values correspond to input that is sorted, half sorted--half random, random, half reverse sorted--half random, and reverse sorted. If the entries for heapsort seem counter to your intuition, remember that its implementation sees sorted input as reverse sorted and reverse sorted input as sorted. This is because it uses a min heap to produce descending order.

8.8: Synopsis of Search and Sort Efficiencies

Linear searches of arrays and binary searches of sorted arrays can be performed, respectively, in O(n) and O(lg n) time. Under special conditions, linear searches can be fast.

Simple sorting algorithms such as the bubble and insertion sorts have worst-case time O(n2) but are O(n) when dealing with almost sorted data. Quicksort gives very fast average times, and heapsort does sorting in worst-case time O(n lg n).

Table 8.1 Simulation Results in Actual Execution Time (milliseconds)*

` n    Bubble Sort                        Insertion Sort                     Modified Insertion Sort`

`-----------------------------------------------------------------------------------------------------------`

` 10    1       2       6       8      9     1      2      5      7      7     0     2       4      6      9`

` 20    1       8      23      33     36     1      5     16     28     30     1     6      17     25     30`

` 40    2      31     101     138    148     2     16     67    110    121     5    16      65    107    120`

`100    4     217     646     863    923     7     89    412    678    761     6    87     402    660    744`

`200    8     966    2653    3463   3724    12    364   1657   2706   3058    11   354    1612   2639   2993`

`400   17    3933   11277   14732  15858    24   1583   7645  12489  14595    20  1373    6352  10581  12157`

`-----------------------------------------------------------------------------------------------------------`

`     (columns, left to right: sorted; half sorted--half random; random; half reverse sorted--half random; reverse sorted)`

```
  n  |             Heapsort               |        Iterative Quicksort         |        Recursive Quicksort
     |  srt'd  h-srt  rnd'm  h-rev    rev |  srt'd  h-srt  rnd'm  h-rev    rev |  srt'd  h-srt  rnd'm  h-rev    rev
 10  |      9      9      8      8      7 |     10      9      9      8     10 |      5      5      3      3      4
 20  |     18     19     18     18     15 |     21     19     15     17     23 |     13     11     10     11     14
 40  |     43     43     41     40     37 |     51     47     33     38     55 |     35     31     21     25     36
100  |    130    126    122    119    115 |    196    171     94    106    202 |    156    133     65     74    158
200  |    291    282    277    264    256 |    626    521    202    231    633 |    581    484    149    169    602
400  |    640    624    610    587    572 |   2175   1750    432    492   2195 |   2042   1644    318    372   2056
```

(srt'd = sorted; h-srt = half sorted, half random; rnd'm = random; h-rev = half reverse sorted, half random; rev = reverse sorted)

* These results were produced by Sharon Doherty.

Simulation is a basic tool for analyzing the storage and timing requirements of algorithms, although in this chapter it has been used only to analyze sorts.
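A timing simulation like the one behind the tables above can be sketched in a few lines. The Python sketch below (the function names and the choice of insertion sort are illustrative assumptions, not the procedures of the text) generates the five input orderings used in the tables and times a sort on each; absolute times will of course differ from the reported results.

```python
import random
import time

def make_input(n, order, seed=0):
    """Generate a list of n keys in one of the five orderings
    used in the timing tables."""
    rng = random.Random(seed)
    keys = list(range(n))
    if order == "sorted":
        return keys
    if order == "half sorted":       # first half sorted, second half random
        return keys[:n // 2] + [rng.randrange(n) for _ in range(n - n // 2)]
    if order == "random":
        rng.shuffle(keys)
        return keys
    if order == "half reversed":     # first half reverse sorted, second half random
        head = list(range(n - 1, n - 1 - n // 2, -1))
        return head + [rng.randrange(n) for _ in range(n - n // 2)]
    if order == "reversed":
        return keys[::-1]
    raise ValueError(order)

def insertion_sort(a):
    """Straight insertion sort, used here as a stand-in for
    any of the sorts being timed."""
    for i in range(1, len(a)):
        key, j = a[i], i - 1
        while j >= 0 and a[j] > key:
            a[j + 1] = a[j]
            j -= 1
        a[j + 1] = key

if __name__ == "__main__":
    for n in (10, 20, 40, 100, 200, 400):
        row = []
        for order in ("sorted", "half sorted", "random",
                      "half reversed", "reversed"):
            data = make_input(n, order)
            start = time.perf_counter()
            insertion_sort(data)
            row.append(time.perf_counter() - start)
        print(n, ["%.6f" % t for t in row])
```

Running such a harness over each sort and each ordering, and averaging over several seeds, reproduces the shape (if not the units) of the tables.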

As searching as this chapter may have been, it is hoped that it didn't leave you out of sorts!

Exercises

b. For what values of n will the binary search be faster than the linear search of the improved procedure in the worst case?

a. Determine the worst-case times of the procedures given in the text for the maximum entry sort, the bubble sort, and the insertion sort.

b. For each value of n, determine which of the three sorts has the least worst-case time.

a. The bubble sort?

b. The insertion sort?

16, 18, 22, 15, 4, 50, 17, 31, 4, 90, 6, 25

Create the heap produced by

a. The first heap creation algorithm of Section 8.4.

b. The second heap creation algorithm of Section 8.4.

c. How many comparisons and interchanges were required in Exercise 11(a) and Exercise 11(b)?
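For checking hand counts such as these, a heap creation routine that tallies its operations is handy. The sketch below builds a max-heap by sifting down from the last internal node — one common heap-creation method, which may differ in detail (and therefore in counts) from the two algorithms of Section 8.4.

```python
def make_heap(a):
    """Build a max-heap in place by sifting down from the last
    internal node; return (comparisons, interchanges).
    This is one standard heap-creation method; the algorithms
    of Section 8.4 may count operations differently."""
    comparisons = interchanges = 0
    n = len(a)
    for start in range(n // 2 - 1, -1, -1):
        i = start
        while 2 * i + 1 < n:
            child = 2 * i + 1
            if child + 1 < n:                 # pick the larger child
                comparisons += 1
                if a[child + 1] > a[child]:
                    child += 1
            comparisons += 1                  # compare child with parent
            if a[child] > a[i]:
                a[i], a[child] = a[child], a[i]
                interchanges += 1
                i = child
            else:
                break
    return comparisons, interchanges

keys = [16, 18, 22, 15, 4, 50, 17, 31, 4, 90, 6, 25]
print(make_heap(keys))
print(keys)   # keys now satisfy the max-heap property
```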

a. The reheaped heap after the root element is removed using the first reheaping algorithm of Section 8.4.

b. The reheaped heap after the root element is removed using the second reheaping algorithm of Section 8.4.

c. How many comparisons and interchanges were required in Exercises 12(a) and 12(b)?

a. Convince yourself that this will generate a "true" random input.

b. Write a procedure to generate this "true" random input.