8.1: Overview

So far in this book, arrays, lists, binary trees, and trees?/FONT>the basic data sructures?/FONT>have been introduced and applied to a variety of problems. Selecting the appropriate data structure requires knowing what operations are to be performed on it. The frequency with which these operations are to be performed, the characteristics of the data to be stored, and the average and worst-case time and storage requirements all combine to determine this choice. Other considerations, such as clarity and adaptability, may also serve as criteria for deciding which data structures to use. It is rarely possible to pick a data structure that is best for all criteria. Instead, trade-offs must be made.

In the case study of family relationships in Section 6.5, one of the important operations was a search of the nametable data base. At that time no decisions were made as to how the nametable data base would be structured to facilitate the search. This chapter and the next will show that manyimplementations are possible and that the appropriate choice is determined by the kinds of operations to be performed on the data base. Nametable might be organized in many ways. It can be implemented as an array, a list, a binary search tree, a balanced binary search tree, or a hash table. The techniques used to create and search these structures are introduced in this and the next chapter.

Searching and sorting masses of data to retrieve specific information and organize it for manipulation and presentation are nothing new. These basic operations were performed and studied before the advent of computers.Volumes have been written on these topics, yet they are still being researched. Even the general public and the press are evidently interested in searching, as you can see from the following letter and response:

Dear Ann Landers:

Why is it that whenever I lose anything, it is always

in the last place I look?

?/FONT>Dizzy Lizzy

Dear Liz:

Because when you find it you stop looking?/FONT>and

that's the last place you looked.

Although searching and sorting may appear mundane, the speed with which they can be carried out largely determines whether a particular application of the computer is practical (and "cost-effective'") or not. Estimates indicate that about one-fourth of the processing time in computer centers is spent just sorting. The relation between sorting and searching will become apparent in this chapter. If one can be done faster, so can the other. The choice of data structures is always important in algorithm design and is the key to the evolution of good (that is, efficient) searching and sorting procedures. The basic data structures studied in the preceding chapters (arrays, lists, and binary trees) are used in the design of efficient procedures.

Normally, the objects stored, searched for, and sorted are records. In sorting, one field of the records, called the sort key, is chosen as the basis for the sort. Thus, in a payroll application, an employee name field might be the key for an alphabetical ordering, while a salary field might serve as the basis for a sort to determine an ordering by amount of earnings. At times the sort key can be a combination of fields. As an example, we may desire to sort with name as the primary key and zip code as the secondary key. For simplicity it will be assumed in this chapter that the objects being sorted are integer numbers. However, any variable type could have been assumed. For sorting purposes, the only requirement is that, given any two objects, it is possible to tell which precedes the other.

Searching normally involves comparing records to a given search key value. The goal is to determine whether there exists among the records of the data base a record with key equal to that of the search key. If the key value can be matched, then a pointer to the record with the matching value must be returned. The basic principle for shortening search times is to organize the data base so that the search can be focused quickly on the part of the data base that will contain the desired record, if it is present.

The chapter first explores simple linear, binary, and interpolation searches and a number of elementary sorting algorithms, such as the bubble sort and the insertion sort. These lead to more advanced methods for sorting based on a special binary tree called a heap. Next, an application of recursion yields another fast sort, quicksort.

Some of the algorithms developed in the text are fairly complex and difficult, if not impossible, to analyze mathematically. For this reason the last topic considered in the chapter is the use of simulation as a tool in discovering the behavior and efficiency of such complex algorithms.

The chapter concludes with a summary of the strengths and weaknesses of each approach introduced. Knuth [1973b] and Wirth [1976] discuss many searching and sorting algorithms that are not dealt with in this text. Knuth also contains mathematical analyses of many of these algorithms.