8.2.3 Binary Search

Certainly no one does a linear search to look for a phone number in the telephone directory. Imagine trying to do so in the directory for New York City or any other large city! But there are ways to carry out the search so that the name is found in a few seconds, even from among a few million names. The structure of the telephone directory is the clue to the solution, which takes advantage of the alphabetical ordering of the entries in the phone book.

Assume that the numbers stored in data are arranged in decreasing order, as in Figure 8.2. They are said to be sorted in decreasing order. Since a telephone book is sorted in alphabetical order, one of the things most people do by habit when using the telephone directory is to flip the directory open toward the front, middle, or back depending on the location in the alphabet of the name being sought. If the page opened to has names farther down in the alphabet, the search name cannot be on that page or anywhere in the directory to its right. It can only be to the left of the selected page. This process of elimination is behind the search technique called binary search. It is not difficult to see why this search technique is called binary. Every time a test is made in the search, there are two choices. In the telephone book example, search either half of the remaining pages for the name.

A binary search is also a good strategy for winning the game of twenty questions, which requires guessing the identity of an object after finding out if it is in the animal, the mineral, or the vegetable category. Twenty question with "yes" or "no" answers are allowed. Selecting each question so that about half the remaining possibilities are eliminated, no matter what the answer, is best. (For example, given the animal category, "Is the object male or female"?) In this way 220, or over a million, objects can be distinguished.

In twenty questions it is difficult to select a question that makes close to a 50-50 split of the remaining possibilities. However, if you are asked to determine a number known to lie between 1 and 1024 by asking yes-no questions, it is easy to do so. Ask, "Is it greater than 512?" and continue splitting the remaining possibilities in half. No more than 10 such questions are needed to find the number from among the 210 (1024) initial possibilities. Similarly, when searching for a specific key value among 2n records, at most n key comparisons are needed to determine whether it is present, as long as each comparison splits the remaining collection of records into two parts that differ in size by at most one.

Figure 8.2 Array of Integers Sorted in Decreasing Order

When searching data stored in arbitrary order in an array, there is no reason to expect the search key to be in any particular region of the array. However, when the data have been stored in sorted order, the situation is different. It is possible to check the middle element to see if the search key is there. If it is, the program would note its location and terminate the search. If the key is not there, then if the search key is greater than the key in the middle element, eliminate the middle element and all elements below it from consideration. If the desired record is in data at all, it must be above the middle element. Similarly, if the search key is less than the key in the middle element, eliminate the middle element and all elements above it. In any event, at most half the elements remain to be considered, and the same procedure can then be applied to the appropriate half: either elements 1 to [mid-1] or elements [mid+1] to n. Mid is a variable that points to the middle element.

For example, if mid is 10, then the search key value, 25, is not greater than data[mid], 78. Hence, if it is in data at all, it must be between positions mid+1 and n?/FONT>11 and 20. Taking 15 as the middle element of these, the new mid value is 15. Key is less than data [mid], 32. Hence we need to search between mid+1 and 20-16 and 20. With a new mid value of 18, key is greater than data [mid], 13. Now search only between 16 and 17. The next mid value is 16, and the key is found.

The basic structure of the binary search algorithm is a loop, as shown in the following function.

#define TRUE 1

#define FALSE 0

#define MAX 100

typedef struct

{

int key;

}record;

typedef record recorddarray[MAX];

binarysearch(data,n,key,pfound,ploc)

/* Searches the n records stored in order by

key field value(largest first) in the array

data for the search key. Found will be true

only if a record is present with key field

equal to the search key value and then loc

will contain its array index.

*/

recordarray data;

int n,key,*pfound,*ploc;

{

int top, mid,bottom;

>Comment

top = 1;

bottom = n;

*pfound = FALSE;

*ploc = 0;

>Comment

while ((top <= bottom) && !(*pfound))

{

>Comment

mid = (top + bottom)/2

>Comment

if (data[mid].key == key)

{

>Comment

*pfound = TRUE

>Comment

*ploc = mid;

}

>Comment

else if (data[mid].key < key)

bottom = mid - 1;

>Comment

else

top = mid + 1;

}

}

Mid is initialized to the middle element of data, top to 1, and bottom to n. Top and bottom are variables that point to the top and bottom elements of the consecutive elements left to be searched, all others having been eliminated from consideration. In the loop body, test for key equal to data [mid]. If the equality is true, the program exits the loop and the search is done, since key has been found at element mid. If not, top and bottom must be updated to reflect the new sequence of elements that are to be searched. This updating must set bottom to mid whenever key is greater than data [mid], and top to mid + 1 otherwise. The test for continuing the loop involves checking whether key was found, and whether any elements are left to be searched between top and bottom. The search is terminated as soon as the desired entry is found or as soon as it is concluded that the entry is not present. Loc will be zero if the key value was not found and will index its location in data if found.