In this chapter, we switch our attention from the *implementation* of algorithms to the *design* of algorithms. Most of the algorithms that we have seen so far are straightforward and simple. Chapter 9 contains some algorithms that are much more subtle, and some require an argument (in some cases lengthy) to show that they are indeed correct. In this chapter, we will focus on five of the common types of algorithms used to solve problems. For many problems, it is quite likely that at least one of these methods will work. Specifically, for each type of algorithm we will

1. See the general approach.

2. Look at several examples (the exercises at the end of the chapter provide many more examples).

3. Discuss, in general terms, the time and space complexity, where appropriate.

The first type of algorithm we will examine is the *greedy algorithm*. We have already seen three greedy algorithms in Chapter 9: Dijkstra's, Prim's, and Kruskal's algorithms. Greedy algorithms work in phases. In each phase, a decision is made that appears to be good, without regard for future consequences. Generally, this means that some *local optimum* is chosen. This "take what you can get now" strategy is the source of the name for this class of algorithms. When the algorithm terminates, we hope that the local optimum is equal to the *global optimum*. If this is the case, then the algorithm is correct; otherwise, the algorithm has produced a suboptimal solution. If the absolute best answer is not required, then simple greedy algorithms are sometimes used to generate approximate answers, rather than using the more complicated algorithms generally required to generate an exact answer.

There are several real-life examples of greedy algorithms. The most obvious is the coin-changing problem. To make change in U.S. currency, we repeatedly dispense the largest denomination. Thus, to give out seventeen dollars and sixty-one cents in change, we give out a ten-dollar bill, a five-dollar bill, two one-dollar bills, two quarters, one dime, and one penny. By doing this, we are guaranteed to minimize the number of bills and coins. This algorithm does not work in all monetary systems, but fortunately, we can prove that it does work in the American monetary system. Indeed, it works even if two-dollar bills and fifty-cent pieces are allowed.
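The strategy just described can be sketched in C (a sketch of our own, not code from this chapter; the denomination table, in cents, and the function name are assumptions):

```c
#include <assert.h>

/* Greedy change-making: repeatedly dispense the largest denomination
   that fits. Denominations are in cents, largest first (standard U.S.
   bills and coins; $2 bills and 50-cent pieces could be added without
   breaking optimality, as noted above). */
static const int denom[] = { 1000, 500, 100, 25, 10, 5, 1 };
#define NUM_DENOM ( (int) ( sizeof denom / sizeof denom[0] ) )

/* Returns the total number of bills and coins dispensed for amount
   (in cents); count[i] receives how many of denom[i] are used. */
int
make_change( int amount, int count[] )
{
    int i, total = 0;

    for( i = 0; i < NUM_DENOM; i++ )
    {
        count[i] = amount / denom[i];
        amount %= denom[i];
        total += count[i];
    }
    return total;
}
```

For $17.61 (1761 cents), this dispenses one ten, one five, two ones, two quarters, one dime, and one penny: eight pieces in all, matching the hand calculation above.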

We are given jobs *j*_{1}, *j*_{2}, . . . , *j*_{n}, all with known running times *t*_{1}, *t*_{2}, . . . , *t*_{n}, respectively. We have a single processor. What is the best way to schedule these jobs in order to minimize the average completion time? In this entire section, we will assume nonpreemptive scheduling: Once a job is started, it must run to completion.

As an example, suppose we have the four jobs and associated running times shown in Figure 10.1. One possible schedule is shown in Figure 10.2. Because *j*_{1} finishes in 15 (time units), *j*_{2} in 23, *j*_{3} in 26, and *j*_{4} in 36, the average completion time is 25. A better schedule, which yields a mean completion time of 17.75, is shown in Figure 10.3.

The schedule given in Figure 10.3 is arranged by shortest job first. We can show that this will always yield an optimal schedule. Let the jobs in the schedule be *j*_{i1}, *j*_{i2}, . . . , *j*_{in}. The first job finishes in time *t*_{i1}. The second job finishes after *t*_{i1} + *t*_{i2}, and the third job finishes after *t*_{i1} + *t*_{i2} + *t*_{i3}. From this, we see that the total cost, *C*, of the schedule is

C = Σ_{k=1}^{n} (n - k + 1)t_{ik} (10.1)

C = (n + 1)Σ_{k=1}^{n} t_{ik} - Σ_{k=1}^{n} k·t_{ik} (10.2)

Job     Time
------------
j_{1}     15
j_{2}      8
j_{3}      3
j_{4}     10

Notice that in Equation (10.2), the first sum is independent of the job ordering, so only the second sum affects the total cost. Suppose that in an ordering there exists some *x* > *y* such that *t*_{ix} < *t*_{iy}. Then a calculation shows that by swapping *j*_{ix} and *j*_{iy}, the second sum increases, decreasing the total cost. Thus, any schedule of jobs in which the running times are not monotonically nondecreasing must be suboptimal. The only schedules left are those in which the jobs are arranged by smallest running time first, breaking ties arbitrarily.

This result indicates the reason the operating system scheduler generally gives precedence to shorter jobs.
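As a check on the example above, the following sketch (our own code, with assumed names) computes the average completion time of a schedule and applies the shortest-job-first rule by sorting:

```c
#include <assert.h>
#include <stdlib.h>

/* Average completion time of jobs run in the given order on one
   processor: job i completes at t[0] + ... + t[i]. */
double
average_completion( const int t[], int n )
{
    long finish = 0, total = 0;
    int i;

    for( i = 0; i < n; i++ )
    {
        finish += t[i];        /* completion time of job i */
        total += finish;
    }
    return (double) total / n;
}

static int
cmp_time( const void *a, const void *b )
{
    return *(const int *) a - *(const int *) b;
}

/* Shortest job first: sort the running times, then schedule in order. */
double
sjf_average( int t[], int n )
{
    qsort( t, n, sizeof( int ), cmp_time );
    return average_completion( t, n );
}
```

With the running times of Figure 10.1, {15, 8, 3, 10}, the as-given schedule averages 25, while shortest job first averages 17.75, matching Figures 10.2 and 10.3.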

We can extend this problem to the case of several processors. Again we have jobs *j*_{1}, *j*_{2}, . . . , *j*_{n}, with associated running times *t*_{1}, *t*_{2}, . . . , *t*_{n}, and a number *P* of processors. We will assume, without loss of generality, that the jobs are ordered, shortest running time first. As an example, suppose *P* = 3, and the jobs are as shown in Figure 10.4.

Figure 10.5 shows an optimal arrangement to minimize mean completion time. Jobs *j*_{1}, *j*_{4}, and *j*_{7} are run on Processor 1. Processor 2 handles *j*_{2}, *j*_{5}, and *j*_{8}, and Processor 3 runs the remaining jobs. The total time to completion is 165, for an average of 165/9 = 18.33.

The algorithm to solve the multiprocessor case is to start jobs in order, cycling through processors. It is not hard to show that no other ordering can do better, although if the number of processors *P* evenly divides the number of jobs *n*, there are many optimal orderings. This is obtained by, for each 0 ≤ *i* < *n*/*P*, placing each of the jobs *j*_{iP+1} through *j*_{(i+1)P} on a different processor. In our case, Figure 10.6 shows a second optimal solution.
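The cycling rule can be sketched as follows (our own code; it assumes the jobs arrive already sorted by running time, shortest first):

```c
#include <assert.h>
#include <stdlib.h>

/* Schedule sorted jobs on p processors by cycling: job i runs on
   processor i % p, after the jobs already assigned there. Returns the
   sum of all completion times. */
long
total_completion( const int t[], int n, int p )
{
    long *finish = calloc( p, sizeof( long ) );  /* per-processor clock */
    long total = 0;
    int i;

    for( i = 0; i < n; i++ )
    {
        finish[i % p] += t[i];      /* job i ends here on its processor */
        total += finish[i % p];
    }
    free( finish );
    return total;
}
```

For the nine jobs of the table below on three processors, the total is 165, an average of about 18.33; on a single processor the same jobs total 381.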

Job     Time
------------
j_{1}      3
j_{2}      5
j_{3}      6
j_{4}     10
j_{5}     11
j_{6}     14
j_{7}     15
j_{8}     18
j_{9}     20

We close this section by considering a very similar problem. Suppose we are only concerned with when the last job finishes. In our two examples above, these completion times are 40 and 38, respectively. Figure 10.7 shows that the minimum final completion time is 34, and this clearly cannot be improved, because every processor is always busy.

Although this schedule does not have minimum mean completion time, it has merit in that the completion time of the entire sequence is earlier. If the same user owns all these jobs, then this is the preferable method of scheduling. Although these problems are very similar, this new problem turns out to be *NP*-complete; it is just another way of phrasing the knapsack or bin-packing problems, which we will encounter later in this section. Thus, minimizing the final completion time is apparently much harder than minimizing the mean completion time.

In this section, we consider a second application of greedy algorithms, known as *file compression*.

Suppose we have a file that contains only the characters *a*, *e*, *i*, *s*, *t*, plus blank spaces and *newlines*. Suppose further that the file has ten *a*'s, fifteen *e*'s, twelve *i*'s, three *s*'s, four *t*'s, thirteen blanks, and one *newline*. As the table in Figure 10.8 shows, this file requires 174 bits to represent, since there are 58 characters and each character requires three bits.

Character  Code  Frequency  Total Bits
--------------------------------------
a          000       10         30
e          001       15         45
i          010       12         36
s          011        3          9
t          100        4         12
space      101       13         39
newline    110        1          3
--------------------------------------
Total                           174

The binary code that represents the alphabet can be represented by the binary tree shown in Figure 10.9.

The tree in Figure 10.9 has data only at the leaves. The representation of each character can be found by starting at the root and recording the path, using a 0 to indicate the left branch and a 1 to indicate the right branch. For instance, *s* is reached by going left, then right, and finally right. This is encoded as 011. This data structure is sometimes referred to as a *trie*. If character *c*_{i} is at depth *d*_{i} and occurs *f*_{i} times, then the cost of the code is equal to Σ *d*_{i}*f*_{i}.

A better code than the one given in Figure 10.9 can be obtained by noticing that the *newline* is an only child. By placing the *newline* symbol one level higher at its parent, we obtain the new tree in Figure 10.10. This new tree has cost of 173, but is still far from optimal.

Notice that the tree in Figure 10.10 is a *full tree*: All nodes either are leaves or have two children. An optimal code will always have this property, since otherwise, as we have already seen, nodes with only one child could move up a level.

Putting these facts together, we see that our basic problem is to find the full binary tree of minimum total cost (as defined above), where all characters are contained in the leaves. The tree in Figure 10.11 shows the optimal tree for our sample alphabet. As can be seen in Figure 10.12, this code uses only 146 bits.

Character  Code   Frequency  Total Bits
---------------------------------------
a          001        10         30
e          01         15         30
i          10         12         24
s          00000       3         15
t          0001        4         16
space      11         13         26
newline    00001       1          5
---------------------------------------
Total                           146

Throughout this section we will assume that the number of characters is *C*. Huffman's algorithm can be described as follows: We maintain a forest of trees. The *weight* of a tree is equal to the sum of the frequencies of its leaves. *C* - 1 times, select the two trees, *T*_{1} and *T*_{2}, of smallest weight, breaking ties arbitrarily, and form a new tree with subtrees *T*_{1} and *T*_{2}. At the beginning of the algorithm, there are *C* single-node trees, one for each character. At the end of the algorithm there is one tree, and this is the optimal Huffman coding tree.

A worked example will make the operation of the algorithm clear. Figure 10.13 shows the initial forest; the weight of each tree is shown in small type at the root. The two trees of lowest weight are merged together, creating the forest shown in Figure 10.14. We will name the new root *T*1, so that future merges can be stated unambiguously. We have made *s* the left child arbitrarily; any tiebreaking procedure can be used. The total weight of the new tree is just the sum of the weights of the old trees, and can thus be easily computed. It is also a simple matter to create the new tree, since we merely need to get a new node, set the left and right pointers, and record the weight.

Now there are six trees, and we again select the two trees of smallest weight. These happen to be *T*1 and *t*, which are then merged into a new tree with root *T*2 and weight 8. This is shown in Figure 10.15. The third step merges *T*2 and *a*, creating *T*3, with weight 10 + 8 = 18. Figure 10.16 shows the result of this operation.

After the third *merge* is completed, the two trees of lowest weight are the single-node trees representing *i* and the blank space. Figure 10.17 shows how these trees are merged into the new tree with root *T*4. The fifth step is to merge the trees with roots *e* and *T*3, since these trees have the two smallest weights. The result of this step is shown in Figure 10.18.

Finally, the optimal tree, which was shown in Figure 10.11, is obtained by merging the two remaining trees. Figure 10.19 shows this optimal tree, with root *T*6.

If we maintain the trees in a priority queue, ordered by weight, then the running time is *O*(*C* log *C*), since there will be one *build_heap*, 2*C* - 2 *delete_mins*, and *C* - 2 *inserts*, on a priority queue that never has more than *C* elements. A simple implementation of the priority queue, using a linked list, would give an *O* (*C*^{2}) algorithm. The choice of priority queue implementation depends on how large *C* is. In the typical case of an ASCII character set, *C* is small enough that the quadratic running time is acceptable. In such an application, virtually all the running time will be spent on the disk I/O required to read the input file and write out the compressed version.
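The whole procedure can be sketched in C (our own simple *O*(*C*^{2}) forest implementation, as discussed above; a heap-based priority queue would bring this down to *O*(*C* log *C*)):

```c
#include <assert.h>
#include <stdlib.h>

struct tree_node
{
    long weight;
    struct tree_node *left;
    struct tree_node *right;    /* left == NULL means leaf */
};
typedef struct tree_node *tree;

static tree
make_node( long weight, tree left, tree right )
{
    tree t = malloc( sizeof( struct tree_node ) );
    t->weight = weight;
    t->left = left;
    t->right = right;
    return t;
}

/* Simple O(C^2) Huffman: C - 1 times, find the two lightest trees in
   the forest and merge them. Returns the root of the coding tree. */
tree
huffman( const long freq[], int c )
{
    tree *forest = malloc( c * sizeof( tree ) );
    tree root;
    int i, size = c;

    for( i = 0; i < c; i++ )
        forest[i] = make_node( freq[i], NULL, NULL );

    while( size > 1 )
    {
        int min1 = 0, min2 = 1, j;     /* indices of two lightest trees */

        if( forest[1]->weight < forest[0]->weight )
        {
            min1 = 1;
            min2 = 0;
        }
        for( j = 2; j < size; j++ )
            if( forest[j]->weight < forest[min1]->weight )
            {
                min2 = min1;
                min1 = j;
            }
            else if( forest[j]->weight < forest[min2]->weight )
                min2 = j;

        forest[min1] = make_node(
            forest[min1]->weight + forest[min2]->weight,
            forest[min1], forest[min2] );  /* merged tree replaces min1 */
        forest[min2] = forest[--size];     /* last tree fills min2's slot */
    }
    root = forest[0];
    free( forest );
    return root;
}

/* Cost of the code: sum over all leaves of depth times frequency;
   this is the number of bits needed for the whole file. */
long
cost( tree t, int depth )
{
    if( t->left == NULL )
        return (long) depth * t->weight;
    return cost( t->left, depth + 1 ) + cost( t->right, depth + 1 );
}
```

On the sample frequencies of this section, any Huffman tree has cost 146, the optimal bit count computed in the table above.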

There are two details that must be considered. First, the encoding information must be transmitted at the start of the compressed file, since otherwise it will be impossible to decode. There are several ways of doing this; see Exercise 10.4. For small files, the cost of transmitting this table will override any possible savings in compression, and the result will probably be file expansion. Of course, this can be detected and the original left intact. For large files, the size of the table is not significant.

The second problem is that as described, this is a two-pass algorithm. The first pass collects the frequency data and the second pass does the encoding. This is obviously not a desirable property for a program dealing with large files. Some alternatives are described in the references.

In this section, we will consider some algorithms to solve the *bin packing* problem. These algorithms will run quickly but will not necessarily produce optimal solutions. We will prove, however, that the solutions that are produced are not too far from optimal.

We are given *n* items of sizes *s*_{1}, *s*_{2}, . . . , *s*_{n}. All sizes satisfy 0 < *s*_{i} ≤ 1. The problem is to pack these items in the fewest number of bins, given that each bin has unit capacity. As an example, Figure 10.20 shows an optimal packing for an item list with sizes 0.2, 0.5, 0.4, 0.7, 0.1, 0.3, 0.8.

There are two versions of the bin packing problem. The first version is *on-line* bin packing. In this version, each item must be placed in a bin before the next item can be processed. The second version is the *off-line* bin packing problem. In an off-line algorithm, we do not need to do anything until all the input has been read. The distinction between on-line and off-line algorithms was discussed in Section 8.2.

To show that an on-line algorithm cannot always give an optimal solution, we will give it particularly difficult data to work on. Consider an input sequence *I*_{1} of *m* small items of weight 1/2 - ε followed by *m* large items of weight 1/2 + ε, 0 < ε < 0.01. It is clear that these items can be packed in *m* bins if we place one small item and one large item in each bin. Suppose there were an optimal on-line algorithm *A* that could perform this packing. Consider the operation of algorithm *A* on the sequence *I*_{2}, consisting of only *m* small items of weight 1/2 - ε. *I*_{2} can be packed in ⌈*m*/2⌉ bins. However, *A* will place each item in a separate bin, since *A* must yield the same results on *I*_{2} as it does for the first half of *I*_{1}, and the first half of *I*_{1} is exactly the same input as *I*_{2}. This means that *A* will use twice as many bins as is optimal for *I*_{2}. What we have proven is that there is no optimal algorithm for on-line bin packing.

Probably the simplest algorithm is *next fit.* When processing any item, we check to see whether it fits in the same bin as the last item. If it does, it is placed there; otherwise, a new bin is created. This algorithm is incredibly simple to implement and runs in linear time. Figure 10.21 shows the packing produced for the same input as Figure 10.20.
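Next fit is short enough to give in full (a sketch of our own; names are assumptions):

```c
#include <assert.h>

/* Next fit: one linear pass; open a new bin whenever the current item
   does not fit with the previous item's bin. Returns the bin count. */
int
next_fit( const double size[], int n )
{
    int bins = 0, i;
    double remaining = 0.0;     /* space left in the current bin */

    for( i = 0; i < n; i++ )
        if( size[i] <= remaining )
            remaining -= size[i];          /* fits in the current bin */
        else
        {
            bins++;                        /* start a new bin */
            remaining = 1.0 - size[i];
        }
    return bins;
}
```

On the sample input 0.2, 0.5, 0.4, 0.7, 0.1, 0.3, 0.8 it uses five bins, versus the optimal three of Figure 10.20.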

Not only is next fit simple to program, its worst-case behavior is also easy to analyze.

To see that this bound is tight, suppose that the *n* items have size *s*_{i} = 0.5 if *i* is odd and *s*_{i} = 2/*n* if *i* is even. Assume *n* is divisible by 4. The optimal packing consists of *n*/4 bins, each containing two items of size 0.5, plus one bin containing all *n*/2 items of size 2/*n*, for a total of (*n*/4) + 1 bins. Next fit, however, pairs each item of size 0.5 with the tiny item that follows it, using *n*/2 bins, which is nearly twice the optimal number.

Although next fit has a reasonable performance guarantee, it performs poorly in practice, because it creates new bins when it does not need to. In the sample run, it could have placed the item of size 0.3 in either B_{1} or B_{2}, rather than create a new bin.

The *first fit* strategy is to scan the bins in order and place the new item in the first bin that is large enough to hold it. Thus, a new bin is created only when the results of previous placements have left no other alternative. Figure 10.24 shows the packing that results from first fit on our standard input.
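A naive quadratic first fit looks like this (our own sketch; the *O*(*n* log *n*) version requires a search-tree structure over bin capacities, and the fixed bin limit here is an assumption for simplicity):

```c
#include <assert.h>

#define MAX_BINS 1024   /* assumption: never more than this many bins */

/* First fit, O(n^2) version: scan existing bins in order and place
   the item in the first bin that can hold it. Returns the bin count. */
int
first_fit( const double size[], int n )
{
    double remaining[MAX_BINS];     /* space left in each open bin */
    int bins = 0, i, b;

    for( i = 0; i < n; i++ )
    {
        for( b = 0; b < bins; b++ )
            if( size[i] <= remaining[b] )
                break;                     /* first bin that fits */
        if( b == bins )
            remaining[bins++] = 1.0;       /* no existing bin fits */
        remaining[b] -= size[i];
    }
    return bins;
}
```

On the standard sample input, first fit uses four bins: one fewer than next fit, one more than optimal.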

A simple method of implementing first fit would process each item by scanning down the list of bins sequentially. This would take *O*(*n*^{2}). It is possible to implement first fit to run in *O*(*n* log *n*); we leave this as an exercise.

A moment's thought will convince you that at any point, at most one bin can be more than half empty, since if a second bin were also half empty, its contents would fit into the first bin. Thus, we can immediately conclude that first fit guarantees a solution with at most twice the optimal number of bins.

The bound of two can be sharpened: It is known that first fit never uses more than roughly 17/10 times the optimal number of bins, and that this ratio is essentially tight. The proof is complicated; see the references at the end of the chapter.

An example where first fit does almost as poorly as the previous theorem would indicate is shown in Figure 10.25. The input consists of 6*m* items of size 1/7 + ε, followed by 6*m* items of size 1/3 + ε, followed by 6*m* items of size 1/2 + ε. One simple packing places one item of each size in a bin and requires 6*m* bins. First fit requires 10*m* bins.

When first fit is run on a large number of items with sizes uniformly distributed between 0 and 1, empirical results show that first fit uses roughly 2 percent more bins than optimal. In many cases, this is quite acceptable.


Another common technique used to design algorithms is *divide and conquer*. Divide and conquer algorithms consist of two parts:

*Divide:* Smaller problems are solved recursively (except, of course, base cases).

*Conquer:* The solution to the original problem is then formed from the solutions to the subproblems.

We have already seen several divide and conquer algorithms. In Section 2.4.3, we saw an *O *(*n* log *n*) solution to the maximum subsequence sum problem. In Chapter 4, we saw linear-time tree traversal strategies. In Chapter 7, we saw the classic examples of divide and conquer, namely mergesort and quicksort, which have *O *(*n* log *n*) worst-case and average-case bounds, respectively.

We have also seen several examples of recursive algorithms that probably do not classify as divide and conquer, but merely reduce to a single simpler case. In Section 1.3, we saw a simple routine to print a number. In Chapter 2, we used recursion to perform efficient exponentiation. In Chapter 4, we examined simple search routines for binary search trees. In Section 6.6, we saw simple recursion used to merge leftist heaps. In Section 7.7, an algorithm was given for selection that takes linear average time. The disjoint set *find* operation was written recursively in Chapter 8. Chapter 9 showed routines to recover the shortest path in Dijkstra's algorithm and other procedures to perform depth-first search in graphs. None of these algorithms are really divide and conquer algorithms, because only one recursive call is performed.

We have also seen, in Section 2.4, a very bad recursive routine to compute the Fibonacci numbers. This could be called a divide and conquer algorithm, but it is terribly inefficient, because the problem really is not divided at all.

10.2.1. Running Time of Divide and Conquer Algorithms

All the efficient divide and conquer algorithms we will see divide the problems into subproblems, each of which is some fraction of the original problem, and then perform some additional work to compute the final answer. As an example, we have seen that mergesort operates on two problems, each of which is half the size of the original, and then uses *O*(*n*) additional work. This yields the running time equation (with appropriate initial conditions)

T(n) = 2T(n/2) +O(n)

We saw in Chapter 7 that the solution to this equation is *O*(*n* log *n*). The following theorem can be used to determine the running time of most divide and conquer algorithms.

*The solution to the equation T*(*n*) = *aT*(*n*/*b*) + Θ(*n*^{k}), *where a* ≥ 1 *and b* > 1, *is*

T(n) = O(n^{log_b a}) if a > b^{k}

T(n) = O(n^{k} log n) if a = b^{k}

T(n) = O(n^{k}) if a < b^{k}

Following the analysis of mergesort in Chapter 7, we will assume that *n* is a power of *b*; thus, let *n* = *b*^{m}. Then *n*/*b* = *b*^{m-1} and *n*^{k} = (*b*^{m})^{k} = (*b*^{k})^{m}. Assuming *T*(1) = 1 and ignoring the constant factor in Θ(*n*^{k}), we have

T(b^{m}) = aT(b^{m-1}) + (b^{k})^{m}

If we divide through by *a*^{m}, we obtain the equation

T(b^{m})/a^{m} = T(b^{m-1})/a^{m-1} + (b^{k}/a)^{m} (10.3)

We can apply this equation for other values of *m*, obtaining

T(b^{m-1})/a^{m-1} = T(b^{m-2})/a^{m-2} + (b^{k}/a)^{m-1} (10.4)

T(b^{m-2})/a^{m-2} = T(b^{m-3})/a^{m-3} + (b^{k}/a)^{m-2} (10.5)

. . .

T(b^{1})/a^{1} = T(b^{0})/a^{0} + (b^{k}/a)^{1} (10.6)

We use our standard trick of adding up the telescoping equations (10.3) through (10.6). Virtually all the terms on the left cancel the leading terms on the right, yielding

T(b^{m})/a^{m} = 1 + Σ_{i=1}^{m} (b^{k}/a)^{i} (10.7)

= Σ_{i=0}^{m} (b^{k}/a)^{i} (10.8)

Thus

T(n) = T(b^{m}) = a^{m} Σ_{i=0}^{m} (b^{k}/a)^{i} (10.9)

If *a* > *b*^{k}, then the sum is a geometric series with ratio smaller than 1. Since the sum of an infinite series would converge to a constant, this finite sum is also bounded by a constant, and thus Equation (10.10) applies:

T(n) = O(a^{m}) = O(a^{log_b n}) = O(n^{log_b a}) (10.10)

If *a* = *b*^{k}, then each term in the sum is 1. Since the sum contains 1 + log_b *n* terms, and *a* = *b*^{k} implies that log_b *a* = *k*,

T(n) = O(a^{m} log_b n) = O(n^{log_b a} log_b n) = O(n^{k} log_b n) = O(n^{k} log n) (10.11)

Finally, if *a* < *b*^{k}, then the terms in the geometric series are larger than 1, and the second formula in Section 1.2.3 applies. We obtain

T(n) = a^{m} [(b^{k}/a)^{m+1} - 1] / [(b^{k}/a) - 1] = O(a^{m}(b^{k}/a)^{m}) = O((b^{k})^{m}) = O(n^{k}) (10.12)

proving the last case of the theorem.

There are two important cases that are not covered by Theorem 10.6. We state two more theorems, leaving the proofs as exercises. Theorem 10.7 generalizes the previous theorem.

*The solution to the equation T*(*n*) = *aT*(*n*/*b*) + Θ(*n*^{k} log^{p} *n*), *where a* ≥ 1, *b* > 1, *and p* ≥ 0, *is*

T(n) = O(n^{log_b a}) if a > b^{k}

T(n) = O(n^{k} log^{p+1} n) if a = b^{k}

T(n) = O(n^{k} log^{p} n) if a < b^{k}

*If* Σ_{i=1}^{k} α_{i} < 1, *then the solution to the equation T*(*n*) = Σ_{i=1}^{k} *T*(α_{i}*n*) + *O*(*n*) *is T*(*n*) = *O*(*n*).

The input to our first problem is a list *P* of points in a plane. If *p*_{1} = (*x*_{1}, *y*_{1}) and *p*_{2} = (*x*_{2}, *y*_{2}), then the Euclidean distance between *p*_{1} and *p*_{2} is [(*x*_{1} - *x*_{2})^{2} + (*y*_{1} - *y*_{2})^{2}]^{1/2}. We are required to find the closest pair of points. It is possible that two points have the same position; in that case, that pair is the closest, with distance zero.

Figure 10.29 shows a small sample point set *P*. Since the points are sorted by *x* coordinate, we can draw an imaginary vertical line that partitions the point set into two halves, *P*_{l} and *P*_{r}. Now we have almost exactly the same situation as in the maximum subsequence sum problem in Section 2.4.3. Either the closest points are both in *P*_{l}, or they are both in *P*_{r}, or one is in *P*_{l} and the other is in *P*_{r}. Let us call these three distances *d*_{l}, *d*_{r}, and *d*_{c}.

We can compute *d*_{l} and *d*_{r} recursively. The problem, then, is to compute *d*_{c}. Since we would like an *O*(*n* log *n*) solution, we must be able to compute *d*_{c} with only *O*(*n*) additional work: We have already seen that two half-sized recursive calls plus *O*(*n*) extra work yield an *O*(*n* log *n*) algorithm.

Let δ = min(*d*_{l}, *d*_{r}). The first observation is that we only need to compute *d*_{c} if *d*_{c} improves on δ. If *d*_{c} is such a distance, then the two points that define *d*_{c} must be within δ of the dividing line; we will refer to this area as a *strip*. This observation greatly limits the number of points that need to be considered.

There are two strategies that can be tried to compute *d*_{c}. For large point sets that are uniformly distributed, the number of points that are expected to be in the strip is very small. Indeed, it is easy to argue that only *O*(√*n*) points are in the strip on average. Thus, we could perform a brute-force calculation on these points in *O*(*n*) time, as in the following fragment (assuming the C convention that points are indexed starting at 0):

/* Points are all in the strip */

for( i=0; i<NUM_POINTS_IN_STRIP; i++ )
    for( j=i+1; j<NUM_POINTS_IN_STRIP; j++ )
        if( dist( p[i], p[j] ) < delta )
            delta = dist( p[i], p[j] );

/* Points are all in the strip and sorted by y coordinate */

for( i=0; i<NUM_POINTS_IN_STRIP; i++ )
    for( j=i+1; j<NUM_POINTS_IN_STRIP; j++ )
        if( p[i] and p[j]'s y coordinates differ by more than delta )
            break;              /* go to next p[i] */
        else
            if( dist( p[i], p[j] ) < delta )
                delta = dist( p[i], p[j] );

In the worst case, all the points could be in the strip, so this strategy does not always work in linear time. We can improve this algorithm with the following observation: The *y* coordinates of the two points that define *d*_{c} can differ by at most δ; otherwise, *d*_{c} > δ. Suppose that the points in the strip are sorted by their *y* coordinates. Then, once *p*_{i} and *p*_{j}'s *y* coordinates differ by more than δ, we can proceed to *p*_{i+1}. This is the test in the second fragment above.

This extra test has a significant effect on the running time, because for each *p*_{i} only a few points *p*_{j} are examined before the *y* coordinates differ by more than δ and force an exit from the inner for loop.

In the worst case, for any point *p*_{i}, at most 7 points *p*_{j} are considered. This is because these points must lie either in the δ by δ square in the left half of the strip or in the δ by δ square in the right half of the strip. On the other hand, all points in each δ by δ square are separated by at least δ, so each square contains at most four points, one near each corner. One of these points is *p*_{i}, leaving at most seven other points to consider. For the analysis, it is only important that the number of points in the δ by 2δ rectangle is *O*(1).

Because at most seven points are considered for each *p*_{i}, the time to compute a *d*_{c} that is better than δ is *O*(*n*). Thus, we appear to have an *O*(*n* log *n*) solution to the closest-points problem, based on the two half-sized recursive calls plus the linear extra work to combine the results. However, we do not quite have an *O*(*n* log *n*) solution yet.

The problem is that we have assumed that a list of points sorted by *y* coordinate is available. If we perform this sort for each recursive call, then we have *O*(*n* log *n*) extra work: this gives an *O*(*n* log^{2} *n*) algorithm. This is not all that bad, especially when compared to the brute force *O*(*n*^{2}). However, it is not hard to reduce the work for each recursive call to *O*(*n*), thus ensuring an *O*(*n* log *n*) algorithm.

We will maintain two lists. One is the point list sorted by *x* coordinate, and the other is the point list sorted by *y* coordinate. We will call these lists *P* and *Q*, respectively. These can be obtained by a preprocessing sorting step at cost *O*(*n* log *n*), which thus does not affect the time bound. *P*_{l} and *Q*_{l} are the lists passed to the left-half recursive call, and *P*_{r} and *Q*_{r} are the lists passed to the right-half recursive call. We have already seen that *P* is easily split in the middle. Once the dividing line is known, we step through *Q* sequentially, placing each element in *Q*_{l} or *Q*_{r} as appropriate. It is easy to see that *Q*_{l} and *Q*_{r} will be automatically sorted by *y* coordinate. When the recursive calls return, we scan through the *Q* list and discard all the points whose *x* coordinates are not within the strip. Then *Q* contains only points in the strip, and these points are guaranteed to be sorted by their *y* coordinates.

This strategy ensures that the entire algorithm is *O *(*n *log* n*), because only *O *(*n*) extra work is performed.
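The linear-time splitting of *Q* described above can be sketched as follows (our own code; the point type and names are assumptions):

```c
#include <assert.h>

struct point
{
    double x, y;
};

/* Split Q (sorted by y) into Q_l and Q_r, preserving y order, by
   comparing each point's x coordinate against the dividing line.
   One sequential pass: O(n). */
void
split_by_x( const struct point Q[], int n, double x_mid,
            struct point Q_l[], int *nl, struct point Q_r[], int *nr )
{
    int i;

    *nl = *nr = 0;
    for( i = 0; i < n; i++ )
        if( Q[i].x < x_mid )
            Q_l[(*nl)++] = Q[i];    /* goes to the left-half call */
        else
            Q_r[(*nr)++] = Q[i];    /* goes to the right-half call */
}
```

Because each output list is filled in the order the points are encountered, both halves remain sorted by *y* coordinate, which is exactly the property the strip scan needs.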

The *selection problem* requires us to find the *k*th smallest element in a list *S* of *n* elements. Of particular interest is the special case of finding the median. This occurs when *k* = ⌈*n*/2⌉.

In Chapters 1, 6, and 7, we have seen several solutions to the selection problem. The solution in Chapter 7 uses a variation of quicksort and runs in *O*(*n*) average time. Indeed, it is described in Hoare's original paper on quicksort.

Although this algorithm runs in linear average time, it has a worst case of *O *(*n*^{2}). Selection can easily be solved in *O*(*n *log* n*) worst-case time by sorting the elements, but for a long time it was unknown whether or not selection could be accomplished in *O*(*n*) worst-case time. The *quickselect* algorithm outlined in Section 7.7.6 is quite efficient in practice, so this was mostly a question of theoretical interest.

Recall that the basic algorithm is a simple recursive strategy. Assuming that *n* is larger than the cutoff point where elements are simply sorted, an element *v*, known as the pivot, is chosen. The remaining elements are placed into two sets, *S*_{1} and *S*_{2}. *S*_{1} contains elements that are guaranteed to be no larger than *v*, and *S*_{2} contains elements that are no smaller than *v*. Finally, if *k* ≤ |*S*_{1}|, then the *k*th smallest element in *S* can be found by recursively computing the *k*th smallest element in *S*_{1}. If *k* = |*S*_{1}| + 1, then the pivot is the *k*th smallest element. Otherwise, the *k*th smallest element in *S* is the (*k* - |*S*_{1}| - 1)st smallest element in *S*_{2}. The main difference between this algorithm and quicksort is that there is only one subproblem to solve instead of two.
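The recursive strategy just described can be sketched as follows (our own simplified version, using extra arrays for *S*_{1} and *S*_{2} rather than in-place partitioning, and an arbitrary pivot choice):

```c
#include <assert.h>
#include <stdlib.h>

/* Basic recursive quickselect: returns the kth smallest (k is
   1-based) of a[0..n-1]. The input array is not modified. */
int
quick_select( const int a[], int n, int k )
{
    int v, i, n1 = 0, n2 = 0, result;
    int *s1, *s2;

    if( n == 1 )
        return a[0];

    v = a[n / 2];                   /* pivot choice is arbitrary here */
    s1 = malloc( n * sizeof( int ) );
    s2 = malloc( n * sizeof( int ) );
    for( i = 0; i < n; i++ )
    {
        if( i == n / 2 )
            continue;               /* skip the pivot itself */
        if( a[i] < v )
            s1[n1++] = a[i];        /* no larger than v */
        else
            s2[n2++] = a[i];        /* no smaller than v */
    }

    if( k <= n1 )
        result = quick_select( s1, n1, k );
    else if( k == n1 + 1 )
        result = v;                 /* the pivot is the answer */
    else
        result = quick_select( s2, n2, k - n1 - 1 );

    free( s1 );
    free( s2 );
    return result;
}
```

Only one of the two sets is ever recursed on, which is the difference from quicksort noted above.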

In order to obtain a linear algorithm, we must ensure that the subproblem is only a fraction of the original and not merely only a few elements smaller than the original. Of course, we can always find such an element if we are willing to spend some time to do so. The difficult problem is that we cannot spend too much time finding the pivot.

For quicksort, we saw that a good choice for pivot was to pick three elements and use their median. This gives some expectation that the pivot is not too bad, but does not provide a guarantee. We could choose 21 elements at random, sort them in constant time, use the 11th largest as pivot, and get a pivot that is even more likely to be good. However, if these 21 elements were the 21 largest, then the pivot would still be poor. Extending this, we could use up to *O *(*n */ log* n*) elements, sort them using heapsort in *O*(*n*) total time, and be almost certain, from a statistical point of view, of obtaining a good pivot. In the worst case, however, this does not work because we might select the *O *(*n */ log *n*) largest elements, and then the pivot would be the [*n - O*(*n */ log* n*)]th largest element, which is not a constant fraction of *n*.

The basic idea is still useful. Indeed, we will see that we can use it to improve the expected number of comparisons that quickselect makes. To get a good worst case, however, the key idea is to use one more level of indirection. Instead of finding the median from a sample of random elements, we will find the median from a *sample of medians*.

The basic pivot selection algorithm is as follows:

1. Arrange the *n* elements into *n*/5 groups of 5 elements, ignoring the (at most four) extra elements.

2. Find the median of each group. This gives a list *M* of *n*/5 medians.

3. Find the median of *M*. Return this as the pivot, *v*.

We will use the term *median-of-median-of-five partitioning* to describe the quickselect algorithm that uses the pivot selection rule given above. We will now show that median-of-median-of-five partitioning guarantees that each recursive subproblem is at most roughly 70 percent as large as the original. We will also show that the pivot can be computed quickly enough to guarantee an *O *(*n*) running time for the entire selection algorithm.
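The pivot selection rule can be sketched as follows (our own code; for brevity, the median of *M* is found by sorting rather than by the recursive selection call a real implementation would use, and each group of five is sorted in place):

```c
#include <assert.h>
#include <stdlib.h>

static int
cmp_int( const void *a, const void *b )
{
    return *(const int *) a - *(const int *) b;
}

/* Median-of-median-of-five pivot selection: sort each group of 5
   (constant work per group), collect the group medians in M, and
   return the median of M. Rearranges a[]. */
int
pivot_mm5( int a[], int n )
{
    int groups = n / 5, g, v;
    int *m;

    if( n < 5 )                     /* too few for grouping */
    {
        qsort( a, n, sizeof( int ), cmp_int );
        return a[n / 2];
    }

    m = malloc( groups * sizeof( int ) );
    for( g = 0; g < groups; g++ )
    {
        qsort( a + 5 * g, 5, sizeof( int ), cmp_int );
        m[g] = a[5 * g + 2];        /* median of this group of five */
    }
    qsort( m, groups, sizeof( int ), cmp_int );
    v = m[groups / 2];              /* median of the medians */
    free( m );
    return v;
}
```

With *n* = 45 there are nine groups, and the pivot is the fifth smallest of the nine group medians, matching the situation depicted in Figure 10.36.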

Let us assume for the moment that *n* is divisible by 5, so there are no extra elements. Suppose also that *n*/5 is odd, so that the set *M* contains an odd number of elements. This provides some symmetry, as we shall see. We are thus assuming, for convenience, that *n* is of the form 10*k* + 5. We will also assume that all the elements are distinct. The actual algorithm must make sure to handle the case where this is not true. Figure 10.36 shows how the pivot might be chosen when *n* = 45.

In Figure 10.36, *v* represents the element which is selected by the algorithm as pivot. Since *v* is the median of nine elements, and we are assuming that all elements are distinct, there must be four medians that are larger than *v* and four that are smaller. We denote these by *L* and *S*, respectively. Consider a group of five elements with a large median (type *L*). The median of the group is smaller than two elements in the group and larger than two elements in the group. We will let *H* represent the *huge* elements. These are elements that are known to be larger than a large median. Similarly, *T* represents the *tiny* elements, which are smaller than a small median. There are 10 elements of type *H*: Two are in each of the groups with an *L *type median, and two elements are in the same group as *v*. Similarly, there are 10 elements of type *T*.

Elements of type *L* or *H* are guaranteed to be larger than *v*, and elements of type *S* or *T* are guaranteed to be smaller than *v*. There are thus guaranteed to be 14 large and 14 small elements in our problem. Therefore, a recursive call could be on at most 45 - 14 - 1 = 30 elements.

*The running time of quickselect using median-of-median-of-five partitioning is *O(*n*)*.*

The algorithm consists of two recursive calls of size 0.7*n* and 0.2*n*, plus linear extra work. By Theorem 10.8, the running time is linear.

Reducing the Average Number of Comparisons

Divide and conquer can also be used to reduce the expected number of comparisons required by the selection algorithm. Let us look at a concrete example. Suppose we have a group *S* of 1,000 numbers and are looking for the 100th smallest number, which we will call *x*. We choose a subset *S*' of *S* consisting of 100 numbers. We would expect that the value of *x* is similar in size to the 10th smallest number in *S*'. More specifically, the fifth smallest number in *S*' is almost certainly less than *x*, and the 15th smallest number in *S*' is almost certainly greater than *x*.

More generally, a sample *S*' of *s* elements is chosen from the *n* elements. Let δ be some number, which we will choose later so as to minimize the average number of comparisons used by the procedure. We find the (*v*_{1} = *ks*/*n* - δ)th and (*v*_{2} = *ks*/*n* + δ)th smallest elements in *S*'. Almost certainly, the *k*th smallest element in *S* will fall between *v*_{1} and *v*_{2}, so we are left with a selection problem on 2δ elements. With low probability, the *k*th smallest element does not fall in this range, and we have considerable work to do. However, with a good choice of *s* and δ, we can ensure, by the laws of probability, that the second case does not adversely affect the total work.

If an analysis is performed, we find that if *s* = *n*^{2/3} log^{1/3} *n* and δ = *n*^{1/3} log^{2/3} *n*, then the expected number of comparisons is *n* + *k* + *O*(*n*^{2/3} log^{1/3} *n*), which is optimal except for the low-order term. (If *k* > *n*/2, we can consider the symmetric problem of finding the (*n* - *k*)th largest element.)

Most of the analysis is easy to do. The last term represents the cost of performing the two selections to determine *v*_{1} and *v*_{2}. The average cost of the partitioning, assuming a reasonably clever strategy, is equal to *n* plus the expected rank of *v*_{2} in *S*, which is *n* + *k* + *O*(*n*δ/*s*).

This analysis shows that finding the median requires about 1.5*n* comparisons on average. Of course, this algorithm requires some floating-point arithmetic to compute *s*, which can slow down the algorithm on some machines. Even so, experiments have shown that if correctly implemented, this algorithm compares favorably with the quickselect implementation in Chapter 7.

In this section we describe a divide and conquer algorithm that multiplies two *n*-digit numbers. Our previous model of computation assumed that multiplication was done in constant time, because the numbers were small. For large numbers, this assumption is no longer valid. If we measure multiplication in terms of the size of numbers being multiplied, then the natural multiplication algorithm takes quadratic time. The divide and conquer algorithm runs in subquadratic time. We also present the classic divide and conquer algorithm that multiplies two *n* by *n *matrices in subcubic time.

xy = x_{l}y_{l}10^{8} + (x_{l}y_{r} + x_{r}y_{l})10^{4} + x_{r}y_{r}

T(n) = 4T(n/2) +O(n)

From Theorem 10.6, we see that T(*n*) = *O*(*n*^{2}), so, unfortunately, we have not improved the algorithm. To achieve a subquadratic algorithm, we must use less than four recursive calls. The key observation is that

x_{l}y_{r} + x_{r}y_{l} = (x_{l} - x_{r})(y_{r} - y_{l}) + x_{l}y_{l} + x_{r}y_{r}

Thus, instead of using two multiplications to compute the coefficient of 10^{4}, we can use one multiplication, plus the result of two multiplications that have already been performed. Figure 10.37 shows how only three recursive subproblems need to be solved.

It is easy to see that now the recurrence equation satisfies

T(n) = 3T(n/2) +O(n),

and so we obtain *T*(*n*) = *O*(*n*^{log_{2}3}) = *O*(*n*^{1.59}). To complete the algorithm, we must have a base case, which can be solved without recursion.

A fundamental numerical problem is the multiplication of two matrices. Figure 10.38 gives a simple *O*(*n*^{3}) algorithm to compute **C** = **AB**, where **A**, **B**, and **C** are *n* by *n* matrices. The algorithm follows directly from the definition of matrix multiplication. To compute *C*_{i,j}, we compute the dot product of the *i*th row in **A** with the *j*th column in **B**.

For a long time it was assumed that Ω(*n*^{3}) was required for matrix multiplication. However, in the late sixties Strassen showed how to break the Ω(*n*^{3}) barrier. The basic idea of Strassen's algorithm is to divide each matrix into four quadrants, as shown in Figure 10.39. Then it is easy to show that

C_{1,1}=A_{1,1}B_{1,1}+A_{1,2}B_{2,1}

C_{1,2}=A_{1,1}B_{1,2}+A_{1,2}B_{2,2}

C_{2,1}=A_{2,1}B_{1,1 }+A_{2,2}B_{2,1}

C_{2,2}=A_{2,1}B_{1,2}+A_{2,2}B_{2,2}

/* Standard matrix multiplication. Arrays start at 0 */

void
matrix_multiply( matrix A, matrix B, matrix C, unsigned int n )
{
    int i, j, k;

    for( i=0; i<n; i++ )    /* Initialization */
        for( j=0; j<n; j++ )
            C[i][j] = 0.0;

    for( i=0; i<n; i++ )
        for( j=0; j<n; j++ )
            for( k=0; k<n; k++ )
                C[i][j] += A[i][k] * B[k][j];
}

As an example, to perform the multiplication **AB**

we define the following eight *n*/2 by *n*/2 matrices:

T(n) = 8T(n/2) +O(n^{2}).

From Theorem 10.6, we see that *T*(*n*) = *O*(*n*^{3}), so we do not have an improvement. As we saw with integer multiplication, we must reduce the number of subproblems below 8. Strassen used a strategy similar to the integer multiplication divide and conquer algorithm and showed how to use only seven recursive calls by carefully arranging the computations. The seven multiplications are

M_{1}= (A_{1,2}-A_{2,2})(B_{2,1}+B_{2,2})

M_{2}= (A_{1,1}+A_{2,2})(B_{1,1}+B_{2,2})

M_{3}= (A_{1,1}-A_{2,1})(B_{1,1}+B_{1,2})

M_{4}= (A_{1,1}+A_{1,2})B_{2,2}

M_{5}=A_{1,1}(B_{1,2}-B_{2,2})

M_{6}=A_{2,2}(B_{2,1}-B_{1,1})

M_{7}= (A_{2,1}+A_{2,2})B_{1,1}

Once the multiplications are performed, the final answer can be obtained with eight more additions.

C_{1,1}=M_{1}+M_{2}-M_{4}+M_{6}

C_{1,2}=M_{4}+M_{5}

C_{2,1}=M_{6}+M_{7}

C_{2,2}=M_{2}-M_{3}+M_{5}-M_{7}

It is straightforward to verify that this tricky ordering produces the desired values. The running time now satisfies the recurrence

T(n) = 7T(n/2) +O(n^{2}).

The solution of this recurrence is *T*(*n*) = *O*(*n*^{log_{2}7}) = *O*(*n*^{2.81}).

As usual, there are details to consider, such as the case when *n* is not a power of two, but these are basically minor nuisances. Strassen's algorithm is worse than the straightforward algorithm until *n* is fairly large. It does not generalize for the case where the matrices are sparse (contain many zero entries), and it does not easily parallelize. When run with floating-point entries, it is less stable numerically than the classic algorithm. Thus, it has only limited applicability. Nevertheless, it represents an important theoretical milestone and certainly shows that in computer science, as in many other fields, even though a problem seems to have an intrinsic complexity, nothing is certain until proven.

In the previous section, we have seen that a problem that can be mathematically expressed recursively can also be expressed as a recursive algorithm, in many cases yielding a significant performance improvement over a more naïve exhaustive search.

In Chapter 2, we saw that the natural recursive program to compute the Fibonacci numbers is very inefficient. Recall that the program shown in Figure 10.40 has a running time *T*(*n*) that satisfies *T*(*n*) ≥ *T*(*n* - 1) + *T*(*n* - 2). Since *T*(*n*) satisfies the same recurrence relation as the Fibonacci numbers and has the same initial conditions, *T*(*n*) in fact grows at the same rate as the Fibonacci numbers, and is thus exponential.

On the other hand, since to compute *F*_{n}, all that is needed is *F*_{n-1} and *F*_{n-2}, we need to record only the two most recently computed Fibonacci numbers. This observation yields the simple *O*(*n*) algorithm whose code is shown below.

The reason that the recursive algorithm is so slow is that it performs redundant calculations. To compute *F*_{n}, there is one call to *F*_{n-1} and one call to *F*_{n-2}. However, since *F*_{n-1} recursively makes calls to *F*_{n-2} and *F*_{n-3}, there are two separate calls to compute *F*_{n-2}. If one traces out the entire algorithm, the number of duplicated calls grows explosively.

/* Compute Fibonacci numbers as described in Chapter 2 */

unsigned int
fib( unsigned int n )
{
    if( n <= 1 )
        return 1;
    else
        return( fib( n-1 ) + fib( n-2 ) );
}

unsigned int
fibonacci( unsigned int n )
{
    unsigned int i, last, next_to_last, answer;

    if( n <= 1 )
        return 1;

    last = next_to_last = 1;
    for( i = 2; i <= n; i++ )
    {
        answer = last + next_to_last;
        next_to_last = last;
        last = answer;
    }
    return answer;
}

double
eval( unsigned int n )
{
    int i;
    double sum;

    if( n == 0 )
        return 1.0;
    else
    {
        sum = 0.0;
        for( i=0; i<n; i++ )
            sum += eval(i);
        return( 2.0 * sum / n + n );
    }
}

As a second example, we saw in Chapter 7 how to solve the recurrence *C*(*n*) = (2/*n*)(*C*(0) + *C*(1) + · · · + *C*(*n* - 1)) + *n*, with *C*(0) = 1. Suppose that we want to check, numerically, whether the solution we obtained is correct. We could then write the simple program in Figure 10.43 to evaluate the recursion.

Once again, the recursive calls duplicate work. In this case, the running time *T*(*n*) satisfies *T*(*n*) = *T*(0) + *T*(1) + · · · + *T*(*n* - 1) + *O*(*n*), because, as shown in Figure 10.44, there is one (direct) recursive call of each size from 0 to *n* - 1, plus *O*(*n*) additional work (where else have we seen the tree shown in Figure 10.44?). Solving for *T*(*n*), we find that it grows exponentially. By using a table, we obtain the program in Figure 10.45. This program avoids the redundant recursive calls and runs in *O*(*n*^{2}). It is not a perfect program; as an exercise, you should make the simple change that reduces its running time to *O*(*n*).

double
eval( unsigned int n )
{
    int i, j;
    double sum, answer;
    double *c;

    c = (double *) malloc( sizeof (double) * (n+1) );
    if( c == NULL )
        fatal_error("Out of space!!!");

    c[0] = 1.0;
    for( i=1; i<=n; i++ )     /* Evaluate c[i], 1 <= i <= n */
    {
        sum = 0.0;
        for( j=0; j<i; j++ )  /* Sum c[j], 0 <= j <= i-1 */
            sum += c[j];
        c[i] = 2.0 * sum / i + i;
    }

    answer = c[n];
    free( c );
    return answer;
}

Suppose we are given four matrices, **A**, **B**, **C**, and **D**, of dimensions **A** = 50 × 10, **B** = 10 × 40, **C** = 40 × 30, and **D** = 30 × 5. Although matrix multiplication is not commutative, it is associative, which means that the matrix product **ABCD** can be parenthesized, and thus evaluated, in any order. The obvious way to multiply two matrices of dimensions *p* × *q* and *q* × *r*, respectively, uses *pqr* scalar multiplications. (Using a theoretically superior algorithm such as Strassen's algorithm does not significantly alter the problem we will consider, so we will assume this performance bound.) What is the best way to perform the three matrix multiplications required to compute **ABCD**?

The solution of this recurrence is the well-known Catalan numbers, which grow exponentially. Thus, for large *n*, an exhaustive search through all possible orderings is useless. Nevertheless, this counting argument provides a basis for a solution that is substantially better than exponential. Let *c*_{i} be the number of columns in matrix **A**_{i}, for 1 ≤ *i* ≤ *n*. Then **A**_{i} has *c*_{i-1} rows, since otherwise the multiplications are not valid; *c*_{0} is the number of rows in the first matrix.

If we want to print out the actual ordering of the multiplications in addition to the final answer *M*_{1,n}, then we can use the ideas from the shortest-path algorithms in Chapter 9. Whenever we make a change to *M*_{Left,Right}, we record the value of *i* that is responsible; this is the *last_change* array in the program below.

Although the emphasis of this chapter is not coding, it is worth noting that many programmers tend to shorten variable names to a single letter. *c, i*, and *k* are used as single-letter variables because this agrees with the names we have used in the description of the algorithm, which is very mathematical. However, it is generally best to avoid *l* as a variable name, because "l" looks too much like 1 and can make for very difficult debugging if you make a transcription error.

Returning to the algorithmic issues, this program contains a triply nested loop and is easily seen to run in *O(n ^{3}*) time. The references describe a faster algorithm, but since the time to perform the actual matrix multiplication is still likely to be much larger than the time to compute the optimal ordering, this algorithm is still quite practical.

/* Compute optimal ordering of matrix multiplication */
/* c contains number of columns for each of the n matrices */
/* c[0] is the number of rows in matrix 1 */
/* Minimum number of multiplications is left in M[1][n] */
/* Actual ordering can be computed via */
/* another procedure using last_change */
/* M and last_change are indexed starting at 1, instead of zero */

void
opt_matrix( int c[], unsigned int n, two_d_array M,
            two_d_array last_change )
{
    int i, k, Left, Right, this_M;

    for( Left = 1; Left <= n; Left++ )
        M[Left][Left] = 0;

    for( k = 1; k < n; k++ )          /* k is Right-Left */
        for( Left = 1; Left <= n-k; Left++ )
        {   /* for each position */
            Right = Left + k;
            M[Left][Right] = INT_MAX;
            for( i = Left; i < Right; i++ )
            {
                this_M = M[Left][i] + M[i+1][Right]
                         + c[Left-1] * c[i] * c[Right];
                if( this_M < M[Left][Right] )  /* Update min */
                {
                    M[Left][Right] = this_M;
                    last_change[Left][Right] = i;
                }
            }
        }
}

Our second dynamic programming example considers the following input: We are given a list of words, *w*_{1}, *w*_{2}, . . . , *w*_{n}, and fixed probabilities *p*_{1}, *p*_{2}, . . . , *p*_{n} of their occurrence. The problem is to arrange these words in a binary search tree in a way that minimizes the expected total access time. In a binary search tree, the number of comparisons needed to access an element at depth *d* is *d* + 1, so if *w*_{i} is placed at depth *d*_{i}, we want to minimize the sum of *p*_{i}(1 + *d*_{i}) over all words.

As an example, Figure 10.47 shows seven words along with their probability of occurrence in some context. Figure 10.48 shows three possible binary search trees. Their searching costs are shown in Figure 10.49.

The first tree was formed using a greedy strategy. The word with the highest probability of being accessed was placed at the root. The left and right subtrees were then formed recursively. The second tree is the perfectly balanced search tree. Neither of these trees is optimal, as demonstrated by the existence of the third tree. From this we can see that neither of the obvious solutions works.

This is initially surprising, since the problem appears to be very similar to the construction of a Huffman encoding tree, which, as we have already seen, can be solved by a greedy algorithm. Construction of an optimal binary search tree is harder, because the data is not constrained to appear only at the leaves, and also because the tree must satisfy the binary search tree property.

A dynamic programming solution follows from two observations. Once again, suppose we are trying to place the (sorted) words *w*_{Left}, *w*_{Left+1}, . . . , *w*_{Right} into a binary search tree. Suppose the optimal binary search tree has *w*_{i} as the root, where Left ≤ *i* ≤ Right. Then the left subtree must contain *w*_{Left}, . . . , *w*_{i-1}, and the right subtree must contain *w*_{i+1}, . . . , *w*_{Right} (by the binary search tree property). Further, both of these subtrees must themselves be optimal, since otherwise they could be replaced by optimal subtrees, giving a better solution overall. Thus, we can write a formula for the cost *C*_{Left,Right} of an optimal binary search tree.

If *Left* > *Right*, then the cost of the tree is 0; this is the *NULL* case, which we always have for binary search trees. Otherwise, the root costs *p*_{i}. The left subtree has a cost of *C*_{Left,i-1} relative to its root, and the right subtree has a cost of *C*_{i+1,Right} relative to its root. Since each node in these subtrees is one level deeper when measured from *w*_{i}, we must add the sum of the probabilities *p*_{Left} + · · · + *p*_{Right}. This gives the formula

C_{Left,Right} = min over Left ≤ i ≤ Right of { C_{Left,i-1} + C_{i+1,Right} + p_{Left} + p_{Left+1} + · · · + p_{Right} }

From this equation, it is straightforward to write a program to compute the cost of the optimal binary search tree. As usual, the actual search tree can be maintained by saving the value of *i* that minimizes *C _{Left,Right.}* The standard recursive routine can be used to print the actual tree.

Word   Probability
------------------
a         0.22
am        0.18
and       0.20
egg       0.05
if        0.25
the       0.02
two       0.08

-----------------------------------------------------------------
Input              Tree #1          Tree #2          Tree #3
Word      Prob.    Access  Cost     Access  Cost     Access  Cost
w_i       p_i      Once    Seq.     Once    Seq.     Once    Seq.
-----------------------------------------------------------------
a         0.22       2     0.44       3     0.66       2     0.44
am        0.18       4     0.72       2     0.36       3     0.54
and       0.20       3     0.60       3     0.60       1     0.20
egg       0.05       4     0.20       1     0.05       3     0.15
if        0.25       1     0.25       3     0.75       2     0.50
the       0.02       3     0.06       2     0.04       4     0.08
two       0.08       2     0.16       3     0.24       3     0.24
-----------------------------------------------------------------
Totals    1.00             2.43             2.70             2.15
-----------------------------------------------------------------

Figure 10.51 shows the table that will be produced by the algorithm. For each subrange of words, the cost and root of the optimal binary search tree are maintained. The bottommost entry, of course, computes the optimal binary search tree for the entire set of words in the input. The optimal tree is the third tree shown in Fig. 10.48.

The precise computation for the optimal binary search tree for a particular subrange, namely *am..if*, is shown in Figure 10.52. It is obtained by computing the minimum-cost tree obtained by placing *am, and, egg,* and *if* at the root. For instance, when *and* is placed at the root, the left subtree contains *am..am* (of cost 0.18, via previous calculation), the right subtree contains *egg..if* (of cost 0.35), and an additional 0.68 is added, the total probability of the words *am* through *if* (every access passes through the root), for a total cost of 1.21.

Our third and final dynamic programming application is an algorithm to compute shortest weighted paths between every pair of points in a directed graph *G* = (*V*, *E*). In Chapter 9, we saw an algorithm for the *single-source shortest-path *problem, which finds the shortest path from some arbitrary vertex *s* to all others. That algorithm (Dijkstra's) runs in *O*(*V*^{2}) time on dense graphs, but substantially faster on sparse graphs. We will give a short algorithm to solve the all-pairs problem for dense graphs. The running time of the algorithm is *O*(*V*^{3}), which is not an asymptotic improvement over *V* iterations of Dijkstra's algorithm but could be faster on a very dense graph, because its loops are tighter. The algorithm also performs correctly if there are negative edge costs, but no negative-cost cycles; Dijkstra's algorithm fails in this case.

Let us recall the important details of Dijkstra's algorithm (the reader may wish to review Section 9.3). Dijkstra's algorithm starts at a vertex *s* and works in stages. Each vertex in the graph is eventually selected as an intermediate vertex. If the current selected vertex is *v*, then for each *w* ∈ *V*, we set *d*_{w} = min(*d*_{w}, *d*_{v} + *c*_{v,w}), where *c*_{v,w} is the cost of the edge (*v*, *w*).

Dijkstra's algorithm provides the idea for the dynamic programming algorithm: we select the vertices in sequential order. We will define *D*_{k,i,j} to be the weight of the shortest path from *v*_{i} to *v*_{j} that uses only *v*_{1}, *v*_{2}, . . . , *v*_{k} as intermediate vertices. By this definition, *D*_{0,i,j} is the cost of the edge (*v*_{i}, *v*_{j}), or infinity if no such edge exists, and *D*_{|V|,i,j} is the shortest path from *v*_{i} to *v*_{j} in the graph.

/* Compute All-Shortest Paths */
/* A[] contains the adjacency matrix */
/* with A[i][i] presumed to be zero */
/* D[] contains the values of shortest path */
/* |V| is the number of vertices */
/* A negative cycle exists iff */
/* D[i][j] is set to a negative value at line 9 */
/* Actual path can be computed via another procedure using path */
/* All arrays are indexed starting at 0 */

void
all_pairs( two_d_array A, two_d_array D, two_d_array path )
{
    int i, j, k;

/*1*/    for( i = 0; i < |V|; i++ )  /* Initialize D and path */
/*2*/        for( j = 0; j < |V|; j++ )
             {
/*3*/            D[i][j] = A[i][j];
/*4*/            path[i][j] = NOT_A_VERTEX;
             }

/*5*/    for( k = 0; k < |V|; k++ )
             /* Consider each vertex as an intermediate */
/*6*/        for( i = 0; i < |V|; i++ )
/*7*/            for( j = 0; j < |V|; j++ )
/*8*/                if( D[i][k] + D[k][j] < D[i][j] )
                     {   /* Update min */
/*9*/                    D[i][j] = D[i][k] + D[k][j];
/*10*/                   path[i][j] = k;
                     }
}

As Figure 10.53 shows, when *k* > 0 we can write a simple formula for *D*_{k,i,j}. The shortest path from *v*_{i} to *v*_{j} that uses only *v*_{1}, *v*_{2}, . . . , *v*_{k} as intermediate vertices either does not use *v*_{k} as an intermediate vertex at all, or it consists of two paths, *v*_{i} to *v*_{k} and *v*_{k} to *v*_{j}, each of which uses only the first *k* - 1 vertices as intermediates. This leads to the formula

D_{k,i,j} = min{ D_{k-1,i,j}, D_{k-1,i,k} + D_{k-1,k,j} }

The time requirement is once again *O*(|*V*|^{3}). Unlike the two previous dynamic programming examples, this time bound has not been substantially lowered by another approach. Because the *k*th stage depends only on the (*k* - 1)st stage, it appears that only two |*V*| × |*V*| matrices need to be maintained.

However, using *k* as an *intermediate* vertex on a path that starts or finishes with *k* does not improve the result unless there is a negative cycle. Thus, only one matrix is necessary, because *D*_{k-1,i,k} = *D*_{k,i,k} and *D*_{k-1,k,j} = *D*_{k,k,j}; none of the terms on the right-hand side of the formula change values during the *k*th stage, so the single-array program above is correct.

On a complete graph, where every pair of vertices is connected (in both directions), this algorithm is almost certain to be faster than |V| iterations of Dijkstra's algorithm, because the loops are so tight. Lines 1 through 4 can be executed in parallel, as can lines 6 through 10. Thus, this algorithm seems to be well-suited for parallel computation.

Dynamic programming is a powerful algorithm design technique, which provides a starting point for a solution. It is essentially the divide and conquer paradigm of solving simpler problems first, with the important difference being that the simpler problems are not a clear division of the original. Because subproblems are repeatedly solved, it is important to record their solutions in a table rather than recompute them. In some cases, the solution can be improved (although it is certainly not always obvious and frequently difficult), and in other cases, the dynamic programming technique is the best approach known.

In some sense, if you have seen one dynamic programming problem, you have seen them all. More examples of dynamic programming can be found in the exercises and references.

Suppose you are a professor who is giving weekly programming assignments. You want to make sure that the students are doing their own programs or, at the very least, understand the code they are submitting. One solution is to give a quiz on the day that each program is due. On the other hand, these quizzes take time out of class, so it might only be practical to do this for roughly half of the programs. Your problem is to decide when to give the quizzes.

Since our algorithms require random numbers, we must have a method to generate them. Actually, true randomness is virtually impossible to do on a computer, since these numbers will depend on the algorithm, and thus cannot possibly be random. Generally, it suffices to produce *pseudorandom* numbers, which are numbers that appear to be random. Random numbers have many known statistical properties; pseudorandom numbers satisfy most of these properties. Surprisingly, this too is much easier said than done.

^{*}We will use *random* in place of *pseudorandom* in the rest of this section.

The standard method to generate random numbers is the linear congruential generator, which was first described by Lehmer in 1951. Numbers *x*_{1}, *x*_{2}, . . . are generated satisfying

x_{i+1} = ax_{i} mod m

For instance, if *m* = 11, *x*_{0} = 1, and *a* = 7, the numbers generated cycle through all of 1, . . . , 10 before repeating:

7, 5, 2, 3, 10, 4, 6, 9, 8, 1, 7, 5, 2, . . .

If instead *a* = 5 is chosen (with *m* = 11 and *x*_{0} = 1), the sequence has a much shorter period of 5:

5, 3, 4, 9, 1, 5, 3, 4, . . .

It is also common to return a random real number in the open interval (0, 1) (0 and 1 are not possible values); this can be done by dividing by *m*. From this, a random number in any closed interval [*a, b*] can be computed by normalizing. This yields the "obvious" routine in Figure 10.54 which, unfortunately, works on few machines.

The problem with this routine is that the multiplication could overflow; although this is not an error, it affects the result and thus the pseudo-randomness. Schrage gave a procedure in which all of the calculations can be done on a 32-bit machine without overflow. We compute the quotient and remainder of *m/a* and define these as *q* and *r*, respectively. In our case, *q* = 127,773, *r* = 2,836, and *r* < *q*. We have

unsigned int seed;     /* global variable */

#define a 16807        /* 7^5 */
#define m 2147483647   /* 2^31 - 1 */

double
random( void )
{
    seed = ( a * seed ) % m;
    return( ( (double) seed ) / m );
}

Since *x*_{i} = *q*(*x*_{i} div *q*) + (*x*_{i} mod *q*), we can replace the leading *ax*_{i} and obtain

Since *m* = *aq* + *r*, it follows that *aq* - *m* = -*r*. Thus, we obtain

A quick check shows that because *r* < *q*, all the remaining terms can be calculated without overflow (this is one of the reasons for choosing *a* = 7^{5}). Furthermore, δ(*x*_{i}) = 1 only if the remaining terms evaluate to less than zero. Thus δ(*x*_{i}) does not need to be computed explicitly but can be determined by a simple test; this leads to the routine in Figure 10.55.

This program works as long as *INT_MAX* ≥ 2^{31} - 1. One might be tempted to assume that all machines have a random number generator at least as good as the one in Figure 10.55 in their standard library. Sadly, this is not true. Many libraries have generators based on the function

x_{i+1} = (ax_{i} + c) mod 2^{b}

where *b* is chosen to match the number of bits in the machine's integer, and *c* is odd. These libraries also return *x*_{i}, instead of a value between 0 and 1. Unfortunately, these generators always produce values of *x*_{i} that alternate between even and odd; the low-order bits cycle with very short periods, which is hardly a desirable property. One might be tempted to believe that the generator

x_{i+1} = (16807x_{i} + 1) mod (2^{31} - 1)

would somehow be even more random. This illustrates how fragile these generators are.

[16807(1319592028) + 1] mod (2^{31} - 1) = 1319592028,

so if the seed is 1,319,592,028, the generator gets stuck in a cycle of period 1.

unsigned int seed;     /* global variable */

#define a 16807        /* 7^5 */
#define m 2147483647   /* 2^31 - 1 */
#define q 127773       /* m/a */
#define r 2836         /* m%a */

double
random( void )
{
    int tmp_seed;

    tmp_seed = a * ( seed % q ) - r * ( seed / q );
    if( tmp_seed >= 0 )
        seed = tmp_seed;
    else
        seed = tmp_seed + m;
    return( ( (double) seed ) / m );
}

Our first use of randomization is a data structure that supports both searching and insertion in *O*(log *n*) expected time. As mentioned in the introduction to this section, this means that the running time for each operation on *any input sequence* has expected value *O*(log *n*), where the expectation is based on the random number generator. It is possible to add deletion and all the operations that involve ordering and obtain expected time bounds that match the average time bounds of binary search trees.

The simplest possible data structure to support searching is the linked list. Figure 10.56 shows a simple linked list. The time to perform a search is proportional to the number of nodes that have to be examined, which is at most *n*.

Figure 10.57 shows a linked list in which every other node has an additional pointer to the node two ahead of it in the list. Because of this, at most *n*/2 + 1 nodes are examined in the worst case.

We can extend this idea and obtain Figure 10.58. Here, every fourth node has a pointer to the node four ahead. Only *n*/4 + 2 nodes are examined.

The limiting case of this argument is shown in Figure 10.59. Every 2^{i}th node has a pointer to the node 2^{i} ahead of it. The total number of pointers has only doubled, but now at most ⌈log *n*⌉ nodes are examined during a search; the search in this data structure is essentially a binary search.

The problem with this data structure is that it is much too rigid to allow efficient insertion. The key to making this data structure usable is to relax the structure conditions slightly. We define a *level k node* to be a node that has *k* pointers. As Figure 10.59 shows, the *i*th pointer in any level *k* node (*k* ≥ *i*) points to the next node with at least *i* levels. This is an easy property to maintain; however, Figure 10.59 shows a more restrictive property than this. We thus drop the restriction that the *i*th pointer points to the node 2^{i} ahead, and we replace it with the less restrictive condition above.

When it comes time to insert a new element, we allocate a new node for it. We must at this point decide what level the node should be. Examining Figure 10.59, we find that roughly half the nodes are level 1 nodes, roughly a quarter are level 2, and, in general, approximately 1/2^{i} of the nodes are level *i*. We choose the level of the new node randomly, in accordance with this probability distribution: flip a coin until a head occurs, and use the total number of flips as the node level.

Given this, the skip list algorithms are simple to describe. To perform a *find*, we start at the highest pointer at the header. We traverse along this level until we find that the next node is larger than the one we are looking for (or NULL). When this occurs, we go to the next lower level and continue the strategy. When progress is stopped at level 1, either we are in front of the node we are looking for, or it is not in the list. To perform an *insert*, we proceed as in a *find*, and keep track of each point where we switch to a lower level. The new node, whose level is determined randomly, is then spliced into the list. This operation is shown in Figure 10.61.

A cursory analysis shows that since the expected number of nodes at each level is unchanged from the original (nonrandomized) algorithm, the total amount of work that is expected to be performed traversing to nodes on the same level is unchanged. This tells us that these operations have *O*(log *n*) *expected* costs. Of course, a more formal proof is required, but it is not much different from this.

In this section we examine the problem of determining whether or not a large number is prime. As was mentioned at the end of Chapter 2, some cryptography schemes depend on the difficulty of factoring a large, 200-digit number into two 100-digit primes. In order to implement this scheme, we need a method of generating these two primes. The problem is of major theoretical interest, because nobody now knows how to test whether a *d*-digit number *n* is prime in time polynomial in *d*. For instance, the obvious method of testing for divisibility by the odd numbers from 3 to √*n* requires roughly √*n*/2 divisions, which is about 2^{d/2}. On the other hand, this problem is not thought to be *NP*-complete either.

In this chapter, we will give a polynomial-time algorithm that can test for primality. If the algorithm declares that the number is not prime, we can be certain that the number is not prime. If the algorithm declares that the number is prime, then, with high probability but not 100 percent certainty, the number is prime. The error probability does not depend on the particular number that is being tested but instead depends on random choices made by the algorithm. Thus, this algorithm occasionally makes a mistake, but we will see that the error rate can be made arbitrarily small.

The key to the algorithm is a well-known theorem due to Fermat.

*Fermat's Lesser Theorem: If p is prime, and* 0 < *a* < *p*, *then a*^{p-1} ≡ 1 (mod *p*).

A proof of this theorem can be found in any textbook on number theory.

Although this seems to work, there are numbers that fool even this algorithm for most choices of *a*. One such set of numbers is known as the *Carmichael numbers*. These are not prime but satisfy *a*^{n-1} ≡ 1 (mod *n*) for all 0 < *a* < *n* that are relatively prime to *n*; the smallest such number is 561. Thus, we need an additional test to improve the chances of not making an error.

In Chapter 7, we proved a theorem related to quadratic probing. A special case of this theorem is the following:

*If p is prime and* 0 < *x* < *p*, *the only solutions to x*^{2} ≡ 1 (mod *p*) *are x* = 1, *p* - 1.

Therefore, if at any point in the computation of *a*^{n-1} mod *n* we discover a violation of this theorem, we can conclude that *n* is definitely not prime. If we use *power*, from Section 2.4.4, we see that there will be several opportunities to apply this test. We modify this routine to perform operations mod *n*, and apply the test of Theorem 10.11. This strategy is implemented in Figure 10.62. Because *power* needs to return two pieces of information, we pass the addresses of these items (*result* and *what_n_is*) as pointers.

Recall that if *test_prime* returns *DEFINITELY_COMPOSITE*, it has *proven* that *n* cannot be prime. The proof is nonconstructive, because it gives no method of actually finding the factors. It has been shown that for any (sufficiently large) *n*, at most (*n* - 9)/4 values of *a* fool this algorithm. Thus, if *a* is chosen at random, and the algorithm answers *PROBABLY_PRIME*, then the algorithm is correct at least 75 percent of the time. Suppose *test_prime* is run 50 times. The probability that the algorithm is fooled once is at most 1/4. Thus, the probability that 50 independent random trials fool the algorithm is never more than 1/4^{50} = 2^{-100}. This is actually a very conservative estimate, which holds for only a few choices of *n*. Even so, one is more likely to see a hardware error than an incorrect claim of primality.

The last algorithm design technique we will examine is *backtracking*. In many cases, a backtracking algorithm amounts to a clever implementation of exhaustive search, with generally unfavorable performance. This is not always the case, however, and even so, in some cases, the savings over a brute force exhaustive search can be significant. Performance is, of course, relative: An *O*(*n*^{2}) algorithm for sorting is pretty bad, but an *O*(*n*^{5}) algorithm for the traveling salesman (or any *NP*-complete) problem would be a landmark result.

A practical example of a backtracking algorithm is the problem of arranging furniture in a new house. There are many possibilities to try, but typically only a few are actually considered. Starting with no arrangement, each piece of furniture is placed in some part of the room. If all the furniture is placed and the owner is happy, then the algorithm terminates. If we reach a point where all subsequent placement of furniture is undesirable, we have to undo the last step and try an alternative. Of course, this might force another undo, and so forth. If we find that we undo all possible first steps, then there is no placement of furniture that is satisfactory. Otherwise, we eventually terminate with a satisfactory arrangement. Notice that although this algorithm is essentially brute force, it does not try all possibilities directly. For instance, arrangements that consider placing the sofa in the kitchen are never tried. Many other bad arrangements are discarded early, because an undesirable subset of the arrangement is detected. The elimination of a large group of possibilities in one step is known as *pruning*.

We will see two examples of backtracking algorithms. The first is a problem in computational geometry. Our second example shows how computers select moves in games, such as chess and checkers.

Suppose we are given *n* points, *p*_{1}, *p*_{2}, . . . , *p*_{n}, located on the *x*-axis. *x*_{i} is the *x* coordinate of *p*_{i}. Let us further assume that *x*_{1} = 0 and the points are given from left to right. These *n* points determine *n*(*n* - 1)/2 (not necessarily unique) distances between every pair of points. The *turnpike reconstruction problem* is to reconstruct a point set from the distances.

enum test_result { PROBABLY_PRIME, DEFINITELY_COMPOSITE };
typedef enum test_result test_result;

/* Compute result = a^p mod n. */
/* If at any point x^2 ≡ 1 (mod n) is detected with x != 1, x != n-1, */
/* then set what_n_is to DEFINITELY_COMPOSITE */
/* We are assuming very large integers, so this is pseudocode. */

void
power( unsigned int a, unsigned int p, unsigned int n,
       unsigned int *result, test_result *what_n_is )
{
    unsigned int x;

/*1*/    if( p == 0 )  /* Base case */
/*2*/        *result = 1;
         else
         {
/*3*/        power( a, p/2, n, &x, what_n_is );
/*4*/        *result = (x * x) % n;

             /* Check whether x^2 ≡ 1 (mod n), x != 1, x != n-1 */
/*5*/        if( (*result == 1) && (x != 1) && (x != n-1) )
/*6*/            *what_n_is = DEFINITELY_COMPOSITE;

             /* If p is odd, we need one more a */
/*7*/        if( (p % 2) == 1 )
/*8*/            *result = (*result * a) % n;
         }
}

/* test_prime: Test whether n >= 3 is prime using one value of a */
/* repeat this procedure as many times as needed */
/* for desired error rate */

test_result
test_prime( unsigned int n )
{
    unsigned int a, result;
    test_result what_n_is;

/*9*/    a = rand_int( 2, n-2 );  /* choose a randomly from 2..n-2 */
/*10*/   what_n_is = PROBABLY_PRIME;

         /* Compute a^(n-1) mod n */
/*11*/   power( a, n-1, n, &result, &what_n_is );

/*12*/   if( ( result != 1 ) || ( what_n_is == DEFINITELY_COMPOSITE ) )
/*13*/       return DEFINITELY_COMPOSITE;
         else
/*14*/       return PROBABLY_PRIME;
}

Let *D* be the set of distances, and assume that | *D *| = *m* = *n*(*n* - 1)/2. As an example, suppose that

D= {1, 2, 2, 2, 3, 3, 3, 4, 5, 5, 5, 6, 7, 8, 10}

Figure 10.63 shows a decision tree representing the actions taken to arrive at the solution. Instead of labeling the branches, we have placed the labels in the branches' destination nodes. A node with an asterisk indicates that the points chosen are inconsistent with the given distances; nodes with two asterisks have only impossible nodes as children, and thus represent an incorrect path.

int
turnpike( int x[], dist_set D, unsigned int n )
{
/*1*/    x[1] = 0;
/*2*/    x[n] = delete_max( D );
/*3*/    x[n-1] = delete_max( D );

/*4*/    if( x[n] - x[n-1] ∈ D )
         {
/*5*/        delete( x[n] - x[n-1], D );
/*6*/        return place( x, D, n, 2, n-2 );
         }
         else
/*7*/        return FALSE;
}

The pseudocode to implement this algorithm is mostly straightforward. The driving routine, *turnpike*, is shown in Figure 10.64. It receives the point array *x* (which need not be initialized), the distance array *D*, and *n*.^{*} If a solution is discovered, then TRUE will be returned, the answer will be placed in *x*, and *D* will be empty. Otherwise, FALSE will be returned, *x* will be undefined, and the distance array *D* will be untouched. The routine sets *x*_{1}, *x*_{n-1}, and *x*_{n}, as described above, alters *D*, and calls the backtracking algorithm *place* to place the remaining points.

^{*}We have used one-letter variable names, which is generally poor style, for consistency with the worked example. We also, for simplicity, do not give the type of variables.

The more difficult part is the backtracking algorithm, which is shown in Figure 10.65. Like most backtracking algorithms, the most convenient implementation is recursive. We pass the same arguments plus the boundaries *Left* and *Right*; *x*_{*Left*}, . . . , *x*_{*Right*} are the points that remain to be placed.

The analysis of the algorithm involves two factors. Suppose lines 9 through 11 and 18 through 20 are never executed. We can maintain *D* as a balanced binary search (or splay) tree (this would require a code modification, of course). If we never backtrack, there are at most *O*(*n*^{2}) operations involving *D*, such as deletions and the *finds* implied at lines 4 and 12 to 13. This claim is obvious for deletions, since *D* has *O*(*n*^{2}) elements and no element is ever reinserted. Each call to *place* uses at most 2*n* *finds*, and since *place* never backtracks in this analysis, there can be at most 2*n*^{2} *finds*. Thus, if there is no backtracking, the running time is *O*(*n*^{2} log *n*).

Of course, backtracking happens, and if it happens repeatedly, then the performance of the algorithm is affected. No polynomial bound on the amount of backtracking is known, but on the other hand, there are no pathological examples that show that backtracking must occur more than *O*(1) times. Thus, it is entirely possible that this algorithm is *O*(*n*^{2} log *n*). Experiments have shown that if the points have integer coordinates distributed uniformly and randomly from [0, *D*_{max}], where *D*_{max} = Θ(*n*^{2}), then, almost certainly, at most one backtrack is performed during the entire algorithm.

As our last application, we will consider the strategy that a computer might use to play a strategic game, such as checkers or chess. We will use, as an example, the much simpler game of tic-tac-toe, because it makes the points easier to illustrate.

/* Backtracking algorithm to place the points */
/* x[left]...x[right]. */
/* x[1]...x[left-1] and x[right+1]...x[n] */
/* are already tentatively placed */
/* If place returns true, */
/* then x[left]...x[right] will have values. */

int
place( int x[], dist_set D, unsigned int n, int Left, int Right )
{
    int d_max, found = FALSE;

/*1*/ if D is empty then
/*2*/     return TRUE;
/*3*/ d_max = find_max( D );

      /* Check if setting x[Right] = d_max is feasible. */
/*4*/ if( |x[j]-d_max| ∈ D for all 1 ≤ j < Left and Right < j ≤ n )
      {
/*5*/     x[Right] = d_max;            /* Try x[Right] = d_max */
/*6*/     for( 1 ≤ j < Left, Right < j ≤ n )
/*7*/         delete( |x[j]-d_max|, D );
/*8*/     found = place( x, D, n, Left, Right-1 );
/*9*/     if( !found )                 /* Backtrack */
/*10*/        for( 1 ≤ j < Left, Right < j ≤ n )  /* Undo the deletion */
/*11*/            insert( |x[j]-d_max|, D );
      }

      /* If first attempt failed, try to see if setting */
      /* x[Left] = x[n]-d_max is feasible */
/*12*/ if( !found && ( |x[n]-d_max-x[j]| ∈ D
/*13*/     for all 1 ≤ j < Left and Right < j ≤ n ) )
      {
/*14*/    x[Left] = x[n]-d_max;        /* Same logic as before */
/*15*/    for( 1 ≤ j < Left, Right < j ≤ n )
/*16*/        delete( |x[n]-d_max-x[j]|, D );
/*17*/    found = place( x, D, n, Left+1, Right );
/*18*/    if( !found )                 /* Backtrack; undo the deletion */
/*19*/        for( 1 ≤ j < Left, Right < j ≤ n )
/*20*/            insert( |x[n]-d_max-x[j]|, D );
      }
/*21*/ return found;
}

The general strategy is to use an evaluation function to quantify the "goodness" of a position. A position that is a win for the computer might get the value of +1; a draw could get 0; and a position that the computer has lost would get a -1. A position for which this assignment can be determined by examining the board is known as a *terminal* position.

A *successor position* of *P* is any position *P*_{*s*} that is reachable from *P* by playing one move.

The code in Figure 10.66 makes the computer's strategy more clear. Lines 1 through 4 evaluate immediate wins or draws. If neither of these cases apply, then the position is nonterminal. Recalling that *value* should contain the maximum of all possible successor positions, line 5 initializes it to the smallest possible value, and the loop in lines 6 through 13 searches for improvements. Each successor position is recursively evaluated in turn by lines 8 through 10. This is recursive, because, as we will see, the procedure *find_human_move* calls *find_comp_move*. If the human's response to a move leaves the computer with a more favorable position than that obtained with the previously best computer move, then the *value* and *best_move* are updated. Figure 10.67 shows the procedure for the human's move selection. The logic is virtually identical, except that the human player chooses the move that leads to the lowest-valued position. Indeed, it is not difficult to combine these two procedures into one by passing an extra variable, which indicates whose turn it is to move. This does make the code somewhat less readable, so we have stayed with separate routines.

Since these routines must pass back both the value of the position and the best move, we pass the address of two variables that will get this information, by using pointers. The last two parameters now answer the question "WHERE?" instead of "WHAT? "

/* Recursive procedure to find best move for computer */
/* best_move points to a number from 1-9 indicating square. */
/* Possible evaluations satisfy COMP_LOSS < DRAW < COMP_WIN */
/* Complementary procedure find_human_move is below */
/* board_type is an array; thus board can be changed by place( ) */

void
find_comp_move( board_type board, int *best_move, int *value )
{
    int dc, i, response;   /* dc means don't care */

/*1*/ if( full_board( board ) )
/*2*/     *value = DRAW;
      else
/*3*/ if( immediate_comp_win( board, best_move ) )
/*4*/     *value = COMP_WIN;
      else
      {
/*5*/     *value = COMP_LOSS;
/*6*/     for( i=1; i<=9; i++ )        /* try each square */
          {
/*7*/         if( is_empty( board, i ) )
              {
/*8*/             place( board, i, COMP );
/*9*/             find_human_move( board, &dc, &response );
/*10*/            unplace( board, i ); /* Restore board */

/*11*/            if( response > *value )  /* Update best move */
                  {
/*12*/                *value = response;
/*13*/                *best_move = i;
                  }
              }
          }
      }
}

void
find_human_move( board_type board, int *best_move, int *value )
{
    int dc, i, response;   /* dc means don't care */

/*1*/ if( full_board( board ) )
/*2*/     *value = DRAW;
      else
/*3*/ if( immediate_human_win( board, best_move ) )
/*4*/     *value = COMP_LOSS;
      else
      {
/*5*/     *value = COMP_WIN;
/*6*/     for( i=1; i<=9; i++ )        /* try each square */
          {
/*7*/         if( is_empty( board, i ) )
              {
/*8*/             place( board, i, HUMAN );
/*9*/             find_comp_move( board, &dc, &response );
/*10*/            unplace( board, i ); /* Restore board */

/*11*/            if( response < *value )  /* Update best move */
                  {
/*12*/                *value = response;
/*13*/                *best_move = i;
                  }
              }
          }
      }
}

As an example, in Figure 10.66, *best_move* contains the address where the best move can be placed. *find_comp_move* can examine or alter the data at that address by dereferencing *best_move*. Line 9 shows how the calling routine should behave. Since the caller has two integers prepared to store the data, and *find_human_move* only wants the addresses of these two integers, the address operator (&) is used.

If the & operator is not used at line 9, and both *dc* and *response* are zero (which would be typical of uninitialized data), then *find_human_move* will try to place its best move and position value in memory location zero. Of course, this is not what was intended, and it will almost certainly result in a program crash (try it!). This is the most common error when using the *scanf* family of library routines.

We leave supporting routines as an exercise. The most costly computation is the case where the computer is asked to pick the opening move. Since at this stage the game is a forced draw, the computer selects square 1.* A total of 97,162 positions were examined, and the calculation took 2.5 seconds on a VAX 8800. No attempt was made to optimize the code. When the computer moves second, the number of positions examined is 5,185 if the human selects the center square, 9,761 when a corner square is selected, and 13,233 when a noncorner edge square is selected.

^{*}We numbered the squares starting from the top left and moving right. However, this is only important for the supporting routines.

For more complex games, such as checkers and chess, it is obviously infeasible to search all the way to the terminal nodes.^{†} In this case, we have to stop the search after a certain depth of recursion is reached. The nodes where the recursion is stopped become terminal nodes. These terminal nodes are evaluated with a function that estimates the value of the position. For instance, in a chess program, the evaluation function measures such variables as the relative amount and strength of pieces and positional factors. The evaluation function is crucial for success, because the computer's move selection is based on maximizing this function. The best computer chess programs have surprisingly sophisticated evaluation functions.

^{†}It is estimated that if this search were conducted for chess, at least 10^{100} positions would be examined for the first move. Even if the improvements described later in this section were incorporated, this number could not be reduced to a practical level.

Nevertheless, for computer chess, the single most important factor seems to be the number of moves of look-ahead the program is capable of. This is sometimes known as *ply*; it is equal to the depth of the recursion. To implement this, an extra parameter is given to the search routines.

The basic method to increase the look-ahead factor in game programs is to come up with methods that evaluate fewer nodes without losing any information. One method which we have already seen is to use a table to keep track of all positions that have been evaluated. For instance, in the course of searching for the first move, the program will examine the positions in Figure 10.68. If the values of the positions are saved, the second occurrence of a position need not be recomputed; it essentially becomes a terminal position. The data structure that records this is known as a *transposition table;* it is almost always implemented by hashing. In many cases, this can save considerable computation. For instance, in a chess endgame, where there are relatively few pieces, the time savings can allow a search to go several levels deeper.

Probably the most significant improvement one can obtain in general is known as α-β *pruning*. Figure 10.69 shows the trace of the recursive calls used to evaluate some hypothetical position in a hypothetical game. This is commonly referred to as a *game tree*. (We have avoided the use of this term until now, because it is somewhat misleading: no tree is actually constructed by the algorithm. The game tree is just an abstract concept.) The value of the game tree is 44.

Figure 10.70 shows the evaluation of the same game tree, with several unevaluated nodes. Almost half of the terminal nodes have not been checked. We show that evaluating them would not change the value at the root.

First, consider node *D*. Figure 10.71 shows the information that has been gathered when it is time to evaluate *D*. At this point, we are still in *find_human_move* and are contemplating a call to *find_comp_move* on *D*. However, we already know that *find_human_move* will return at most 40, since it is a *min* node. On the other hand, its *max* node parent has already found a sequence that guarantees 44. Nothing that *D* does can possibly increase this value. Therefore, *D* does not need to be evaluated. This pruning of the tree is known as α *pruning*. An identical situation occurs at node *B*. To implement α pruning, *find_comp_move* passes its tentative maximum (α) to *find_human_move*. If the tentative minimum of *find_human_move* falls below this value, then *find_human_move* returns immediately.

A similar thing happens at nodes *A* and *C*. This time, we are in the middle of a *find_comp_move* and are about to make a call to *find_human_move* to evaluate *C*. Figure 10.72 shows the situation that is encountered at node *C*. However, the *find_human_move* at the *min* level, which has called *find_comp_move*, has already determined that it can force a value of at most 44 (recall that low values are good for the human side). Since *find_comp_move* has a tentative maximum of 68, nothing that *C* does will affect the result at the *min* level. Therefore, *C* should not be evaluated. This type of pruning is known as β *pruning*; it is the symmetric version of α pruning. When both techniques are combined, we have α-β pruning.

Implementing α-β pruning requires surprisingly little code. It is not as difficult as one might think, although many programmers have a very hard time doing it without looking at a reference book. Figure 10.73 shows half of the α-β pruning scheme (minus type declarations); you should have no trouble coding the other half.

/* Same as before, but perform alpha-beta pruning. */
/* The main routine should make the call with */
/* alpha = COMP_LOSS and beta = COMP_WIN. */

void
find_comp_move( board_type board, int *best_move, int *value,
                int alpha, int beta )
{
    int dc, i, response;   /* dc means don't care */

/*1*/ if( full_board( board ) )
/*2*/     *value = DRAW;
      else
/*3*/ if( immediate_comp_win( board, best_move ) )
/*4*/     *value = COMP_WIN;
      else
      {
/*5*/     *value = alpha;
/*6*/     for( i=1; (i<=9) && (*value < beta); i++ )  /* try each square */
          {
/*7*/         if( is_empty( board, i ) )
              {
/*8*/             place( board, i, COMP );
/*9*/             find_human_move( board, &dc, &response, *value, beta );
/*10*/            unplace( board, i ); /* Restore board */

/*11*/            if( response > *value )  /* Update best move */
                  {
/*12*/                *value = response;
/*13*/                *best_move = i;
                  }
              }
          }
      }
}

10.1 Show that the greedy algorithm to minimize the mean completion time for multiprocessor job scheduling works.

10.2 The input is a set of jobs *j*_{1}, *j*_{2}, . . . , *j _{n}*, each of which takes one time unit to complete. Each job

(a) Give an *O*(*n*^{2}) greedy algorithm to solve the problem.

10.3 A file contains only colons, spaces, newline, commas, and digits in the following frequency: colon (100), space (605), newline (100), commas (705), 0 (431), 1 (242), 2 (176), 3 (59), 4 (185), 5 (250), 6 (174), 7 (199), 8 (205), 9 (217). Construct the Huffman code.

10.4 Part of the encoded file must be a header indicating the Huffman code. Give a method for constructing the header of size at most *O*(*n*) (in addition to the symbols), where *n* is the number of symbols.

10.5 Complete the proof that Huffman's algorithm generates an optimal prefix code.

10.6 Show that if the symbols are sorted by frequency, Huffman's algorithm can be implemented in linear time.

10.7 Write a program to implement file compression (and uncompression) using Huffman's algorithm.

*10.8 Show that any on-line bin-packing algorithm can be forced to use at least 3/2 the optimal number of bins, by considering the following sequence of items: *n* items of size 1/7 + ε, *n* items of size 1/3 + ε, *n* items of size 1/2 + ε.

10.9 Explain how to implement first fit and best fit in *O(n* log *n)* time.

10.10 Show the operation of all of the bin-packing strategies discussed in Section 10.1.3 on the input 0.42, 0.25, 0.27, 0.07, 0.72, 0.86, 0.09, 0.44, 0.50, 0.68, 0.73, 0.31, 0.78, 0.17, 0.79, 0.37, 0.73, 0.23, 0.30.

10.11 Write a program that compares the performance (both in time and number of bins used) of the various bin packing heuristics.

*10.14 *n* points are placed in a unit square. Show that the distance between the closest pair is *O*(*n*^{-1/2}).

*10.15 Argue that for the closest-points algorithm, the average number of points in the strip is *O*(√*n*). *Hint: Use the result of the previous exercise*.

10.16 Write a program to implement the closest-pair algorithm.

10.17 What is the asymptotic running time of quickselect, using a median-of-median-of-three partitioning strategy?

10.18 Show that quickselect with median-of-median-of-seven partitioning is linear. Why is median-of-median-of-seven partitioning not used in the proof?

10.19 Implement the quickselect algorithm in Chapter 7, quickselect using median-of-median-of-five partitioning, and the sampling algorithm at the end of Section 10.2.3. Compare the running times.

10.20 Much of the information used to compute the median-of-median-of-five is thrown away. Show how the number of comparisons can be reduced by more careful use of the information.

*10.21 Complete the analysis of the sampling algorithm described at the end of Section 10.2.3, and explain how the values of and *s* are chosen.

10.22 Show how the recursive multiplication algorithm computes *xy*, where *x* = 1234 and *y* = 4321. Include all recursive computations.

10.23 Show how to multiply two complex numbers *x* = *a* + *bi* and *y* = *c* + *di *using only three multiplications.

x_{l}y_{r} + x_{r}y_{l} = (x_{l} + x_{r})(y_{l} + y_{r}) - x_{l}y_{l} - x_{r}y_{r}

10.25 * (a) Show how to multiply two numbers by solving five problems that are roughly one-third of the original size.

**(b) Generalize this problem to obtain an *O*(*n*^{1+ε}) algorithm for any constant ε > 0.

(c) Is the algorithm in part (b) better than *O*(*n* log *n*)?

10.26 Why is it important that Strassen's algorithm does not use commutativity in the multiplication of 2 × 2 matrices?

10.27 Two 70 × 70 matrices can be multiplied using 143,640 multiplications. Show how this can be used to improve the bound given by Strassen's algorithm.

10.28 What is the optimal way to compute A_{1}A_{2}A_{3}A_{4}A_{5}A_{6}, where the dimensions of the matrices are: A_{1}: 10 × 20, A_{2}: 20 × 1, A_{3}: 1 × 40, A_{4}: 40 × 5, A_{5}: 5 × 30, A_{6}: 30 × 15?

10.29 Show that none of the following greedy algorithms for chained matrix multiplication work. At each step

(a) Compute the cheapest multiplication.

(b) Compute the most expensive multiplication.

10.30 Write a program to compute the best ordering of matrix multiplication. Include the routine to print out the actual ordering.

10.31 Show the optimal binary search tree for the following words, where the frequency of occurrence is in parentheses: *a* (0.18), *and* (0.19), *I* (0.23), *it *(0.21) , *or* (0.19).

*10.32 Extend the optimal binary search tree algorithm to allow for unsuccessful searches. In this case, *q _{j}*, for 1

*10.33 Suppose C_{i,i} = 0 and that otherwise

C_{i, j} = W_{i, j} + min_{i<k≤j} ( C_{i, k-1} + C_{k, j} )

Suppose that **W** satisfies the *quadrangle inequality*, namely, for all *i* ≤ *i*' ≤ *j* ≤ *j*',

W_{i, j} + W_{i', j'} ≤ W_{i', j} + W_{i, j'}

Suppose further, that **W** is *monotone*: If *i* ≤ *i*' and *j*' ≤ *j*, then *W*_{*i*', *j*'} ≤ *W*_{*i*, *j*}.

(a) Prove that **C** satisfies the quadrangle inequality.

R_{i, j} ≤ R_{i, j+1} ≤ R_{i+1, j+1}

(c) Show that **R** is nondecreasing along each row and column.

(d) Use this to show that all entries in **C** can be computed in *O*(*n*^{2}) time.

(e) Which of the dynamic programming algorithms can be solved in *O*(*n*^{2}) using these techniques?

10.34 Write a routine to reconstruct the shortest paths from the algorithm in Section 10.3.4.

10.35 Examine the random number generator on your system. How random is it?

10.36 Write the routines to perform insertion, deletion, and searching in skip lists.

10.37 Give a formal proof that the expected time for the skip list operations is *O*(log *n*).

10.38 Figure 10.74 shows a routine to flip a coin, assuming that *random* returns an integer (which is prevalent in many systems). What is the expected performance of the skip list algorithms if the random number generator uses a modulus of the form *m* = 2* ^{b}* (which is unfortunately prevalent on many systems)?

10.39 (a) Use the exponentiation algorithm to prove that 2^{340} ≡ 1 (*mod* 341).

(b) Show how the randomized primality test works for *n* = 561 with several choices of *a*.

10.40 Implement the turnpike reconstruction algorithm.

10.41 Two point sets are *homometric* if they yield the same distance set and are not rotations of each other. The following distance set gives two distinct point sets: {1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 16, 17}. Find the two point sets.

enum coin_side { heads, tails };

typedef enum coin_side coin_side;

coin_side

flip( void )

{

if( ( rand() % 2 ) == 0 )

return heads;

else

return tails;

}

10.42 Extend the reconstruction algorithm to find *all* homometric point sets given a distance set.

10.43 Show the result of α-β pruning the tree in Figure 10.75.

10.44 (a) Does the code in Figure 10.73 implement α pruning or β pruning?

(b) Implement the complementary routine.

10.45 Write the remaining procedures for tic-tac-toe.

10.46 The *one-dimensional circle packing problem* is as follows: You have *n* circles of radii *r*_{1},* r*_{2},* . . . , r _{n.}* These circles are packed in a box such that each circle is tangent to the bottom of the box, and are arranged in the original order. The problem is to find the width of the minimum-sized box.

Figure 10.76 shows an example with circles of radii 2, 1, 2, respectively. The minimum-sized box has width 4 + 4√2 ≈ 9.656.

*10.47 Suppose that the edges in an undirected graph *G* satisfy the triangle inequality: *c*_{*u*,*v*} + *c*_{*v*,*w*} ≥ *c*_{*u*,*w*}.

*10.48 You are a tournament director and need to arrange a round robin tournament among *n* = 2^{*k*} players. In this tournament, everyone plays exactly one game each day; after *n* - 1 days, a game has been played between every pair of players. Give a recursive algorithm to do this.

10.49 (a) Prove that in a round robin tournament it is always possible to arrange the players in an order *p*_{*i*1}, *p*_{*i*2}, . . . , *p*_{*in*} such that for all 1 ≤ *j* < *n*, player *p*_{*ij*} has won the match against *p*_{*ij*+1}.

*10.50 We are given a set *P = p*_{1},* p*_{2}, . . . , *p _{n}* of

*10.51 A *convex polygon* is a polygon with the property that any line segment whose endpoints are on the polygon lies entirely within the polygon. The *convex hull* problem consists of finding the smallest (area) convex polygon which encloses a set of points in the plane. Figure 10.78 shows the convex hull for a set of 40 points. Give an *O(n *log *n)* algorithm to find the convex hull.

*10.52 Consider the problem of right-justifying a paragraph. The paragraph contains a sequence of words *w*_{1}*, w*_{2}*, . . . ,w _{n}* of length

*10.53 The longest increasing subsequence problem is as follows: Given numbers a_{1}, a_{2}, . . ., a_{n}, find the maximum value of k such that a_{i1} < a_{i2} < < a_{ik}, and i_{1} < i_{2} < < i_{k}. As an example, if the input is 3, 1, 4, 1, 5, 9, 2, 6, 5, the maximum increasing subsequence has length four ( 1, 4, 5, 9 among others ). Give an O(n^{2}) algorithm to solve the longest increasing subsequence problem.

*10.54 The longest common subsequence problem is as follows: Given two sequences A = a_{1}, a_{2}, . . . , a_{m}, and B = b_{1}, b_{2}, . . . , b_{n}, find the length, k, of the longest sequence C = c_{1}, c_{2}, . . . , c_{k} such that C is a subsequence of both A and B. As an example, if

A= d, y, n, a, m, i, c

B= p, r, o, g, r, a, m, m, i, n, g,

*10.55 The *pattern matching problem *is as follows: Given a string *S* of text, and a pattern *P,* find the first occurrence of *P* in *S.* *Approximate pattern matching *allows* k* mismatches of three types:

1. A character can be in *S* that is not in *P.*

2. A character can be in *P* that is not in *S.*

3. *P* and *S* can differ in a position.

*10.56 One form of the *knapsack problem* is as follows: We are given a set of integers *A* = *a*_{1}, *a*_{2}, . . . , *a*_{*n*} and an integer *K*. The problem is to determine whether some subset of *A* sums to exactly *K*.

(a) Give an algorithm that solves the knapsack problem in *O(nK)* time.

(b) Why does this not show that *P = NP?*

*10.57 You are given a currency system with coins of (decreasing) value c_{1}*,* c_{2}, . . . , *c _{n}* cents.

(a) Give an algorithm that computes the minimum number of coins required to give *K* cents in change.

(b) Give an algorithm that computes the number of different ways to give *K* cents in change.

*10.58 Consider the problem of placing eight queens on an (eight by eight) chess board. Two queens are said to attack each other if they are on the same row, column, or (not necessarily main) diagonal.

(a) Give a randomized algorithm to place eight nonattacking queens on the board.

(b) Give a backtracking algorithm to solve the same problem.

(c) Implement both algorithms and compare the running time.

distance
shortest( s, t, G )
{
    distance d_t, tmp;

    if( s == t )
        return 0;

    d_t = ∞;
    for each vertex v adjacent to s
    {
        tmp = shortest( v, t, G );
        if( c_{s,v} + tmp < d_t )
            d_t = c_{s,v} + tmp;
    }
    return d_t;
}

*10.59 In the game of chess, a knight in row *r* and column *c* may move to row *r*' and column *c*', where 1 ≤ *r*' ≤ *B* and 1 ≤ *c*' ≤ *B* (*B* is the size of the board), provided that either

|r - r'| = 2 and |c - c'| = 1

or

|r - r'| = 1 and |c - c'| = 2

A *knight's tour* is a sequence of moves that visits all squares exactly once before returning to its starting point.

(a) If *B* is odd, show that a knight's tour cannot exist.

(b) Give a backtracking algorithm to find a knight's tour.

10.60 Consider the recursive algorithm in Figure 10.79 for finding the shortest weighted path in an acyclic graph, from *s* to *t*.

(a) Why does this algorithm not work for general graphs?

(b) Prove that this algorithm terminates for acyclic graphs.

(c) What is the worst-case running time of the algorithm?

The original paper on Huffman codes is [21]. Variations on the algorithm are discussed in [29], [31], and [32]. Another popular compression scheme is Ziv-Lempel encoding [52], [53]. Here the codes have a fixed length but represent strings instead of characters. [3] and [34] are good surveys of the common compression schemes.

The analysis of bin-packing heuristics first appeared in Johnson's Ph.D. thesis and was published in [22]. The improved lower bound for on-line bin packing given in Exercise 10.8 is from [50]; this result has been improved further in [35]. [44] describes another approach to on-line bin packing.

Theorem 10.7 is from [6]. The closest points algorithm appeared in [45]. [47] describes the turnpike reconstruction problem and its applications. Two books on the relatively new field of computational geometry are [14] and [40]. [2] contains the lecture notes for a computational geometry course taught at MIT; it includes an extensive bibliography.

The linear-time selection algorithm appeared in [8]. [17] discusses the sampling approach that finds the median in 1.5*n* expected comparisons. The *O*(*n*^{1.59}) multiplication is from [23]. Generalizations are discussed in [9] and [24]. Strassen's algorithm appears in the short paper [48]. The paper states the results and not much else. Pan [38] gives several divide and conquer algorithms, including the one in Exercise 10.27. The best known bound is *O*(*n*^{2.376}), which is due to Coppersmith and Winograd [13].

The classic references on dynamic programming are the books [4] and [5]. The matrix ordering problem was first studied in [19]. It was shown in [20] that the problem can be solved in *O*(*n* log *n*) time.

An *O*(*n*^{2}) algorithm was provided for the construction of optimal binary search trees by Knuth [25]. The all-pairs shortest-path algorithm is from Floyd [16]. A theoretically better *O*(*n*^{3}(log log *n*/log *n*)^{1/3}) algorithm is given by Fredman [18], but not surprisingly, it is not practical. Under certain conditions, the running time of dynamic programs can automatically be improved by a factor of *n* or more. This is discussed in Exercise 10.33, [15], and [51].

The discussion of random number generators is based on [39]. Park and Miller attribute the portable implementation to Schrage [46]. Skip lists are discussed by Pugh in [41]. The randomized primality-testing algorithm is due to Miller [36] and Rabin [43]. The theorem that at most (*n* - 9)/4 values of *a* fool the algorithm is from Monier [37]. Other randomized algorithms are discussed in [42].

More information on - pruning can be found in [1], [26], and [27]. The top programs that play chess, checkers, Othello, and backgammon have all achieved world class status. [33] describes an Othello program. The paper appears in a special issue on computer games (mostly chess); this issue is a gold mine of ideas. One of the papers describes the use of dynamic programming to solve chess endgames completely when only a few pieces are left on the board. Related research has resulted in the change of the 50-move rule in certain cases.

Exercise 10.41 is solved in [7]. It is the only known case of a homometric point set with no duplicate distances. Determining whether any others exist for *n* > 6 is open. Christofides [12] gives a solution to Exercise 10.47, and also an algorithm that generates a tour at most 3/2 optimal. Exercise 10.52 is discussed in [28]. Exercise 10.55 is solved in [49]. An *O*(*kn*) algorithm is given in [30]. Exercise 10.57 is discussed in [10], but do not be misled by the title of the paper.

1. B. Abramson, "Control Strategies for Two-Player Games," *ACM Computing Surveys, *21 (1989), 137-161.

2. A. Aggarwal and J. Wein, *Computational Geometry: Lecture Notes for* *18.409*, MIT Laboratory for Computer Science, 1988.

3. T. Bell, I. H. Witten, and J. G. Cleary, "Modeling for Text Compression," *ACM Computing Surveys,* 21 (1989), 557-591.

4. R. E. Bellman, *Dynamic Programming,* Princeton University Press, Princeton, NJ, 1957.

5. R. E. Bellman and S. E. Dreyfus, *Applied Dynamic Programming,* Princeton University Press, Princeton, NJ, 1962.

6. J. L. Bentley, D. Haken, and J. B. Saxe, "A General Method for Solving Divide-and-Conquer Recurrences," *SIGACT News,* 12 (1980), 36-44.

7. G. S. Bloom, "A Counterexample to the Theorem of Piccard," *Journal of Combinatorial Theory A* (1977), 378-379.

8. M. Blum, R. W. Floyd, V. R. Pratt, R. L. Rivest, and R. E. Tarjan, "Time Bounds for Selection," *Journal of Computer and System Sciences* 7 (1973), 448-461.

9. A. Borodin and J. I. Munro, *The Computational Complexity of Algebraic and Numerical Problems,* American Elsevier, New York, 1975.

10. L. Chang and J. Korsh, "Canonical Coin Changing and Greedy Solutions," *Journal of the ACM* 23 (1976), 418-422.

12. N. Christofides, "Worst-case Analysis of a New Heuristic for the Traveling Salesman Problem," *Management Science Research Report #388,* Carnegie-Mellon University, Pittsburgh, PA, 1976.

13. D. Coppersmith and S. Winograd, "Matrix Multiplication via Arithmetic Progressions," *Proceedings of the Nineteenth Annual ACM Symposium of the Theory of Computing *(1987), 1-6.

14. H. Edelsbrunner, *Algorithms in Combinatorial Geometry,* Springer-Verlag, Berlin, 1987.

15. D. Eppstein, Z. Galil, R. Giancarlo, "Speeding up Dynamic Programming," *Proceedings of the Twenty-ninth Annual IEEE Symposium on the Foundations of Computer Science, *(1988), 488-495.

16. R. W. Floyd, "Algorithm 97: Shortest Path," *Communications of the ACM 5* (1962), 345.

17. R. W. Floyd and R. L. Rivest, "Expected Time Bounds for Selection," *Communications of the ACM* 18 (1975), 165-172.

18. M. L. Fredman, "New Bounds on the Complexity of the Shortest Path Problem," *SIAM Journal on Computing 5* (1976), 83-89.

19. S. Godbole, "On Efficient Computation of Matrix Chain Products," *IEEE Transactions on Computers* 9 (1973), 864-866.

20. T. C. Hu and M. R. Shing, "Computations of Matrix Chain Products, Part I," *SIAM Journal on Computing* 11 (1982), 362-373.

21. D. A. Huffman, "A Method for the Construction of Minimum Redundancy Codes," *Proceedings of the IRE* 40 (1952), 1098-1101.

22. D. S. Johnson, A. Demers, J. D. Ullman, M. R. Garey, and R. L. Graham, "Worst-case Performance Bounds for Simple One-Dimensional Packing Algorithms," *SIAM Journal on Computing,* 3 (1974), 299-325.

23. A. Karatsuba and Y. Ofman, "Multiplication of Multi-digit Numbers on Automata," *Doklady Akademii Nauk SSSR* 145 (1962), 293-294.

24. D. E. Knuth, *The Art of Computer Programming, Vol 2: Seminumerical Algorithms, *second edition, Addison-Wesley, Reading, MA, 1981.

25. D. E. Knuth, "Optimum Binary Search Trees," *Acta Informatica* 1 (1971), 14-25.

26. D. E. Knuth and R. W. Moore, "Estimating the Efficiency of Backtrack Programs," *Mathematics of Computation* 29 (1975), 121-136.

27. D. E. Knuth, "An Analysis of Alpha-Beta Cutoffs," *Artificial Intelligence* 6 (1975), 293-326.

28. D. E. Knuth, *TeX and Metafont, New Directions in Typesetting,* Digital Press, Bedford, MA, 1981.

29. D. E. Knuth, "Dynamic Huffman Coding," *Journal of Algorithms* 6 (1985), 163-180.

30. G. M. Landau and U. Vishkin, "Introducing Efficient Parallelism into Approximate String Matching and a New Serial Algorithm," *Proceedings of the Eighteenth Annual ACM Symposium on Theory of Computing* (1986), 220-230.

31. L. L. Larmore, "Height-Restricted Optimal Binary Trees," *SIAM Journal on Computing* 16 (1987), 1115-1123.

32. L. L. Larmore and D. S. Hirschberg, "A Fast Algorithm for Optimal Length-Limited Huffman Codes," *Journal of the ACM* 37 (1990), 464-473.

33. K. Lee and S. Mahajan, "The Development of a World Class Othello Program," *Artificial Intelligence* 43 (1990), 21-36.

34. D. A. Lelewer and D. S. Hirschberg, "Data Compression," *ACM Computing Surveys* 19 (1987), 261-296.

35. F. M. Liang, "A Lower Bound for On-line Bin Packing," *Information Processing Letters* 10 (1980), 76-79.

36. G. L. Miller, "Riemann's Hypothesis and Tests for Primality," *Journal of Computer and System Sciences* 13 (1976), 300-317.

37. L. Monier, "Evaluation and Comparison of Two Efficient Probabilistic Primality Testing Algorithms," *Theoretical Computer Science* 12 (1980), 97-108.

38. V. Pan, "Strassen's Algorithm is Not Optimal," *Proceedings of the Nineteenth Annual IEEE Symposium on the Foundations of Computer Science* (1978), 166-176.

39. S. K. Park and K. W. Miller, "Random Number Generators: Good Ones are Hard To Find," *Communications of the ACM* 31 (1988), 1192-1201.

40. F. P. Preparata and M. I. Shamos, *Computational Geometry: An Introduction,* Springer-Verlag, New York, NY, 1985.

41. W. Pugh, "Skip Lists: A Probabilistic Alternative to Balanced Trees," *Communications of the ACM* 33 (1990), 668-676.

42. M. O. Rabin, "Probabilistic Algorithms," in *Algorithms and Complexity, Recent Results and New Directions* (J. F. Traub, ed.), Academic Press, New York, 1976, 21-39.

43. M. O. Rabin, "Probabilistic Algorithms for Testing Primality," *Journal of Number Theory,* 12 (1980), 128-138.

44. P. Ramanan, D. J. Brown, C. C. Lee, and D. T. Lee, "On-line Bin Packing in Linear Time," *Journal of Algorithms* 10 (1989), 305-326.

45. M. I. Shamos and D. Hoey, "Closest-Point Problems," *Proceedings of the Sixteenth Annual IEEE Symposium on the Foundations of Computer Science* (1975), 151-162.

46. L. Schrage, "A More Portable FORTRAN Random Number Generator," *ACM Transactions on Mathematical Software* 5 (1979), 132-138.

47. S. S. Skiena, W. D. Smith, and P. Lemke, "Reconstructing Sets From Interpoint Distances," *Proceedings of the Sixth Annual ACM Symposium on Computational Geometry* (1990), 332-339.

48. V. Strassen, "Gaussian Elimination is Not Optimal," *Numerische Mathematik* 13 (1969), 354-356.

49. R. A. Wagner and M. J. Fischer, "The String-to-String Correction Problem," *Journal of the ACM* 21 (1974), 168-173.

50. A. C. Yao, "New Algorithms for Bin Packing," *Journal of the ACM* 27 (1980), 207-227.

51. F. F. Yao, "Efficient Dynamic Programming Using Quadrangle Inequalities," *Proceedings of the Twelfth Annual ACM Symposium on the Theory of Computing* (1980), 429-435.

52. J. Ziv and A. Lempel, "A Universal Algorithm for Sequential Data Compression," *IEEE Transactions on Information Theory* IT-23 (1977), 337-343.

53. J. Ziv and A. Lempel, "Compression of Individual Sequences via Variable-rate Coding," *IEEE Transactions on Information Theory* IT-24 (1978), 530-536.