*Deals with dynamic data structures, which*
- *can grow and contract in size*
- *support efficient searches*

*The data structures are*
- *priority queues, which support insertions and deletions*
- *hash tables, which support insertions and deletions, have excellent average search time, and do not support ordered traversals*
- *binary search trees, which support searching, inserting, and deleting; have good average times for these operations; and support ordered traversals*
- *balanced binary search trees, which support all the operations of binary search trees, also support fast access to information in an arbitrary position in a sorted ordering of the information stored, and have good worst-case time for all the operations*

*Worst-case and average times are given for the operations on each data structure*

In the previous chapter, techniques were presented for ordering and storing data so that it could be efficiently *searched* and traversed when desired. Those methods work well as long as the collection of data is essentially *static*--meaning that relatively few insertions and deletions are required. This is a general characteristic of arrays, which were the basis for the faster binary search: they support efficient search but not efficient insertion and deletion. Lists can be used for more efficient insertion and deletion, but then a binary search is not possible.

It often happens that there are a number of tasks, *n*, each with an associated priority. The *n* tasks are to be carried out in order of priority. When studying for examinations, most students review and prepare in an order of priority determined by the date of the exam. Most people read the chapters of a book in order, with priority determined by chapter numbering. In this case the situation is static, since the number of chapters does not change. Computer centers process submitted programs by attaching a priority to each and processing them in order of priority. Here the number of programs does change. This represents the usual case, where it is necessary to add new tasks with their priorities to the current collection. The general situation is covered by a *priority queue*.

A **priority queue** is a data structure with objects, their priorities, and two operations, insertion and deletion. *Insertion in a priority queue* means that an object with its priority is added to the collection of objects.

Implementation     Insertion   Deletion
----------------------------------------------
Unordered Array    Constant    O(n)
Unordered List     Constant    O(n)
Ordered Array      O(n)        Constant
Ordered List       O(n)        Constant

Another way to implement a priority queue is to use a heap. You have seen how to build a heap efficiently so that storage is minimal, heap creation takes at most *O*(*n*) time, and reheaping takes at most *O*(lg *n*) time. To use a heap for the priority queue, the records are stored at the nodes of a heap that is ordered by priority field values. To insert a new record, place it as a new rightmost entry at the greatest depth of the heap. Then let it work its way up in the heap as in the first heap creation algorithm. Adding the new record in this way will take time at most *O*(lg *n*). To delete a record, simply delete the record at the root of the heap, and then reheap by invoking shift. This takes time at most *O*(lg *n*). The time to create the heap initially will be at most *O*(*n*) if the second heap creation algorithm is used. This is also the time that would be required to create an initial unordered array or unordered list of records. To create sorted arrays or lists takes time at most *O*(*n* lg *n*). Referring to Table 9.1, you can see that the heap implementation affords a compromise solution. With arrays and lists, one operation can be done quickly (in constant time) and the other in *O*(*n*) time. With the heap, both operations can be done in at most *O*(lg *n*) time.
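To make the heap scheme concrete, here is a minimal sketch in C of an array-based max-heap priority queue. The names pq_insert and pq_deletemax and the fixed capacity are illustrative assumptions, not code from the text:

```c
#include <assert.h>

#define MAXSIZE 100

typedef struct {
    int item[MAXSIZE + 1];  /* item[1..n] holds priorities; item[0] unused */
    int n;                  /* current number of entries */
} heappq;

/* O(lg n) insertion: place the new priority at the bottom
   and let it work its way up, as in heap creation. */
void pq_insert(heappq *pq, int priority)
{
    int i = ++pq->n;
    pq->item[i] = priority;
    while (i > 1 && pq->item[i / 2] < pq->item[i]) {
        int tmp = pq->item[i];          /* swap with parent */
        pq->item[i] = pq->item[i / 2];
        pq->item[i / 2] = tmp;
        i /= 2;
    }
}

/* O(lg n) deletion: remove the root, move the last entry
   to the root, and reheap by shifting it down. */
int pq_deletemax(heappq *pq)
{
    int max = pq->item[1], i = 1;
    pq->item[1] = pq->item[pq->n--];
    for (;;) {
        int child = 2 * i;
        if (child > pq->n)
            break;
        if (child < pq->n && pq->item[child + 1] > pq->item[child])
            child++;                    /* pick the larger child */
        if (pq->item[i] >= pq->item[child])
            break;
        int tmp = pq->item[i];
        pq->item[i] = pq->item[child];
        pq->item[child] = tmp;
        i = child;
    }
    return max;
}
```

Both operations touch only one path from the root to the greatest depth, which is where the *O*(lg *n*) bound comes from.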

Still another way to implement a priority queue is as a *leftist tree*, which is a null binary tree or a binary tree with the following properties:

**1.** The *distance* of its left subtree (the length of the shortest path from its root to a null subtree) is at least as great as the distance of its right subtree.

**2.** Its left and right subtrees are leftist trees.

A leftist tree implementation of a priority queue has a number of pluses. Both additions and deletions can be done in constant time when the priority queue is acting as a stack. The time for these operations is *O*(lg *n*) for the general case. Also, two priority queues can be merged into one in *O*(lg *n*) time, where *n* is the total number of entries in the two priority queues to be merged. This is significantly faster merging than can be done with any of the previous methods, except for unordered lists. Crane [1972] introduced *leftist trees* as a data structure that can be used to implement priority queues. (See Knuth [1973b] for more details.)

1. Make node.p the root of the merged tree, and merge q with p's right subtree.

2. Update the distance field of node.p.
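The merge steps above can be sketched in C. This is an illustrative min-priority version; the node layout, the function names, and the subtree swap that restores the leftist property are assumptions made for the sketch:

```c
#include <assert.h>
#include <stdlib.h>

typedef struct lnode {
    int priority;
    int dist;                 /* shortest path length to a null subtree */
    struct lnode *left, *right;
} lnode;

static int dist(lnode *t) { return t == NULL ? 0 : t->dist; }

lnode *newnode(int priority)
{
    lnode *t = malloc(sizeof(lnode));
    t->priority = priority;
    t->dist = 1;
    t->left = t->right = NULL;
    return t;
}

/* Merge two leftist trees in O(lg n) time. */
lnode *merge(lnode *p, lnode *q)
{
    if (p == NULL) return q;
    if (q == NULL) return p;
    if (q->priority < p->priority) {       /* keep the smaller root at p */
        lnode *tmp = p; p = q; q = tmp;
    }
    p->right = merge(p->right, q);         /* step 1: merge into p's right subtree */
    if (dist(p->left) < dist(p->right)) {  /* swap to restore the leftist property */
        lnode *tmp = p->left; p->left = p->right; p->right = tmp;
    }
    p->dist = dist(p->right) + 1;          /* step 2: update the distance field */
    return p;
}
```

Insertion is then merging with a one-node tree, and deletion is merging the root's two subtrees.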

The next data structure we consider is the *hash table.* A hash table is a data structure for the storage of records. It provides a means for rapid searching for a record with a given key value and adapts well to the insertion and deletion of records. Many compilers, for example, use hash tables to implement their symbol tables. They are also used extensively in systems where the primary objective is information retrieval. Thus nametable of Chapter 6 or a dictionary of words might be implemented in a hash table.

This section deals with a scaled-down version of the social security example. There are eighteen records, each with a key value that is a three-digit number. The records and their key values will be given in some order, normally unpredictable. We are to build a table, implemented in an array, in which the eighteen key values will be stored, so that searching can be done efficiently. One way to do this is to build a special kind of table--a *hash table*. This table will then be searched for given key values. To build a hash table requires a hashing function and a collision resolution policy.

The hash function should have other properties, as will become evident. The hash function used here assigns to any key the remainder after division of the key by the table size. If the table size is *m* and the key value is *k*, then *k* mod *m* denotes this remainder. The starting table or array size is 23. The eighteen key values and their order are given in Table 9.2, along with the hashing function values assigned to each.

Key Value   Hashing Address     Key Value   Hashing Address
-----------------------------------------------------------
019   -->   19                  468   -->   08
392   -->   01                  814   -->   09
179   -->   18                  720   -->   07
359   -->   14                  260   -->   07
663   -->   19                  802   -->   20
262   -->   09                  364   -->   19
639   -->   18                  976   -->   10
321   -->   22                  774   -->   15
097   -->   05                  566   -->   14
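Each address in Table 9.2 is just the remainder on division by the table size; as a minimal check in C:

```c
#include <assert.h>

/* The hash function of the text: the remainder after
   dividing the key by the table size m. */
int hash(int key, int m)
{
    return key % m;
}
```

For example, 392 = 17 × 23 + 1, so 392 hashes to address 1.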

When the fifth key value, 663, is input, its hashing address is 19, but element 19 of the table is already occupied by another key value. This event is called a *collision*. The collision resolution policy used here is a *linear probe*: the colliding key is placed in the next empty element of the table, wrapping around to element 0 after the last element if necessary.

How would you search a hash table when asked to find a given key, such as 642? If the key were stored in the table, it would either be found at its hashing address or at the end of a linear probe. If its hashing address were empty, or if the search came to an empty element in the linear probe before finding the key, you might conclude that it was not stored in the table. In this case, the hash address of 642 is 21. Element 21 is not empty but does not have 642 as its value. If you do a linear probe, you will eventually come to element 3, which is empty, without finding 642; thus you would conclude that 642 is not in the table.
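The search just described can be sketched in C. The table representation (an int array with a sentinel for empty elements) and the function name are assumptions for illustration:

```c
#include <assert.h>

#define EMPTY (-1)

/* Linear-probe search: start at the hash address and scan
   forward (wrapping around) until the key or an empty
   element is found.  Returns the index of the key, or -1
   for an unsuccessful search. */
int linearsearch(const int table[], int m, int key)
{
    int i, addr = key % m;
    for (i = 0; i < m; i++) {
        int j = (addr + i) % m;
        if (table[j] == EMPTY)
            return -1;          /* the key cannot be in the table */
        if (table[j] == key)
            return j;
    }
    return -1;                  /* table full, key absent */
}
```

Reaching an empty element ends an unsuccessful search early, exactly as in the 642 example above.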

1/18 × 1 + 1/18 × 1 + 1/18 × 1 + 1/18 × 1 + 1/18 × 2 + 1/18 × 1
+ 1/18 × 4 + 1/18 × 1 + 1/18 × 1 + 1/18 × 1 + 1/18 × 2 + 1/18 × 1
+ 1/18 × 5 + 1/18 × 4 + 1/18 × 7 + 1/18 × 3 + 1/18 × 1 + 1/18 × 3
= 40/18 ≈ 2.22

The hash table was of length 23 and stored eighteen key values. Its usage ratio is 18/23; the table is 78 percent full. Intuitively, the lower this ratio, the greater the storage wasted, but the lower the likelihood of a collision. Appropriate hashing functions must be selected with care in real applications, but it is possible to achieve very quick searches using hash tables.

A great deal of literature in computer science is devoted to the selection of hashing functions and collision resolution policies. To give some idea of what can be achieved, this section states results for an idealized scheme that assumes the hashing function to be perfectly random. Given any key value, the scheme assigns an address to that key value by picking randomly from among all the addresses of the table. This random selection is made so that each of the *m* addresses in a table of size *m* is equally likely to be selected. The table is built by inserting, in turn, each given key value. A key value is stored by determining its hash address. If that address is empty, the key value is stored there. If that address is occupied, the collision resolution policy works as follows. Another random hash function again selects an address at random. If that address is empty, the key value is stored there. If it is occupied, this procedure is repeated until the key value is finally entered. Thus an insertion of a key value may require the generation of a sequence of hash addresses ending with the actual final storage address.

Usage Ratio    Successful    Unsuccessful
n/m            Search        Search
-----------------------------------------
.25            1.15           1.33
.50            1.39           2.00
.75            1.85           4.00
.80            2.01           5.00
.90            2.56          10.00
.95            3.15          20.00

The linear probing collision resolution policy is a special case of **open addressing**. The idea of open addressing is to resolve collisions by following a specific *probe sequence* whenever a collision occurs, in order to determine the insertion location for a new key. To be specific, let *k_{i}* denote the key stored in the *i*th location of the table.

Let *h*(*k*) denote the hash function value for search key *k*. The sequence of indexes determined by

[*h*(*k*) + *i* × *p*(*k*)] mod *m* for *i* = 0, 1, 2, . . . , *m* - 1

is the probe sequence for key *k*.

*Linear probing* is the special case obtained by taking *p*(*k*) = 1.

2. While *k _{i}* is not the search key

Notice that *p*(*k*) represents a displacement from the current probe address.

With linear probing, large clusters of keys stored in adjacent locations of the hash table tend to get larger as insertions are made. This is because the hash address of a new key is more likely to fall within a large cluster, assuming a "random" hash function and "random" keys. A cluster gets larger when a new key is inserted at either end of the cluster, or when two separated clusters become connected because of the new key insertion. Both of these events occurred in the scaled-down social security number example. This phenomenon is called **primary clustering**. As the table usage ratio increases, primary clustering leads to greater search and insertion times. This is clearly undesirable. It happens because the large clusters must be traversed, with the average time being proportional to half the cluster length when keys are searched with equal frequency. Thus the insertion, or search for a key, begins to look like a linear search through a cluster.

To avoid primary clustering, try taking the next probe to be in a position removed from the current one instead of being adjacent to it. This is open addressing with **displaced linear probing**. Taking *p*(*k*) = *c* for some integer *c* that is not a factor of *m* will give such a probe sequence. To see this, take *m* = 13 and *c* = 5. The probe sequence is then generated by [*h*(*k*) + *i* × 5] mod 13 for *i* = 0, 1, 2, . . . , 12. If *h*(*k*) = 7, this gives the probe sequence

7, 12, 4, 9, 1, 6, 11, 3, 8, 0, 5, 10, 2
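This sequence can be generated mechanically; a small C sketch (the function name is an assumption for illustration):

```c
#include <assert.h>

/* Generates the open-addressing probe sequence
   [h + i*c] mod m for i = 0 .. m-1, where c is the fixed
   displacement (c = 5, m = 13 in the text's example). */
void probesequence(int h, int c, int m, int out[])
{
    int i;
    for (i = 0; i < m; i++)
        out[i] = (h + i * c) % m;
}
```

Because *c* is not a factor of *m*, all *m* table locations appear exactly once in the sequence.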

How can we eliminate or break up large clusters to improve these results? It can be done using open addressing with **secondary clustering**. Consider the following analogy. Suppose there are *m* stores, each selling a certain specialty item, and each with exactly one such item in stock. Assume *n* people decide to buy that item the same day, and they, in turn, each select one of the *m* stores at random. If a customer finds the item in stock at the shop of his or her choice, he or she buys it. However, if someone has already bought the item, it will be out of stock, and the person may ask the store manager for directions to another store. Let the stores be numbered from 0 to *m* - 1. Whenever this occurs, suppose the manager of the *i*th shop sends the customer to the (*i* + 1)th shop. Of course, the manager of the (*m* - 1)th store refers people to the 0th store. The result of all this feverish activity is that *all* customers arriving at a particular out-of-stock store are sent to the same next store. Again, at that store, they are *all* sent to the same next store whenever the item has been sold. It seems clear that this results in *primary clusters* of customers being formed. The physical locations of the stores are irrelevant to this process.

Keys   Hash      Index   Hash       Hash       Hash       Hash       Hash
       Address           Table (a)  Table (b)  Table (c)  Table (d)  Table (e)
-----------------------------------------------------------------------

019 19 0 802 (4) 663 (2) 364 (5) 364 (4) 392 (1)

392 01 1 392 (1) 392 (1) 392 (1) 392 (1)

179 18 2 364 (7)

359 14 3 321 (2)

663 19 4 364 (3) 566 (3)

262 09 5 097 (1) 097 (1) 097 (1) 097 (1) 097 (1)

639 18 6 720 (3) 260 (3) 663 (3)

321 22 7 720 (1) 720 (1) 639 (2) 720 (1) 720 (1)

097 05 8 468 (1) 468 (1) 468 (1) 468 (1) 468 (1)

468 08 9 262 (1) 262 (1) 262 (1) 262 (1) 262 (1)

814 09 10 814 (2) 976 (1) 976 (1) 814 (2) 976 (1)

720 07 11 260 (5) 260 (2) 976 (2) 364 (2)

260 07 12 976 (3) 566 (12) 814 (3) 321 (2)

802 20 13 814 (2) 566 (3)

364 19 14 359 (1) 359 (1) 359 (1) 359 (1) 359 (1)

976 10 15 774 (1) 774 (1) 774 (1) 774 (1) 774 (1)

774 15 16 566 (3) 566 (3)

566 14 17 260 (4) 639 (3) 260 (4)

18 179 (1) 179 (1) 179 (1) 179 (1) 179 (1)

19 019 (1) 019 (1) 019 (1) 019 (1) 019 (1)

20 663 (2) 802 (1) 663 (2) 663 (2) 802 (1)

21 639 (4) 802 (2) 802 (2) 814 (2)

22 321 (1) 639 (2) 321 (1) 321 (1) 639 (2)

                    linear    displaced   secondary    quadratic   double
                    probing   linear      clustering   residue     hashing
                              probing
Average number      2.22      2.00        1.89         1.72        1.61
of probes for a
successful search

Suppose each store manager, instead of giving identical directions to each would-be purchaser, sends customers to another store depending on which store they tried originally and on the number of stores they have already been to. The effect will be to disperse the unhappy customers. Rather than *all* these people being sent on to the same sequence of shops, only those with the same initial shop and the same number of previous attempts at buying are given identical directions. Clusters should still appear, but they should tend to be smaller than the primary clusters. This is, in fact, what happens, and the clusters that appear after such dispersion are called **secondary clusters.**

7, 5, 3, 1, 12, 10, 8, 6, 4, 2, 0, 11, 9

Any key hashing to 11 now (say, key 258) has the probe sequence

11, 0, 2, 4, 6, 8, 10, 12, 1, 3, 5, 7, 9

Open addressing with the **quadratic residue** technique attempts to break up primary clusters in a similar way, although it generates a probe sequence somewhat differently. The probe sequence for key *k* starts with *h*(*k*) and follows with

[*h*(*k*) + *i*^{2}] mod *m*, [*h*(*k*) - *i*^{2}] mod *m* for *i* = 1, 2, . . . , (*m* - 1)/2

[*h*(*k*) + 1^{2}] mod 23, [*h*(*k*) - 1^{2}] mod 23,

[*h*(*k*) + 2^{2}] mod 23, [*h*(*k*) - 2^{2}] mod 23, . . . ,

[*h*(*k*) + 11^{2}] mod 23, [*h*(*k*) - 11^{2}] mod 23

If *k* is 364, *h*(*k*) = 19, and the probe sequence is

19, 20, 18, 0, 15, 5, 10, 12, 3, 21, 17, 9, 6, 22, 16, 14, 1, 8, 7, 4, 11, 2, 13
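A sketch of generating such a sequence in C (the function name is an assumption; adding *m*² before taking the remainder keeps the left operand of % nonnegative):

```c
#include <assert.h>

/* Quadratic residue probe sequence: h(k), then
   [h(k) + i*i] mod m and [h(k) - i*i] mod m
   for i = 1 .. (m-1)/2.  Fills out[0 .. m-1]. */
void quadraticsequence(int h, int m, int out[])
{
    int i, j = 0;
    out[j++] = h;
    for (i = 1; i <= (m - 1) / 2; i++) {
        out[j++] = (h + i * i) % m;
        out[j++] = (h - i * i + m * m) % m;  /* m*m keeps this nonnegative */
    }
}
```

For a prime *m* with *m* mod 4 = 3, such as 23, the sequence visits every table location exactly once.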

Returning to the example of the out-of-stock specialty item, we might look for a remedy that avoids even these secondary clusters, since they also tend to increase search and insertion times. It seems reasonable to try to disperse these secondary clusters by giving each would-be customer a separate set of directions. This may be done by having store managers randomly select the next store for a customer to try. This idea is known as open addressing with *double hashing*. The technique uses a second hash function for the displacement *p*(*k*).

For instance, *p*(657) = 11, since 657 = 50 × 13 + 7 when *m* = 13 and 50 mod 13 = 11. Hence the probe sequence for 657 is

7, 5, 3, 1, 12, 10, 8, 6, 4, 2, 0, 11, 9

7, 4, 1, 11, 8, 5, 2, 12, 9, 6, 3, 0, 10
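Assuming the displacement function suggested by the example -- the quotient of the key by *m*, taken mod *m* -- the probe addresses can be sketched as follows. The guard against a zero displacement is also an assumption, added so the probe sequence always moves:

```c
#include <assert.h>

/* Second hash function for double hashing, inferred from the
   text's example p(657) = 11 with m = 13: the quotient of k
   by m, taken mod m.  A zero displacement is replaced by 1. */
int p(int k, int m)
{
    int d = (k / m) % m;
    return d == 0 ? 1 : d;
}

/* The i-th probe address for key k: [h(k) + i*p(k)] mod m,
   with h(k) = k mod m. */
int probe(int k, int i, int m)
{
    return (k % m + i * p(k, m)) % m;
}
```

With *k* = 657 and *m* = 13, successive probes give 7, 5, 3, . . . , matching the sequence above.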

Linear probing            7, 8, 9, 10, 11, 12              (7)
Displaced linear probing  7, 11, 15, 19, 0, 4, 8, 12, 16   (9)
Secondary clustering      7, 18, 6, 17, 5, 16              (6)
Quadratic residue         7, 8, 6, 11, 3                   (5)
Double hashing            7, 9, 11, 13                     (4)

*Chaining* is another method of collision resolution. The idea is to link together all records that collide at a hash address. These may be kept separately or within the hash table itself.
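Keeping the lists separately is ordinary separate chaining. A sketch in C, where each table slot heads a linked list of the records whose keys hash to that address (the names and table size are illustrative):

```c
#include <assert.h>
#include <stdlib.h>

#define M 13

typedef struct chainnode {
    int key;
    struct chainnode *next;
} chainnode;

/* Insert a key by linking it in at the head of the chain
   for its hash address. */
void chain_insert(chainnode *table[], int key)
{
    int addr = key % M;
    chainnode *n = malloc(sizeof(chainnode));
    n->key = key;
    n->next = table[addr];
    table[addr] = n;
}

/* Search only the chain at the key's hash address.
   Returns 1 if found, 0 otherwise. */
int chain_search(chainnode *table[], int key)
{
    chainnode *n;
    for (n = table[key % M]; n != NULL; n = n->next)
        if (n->key == key)
            return 1;
    return 0;
}
```

A search visits only the keys that actually collided at one address, rather than a probe sequence through the whole table.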

*Coalesced chaining* maintains lists of records whose keys hash to the same address. These lists are stored within the hash table itself. Whenever two such lists "intersect," they coalesce. This is analogous to two separated clusters becoming connected when a new key is inserted in the location that separated them. Each hash table entry is now a record containing a key value and a link value to the next record on its associated list.

If keys are entered into a hash table in sorted order, open-addressing and chaining techniques will always result in searches along a probe sequence or along a list, in which the keys appear in the same or reverse order, respectively. In both cases, the search algorithm may be easily modified so that whenever a key is encountered that precedes the search key in the relevant ordering, the search is immediately terminated unsuccessfully. This can significantly decrease search times for unsuccessful searches but will not affect successful search times. You can generate the new hash tables (a) through (g) of Figures 9.4 to 9.6 to see the effect on the tables of the keys appearing in sorted order.

Deletion in hash tables using separate chaining is straightforward, since it amounts to the deletion of a record from a list. With open-addressing and coalesced chaining, the deletion of a key is not so straightforward, since a key of the table will be needed to reach another key when it is on that key's probe sequence or list. Deleting it would break this path. One solution to this problem is to mark a deleted key location as "deleted," so that the path is not broken. When a significant fraction of keys has been deleted, the actual usage ratio of the resultant table is low, but search times will still reflect the original higher usage ratio. New insertions, when encountering a "deleted" location, may recapture this wasted storage by occupying such "deleted" locations. However, if "deleted" locations build up, without being recaptured by insertions, search times deteriorate to a linear search for unsuccessful searches and are larger than necessary for successful searches. The only remedy is to rebuild the hash table. Although the algorithm is not given here, this may be done in place, by using only storage allocated to the table in order to carry out the creation of a new table. Hash tables can also be expanded or decreased in size, and recreated. Again, this can be done in place.

We turn now to the next data structure: binary search trees. A *binary search tree* is a binary tree with the property that its left subtree contains records no larger than the root record, its right subtree contains records no smaller than the root record, and its left and right subtrees are also binary search trees. This is a recursive definition.

In Chapter 7, three important traversals through the nodes of any binary tree were defined: 1) preorder, 2) inorder, and 3) postorder. In order to search a binary tree to determine if a particular key value is stored, any one of those traversals could be used to gain access to each node in turn, and the node tested to see if the search key value is there. This would be equivalent to a linear search.

Assume that nodes of the binary search tree are represented as follows.

typedef struct treenode

{

whatever info;

struct treenode *leftptr;

struct treenode *rightptr;

}binarytreenode, *binarytreepointer;

This special search may be implemented as follows:

found = FALSE;

p = t;

pred = NULL;

while ((p != NULL) && (!found))

if (keyvalue == p->key)

found = TRUE;

else if (keyvalue < p->key)

{

pred = p;

p = p->leftptr;

}

else

{

pred = p;

p = p->rightptr;

}

Found indicates whether the search key value was present in the tree. If so, p points to the node in which the search key was found, and pred points to its predecessor. If the search key was not found, then pred points to the node that would have been its predecessor had it been in the tree. Note that this allows additional information to be obtained, if desired, beyond the fact that the search key was not found. The key less than the search key and the key greater than the search key can easily be determined, so that records closest to the desired one are known.

Notice that when (b) of Figure 9.7 is searched in this way, the search is equivalent to a linear search of an array in which the records are stored in sorted order. This search algorithm, applied to (c), would be equivalent to a binary search of an array in which the records are stored in sorted order. A search of (a) would fall somewhere between these two extremes.

What allows the search of Figure 9.7(c) to be equivalent to a binary search? It is the fact that the tree is "scrunched up," or near minimum possible depth, and not as straggly (unnecessarily deep) as the others. It has an intuitive kind of "balance," so that we eliminate from consideration about half the records each time we make a comparison. We are back to the game of "twenty questions."

This section develops an algorithm for creating or growing a binary search tree that stores records at its nodes. Imagine that you have already processed some records and created a binary search tree for them. Given the next input record, search the current tree for its key value. Assume no duplicates will occur in the tree, although this need not create any real problem, since a node can contain a pointer to a list, stored separately, of records with identical key values. Eventually, since the new record is not in the tree, a null subtree will be reached. However, if this null subtree is replaced by a node that stores this record, the resultant tree will still be a binary search tree. For instance, inserting 45 by this procedure in Figure 9.7(c) produces Figure 9.8. This gives an algorithm for growing a binary search tree. Start with the first input record at the root. For each additional input, search the tree. The search will reach a node whose left or right successor must be followed, but the successor is null. The record replaces that null subtree.

p = malloc(sizeof(binarytreenode));

p->key = keyvalue;

p->leftptr = NULL;

p->rightptr = NULL;

if (keyvalue < pred->key)

pred->leftptr = p;

else

pred->rightptr = p;

We now have algorithms for growing, searching, and inserting records into a binary search tree.
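These pieces can be combined into a small self-contained version, with integer keys standing in for the book's whatever type. The recursive bst_insert is an illustrative variant of the grow-and-insert algorithm, not the text's own code:

```c
#include <assert.h>
#include <stdlib.h>

typedef struct bstnode {
    int key;
    struct bstnode *leftptr, *rightptr;
} bstnode;

/* Grow the tree: search for the key and replace the null
   subtree reached by a new node.  Duplicates are ignored. */
bstnode *bst_insert(bstnode *t, int key)
{
    if (t == NULL) {
        bstnode *p = malloc(sizeof(bstnode));
        p->key = key;
        p->leftptr = p->rightptr = NULL;
        return p;
    }
    if (key < t->key)
        t->leftptr = bst_insert(t->leftptr, key);
    else if (key > t->key)
        t->rightptr = bst_insert(t->rightptr, key);
    return t;
}

/* Search as in the text: go left for smaller keys,
   right for larger.  Returns 1 if found, 0 otherwise. */
int bst_search(const bstnode *t, int key)
{
    while (t != NULL) {
        if (key == t->key)
            return 1;
        t = key < t->key ? t->leftptr : t->rightptr;
    }
    return 0;
}
```

Growing the tree by repeated insertion reproduces the shapes of Figure 9.7 for the corresponding input orders.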

The binary search tree grown by the algorithm of the preceding section will have its shape determined by the input order of the records to be stored. If they are input in sorted order or reverse sorted order, the result is Figure 9.7(b) or its mirror image. If they are input in the order 55, 40, 83, 35, 51, 80, 85, 17, 37, 50, 52, 70, 81, 84, 100, the result is Figure 9.7(c). There are *n*! possible input orderings of *n* distinct keys. Some of these may give rise to the same binary search tree. Recall that the tree grown cannot have depth greater than *n* nor less than lg(*n* + 1). Suppose the input is selected so that each of the *n*! possible orderings is equally likely to occur. This would be the case if the first input record is selected at random from among the *n* records, the second is selected at random from among the remaining *n* - 1 records, and so on. What will be the *average* depth of trees grown by the algorithm in this case? To calculate the average depth directly would mean to generate, for each of the *n*! orderings of input records, the binary search tree grown and note its depth. Then add up the *n*! depths and divide by *n*! to obtain the average depth.

It is difficult to analyze this problem using probability theory. However, the average depth has been shown to be *O*(*c* lg *n*), where *c* is about 4. The **average search time** can be calculated for a randomly grown binary search tree. It is the average time required to search for a record stored in the binary search tree, assuming each record is equally likely to be searched. This average search time is *O*(1.39 lg *n*). In the case of random input, the average search time of the binary search tree grown by the algorithm will be only 39 percent more than the smallest possible average search time! Moreover, the algorithm for growing the tree will take an average time at most *O*(*n* lg *n*). If the records are given in sorted order instead of in random order, it will take the algorithm *O*(*n*^{2}) time to grow the tree.

It is possible to grow "balanced" binary search trees, also called AVL trees (after the discoverers of these trees, Adelson-Velskii and Landis [1962]). The algorithm for growing AVL trees is slower and more complex, as is the insertion and deletion of records. However, they are guaranteed to have depth no greater than about 1.44 lg *n*. This is the *worst-case* depth. Consequently, AVL trees may be searched in time at most *O*(1.44 lg *n*). Such trees appear quite healthy and are never sparse. However, it takes great care to grow healthy trees, as will be seen later.

At this point all the important operations on binary search trees have been considered except deletion. The purpose of this section is to derive an algorithm for deleting a record from a binary search tree so that the remaining tree is still a binary search tree. To delete a terminal record, simply replace it by a null tree. To delete a record with only one successor, simply insert the successor's subtree at the position occupied by the deleted node. For instance, in the tree shown in Figure 9.9(a), deleting 51 and 71 yields the tree shown in Figure 9.9(b).

The "simple" algorithms for growing binary search trees and inserting and deleting records within them are relatively simple and have excellent time and storage requirements--on the average--when randomly done. These randomly grown trees can be fine for compiler symbol tables or for small data base applications such as dictionaries, but there are situations where good average behavior is not enough. Suppose your rocket ship's computer must search a binary search tree for retro-rocket firing instructions and must complete the search within a specific interval. This is not the time to find that the tree is out of shape and that a much longer than average time will ensue.

One answer is the AVL tree, which is discussed in this section. An *AVL tree* is a balanced binary tree that is also a binary search tree.

This imbalance is remedied by rearranging the subtree with root 55 as shown in Figure 9.10(c). Notice that the subtrees with roots 40 and 70 are no longer out of balance. In fact, the tree of Figure 9.10(c) is now an AVL tree. Notice also that the subtree in Figure 9.10(c) replacing the subtree of 55 in Figure 9.10(b) has exactly the same depth as the original subtree of 55 in Figure 9.10(a). If any of the other insertions that would have produced an imbalance in Figure 9.10(a) had occurred, they could have been remedied by similar rearrangements. You should confirm this by trying them all and finding the suitable rearrangements. Remember, the rearrangements must not only rebalance but also preserve the binary search tree property.

Consider imbalance *LL*. The remedy looks as though *A* has moved up to the root and *B* has moved to the right. This is called a *rotation to the right* of the subtree with *B* at its root. The remedy for imbalance *RR* would be described as a *rotation to the left* of the subtree with *B* at its root. To rectify imbalance *LR*, perform a rotation to the left of the subtree with root *A* to obtain Figure 9.14(a). A further rotation to the right of the subtree rooted at *C* produces (b) of Figure 9.14. Assume that rotations are accomplished by the procedures rotateleft and rotateright.

LL: rotateright(plast)               where last points to *B*
RR: rotateleft(plast)                where last points to *B*
LR: rotateleft(p(last.leftptr))      where last points to *C*
    rotateright(plast)
RL: rotateright(p(last.rightptr))    where last points to *C*
    rotateleft(plast)

The procedure for rotateright, for example, might be as follows:

rotateright(plocalroot)

/* Performs a rotation to the right

of the subtree whose root is

pointed to by localroot.

*/

binarytreepointer *plocalroot;

{

binarytreepointer q;

q = (*plocalroot)->leftptr;

(*plocalroot)->leftptr = q->rightptr;

q->rightptr = *plocalroot;

*plocalroot = q;

}
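A self-contained illustration with integer keys shows the effect of rotateright on an LL-imbalanced chain; mknode is a helper assumed for the example:

```c
#include <assert.h>
#include <stdlib.h>

typedef struct rnode {
    int key;
    struct rnode *leftptr, *rightptr;
} rnode;

/* Rotation to the right of the subtree whose root is
   pointed to by *plocalroot, as in the text. */
void rotateright(rnode **plocalroot)
{
    rnode *q = (*plocalroot)->leftptr;
    (*plocalroot)->leftptr = q->rightptr;  /* A's right subtree moves under B */
    q->rightptr = *plocalroot;             /* B becomes A's right child */
    *plocalroot = q;                       /* A is the new local root */
}

/* Helper for building test trees. */
rnode *mknode(int key, rnode *l, rnode *r)
{
    rnode *p = malloc(sizeof(rnode));
    p->key = key;
    p->leftptr = l;
    p->rightptr = r;
    return p;
}
```

Rotating the straggly chain 30-20-10 to the right brings 20 up to the local root with 10 and 30 as its subtrees, restoring balance while preserving the binary search tree property.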

typedef struct treenode

{

whatever key;

struct treenode *leftptr;

struct treenode *rightptr;

int balance;

}binarytreenode,*binarytreepointer;

The algorithm for *inserting a node into an AVL tree* can now be written.

a. Create a new node for it and set its field values.

b. Insert the new node into the tree.

c. If the tree has more than one node, then

If it is necessary to rebalance, then

ii. If this root was the root of the tree itself, then

set the head of the tree to this root; otherwise

set its predecessor to this root;

reset the balance of the predecessor of the inserted node.

During the search, an updated pointer to the predecessor node of last should be kept in predlast. The pointer p will point to the node whose key field is currently being compared with the search key, and predp points to its predecessor. An implementation of the AVL tree insertion is given in the insertavl function. For clarity, the procedures createnode and search are then given separately, followed by procedures insertnode and resetpathbalances.

#define TRUE 1

#define FALSE 0

#define NULL 0

insertavl(keyvalue,pt)

/* Inserts a new record with key equal to

keyvalue in the AVL tree t, if a record

with this keyvalue is not present in t.

After insertion t will still be an AVL tree.

*/

whatever keyvalue;

binarytreepointer *pt;

{

binarytreepointer p,predp,last,predlast,q;

int found;

search(keyvalue,*pt,&p,&predp,

&last,&predlast,&found);

if(!found)

{

createnode(keyvalue,&p);

insertnode(keyvalue,p,predp,pt);

if(p != *pt)

if(predp->balance==0)

if(keyvalue < last->key)

if(last->balance == 0)

{

last->balance = -1;

resetpathbalances(keyvalue,

last->leftptr,p);

}

else if(last->balance == +1)

{

last->balance = 0;

resetpathbalances(keyvalue,

last->leftptr,p);

}

else

{

q = last->leftptr;

if(keyvalue < q->key)

ll(keyvalue,&last,p);

else

lr(keyvalue,&last,&q,p);

last->balance = 0;

if(predlast == NULL)

*pt = last;

else if(keyvalue < predlast->key)

predlast->leftptr = last;

else

predlast->rightptr = last;

}

else

if(last->balance == 0)

{

last->balance = +1;

resetpathbalances(keyvalue,

last->rightptr,p);

}

else if(last->balance == -1)

{

last->balance = 0;

resetpathbalances(keyvalue,

last->rightptr,p);

}

else

{

q = last->rightptr;

if(keyvalue < q->key)

rr(keyvalue,&last,p);

else

rl(keyvalue,&last,&q,p);

last->balance = 0;

if(predlast == NULL)

*pt = last;

else if(keyvalue < predlast->key)

predlast->leftptr = last;

else

predlast->rightptr = last;

}

else

predp->balance = 0;

}

}

ll(keyvalue,plast,p)
/* Applies remedy ll, resets the balance of the node
   involved in the remedy but not on the new search
   path, and resets the proper new search path balances.
*/
whatever keyvalue;
binarytreepointer *plast,p;
{
   rotateright(plast);
   ((*plast)->rightptr)->balance = 0;
   resetpathbalances(keyvalue,(*plast)->leftptr,p);
}

lr(keyvalue,plast,pq,p)
/* Applies remedy lr, resets the balance of the node
   involved in the remedy but not on the new search
   path, and resets the proper new search path balances.
*/
whatever keyvalue;
binarytreepointer *plast,*pq,p;
{
   rotateleft(&((*plast)->leftptr));
   rotateright(plast);
   if(keyvalue < (*plast)->key)
   {
      (*pq)->balance = 0;
      ((*plast)->rightptr)->balance = +1;
      resetpathbalances(keyvalue,((*plast)->leftptr)->rightptr,p);
   }
   else if(keyvalue > (*plast)->key)
   {
      (*pq)->balance = -1;
      ((*plast)->rightptr)->balance = 0;
      resetpathbalances(keyvalue,((*plast)->rightptr)->leftptr,p);
   }
   else
      ((*plast)->rightptr)->balance = 0;
}

rr(keyvalue,plast,p)
/* Applies remedy rr, resets the balance of the node
   involved in the remedy but not on the new search
   path, and resets the proper new search path balances.
*/
whatever keyvalue;
binarytreepointer *plast,p;
{
   rotateleft(plast);
   ((*plast)->leftptr)->balance = 0;
   resetpathbalances(keyvalue,(*plast)->rightptr,p);
}

rl(keyvalue,plast,pq,p)
/* Applies remedy rl, resets the balance of the node
   involved in the remedy but not on the new search
   path, and resets the proper new search path balances.
*/
whatever keyvalue;
binarytreepointer *plast,*pq,p;
{
   rotateright(&((*plast)->rightptr));
   rotateleft(plast);
   if(keyvalue > (*plast)->key)
   {
      (*pq)->balance = 0;
      ((*plast)->leftptr)->balance = -1;
      resetpathbalances(keyvalue,((*plast)->rightptr)->leftptr,p);
   }
   else if(keyvalue < (*plast)->key)
   {
      (*pq)->balance = +1;
      ((*plast)->leftptr)->balance = 0;
      resetpathbalances(keyvalue,((*plast)->leftptr)->rightptr,p);
   }
   else
      ((*plast)->leftptr)->balance = 0;
}

rotateright(plocalroot)
/* Performs a rotation to the right
   of the subtree whose root is
   pointed to by plocalroot.
*/
binarytreepointer *plocalroot;
{
   binarytreepointer q;
   q = (*plocalroot)->leftptr;
   (*plocalroot)->leftptr = q->rightptr;
   q->rightptr = *plocalroot;
   *plocalroot = q;
}

rotateleft(plocalroot)
/* Performs a rotation to the left
   of the subtree whose root is
   pointed to by plocalroot.
*/
binarytreepointer *plocalroot;
{
   binarytreepointer q;
   q = (*plocalroot)->rightptr;
   (*plocalroot)->rightptr = q->leftptr;
   q->leftptr = *plocalroot;
   *plocalroot = q;
}

resetpathbalances(keyvalue,start,p)
/* Resets the balances of all nodes on the
   search path between the nodes pointed
   to by start and p.
*/
whatever keyvalue;
binarytreepointer start,p;
{
   binarytreepointer q;
   q = start;
   while(q != p)
      if(keyvalue < q->key)
      {
         q->balance = -1;
         q = q->leftptr;
      }
      else
      {
         q->balance = +1;
         q = q->rightptr;
      }
}

createnode(keyvalue,pp)
/* Returns with p pointing to a new node
   record with key equal to keyvalue,
   left and right pointer fields null, and
   balance field zero.
*/
whatever keyvalue;
binarytreepointer *pp;
{
   *pp = malloc(sizeof(binarytreenode));
   (*pp)->key = keyvalue;
   (*pp)->leftptr = NULL;
   (*pp)->rightptr = NULL;
   (*pp)->balance = 0;
}

search(keyvalue,t,pp,ppredp,plast,ppredlast,pfound)
/* Returns with found true only if the binary search
   tree t has a node with its key equal to keyvalue.
   If found is true,
      p and predp point, respectively, to
      the found node and its predecessor;
   else
      predp, last, and predlast will point, respectively,
      to the node that will be the new node's predecessor,
      the node along the search path closest to the new
      node's insertion point that may become unbalanced,
      and that node's predecessor.
*/
whatever keyvalue;
binarytreepointer t,*pp,*ppredp,*plast,*ppredlast;
int *pfound;
{
   *pfound = FALSE;
   *pp = t;
   *plast = t;
   *ppredp = NULL;
   *ppredlast = NULL;
   while((*pp != NULL) && (!*pfound))
      if(keyvalue < (*pp)->key)
      {
         if((*pp)->balance != 0)
         {
            *ppredlast = *ppredp;
            *plast = *pp;
         }
         *ppredp = *pp;
         *pp = (*pp)->leftptr;
      }
      else if(keyvalue > (*pp)->key)
      {
         if((*pp)->balance != 0)
         {
            *ppredlast = *ppredp;
            *plast = *pp;
         }
         *ppredp = *pp;
         *pp = (*pp)->rightptr;
      }
      else
         *pfound = TRUE;
}

insertnode(keyvalue,p,predp,pt)
/* Inserts the new node, pointed to
   by p, as the proper successor
   of predp.
*/
whatever keyvalue;
binarytreepointer p,predp,*pt;
{
   if(predp == NULL)
      *pt = p;
   else
      if(predp == *pt)
         if(keyvalue < (*pt)->key)
            (*pt)->leftptr = p;
         else
            (*pt)->rightptr = p;
      else
         if(keyvalue < predp->key)
            predp->leftptr = p;
         else
            predp->rightptr = p;
}

To insert a node in t:

If t is null, then
   create the new node, set its field values, and set t to it
else if keyvalue < t.key, then
   insert the node in t.leftptr
   if t is balanced, then
      reset its balance
   else
      rebalance t and reset the balances of the nodes involved in the remedy
else if keyvalue > t.key, then
   insert the node in t.rightptr
   if t is balanced, then
      reset its balance
   else
      rebalance t and reset the balances of the nodes involved in the remedy.

The recursive insertavl procedure is as follows:

#define TRUE 1
#define FALSE 0
#define NULL 0

insertavl(keyvalue,pt,pincrease)
/* Inserts a new record with key equal to keyvalue
   in the AVL tree t, if a record with this keyvalue
   is not present in t. After insertion t will still
   be an AVL tree. Increase must be zero when the
   function is invoked.
*/
whatever keyvalue;
binarytreepointer *pt;
int *pincrease;
{
   binarytreepointer q;
   if(*pt == NULL)
   {
      createnode(keyvalue,pt);
      *pincrease = 1;
   }
   else if(keyvalue < (*pt)->key)
   {
      insertavl(keyvalue,&((*pt)->leftptr),pincrease);
      if(*pincrease == 1)
         if((*pt)->balance == 0)
            (*pt)->balance = -1;
         else if((*pt)->balance == +1)
         {
            (*pt)->balance = 0;
            *pincrease = 0;
         }
         else
         {
            q = (*pt)->leftptr;
            if(keyvalue < q->key)
               ll(pt);
            else
               lr(pt,q);
            (*pt)->balance = 0;
            *pincrease = 0;   /* t's depth did not increase */
         }
   }
   else if(keyvalue > (*pt)->key)
   {
      insertavl(keyvalue,&((*pt)->rightptr),pincrease);
      if(*pincrease == 1)
         if((*pt)->balance == 0)
            (*pt)->balance = +1;
         else if((*pt)->balance == -1)
         {
            (*pt)->balance = 0;
            *pincrease = 0;
         }
         else
         {
            q = (*pt)->rightptr;
            if(keyvalue > q->key)
               rr(pt);
            else
               rl(pt,q);
            (*pt)->balance = 0;
            *pincrease = 0;
         }
   }
}

ll(pt)
/* Apply remedy ll and reset the
   balance of the node involved in
   the remedy.
*/
binarytreepointer *pt;
{
   rotateright(pt);
   reset1balance((*pt)->rightptr);
}

lr(pt,q)
/* Apply remedy lr and reset the balances
   of the two nodes involved in
   the remedy.
*/
binarytreepointer *pt,q;
{
   rotateleft(&((*pt)->leftptr));
   rotateright(pt);
   reset2balances(q,*pt,(*pt)->rightptr);
}

rr(pt)
/* Apply remedy rr and reset the
   balance of the node involved in
   the remedy.
*/
binarytreepointer *pt;
{
   rotateleft(pt);
   reset1balance((*pt)->leftptr);
}

rl(pt,q)
/* Apply remedy rl and reset the balances
   of the two nodes involved in the remedy.
*/
binarytreepointer *pt,q;
{
   rotateright(&((*pt)->rightptr));
   rotateleft(pt);
   reset2balances(q,*pt,(*pt)->leftptr);
}

rotateright(plocalroot)
/* Performs a rotation to the right
   of the subtree whose root is
   pointed to by plocalroot.
*/
binarytreepointer *plocalroot;
{
   binarytreepointer q;
   q = (*plocalroot)->leftptr;
   (*plocalroot)->leftptr = q->rightptr;
   q->rightptr = *plocalroot;
   *plocalroot = q;
}

rotateleft(plocalroot)
/* Performs a rotation to the left
   of the subtree whose root is
   pointed to by plocalroot.
*/
binarytreepointer *plocalroot;
{
   binarytreepointer q;
   q = (*plocalroot)->rightptr;
   (*plocalroot)->rightptr = q->leftptr;
   q->leftptr = *plocalroot;
   *plocalroot = q;
}

createnode(keyvalue,pp)
/* Returns with p pointing to a new node
   record with key equal to keyvalue,
   left and right pointer fields null,
   and balance field zero.
*/
whatever keyvalue;
binarytreepointer *pp;
{
   *pp = malloc(sizeof(binarytreenode));
   (*pp)->key = keyvalue;
   (*pp)->leftptr = NULL;
   (*pp)->rightptr = NULL;
   (*pp)->balance = 0;
}

The functions for resetting balances are

reset1balance(q)
/* Sets the balance of node
   q to zero.
*/
binarytreepointer q;
{
   q->balance = 0;
}

reset2balances(q,t,p)
/* Resets the balances of nodes
   p and q as determined by the
   balance of t.
*/
binarytreepointer q,t,p;
{
   if(t->balance == -1)
      p->balance = +1;
   else
      p->balance = 0;
   if(t->balance == +1)
      q->balance = -1;
   else
      q->balance = 0;
}

#include <stdio.h>
#define NULL 0
typedef int whatever;

typedef struct treenode
{
   whatever key;
   struct treenode *leftptr;
   struct treenode *rightptr;
   int balance;
} binarytreenode,*binarytreepointer;

binarytreepointer left(p)
/* Returns a copy of the left pointer
   of the node pointed to by p.
*/
binarytreepointer p;
{
   return(p->leftptr);
}

binarytreepointer right(p)
/* Returns a copy of the right pointer
   of the node pointed to by p.
*/
binarytreepointer p;
{
   return(p->rightptr);
}

info(l)
/* Returns the key field value of
   the node pointed to by l.
*/
binarytreepointer l;
{
   return(l->key);
}

bal(l)
/* Returns the balance field value of
   the node pointed to by l.
*/
binarytreepointer l;
{
   return(l->balance);
}

binarytreepointer setnull()
/* Returns a null pointer. */
{
   return(NULL);
}

#define SENTINEL -1

main()
/* Reads record information, creates an AVL tree
   storing the records, prints the key field values
   of the tree records in preorder access order
   after each record is inserted in the tree, and
   prints the key & balance field values of the
   final tree in preorder access order.
*/
{
   binarytreepointer t,setnull();
   int keyvalue,increase;
   t = setnull();
   printf("\n enter keyvalue or sentinel \n");
   scanf("%d",&keyvalue);
   while(keyvalue != SENTINEL)
   {
      increase = 0;
      insertavl(keyvalue,&t,&increase);
      printtree(t);
      printf("\n enter keyvalue or sentinel \n");
      scanf("%d",&keyvalue);
   }
   preorder(t);
}

printtree(t)
/* Prints the keys of t in
   preorder access order.
*/
binarytreepointer t,left(),right(),setnull();
{
   if(t != setnull())
   {
      printf("\n %d \n",info(t));
      printtree(left(t));
      printtree(right(t));
   }
}

preorder(t)
/* Prints the keys & balances of
   t in preorder access order.
*/
binarytreepointer t,left(),right(),setnull();
{
   if(t == setnull())
      printf("\n The tree is null \n");
   else
   {
      process(t);
      preorder(left(t));
      preorder(right(t));
   }
}

process(t)
/* Prints the key and balance fields of
   the node pointed to by t.
*/
binarytreepointer t;
{
   printf("\n key:%d balance:%d \n",info(t),bal(t));
}

Node 11, the predecessor of 12, is now unbalanced. Rebalancing can be achieved by applying remedy *LR* at node 11 to obtain Figure 9.16(c). Node 8, the predecessor of 11 before rebalancing, is now unbalanced. Rebalancing can be done by applying remedy *LL* at node 8 to obtain Figure 9.16(d).

A recursive deletion algorithm is sketched below.

To delete a node from t:

If t is not null, then
   if keyvalue < t.key, then
      delete the node from t.leftptr
      if t is balanced, then
         reset its balance
      else
         rebalance t and reset balances
   else if keyvalue > t.key, then
      delete the node from t.rightptr
      if t is balanced, then
         reset its balance
      else
         rebalance t and reset balances
   else
      delete the node with the largest key value in t.leftptr
      replace the root of t by that deleted node
      if t is balanced, then
         reset its balance
      else
         rebalance t and reset balances.
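The outline above can be fleshed out in C. The sketch below is a height-based variant: each node carries its subtree height rather than the balance field used by insertavl, which makes the "is t balanced" tests direct height comparisons. It is written in modern prototype style, and all names (node, rebalance, and so on) are illustrative rather than the book's.

```c
#include <stdlib.h>

/* Height-based AVL deletion sketch; a leaf has height 1. */
typedef struct node {
    int key;
    int height;
    struct node *left, *right;
} node;

static int height(node *t) { return t ? t->height : 0; }
static int max(int a, int b) { return a > b ? a : b; }

/* Single rotations; each returns the new subtree root. */
static node *rotright(node *t)
{
    node *q = t->left;
    t->left = q->right; q->right = t;
    t->height = 1 + max(height(t->left), height(t->right));
    q->height = 1 + max(height(q->left), height(q->right));
    return q;
}
static node *rotleft(node *t)
{
    node *q = t->right;
    t->right = q->left; q->left = t;
    t->height = 1 + max(height(t->left), height(t->right));
    q->height = 1 + max(height(q->left), height(q->right));
    return q;
}

/* Restore the AVL property at t after a change below it. */
static node *rebalance(node *t)
{
    int diff = height(t->left) - height(t->right);
    if (diff > 1) {                        /* left-heavy */
        if (height(t->left->left) < height(t->left->right))
            t->left = rotleft(t->left);    /* remedy LR */
        t = rotright(t);                   /* remedy LL */
    } else if (diff < -1) {                /* right-heavy */
        if (height(t->right->right) < height(t->right->left))
            t->right = rotright(t->right); /* remedy RL */
        t = rotleft(t);                    /* remedy RR */
    }
    t->height = 1 + max(height(t->left), height(t->right));
    return t;
}

static node *insert(node *t, int key)
{
    if (t == NULL) {
        t = malloc(sizeof *t);
        t->key = key; t->height = 1; t->left = t->right = NULL;
        return t;
    }
    if (key < t->key) t->left = insert(t->left, key);
    else if (key > t->key) t->right = insert(t->right, key);
    return rebalance(t);
}

/* Delete key from t, returning the new subtree root. A node with
   two children is replaced by the largest key in its left subtree,
   as in the outline above. */
static node *delete(node *t, int key)
{
    if (t == NULL) return NULL;
    if (key < t->key)
        t->left = delete(t->left, key);
    else if (key > t->key)
        t->right = delete(t->right, key);
    else if (t->left == NULL || t->right == NULL) {
        node *child = t->left ? t->left : t->right;
        free(t);
        return child;
    } else {
        node *pred = t->left;              /* largest in left subtree */
        while (pred->right != NULL) pred = pred->right;
        t->key = pred->key;
        t->left = delete(t->left, pred->key);
    }
    return rebalance(t);
}
```

Because the recursion rebalances on the way back up, a single deletion may trigger remedies at several ancestors, exactly the cascade seen in Figure 9.16.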

Given a key value to find, it is easy to search the AVL tree for it. However, consider the records stored at the nodes as ordered by their processing positions in an inorder traversal of the AVL tree. For instance, in the AVL tree of Figure 9.10, the ordering is

 1.  17        9.  60
 2.  35       10.  70
 3.  37       11.  80
 4.  40       12.  81
 5.  45       13.  83
 6.  50       14.  85
 7.  53       15. 100
 8.  55

Easily constructed binary search trees allow efficient insertion and deletion and can also be searched in average time *O*(lg *n*). More complex construction, insertion, and deletion algorithms for balanced binary search trees guarantee search, insertion, and deletion in worst-case time *O*(lg *n*). Binary search trees allow easy access to records in sorted order, and balanced binary search trees also allow *O*(lg *i*) access to the *i*th record in sorted order. It is not necessary to estimate in advance how many entries will be made in a binary search tree, as it is for hash tables. Hash tables can be searched very quickly, on the average, when properly constructed, but they do not allow records to be easily enumerated in sorted order.
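The *O*(lg *i*) access to the *i*th record depends on augmenting each node with a count of the nodes in its subtree, as the indexing of Section 9.4.6 does. The sketch below shows only the selection step, with a plain unbalanced insertion standing in for insertavl; the struct and function names (snode, sinsert, select_ith) are illustrative, and in a real AVL tree the rotations would also have to update the size fields.

```c
#include <stdlib.h>

/* Each node stores the number of nodes in its subtree, so that
   selecting the ith-smallest key walks a single root-to-node path. */
typedef struct snode {
    int key;
    int size;                 /* nodes in this subtree, including this one */
    struct snode *left, *right;
} snode;

static int size(snode *t) { return t ? t->size : 0; }

/* Plain (unbalanced) insertion that keeps the size fields correct. */
snode *sinsert(snode *t, int key)
{
    if (t == NULL) {
        t = malloc(sizeof *t);
        t->key = key; t->size = 1; t->left = t->right = NULL;
        return t;
    }
    if (key < t->key) t->left = sinsert(t->left, key);
    else if (key > t->key) t->right = sinsert(t->right, key);
    else return t;            /* duplicate: sizes unchanged */
    t->size = 1 + size(t->left) + size(t->right);
    return t;
}

/* Return the ith-smallest key (i counted from 1); the path visits
   O(depth) nodes, so a balanced tree gives O(lg n) time. */
int select_ith(snode *t, int i)
{
    while (t != NULL) {
        int leftsize = size(t->left);
        if (i == leftsize + 1)
            return t->key;
        if (i <= leftsize)
            t = t->left;
        else {
            i -= leftsize + 1;
            t = t->right;
        }
    }
    return -1;                /* i out of range */
}
```

On the fifteen keys listed above, select_ith with *i* = 4 would return the fourth entry of the inorder ordering, visiting only the nodes on one path from the root.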

**1.** Justify all the entries of Table 9.1 for the priority queue implementations.

**2.** Write two functions to insert and delete, respectively, records from a priority queue implemented as a heap.

**b.** How many probes were needed to insert each name?

**4.** Criticize the choice of hash function in Exercise 3.

**5.** Search the hash table of Exercise 3 for John Smith, and write down the probe addresses needed.

**6.** Show the state of hash tables (a) through (g) of Section 9.3 when key 019 is deleted.

**7.** Suppose that the eighteen keys are entered into the hash table of Section 9.3 in order, from largest to smallest. As you search for a key, you conclude that it is not in the table whenever you encounter a key value smaller than the search key as you trace out the linear probe path. Convince yourself that this is true, and that it will result in fewer probes for an unsuccessful search.

**8.** Write a function to build a hash table for each of the open-addressing and chaining techniques of Section 9.3.

**9.** Run each function of Exercise 8 with the input data and hash function of Exercise 3.

**10.** What is the maximum number of probes for an unsuccessful search of each of the hash tables (a)-(g) of Figures 9.4, 9.5, and 9.6?

**11.** Can you find a way to build a hash table as in Exercise 7 so that the sorted search can be done even if the input was not actually in sorted order?

**12.** How might the average number of probes be studied for a given hash function as in Exercise 3?

**13.** For *m* = 107, generate *n* random key values between 000 and 999 and build (by executing a program) hash tables (a) to (g) of Figures 9.4, 9.5, and 9.6 for these keys. Do this twenty-five times for each *n* and output the average number of probes to insert the *n* keys over the twenty-five samples. Do this repeatedly for a series of *n* values corresponding to usage ratios of approximately 0.50, 0.55, 0.60, . . ., 0.90, 0.95. Compare the results of this simulation to your expectations.

**14.** Take the nametable names of Chapter 6 and grow a simple binary search tree (as in Section 9.4) when the names are input in the order in which they appear in that figure. The order is determined alphabetically by last name, first name, and then middle initial. Thus Robert Evans precedes Harry Truman. If a duplicate name occurs, do not enter it again in the tree.

**15.** Delete Paulina Koch from the binary search tree of Exercise 14, using the simple deletion algorithm of Section 9.4.

**16.** Discuss the advantages and disadvantages of storing the nametable entries of Chapter 6 in a sorted array, sorted chain, binary search tree, AVL tree, and hash table.

**17.** Write a function, search, with parameters keyvalue, p, and t, a pointer to the root node of a binary search tree. Search is to return with p pointing to the node of t in which keyvalue equals p->key. If no such node exists, p should return with its value null.

**18.** Write a function, insertbst, with parameters keyvalue and t. It is to insert a node with keyvalue into the simple binary search tree t if no node exists with that key.

**19.** Write a function, deletebst, with parameters keyvalue and t. It is to delete a node whose key equals keyvalue in the simple binary search tree t, if such a node exists.

**20.** Should a linked or sequential implementation be used for binary search trees grown "simply" when they will contain fifty integers picked at random from the first 1,000 integers? Why?

**21.** What is the minimum depth of a binary search tree storing 137 entries, and how may it be constructed in time *O*(*n*) for *n* entries?

**22.** Should a linked or sequential implementation be used for a binary search tree grown as an AVL tree when it will contain 50 integers picked at random from the first 1,000 integers? Why? (Cf. Exercise 20.)

**23.** Follow the same directions as in Exercise 14, but grow a binary search tree that is an AVL tree. It should be the same AVL tree that would result if it were grown by invoking insertavl of Section 9.4.6 for each input name.

**24.** Create a binary search tree, with the names of Exercise 23 in it, which has minimum possible depth.

**25.** Delete the rightmost node of the AVL tree grown in Exercise 23, using the recursive deletion algorithm of Section 9.4.6.

**26.** Modify the nonrecursive function insertavl so that indexing can be done as described in Section 9.4.6.

**27.** Modify the recursive function insertavl so that indexing can be done as described in Section 9.4.6.

**28.** Why does growing and then inorder traversing an AVL tree correspond to an *O*(*n* lg *n*) sort?

**29.** Write a function search, with parameters indexvalue, p, and t. It is to return with p pointing to the node, if it exists, in the AVL tree t, whose index field is equal to indexvalue. P should be null if there is no such node.

**30.** How might we estimate the relation between the average depth of a simple binary search tree grown randomly and *n*, the number of inputs?

**31.** Suppose AVL trees are grown randomly. That is, each of the *n*! arrangements of *n* distinct inputs is equally likely to occur. Starting with the null tree, insertavl is invoked for each input. How might the average depth of such AVL trees be estimated as a function of *n*?

**1.** This is the same as the suggested assignment 2 of Chapter 2, except the integers for part (a) and the records for parts (b) and (c) are to be stored in a binary search tree.

**2.** Design the nametable for the case study of Chapter 6. Discuss the relative merits of arrays, lists, binary search trees, AVL trees, and hash tables for its implementation.

**3.** Write an efficient check (t,d,flag) to return in d the depth of the binary tree pointed to by t, and to return in flag *true* if t points to a balanced binary tree and *false *otherwise.