The notion of a symbol table arises frequently in computer science. When building loaders, assemblers, compilers, or any keyword driven translator a symbol table is used. In these contexts a symbol table is a *set of name-value pairs*. Associated with each name in the table is an attribute, a collection of attributes, or some directions about what further processing is needed. The operations that one generally wants to perform on symbol tables are (i) ask if a particular name is already present, (ii) retrieve the attributes of that name, (iii) insert a new name and its value, and (iv) delete a name and its value. In some applications, one wishes to perform only the first three of these operations. However, in its fullest generality we wish to study the representation of a structure which allows these four operations to be performed efficiently.

Imagine the following set of declarations for a language with block structure:

begin

integeri,j;

.

.

.

begin

realx,i;

.

.

The representation of the symbol table for these declarations would look like *S *=

INSERT(INSERT(INSERT(INSERT(CREATE,*i***,integer**),

INSERT(INSERT(INSERT(CREATE,*i*,**integer**),*j*,**integer**),*x*,**real**)

DELETE(INSERT(S,a,r),b) :: =

ifEQUAL(a,b)thenDELETE(S,b)

elseINSERT(DELETE(S,b),a,r)

structureSYMBOL__TABLE

declareCREATE( )symtb

INSERT(symtb,name, value)symtb

DELETE(symtb,name)symtb

FlND(symtb,name)value

HAS(symtb,name)Boolean

ISMTST(symtb)Boolean;

for allSsymtb, a,bname, rvaluelet

ISMTST(CREATE) :: =true

ISMTST(INSERT(S,a,r)) :: =false

HAS(CREATE,a) :: =false

HAS(INSERT(S,a,r),b) :: =

ifEQUAL(a,b)then true elseHAS(S,b)

DELETE(CREATE,a) :: =CREATE

DELETE(INSERT(S,a,r),b) :: =

ifEQUAL(a,b)thenS

elseINSERT(DELETE(S,b),a,r)

FIND(CREATE,a) :: =error

FIND(INSERT(S,a,r),b) :: =ifEQUAL(a,b)thenr

elseFIND(S,b)

end

endSYMBOL__TABLE

(ii) all identifiers in the right subtree of *T *are greater than the identifier in the root node *T*;

(iii) the left and right subtrees of *T* are also binary search trees.

procedureSEARCH(T,X,i)

//search the binary search treeTforX. Each node has fields

LCHILD, IDENT, RCHILD. Returni= 0 ifXis not inT.

Otherwise, returnisuch thatIDENT(i)= X.//

1iT

2whilei0do

3case

4 :X< IDENT(i):iLCHILD(i)//search left subtree//

5 :X = IDENT(i):return

6 :X> IDENT(i):iRCHILD(i)//search right subtree//

7end

8end

9endSEARCH

In evaluating binary search trees, it is useful to add a special "square" node at every place there is a null link. Doing this to the trees of figure 9.1 yields the trees of figure 9.3. Remember that every binary tree with *n* nodes has *n* + 1 null links and hence it will have *n* + 1 square nodes. We shall call these nodes *external* nodes because they are not part of the original tree. The remaining nodes will be called *internal* nodes. Each time a binary search tree is examined for an identifier which is not in the tree, the search terminates at an external node. Since all such searches represent unsuccessful searches, external nodes will also be referred to as *failure* nodes. A binary tree with external nodes added is an *extended binary tree*. Figure 9.3 shows the extended binary trees corresponding to the search trees of figure 9.1. We define the *external path length* of a binary tree to be the sum over all external nodes of the lengths of the paths from the root to those nodes. Analogously, the *internal path length* is defined to be the sum over all internal nodes of the lengths of the paths from the root to those nodes. For the tree of figure 9.3(a) we obtain its internal path length, *I*, to be:

Its external path length, *E*, is:

*E* = 2 + 2 + 4 + 4 + 3 + 2 = 17.

This can be more compactly written as

One tree with minimal internal path length is the complete binary tree defined in section 5.2.

Their respective weighted external path lengths are:

Binary trees with minimal weighted external path length find application in several areas. One application is to determine an optimal merge pattern for *n* + 1 runs using a 2-way merge. If we have four runs *R*_{1}-*R*_{4} with *q _{i}* being the number of records in run

Another application of binary trees with minimal external path length is to obtain an optimal set of codes for messages *M*_{1}, ...,*M _{n}* + 1. Each code is a binary string which will be used for transmission of the corresponding message. At the receiving end the code will be decoded using a decode tree. A decode tree is a binary tree in which external nodes represent messages. The binary bits in the code word for a message determine the branching needed at each level of the decode tree to reach the correct external node. For example, if we interpret a zero as a left branch and a one as a right branch, then the decode tree corresponds to codes 000, 001, 01 and 1 for messages

procedureHUFFMAN(L,n)

//Lis a list ofnsingle node binary trees as described above//

fori1ton- 1do//loopn- 1 times//

callGETNODE(T)//create a new binary tree//

LCHILD(T)LEAST(L)//by combining the trees//

RCH1LD(T)LEAST(L)//with the two smallest weights//

WEIGHT(T)WEIGHT(LCHILD(T))

+WEIGHT(RCHILD(T))

callINSERT(L,T)

end

endHUFFMAN

The weighted external path length of this tree is

2 4 + 3 4 + 5 3 + 13 2 + 7 2 + 9 2 = 93

In comparison, the best complete binary tree has weighted path length 95.

An *optimal binary search tree* for the identifier set *a*_{1}, ...,*a _{n}* is one which minimizes eq. (9.1) over all possible binary search trees for this identifier set. Note that since all searches must terminate either successfully or unsuccessfully we have

**Example 9.1:** The possible binary search trees for the identifier set (*a*_{1},*a*_{2},*a*_{3}) = (**do**, **if**, **stop**) are:

With equal probabilities *p _{i}* =

cost (treea) = 15/7; cost (treeb) = 13/7

cost (treec) = 15/7; cost (treed) = 15/7

cost (treee) = 15/7.

cost (treea) = 2.65; cost (treeb) = 1.9

cost (treec) = 1.5; cost (treed) = 2.05

cost (treee) = 1.6

Tree *c* is optimal with this assignment of *p's* and *q's*.

c=_{i,j}p+ cost (_{k}L) + cost (R) +w-1+_{i,k}w_{k,j}

weight (L) = weight (T-1) =_{i,k}w-1_{i,k}

weight (R) = weight (T) =_{k,j}w_{k,j}

c=_{i,j}p+_{k}w-1 +_{i,k}w+_{k,j}c-1 +_{i,k}c_{k,j}

=wi,j+c-1 +_{i,k}c_{k,j }

Since *T _{ij}* is optimal it follows from eq. (9.3) that

w+_{ij}c-1 + c_{i,k}= min {_{k,j}w+_{ij}c1 +_{i,l-}c}_{l,j}

i<lj

c-1 +_{i,k}c= min {_{k,j}c-1 +_{i,1}c}_{l,j}

i<lj

**Example 9.2:** Let *n *= 4 and (*a*_{1}, *a*_{2}, *a*_{3}, *a*_{4})_{ }= (**do**, **if**, **read**, **while**). Let (*p*_{1}, *p*_{2}, *p*_{3}, *p*_{4}) = (3,3,1,1) and (*q*_{0}, *q*_{1}, *q*_{2}, *q*_{3}, *q*_{4}) = (2,3,1,1,1). The *p*'s and *q*'s have been multiplied by 16 for convenience. Initially, we have *w _{i,i}* =

01 =^{w}1 +^{p}00 +^{w}11 =^{w}p1 +q1 +00 = 8^{w}

c01 =w01 + min {c00 +c11} = 8

r01 = 1

w12 =p2 +w11 +w22 =p2 +q2 +w11 = 7

c12 =w12 + min {c11 +c22} = 7

r12 = 2

23 =^{w}p3 +w22 +w33 =p3 +q3 +w22 = 3

c23 =w23 + min {c22 +c33} = 3

r23 = 3

w34 =p4 +w33 +w44 =p4 +q4 +w33 = 3

34 =^{c}w34 + min {c33 +c44} = 3

34= 4^{r}

procedureOBST(p)_{i},q_{i},n

//Givenndistinct identifiersa_{1}<a_{2}< ... <aand probabilities_{n}p,_{i}

1inandq, 0_{i}inthis algorithm computes the costc_{ij}

of optimal binary search treesTfor identifiers_{ij}a+1, ...,_{i}a._{j}

It also computesr, the root of_{ij}T._{ij}wis the weight of_{ij}T_{ij}//

fori0ton- 1do

(w,_{ii}r,_{ii}c) (_{ii}q,0,0)_{i}//initialize//

(w+1,_{i,i}r+1,_{i,i}c+1) (_{i,i}q+_{i}q+1 +_{i}p+1,_{i}i+1,q+_{i}q+1 +_{i}p+1)_{i}

//optimal trees with one node//

end

(w,_{nn}r,_{nn}c) (_{nn}q,0,0)_{n}

form2 tondo//find optimal trees withmnodes//

fori0ton-mdo

ji+m

w_{ij}w-1 +_{i,j}p+_{j}q_{j}

ka value of l in the ranger-1_{i,j}lr+1,_{i}j

that minimizes{c-1 +_{i,l}c}_{l,j}//solve (9.4) using

Knuth's result//

c_{ij}w+ c_{ij}+_{i,k-1}ceq. (9.3)_{k,j }////

r_{ij}k

end

end

endOBST

Dynamic tables may also be maintained as binary search trees. An identifier *X* may be inserted into a binary search tree *T *by using the search algorithm of the previous section to determine the failure node corresponding to *X. *This gives the position in *T *where the insertion is to be made. Figure 9.7 shows the binary search tree obtained by entering the months JANUARY to DECEMBER in that order into an initially empty binary search tree. Algorithm BST is a more formal rendition of the insertion process just described.

procedureBST(X,T,j)

//search the binary search treeTfor the nodejsuch thatIDENT(j)

=X. IfXis not already in the table then it is entered at the

appropriate point. Each node hasLCHILD,IDENTandRCHILD

fields//

p0;jT//pwill trailjthrough the tree//

whilej0do

case

:X < IDENT(j):pj; jLCHILD(j)//search left

sub-tree//

:X = IDENT(j):return

:X > IDENT(j):pj; jRCHILD(j)//search right

sub-tree//

end

end

//Xis not in the tree and can be entered as a child ofp//

callGETNODE(j);IDENT(j)X; LCHILD(j) 0;

RCHILD(j) 0

case

:T= 0:Tj//insert into empty tree//

:X<IDENT(p):LCHILD(p)j

:else:RCHILD(p)j

end

endBST

Adelson-Velskii and Landis in 1962 introduced a binary tree structure that is balanced with respect to the heights of subtrees. As a result of the balanced nature of this type of tree, dynamic retrievals can be performed in *O*(log *n*) time if the tree has *n* nodes in it. At the same time a new identifier can be entered or deleted from such a tree in time* O*(log *n*). The resulting tree remains height balanced. The tree structure introduced by them is given the name AVL-tree. As with binary trees it is natural to define AVL trees recursively.

**Definition: **The *balance factor*, *BF*(*T*), of a node *T* in a binary tree is defined to be *h _{L} *-

LL: new node Y is inserted in the left subtree of the left subtree of A

LR: Y is inserted in the right subtree of the left subtree of A

RR: Y is inserted in the right subtree of the right subtree of A

RL: Y is inserted in the left subtree of the right subtree of A

procedureAVL-INSERT(X,Y,T,)

//the identifierXis inserted into theAVLtree with rootT. Each

node is assumed to have an identifier fieldIDENT, left and right

child fieldsLCHILDandRCHILDand a two bit balance factor

fieldBF. BF(P) = height ofLCHILD(P) - height ofRCHILD(P).

Yis set such thatIDENT(Y) =X.//

//special case: empty treeT= 0.//

ifT= 0then[callGETNODE(Y);IDENT(Y)X; TY;

BF(T) 0;

LCHILD(T) 0;RCHILD(T) 0;return]

//Phase 1: Locate insertion point forX. A keeps track of most recent

node with balance factor 1 andFis the parent ofA.Qfollows

Pthrough the tree.//

F0;AT;PT;Q0

whileP0do//searchTfor insertion point forX//

ifBF(P) 0then[AP;FQ]

case

:X<IDENT(P):QP;PLCHILD(P)//take left

branch//

:X>IDENT(P):QP;PRCHILD(P)//take right

branch//

:else:YP;return//Xis inT//

end

end

//Phase 2: Insert and rebalance.Xis not inTand may be inserted

as the appropriate child of Q//

callGETNODE(Y);IDENT(Y)X;LCHILD(Y) 0;

RCHILD(Y) 0;BF(Y) 0

ifX<IDENT(Q)thenLCHILD(Q)Y//insert as left child//

elseRCHILD(Q)Y//insert as right child//

//adjust balance factors of nodes on path fromAtoQ. Note that

by the definition ofA, all nodes on this path must have balance

factors of 0 and so will change to 1.d= +1 impliesXis inserted

in left subtree ofA.d= -1 impliesXis inserted in right subtree

ofA//

ifX>IDENT(A)then[PRCHILD(A);BP;d-1]

else[PLCHILD(A);BP;d+1]

whilePYdo

ifX>IDENT(P)

then[BF(P) -1;PRCHILD(P)] //heigth of right

increases by 1//

else[BF(P) +1;PLCHILD(P)] //height of left

increases by 1//

end

//Is tree unbalanced?//

ifBF(A) = 0then[BF(A)d;return//tree still balanced//

ifBF(A) +d= 0then[BF(A) 0;return//tree is balanced//

//tree unbalanced, determine rotation type//

ifd= +1then//left imbalance//

case

:BF(B) = +1;//rotation typeLL//

LCHILD(A)RCHILD(B);RCHILD(B)A;

BF(A)BF(B) 0

:else://typeLR//

CRCHILD(B)

RCHILD(B)LCHILD(C)

LCHILD(A)RCHILD(C)

LCHILD(C)B

RCHILD(C)A

case

:BF(C) = +1:BF(A) -1;BF(B) 0//LR(b)//

:BF(C) = -1:BF(B) +1;BF(A) 0//LR(c)//

:else:BF(B) 0;BF(A) 0;//LR(a)//

end

BF(C) 0;BC//Bis a new root//

end

else[//right imbalance; this is symmetric to left imbalance//

//and is left as an exercise//]

//subtree with root B has been rebalanced and is the new subtree//

//ofF. The original subtree ofFhad rootA

case

:F =0 :TB

:A = LCHILD(F) :LCHILD(F)B

:A = RCHILD(F) :RCHILD(F)B

end

endAVL__INSERT

Operation Sequential List Linked List AVL-Tree

---------------------------------------------------------------------

Search for X O(log n) O(n) O(log n)

Search for k^{th }item O(1) O(k) O(log n)

Delete X O(n) O(1) O(log n)

(doubly linked list

and position of X

known)

Delete k^{th }item O(n - k) O(k) O(log n)

Insert X O(n) O(1) O(log n)

(if position for

insertion is known)

Output in order O(n) O(n) O(n)

In tree tables, the search for an identifier key is carried out via a sequence of comparisons. Hashing differs from this in that the address or location of an identifier, *X*, is obtained by computing some arithmetic function, *f*, of *X*. *f*(*X*) gives the address of *X* in the table. This address will be referred to as the hash or home address of *X*. The memory available to maintain the symbol table is assumed to be sequential. This memory is referred to as the hash table, *HT*. The hash table is partitioned into *b* buckets, *HT*(0), ...,*HT*(*b* - 1). Each bucket is capable of holding *s* records. Thus, a bucket is said to consist of *s* slots, each slot being large enough to hold 1 record. Usually *s* = 1 and each bucket can hold exactly 1 record. A hashing function, *f*(*X*), is used to perform an identifier transformation on* X*. *f*(*X*) maps the set of possible identifiers onto the integers 0 through *b* - 1. If the identifiers were drawn from the set of all legal Fortran variable names then there would be *T = ** _{0}*i

A hashing function, *f*, transforms an identifier *X* into a bucket address in the hash table. As mentioned earlier the desired properties of such a function are that it be easily computable and that it minimize the number of collisions. A function such as the one discussed earlier is not a very good choice for a hash function for symbol tables even though it is fairly easy to compute. The reason for this is that it depends only on the first character in the identifier. Since many programs use several identifiers with the same first letter, we expect several collisions to occur. In general, then, we would like the function to depend upon all the characters in the identifiers. In addition, we would like the hash function to be such that it does not result in a biased use of the hash table for random inputs. I.e., if *X *is an identifier chosen at random from the identifier space, then we want the probability that *f*(*X*)* = i* to be *1/b* for all buckets *i*. Then a random *X *has an equal chance of hashing into any of the *b *buckets. A hash function satisfying this property will be termed a *uniform hash function.*

Several kinds of uniform hash functions are in use. We shall describe four of these.

IDENTIFIER INTERNAL REPRESENTATION

X X X^{2}

--------------------------------------------------

A 1 1

A1 134 20420

A2 135 20711

A3 136 21204

A4 137 21501

A9 144 23420

B 2 4

C 3 11

G 7 61

DMAX 4150130 21526443617100

DMAX1 415013034 5264473522151420

AMAX 1150130 1345423617100

AMAX1 115013034 3454246522151420

(f(_{D}X) -f(_{D}Y))modp

= (64 mod 3C(x_{1})mod 3 +C(x_{2})mod 3

- 64 mod 3C(x_{2})mod 3 -C(x_{1})mod 3) mod 3

=C(x_{1})mod 3 +C(x_{2})mod 3 -C(x_{2})mod 3 -C(x_{1})mod 3

= 0 mod 3.

In order to be able to detect collisions and overflows, it is necessary to initialize the hash table, *HT, *to represent the situation when all slots are empty. Assuming that no record has identifier zero, then all slots may be initialized to zero.* When a new identifier gets hashed into a full bucket, it is necessary to find another bucket for this identifier.The simplest solution would probably be to find the closest unfilled bucket. Let us illustrate this on a 26-bucket table with one slot per bucket. Assume the identifiers to be *GA, D, A, G, L, A*2,* A*1,* A*3, *A*4, *Z, ZA, E*. For simplicity we choose the hash function *f*(*X*) = first character of *X.* Initially, all the entries in the table are zero. *f*(*GA*) = 7, this bucket is empty, so *GA *and any other information making up the record are entered into *HT*(7). *D* and *A* get entered into the buckets *HT*(4) and *HT*(1) respectively. The next identifier *G* has *f*(*G*) = 7. This slot is already used by *GA*. The next vacant slot is *HT*(8) and so *G* is entered there. *L* enters *HT*(12). *A*2 collides with *A *at *HT*(1), the bucket overflows and *A*2 is entered at the next vacant slot *HT*(2). *A*1, *A*3 and *A*4 are entered at *HT*(3), *HT*(5) and *HT*(6) respectively. *Z* is entered at *HT*(26), *ZA* at *HT*(9), (the hash table is used circularly), and *E* collides with *A*3 at *HT*(5) and is eventually entered at *HT*(10). Figure 9.17 shows the resulting table. This method of resolving overflows is known as *linear probing* or *linear open addressing*.

procedureLINSRCH(X,HT,b,j)

//search the hash tableHT(0: b -1) (each bucket has exactly 1

slot) using linear probing. IfHT(j= 0 then thej-th bucket is empty

andXcan be entered into the table. OtherwiseHT(j) =Xwhich

is already in the table.fis the hash function//

if(X);ji// compute hash address forX//

whileHT(j)XandHT(j) 0do

j(j+ 1)modb//treat the table as circular//

ifj=ithencallTABLE_FULL//no empty slots//

end

endLINSRCH

One of the problems with linear open addressing is that it tends to create clusters of identifiers. Moreover, these clusters tend to merge as more identifiers are entered, leading to bigger clusters. Some improvement in the growth of clusters and hence in the average number of probes needed for searching can be obtained by *quadratic probing*. Linear probing was characterized by searching the buckets (*f*(*X*) *+ i*) mod *b*; 0 *i* *b* - 1 where *b* is the number of buckets in the table. In quadratic probing, a quadratic function of i is used as the increment. In particular, the search is carried out by examining buckets *f*(*X*), (*f*(*X*) *+ i ^{2}*) mod

Prime j Prime j

-------------------------

3 0 43 10

7 1 59 14

11 2 127 31

19 4 251 62

23 5 503 125

31 7 1019 254

One of the reasons linear probing and its variations perform poorly is that searching for an identifier involves comparison of identifiers with different hash values. In the hash table of figure 9.17, for instance, searching for the identifier *ZA* involved comparisons with the buckets *HT*(1) to *HT*(8), even though none of the identifiers in these buckets had a collision with *HT*(26) and so could not possibly be *ZA*. ManäÂÑãÒøÔŒµß“EäÂÑãÒøÔŒµß made could be saved if we maintained lists of identifiers, one list per bucket, each list containing all the synonyms for that bucket. If this were done, a search would then involve computing the hash address *f*(*X*) and examining only those identifiers in the list for *f*(*X*). Since the sizes of these lists is not known in advance, the best way to maintain them is as linked chains. Additional space for a link is required in each slot. Each chain will have a head node. The head node, however, will usually be much smaller than the other nodes since it has to retain only a link. Since the lists are to be accessed at random, the head nodes should be sequential. We assume they are numbered 1 to *M* if the hash function *f* has range 1 to *M*.

procedureCHNSRCH(X,HT,b,j)

//search the hash tableHT(0:b- 1) forX. EitherHT(i) = 0 or it

is a pointer to the list of identifiersX:f(X) =i. List nodes have

fieldIDENTandLINK. Eitherjpoints to the node already containing

Xorj= 0//

jHT(f(X))//compute head node address//

//search the chain starting atj//

whilej0andIDENT(j)Xdo

jLINK(j)

end

endCHNSRCH

**Theorem 9.1 **Let =* n/b* be the loading density of a hash table using a uniform hashing function *f.* Then:

(i) for linear open addressing

(ii) for rehashing, random probing and quadratic probing

The *O*(*n*^{2}) optimum binary search tree algorithm is from:

"Optimum Binary Search Trees" by D. Knuth, *Acta Informatica,* vol. 1, Fasc 1, 1971, pp. 14-25.

For a discussion of heuristics that obtain in *O*(*n*log*n*) time nearly optimal binary search trees see:

"Nearly Optimal Binary Search Trees," by K. Melhorn, *Acta Informatica,* vol. 5, pp. 287-295, 1975.

Additional algorithms to manipulate AVL trees may be found in:

Results of an empirical study on height balanced trees appear in:

Several interesting and enlightening works on hash tables exist. Some of these are:

"Scatter storage techniques" by R. Morris, *CACM,* vol. 11, no. 1, January 1968, pp. 38-44.

A method to avoid hash table initialization may be found in:

"Algorithms for sets and related problems," by T. Gonzalez, University of Oklahoma, Nov. 1975.

**1.** (a) Prove by induction that if T is a binary tree with n internal nodes, I its internal path length and E its external path length, then E = I + 2n, n 0.

**2.** (a) Show that algorithm HUFFMAN correctly generates a binary tree of minimal weighted external path length.

**3.** Using algorithm OBST compute *w _{ij}*,

**4. **(a) Show that the computing time of algorithm OBST is *O*(*n*^{2}).

**5. **Since, often, only the approximate values of the *p*'s and *q*'s are known, it is perhaps just as meaningful to find a binary search tree that is nearly optimal i.e. its cost, eq. (9.1), is almost minimal for the given *p*'s and *q*'s. This exercise explores an *O*(*n*log*n*) algorithm that results in nearly optimal binary search trees. The search tree heuristic we shall study is:

An analysis of the performance of this heuristic may be found in the paper by Melhorn.

**6. **(a) Convince yourself that Figures 9.11 and 9.12 take care of all the possible situations that may arise when a height balanced binary tree becomes unbalanced as a result of an insertion. Alternately come up with an example that is not covered by any of the cases in Figures 9.11 and 9.12.

(b) Complete Figure 9.12 by drawing the tree configurations for the rotations *RL* (a), (b) and (c).

**7. **Complete algorithm AVL--INSERT by filling in the steps needed to rebalance the tree in case the imbalance was of type *RL*.

**8. **Obtain the height balanced trees corresponding to those of Figure 9.10 using algorithm AVL--INSERT, starting with an empty tree, on the following sequence of insertions:

DECEMBER, JANUARY, APRIL, MARCH, JULY, AUGUST, OCTOBER, FEBRUARY, NOVEMBER, MAY, JUNE.

Label the rotations according to type.

**9. **Assume that each node in an AVL tree *T* has the field LSIZE. For any node, *A*, LSIZE(*A*) is the number of nodes in its left subtree +1. Write an algorithm AVL--FINDK(*T,k*) to locate the k^{th} smallest identifier in the subtree *T*. Show that this can be done in *O*(log *n*) time if there are *n* nodes in *T*.

**10. **Rewrite algorithm AVL--INSERT with the added assumption that each node has a LSIZE field as in exercise 9. Show that the insertion time remains *O*(log *n*).

**11. **Write an algorithm to list the nodes of an AVL-tree *T* in ascending order of IDENT fields. Show that this can be done in *O*(*n*) time if *T* has *n* nodes.

**12. **Write an algorithm to delete the node with identifier *X* from an AVL-tree *T*. The resulting tree should be restructured if necessary. Show that the time required for this is *O*(log* n*) when there are *n* nodes in *T*.

**13. **Do exercise 12 for the case when each node has a LSIZE field and the *k ^{th }*smallest identifier is to be deleted.

**14.*** *Write an algorithm to merge the nodes of the two AVL-trees

**15. **Write an algorithm to split an AVL tree, *T,* into two AVL trees *T*_{1 }and *T*_{2} such that all identifiers in *T _{1}* are

**16. **Prove by induction that the minimum number of nodes in an AVL tree of height *h* is *N _{h} = F_{h+}*2 - 1,

**17. **Write an algorithm to delete identifier *X *from a hash table which uses hash function *f *and linear open addressing to resolve collisions. Show that simply setting the slot previously occupied by *X *to 0 does not solve the problem. How must algorithm LINSRCH be modified so that a correct search is made in the situation when deletions are permitted? Where can a new identifier be inserted?

**18. **(i) Show that if quadratic searching is carried out in the sequence (*f*(*x*) + *q*^{2}), f(x) + (q - 1)^{2}),...,(*f*(*x*) + 1),*f*(*x*),(*f*(*x*) - 1), ...,(*f*(*x*) - *q*^{2}) with *q* = (*b -* 1)12 then the address difference mod *b *between successive buckets being examined is

*b* - 2, *b* - 4, *b -* 6, ...,5, 3, 1, 1, 3, 5, ...,*b -* 6, *b *- 4, *b *- 2

**19. **[Morris 1968] In random probing, the search for an identifier, *X*, in a hash table with *b* buckets is carried out by examining buckets *f*(*x*), *f*(*x*) + *S*(*i*), 1 i *b *- 1 where* S*(*i*) is a pseudo random number. The random number generator must satisfy the property that every number from 1 to *b* - 1 must be generated exactly once. (i) Show that for a table of size 2* ^{r}*, the following sequence of computations generates numbers with this property:

Initialize *R* to 1 each time the search routine is called.

On successive calls for a random number do the following:

**20. **Write an algorithm to list all the identifiers in a hash table in lexicographic order. Assume the hash function *f* is *f*(*X*) = first character of *X* and linear probing is used. How much time does your algorithm take?

**21. **Let the binary representation of identifier *X* be *x*_{1}*x*_{2}. Let | *x* | denote the number of bits in *x* and let the first bit of x_{1} be 1. Let | x_{1 }| = | x | /2 and | x_{2 }| = | x | /2 . Consider the following hash function

*f*(*X*) = middle *k* bits of (*x*_{1} *XOR x*_{2})

**22.** [T. Gonzalez] Design a symbol table representation which allows one to search, insert and delete an identifier *X *in *O*(1) time. Assume that 1 *X* *m* and that *m + n* units of space are available where *n* is the number of insertions to be made (Hint: use two arrays *A*(1:*n*) and *B*(1:*m*) where *A*(*i*) will be the *i*th identifier inserted into the table. If *X* is the *i*th identifier inserted then *B*(*X*)* = i*.). Write algorithms to search, insert and delete identifiers. Note that you cannot initialize either *A* or *B* to zero as this would take *O*(*n + m*) time. Note that *X* is an integer.

**23.** [T. Gonzalez] Let *S = *{*x*_{1},*x*_{2},* ...*,*x _{n}*} and

**24.** [T. Gonzalez] Using the idea of exercise 22 write an *O*(*n + m*) time algorithm to carry out the function of algorithm VERIFY2 of section 7.1. How much space does your algorithm need?

**25. **Complete table 9.12 by adding a column for hashing.

**26. **For a fixed *k, k * 1 we define a height balanced tree *HB*(*k*) as below:

For the case of *HB*(2) trees obtain the rebalancing transformations.

**27. **Write an insertion algorithm for *HB*(2) trees.

**28. **Using the notation of section 9.3.3 show that when linear open addressing is used:

Using this equation and the approximate equality:

**29.** [Guttag] The following set of operations define a symbol table which handles a language with block structure. Give a set of axioms for these operations:

*ENTERB*--indicates a new block has been entered;

*ADD*--places an identifier and its attributes in the table;

*LEAVEB*--deletes all identifiers which are defined in the innermost block;

*RETRIEVE*--returns the attributes of the most recently defined identifier;

*ISINB*--returns true if the identifier is defined in the innermost block else false.