# CHAPTER 8: BINARY TREES

As an example, if Faye has children Alia, March, and Ralph, two of whom have children, their family relationship may be described by Figure 8.1. The notation associated with such structures is fairly comfortable for anyone conversant with both families and botany. Faye is the root node of a tree, Faye's Family. The tree rooted at Faye has three branches, and they are called subtrees. The subtree rooted at Alia has two branches, at Ralph one, and at March none. Wrye, Jan, March, and Will are leaf nodes of Faye's Family and of its subtrees. All other nodes are interior nodes. Alia and Faye are both ancestors of Wrye and Jan.

#### Figure 8.1

In the literature, Faye is commonly called the father of March, Alia, and Ralph, and these three are the sons of Faye. (It is perfectly acceptable to use mother and daughter, as is done for cells in biology.) Wrye and Jan are siblings, often called brothers. Faye is the grandfather of Jan.

Faye's Family can be modeled by a data structure with links representing the directed access from a node to its children. The many applications for general tree structures are limited by their own generality: the nodes of a general tree might need to have any number of links. In some applications, such as representations of family trees, it may not always be possible to foretell how many links are enough. Even if a maximum is imposed, most of the links of most nodes may be unused.

#### Figure 8.2

A useful compromise that retains a one-to-many relationship and yet controls the proliferation of children is the binary tree. Each node of a binary tree may have a left child and a right child, which is not the same as simply having two children. The trees in Figure 8.2 are equivalent as trees, but not equivalent as binary trees: T1 ≠ T2 because their left children (and, redundantly, their right children) differ. Will may be implemented as either the left child or the right child of Ralph, giving rise to two distinct binary trees, shown in Figure 8.3.

#### Figure 8.3

General trees with more than two children per node can be transformed into binary trees by the rule:

Order the children of a node N of tree T to obtain children c1, c2, . . ., ck. Then c1 becomes the left child of N. If N is the (i - 1)st child of a node P and there is an ith child of P, then that ith child of P becomes the right child of N.
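This rule can be sketched in Python; the `GNode`/`BNode` classes and the `to_binary` helper are illustrative names, not from the text:

```python
class GNode:
    """General-tree node: a value and an ordered list of children."""
    def __init__(self, value, children=None):
        self.value = value
        self.children = children or []

class BNode:
    """Binary-tree node."""
    def __init__(self, value):
        self.value = value
        self.left = None
        self.right = None

def to_binary(n):
    """c1 becomes the left child of N; each later child becomes
    the right child of the child before it."""
    if n is None:
        return None
    b = BNode(n.value)
    prev = None
    for c in n.children:
        cb = to_binary(c)
        if prev is None:
            b.left = cb          # c1 becomes the left child of N
        else:
            prev.right = cb      # the ith child hangs to the right of the (i-1)st
        prev = cb
    return b

# Faye's Family from Figure 8.1
faye = GNode("Faye", [GNode("Alia", [GNode("Wrye"), GNode("Jan")]),
                      GNode("March"),
                      GNode("Ralph", [GNode("Will")])])
bt = to_binary(faye)
```

Applied to Faye's Family, the transformation yields the node levels listed for BT1 in section 8.1.2 (Alia at level 1, Wrye and March at level 2, and so on).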

#### Figure 8.4

Note: The distinction between binary trees and trees is important to recognize, but it is common to ignore the distinction in situations where it is not significant.

One advantage that tree-like structures have over linear linked lists is that, on the average, fewer links need to be processed to get from one node to another (assuming, of course, that the choice of which branch to take can be determined). The binary tree BT1 in Figure 8.4 has given up some of the advantage inherent in the original tree because there are more links between the root and some leaves than there were. That advantage can be regained by balancing the binary tree so that the distances from root to leaves tend to be as near the minimum as the application allows. In this particular case, rearranging the binary tree would lose the information modeled by it; for some applications, however, balancing is an advantage.

We will return to the issue of balancing in section 8.7--after examining general features of binary trees in section 8.1, their implementation in section 8.2, finding our way around them in sections 8.3 and 8.4, ordering them in section 8.5, and applying them in section 8.6. Section 8.8 exhibits one more way to walk around a tree.

Exercises E8.1 and E8.2 are appropriate at this point.

## 8.1.1 Operations

The nature of the structure BINARY TREE is determined by the operations applied to it. Binary trees can be empty, and they remain empty until a node is added to them; their locator, root, therefore has an appropriate initial value: NIL (or zero for an array implementation). The genesis of a binary tree is an operation formally called CREATE.

Deletions from a binary tree can remove all of its nodes, and so root may be dynamically reset to NIL (or zero for an array implementation). The formal name of the operation that checks the viability of a binary tree is EMPTY.

A node may be inserted into a binary tree at any link that is NIL. Such a node (any node!) is itself the root node of a binary tree. If the attached binary tree is a singleton, it becomes a leaf node. The formal operation involved is INSERT, but the question of where to insert depends heavily on the application. INSERT can be universal only if it inserts a subtree as a given child of a given node, replacing whatever was there.

The left child or right child of a given node may be set to NIL, which detaches the binary tree rooted at that child and hence partitions the binary tree into two binary trees. Note that--like attachment--partitioning deals with a subtree, not necessarily a single node. Deletion of a single node, like insertion of a single node, is application-dependent and of considerable interest in the management of binary trees.

In distinction from stacks and queues, all nodes of a binary tree are accessible by following links from the root node. The ways a binary tree can be systematically traversed to visit every node are treated in sections 8.3 and 8.4. In many applications information is available to make a choice at each node; there is a guide to following a link to the left or to the right. When this is possible, the process of locating a node can be much faster than for a general traverse. The efficiency of this guided branching is one major advantage of trees over both linear structures and more general structures (such as graphs).

Because every node in a binary tree can be located, values can be retrieved from and assigned to every node. These operations do not change the structure; they update it.

## 8.1.2 Node Levels and Counts

The root of a binary tree is at level zero and a child of a node is at the next higher level. The depth of a binary tree is the maximum level of any node.

For example, in BT1,

Faye is at level 0.

Alia is at level 1.

Wrye and March are at level 2.

Jan and Ralph are at level 3.

Will is at level 4.

#### Figure 8.5

The maximum number of nodes at level 0 in a binary tree is 1, at level 1 it is 2, at level 2 it is 2 x 2 = 4, and so on. In general:

There are no more than 2^L nodes at level L in a binary tree.

There are no more than 2^(d+1) - 1 nodes in a binary tree of depth d.

(Both results can be proved by easy induction arguments.)

#### Figure 8.6

Every time a new internal node replaces an external node, it increases both the number of internal nodes and the number of external nodes by 1. Hence, by induction, a binary tree with n internal nodes has n + 1 external nodes.
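The induction can be checked directly by counting: a minimal Python sketch (the `counts` helper is an illustrative name) that treats every NIL link as an external node of the extended tree:

```python
class Node:
    def __init__(self, value, left=None, right=None):
        self.value, self.left, self.right = value, left, right

def counts(bt):
    """Return (internal, external) counts for the extended tree rooted
    at bt: every NIL link counts as one external node."""
    if bt is None:
        return (0, 1)
    li, le = counts(bt.left)
    ri, re = counts(bt.right)
    return (1 + li + ri, le + re)

# The shape does not matter: n internal nodes always give n + 1 external nodes.
t = Node(1, Node(2, None, Node(3)), Node(4))
```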

The averages of the internal and external path lengths provide a measure of the average number of links needed to reach nodes during searches from the root. For BT3 these are:

A binary tree is full if each of its nodes is either a leaf or has two children, never just one. A binary tree is complete if all leaf nodes lie on the lowest levels and those on the bottom level are as far left as possible. Figure 8.7 shows an example of each. A complete binary tree has minimal depth for the number of nodes it contains.

#### Figure 8.7

Binary tree BT4 is full but not complete, and BT5 is complete.

If there are N nodes in a binary tree of minimal depth, then the path from the root to a leaf has no more than INT(lg N) links. The potential minimal cost of a search for a node is that of a binary search. In practice, the information required to know which way to branch at a node during a search is incorporated into the shape of the binary tree and tends to increase its depth.

Exercise E8.3 and problem P8.1 are appropriate at this point.

## 8.1.3 Shape

The two binary trees in Figure 8.8 are both constructed so that their node values can be retrieved in nondecreasing order with the same standard algorithm (InOrder; see section 8.3):

#### Figure 8.8

The location of the value 2 by a standard search algorithm (BSTT; see section 8.5.1) applied to BT7 takes a third as many link-hops as it does for BT6. If the values in these trees were arranged at random, the expected number of hops would be intermediate between these extremes. An insertion algorithm guaranteed to produce a more efficient binary tree form is itself less efficient than one that simply builds a tree suitable for searching. The compromise between forcing the shape of a binary tree and losing the speed of location within it is a central issue in section 8.7.

## 8.1.4 On the Use of Binary Trees

Trees are a naturally occurring pattern that can be modeled closely by data structures. Their fundamental feature is branching. Branching is an essential part of language and thought--of any conceptual framework that involves choices. The binary tree reduces branching to its most elementary form, two-way branching. Each branch that is taken in a binary tree also represents a branch that is not taken and sharpens the focus of attention. A sequence of branches sharpens exponentially, much like binary search.

The data structure BINARY TREE is a decision tool, a separator of subsets. Ways to exhaustively traverse a binary tree are useful and sometimes necessary, but in many major applications nodes are reached by relatively short sequences of branches. BINARY TREE provides guided access to an arbitrary node in the structure. It is the data structure of choice when such guidance can be built into the structure.

## 8.2 Implementations of Binary Trees

When binary trees are implemented with pointer-linked nodes, the nodes are constructed with three fields: a Value field and two link fields, Left and Right.

A binary tree rooted at node btLeft is linked as the left child of a node bt simply by bt↑.Left ← btLeft, as shown in Figure 8.9. Linking to a right child is similar.

## 8.2.1 Array Support and the Heapform

#### Figure 8.10

A binary tree may be implemented in a single array A[1 . . Max] in such a way that no explicit links are needed:

The left child of node A[i] is A[2i].

The right child of node A[i] is A[2i + 1].

Note that Max must allow enough room for the rightmost leaf of the structure.

When a complete binary tree of n nodes is stored in an array in heapform, it occupies the span [1 . . n], a very compact, convenient, and efficient arrangement.
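The heapform index arithmetic is just three one-line functions; a sketch in Python (`left`, `right`, and `parent` are illustrative names):

```python
def left(i):
    return 2 * i            # index of the left child of A[i]

def right(i):
    return 2 * i + 1        # index of the right child of A[i]

def parent(i):
    return i // 2           # index of the parent of A[i]; the root is A[1]

# A complete binary tree of 6 nodes occupies A[1..6]; A[0] is unused.
A = [None, 9, 7, 8, 3, 1, 4]
```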

Exercise E8.4 and problem P8.2 are appropriate at this point.

## 8.3 Recursive Binary Tree Traverses

Three kinds of operations are required to visit and process the nodes of a binary tree:

a move to the left child

a move to the right child

processing a node in some manner

It can be advantageous to process a node when:

It is first reached.

Its left child (subtree) has been processed.

Its right child (subtree) has been processed.

It is possible to mix these six things in a number of ways, but two conventions are used to reduce the number of possibilities:

The left subtree of a node is visited before its right subtree.

A node is processed only once during a traverse.

`procedure PreOrder(bt)                               {O(n)`

`if bt = NIL`

`then return endif`

`PROCESS1(bt)`

`PreOrder(bt↑.Left)`

`PreOrder(bt↑.Right)`

`end {PreOrder`

OR

`procedure PreOrder(bt)                               {O(n)`

`if bt ≠ NIL`

`then PROCESS1(bt)`

`PreOrder(bt↑.Left)`

`PreOrder(bt↑.Right)`

`endif`

`end {PreOrder`

`procedure InOrder(bt)                                {O(n)`

`if bt = NIL then return endif`

`InOrder(bt↑.Left)`

`PROCESS2(bt)`

`InOrder(bt↑.Right)`

`end {InOrder`

`procedure PostOrder(bt)                              {O(n)`

`if bt = NIL then return endif`

`PostOrder(bt↑.Left)`

`PostOrder(bt↑.Right)`

`PROCESS3(bt)`

`end {PostOrder`

These may be combined into one:

`RBTT`

`procedure RBTT(bt)                                   {O(n)`

`if bt = NIL then return endif`

`PROCESS1(bt)`

`RBTT(bt↑.Left)                                       {Recursive Binary Tree Traverse`

`PROCESS2(bt)`

`RBTT(bt↑.Right)`

`PROCESS3(bt)`

`end {RBTT`

Consider the binary tree shown in Figure 8.11. Any traverse visits A (and processes it or not), then B (and processes it or not), then C, then B again, and so on. Each node is visited three times and processed on (at least) one of its visits.

#### Figure 8.11

`A  B  C  C  C  B  D  E  E  E  D  F  F  F  D  B  A  G  G  G  A`

`1  1  1  2  3  2  1  1  2  3  2  1  2  3  3  3  2  1  2  3  3`

The three standard traverses select the first, second, or third visit to a node in the visitation sequence for processing. Suppose, for example, that PROCESSi is simply a display of the node value. Then traverses of BT8 produce:

`Preorder: A  B  C  D  E  F  G`

`Inorder: C  B  E  D  F  A  G`

`Postorder: C  E  F  D  B  G  A`
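The three traverses can be sketched in Python; BT8 is rebuilt here from the visitation trace, and `Node` is an assumed minimal class:

```python
class Node:
    def __init__(self, value, left=None, right=None):
        self.value, self.left, self.right = value, left, right

def preorder(bt, out):
    if bt is None:
        return
    out.append(bt.value)          # PROCESS1: on first contact
    preorder(bt.left, out)
    preorder(bt.right, out)

def inorder(bt, out):
    if bt is None:
        return
    inorder(bt.left, out)
    out.append(bt.value)          # PROCESS2: after the left subtree
    inorder(bt.right, out)

def postorder(bt, out):
    if bt is None:
        return
    postorder(bt.left, out)
    postorder(bt.right, out)
    out.append(bt.value)          # PROCESS3: after both subtrees

# BT8, reconstructed from the visitation sequence above
bt8 = Node("A",
           Node("B", Node("C"),
                     Node("D", Node("E"), Node("F"))),
           Node("G"))

pre, ino, post = [], [], []
preorder(bt8, pre)
inorder(bt8, ino)
postorder(bt8, post)
```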

Program PG8.1 is appropriate at this point.

## 8.3.1 A Recursive Tree Display

#### Figure 8.12

Display 2 is an application of the "breadth-first" traverse of section 8.4.3. (The values are displaced from the center of the display-line by an amount determined by their depth. There must be enough spread to accommodate the bottom line, which may have 2^d items in a tree of depth d.)

Display 1 can be done with a recursive traverse that violates the left-first rule. The version that follows prints, for a node at depth d, d blank strings followed by the node value, so each value is indented by its depth. It is initially called with d = 0:

`procedure SideView(d,bt)                             {O(n)`

`if bt ≠ NIL`

`then SideView(d + 1,bt↑.Right)`

`Writeln({d blank strings},bt↑.Value)`

`SideView(d + 1,bt↑.Left)`

`endif`

`end {SideView`

## 8.3.2 Traversals of Arithmetic-Expression Trees

#### Figure 8.13

The postorder traverse of BT9 produces

`C E F + * G /`

which, when the usual precedence is applied, represents the postfix form of the infix expression:

`C * (E + F) / G`

The preorder traverse produces:

`/ * C + E F G`

the prefix form of the same expression. If we think of an operation as adding left and right parentheses around its results, the inorder traverse of BT9 produces:

`((C * (E + F)) / G)`

To be sure, producing an expression with correct precedence requires either additional processing or additional (parenthesis) nodes, but the order of the tokens is that of the infix expression. The parenthesized expression can be generated by processing an operation node three times: on contact, display "("; on return from the left child, display the operation; and on return from the right child, display ")".
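A sketch of these traversals over an expression tree, assuming BT9 has the shape implied by C * (E + F) / G (`Node` and the function names are illustrative):

```python
class Node:
    def __init__(self, value, left=None, right=None):
        self.value, self.left, self.right = value, left, right

def postfix(bt):
    if bt is None:
        return []
    return postfix(bt.left) + postfix(bt.right) + [bt.value]

def prefix(bt):
    if bt is None:
        return []
    return [bt.value] + prefix(bt.left) + prefix(bt.right)

def infix(bt):
    """Fully parenthesized inorder: '(' on contact, the operator between
    the subtrees, ')' on the way out.  Operands are leaves."""
    if bt.left is None:
        return bt.value
    return "(" + infix(bt.left) + " " + bt.value + " " + infix(bt.right) + ")"

# BT9: the expression C * (E + F) / G
bt9 = Node("/",
           Node("*", Node("C"), Node("+", Node("E"), Node("F"))),
           Node("G"))
```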

Exercise E8.5 and problem P8.3 are appropriate at this point.

## 8.3.3 A Copy Procedure (optional)

One fundamental tool that can be applied to a binary tree is a procedure to make a copy of it. The recursive version is particularly apt, because it only requires that a copy be made of both the left and right subtrees and that they be joined:

`function TreeCopy(bt)`

`if bt = NIL then return NIL endif`

`NEW(p)`

`p↑ ← bt↑`

`p↑.Left ← TreeCopy(bt↑.Left)`

`p↑.Right ← TreeCopy(bt↑.Right)`

`return p`

`end {TreeCopy`
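A Python rendering of the same idea (`tree_copy` and `Node` are assumed names; copying the value stands in for p↑ ← bt↑):

```python
class Node:
    def __init__(self, value, left=None, right=None):
        self.value, self.left, self.right = value, left, right

def tree_copy(bt):
    """Copy the left and right subtrees, then join them under a new node."""
    if bt is None:
        return None
    p = Node(bt.value)             # NEW(p); copy the record's value
    p.left = tree_copy(bt.left)
    p.right = tree_copy(bt.right)
    return p

orig = Node("A", Node("B"), Node("C", Node("D")))
dup = tree_copy(orig)
```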

## 8.4 Iterative Tree Walks

Explicit backtracking is usually done by stacking node pointers (indices in an array implementation) and then unstacking them to make a return. Explicitly programmed stacks contain only the necessary pointers, whereas a recursive routine is compiled so that the entire routine can be reentered. Additional information is stacked to make that possible.

A binary tree can be traversed by stacking each node when it is first encountered, and then again just prior to traversing each of its subtrees. Each node is then stacked and unstacked three times. The order of stacking is arranged so that when a subtree traverse is complete, the parent node will be unstacked next. Some way is needed to distinguish between three encounters with a node during the traverse: the initial contact and two subtree returns. Processing at these three encounters corresponds precisely to preorder, inorder, and postorder processing. All three processing modes can be combined in one algorithm, or done with walks tuned for them individually.

## 8.4.1 A General Binary Tree Walk: BTT

For the algorithm that follows, nodes of a tree structure are assumed to contain an integer field, Visit, which counts the encounters of the walk with the node:

`VisitNode = RECORD`

`Value  : {value type`

`Left   : ↑VisitNode`

`Right  : ↑VisitNode`

`Visit  : INTEGER`

`END`

The Visit field of each node is assumed to contain 1 at the beginning of the walk and is left that way on exit. An explicit stack, NodeStack, replaces the implicit stack of the recursive procedures. NodeStack is a stack in which the Key field of a node contains a pointer to a tree node. NodeStack may be implemented with records, but it is less obtrusive to think of it as array-based so that Pop and Push take tree node pointers as arguments without the need to encase those pointers in stack nodes.

`BTT`

`procedure BTT(bt)                                    {O(n)`

`if bt = NIL then return endif`

`Push(bt)`

`while NOT Empty(NodeStack) do                        {Binary Tree Traverse`

`Pop(p)`

`case of p↑.Visit`

`1 : PROCESS1(p)                                      {Prefix processing`

`p↑.Visit ← 2`

`Push(p)`

`if p↑.Left ≠ NIL then Push(p↑.Left) endif`

`2 : PROCESS2(p)                                      {Infix processing`

`p↑.Visit ← 3`

`Push(p)`

`if p↑.Right ≠ NIL then Push(p↑.Right) endif`

`3 : PROCESS3(p)                                      {Postfix processing`

`p↑.Visit ← 1`

`endcase`

`endwhile`

`end {BTT`

BTT can be used as a preorder listing procedure by setting processes 2 and 3 to NULL and using PROCESS1 as a display. The operation of the stack during such a walk can be traced by indicating a snapshot of it each time endcase is reached. For a preorder treewalk on BT8:

`stack           output`

`A   B              A`

`A   B   C          B`

`A   B   C          C`

`A   B   C`

`A   B`

`A   B   D`

`A   B   D   E      D`

`A   B   D   E      E`

`A   B   D   E`

`A   B   D`

`A   B   D   F`

`A   B   D   F      F`

`A   B   D   F`

`A   B   D`

`A   B`

`A`

`A   G`

`A   G`

`A   G       G`

`A`

This causes more stack activity than would a tree-walk algorithm with a planned restriction to preorder processing.

Problems P8.4 and P8.5 and program PG8.2 are appropriate at this point.

## 8.4.2 Individualized Iterative Tree Walks (optional)

BTT works the stack heavily: every node is popped from it three times. A quite different approach is based on the statement:

A tree walk is made by stepping left as far as possible, then taking a step to the right.

For the preorder display, the form is:

`procedure PreStack(bt)                               {O(n)`

`if bt = NIL then return endif`

`p ← bt`

`repeat`

`while p ≠ NIL do`

`PROCESS1(p)`

`Push(p)`

`p ← p↑.Left`

`endwhile`

`if Empty(NodeStack)`

`then return`

`else Pop(p)`

`p ← p↑.Right`

`endif`

`forever`

`end {PreStack`
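PreStack translates to a compact Python sketch (`pre_stack` is an assumed name; BT8 from section 8.3 serves as the test tree):

```python
class Node:
    def __init__(self, value, left=None, right=None):
        self.value, self.left, self.right = value, left, right

def pre_stack(bt):
    """Step left as far as possible (processing and pushing on the way),
    then pop and take one step to the right."""
    out, stack, p = [], [], bt
    while True:
        while p is not None:
            out.append(p.value)     # PROCESS1
            stack.append(p)         # Push(p)
            p = p.left
        if not stack:
            return out
        p = stack.pop().right       # Pop(p), then one step right

bt8 = Node("A",
           Node("B", Node("C"), Node("D", Node("E"), Node("F"))),
           Node("G"))
```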

The corresponding inorder walk is derived from PreStack simply by moving the processing position. Adapting this kind of walk to postorder processing involves reintroducing a Visit field, although only two values are required; in an array implementation the (integer) pointers may therefore simply be negated to mark a previously visited node.

`procedure PostStack(bt)                              {O(n)`

`if bt = NIL then return endif`

`p ← bt`

`repeat`

`while p ≠ NIL do`

`Push(p)`

`p ← p↑.Left`

`endwhile`

`Pop(p)`

`while p↑.Visit = 3 do`

`PROCESS3(p)`

`p↑.Visit ← 1`

`if Empty(NodeStack) then return endif`

`Pop(p)`

`endwhile`

`p↑.Visit ← 3`

`Push(p)`

`p ← p↑.Right`

`forever`

`end {PostStack`

Exercises E8.6 and E8.7 are appropriate at this point.

## 8.4.3 A Breadth-First Traverse (optional)

The binary tree traverses of sections 8.3, 8.4.1, and 8.4.2 all move from the root of a tree first to the deepest level of the leftmost path or the rightmost path (in SideView), then move to siblings on the same level. This is called depth-first traversal. An alternate class of traverses that may be applied to trees in general as well as binary trees is a breadth-first traversal.

With the help of a queue, Q, and the queue operations applicable to it, InsertRear and DeleteFront, it is possible to process nodes in the order of their levels, left-to-right within each level.

A node p of Q consists of a pointer p↑.ptr to a tree node and a queue-link field. The insertion of a tree node t into Q in bare outline is:

`procedure Insert(Q,t)                                {O(1)`

`NEW(p)`

`p↑.ptr ← t`

`InsertRear(Q,p)`

`end {Insert`

The level order traverse is then:

`procedure LevelOrder(BinaryTree,Q)                   {O(n)`

`if BinaryTree = NIL then return endif                {guard the empty tree`

`Insert(Q,BinaryTree)`

`while NOT Empty(Q) do`

`p ← Front(Q)`

`t ← p↑.ptr`

`DeleteFront(Q)`

`PROCESS(t)`

`if t↑.Left ≠ NIL`

`then Insert(Q,t↑.Left)`

`endif`

`if t↑.Right ≠ NIL`

`then Insert(Q,t↑.Right)`

`endif`

`endwhile`

`end {LevelOrder`
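LevelOrder can be sketched in Python with the standard-library deque standing in for Q (`level_order` is an assumed name):

```python
from collections import deque

class Node:
    def __init__(self, value, left=None, right=None):
        self.value, self.left, self.right = value, left, right

def level_order(bt):
    """Process nodes in order of their levels, left to right within each."""
    out, q = [], deque()
    if bt is not None:
        q.append(bt)                # Insert(Q, BinaryTree)
    while q:
        t = q.popleft()             # Front(Q) + DeleteFront(Q)
        out.append(t.value)         # PROCESS(t)
        if t.left is not None:
            q.append(t.left)        # InsertRear
        if t.right is not None:
            q.append(t.right)
    return out

bt8 = Node("A",
           Node("B", Node("C"), Node("D", Node("E"), Node("F"))),
           Node("G"))
```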

When PROCESS(t) is a display routine, LevelOrder is the basis of a top-down display of a binary tree, in contrast to the sideways display of SideView (section 8.3.1). A display of this kind requires that the queue nodes contain information about both horizontal and vertical positioning. It is an interesting but reasonable challenge to adapt LevelOrder to that task.

Program PG8.3 is appropriate at this time.

The traversal technique of LevelOrder is generalized to use a priority queue and applied to graph structures in Chapter 10.

## 8.5 Ordered Binary Trees

`p↑.Left ← st     or     p↑.Right ← st`

Applying these statements with st = NIL removes a subtree without regard to what it contains. Thus, the fundamental issue in most applications is what to do with the other nodes of a subtree when a single node p is added or deleted at its root. Both subtrees of p cannot be used as direct replacements for p when it is deleted, and so decisions about what to do with a subtree involve the relationship of the subtree nodes to the rest of the structure. Such relationships generally exhibit some structural parameter, a restriction on the placement of nodes in the binary tree. In particular, binary trees may be ordered.

A binary tree bt may be constructed so that it has property order:

If p is any node of bt, no node in the left subtree (rooted at p↑.Left) has a value that is as large as p↑.Value, and no node in the right subtree (rooted at p↑.Right) has a value that is as small as p↑.Value. This imposes the restriction that no value be duplicated.

## 8.5.1 A Binary Search Tree Traverse: BSTT

BT10 in Figure 8.14 is a BST but would not be one if the value 15 were replaced by 25 because the defining condition would be violated for the root node.

#### Figure 8.14

Note that an inorder traverse of BT10 returns the values in the defining order. (This statement can be used as a definition of a BST.) The creation of a BST is thus a sorting method. Without control of the depth of the tree, it is an O(n^2) procedure.

The values of a BST do not, of course, need to be numeric. Any operator that has the mathematical property of transitivity (A.op.B and B.op.C implies A.op.C), together with the values on which it operates, will do for the formation of a BST.

The traverse of a BST is greatly simplified because no backtracking is required--the order property guides a search directly to the appropriate node (which may be external). It is:

`BSTT`

`procedure BSTT(bt,v)                                 {O(n)`

`PROCESS0`

`p ← bt                                               {BST Traverse`

`while p ≠ NIL do`

`case`

`p↑.Value = v : PROCESSe`

`return`

`p↑.Value > v : PROCESS1`

`p ← p↑.Left`

`p↑.Value < v : PROCESSr`

`p ← p↑.Right`

`endcase`

`endwhile`

`PROCESSx`

`end {BSTT`

## 8.5.2 Insertion and Deletion in a BST

In a BST, the insertion and deletion of nodes must be done in a way that maintains the structural parameter order. Insertion replaces an external node of the extended tree and only involves finding the appropriate parent and which child of the parent the inserted node is to become. Deletion may be of an interior node and involve shifting subtrees. The first task of either is to determine where in the binary tree a node belongs and if that spot is occupied.

Three possibilities need to be determined:

The value is already in the BST.

The value is (or would become) the left child of node Pred (if inserted).

The value is (or would become) the right child of a node Pred (if inserted).

The procedure given here finds the value v in the binary search tree bt. It returns pointer p to the node in bt with value v, the predecessor Pred of p, and a signal Trio. Trio is zero or negative if v was found to be the value of an internal node, and positive if it was not found. Zero values of Trio can occur in two ways: when bt = NIL or when v was found to be the value of the root node. For a non-NIL bt, Trio is 1 if p is the left child of Pred, and 2 if it is the right child.

NodeSpot is derived from BSTT by:

`PROCESS0:  Trio ← 0`

`Pred ← NIL`

`PROCESSe:  Trio ← -Trio`

`PROCESS1:  Trio ← 1`

`Pred ← p`

`PROCESSr:  Trio ← 2`

`Pred ← p`

`PROCESSx:  NULL`

The resulting procedure is:

`procedure NodeSpot(bt,v,Trio,p,Pred)                 {O(depth)`

`Pred ← NIL`

`Trio ← 0`

`p ← bt`

`while p ≠ NIL do`

`if v = p↑.Value                                      {v is an internal value and`

`then Trio ← -Trio                                    {Trio will not be positive`

`return`

`endif`

`Pred ← p`

`if v < p↑.Value`

`then Trio ← 1`

`p ← p↑.Left`

`else Trio ← 2`

`p ← p↑.Right`

`endif`

`endwhile`

`end {NodeSpot`

NodeSpot is O(depth) for a binary tree of n nodes because it does not need to traverse the structure. The order property allows the algorithm to dispense with backtracking.

The insertion of a non-NIL node n with NIL child links into a BST bt is then done by first invoking NodeSpot(bt,n↑.Value,Trio,p,Pred), followed by:

`procedure InsertNode(bt,n,Pred,Trio)                 {O(1)`

`if bt = NIL                                          {create a new BST from n`

`then bt ← n`

`return`

`endif`

`if Trio ≤ 0                                          {v is already present. This`

`then return endif                                    {could be treated as an error.`

`if Trio = 1`

`then Pred↑.Left ← n`

`else Pred↑.Right ← n                                 {Trio = 2`

`endif`

`end {InsertNode`
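The NodeSpot/InsertNode pair collapses naturally into one Python sketch (`insert` is an assumed name; duplicates are ignored, matching the treatment of a non-positive Trio):

```python
class Node:
    def __init__(self, value):
        self.value = value
        self.left = None
        self.right = None

def insert(bt, v):
    """Return the root of bt after inserting v; a duplicate is ignored,
    as InsertNode does when Trio is not positive."""
    if bt is None:
        return Node(v)              # create a new BST from the node
    pred, p = None, bt
    while p is not None:
        if v == p.value:
            return bt               # v is already present
        pred = p
        p = p.left if v < p.value else p.right
    n = Node(v)
    if v < pred.value:
        pred.left = n               # the Trio = 1 case
    else:
        pred.right = n              # the Trio = 2 case
    return bt

def inorder(bt, out):
    if bt is None:
        return
    inorder(bt.left, out)
    out.append(bt.value)
    inorder(bt.right, out)

root = None
for v in [50, 30, 70, 20, 40, 60, 80, 30]:
    root = insert(root, v)

ordered = []
inorder(root, ordered)              # a BST's inorder traverse is sorted
```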

For some applications, it is convenient to allow the insertion of duplicate values into an ordered tree. Insertion of duplicate values is not so straightforward as InsertNode, and deletion is much more complex. For example, equal values may be inserted either randomly or consistently to the left or right, but deleting all occurrences of a value presents a problem. In any case, the resulting tree is partially ordered, not ordered.

A deletion procedure must deal with a number of cases. Problem P8.6 asks the reader to try to display all of the cases graphically. It is worthwhile to do so before reading the algorithm given here.

Problem P8.6 is appropriate at this point.

`procedure CutValue(bt,v)                             {O(depth)`

`if bt = NIL`

`then {deal with this error.`

`endif`

`NodeSpot(bt,v,Trio,p,Pred)`

`if Trio > 0                                          {v was not found in bt.`

`then {deal with this error.`

`endif`

`TackOn ← p↑.Right                                    {TackOn is the subtree that is`

`                                                     {to replace p as a Pred child.`

`if p↑.Left ≠ NIL`

`then TackOn ← p↑.Left`

`if p↑.Right ≠ NIL`

`then NodeSpot(p↑.Left,p↑.Right↑.Value,Top3,s,sp)`

`InsertNode(p↑.Left,p↑.Right,sp,Top3)`

`endif`

`endif`

`if Pred = NIL                                        {the root node must go.`

`then bt ← TackOn`

`else Trio ← -Trio                                    {Specify which Pred child.`

`InsertNode(bt,TackOn,Pred,Trio)`

`endif`

`end {CutValue`
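CutValue's replacement strategy can be mirrored in Python (`cut_value` is an assumed name): the left subtree replaces the deleted node, and the right subtree is re-attached inside it.

```python
class Node:
    def __init__(self, value, left=None, right=None):
        self.value, self.left, self.right = value, left, right

def cut_value(bt, v):
    """Delete the node holding v and return the new root.  As in CutValue,
    the left subtree (if any) replaces the node and the right subtree is
    re-attached inside it; otherwise the right subtree replaces the node."""
    pred, p = None, bt
    while p is not None and p.value != v:
        pred = p
        p = p.left if v < p.value else p.right
    if p is None:
        return bt                       # v was not found in bt
    tack_on = p.right
    if p.left is not None:
        tack_on = p.left
        if p.right is not None:
            sp = p.left                 # NodeSpot: search for p.right's value
            while True:
                nxt = sp.left if p.right.value < sp.value else sp.right
                if nxt is None:
                    break
                sp = nxt
            if p.right.value < sp.value:
                sp.left = p.right       # InsertNode at the spot found
            else:
                sp.right = p.right
    if pred is None:
        return tack_on                  # the root node must go
    if p is pred.left:
        pred.left = tack_on             # specify which Pred child
    else:
        pred.right = tack_on
    return bt

def inorder(bt, out):
    if bt is None:
        return
    inorder(bt.left, out)
    out.append(bt.value)
    inorder(bt.right, out)

root = Node(50, Node(30, Node(20), Node(40)),
                Node(70, Node(60), Node(80)))
root = cut_value(root, 30)              # delete an interior node
root = cut_value(root, 50)              # delete the root itself

remaining = []
inorder(root, remaining)                # order is preserved
```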

Exercise E8.8 and program PG8.4 are appropriate at this point.

## 8.5.3 Static Tree Files (optional)

A dynamic file is one that grows and shrinks and changes shape. The shape is normally controlled by maintaining balance, the topic of section 8.7. The benefits of controlling the depth of a file directory can be great, when costs are amortized over a sequence of operations, but the price for maintaining balance lies in the complexity of the insertion and deletion routines. More complex management requires more programming and debugging time and generally more overhead: more run-time with each use of the routines.

A static file is created with a shape that remains fixed. The shape of a binary tree that is the directory to a static file can then be optimized for efficient searching at the time of its creation. The optimum shape is normally determined by the keys of the file--it is data-directed.

#### Figure 8.15

Program PG8.5 asks for the construction and use of the entire Morse code tree.

The Morse code is a cryptic form of trie (pronounced "try," not "tree"). Most tries use the sequence of letters in a character string to determine which way to branch at a node. Arrival at a leaf node completes the trace of the spelling of a key, and generally uncovers a pointer to an appropriate record. Trie nodes, however, necessarily have more than two children, and so tries will be reintroduced in Chapter 9.

One way to optimize a static binary search tree involves constructing a CLT with these steps:

1. Determine the probability (relative frequency) with which each key is searched.

2. Construct a CLT from the codes determined by those probabilities.

A code string of 0's and 1's then leads to a message (a key) at a leaf node.

## 8.6 Heaps

One balance condition that was introduced in section 8.1.2 can be rephrased as:

A binary tree is complete if the nodes of every level in it except possibly the last one have two children, and the nodes of the last level are as far to the left as possible.

One specific structural parameter that determines a subset of the set of complete binary trees is stated as:

A binary tree is a heap if it is complete and if the value of each node is at least as large as the value of its children.

This subset proves to be very efficient and easy to use in many applications.

The natural way to implement a heap is in an array in heapform (see section 8.2.1), where the children of the node in A[i] are A[2i] and A[2i + 1]. Heaps are not often implemented in linked form because maintaining the balance condition requires additional information in each node.

If a heap is accessed in specified ways, it becomes one support structure for a priority queue, discussed in section 8.6.2 and applied in Chapter 10.

The heap formed from a set of values is not unique. Faye's Family can be configured as a heap in several ways; one, shown in Figure 8.16, is formed by adding values to an initially empty heap in the order: Faye, Alia, March, Ralph, Wrye, Jan, Will.

#### Figure 8.16

In contrast, Figure 8.17 shows the heap when it is constructed by adding names in alphabetical order.

#### Figure 8.17

Exercise E8.9 and problem P8.7 are appropriate at this point.

#### Figure 8.18

Note: It is either the heapform-heap or the priority queue that is usually meant by "heap."

## Migration

If a heap contains k - 1 items in A[1 . . k - 1], the value in A[k] may destroy the heap condition if it is included in the structure. If it is allowed to migrate upward, the heap condition can be restored:

`procedure UpHeap(k)                                  {O(ln k)`

`while k ≥ 2 do`

`p ← INT(k/2)`

`if A[k] > A[p]`

`then Temp ← A[k]`

`A[k] ← A[p]`

`A[p] ← Temp`

`k ← p`

`else exit`

`endif`

`endwhile`

`end {UpHeap`

(This routine and those in the rest of this section are best understood by tracing an example.)

Exercise E8.10 is appropriate at this point.

## Heap Insertion

A value can be inserted into the heap in A[1 . .n] (supported in A[1 . . Max]), by inserting it at A[n + 1] and reheaping with UpHeap:

`procedure InsertHeap(v)                              {O(ln n)`

`if n ≥ Max`

`then {deal with this error`

`endif`

`n ← n + 1`

`A[n] ← v`

`UpHeap(n)`

`end {InsertHeap`

The insertion of a value, which may destroy the heap condition, followed by restoration of that condition, is a common feature of heap operations.

## Heap Creation

The level-seeking of UpHeap can be used to heap an arbitrary array of values A[1 . . n].

`procedure MakeHeap                                   {O(n ln n)`

`for k = 2 to n do`

`UpHeap(k)`

`next k`

`end {MakeHeap`

The timing function of this procedure is T(n) = t2 + t3 + . . . + tn, a sum of n - 1 terms, where tk is the time required to execute UpHeap(k). The number of times the if-statement of UpHeap is executed can be no more than the maximum level in the heap, which is bounded by INT(ln n). Hence T(n) is O(n ln n).

One reason that heaps prove to be valuable in practice is the relative efficiency of heap operations.

## Priority Retrieval

The largest value (or smallest, or whichever value is heaped to the top) is retrieved and deleted from the heap but not the array by: switch A[1] and A[n], decrement n, and reheap.

Establishment of the heapform after this swap, however, requires that A[1] trickle down to its proper level. This downshift can be done with:

`procedure SwapDown(k)                                {O(ln n)`

`while k < n do`

`j ← 2 * k`

`if j > n then exit endif`

`if (j < n) AND (A[j] < A[j+1])`

`then j ← j + 1`

`endif`

`if A[k] < A[j]`

`then Temp ← A[k]`

`A[k] ← A[j]`

`A[j] ← Temp`

`endif`

`k ← j`

`endwhile`

`end {SwapDown`

This process can be made somewhat more efficient by shifting values instead of swapping them, as is done in procedure DownHeap that follows. The kth item is shifted no farther than n with:

`procedure DownHeap(k,n)                              {O(ln n)`

`j ← 2 * k`

`Half ← k`

`Temp ← A[k]`

`while j ≤ n do`

`if (j < n) AND (A[j] < A[j + 1])`

`then j ← j + 1`

`endif`

`if Temp ≥ A[j]`

`then A[Half] ← Temp`

`return`

`endif`

`A[Half] ← A[j]`

`Half ← j`

`j ← 2 * j`

`endwhile`

`A[Half] ← Temp`

`end {DownHeap`
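The shifting strategy of DownHeap can be sketched in Python (assumed names, with a dummy A[0] for 1-based storage as in the earlier examples):

```python
# A Python sketch of DownHeap's shifting strategy (assumed names, 1-based
# storage via a dummy A[0]).

def down_heap(A, k, n):
    """Shift the value in A[k] down to its proper level within A[1..n]."""
    j = 2 * k
    hole = k                        # the text's Half: where Temp will land
    temp = A[k]
    while j <= n:
        if j < n and A[j] < A[j + 1]:
            j += 1                  # choose the larger child
        if temp >= A[j]:
            break                   # temp sinks no further
        A[hole] = A[j]              # shift the child up one level
        hole = j
        j = 2 * j
    A[hole] = temp

A = [None, 2, 9, 8, 3, 5, 6]        # only the root value is out of place
down_heap(A, 1, 6)
print(A[1:])                        # [9, 5, 8, 3, 2, 6]
```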

Now retrieval of the value with the highest priority (the top value) becomes:

`function DeHeap                                      {O(ln n)`

`Temp ← A[1]`

`A[1] ← A[n]`

`A[n] ← Temp`

`n ← n - 1`

`DownHeap(1,n)`

`return A[n + 1]`

`end {DeHeap`

A heap is created by repeated use of DownHeap, applied to A[i], for i = Half, Half - 1, . . ., 1, where Half = INT(n/2).

Exercise E8.11 is appropriate at this point.

## Exogenous Heaps

An alternate approach is to use an array-based heap of pointers to data records, with the heap nodes treated as discussed in this section. In effect, this approach forms an exogenous heap; the data records do not need to contain pointers and are not linked into a tree at all. The heap of pointers is a directory to an otherwise unstructured collection of records. It should be clear that any structure in this book can be used in a like manner.

## 8.6.1 HeapSort

`procedure HeapSort(A,n)                              {O(n ln n)`

`Half ← INT(n/2)`

`for i = Half downto 1 by -1 do`

`DownHeap(i, n)`

`next i`

`for i = n - 1 downto 1 by -1 do`

`Temp ← A[i+1]`

`A[i+1] ← A[1]`

`A[1] ← Temp`

`DownHeap(1, i)`

`next i`

`end {HeapSort`

The largest item is placed in A[n], the next largest in A[n - 1], etc.
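The two phases of HeapSort can be sketched as self-contained Python (hypothetical code, with a dummy A[0] preserving the 1-based indexing):

```python
# A self-contained Python sketch of HeapSort (assumed names): heap A[1..n]
# bottom-up, then repeatedly swap the root to the end and re-heap the
# shortened prefix.  A dummy A[0] preserves the 1-based indexing.

def down_heap(A, k, n):
    j, hole, temp = 2 * k, k, A[k]
    while j <= n:
        if j < n and A[j] < A[j + 1]:
            j += 1
        if temp >= A[j]:
            break
        A[hole] = A[j]
        hole = j
        j = 2 * j
    A[hole] = temp

def heap_sort(A, n):
    for i in range(n // 2, 0, -1):  # phase 1: heap the array with DownHeap
        down_heap(A, i, n)
    for i in range(n - 1, 0, -1):   # phase 2: n - 1 swap-and-reheap steps
        A[i + 1], A[1] = A[1], A[i + 1]
        down_heap(A, 1, i)

A = [None, 4, 1, 9, 2, 8]
heap_sort(A, 5)
print(A[1:])                        # [1, 2, 4, 8, 9]
```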

The pointer-variable form of HeapSort adds the root node to a linear list as it is removed. In any case, HeapSort has a destructive readout in the sense that the heap is destroyed in the process.

The importance of HeapSort is that few sorting procedures are O(n ln n). Although QuickSort is O(n ln n) on the average, it is O(n²) in the worst case. InsertionSort, BubbleSort, and SelectionSort are all O(n²). A sorting procedure that is safe and suitable for large sets of data is often not encountered in programming courses that do not have data structures as a prerequisite.

Exercise E8.12 and program PG8.6 are appropriate at this point.

## 8.6.2 Priority Queues

A priority queue supports four basic operations:

Create a priority queue from n items.

Insert an item.

Remove the largest item.

Replace the largest item big by v if v < big.

It can be useful to apply the following operations that are more difficult to implement efficiently:

Change the priority (size) of an item.

Delete a specified item.

Join two priority queues into one.

There are implementations in which most of these operations are O(ln n), which is one reason for applying priority queues.

Three implementations of priority queues can be based on structures introduced in previous sections:

1. An array of pointers to priority-class queues, each an instance of QUEUE--a pure queue. This implementation would likely be applied for a fixed number of classes, and most operation procedures would involve a binary search within the sorted array of k class headers. With natural algorithms, Create is O(n ln k); Insert, Remove, and Replace are O(ln k). Change and Delete depend on a search of the individual class queues and so are O(m + ln k), where m is the maximum size of a class queue. The operation Join is subject to interpretation in this context but is not likely to be as efficient as the other operations.

2. A heapform-heap. Create is O(n ln n); Insert, Remove, and Replace are all O(ln n). Change and Delete are simply linear searches, O(n). Join consists of adding the items in heap H1 to heap H2, where n1 < n2 are their sizes. It is then O(n1 ln (n1 + n2)). A heapform-heap is the most common implementation for a priority queue.

3. A pointer-linked binary tree heap. Create is O(n ln n). Insert and Replace are O(ln n). Change and Delete involve a binary tree search and so are O(n) at best. Delete and Remove would involve rebalancing of the binary tree if the balance condition is not relaxed. If balance is abandoned, the other operations approach O(n) instead of O(ln n). Join becomes O(m ln n), where m = Min{n1, n2}, n = n1 + n2, and n1 and n2 are the sizes of the heaps to be joined by insertion of the items of one heap into the other.

There are implementations other than the three above, and some can be useful:

4. The degenerate priority-class queue. This implementation has only one item in each class; it is a sorted list of items in an array.

5. A sorted linked list of items. Create is in effect InsertionSort, O(n2). Insert and Replace are O(n) but Remove is O(1). Change and Delete are essentially O(n) linear searches. Join is likely to be repeated insertion, O(m * n), where n = n1 + n2 and m = Min{n1, n2}.

6. An unsorted linked list of items. Insert becomes O(1) and hence Create is O(n). If the position of a specified item is not known, Replace and Remove are O(n) linear searches, as are Change and Delete. Join becomes a trivial O(1) operation (as are Change and Delete if a pointer to the specified item is provided).

7. An unsorted array of items. Create is not required, Insert is O(1). Replace, Remove, and Change are O(n) linear searches. Delete is an O(n) linear search followed by swapping the found item with A[n] and decrementing n. The JOIN operation is ambiguous or simply copies all of the items into some array.
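For comparison, the heapform-heap implementation (case 2 above) is essentially what Python's standard heapq module provides. The sketch below (function names are assumptions) shows the four basic operations; heapq keeps the smallest item on top, so priorities are negated here to obtain remove-the-largest behavior:

```python
# A sketch of the four basic priority queue operations on top of Python's
# standard heapq module (the wrapper names are assumptions).  heapq keeps
# the smallest item on top, so values are negated to retrieve the largest.

import heapq

def create(items):                  # O(n): heapify in place
    pq = [-x for x in items]
    heapq.heapify(pq)
    return pq

def insert(pq, v):                  # O(ln n)
    heapq.heappush(pq, -v)

def remove_largest(pq):             # O(ln n)
    return -heapq.heappop(pq)

def replace_largest(pq, v):         # O(ln n): remove the top, add v
    return -heapq.heapreplace(pq, -v)

pq = create([3, 9, 4, 1])
insert(pq, 7)
print(remove_largest(pq))           # 9
print(remove_largest(pq))           # 7
```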

## 8.6.3 On the Use of Priority Queues

A collection of items that differ in some quality used to select one of them for an application is a natural candidate for retention in a priority queue. The priority queue is the data structure that exemplifies the pair of demands: "Save this with the others . . . . Find the best one available."

For specific applications and restrictions, some of the data structures discussed in previous sections and in Chapters 9 and 10 are ideal. For the general situation where items are to be stored and retained in some order determined by their own qualities, the priority queue is the structure of choice. Above all, a priority queue is a structure of general utility and great flexibility.

Priority queues are applied to the construction of Huffman codes in Part II, section Q, and used in the general traverse of a graph in Chapter 10.

## 8.7 Balanced Binary Trees

A general binary tree is of the form shown in Figure 8.19, where L and R are subtrees.

#### Figure 8.19

Ways to measure subtrees L and R for comparison with each other that have been applied include:

height (depth)

weight (the number of external nodes)

internal path length (IPL)

Suppose, for example, that the binary tree shown schematically in Figure 8.20 is balanced at n and at p, but an insertion is made into subtree R2 that causes an imbalance (by the criterion in use).

#### Figure 8.21

As a more specific example, the effect of an SLR on the tree in Figure 8.22 helps its balance (by almost any criterion).

#### Figure 8.23

For a specific example, a double rotation improves the balance of the tree in Figure 8.24 (by almost any criterion).

#### Figure 8.24

The right-rotation forms are symmetric to the left-rotations illustrated. Both single and double rotations are designed to maintain the order of the keys.

Since rotations may take place around any node in the tree, the parent of the pivot node is involved: the pivot node is either its left or its right child, and that pointer is changed. For the sake of clarity, it will be assumed that the pivot node is the root in the discussion that follows, and all the subtrees shown are non-NIL.

With these simplifying assumptions, SLR becomes:

`SLR:  p↑.Right ← n↑.Left`

`n↑.Left ← p`

`{switch parent-of-p pointer to n`

Similarly, DLR is:

`DLR:  p↑.Left ← n↑.Right`

`gp↑.Right ← n↑.Left`

`n↑.Right ← p`

`n↑.Left ← gp`

`{switch parent-of-gp pointer to n`
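The single left rotation can be sketched on a minimal node type (the class and function names are assumptions, not the book's declarations):

```python
# A sketch of SLR on a minimal node type (assumed names).  The pivot's
# right child n is raised above p, and n's left subtree crosses over to
# become p's right subtree, so the inorder key sequence is unchanged.

class Node:
    def __init__(self, key, left=None, right=None):
        self.key, self.left, self.right = key, left, right

def slr(p):
    n = p.right
    p.right = n.left                # p^.Right <- n^.Left
    n.left = p                      # n^.Left  <- p
    return n                        # caller switches the parent-of-p pointer

def inorder(t):
    return [] if t is None else inorder(t.left) + [t.key] + inorder(t.right)

# a chain 1 -> 2 -> 3 hanging off the right: badly unbalanced
root = Node(1, None, Node(2, None, Node(3)))
root = slr(root)
print(root.key)                     # 2
print(inorder(root))                # [1, 2, 3]: key order is preserved
```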

In practice, a number of special cases must be dealt with.

Several general features of binary tree balancing are of interest. Insertion is always made at a leaf node. For height- and weight-balance criteria, insertion requires no more than one rotation (if any) in order to restore balance. With IPL balance, rotations can propagate down the tree.

Deletion can always be of a node with no more than one internal child. If the search for a deletion key locates it at node p, then in the sequence of values that determines the tree, the values adjacent to p↑.Key are

`Pred↑.Key < p↑.Key < Succ↑.Key`

where Pred is the node in the left subtree of p to which p↑.Key would be attached by an insertion. Similarly, Succ is the node in the right subtree of p to which p↑.Key would be attached by an insertion. Insertion always replaces an external node, which has a parent with no more than one internal child. Consequently, p↑.Key can be swapped with Pred↑.Key (or with Succ↑.Key if p↑.Left is external), and the latter node deleted.

The nodes of a balanced tree generally contain information fields that indicate the state of balance at a node. Rotation, insertion, and deletion procedures maintain the fields to reflect altered conditions.

Perhaps the easiest dynamically balanced tree structure to maintain, and one of the most often applied, is the B-tree discussed in section 9.5.

A tree structure that is reformulated as a binary tree and then balanced is the Red-Black Tree of section 9.6 and Part II, section U.

Further details about height-balanced trees, commonly called AVL trees, are to be found in section 8.7.1. Details of the management routines for them are to be found in Part II, section R. Weight-balanced trees, commonly called BB(α) trees, are discussed in section 8.7.2.

IPL-balanced trees are treated in [GONNET, 1983]. Their advantage is in management routines that are simpler than for AVL trees, and their disadvantage is in rotations that can propagate down the tree.

The analysis of the cost of maintenance procedures for various balance criteria, their amortization, and their relative merits lie outside the aims of this book. As pointed out elsewhere [GONNET, 1983], research is yet to be completed on some questions concerning:

the (worst case, average, or amortized worst case)

number of (rotations, node inspections)

to (search, insert, or delete)

a key in (AVL, BB(α), or IPL)

trees.

## 8.7.1 AVL Trees (optional)

If bt is a binary tree with subtrees L and R, then bt is height-balanced (has the AVL property) iff:

1. |h(L) - h(R)| ≤ 1, where h is the height function.

2. L and R are themselves height-balanced.

The tree in Figure 8.25 fails to be height-balanced at E, and hence G, although it would be 1-balanced at G on the basis of the relative heights of its subtrees alone.

## 8.7.2 BB(α) Trees (optional)

`ρ(bt) = 1/2 if bt is empty, and`

`ρ(bt) = w(L)/w(bt) otherwise`

where w is the weight function (the number of external nodes). Clearly, ρ(bt) = 1/2 implies perfect balance and it must be true that

`0 < ρ(bt) < 1`

The parameter that measures how close to 1/2 the balance is to be maintained is α. A tree is of weight-balance α, or in the set BB(α), if:

1. α ≤ ρ(bt) ≤ 1 - α, where α ≤ 1/2

2. if bt is not empty, L and R are in BB(α).

Clearly, α = 0 is no restriction and α = 1/2 asks for perfect balance. BB(x) is a subset of BB(y) if x ≥ y. It happens that if 1/3 ≤ α ≤ 1/2, then BB(α) = BB(1/2).
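The weight-balance measure can be sketched in Python (the function names and the tiny trees below are assumptions for illustration):

```python
# A sketch of the weight-balance measure (assumed names): w is the weight,
# the number of external nodes, and rho is the fraction of that weight
# found in the left subtree.

class Node:
    def __init__(self, key, left=None, right=None):
        self.key, self.left, self.right = key, left, right

def weight(bt):                     # number of external (NIL) positions
    return 1 if bt is None else weight(bt.left) + weight(bt.right)

def rho(bt):
    return 0.5 if bt is None else weight(bt.left) / weight(bt)

def in_bb(bt, alpha):               # the recursive BB(alpha) condition
    if bt is None:
        return True
    return (alpha <= rho(bt) <= 1 - alpha
            and in_bb(bt.left, alpha) and in_bb(bt.right, alpha))

perfect = Node(2, Node(1), Node(3))
chain = Node(1, None, Node(2, None, Node(3)))
print(rho(perfect))                 # 0.5: perfect balance at the root
print(in_bb(chain, 1 / 3))          # False: rho at the root is only 1/4
```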

#### Figure 8.26

Exercise E8.13 and problem P8.8 are appropriate at this point.

## 8.8 Threaded Binary Trees (optional)

A thread is created by replacing NIL (external) links. A NIL right link is replaced with a pointer to the node that would be processed next in an inorder traverse. A NIL left link is replaced by a pointer to the node that was processed last. The leftmost left link and the rightmost right link are set to NIL.

#### Figure 8.27

The nodes of a threaded binary tree must be designed to implement threads. The links and the threads need to be distinguished. They are distinguished in an array implementation by using negative indices for links and in a pointer implementation by using flags:

`ThreadNode = RECORD`

`Value  : {ValueType`

`Left   : ↑ThreadNode`

`Lflag  : BOOLEAN`

`Right  : ↑ThreadNode`

`Rflag  : BOOLEAN`

`END    {ThreadNode`

It is convenient to initialize the flag fields to TRUE, implying that the associated fields are pointers, not threads.

An unthreaded binary tree where the nodes are of type ThreadNode can be threaded by calling Baste(bt,NIL,NIL):

`procedure Baste(p,Pred,Next)                         {O(n)`

`if p = NIL then return endif`

`if p↑.Left = NIL`

`then p↑.Left ← Pred`

`p↑.Lflag ← FALSE`

`else Baste(p↑.Left,Pred,p)`

`endif`

`if p↑.Right = NIL`

`then p↑.Right ← Next`

`p↑.Rflag ← FALSE`

`else Baste(p↑.Right,p,Next)`

`endif`

`end {Baste`

Exercise E8.14 is appropriate at this point.

A traverse of a threaded tree follows the standard pattern of: "step left as far as possible, step right one link, move to the appropriate successor." When a node is reached by following a normal link, its left subtree is to be explored. If it is reached by a thread, its left subtree has already been explored. Processing of nodes can be in postorder, preorder, or in the inorder shown on the next page.

`procedure InThread(bt)                               {O(n)`

`GoLeft ← TRUE`

`p ← bt`

`while p ≠ NIL do`

`if GoLeft`

`then while p↑.Lflag do`

`p ← p↑.Left`

`endwhile`

`endif`

`PROCESS2(p)`

`GoLeft ← p↑.Rflag`

`p ← p↑.Right`

`endwhile`

`end {InThread`
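Baste and the inorder thread traverse can be sketched together in Python (a hypothetical rendering of the two routines above, with flag True marking a genuine link, False a thread):

```python
# A hypothetical Python rendering of Baste and InThread; flag True marks a
# genuine link and False a thread, as in the text.

class TNode:
    def __init__(self, value, left=None, right=None):
        self.value, self.left, self.right = value, left, right
        self.lflag = self.rflag = True   # initialized as pointers

def baste(p, pred, nxt):
    if p is None:
        return
    if p.left is None:
        p.left, p.lflag = pred, False    # thread to inorder predecessor
    else:
        baste(p.left, pred, p)
    if p.right is None:
        p.right, p.rflag = nxt, False    # thread to inorder successor
    else:
        baste(p.right, p, nxt)

def in_thread(bt):
    out, go_left, p = [], True, bt
    while p is not None:
        if go_left:
            while p.lflag:               # step left as far as possible
                p = p.left
        out.append(p.value)              # PROCESS2
        go_left = p.rflag                # a thread means "already explored"
        p = p.right
    return out

bt = TNode('D', TNode('B', TNode('A'), TNode('C')), TNode('E'))
baste(bt, None, None)
print(in_thread(bt))                     # ['A', 'B', 'C', 'D', 'E']
```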

Insertion of a node into a threaded binary tree raises the question of placement--of where the insertion is to take place. Balancing and order are not related to the choice between external nodes and threads and must depend on applications. Threads have only a minor effect on the emplacement of a new leaf node. To illustrate, consider the insertion of a node q as the right child of a leaf node p in a threaded binary tree:

`procedure ThreadRight(p,q)`

`q↑.Lflag ← FALSE`

`q↑.Left ← p`

`q↑.Rflag ← p↑.Rflag`

`q↑.Right ← p↑.Right`

`p↑.Right ← q`

`p↑.Rflag ← TRUE`

`end {ThreadRight`

Adding a node as the left child of a leaf node is similar and is left as a problem.

Problem P8.9 is appropriate at this point.

Deletion of nodes from a threaded binary tree (any binary tree) depends upon shape parameters that are heavily application dependent. The crucial issue is: what should be done with the subtrees of the excised node? The possible responses to this question differ only in detail from those applied in unthreaded trees.

Program PG8.7 is appropriate at this point.

## Summary

Trees in the mathematical sense of a graph without cycles are a common form, and they can be represented in a program with the structure BINARY TREE. The nodes of a binary tree have two distinguished (left and right) child nodes. Like lists, stacks, and queues, the binary tree is dynamic and may be either endogenous or exogenous.

An important feature of binary trees is that every node is accessible from the root node by following links. There are several ways to traverse a binary tree, either recursively or iteratively. One is the universal iterative binary tree traverse BTT, which allows processing a node at each of three contacts with it during the traverse. Many application algorithms can be based on BTT or on its recursive counterpart RBTT.

Traverses more specialized than BTT process a node on only one of the traverse contacts with it, the standards being: preorder, inorder, and postorder traverses. The recursive forms of these traverses are especially handy for quick application. The iterative forms of the standard traverses require explicit stacks that are less extensive than those in the corresponding recursive forms.

A breadth-first traverse uses a queue to traverse a binary tree level-by-level. When generalized, it becomes the priority-queue traverse of a graph in Chapter 10.

Reaching a sought-for node in the minimum number of steps requires that a decision to traverse either a left subtree or a right subtree be made at each node in the walk. A binary tree with nodes ordered so that node keys direct the walk efficiently is a BST (Binary Search Tree).

An array support system for binary trees called heapform is efficient and of great utility. When order and completeness are included in a binary tree supported in heapform, it is a heap. (A heap may also be supported with pointers, but not conveniently.) Heaps are a preferred support structure for priority queues, which in practice replace several of the simplified application models presented in this book. Note the layers of structure: (array and binary tree) → heapform → heap → priority queue.

Heap algorithms can be assembled into HeapSort, a sorting method guaranteed to be O(n ln n) in the worst case. This is the theoretical optimum.

Binary trees are heavily used as directories to (endogenous or exogenous) files and hence repeatedly searched. The cost of searches must be amortized over a sequence of operations.

For static files, the shape is determined at creation and must be optimized at that point. A general balancing insertion algorithm is to be found in Part II, section P. A highly tuned static file technique, the use of Huffman codes, is applied to file compression and decoding of messages. The algorithms are to be found in Part II, section Q.

Dynamic files change shape with the operations of insertion and deletion, and so those operations are often sophisticated in order to maintain balance. Height, weight, and internal path length are all applied to devise balance criteria. Height-balanced trees (AVL trees) are detailed in Part II, section R. A more general structure, the B-tree, is described in Chapter 9.

Binary tree traverses can be directed without stacks by replacing NIL pointers with threads that point back to appropriate ancestors, at the cost of the initial threading.

## Exercises

Exercises: immediate applications of the text material

E8.3 Derive the level of each node for the tree in Figure 8.28, the depth of the tree, the IPL, and the EPL. Modify the tree to find variations with the same set of keys that are full, or complete.

#### Figure 8.28

E8.4 Show how the vowel binary tree of E8.1 would appear in single-array and three-array form. (In the three-array form, the vowels are to be stored in alphabetical order.)

E8.5 Add parenthesis nodes to BT9 (section 8.3.2) so that the inorder trace produces the correct infix expression.

E8.8 Trace the insertion of T-I-P into the BST of Figure 8.29, followed by deletion of E and O:

#### Figure 8.29

E8.9 Construct the heap H2 of section 8.6 in stages, as was done for heap H1.

E8.10 Trace procedure UpHeap of section 8.5 on the letter sequence T-R-A-S-H-D-U-M-P, applied for k = 2, 3 . . . 9.

E8.14 Trace the values of the parameters p, Pred, and Next in procedure Baste of section 8.8 when it is applied to BT8. Show them as they would appear on top of the stack created by recursive calls and indicate thread assignments.

## Problems

Problems: not immediate, but requiring no major extensions of the text material

P8.6 Redesign procedures NodeSpot and InsertNode of section 8.5.2 so that values may be duplicated in a (partially) ordered binary tree.

## Programs

Programs: for trying it yourself

PG8.5 Write a program that constructs the International Morse code tree introduced in section 8.5.3 from the table below and allows users to enter a code message of dots, dashes, and separators. The output of the program is the decoded message.

`A .-      B -...    C -.-.    D -..     E .`

`F ..-.    G --.     H ....    I ..      J .---`

`K -.-     L .-..    M --      N -.      O ---`

`P .--.    Q --.-    R .-.     S ...     T -`

`U ..-     V ...-    W .--     X -..-    Y -.--`

`Z --..`