9.4.3 The Shape of Simple Binary Search Trees

The binary search tree grown by the algorithm of the preceding section will have its shape determined by the input order of the records to be stored. If they are input in sorted order or reverse sorted order, the result is Figure 9.7(b) or its mirror image. If they are input in the order 55, 40, 83, 35, 51, 80, 85, 17, 37, 50, 52, 70, 81, 84, 100, the result is Figure 9.7(c). There are n! possible input orderings of n distinct keys. Some of these may give rise to the same binary search tree. Recall that the tree grown cannot have depth greater than n nor less than ?/FONT>lg(n + 1)?/FONT>. Suppose the input is selected so that each of the n! possible orderings is equally likely to occur. This would be the case if the first input record is selected at random from among the n records, the second is selected at random from among the remaining n - 1 records, and so on. What will be the average depth of trees grown by the algorithm in this case? To calculate the average depth directly would mean to generate, for each of the n! orderings of input records, the binary search tree grown and note its depth. Then add up the n! depths and divide by n! to obtain the average depth.

It is difficult to analyze this problem using probability theory. However, the average depth has been shown to be O(c lg n), where c is about 4. The average search time can be calculated for a randomly grown binary search tree. It is the average time required to search for a record stored in the binary search tree, assuming each record is equally likely to be searched. This average search time is O(1.39 lg n). In the case of random input, the average search of the binary search tree grown by the algorithm will be only 39 percent more than the smallest possible average search time! Moreover, the algorithm for growing the tree will take an average time at most O(nlg n). If the records are given in sorted order instead of in random order, it will take the algorithm O(n2) time to grow the tree.

It is possible to grow "balanced" binary search trees, also called AVL trees (after the discoverers of these trees, Adelson-Velskii and Landis [1962]). The algorithm for growing AVL trees is slower and more complex, as is the insertion and deletion of records. However, they are guaranteed to have depth no greater than about 1.44 lg n. This is the worst-case depth. Consequently, AVL trees may be searched in time at most O(1.44 lg n). Such trees appear quite healthy and are never sparse. However it takes great care to grow healthy trees, as will be seen later.