Data Structures and Algorithms: CHAPTER 5: Advanced Set Representation Methods

5.1	Draw all possible binary search trees containing the four elements 1, 2, 3, 4.
5.2	Insert the integers 7, 2, 9, 0, 5, 6, 8, 1 into a binary search tree by repeated application of the procedure INSERT of Fig. 5.3.
5.3	Show the result of deleting 7, then 2 from the final tree of Exercise 5.2.
*5.4	When deleting two elements from a binary search tree using the procedure of Fig. 5.5, does the final tree ever depend on the order in which you delete them?
5.5	We wish to keep track of all 5-character substrings that occur in a given string, using a trie. Show the trie that results when we insert the 14 substrings of length five of the string ABCDABACDEBACADEBA.
*5.6	To implement Exercise 5.5, we could keep a pointer at each leaf, which, say, represents string abcde, to the interior node representing the suffix bcde. That way, if the next symbol, say f, is received, we don't have to insert all of bcdef, starting at the root. Furthermore, having seen abcde, we may as well create nodes for bcde, cde, de, and e, since we shall, unless the sequence ends abruptly, need those nodes eventually. Modify the trie data structure to maintain such pointers, and modify the trie insertion algorithm to take advantage of this data structure.
5.7	Show the 2-3 tree that results if we insert into an empty set, represented as a 2-3 tree, the elements 5, 2, 7, 0, 3, 4, 6, 1, 8, 9.
5.8	Show the result of deleting 3 from the 2-3 tree that results from Exercise 5.7.
5.9	Show the successive values of the various S_i's when implementing the LCS algorithm of Fig. 5.29 with first string abacabada, and second string bdbacbad.
5.10	Suppose we use 2-3 trees to implement the MERGE and SPLIT operations as in Section 5.6. Show the result of splitting the tree of Exercise 5.7 at 6. Merge the tree of Exercise 5.7 with the tree consisting of leaves for elements 10 and 11.
5.11	Some of the structures discussed in this chapter can be modified easily to support the MAPPING ADT. Write procedures MAKENULL, ASSIGN, and COMPUTE to operate on the following data structures.
	a. Binary search trees. The "<" ordering applies to domain elements.
	b. 2-3 trees. At interior nodes, place only the key field of domain elements.
5.12	Show that in any subtree of a binary search tree, the minimum element is at a node without a left child.
5.13	Use Exercise 5.12 to produce a nonrecursive version of DELETE-MIN.
5.14	Write procedures ASSIGN, VALUEOF, MAKENULL and GETNEW for trie nodes represented as lists of cells.
*5.15	How do the trie (list of cells implementation), the open hash table, and the binary search tree compare for speed and for space utilization when elements are strings of up to ten characters?
*5.16	If elements of a set are ordered by a "<" relation, then we can keep one or two elements (not just their keys) at interior nodes of a 2-3 tree, and we then do not have to keep these elements at the leaves. Write INSERT and DELETE procedures for 2-3 trees of this type.
5.17	Another modification we could make to 2-3 trees is to keep only keys at interior nodes, but do not require that the keys k₁ and k₂ at a node truly be the minimum keys of the second and third subtrees, just that all keys k of the third subtree satisfy k ≥ k₂, all keys k of the second satisfy k₁ ≤ k < k₂, and all keys k of the first satisfy k < k₁. How does this convention simplify the DELETE operation? Which of the dictionary and mapping operations are made more complicated or less efficient?
*5.18	Another data structure that supports dictionaries with the MIN operation is the AVL tree (named for the inventors' initials) or height-balanced tree. These trees are binary search trees in which the heights of two siblings are not permitted to differ by more than one. Write procedures to implement INSERT and DELETE, while maintaining the AVL-tree property.
5.19	Write the Pascal program for procedure delete1 of Fig. 5.21.
*5.20	A finite automaton consists of a set of states, which we shall take to be the integers 1..n and a table transitions[state, input] giving a next state for each state and each input character. For our purposes, we shall assume that the input is always either 0 or 1. Further, certain of the states are designated accepting states. For our purposes, we shall assume that all and only the even numbered states are accepting. Two states p and q are equivalent if either they are the same state, or (i) they are both accepting or both nonaccepting, (ii) on input 0 they transfer to equivalent states, and (iii) on input 1 they transfer to equivalent states. Intuitively, equivalent states behave the same on all sequences of inputs; either both or neither lead to accepting states. Write a program using the MFSET operations that computes the sets of equivalent states of a given finite automaton.
**5.21	In the tree implementation of MFSET:
	a. Show that Ω(n log n) time is needed for certain lists of n operations if path compression is used but larger trees are permitted to be merged into smaller ones.
	b. Show that O(nα(n)) is the worst case running time for n operations if path compression is used, and the smaller tree is always merged into the larger.
5.22	Select a data structure and write a program to compute PLACES (defined in Section 5.6) in average time O(n) for strings of length n.
*5.23	Modify the LCS procedure of Fig. 5.29 to compute the LCS, not just its length.
*5.24	Write a detailed SPLIT procedure to work on 2-3 trees.
*5.25	If elements of a set represented by a 2-3 tree consist only of a key field, an element whose key appears at an interior node need not appear at a leaf. Rewrite the dictionary operations to take advantage of this fact and avoid storing any element at two different nodes.
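
As an illustration of the idea behind Exercises 5.12 and 5.13, the following is a minimal Python sketch, not the book's Pascal procedure. It relies on the fact that the minimum element sits at a node with no left child, so following left links from the root reaches it without recursion. The names Node, insert, delete_min, and inorder are assumptions made for this sketch.

```python
class Node:
    """A binary search tree node; left/right are None when absent."""
    def __init__(self, key):
        self.key = key
        self.left = None
        self.right = None


def insert(root, key):
    """Insert key iteratively and return the (possibly new) root."""
    if root is None:
        return Node(key)
    node = root
    while True:
        if key < node.key:
            if node.left is None:
                node.left = Node(key)
                return root
            node = node.left
        elif key > node.key:
            if node.right is None:
                node.right = Node(key)
                return root
            node = node.right
        else:
            return root  # key already present; nothing to do


def delete_min(root):
    """Remove the smallest key of a nonempty tree without recursion.

    Returns the pair (min_key, new_root).
    """
    if root is None:
        raise ValueError("delete_min called on an empty tree")
    parent, node = None, root
    while node.left is not None:   # follow left links as far as possible
        parent, node = node, node.left
    # node has no left child, so it holds the minimum element
    if parent is None:
        return node.key, node.right  # the root itself was the minimum
    parent.left = node.right         # splice the minimum node out
    return node.key, root


def inorder(root):
    """Return the keys in sorted order (used here only for checking)."""
    if root is None:
        return []
    return inorder(root.left) + [root.key] + inorder(root.right)
```

Calling delete_min repeatedly on a tree built with insert returns the keys in increasing order, which is one quick way to check the sketch by hand.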

Bibliographic Notes

Tries were first proposed by Fredkin [1960]. Bayer and McCreight [1972] introduced B-trees, which, as we shall see in Chapter 11, are a generalization of 2-3 trees. The first uses of 2-3 trees were by J. E. Hopcroft in 1970 (unpublished) for insertion, deletion, concatenation, and splitting, and by Ullman [1974] for a code optimization problem.

The tree structure of Section 5.5, using path compression and merging smaller into larger, was first used by M. D. McIlroy and R. Morris to construct minimum-cost spanning trees. The performance of the tree implementation of MFSET's was analyzed by Fischer [1972] and by Hopcroft and Ullman [1973]. Exercise 5.21(b) is from Tarjan [1975].

The solution to the LCS problem of Section 5.6 is from Hunt and Szymanski [1977]. An efficient data structure for FIND, SPLIT, and the restricted MERGE (where all elements of one set are less than those of the other) is described in van Emde Boas, Kaas, and Zijlstra [1977].

Exercise 5.6 is based on an efficient algorithm for matching patterns developed by Weiner [1973]. The 2-3 tree variant of Exercise 5.16 is discussed in detail in Wirth [1976]. The AVL tree structure in Exercise 5.18 is from Adel'son-Vel'skii and Landis [1962].

† Recall the left child of the root is a descendant of itself, so we have not ruled out the possibility that x is at the left child of the root.

† The highest-valued node among the descendants of the left child would do as well.
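
The alternative in this footnote can be sketched as follows, assuming it refers to deleting a node that has two children. This is a minimal Python sketch rather than the book's procedure: the deleted key is replaced by the largest key in the left subtree instead of the smallest key in the right subtree. The names Node, insert, delete_max, delete, and inorder are assumptions made for this sketch.

```python
class Node:
    """A binary search tree node; left/right are None when absent."""
    def __init__(self, key):
        self.key = key
        self.left = None
        self.right = None


def insert(node, key):
    """Insert key into the subtree at node and return the subtree root."""
    if node is None:
        return Node(key)
    if key < node.key:
        node.left = insert(node.left, key)
    elif key > node.key:
        node.right = insert(node.right, key)
    return node


def delete_max(node):
    """Remove the largest key of the nonempty subtree at node.

    Returns the pair (max_key, new_subtree_root).
    """
    if node.right is None:          # node itself holds the maximum
        return node.key, node.left
    parent = node
    while parent.right.right is not None:
        parent = parent.right
    max_key = parent.right.key
    parent.right = parent.right.left  # splice the maximum node out
    return max_key, node


def delete(node, key):
    """Delete key from the subtree at node and return the subtree root."""
    if node is None:
        return None
    if key < node.key:
        node.left = delete(node.left, key)
    elif key > node.key:
        node.right = delete(node.right, key)
    elif node.left is None:
        return node.right
    elif node.right is None:
        return node.left
    else:
        # Two children: take the highest-valued node of the left subtree.
        node.key, node.left = delete_max(node.left)
    return node


def inorder(node):
    """Return the keys in sorted order (used here only for checking)."""
    if node is None:
        return []
    return inorder(node.left) + [node.key] + inorder(node.right)
```

Either choice of replacement preserves the binary search tree property, because the replacement key is larger than everything remaining in the left subtree and smaller than everything in the right subtree.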

† Trie was originally intended to be a homonym of "tree" but to distinguish these two terms many people prefer to pronounce trie as though it rhymes with "pie."

† There is another version of 2-3 trees that places whole records at interior nodes, as a binary search tree does.

† All nodes, however, take the largest amount of space needed for any variant types, so Pascal is not really the best language for implementing 2-3 trees in practice.

† A useful variant would take only a key value and delete any element with that key.

† We say a is congruent to b modulo n if a and b have the same remainders when divided by n, or put another way, a-b is a multiple of n.
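
The two phrasings of congruence in this footnote can be checked against each other with a short Python sketch. The function names are assumptions made for this illustration, and n is assumed to be a positive integer.

```python
def same_remainder(a, b, n):
    """First phrasing: a and b leave the same remainder on division by n."""
    return a % n == b % n


def difference_is_multiple(a, b, n):
    """Second phrasing: a - b is a multiple of n."""
    return (a - b) % n == 0


# For example, 17 and 5 are congruent modulo 12, since 17 - 5 = 12.
print(same_remainder(17, 5, 12), difference_is_multiple(17, 5, 12))
```

In Python the % operator returns a nonnegative remainder when n is positive, so the two tests also agree for negative a and b.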

† Note that n - 1 is the largest number of merges that can be performed before all elements are in one set.
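
The bound in this footnote can be observed with a small merge-find sketch in Python. It is a minimal illustration, assuming a tree representation with path compression and merging of the smaller tree into the larger. The class name MFSet, its fields, and the convention that merge takes two elements rather than two set names are assumptions of this sketch, not the book's interface.

```python
class MFSet:
    """Merge-find sets over the elements 0..n-1, stored as parent-pointer trees."""

    def __init__(self, n):
        self.parent = list(range(n))  # each element starts as its own root
        self.size = [1] * n           # number of elements in each root's tree
        self.components = n           # number of disjoint sets remaining

    def find(self, x):
        """Return the root naming the set that contains x."""
        root = x
        while self.parent[root] != root:
            root = self.parent[root]
        # Path compression: point every node on the path directly at the root.
        while self.parent[x] != root:
            next_node = self.parent[x]
            self.parent[x] = root
            x = next_node
        return root

    def merge(self, a, b):
        """Merge the sets containing a and b; return False if already together."""
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return False
        if self.size[ra] < self.size[rb]:
            ra, rb = rb, ra           # make ra the root of the larger tree
        self.parent[rb] = ra          # smaller tree becomes a subtree of larger
        self.size[ra] += self.size[rb]
        self.components -= 1
        return True
```

Each successful merge reduces the number of sets by one, so starting from n singleton sets at most n - 1 merges can succeed before a single set remains.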

‡ Note that our ability to call the resulting component by the name of either of its constituents is important here, although in the simpler implementation, the name of the first argument was always picked.

† Strictly speaking we should use a different name for the MERGE operation, as the implementation we propose will not work to compute the arbitrary union of disjoint sets, while keeping the elements sorted so operations like SPLIT and FIND can be performed.
