4.8: Case Study: Checking Sequences for Proper Nesting

To bring together the major ideas of this chapter, this section presents a case study: checking program structure for proper nesting. Its main features are:

1. The top-down approach

2. Demonstration of the use of the stack, treating it as a data abstraction

3. Checking program structure for proper nesting by developing a nonrecursive and a recursive function

4. The different perspectives required to develop the two functions

Every computer programmer has at one time or another gotten error messages saying "incomplete loop structure" or "a missing parenthesis." You now have enough background to write a program that checks for such errors. Let's see how it can be done.

Arithmetic expressions, like program text, consist of sequences of symbols. Complex arithmetic expressions may be composed by properly combining simpler arithmetic expressions, delineated by matching left and right parentheses. The simpler expressions are said to be nested within the more complex expressions that contain them. The complex expression (a * (b + [c / d])) + b contains two pairs of nested expressions. C programs contain compound statements delimited by matching { (begin) and } (end) pairs. Ignoring all other symbols, these matching pairs must satisfy rules of nesting to be syntactically correct.

Nesting means that one matching pair is wholly contained within another. Pairs that are properly nested may not overlap, as the two pairs in ( [ ) ] do. The rules of nesting are the same for any sequence that is built by combining simpler sequences. They require that the start and end of each component be specified. Checking complex sequences to determine that these begin-end pairs do not overlap is an important operation. It provides an interesting application of the stack data abstraction, as well as of recursion. An intimate connection exists among such sequences, stacks, recursion, and trees; it will be explored more fully in later chapters.

Two abstractions of this problem will be considered. The second is a generalization of the first. To begin, assume only one kind of left parenthesis and one kind of matching right parenthesis, and consider sequences composed of these. Thus, at first, expressions such as ([]) are not allowed, since they involve more than one kind of parenthesis.

Given a sequence of left and right parentheses, determine if they may be paired so that they satisfy two conditions:

i. Each pair consists of a left and right parenthesis, with the left parenthesis preceding the right parenthesis.

ii. Each parenthesis occurs in exactly one pair.

If a sequence has a pairing satisfying both of these conditions, the sequence and the pairing are valid. The problem is to determine whether a given sequence is valid or not. No sequence starting with a right parenthesis, ")", can be valid, since there is no preceding left parenthesis, "(", to pair with it. Any valid sequence must therefore begin with some series of n "(" parentheses preceding the first appearance of a ")" parenthesis, as in Figure 4.11(a), for which n is 3. Suppose we remove the adjacent pair formed by the nth "(" parenthesis and the immediately following first ")" parenthesis. This is pair 1 in Figure 4.11(a). If the new sequence thus obtained is valid, then adding the removed pair to its pairing yields a valid pairing of the original sequence. Thus the original sequence is valid.

Suppose you find that the resultant sequence is not valid. Can you conclude that the original sequence is not valid? In other words, has constraining one of the pairs to be the first adjacent "(" and ")" parentheses made a pairing of the original sequence satisfying (i) and (ii) impossible? The answer is no. To see this, assume the contrary: there is some valid pairing of the original sequence in which the first ")" parenthesis is not paired with the immediately preceding "(" parenthesis, but there is no valid pairing in which it is. The first ")" parenthesis must then be paired with an earlier "(" parenthesis, and the immediately preceding "(" parenthesis must be paired with a later ")" parenthesis, as in Figure 4.11(b). These two pairs can be rearranged by interchanging their "(" parentheses. The resultant pairing still satisfies conditions (i) and (ii), yet it pairs the first ")" parenthesis with the immediately preceding "(" parenthesis, which is exactly the constraint that was imposed. This contradicts the assumption that no valid pairing satisfies the constraint.

Consequently, the conclusion is that a solution to the problem is obtained as follows:

1. Pair the first adjacent left, "(", and right, ")", parentheses and remove the pair.

2. If the resultant sequence is valid, then so is the original sequence; otherwise the original sequence is not valid.
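The two steps above can be sketched directly in C. This is only an illustrative sketch, not the function developed later in this section: the name valid_by_removal is hypothetical, the input is assumed to contain nothing but "(" and ")" characters, and the empty sequence is assumed to be valid (its empty pairing satisfies (i) and (ii) trivially).

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical helper: returns true if s is a valid sequence.
 * Assumes s holds only '(' and ')' characters.
 * The string is modified in place as pairs are removed. */
static bool valid_by_removal(char *s)
{
    if (*s == '\0')
        return true;                 /* assumption: the empty sequence is valid */

    char *close = strchr(s, ')');    /* first right parenthesis, if any */
    if (close == NULL || close == s)
        return false;                /* only '(' remain, or s starts with ')' */

    /* Everything before the first ')' is '(' so close[-1] is its partner. */
    assert(close[-1] == '(');

    /* Step 1: remove the adjacent pair by shifting the tail
     * (including the terminating '\0') two places to the left. */
    memmove(close - 1, close + 1, strlen(close + 1) + 1);

    /* Step 2: the original is valid exactly when the remainder is. */
    return valid_by_removal(s);
}
```

Called on a writable copy of "()(())", the function removes one adjacent pair per call until the string is empty and reports the sequence as valid; called on "(()", it is left with a single "(" and reports it as not valid. Each call rescans and shifts the string, so this direct transcription can take time proportional to the square of the sequence length.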

Think of a left parenthesis as denoting an incurred obligation, and a right parenthesis as the fulfilling of the most recently incurred obligation. This is the precise situation for which a stack is useful. As the program scans a sequence