4.8: Case Study: Checking Sequences for Proper Nesting

To bring together the major ideas of this chapter, a case study to check program structure for proper nesting is presented. Its main features are:

1. The top-down approach

2. Demonstration of the use of the stack, treating it as a data abstraction

3. Checking program structure for proper nesting by developing a nonrecursive and a recursive function

4. The different perspectives required to develop the two functions

Every computer programmer has at one time or another gotten error messages saying "incomplete loop structure" or "a missing parenthesis." You now have enough background to write a program that checks for such errors. Let's see how it can be done.

Arithmetic expressions, as well as a program text, consist of sequences of symbols. Complex arithmetic expressions may be composed by properly combining simpler arithmetic expressions, delineated by matching left and right parentheses. The simpler expressions are said to be nested within the more complex expressions that contain them. The complex expression (a * (b + [c /d])) + b) contains two pairs of nested expressions. C programs contain compound statements delimited by matching { (begin) and } (end) pairs. Ignoring all other symbols, these matching pairs must satisfy rules of nesting to be syntactically correct.

Nesting means that one matching pair is wholly contained within another. When pairs are nested, they may not overlap, as do the two pairs in ( [ ) ]. The rules of nesting are the same for any sequence that is built by combining simpler sequences. They require that the start and end of each component be specified. Checking complex sequences to determine that these begin-end pairs do not overlap is an important operation. It provides an interesting application of the stack data abstraction, as well as of recursion. An intimate connection exists among such sequences, stacks, recursions, and trees; it will be explored more fully in later chapters.

Two abstractions of this problem will be considered. The second is a generalization of the first. To begin, assume only one kind of left parenthesis and one kind of matching right parenthesis, and consider sequences composed of these. Thus, at first, expressions such as ([]) are not allowed, since they involve more than one kind of parenthesis.

Given a sequence of left and right parentheses, determine if they may be paired so that they satisfy two conditions:

i. Each pair consists of a left and right parenthesis, with the left parenthesis preceding the right parenthesis.

ii. Each parenthesis occurs in exactly one pair.

If a sequence has a pairing satisfying both of these conditions, the sequence and pairing are valid. The problem is to determine whether a given sequence is valid or not. No sequence starting with a right parenthesis, ")", can be valid, since there is no preceding left parenthesis, "(", to pair with it. Any valid sequence must therefore begin with some series of n "(" parentheses preceding the first appearance of a ")" parenthesis, such as Figure 4.11(a), for which n is 3. Suppose we remove this adjacent pair, the nth "(" parenthesis, and the immediately following first ")" parenthesis. This would be pair 1 in Figure 4.11(a). If the new sequence thus obtained is valid, then adding the removed pair to it results in a valid pairing. Thus the original sequence is valid.

Suppose you find that the resultant sequence is not valid. Can you conclude that the original sequence is not valid? In other words, has constraining one of the pairs to be the first adjacent "(" and ")" parentheses made a pairing of the original sequence satisfying (i) and (ii) impossible? The answer is no. To see this, assume that there is some valid pairing of the original sequence, in which the first ")" parenthesis is not paired with the immediately preceding "(" parenthesis, but there is no valid pairing in which it is. In general, the first ")" parenthesis must then be paired with an earlier "(" parenthesis, and the immediately preceding "("parenthesis must have been paired with a later ")" parenthesis, as in Figure 4.11(b). These two pairs can then be rearranged by interchanging their "(" parentheses. The resultant pairing satisfies conditions (i) and (ii), yet it has the first adjacent "(" parenthesis paired with the immediately following ")" parenthesis, but this is the nesting constraint that has been specified. This contradicts the assumption that no valid pairing satisfies the constraint.

Consequently, the conclusion is that a solution to the problem is obtained as follows:

1. Pair the first adjacent left, "(", and right, ")", parentheses and remove the pair.

2. If the resultant sequence is valid, then so is the original sequence; otherwise the original sequence is not valid.

Think of a left parenthesis as denoting an incurred obligation, and a right parenthesis as the fulfilling of the most recently incurred obligation. This is the precise situation for which a stack is useful. As the program scans a sequence

(a) Four Pairs Satisfying Conditions i and ii

(b) Four Other Pairs Satisfying Conditions i and ii

(c) Four Pairs Not Satisfying Conditions i and ii; No Other Four Pairs Will

Figure 4.11 Valid and Invalid Nesting of Parentheses

from left to right, it pushes each "(" parenthesis onto a stack, representing an incurred obligation. Task 1 of the solution may then be carried out by popping the stack when the first ")" parenthesis is encountered, to fulfill the most recently incurred obligation.

Carrying out task 2 requires first determining if the resultant sequence is valid. Notice, however, that had the resultant sequence been processed in this same way, the contents of the stack would be exactly what they are now. Consequently, the program can continue to scan the original sequence (which is exactly what would be seen if it were scanning the resultant sequence), pushing whenever it encounters a "(" parenthesis, and popping whenever it comes to a ")" parenthesis. If the stack is empty when the entire scan is completed, then we have determined that the original sequence is valid; otherwise it is not. During the scan, any attempt to pop an empty stack means that no "(" parenthesis is available to be paired with the ")" parenthesis that caused the attempt (that is, the sequence is invalid).

This solution may be implemented as in the function determine, which follows this paragraph. Note that the stack is treated as a data abstraction . Readchar is a function that reads into its parameter the next character of the input sequence and returns true, unless the character read was a specialchar used as a sentinel, and then it returns false. When determine returns, the entire input sequence has been read. Valid will return true if a pairing satisfying (i) and (ii) exists, and false otherwise.

Nonrecursive Version

determine(pvalid)

/* Returns true in valid only if the

input sequence is properly nested

*/

int *pvalid;

{

char currentchar;

struct

{

char topchar;

}topcharacter;

>Comment

stack s;

*pvalid = TRUE;

>Comment

setstack(&s);

>Comment

while(readchar(&currentchar))

>Comment

if(leftparen(currentchar))

push(&currentchar,&s);

>Comment

else if(rightparen(currentchar))

if(!empty(&s))

{

pop(&s,&topcharacter);

>Comment

if(!match(topcharacter.topchar,

currentchar))

*pvalid = FALSE;

}

else

*pvalid = FALSE;

else

>Comment

process(currentchar,&s);

>Comment

if(!empty(&s))

*pvalid = FALSE;

}

Leftparen returns the value true if its parameter is a left parenthesis, and false otherwise. Rightparen is defined similarly. Match returns the value true if topchar and currentchar are left and right parentheses, respectively.

One of the aims of the text is to show how general functions can be adapted to solve particular problems. Determine is such a general function. It can be adapted to evaluate a completely parenthesized infix expression. We repeat the rule of Section 4.6.2 to be used to evaluate such an expression:

1. Scan the infix expression from left to right.

2. When a "matched" pair of "left" and "right" parentheses is encountered,

apply the operator to the operands enclosed within the pair of parentheses, and replace the paired parentheses and the contained operator and operands by the result. Continue scanning from this point.

The matched pairs referred to in this rule are exactly the same as in this case study. Studying the rule, you can see that a right parenthesis signals that an operator and its operands are enclosed between the right parenthesis and its matching left parenthesis.

Determine may be modified to obtain a function evaluate, by defining the process routine as follows:

1. When an operand is encountered, place the operand on top of an operand stack.

2. When an operator is encountered, place the operator on top of the operator stack.

It is also necessary, when a "right" parenthesis is encountered, and the operator stack is not empty, to pop the operator stack, remove the required number of operands from the top of the operand stack, apply the operator to those operands, and place the result on the top of the operand stack. This must be done just before the pop(&s,&topcharacter) statement. Determine's stack, s, plays the role of the operator stack. An operand stack must be introduced and must be available to both the evaluate and process functions.

The general function determine is actually more complex than need be for "paired parentheses." In fact, a stack is not even necessary. All that is really needed is to keep a tally of "(" parentheses, decreasing the tally by 1 every time a ")" parenthesis is read. If the tally ever becomes negative, then determine should, from that point, simply read in the rest of the input sequence and return false. Otherwise, if the tally is zero after the entire input sequence has been read, it should return true. Determine is implemented as shown in the nonrecursive function listed above because it will serve as the solution to the next problem, for which "paired parentheses" will be a special case.

Suppose, instead of containing only "(" and ")" parentheses, the input sequence could consist of any characters. Assume that each character could be either a left or a right character but not both, and also that each left character has a unique right character with which it matches. For instance, a, (, [, and 1 might be "left" characters and the unique "right" characters with which they match z, ), ], and r, respectively.

Given a sequence of left and right characters, determine if they may be paired so that

i. Each pair consists of a left and a right matching character, with the left character preceding the right character.

ii. Each character occurs in exactly one pair.

iii. The sequence of characters between the left and right characters of each pair must also satisfy (i) and (ii).

If a sequence of left and right characters satisfies conditions (i), (ii), and (iii), the sequence and the pairing are valid. Condition (iii) then says that the sequence of characters between the left and right characters of each pair must be valid.

Let's see if this definition makes sense. Consider the above sample "matching" involving a, z, (, ), [, ], l, and r and the sequences of Figure 4.12.

Both the pairings of Figure 4.12(a) and those of Figure 4.12(b) satisfy (i) and (ii). However, only the pairing of (a) satisfies (iii). The pairing of (b) fails because the left and right characters of pair 3 are a and z, and the sequence of characters between them is ])1r. This sequence does not satisfy conditions (i), (ii), and (iii), because none of the six possible ways to form two pairs will be valid. It is also apparent that the pairing of (b) fails, because the left and right characters of pair 2 are "[" and "]" and the sequence of characters between them, a, is not valid. One would have to be sure no other pairings for (b) are valid before concluding that (b)is not a valid sequence.

It is not immediately apparent that this problem generalizes the first problem, even if the characters are restricted to be just left and right parentheses. This is because condition (iii) does not appear in the first problem. However, the reasoning used there shows that adding the condition does not change the problem. In fact, if the left and right matching characters are thought of as different kinds of parentheses, the same reasoning applied in the first problem can be applied to the present problem.

(a) Five pairs Satisfying Conditions i-iii

(b) Five Pairs Not Satisfying Conditions i-iii; No Other Five Pairs Will

Figure 4.12 Valid and Invalid Nesting of Matched Characters

Again, any valid sequence must start with some series of n > 0 left parenthese preceding the first appearance of a right parenthesis. Suppose its matching left parenthesis does not immediately precede it. This is the situation, for example, in Figure 4.12(b), where there are four left parentheses, a, (, [, a, appearing before "]," the first right parenthesis. In order for this first right parenthesis to satisfy (i) and (ii), it must then be paired with a previous matching left parenthesis. But then the immediately preceding left parenthesis is the last character in the sequence between that pair of parentheses. In order to satisfy (iii), this sequence must be valid. This sequence cannot be valid since this last left parenthesis has no matching right parenthesis in this sequence, as indicated in Figure 4.13.

In conclusion, a solution may be obtained as follows:

1. Pair the first adjacent left and right parentheses that match, and remove the pair.

2. If the resultant sequence is valid, then so is the original sequence; otherwise the original sequence is not valid.

Figure 4.13 An Invalid Nesting Sequence

Function determine is also an implementation of this solution using a stack, assuming leftparen, rightparen, and match reflect the different left parentheses, right parentheses, and matched pairs, respectively. The stack is needed in order to implement this solution. It is no longer simply a question of increasing or decreasing a tally, depending on whether the program reads a "(" or a ")"parenthesis, and making sure it never goes below zero and ends at zero. It is not even enough to keep a distinct tally for each of the different kinds of "left" and "right" parentheses. This would not result in Figure 4.12(b), for example, being recognized as invalid, because each of the four numbers would satisfy the requirements just mentioned. It is condition (iii), and more than one kind of left and right parentheses, that makes the stack mandatory. Consider a pair satisfying conditions (i) and (ii). To satisfy condition (iii), the sequence of characters between the "left" and "right" parentheses of the pair must also be valid. In effect, a new set of tallies must be kept for that sequence, and they must satisfy the three conditions. The stack is actually holding all of these tallies. Each time a "left" parenthesis appears, it becomes a character in many sequences, in general, and the tally for it, associated with each of those sequences, must be increased by one. Pushing it onto the stack, in effect, increases all these tallies by one simultaneously. Similarly, popping the stack decreases them all simultaneously.

So far the solution has been nonrecursive and has required looking at the input as simply a sequence of parentheses satisfying certain local constraints; a left parenthesis must encounter a matching right parenthesis before any other right parenthesis appears. Any global structure inherent in the input has been ignored. Development of a recursive solution, however, requires that a valid input be defined recursively. The recursive definition of a valid sequence may be restated to show its structure more clearly.

A valid sequence is

n A null sequence

n A left and right matching pair of parentheses with a valid sequence between them

n A valid sequence followed by a valid sequence

In other words, a valid sequence is

n A null sequence,

n One valid sequence surrounded by a pair of matching parentheses, or

n A series of valid sequences, each surrounded by a pair of matching parentheses

The last case would appear as

(valid sequence)(valid sequence) . . . (valid sequence)

The following program is a recursive version of determine based on the recursive definition.

Recursive Version

determine(pvalid,last)

/* Returns true in valid only if the input

sequence is properly nested; last must

initially contain a special character

used as the input terminator.

*/

int *pvalid;

char last;

{

char currentchar;

int done;

>Comment

static int more;

done = FALSE;

more = TRUE;

while(!done && more)

17 loops through each input character

{

>Comment

more = readchar(&currentchar);

if(leftparen(currentchar))

>Comment

determine(pvalid,currentchar);

else if(rightparen(currentchar))

>Comment

if(!match (last,currentchar))

*pvalid = FALSE;

else

{

>Comment

done = TRUE;

>Comment

last = SPECIALCHAR;

}

else

>Comment

process(last,currentchar);

}

>Comment

if(last != SPECIALCHAR)

*pvalid = FALSE;

}

Determine must be called initially with last set to specialchar. It is used as an input terminator and should not occur in the sequence being checked. When a valid sequence surrounded by a pair of matching parentheses has been encountered, the variable done is set to true. At that point the program continues looking for a next such sequence. However, last must be reinitialized to the specialchar. The variable more is used to signal whether or not more characters remain to be read in the input string. It is declared to be static because it must retain its value between any recursive calls to determine.

Both this recursive solution and the iterative solution work from the inside out. The iterative program looks for adjacent matching left and right parentheses in the interior of the sequence. The recursive program looks for a valid sequence surrounded by a matching pair of parentheses. Thus the iterative solution sees the input

[ ( [ ] ) ] ( ) { [ ] } #

as

[ ( [ 1 ] ) ] ( 4 ) { [ 5 ] } #     (6 pairs of matching parentheses)

-   2   -    -   6   -

-  - 3   - -

and the recursive solution sees it as

[ ( [ ] ) ] ( 2 ) { [ ] } #    (3 valid sequences each one surrounded by a pair of

- -  1 - -         - 3  -       matching parentheses)

where # denotes the specialchar.

Another recursive solution could be written to work from the outside in, looking for a rightmost mate to the first left parenthesis and then checking the sequence between them for validity. This would lead to a less efficient solution, because parentheses could be considered more than once. Recursion often involves this dual possibility, so both should be explored.