Task 5b of the better (or "bag") refinement of the algorithm could be done by writing the procedure update from scratch. Instead, it is done here using a tool that is available, the traverse procedure of Chapter 3. First, traverse is modified by adding bag and count as additional parameters of both traverse and process, and changing its name to update.

Process, called by update, must now be implemented so that update does its job. Its code is

    currentsucc = info(recordpointer);
    decrease(currentsucc,count);
    if (iszero(currentsucc,count))
    {
        ranking.rank[*pbag] = currentsucc;
        *pbag = *pbag + 1;
    }

A complete program using topsort follows. Notice that topsort has two parameters. It is the same as our version, except it copies the rank field of ranking into the output array as a last step.

Because of the implementation, some functions must have access to specific variables. C requires these variables to be either passed as pointers or declared as global. For example, process must have access to rank, and emptybag to next. Incidentally, as written, topsort does not depend on the medium assumed for the input pairs. Normally, we want programs to be independent of the form of the input. This has been accomplished here for the pairs by using the function nextpair(i,j), which inputs the next pair and returns true if it is a nonsentinel pair. The program uses 0 0 as the sentinel pair.

    #define LIMIT 21

    typedef int outputarray[LIMIT];

    main()
    /* Reads the number of objects n and the input pairs
       specifying a partial order on the objects, and prints
       a topological sort of the n objects. Each pair should
       not occur more than once in the input and n should
       not exceed twenty. The sentinel pair is 0 0.
    */
    {
        int n,i;
        outputarray topologicalsort;

        topsort(&n,topologicalsort);
        printf("\n A TOPOLOGICAL SORT IS");
        for (i=1;i<=n;i++)
            printf("\n %d",topologicalsort[i]);
    }

    typedef int countcollection[LIMIT];

    predinitialization(n,count)
    /* Initializes all n counts to zero. */
    int n;
    countcollection count;
    {
        int i;

        for (i=1;i<=n;i++)
            count[i] = 0;
    }

    increase(j,count)
    /* Increases the jth count by one. */
    int j;
    countcollection count;
    {
        count[j] = count[j] + 1;
    }

    decrease(j,count)
    /* Decreases the jth count by one. */
    int j;
    countcollection count;
    {
        count[j] = count[j] - 1;
    }

    iszero(i,count)
    /* Returns true only if the ith count is zero. */
    int i;
    countcollection count;
    {
        return(count[i] == 0);
    }

    #define RECORDLIMIT 191
    /* RECORDLIMIT should be at least
       (((LIMIT-1)*(LIMIT-2))/2)+1
    */

    #define NULL (-1)

    typedef int listpointer;

    typedef struct
    {
        int succobj;
        listpointer link;
    }listrecords,recordsarray[RECORDLIMIT];

    recordsarray lists;

    listpointer setnull()
    /* Returns a null pointer. */
    {
        return(NULL);
    }

    anotherrecord(recordpointer)
    /* Returns true only if recordpointer
       points to a record.
    */
    listpointer recordpointer;
    {
        return(recordpointer != NULL);
    }

    info(pointer)
    /* Returns the contents of the succobj field
       of the record pointed to by pointer.
    */
    listpointer pointer;
    {
        return(lists[pointer].succobj);
    }

    listpointer next(recordpointer)
    /* Returns the link field value of
       the record pointed to by recordpointer.
    */
    listpointer recordpointer;
    {
        return(lists[recordpointer].link);
    }

    setinfo(pointer,value)
    /* Copies value into the succobj field
       of the record pointed to by pointer.
    */
    listpointer pointer;
    int value;
    {
        lists[pointer].succobj = value;
    }

    setlink(pointer1,pointer2)
    /* Copies pointer2 into the link field
       of the record pointed to by pointer1.
    */
    listpointer pointer1,pointer2;
    {
        lists[pointer1].link = pointer2;
    }

    listpointer avail()
    /* Returns a pointer to storage allocated
       for a list record. Pointers are handed
       out in the order 1, 2, 3, . . .
    */
    {
        static int t=0;

        t++;
        return(t);
    }

    typedef listpointer succ_collection[LIMIT];

    insert(i,j,succ)
    /* Adds j to the front of the ith collection in succ. */
    int i,j;
    succ_collection succ;
    {
        listpointer pointer,avail();

        pointer = avail();
        setinfo(pointer,j);
        setlink(pointer,succ[i]);
        succ[i] = pointer;
    }

    succinitialization(n,succ)
    /* Initializes all n collections of succ to empty. */
    int n;
    succ_collection succ;
    {
        int i;
        listpointer setnull();

        for (i=1;i<=n;i++)
            succ[i] = setnull();
    }

    listpointer access_succ(i,succ)
    /* Returns a pointer to the ith collection in succ. */
    int i;
    succ_collection succ;
    {
        return(succ[i]);
    }

    typedef struct
    {
        outputarray rank;
        int next;
    }rankingrecord;

    rankingrecord ranking;    /* allocates storage for ranking */

    copy(n,topologicalsort,ranking)
    /* Copies the rank field of ranking
       into topologicalsort.
    */
    int n;
    outputarray topologicalsort;
    rankingrecord ranking;
    {
        int i;

        for (i=1;i<=n;i++)
            topologicalsort[i] = ranking.rank[i];
    }

    typedef int bagcollection;

    baginitialization(pbag,n,count)
    /* Initializes the bag so it contains
       only objects whose counts are zero.
    */
    bagcollection *pbag;
    int n;
    countcollection count;
    {
        int i;

        ranking.next = 1;
        *pbag = 1;
        for (i=1;i<=n;i++)
            if (iszero(i,count))
            {
                ranking.rank[*pbag] = i;
                *pbag = *pbag + 1;
            }
    }

    emptybag(bag)
    /* Returns true only if bag is empty. */
    bagcollection bag;
    {
        return(bag == ranking.next);
    }

    remove(pbag,pobj)
    /* Sets obj to an object to be removed
       from bag, and removes it.
    */
    bagcollection *pbag;
    int *pobj;
    {
        *pobj = ranking.rank[ranking.next];
        ranking.next = ranking.next + 1;
    }

    process(listname,recordpointer,pbag,count)
    /* Decreases the count of the successor pointed
       to by recordpointer and adds it to the bag if
       its count has become zero.
    */
    listpointer listname,recordpointer;
    bagcollection *pbag;
    countcollection count;
    {
        int currentsucc;

        currentsucc = info(recordpointer);
        decrease(currentsucc,count);
        if (iszero(currentsucc,count))
        {
            ranking.rank[*pbag] = currentsucc;
            *pbag = *pbag + 1;
        }
    }

    topsort(pn,topologicalsort)
    /* Inputs the number of objects n and the
       partial order and generates a solution in
       topologicalsort.
    */
    int *pn;
    outputarray topologicalsort;
    {
        countcollection count;
        succ_collection succ;
        bagcollection bag;
        int i,j,obj;

        /* Phase I: read n and the pairs, building the
           predecessor counts and the successor lists.
        */
        printf("\n enter n between 1 and %d\n",LIMIT-1);
        scanf("%d",pn);
        predinitialization(*pn,count);
        succinitialization(*pn,succ);
        while (nextpair(&i,&j))
        {
            increase(j,count);
            insert(i,j,succ);
        }

        /* Phase II: repeatedly remove an object from the
           bag and update the counts of its successors.
        */
        baginitialization(&bag,*pn,count);
        while (!emptybag(bag))
        {
            remove(&bag,&obj);
            update(succ,obj,&bag,count);
        }
        copy(*pn,topologicalsort,ranking);
    }

    nextpair(pi,pj)
    /* Inputs the next pair and returns true
       only if it was not the sentinel pair.
    */
    int *pi,*pj;
    {
        printf("\n enter a pair \n");
        scanf("%d %d",pi,pj);
        return(!((*pi == 0) && (*pj == 0)));
    }

    update(succ,obj,pbag,count)
    /* Updates the counts of all successors of obj
       and places any whose counts become zero
       into bag.
    */
    succ_collection succ;
    int obj;
    bagcollection *pbag;
    countcollection count;
    {
        listpointer listname,recordpointer,access_succ(),next();

        listname = access_succ(obj,succ);
        recordpointer = listname;
        while (anotherrecord(recordpointer))
        {
            process(listname,recordpointer,pbag,count);
            recordpointer = next(recordpointer);
        }
    }

Note that the ranking and bag implementations are not independent, since both use the rank field of ranking. Thus, if it is desirable to change the implementation of one, the other is also affected. As written, the bag acts as a queue, because objects are removed in the order in which they were added, but it may be important to pick the entry to be removed from the bag more selectively. One way to do this is to implement the bag as a priority queue. In any case, to make ranking and bag independent requires changing their declarations, definitions, and basic operations.
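As a rough illustration of what an independent bag might look like, the sketch below gives the bag its own array and keeps the first-in, first-out behavior. The names bagrecord, bag_init, bag_empty, bag_add, and bag_remove are hypothetical and do not appear in the program above. Since each object enters the bag at most once, an array of LIMIT entries is assumed to be enough and no wraparound is handled.

```c
#define LIMIT 21   /* same bound as the program above */

/* Hypothetical stand-alone bag: a first-in, first-out queue in its own array. */
typedef struct {
    int items[LIMIT];
    int front;     /* index of the next object to remove */
    int rear;      /* index of the next free slot */
} bagrecord;

void bag_init(bagrecord *bag)
{
    bag->front = 0;
    bag->rear = 0;
}

int bag_empty(const bagrecord *bag)
{
    return bag->front == bag->rear;
}

/* Assumes fewer than LIMIT objects are ever added in total. */
void bag_add(bagrecord *bag, int obj)
{
    bag->items[bag->rear++] = obj;
}

/* Assumes the bag is not empty. */
int bag_remove(bagrecord *bag)
{
    return bag->items[bag->front++];
}
```

With a bag of this kind, process would call bag_add rather than writing into ranking.rank, and the ranking could be filled in (or printed) as each object is removed.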

11.4: Analysis of Topsort

We will now analyze the time requirements of this implementation. Refer to the refinement shown in the implementation that used data abstractions on page 503. Reading n takes constant time. The count and succ array initializations take time proportional to n. The while loop read and test takes constant time. Reading the next input pair and tasks 3a and 3b, which involve executing five instructions, take constant time. Each repetition of the loop thus takes constant time. Since the loop is executed once for each of the m input pairs, the total loop time will be proportional to m. Phase I thus takes some constant time, plus time proportional to n, plus time proportional to m.

Initializing the bag at the start of phase II takes time proportional to n, since it involves traversing the count array and placing each object whose count is zero into the bag. The while loop body of phase II, unlike that of phase I, does not take the same amount of time on every repetition. Task 5a takes constant time, but task 5b depends on the number of successors of the object that was removed by task 5a. The loop itself is executed n times, once for each object output.

How can we determine the total loop time required? Notice that each of the n objects will eventually be output in some loop execution. We do not know in what order the objects will be output. However, the time taken by the loop execution that outputs a given object is the same no matter when that object is output. This time is at most a constant plus time proportional to the number of successors of the object. Thus, the total time for the while loop in phase II is the sum of the times required to output each object. The total time is then a constant times n plus time proportional to the sum

[(number of successors of object 1) plus (number of successors of object 2) plus . . . plus (number of successors of object n)]

This sum is just the total number of successors, m. Hence the total time for phase II is made up of the same kinds of components as the total time for phase I. We conclude that the total time for this implementation has the same form.
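The argument can be written out as a short derivation. The symbols here are assumptions introduced only for illustration: s_i is the number of successors of object i, a and b are the constants for one loop execution, and c is the per-object constant for initializing the bag.

```latex
T_{\mathrm{II}} \;\le\; c\,n + \sum_{i=1}^{n} \bigl(a + b\,s_i\bigr)
              \;=\; (a + c)\,n + b \sum_{i=1}^{n} s_i
              \;=\; (a + c)\,n + b\,m
```

The last step uses the fact that every input pair contributes exactly one successor record, so the s_i sum to m.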

Note that the search-based algorithm could take time O(n!). The straightforward implementation of the construction-based algorithm could take time O(nm). Topsort takes time of the form c₁ + c₂n + c₃m and represents a very substantial improvement. It was made possible because we were able to determine precisely, and to obtain efficiently, the information required to do the processing steps of the algorithm. The topological sort problem is more complex than earlier ones in this book, but it illustrates the same theme. The processing to be done determines the data structures that are most appropriate. Clearly, any implementation of an algorithm that does topological sorting must take at least time of the form an + bm, since it is necessary to read in all m input pairs and output all n objects.

How much storage does the implementation require? Suppose the program is to run correctly for values of n between 1 and 20. The only difficulty in determining the actual amount of storage required is in deciding what length to declare for lists. This depends on how large m may be. In general, if there are n objects, each object can have no more than n - 1 successors--that is, an arrow to every other object. There are then at most n(n - 1) possible successors. Each successor requires one record of the lists array. Actually, because of the asymmetry property, at most one-half of the n(n - 1) possibilities can appear. Hence we need a total of at most n(n - 1)/2 records for lists. If n is 20, then (20 × 19)/2 = 190 records will do. The storage required is proportional to n².
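The bound is easy to check mechanically. The following sketch uses two hypothetical helper names, max_records and records_length. The extra entry in records_length reflects the fact that avail in the program above hands out indices starting at 1, so entry 0 of lists is never used.

```c
/* Largest number of successor records that valid input on n objects can need. */
int max_records(int n)
{
    return n * (n - 1) / 2;
}

/* Array length needed when record indices start at 1 and entry 0 is unused. */
int records_length(int n)
{
    return max_records(n) + 1;
}
```

For n = 20 these give 190 records and an array length of 191, which agrees with the value of RECORDLIMIT.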

Knuth [1973a] and Wirth [1976] give different implementations of the algorithm for the topological sorting problem. See Aho, Hopcroft, and Ullman [1983] for a somewhat different point of view on its solution.

11.5: Behavior for Replicated Pairs or Loops

In the solution to the topological sorting problem, it is assumed that each distinct input pair appears only once and also that the input contains no loops. Either assumption might be violated. Of course, more obvious errors could occur. For example, n or one of the input pair members could be negative or out of range. It is not difficult to add a validation routine to check for these kinds of input errors. Subtler errors, such as replication of pairs or the occurrence of loops, are not as obviously remedied.

What does happen if pairs are replicated or loops occur in the input? How would you recognize loops? Replication of an input pair i, j causes the count of j to be 1 larger than it should be. It also causes the successor j to appear on the list of successors of i one more time. When object i is eventually removed, its list of successors is traversed. This results in the count of j being reduced one additional time, so that the extra increase due to the i, j replication is cancelled. The implementation will thus work correctly as long as there is room for the extra successor records to appear in lists. If there is no room to accommodate these extra records, it is not possible to predict what will occur when the implementation is executed.

Loops, on the other hand, result in some objects never being output. Objects that are involved in a loop, or that are the successors of an object in a loop, will never have their counts go to zero, and will never appear in the bag of objects. In fact, had the test for completion been "S not empty" or "n objects output," loops in the input would cause an infinite loop in the program. It should now be clear that next-1, when the loop in phase II is exited, will be the actual number of objects that were output by the loop. If this is less than n, then loops occurred in the input. This can provide a simple test for loops.
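The test for loops can be tried out with a condensed version of the algorithm. This is only a sketch, not the program above: the function name ranked_count, the bounds MAXN and MAXM, and the choice to pass the pairs in an array instead of reading them are all assumptions made for illustration. The variables next and bag play the same roles as ranking.next and bag in the program.

```c
#define MAXN 20    /* assumed bound on n */
#define MAXM 400   /* assumed bound on the number of pairs m */

/* Returns the number of objects ranked; a result less than n signals a loop.
   Each pairs[k] holds i, j with i required to precede j; objects are 1..n.
   rank must have room for indices 1..n. */
int ranked_count(int n, int m, int pairs[][2], int rank[])
{
    int count[MAXN + 1] = {0};
    int head[MAXN + 1];            /* first successor record of i, or -1 */
    int succobj[MAXM], link[MAXM];
    int next = 1, bag = 1;
    int i, k, p;

    for (i = 1; i <= n; i++)
        head[i] = -1;
    for (k = 0; k < m; k++) {      /* phase I */
        int a = pairs[k][0], b = pairs[k][1];
        count[b]++;
        succobj[k] = b;
        link[k] = head[a];
        head[a] = k;
    }
    for (i = 1; i <= n; i++)       /* bag initialization */
        if (count[i] == 0)
            rank[bag++] = i;
    while (bag != next) {          /* phase II */
        int obj = rank[next++];
        for (p = head[obj]; p != -1; p = link[p])
            if (--count[succobj[p]] == 0)
                rank[bag++] = succobj[p];
    }
    return next - 1;
}
```

Called with the pairs 1 2, 2 3, 3 2 and n = 3, the function ranks only object 1, so the returned value is less than n. Called with the replicated pairs 1 2, 1 2 and n = 2, it still ranks both objects.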

11.6: Final Comments on Topsort

As already noted, the implementation of the output ranking and the bag are coupled because they share storage. As a result they are not independent. This means that a change in the choice of implementation for one will affect the other. In general, such a situation is to be avoided--and, as pointed out earlier, could have been avoided. Certainly the small saving in time and storage did not warrant the added complexity.

There are, however, some important storage considerations that were alluded to earlier. Topsort will run out of storage before it runs out of time, since storage requirements grow as n². The bulk of the storage needed is taken by the lists array. Suppose a maximum size, based on the available storage of the computer system, is declared for the lists array. Say its length is 50,000. Assuming each of its records takes two entries, this provides enough storage to guarantee the solution of any problem with no more than 25,000 successors (m ≤ 25,000). It makes no difference how the successors are distributed among the objects; only the total number is relevant.

Suppose, instead, that the decision had been to represent each collection of successors by dedicating an array to the collection. Since the programmer does not know in advance how the successors will be distributed, each of the required n arrays should have the same length. Lists are no longer necessary, since the successors can be placed sequentially in the proper array. Of course, a variable will be needed to keep track of the last successor in each array. An additional array may be used for these pointers. If n is to be no greater than (say) 500, then each of the arrays containing the successors can be declared of length 50,000/500, or 100. The upshot is that the solution may now be guaranteed for a different class of problems. These problems must have n ≤ 500, and each object may have no more than 100 successors. Even though the total number of successors may now be 50,000 (twice as many as before), their distribution is critical. When storage is at a premium, such considerations are important.
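A minimal sketch of this alternative follows, using the figures from the discussion (500 objects, 100 successors each). The names succ_table, table_init, and table_insert are hypothetical. The last array records how many successors each row currently holds, and an insertion into a full row is refused.

```c
#define MAXOBJ 500   /* largest n, as in the discussion above */
#define ROWLEN 100   /* successors allowed per object: 50,000/500 */

/* Hypothetical layout: one fixed-length row of successors per object. */
typedef struct {
    int succ[MAXOBJ + 1][ROWLEN];
    int last[MAXOBJ + 1];   /* number of successors stored in each row */
} succ_table;

void table_init(succ_table *t, int n)
{
    int i;

    for (i = 1; i <= n; i++)
        t->last[i] = 0;
}

/* Adds j as a successor of i. Returns 1 on success, 0 if row i is full. */
int table_insert(succ_table *t, int i, int j)
{
    if (t->last[i] == ROWLEN)
        return 0;
    t->succ[i][t->last[i]++] = j;
    return 1;
}
```

The refusal in table_insert is where the distribution of successors matters: the 101st successor of one object is rejected even when every other row is empty.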

Had dynamic memory been used instead of the lists array for storage of the successor records, the program would contain no explicit storage limitation (other than the declarations for the length of count, succ, and rank). Still, the maximum dynamic memory size would have imposed a limit on the value of m for which the program would execute.
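The following sketch shows one way the successor records might be kept in dynamic memory. It is an assumption-laden illustration and not the text's implementation: the names listrecord, insert_dynamic, and free_list are hypothetical, and NULL here is the standard library's null pointer, not the -1 defined in the program above.

```c
#include <stdlib.h>

/* Hypothetical heap-allocated successor record. */
typedef struct listrecord {
    int succobj;
    struct listrecord *link;
} listrecord;

/* Adds j to the front of the ith list in succ.
   Returns 1 on success, 0 if no more dynamic memory is available. */
int insert_dynamic(int i, int j, listrecord *succ[])
{
    listrecord *p = malloc(sizeof *p);

    if (p == NULL)
        return 0;
    p->succobj = j;
    p->link = succ[i];
    succ[i] = p;
    return 1;
}

/* Releases every record on one successor list. */
void free_list(listrecord *p)
{
    while (p != NULL) {
        listrecord *following = p->link;
        free(p);
        p = following;
    }
}
```

The failure return from insert_dynamic marks the point at which the limit on m imposed by the available dynamic memory is reached.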

11.6.1 An Input Validation

One final point involving time and storage trade-offs. Suppose an input validation must be added to topsort just prior to the processing of the current i, j pair read in phase I. The function is to check whether i, j is a duplicate of an earlier input pair. If so, it is to be ignored; if not, it is to be processed. One way to accomplish this, which requires no additional storage, is to traverse the current list of successors of object i. If j appears on the list as a successor, then i, j is a duplicate; otherwise it is not. This takes time proportional to the current number of successors of i and thus adds a total time to phase I that is proportional to m². Instead, the function can work in constant time if n(n - 1) additional storage is available. Use the storage for an n(n - 1) array that is initialized to all zeros. When i, j is read, simply test the (i, j)th array entry. If it is zero, the pair is not a duplicate. Then set the entry to 1 to indicate that this i, j pair has been processed. In this way, an entry of 1 will mean that the pair is a duplicate from now on. This adds total time to phase I of O(m) but requires n(n - 1) additional storage.
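The constant-time check might be sketched as follows. For simplicity this sketch assumes a full LIMIT by LIMIT table indexed directly by i and j, which uses slightly more storage than the n(n - 1) entries mentioned above. The names seen and isduplicate are hypothetical.

```c
#define LIMIT 21   /* same bound as the program above */

/* seen[i][j] is 1 once the pair i, j has been processed.
   A static array starts out as all zeros. */
static unsigned char seen[LIMIT][LIMIT];

/* Returns 1 if the pair i, j was processed earlier.
   Otherwise marks it as processed and returns 0. */
int isduplicate(int i, int j)
{
    if (seen[i][j])
        return 1;
    seen[i][j] = 1;
    return 0;
}
```

In phase I, the loop body would then call increase and insert only when isduplicate(i, j) returns 0.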

11.7: Reviewing Methodology

A topological sort produces a ranking of objects that satisfies constraints on their allowable positions within the ranking. The obvious algorithm for finding a topological sort, searching through all rankings until one satisfying the constraints is found, is not feasible. A feasible algorithm was developed by constructing a ranking that satisfied the constraints. The initial implementation merely produced an image of the input data in memory. It was improved significantly by using an array to contain predecessor counts, creating lists of successors, and introducing a bag to hold objects whose predecessor counts became zero. This solution was achieved by a process of stepwise refinement, stressing data abstraction, encapsulation, modularity, and the proper choice of data structures. The functional modularity of the final solution allowed immediate application of a routine developed earlier--the list traversal routine. Again, because of functional modularity, it was not difficult to determine where and how to build in appropriate checks of the input data.

More generally, the analysis of the storage and time requirements of a fairly complex program was demonstrated. Even though this cannot always be accomplished, or may be difficult for complex programs, it serves as an example of what we hope to be able to do.

Exercises

1. Within a textbook, chapters cover specific topics. The information required to understand each topic is expected to have appeared before the topic appears in the book. Suppose you are given a list of topics and the other topics on which each depends. How might you select an ordering of the topics for their appearance in the book?

2. Which of these binary relations is a partial order on S?

a. "Square root of" on the set S of all real numbers greater than 1

b. "Is older than" on the set S of all your relatives

c. "Sits in front of" on the set S of all students in a class

d. "Lives diagonally across from" on the set S of all people

3. Write down the graphic representation for the binary relation R on the first 10 integers. R = {(3, 2), (4, 6), (7, 6), (8, 1), (9, 7), (9, 8)}.

4. Write a detailed modification of the algorithm of Section 11.2 that will check if a given ranking is consistent with the given partial order.

5. Write a program to implement the refinement of Exercise 4 and determine its worst-case time and storage requirements.

6. Apply the construction-based algorithm of Section 11.3 to the following data to obtain a consistent ranking. Always select the smallest integer for output.

S = {1, 2, 3, . . . , 12}

R = {(3, 2), (4, 6), (7, 9), (12, 10), (5, 6), (2, 4), (8, 9), (2, 6), (3, 6), (7, 12), (3, 4), (7, 10)}

7. If the second implementation of the construction-based algorithm is applied to the input of Exercise 6, what integers are in the bag after 3 is removed?

8. State clearly and concisely what purpose the bag serves in the better "bag" refinement and why it was introduced.

9. Write an expanded version of the better refinement that will output integers from the bag by selecting the smallest possible integer first.

10. What will the succ, lists and count arrays have in them after phase I, when the input pairs of the running example appear in the order 7 6, 5 3, 2 10, 6 8, 5 6, 7 3, 4 8, 7 1, 4 2?

11. What will the count, succ, and lists arrays look like after phase II as compared to after phase I?

12. Suppose the better implementation were modified so that no count array were stored in memory, and only the succ and lists arrays were available after phase I. Write a new phase II refinement under this constraint.

13. Suppose a solution to the topological sort problem had been constructed by determining what object should be at the bottom of the ranking. How would the better refinement be changed to reflect this new solution?

14. a. Suppose the bag is implemented in a separate record bag instead of using the ranking record. How must the data abstraction implementations change?

b. Suppose the output ranking is not kept in the ranking record but, instead, is simply printed. How must the program and data abstraction implementations change?

15. The function process is itself dependent on the ranking and bag implementations. Modify it so that process becomes independent of these implementations.

16. Suppose task 5b of the better refinement is further expanded to:

5b.i. For each successor of the output object, decrease its predecessor count by 1.

5b.ii. For each successor of the output object, if its predecessor count is zero, add the successor to the bag.

a. How should the implementation of the data abstraction algorithm (on p. 503) be modified to reflect this refinement?

b. Write implementations for tasks 5b.i and 5b.ii that use traverse as a tool. This should include writing the corresponding two process functions.

c. How will this expansion of task 5b affect the time required for the algorithm?

17. Suppose the input for Exercise 10 had the pairs below appended after 4 2. What would the graphic depictions of the successor lists be after phase I of topsort?

7 3, 5 6, 4 2, 7 3, 6 8

18. For the input of Exercise 17, what would be in the count array after phase I?

19. For the input of Exercise 17, show what will be in the count array, the successor lists, and the bag after 7 is output.

20. Take as input Example 11.6 with the pairs 10 4 and 10 1 appended after 4 2. What will the second implementation of topsort output in the rank array? What will be the value of next after phase II?

Suggested Assignments

1. Write and run a program whose input will be a series of topological sorting problems. For each problem the program should execute a slight modification of the topsort function called newtopsort. Newtopsort has parameters n and topologicalsort. Newtopsort is to read in and echo print n and all input pairs for the example it has been called to work on. It should print out the relevant portions of the count, succ and lists arrays as they appear after phase I. When newtopsort returns, the program prints the topological sort contained in the topologicalsort array. Except for the modifications needed to do the printing, newtopsort corresponds exactly to the sample implementation of topsort, with one exception: Implement the bag as an array separate from rank. Newtopsort should work correctly for valid input and for any number of objects between 1 and 20. You should make up four input sample problems to use. Two should be valid. One should include replications, and one should contain loops.

2. Suppose, instead of using the sample implementation, you do the following to keep successor information: Declare lists to be an n × (n - 1) two-dimensional array and keep the successors of object i in the ith row of lists. For the example, after phase I, lists would be

[figure: the two-dimensional lists array after phase I]

This two-dimensional lists array takes up the same total amount of storage as the one-dimensional lists array of the better implementation. For both implementations, the limiting factor on whether or not the program will run might then be the value of n beyond which available storage is exhausted. Suppose you are willing to give up the guarantee that topsort will always run correctly. This means you may attempt to run problems whose n is large enough that you are not sure whether the implementation of lists can hold all successors. Discuss the relative advantages of the implementations.

3. This is the same as Assignment 1 except, instead of the lists array, use dynamic memory to store the successor lists.

4. Assume your program for Assignment 1 treated the bag (with the operations emptybag, remove, and baginitialization) and count (with the operations predinitialization and increase) as data abstractions.

a. What does this mean?

b. Write a function, number, that also treats them as data abstractions and returns the number of objects in the bag when it is invoked. When it returns, the bag must contain exactly the same objects as before number was invoked. Assume number does not have access to information about the output ranking.
