*Introduces topological sorting**as an example of the application of the problem solving methodology and programming style presented in this text**background for the problem is given first**each step and decision is explained as the solution evolves**Emphasis is on**top-down design**data abstraction**functional modularization**intelligent choice of information to store and data structure in which to store it*

* Topological sorting* requires ranking a set of objects subject to constraints on the resultant topology--that is, on the placement of the objects. It occurs in many practical situations. For example, textbooks are often written so that each chapter builds on material covered earlier and cannot be understood without this base of information. Determining the order in which to present topics to ensure that all prerequisite material has been covered reduces to finding a topological sort of the topics. Other examples involving topological sorting are presented later.

The following are some examples of problems whose solutions require topological sorting.

Some mathematical concepts and terminology must be defined before the topological sorting problem can be stated and solved in abstract terms. As with the solution to any problem, stating this solution abstractly is desirable so that it will not be tied to a particular application.

Of particular interest are the kinds of binary relations called *partial orders*. A binary relation *R* on *S* is called **a partial order on**** S** if it has the following properties:

*Asymmetry* For any two distinct *a* and *b* in *S*, *aRb* implies

*Transitivity* For any three distinct *a*, *b*, and *c* in *S*, *aRb* and *bRc* implies *aRc*

It is irreflexive, since *a* is divided evenly by but is equal to *a* for all *a* in *S*.

To show that a relation does not have a property requires only one counter-example.

No arrows appear from any node to itself (irreflexivity)

No arrows appear from any node i to another node j and back from j to i (asymmetry)

If arrows appear from i to j and from j to k, then an arrow must appear from i to k (transitivity)

Partial orders are important. They are meant to capture abstractly any concrete example in which the pairs of *R* (or arrows of the diagram) mean "is greater than," "precedes," "is ranked higher than," or "comes before." A special case of a partial order is one whose diagram looks like the following:

3 5 1 2 4 6 8 7

Consider the partial order on the set of nine integers represented by Figure 11.2.

10 value of n

7 3

5 3

2 10

6 8

5 6 m = 9 pairs

4 8

7 1

4 2

7 6

Consider the specific ranking 4, 8, 1, 2, 7, l0, 9, 5, 3, 6 of the 10 objects of the example. It is easy to determine whether or not this ranking violates any constraints of the partial order by checking each one. Since this ranking violates two of the nine constraints, 6 8 and 7 1, it is not a topological sort. Any ranking may be checked in this way. This leads to an obvious algorithm based on the generation of all rankings of the n objects.

The first solution to Example 1l.6 searched through all possible rankings. Another approach is to attempt the construction of the required ranking. To this end consider a ranking, obj_{1}, obj_{2}, . . . , obj* _{n}*, of the

We have just discovered how to construct a topological sort:

A construction-based algorithm:

1. Select any object with no predecessors.

2. Place it in the next position of the output ranking.

3. Remove that object from *S*, and

4. Remove its arrows from the partial order.

The output ranking produced will always be a topological sort.

The *first refinement* describes an implementation of the algorithm.

3. While there is another input pair,

a. record the information in the pair *i, j* phase I

4. Initialize for phase II. phase II

5. While the set *S* is not empty,

a. select an object with no predecessors, and output it as the next object in the output ranking;

b. update the records to reflect the removal of the output objects.

A straightforward representation of the partial order would be to keep an image in memory of all input arrows. We might use two arrays, first and second, for this purpose. First would contain the first integer, and second would contain the second integer of an input pair. The "record the information" task would then involve putting* i *and* j *into the next available locations of first and second, respectively. For Example 11.6, after phase I was executed, the result would be the arrays shown in Figure 11.4.

It is apparent that a lot of time is spent finding the information needed to carry out the processing in phase II. Efficiency can be improved by focusing on the following important question. *Precisely* what information is needed to carry out the processing?

a. Initialize the predecessor counts for each of the* n* objects to zero.

b. Initialize current successors for each of the* n* objects to zero.

3. While there is another input pair,

a. increase the predecessor count of *j *by l;

b. add *j* to the current successors of *i*.

4. Place all objects with zero predecessor count into the bag.

5. While the bag is not empty,

a. remove an object from the bag and output it in the output ranking;

2. a. predinitialization(n,count);

b. succinitialization(n,successor);

4. baginitialization(bag,n,count);

b. update(successor,obj,bag,count)

At some future time a decision may be made to change the implementation of any data abstraction. This requires finding *all* declarations and definitions of the data structures and functions involved and replacing them with the new versions.This modification may be made more easily and more reliably when the originals have been localized in the program. Such localization is called **encapsulation****.** Not all high-level languages provide appropriate tools for achieving the same degree of encapsulation. In the ideal final program the data abstractions are defined in precisely one place; even more important, no operations are allowed on the corresponding data structures except those specified by the functions of the data abstraction. In the implementation above, this means, for example, that for the bag only the baginitialization, emptybag, remove, plus any other operations we take as basic to the bag will appear together. This makes it easier to determine which operations may be affected by a new implementation of the bag.The greater the encapsulation and modularization, the more expeditious the process of achieving correct modifications and adaptations. How to do this in C was discussed in Chapter 5.

count[j] = count[j] + 1;

count[i] = count[i] - 1;

return(count[i] == 0);

static int t=0;

t++;

return(t);

insert

pointer = avail();

setinfo(pointer,j);

setlink(pointer,succ[i]);

succ[i] = pointer;

succinitialization

for (i=1;i<=n;i++)

succ[i] = setnull();

emptybag

emptybag =(bag == ranking.next);

remove

*pobj = ranking.rank[ranking.next];

ranking.next = ranking.next + 1;

Note that avail must correctly initialize and update t.

Task 5b of the better (or "bag") refinement of the algorithm could be done by writing the procedure update from scratch. Instead, it is done here using a tool that is available, the traverse procedure of Chapter 3. First, traverse is modified by adding bag and count as additional parameters of both traverse and process, and changing its name to update.

Process, called by update, must now be implemented so that update** **does its job. Its code is

currentsucc = info(recordpointer);

decrease(currentsucc,count);

if(iszero(currentsucc,count))

{

ranking.rank[*pbag] = currentsucc;

*pbag = *pbag + 1;

}

A complete program using topsort follows. Notice that topsort has two parameters. It is the same as our version, except it copies the rank field of ranking into the output array as a last step.

Because of the implementation, some functions must have access to specific variables. C requires these variables to be either passed as pointers or declared as global. For example, process must have access to rank, and emptybag to next. Incidentally, as written, topsort does not depend on the medium assumed for the input pairs. Normally, we want programs to be independent of the form of the input. This has been accomplished here for the pairs by using the function nextpair(i,j), which inputs the next pair and returns *true* if it is a nonsentinel pair. The program uses 0 0 as the sentinel pair.

#defineLIMIT 21

typedef int outputarray[LIMIT];

main()

/* Reads the number of objects n and the input pairs

specifying a partial order on the objects, and prints

a topological sort of the n objects. Each pair should

not occur more than once in the input and n should

not exceed twenty. The sentinel pair is 0 0.

*/

{

int n,i;

outputarray topologicalsort;

topsort(&n,topologicalsort);

printf("\n A TOPOLOGICAL SORT IS");

for (i=1;i>=n;1++)

printf("\n %d",topologicalsort[i]);

}

typedef int countcollection[LIMIT];

predinitialization(n,count)

/* Initializes all n counts to zero. */

int n;

countcollection count;

{

int i;

for (i=1;i<=n;i++)

count[i] = 0;

}

increase(j,count)

/* Increases the j^{th}count by one. */

int j;

countcollection count;

{

count[j] = count[j] + 1;

}

decrease(j,count)

/* Decreases the j^{th}count by one. */

int j;

countcollection count;

{

count[j] = count[j] - 1;

}

iszero(i,count)

/* Returns true only if the i^{th}count is zero. */

int i;

countcollection count;

{

return(count[i] == 0);

}

#define RECORDLIMIT 191

/* RECORDLIMIT should be at least

(((LIMIT-1)*(LIMIT-2))/2)+1

*/

#define NULL - 1

typedef int listpointer;

typedef struct

{

int succobj;

listpointer link;

}listrecords,recordsarray[RECORDLIMIT];

recordsarray lists;

listpointer setnull()

/* Returns a null pointer. */

{

return(NULL);

}

anotherrecord(recordpointer)

/* Returns true only if recordpointer

points to a record.

*/

listpointer recordpointer;

{

return(recordpointer != NULL);

}

info(pointer)

/* Returns the contents of the succobj field

of the record pointed to by pointer.

*/

listpointer pointer;

{

return(lists[pointer].succobj);

}

listpointer next(pointer)

/* Returns the link field value of

the record pointed to by pointer.

*/

listpointer recordpointer;

{

return(lists[recordpointer].link);

}

setinfo(pointer,value)

/* Copies value into the succobj field

of the record pointed to by pointer.

*/

listpointer pointer;

int value;

{

lists[pointer].succobj = value;

}

setlink(pointer1,pointer2)

/* Copies pointer 2 into the link field

of the record pointed to by pointer 1.

*/

listpointer pointer1,pointer2;

{

lists[pointer 1].link = pointer2;

}

listpointer avail()

/* Returns a pointer to storage allocated

for a list record.

*/

{

static int t=0;

t++;

return(t);

}

typedef listpointer succ_collection[LIMIT];

insert(i,j,succ)

/* Add j into the i^{th}collection in succ. */

int i,j;

succ_collection succ;

{

listpointer pointer,avail();

pointer = avail();

setinfo(pointer,j);

setlink(pointer,succ[i]);

succ[i] = pointer;

}

succinitialization(n,succ)

/* Initializes all n collections of succ to empty. */

int n;

succ_collection succ;

{

int i;

listpointer setnull();

for (i=1;i<=n;i++)

succ[i] = setnull();

}

listpointer access_succ(i,succ)

/* Returns a pointer to the i collection in succ. */

int i;

succ_collection succ;

{

return(succ[i]);

}

typedef struct

{

outputarray rank;

int next;

}rankingrecord;

rankingrecord ranking;

1 allocates storage forranking

copy(n,topologicalsort,ranking)

/* Copies the rank field of ranking

into topologicalsort.

*/

int n;

outputarray topologicalsort;

rankingrecord ranking;

{

int i;

for (i=1;i<=n;i++)

topologicalsort[i] = ranking.rank[i];

}

typedef int bagcollection;

baginitialization(pbag,n,count)

/* Initializes the bag so it contains

only objects whose counts are zero.

*/

bagcollection *pbag;

int n;

countcollection count;

{

int i;

ranking.next = 1;

*pbag = 1;

for(i=1;i<=n;i++)

if(iszero(i,count))

{

ranking.rank[*pbag] = i;

*pbag = *pbag + 1;

}

}

emptybag(bag)

/* Returns true only if bag is empty. */

bagcollection bag;

{

return(bag == ranking.next);

}

remove(pbag,pobj)

/* Sets obj to an object to be removed

from bag, and removes it.

*/

bagcollection *pbag;

int *pobj;

{

*pobj = ranking.rank[ranking.next];

ranking.next = ranking.next + 1;

}

process(listname,recordpointer,pbag,count)

/* Decreases the count of the successor pointed

to by recordpointer and adds it to the bag if

its count has become zero.

*/

listpointer listname,recordpointer;

bagcollection *pbag;

countcollection count;

{

int currentsucc;

currentsucc = info(recordpointer);

decrease(currentsucc,count);

if (iszero(currentsucc,count))

{

ranking.rank[*pbag] = currentsucc;

*pbag = *pbag + 1;

}

}

topsort(pn,topologicalsort)

/* Inputs the number of objects n and the

partial order and generates a solution in

topologicalsort.

*/

int *pn;

outputarray topologicalsort;

{

countcollection count;

succ_collection succ;

bagcollection bag;

int i,j,obj;

printf("\n enter n between 1 and %d\n",LIMIT-1);

scanf("%d",pn);

predinitialization(*pn,count);

succinitialization(*pn,succ);

while (nextpair(&i,&j))

{

increase(j,count);

insert(i,j,succ);

{

baginitialization(&bag,*pn,count);

while(!emptybag(bag))

{

remove(&bag,&obj);

update(succ,obj,&bag,count);

}

copy(*pn,topologicalsort,ranking);

}

nextpair(pi,pj)

int*pi,*pj;

/* Input the next pair and return true

only if it was not the sentinel pair.

*/

{

print("\n enter a pair \n");

scanf("%d %d",pi,pj);

return(!((*pi == 0) && (*pj == 0)));

}

update(succ,obj,pbag,count)

/* Update the counts of all successors of obj

and place any whose counts become zero

into bag.

*/

succ_collection succ;

int obj;

bagcollection *pbag;

countcollection count;

{

listpointer listname,recordpointer,access_succ();

listname = access_succ(obj,succ);

recordpointer = listname;

while(anotherrecord(recordpointer))

{

process(listname,recordpointer,pbag,count);

recordpointer = next(recordpointer);

}

}

We will now analyze the time requirements of this implementation. Refer to the refinement shown in the implementation that used data abstractions on page 503. Reading *n* takes constant time. The count and succ array initializations take time proportional to *n.* The **while** loop read and test takes constant time. Reading the next input pair and tasks 3a and 3b, which involve executing five instructions, take constant time. Each repetition of the loop thus takes constant time. Since the loop is executed once for each of the *m* input pairs, the total loop time will be proportional to *m*. Phase I thus takes some constant time, plus time proportional to *n*, plus time proportional to *m*.

Knuth [1973a] and Wirth [1976] give different implementations of the algorithm for the topological sorting problem. See Aho, Hopcroft, and Ullman [1983] for a somewhat different point of view on its solution.

One final point involving time and storage trade-offs. Suppose an input validation must be added to topsort just prior to the processing of the current *i*, *j* pair read in phase I. The function is to check whether *i*, *j* is a duplicate of an earlier input pair. If so, it is to be ignored; if not, it is to be processed. One way to accomplish this, which requires no additional storage, is to traverse the current list of successors of object *i*. If *j* appears on the list as a successor, then *i*, *j* is a duplicate; otherwise it is not. This takes time proportional to the current number of successors of *i* and thus adds a *total* time to phase I that is proportional to *m*^{2}. Instead, the function can work in constant time if *n*(*n* - 1) additional storage is available. Use the storage for an *n*(*n* - 1) array that is initialized to all zeros. When *i*, *j* is read, simply test the (*i*, *j*)th array entry. If it is zero, the pair is not a duplicate. Then set the entry to 1 to indicate that this *i*, *j* pair has been processed. In this way, an entry of 1 will mean that the pair is a duplicate from now on. This adds *total* time to phase I of *O*(*m*) but requires *n*(*n* - 1) additional storage.

**2**. Which of these binary relations is a partial order on *S*?

**a. **"Square root of" on the set *S* of all real numbers greater than 1

**b.** "Is older than" on the set *S* of all your relatives

**c.** "Sits in front of" on the set *S* of all students in a class

**d. **"Lives diagonally across from" on the set *S* of all people

**3.** Write down the graphic representation for the binary relation *R* on the first 10 integers. *R* = (3, 2), (4, 6), (7, 6), (8, 1), (9, 7), (9, 8).

**4.** Write a detailed modification of the algorithm of Section 11.2 that will check if a given ranking is consistent with the given partial order.

**5.** Write a program to implement the refinement of Exercise 4 and determine its worst-case time and storage requirements.

**6.** Apply the construction-based algorithm of Section 11.3 to the following data to obtain a consistent ranking. Always select the smallest integer for output.

**7.** If the second implementation of the construction-based algorithm is applied to the input of Exercise 6, what integers are in the bag after 3 is removed?

**8.** State clearly and concisely what purpose the bag serves in the better "bag" refinement and why it was introduced.

**9.** Write an expanded version of the better refinement that will output integers from the bag by selecting the smallest possible integer first.

**10.** What will the succ, lists and count arrays have in them after phase I, when the input pairs of the running example appear in the order 7 6, 5 3, 2 10, 6 8, 5 6, 7 3, 4 8, 7 1, 4 2?

**11.** What will the count, succ, and lists arrays look like after phase II as compared to after phase I?

**12.** Suppose the better implementation were modified so that no count array were stored in memory, and only the succ and lists arrays were available after phase I. Write a new phase II refinement under this constraint.

**13.** Suppose a solution to the topological sort problem had been constructed by determining what object should be at the bottom of the ranking. How would the better refinement be changed to reflect this new solution?

**14. a.** Suppose the bag is implemented in a separate record bag instead of using the ranking record. How must the data abstraction implementations change?

**15.** The function process is itself dependent on the ranking and bag implementations. Modify it so that process becomes independent of these implementations.

**16.** Suppose task 5b of the better refinement is further expanded to:

5b.i. For each successor of the output object, decrease its predecessor count by 1.

**c. **ow will this expansion of task 5b affect the time required for the algorithm?

**17.** Suppose the input for Exercise 10 had the pairs below appended after 4 2. What would the graphic depictions of the successor lists be after phase I of topsort?

7 3, 5 6, 4 2, 7 3, 6 8

**18.** For the input of Exercise 17, what would be in the count array after phase I?

**19.** For the input of Exercise 17, show what will be in the count array, the successor lists, and the bag after 7 is output.

**20.** Take as input the Example 11.6 with pairs 10 4 and 10 1 appended after 4 2. What will the second implementation of topsort output in the rank array? What will be the value of next after phase II?

**1.** Write and run a program whose input will be a series of topological sorting problems. For each problem the program should execute a slight modification of the topsort function called newtopsort. Newtopsort has parameters n and topologicalsort. Newtopsort is to read in and echo print n and all input pairs for the example it has been called to work on. It should print out the relevant portions of the count, succ and lists arrays as they appear after phase I. When newtopsort returns, the program prints the topological sort contained in the topologicalsort array. Except for the modifications needed to do the printing, newtopsort corresponds exactly to the sample implementation of topsort, with one exception: Implement the bag as an array separate from rank. Newtopsort should work correctly for valid input and for any number of objects between 1 and 20. You should make up four input sample problems to use. Two should be valid. One should include replications, and one should contain loops.

**2. **Suppose, instead of using the sample implementation, you do the following to keep successor information: Declare lists to be an *n*(*n* - 1) two-dimensional array and keep the successors of object *i* in the *i*th row of lists. For the example, after phase I, lists would be This two-dimensional lists array takes up the same total amount of storage as the one-dimensional lists array of the better implementation. For both implementations, the limiting factor on whether or not the program will run might then be the value of *n* beyond which available storage is exhausted. Suppose you are willing to give up the guarantee that topsort will always run correctly. This means you may attempt to run problems whose *n* is large enough that you are not sure whether the implementation of lists can hold all successors. Discuss the relative advantages of the implementations.

**3.** This is the same as Assignment 1 except, instead of the lists array, use dynamic memory to store the successor lists.

**4. **Assume your program for Assignment 1 treated the bag with the operations empty-bag, remove, and baginitialization and count with the operations predinitialization and increase as data abstractions.