11.4: Analysis of Topsort

We will now analyze the time requirements of this implementation. Refer to the refinement shown in the implementation that used data abstractions on page 503. Reading n takes constant time. The count and succ array initializations take time proportional to n. The while loop read and test takes constant time. Reading the next input pair and tasks 3a and 3b, which involve executing five instructions, take constant time. Each repetition of the loop thus takes constant time. Since the loop is executed once for each of the m input pairs, the total loop time will be proportional to m. Phase I thus takes some constant time, plus time proportional to n, plus time proportional to m.

Initializing the bag at the start of phase II takes time proportional to n, since it involves traversing the count array and placing each object whose count is zero into the bag. The while loop body of phase II, unlike that of phase I, does not take the same amount of time on every repetition. Task 5a takes constant time, but task 5b depends on the number of successors of the object that was removed by task 5a. The loop itself is executed n times, once for each object output.

How can we determine the total loop time required? Notice that each of the n objects will eventually be output in some loop execution. We do not know in what order the objects will be output. However, the time taken in its particular loop execution for each object output is the same. This time is at most a constant plus time proportional to the number of successors of the object. Thus, the total time for the while loop in phase II is the sum of the time required to output each object. The total time is then a constant times n plus time proportional to the sum

[(number of successors of object 1) plus (number of successors of object 2) . . . plus the (number of successors of object n)]

This sum is just the total number of successors, m. Hence the total time for phase II is made up of the same kinds of components as the total time for phase I. We conclude that the total time for this implementation has the same form.

Note that the search-based algorithm could take time O(n!). The straight-forward implementation of the construction-based algorithm could take time O(n ?m). Topsort takes time of the form c1 + c2n + c3m and represents a very substantial improvement. It was made possible because we were able to determine precisely, and to obtain efficiently, the information required to do the processing steps of the algorithm. The topological sort problem is more complex than earlier ones in this book, but it illustrates the same theme. The processing to be done determines the data structures that are most appropriate. Clearly, any implementation of an algorithm that does topological sorting must take time of the form an + bm, since it is necessary to read in all m input pairs and output all n objects.

How much storage does the implementation require? Suppose the program is to run correctly for values of n between 1 and 20. The only difficulty in determining the actual amount of storage required is in deciding what length to declare for lists. This depends on how large m may be. In general, if there are n objects, each object can have no more than n - 1 successors?/FONT>that is, an arrow to every other object. There are then at most n(n - 1) possible successors. Each successor requires one record of the lists array. Actually, because of the asymmetry property, at most one-half of the n(n - 1) possibilities can appear. Hence we need a total of at most 1/2 ?n(n - 1) records for lists. If n is 20, then (20 ?19)/2 records will do. The storage required is proportional to n2.

Knuth [1973a] and Wirth [1976] give different implementations of the algorithm for the topological sorting problem. See Aho, Hopcroft, and Ullman [1983] for a somewhat different point of view on its solution.