8.6: Simulation of an Algorithm

Array (b) has been obtained from array (a), which is sorted, by making two interchanges: a1[2] with a1[6], and a1[4] with a1[9]. Two interchanges amount to 20 percent of n, since n is 10. A nearly sorted input for a of size n is obtained by making 20 percent of n such interchanges in an initially sorted array. The interchanges are to be selected at random. This means that each interchange is obtained by picking a value for i at random, so that each of the locations 1 to n of sa is equally likely to be selected. An integer j is obtained in the same way. Then sa[i] and sa[j] are interchanged. The only details still left to implement in the simulation algorithm are how to generate the required random integers to be placed in a and used for interchange locations, how to update the statistics tables, and how to finalize the tables before printing them.

Figure 8.19 Array (b) Obtained from Sorted Array (a) by Two Interchanges

Assume that a function rng is available that returns a real number as its value. This real number will be greater than 0.0 and less than 1.0. Whenever a sequence of these numbers is generated by consecutive calls to rng, the numbers appear, from a statistical point of view, to have been selected independently. Also assume that the returned value lies in any given segment of length l between 0 and 1 with probability l. For example, if l is 0.25, then the returned value will lie in a segment of that length (say, between 0.33 and 0.58) 25 percent of the time. The random number generator ranf of FORTRAN is an example of such a function.

To generate an integer between 1 and 1,000, take the value rng returns, multiply it by 1,000, take the integer part of the result, and add 1 to it. For example, if the value returned is 0.2134, multiplying by 1,000 gives 213.4. Its integer part is 213; adding 1 yields 214. In fact, if any number from 0.213 up to, but not including, 0.214 had been returned, 214 would have been generated. Hence 214 would be generated with probability 1/1,000, the length of the interval from 0.213 to 0.214. The same is true for each integer between 1 and 1,000. Carrying out this procedure n times will fill the array a to generate a "random" input (see Exercise 23 for a more exact approach). An alternative method in C to generate an integer at random between integers low and high uses the library function rand. Evaluating rand()%(high-low+1) + low yields an integer between low and high, inclusive.
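The two methods just described might be sketched in C as follows. This is only an illustration under stated assumptions: the text does not say how rng is implemented, so the version here is a hypothetical stand-in built on the library function rand, and the names random_from_rng, random_in_range, and fill_random are invented for the example. Note also that the remainder method gives values that are only approximately equally likely when high-low+1 does not divide RAND_MAX+1 evenly.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for the text's rng (an assumption, not the
   book's implementation): returns a real strictly between 0.0 and 1.0. */
double rng(void)
{
    return ((double)rand() + 1.0) / ((double)RAND_MAX + 2.0);
}

/* Scale-and-truncate method: multiply by k, take the integer part,
   and add 1, giving an integer between 1 and k. */
int random_from_rng(int k)
{
    assert(k >= 1);
    return (int)(rng() * k) + 1;
}

/* Remainder method from the text: an integer between low and high. */
int random_in_range(int low, int high)
{
    assert(low <= high);
    return rand() % (high - low + 1) + low;
}

/* Fill locations 1 to n of a with random integers between 1 and 1,000.
   Location 0 is left unused so that indices match the text. */
void fill_random(int a[], int n)
{
    int i;
    for (i = 1; i <= n; i++)
        a[i] = random_from_rng(1000);
}
```

A caller would typically seed the generator once, for example with srand(1), declare an array of n+1 integers, and then call fill_random(a, n) before each sort of a random input.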

To generate a nearly sorted or nearly reverse sorted input, select each of the 20 percent of n interchanges as follows. Call rng twice. Multiply each of the two returned values by n, take the integer part, and add 1. Alternatively, evaluate rand()%(high-low+1) + low twice with high set to n and low set to 1. Either method gives two integers, i and j, between 1 and n. Interchange sa[i] with sa[j].
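A minimal sketch of this perturbation step is given below, using the remainder method to pick i and j. The function names make_sorted, make_reverse_sorted, random_index, and perturb are hypothetical, chosen for the example, and the initial contents 1 to n are an assumption; any sorted or reverse sorted contents would serve. Location 0 of sa is unused so that the indices run from 1 to n as in the text.

```c
#include <assert.h>
#include <stdlib.h>

/* An integer between 1 and n, by the remainder method. */
static int random_index(int n)
{
    assert(n >= 1);
    return rand() % n + 1;
}

/* Place 1, 2, ..., n in locations 1 to n of sa (sorted input). */
void make_sorted(int sa[], int n)
{
    int i;
    for (i = 1; i <= n; i++)
        sa[i] = i;
}

/* Place n, n-1, ..., 1 in locations 1 to n of sa (reverse sorted input). */
void make_reverse_sorted(int sa[], int n)
{
    int i;
    for (i = 1; i <= n; i++)
        sa[i] = n - i + 1;
}

/* Make 20 percent of n random interchanges in sa. Applied to a sorted
   array this gives a nearly sorted input; applied to a reverse sorted
   array it gives a nearly reverse sorted input. */
void perturb(int sa[], int n)
{
    int count = n / 5;              /* 20 percent of n, rounded down */
    int k;
    for (k = 0; k < count; k++) {
        int i = random_index(n);
        int j = random_index(n);
        int temp = sa[i];           /* interchange sa[i] with sa[j] */
        sa[i] = sa[j];
        sa[j] = temp;
    }
}
```

Because i and j are chosen independently, they may coincide, in which case that interchange leaves the array unchanged. The text does not rule this out, so the sketch does not either.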

The statistics tables can keep current running minimums, maximums, and the accumulated sum of the appropriate statistic. Each time a call to treesort returns, the minimum, maximum, and accumulated sum may be updated for each statistic: int1, int2, comp1, and comp2. When the inner loop of the algorithm is exited, and task d is to be carried out, the statistics tables are ready to be printed, except for the twelve averages. They must be finalized in task d by dividing the accumulated sums by 25.
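One way a single table entry might be kept is sketched below. The type stat_entry and the functions stat_init, stat_update, and stat_average are hypothetical names introduced for this illustration; the layout of the actual tables is not specified here, so this shows only the running minimum, maximum, and sum for one statistic, with the division by the number of runs deferred until the end.

```c
#include <assert.h>
#include <limits.h>

/* Running minimum, maximum, and accumulated sum for one statistic. */
typedef struct {
    long min;
    long max;
    long sum;
} stat_entry;

/* Start an entry so that the first update sets both min and max. */
void stat_init(stat_entry *s)
{
    s->min = LONG_MAX;
    s->max = LONG_MIN;
    s->sum = 0;
}

/* Called after each sort returns, with the value just observed. */
void stat_update(stat_entry *s, long value)
{
    if (value < s->min)
        s->min = value;
    if (value > s->max)
        s->max = value;
    s->sum += value;
}

/* Finalize the average by dividing the accumulated sum by the number
   of runs (25 in the text). */
double stat_average(const stat_entry *s, int runs)
{
    assert(runs > 0);
    return (double)s->sum / runs;
}
```

For example, a program could keep one stat_entry for comp1, call stat_update on it with the comparison count after each of the 25 sorts of a given input type, and call stat_average with 25 just before printing.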

The algorithm presented in this section can be used directly to obtain simulation results for treesort. Simply replacing the call to treesort in task (i) by a call to suitably modified versions of the other sort programs will produce their simulation results as well. Using simulation in the analysis of other programs, not necessarily sorts, requires some insight into what input is appropriate to generate and what statistics are meaningful to collect. This is not always as straightforward as in our example, but the general procedure and goal for the simulation of an algorithm should now be clear.