SECTION A: RANDOM PROCESSES

The set of values of r is not complete, but there are many of them and no obvious gaps.

The values in the range of r are of approximately equal likelihood.

A great deal of work has been done on developing programs to produce pseudorandom numbers [KNUTH, 1973], and no programmer should be without access to such programs. In some of the exercises a function Rand, available to provide a pseudorandom number on demand, is assumed. Also assumed is a mechanism for reusing the same value or sequence of values produced by it. This assumption is convenient for debugging purposes to isolate the effect of a change in a program, and it provides a way to compare the effect of more than one procedure on the same data. If no other way is available, successive values of Rand may be stored in an array or in a file.

In some languages, notably standard Pascal, the programmer might need to provide a pseudorandom number generator. The subject of generating pseudorandom numbers is too large to pursue here, but one popular algorithm is illustrated in Pascal:

`FUNCTION Random (VAR Seed : INTEGER) : REAL;`

`CONST  Modulus = {65536};`

`Multiplier = {25173};`

`Increment = {13849};`

`BEGIN`

`Seed  : = ((Multiplier * Seed) + Increment) MOD Modulus;`

`Random : = Seed/Modulus`

`END {Random};`

The values of the constants given here work well on a DEC VAX and other machines, but they will not work well in all cases. For example, for a PDP 11/70, Seed is declared externally by

`VAR Seed : 0 . . 65535;`

and the constants become:

`Modulus = 32768;`

`Multiplier = 13077;`

`Increment = 6925;`

The function Rand only produces values between 0 and 1. That does not explicitly allow, for example, the random choice of an index for an array declared as a[ - 2 . . 3]. The requisite mapping can be derived with the following steps:

The expression x = D Rand has values in the range 0 < x < D when 0 < Rand < 1, and D > 0.

The expression x =L + D Rand has values in the range L < x < L + D.

The expression x = L + (T - L) Rand has values in the range L < x < T.

The expression INT(L + (T - L) Rand) takes on integer values L, L + 1, . . . , T - 1 when INT is a function that returns the largest integer less than or equal to its argument.

Consequently, with T = U + 1, INT(L + (U - L + 1) Rand) has possible values L, L + 1, . . . , U. In particular, INT(- 2 + 6 Rand will randomly provide in-bounds indices for the array a[ - 2 . . 3].

A set of 10 values, randomly chosen in the range [- 30 . . 30], may be provided in the array e used in the sorting routines in section 1.5 by:

`procedure ThirtyThirty                       {O(1)`

`for i = 1 to 10 do`

`e[i]  - 30 + 61 * Rand`

`next i`

`end {ThirtyThirty`

It is not feasible to explore hypothesis testing, confidence intervals, analysis of variance, and so on in this text. That material belongs in a course in statistics. However, the applicability of Rand can be extended enormously by a few additional tools. Some acquaintance with the ideas of probability and statistics is helpful in reading the subsections that follow.

Program PGA.1 is appropriate at this point.

A.1 Mean and Standard Deviation

These measures provide a convenient and significant way to reduce a collection of measurements to two values, but intelligent interpretation then requires knowledge of statistics.

A.2 Frequency Distributions

Figure A.1

Now suppose that we wish to write a program to process data that have the distribution model derived from the sample. We may provide a random variate, Line, for test data by the following:

Choose a value with Rand, uniformly distributed between 0 and 1. If PDF[w - 1] < Rand PDF[w], then Line w.

Geometrically, this amounts to drawing the horizontal line from a point on the vertical PDF axis to where it first strikes the stairstep graph and projecting down to a value w. The distribution of Line is that of the pmf (from the histogram), and points may be drawn from this distribution as required.

Figure A.2

Values that have the pmf of any discrete PDF may be generated from a table like Figure A.2 in this way, which is called discrete inversion.

Exercise EA.1 and Problem PA.1 are appropriate at this point.

A.3 Normal Distributions

The most-used probability distribution is not discrete as are those discussed in section A.1 but is the normal distribution:

{Rand is called n times.

In particular, this equation is simplified if n is taken to be 12, in which case:

The z-scores generated in this way can be mapped to a value from a normal distribution by:

`x =  + z`

A function that returns a value from a normal distribution is:

`function Norm(Mu,Sigma)             {O(Rand)`

`z  0`

`for i = 1 to 12 do`

`z  z + Rand`

`next i`

`z  z - 6`

`Norm  Mu + Sigma * z`

`end {Norm`

Note: Some caution must be applied to normal distributions because extreme values can occur, whatever and may be. Sometimes these values are not physically meaningful: the distribution being modeled differs from a normal distribution by not having them. They are "out of bounds" and must be dealt with by checks within the program.

Program PGA.2 is appropriate at this point.

A.4 Exponential Distributions

where LOG is the logarithm to the base e (2.718 . . . ). The distribution of values of ia has both mean and standard deviation equal to 1/.

Exercises

Exercises immediate applications of the text material

Problems

Problems not immediate, but requiring no major extensions of the text material

`time       1  2  3  4  5  6  7  8  9  10`

`----------------------------------------`

`number`

`of         0  0  0  2  1  5  8  9  7   4`

`solutions`

Plot the (relative) frequency distribution of this data. Assume that this represents a pmf, and plot the corresponding PDF. What times correspond to the following possible values of Rand?

(a) 0.007134 (c) 0.567321 (e) 0.410022

(b) 0.916528 (d) 0.757505

Programs

Programs for trying it out yourself. A program, when written for an interactive system, should display instructions that explain to the user what it does and how to use it. It should provide a graceful way for the user to quit.

In a time-sharing system it may not be possible to determine execution times themselves, but it is possible to count statement executions, including those that result from data-determined, conditional branches.

PGA.2 Do PGA.1, except that a normal distribution is to be used for data values, with mean 10 and standard deviation 10. For contrast, also use data already in order and in reverse order.

Projects

Projects for fun; for serious students

PJA.1 Extend PGA.1 and PGA.2 to a serious study of SelectionSort, BubbleSort, and InsertionSort for values of n = 8, 16, 32, 64, 128, 512, and 2048. Use uniform, normal, and exponential variates, in-order and reverse-order data sets, and data sets that are somewhat disarranged from in-order in a variety of ways. For example, choose two indices at random from an in-order sequence and switch the values they point to prior to invoking the sort procedure being studied. Calculate means and standard deviations of execution time. If you have statistics in your background, test the hypothesis that the three sorts have equal execution times (under various circumstances).