The field of *computer science* is so new that one feels obliged to furnish a definition before proceeding with this book. One often quoted definition views computer science as the *study of algorithms*. This study encompasses four distinct areas:

(i) *input:* there are zero or more quantities which are externally supplied;

(ii) *output:* at least one quantity is produced;

(iii) *definiteness:* each instruction must be clear and unambiguous;

An algorithm can be described in many ways. A natural language such as English can be used but we must be very careful that the resulting instructions are definite (condition iii). An improvement over English is to couple its use with a graphical form of notation such as flowcharts. This form places each processing step in a "box" and uses arrows to indicate the next step. Different shaped boxes stand for different kinds of operations. All this can be seen in figure 1.1 where a flowchart is given for obtaining a Coca-Cola from a vending machine. The point is that algorithms can be devised for many common activities.

Returning to our earlier definition of computer science, we find it extremely unsatisfying as it gives us no insight as to why the computer is revolutionizing our society nor why it has made us re-examine certain basic assumptions about our own role in the universe. While this may be an unrealistic demand on a definition even from a technical point of view it is unsatisfying. The definition places great emphasis on the concept of algorithm, but never mentions the word "data". If a computer is merely a means to an end, then the means may be an algorithm but the end is the transformation of data. That is why we often hear a computer referred to as a data processing machine. Raw data is input and algorithms are used to transform it into refined data. So, instead of saying that computer science is the study of algorithms, alternatively, we might say that computer science is the *study of data*:

(ii) languages for describing data manipulation;

(iii) foundations which describe what kinds of refined data can be produced from raw data;

(iv) structures for representing data.

Therefore, computer science can be defined as the study of data, its representation and transformation by a digital computer. The goal of this book is to explore many different kinds of data objects. For each object, we consider the class of operations to be performed and then the way to represent this object so that these operations may be efficiently carried out. This implies a mastery of two techniques: the ability to devise alternative forms of data representation, and the ability to analyze the algorithm which operates on that structure . The pedagogical style we have chosen is to consider problems which have arisen often in computer applications. For each problem we will specify the data object or objects and what is to be accomplished. After we have decided upon a representation of the objects, we will give a complete algorithm and analyze its computing time. After reading through several of these examples you should be confident enough to try one on your own.

There are several terms we need to define carefully before we proceed. These include data structure, data object, data type and data representation. These four terms have no standard meaning in computer science circles, and they are often used interchangeably.

To be more precise lets examine a modest example. Suppose we want to define the data structure natural number (abbreviated natno) where natno = {0,1,2,3, ...} with the three operations being a test for zero addition and equality. The following notation can be used:

structureNATNO

1declareZERO( )natno

2ISZERO(natno)boolean

3SUCC(natno)natno

4ADD(natno, natno)natno

5EQ(natno, natno)boolean

6for allx, ynatnolet

7ISZERO(ZERO) ::=true;ISZERO(SUCC(x)) ::=false

8ADD(ZERO, y) :: =y, ADD(SUCC(x),y) :: =

SUCC(ADD(x, y))

9EQ(x, ZERO) :: =ifISZERO(x)then true else false

10EQ(ZERO, SUCC(y)) :: =false

EQ(SUCC(x),SUCC(y)) :: =EQ(x,y)

11end

endNATNO

ADD(SUCC(SUCC(ZERO)),SUCC(SUCC(SUCC(ZERO))))

SUCC(ADD(SUCC(ZERO),SUCC(SUCC(SUCC(ZERO)))))

SUCC(SUCC(ADD(ZERO,SUCC(SUCC(SUCC(ZERO))))))

SUCC(SUCC(SUCC(SUCC(SUCC(ZERO)))))

**Definition:** A *data* *structure* is a set of domains , a designated domain , a set of functions and a set of axioms . The triple denotes the data structure *d *and it will usually be abbreviated by writing *d*.

An *implementation of a data structure d* is a mapping from *d* to a set of other data structures *e*. This mapping specifies how every object of *d* is to be represented by the objects of *e*. Secondly, it requires that every function of *d* must be written using the functions of the implementing data structures *e*. Thus we say that integers are represented by bit strings, boolean is represented by zero and one, an array is represented by a set of consecutive words in memory.

The choice of an algorithm description language must be carefully made because it plays such an important role throughout the book. We might begin by considering using some existing language; some names which come immediately to mind are ALGOL, ALGOL-W, APL, COBOL, FORTRAN, LISP, PASCAL, PL/I, SNOBOL.

**S**tructured **P**rogramming: **A** **R**easonably **K**omplete **S**et

**S**mart **P**rogrammers **A**re **R**equired To **K**now **S**PARKS.

In order to produce these values, the logical operators

are provided, plus the relational operators

A conditional statement has the form

ifcondthenS_{1 }ifcondthenS_{1}

or

elseS_{2}

To accomplish iteration, several statements are available. One of them is

whileconddo

S

end

where *cond* is as before, *S* is as *S*_{1} before and the meaning is given by

It is well known that all "proper" programs can be written using only the assignment, conditional and while statements. This result was obtained by Bohm and Jacopini. Though this is very interesting from a theoretical viewpoint, we should not take it to mean that this is the way to program. On the contrary, the more expressive our languages are, the more we can accomplish easily. So we will provide other statements such as a second iteration statement, the **repeat-until**,

repeat

S

untilcond

loop

S

forever

As it stands, this describes an infinite loop! However, we assume that this statement is used in conjunction with some test within *S *which will cause an exit. One way of exiting such a loop is by using a

loop

S_{1}

ifcondthen exit

S_{2}

forever

The last statement for iteration is called the **for**-**loop**, which has the form

forvblestarttofinishbyincrementdo

S

end

vblestart

finfinish

incrincrement

while(vble-fin) *incr0do

S

vblevble+incr

end

Another statement within SPARKS is the **case,** which allows one to distinguish easily between several alternatives without using multiple **if**-then**-else** statements. It has the form

A complete SPARKS procedure has the form

procedureNAME (parameter list)

S

end

A procedure can be used as a function by using the statement

For input/output we assume the existence of two functions

**read** (*argument list*), **print** (*argument list*)

Since most of the SPARKS programs will be read many more times than they will be executed, we have tried to make the code readable. This is a goal which should be aimed at by everyone who writes programs. The SPARKS language is rich enough so that one can create a good looking program by applying some simple rules of style.

(i) Every procedure should carefully specify its input and output variables.

(ii) The meaning of variables should be defined.

(v) Documentation should be short, but meaningful. Avoid sentences like ''*i* is increased by one."

(vi) Use subroutines where appropriate.

(v) *Verification*. Verification consists of three distinct aspects: program proving, testing and debugging. Each of these is an art in itself. Before executing your program you should attempt to prove it is correct. Proofs about programs are really no different from any other kinds of proofs, only the subject matter is different. If a correct proof can be obtained, then one is assured that for all possible combinations of inputs, the program and its specification agree. Testing is the art of creating sample data upon which to run your program. If the program fails to respond correctly then debugging is needed to determine what went wrong and how to correct it. One proof tells us more than any finite amount of testing, but proofs can be hard to obtain. Many times during the proving process errors are discovered in the code. The proof can't be completed until these are changed. This is another use of program proving, namely as a methodology for discovering errors. Finally there may be tools available at your computing center to aid in the testing process. One such tool instruments your source code and then tells you for every data set: (i) the number of times a statement was executed, (ii) the number of times a branch was taken, (iii) the smallest and largest values of all variables. As a minimal requirement, the test data you construct should force every statement to execute and every condition to assume the value true and false at least once.

The previous discussion applies to the construction of a single procedure as well as to the writing of a large software system. Let us concentrate for a while on the question of developing a single procedure which solves a specific task. This shifts our emphasis away from the management and integration of the various procedures to the disciplined formulation of a single, reasonably small and well-defined task. The design process consists essentially of taking a proposed solution and successively refining it until an executable program is achieved. The initial solution may be expressed in English or some form of mathematical notation. At this level the formulation is said to be abstract because it contains no details regarding how the objects will be represented and manipulated in a computer. If possible the designer attempts to partition the solution into logical subtasks. Each subtask is similarly decomposed until all tasks are expressed within a programming language. This method of design is called the *top*-*down* approach. Inversely, the designer might choose to solve different parts of the problem directly in his programming language and then combine these pieces into a complete program. This is referred to as the *bottom*-*up* approach. Experience suggests that the top-down approach should be followed when creating a program. However, in practice it is not necessary to unswervingly follow the method. A look ahead to problems which may arise later is often useful.

Suppose we devise a program for sorting a set of *n* 1 distinct integers. One of the simplest solutions is given by the following

"from those integers which remain unsorted, find the smallest and place it next in the sorted list"

fori1tondo

examine A(i)to A(n)and suppose the

smallest integer is at A(j); then

interchange A(i)and A(j).

end

procedureSORT(A,n)

1fori1tondo

2ji

3forkj+ 1ntodo

4ifA(k) <A(j)thenjk

5end

6tA(i);A(i)A(j);A(j)t

7end

endSORT

The obvious question to ask at this point is: "does this program work correctly?"

IF (N. LE. 1) GO TO 100

NM1 = N - 1

DO 101 I = 1, NM1

J = I

JP1 = J + 1

DO 102 K = JP1, N

IF (A(K).LT.A(J)) J = K

102 CONTINUE

T = A(I)

A(I) = A(J)

A(J) = T

101 CONTINUE

100 CONTINUE

At this point you might try the method out on some sample numbers. This method is referred to as *binary search*. Note how at each stage the number of elements in the remaining set is decreased by about one half. We can now attempt a version using SPARKS pseudo code.

procedureBINSRCH(A,n,x,j)

initialize lower and upper

whilethere are more elements to checkdo

let A(mid)be the middle element

case

: x > A(mid): set lower to mid +1

: x < A(mid): set upper to mid-1

:else:found

end

end

not found

endBINSRCH

The above is not the only way we might write this program. For instance we could replace the **while** loop by a **repeat-until** statement with the same English condition. In fact there are at least six different binary search programs that can be produced which are all correct. There are many more that we might produce which would be incorrect. Part of the freedom comes from the initialization step. Whichever version we choose, we must be sure we understand the relationships between the variables. Below is one complete version.

procedureBINSRCH(A,n,x,j)

1lower1;uppern

2whilelowerupperdo

3mid(lower + upper) / 2

4case

5 :x>A(mid):lowermid+ 1

6 :x<A(mid):uppermid- 1

7 :else:jmid;return

8end

9end

10j0

end

*lower * upper * and *A

This, combined with the above assertion implies that *x* is not present.

We have tried to emphasize the need to structure a program to make it easier to achieve the goals of readability and correctness. Actually one of the most useful syntactical features for accomplishing this is the procedure. Given a set of instructions which perform a logical operation, perhaps a very complex and long operation, they can be grouped together as a procedure. The procedure name and its parameters are viewed as a new instruction which can be used in other programs. Given the input-output specifications of a procedure, we don't even have to know how the task is accomplished, only that it is available. This view of the procedure implies that it is invoked, executed and returns control to the appropriate place in the calling procedure. What this fails to stress is the fact that procedures may call themselves (direct recursion) before they are done or they may call other procedures which again invoke the calling procedure (indirect recursion). These recursive mechanisms are extremely powerful, but even more importantly, many times they can express an otherwise complex process very clearly. For these reasons we introduce recursion here.

Most students of computer science view recursion as a somewhat mystical technique which only is useful for some very special class of problems (such as computing factorials or Ackermann's function). This is unfortunate because any program that can be written using assignment, the **if-then-else** statement and the **while** statement can also be written using assignment, **if-then-else** and recursion. Of course, this does not say that the resulting program will necessarily be easier to understand. However, there are many instances when this will be the case. When is recursion an appropriate mechanism for algorithm exposition? One instance is when the problem itself is recursively defined. Factorial fits this category, also binomial coefficients where

can be recursively computed by the formula

Another example is reversing a character string, *S = 'x*_{1}* ... x _{n}'* where SUBSTRING (

procedureREVERSE(S)

nLENGTH(S)

ifn= 1then return(S)

else return(REVERSE(SUBSTRING(S,2,n))

SUBSTRING(S,1,1))

endREVERSE

(i) *a *followed by all permutations of (*b,c,d*)

(ii)* b* followed by all permutations of (*a,c,d*)

(iii)* c* followed by all permutations of (*b,a,d*)

(iv)* d* followed by all permutations of (*b,c,a*)

The expression "followed by all permutations" is the clue to recursion. It implies that we can solve the problem for a set with *n* elements if we had an algorithm which worked on *n* - 1 elements. These considerations lead to the following procedure which is invoked by **call** PERM(A,*1,n*). A is a character string e.g. A ='abcd', and INTERCHANGE (A,k,i) exchanges the k-th character of A with the i-th character of A.

procedurePERM(A,k,n)

ifk = nthen[A);return]

B A

forikntodo

callINTERCHANGE(A,k,i)

callPERM(A,k + 1,n)

A B

end

endPERM

procedureSORT(A,n)

i1

Ll:ifin- 1 // fori1 ton- 1 do//

then[ji;kj+ 1

L2:ifkn//forkj+ 1 tondo//

then[ifA(k) <A(j)

thenjk

kk+ 1;go toL2]

tA(i);A(i)A(j);A(j)t

ii+ 1;go toL1]

endSORT

procedureSORT(A,n)

callSORTL1(A,n,1)

endSORT

procedureSORTLl(A,n,i)

ifin- 1

then[ji;callMAXL2(A,n,j,i+1)

tA(i);A(i)A(j);A(j)t

callSORTL1(A,n,i+ 1)]

endSORTL1

procedureMAXL2(A,n,j,k)

ifkn

then[ifA(k) <A(j)thenjk

callMAXL2(A,n,j,k+ 1)]

endMAXL2

Now let us trace the action of these procedures as they sort a set of five integers

One goal of this book is to develop skills for making evaluative judgements about programs. There are many criteria upon which we can judge a program, for instance:

(i) Does it do what we want it to do?

(ii) Does it work correctly according to the original specifications of the task?

(iii) Is there documentation which describes how to use it and how it works?

(iv) Are subroutines created in such a way that they perform logical sub-functions?

First consider a priori estimation. Suppose that somewhere in one of your programs is the statement

(i) the machine we are executing on:

(ii) its machine language instruction set;

(iii) the time required by each machine instruction;

(iv) the translation a compiler will make from the source to the machine language.

Consider the three examples of Figure 1.4 below.

.fori1tondo

.fori1tondo

.forj1tondo

xx+ lxx+ 1

.xx+ 1

.end

.end

end

(a) (b) (c)

In program (a) we assume that the statement *x* *x* + 1 is not contained within any loop either explicit or implicit. Then its frequency count is one. In program (b) the same statement will be executed *n *times and in program (c) *n*^{2} times (assuming *n* 1). Now 1, *n*, and *n*^{2} are said to be different and increasing orders of magnitude just like 1, 10, 100 would be if we let *n* = 10. In our analysis of execution we will be concerned chiefly with determining the order of magnitude of an algorithm. This means determining those statements which may have the greatest frequency count.

To determine the order of magnitude, formulas such as

often occur. In the program segment of figure 1.4(c) the statement *x* *x* + 1 is executed

Simple forms for the above three formulas are well known, namely,

To clarify some of these ideas, let us look at a simple program for computing the *n*-th Fibonacci number. The Fibonacci sequence starts as

0, 1, 1, 2, 3, 5, 8, 13, 21, 34, 55, ...

Each new term is obtained by taking the sum of the two previous terms. If we call the first term of the sequence F_{0 }then F_{0 }= 0, F_{1} = 1 and in general

The program on the following page takes any non-negative integer *n* and prints the value F* _{n}*.

1procedureFIBONACCI

2read(n)

3-4ifn< 0then[error');stop]

5-6ifn =0then[stop]

7-8ifn= 1then[stop]

9fnm20;fnm1 1

10fori2tondo

11fnfnm1 +fnm2

12fnm2fnm1

13fnm1fn

14end

15fn)

16endFIBONACCI

Step n< 0n= 0n= 1

--------------------------

1 1 1 1

2 1 1 1

3 1 1 1

4 1 0 0

5 0 1 1

6 0 1 0

7 0 0 1

8 0 0 1

9-15 0 0 0

Step Frequency Step Frequency

--------------------------------

1 1 9 2

2 1 10 n

3 1 11 n-1

4 0 12 n-1

5 1 13 n-1

6 0 14 n-1

7 1 15 1

8 0 16 1

Each statement is counted once, so step 9 has 2 statements and is executed once for a total of 2. Clearly, the actual time taken by each statement will vary. The **for** statement is really a combination of several statements, but we will count it as one. The total count then is 5*n* + 5. We will often write this as *O*(*n*), ignoring the two constants 5. This notation means that the order of magnitude is proportional to *n*.

n 10n n/2^{2}

-----------------

1 10 1/2

5 50 12-1/2

10 100 50

15 150 112-1/2

20 200 200

25 250 312-1/2

30 300 450

*P(n) = c _{k}n^{k} + c_{k-}_{1} n^{k}*-

loglog_{2}n n n2_{2}n n^{2 }n^{3 }^{n}

^{--------------------------------------------------------}

0 1 0 1 1 2

1 2 2 4 8 4

2 4 8 16 64 16

3 8 24 64 512 256

4 16 64 256 4096 65536

5 32 160 1024 32768 2, 147, 483, 648

We end this chapter with a problem from recreational mathematics which uses many of the SPARKS features that have been discussed. A magic square is an *n* x *n* matrix of the integers 1 to *n*^{2} such that the sum of every row, column and diagonal is the same. For example, if n = 5 we have

15 8 1 24 17

16 14 7 5 23

22 20 13 6 4

3 21 19 12 10

9 2 25 18 11

procedureMAGIC(square, n)

//fornodd create a magic square which is declared as an array//

//square(0:n- 1, 0:n- 1)//

//(i,j) is a square position. 2 keyn^{2}is integer valued.//

ifn is eventhen [print('input error');stop]

SQUARE0

square(0,(n - 1)/2) 1; //store 1 in middle of first row//

key2;i0;j(n- 1)/2 //i,jare current position//

whilekeyn^{2}do

(k,l) ((i- 1)modn, (j- 1)modn) //look up and left//

ifsquare(k,l) 0

theni(i+ 1)modn//square occupied, move down//

else(i,j) (k,l) //square (k,l) needs to be assigned//

square(i,j)key//assign it a value//

keykey+ 1

end

n, square) //output result//

endMAGIC

For a discussion of algorithms and how to analyze them see

For a discussion of good programming techniques see

*Structured Programming* by O. J. Dahl, E. W. Dijkstra, and C. A. R. Hoare, Academic Press, 1972.

*The Elements of Programming Style* by B. W. Kernighan and P. J. Plauger, McGraw-Hill, 1974.

*ACM Computing Surveys,* Special Issue: Programming, vol. 6, no. 4, December, 1974.

For a discussion of tools and procedures for developing very large software systems see

For a discussion of the more abstract formulation of data structures see

For a further discussion of program proving see

**1. **Look up the word *algorithm* or its older form *algorism* in the dictionary.

**2. **Consider the two statements: (i) Is *n* = 2 the largest value of *n* for which there exists positive integers *x, y *and *z* such that *x ^{n} *+

**3. **Describe the flowchart in figure 1.1 by using a combination of SPARKS and English. Can you do this without using the **go to**? Now make it into an algorithm.

**4.** Discuss how you would actually represent the list of name and telephone number pairs in a real machine. How would you handle people with the same last name.

**5. **Write FORTRAN equivalents of the **while, repeat-until, loop-forever** and **for** statements of SPARKS.

**6. **Can you think of a clever meaning for S.P.A.R.K.S.? Concentrate on the letter *K* first.

**7. **Determine the frequency counts for all statements in the following two SPARKS program segments:

1fori1ton1i1

2forjltoi2whileindo

3fork1toj3xx+ 1

4xx+ 1 4ii+1

5end5end

6end

7end

(a) (b)

**8.** Horner's Rule is a means for evaluating a polynomial *A*(*x*) = *a _{n}x^{n }*+

*A*(*x*) = (... ((*a _{n}*.

**9.** Given *n* boolean variables *x*_{1},..., *x _{n}* we wish to print all possible combinations of truth values they can assume. For instance, if

**10.** Compare the two functions *n*^{2} and 2* ^{n}*/4 for various values of

**11.** Write a SPARKS program which prints out the integer values of *x*, *y*, *z* in nondecreasing order. What is the computing time of your method?

**12.** Write a SPARKS procedure which searches an array *A* (1: *n*) for the element *x*. If *x* occurs, then set *j* to its position in the array else set *j* to zero. Try writing this without using the **go to** statement.

**13.** One useful facility we might add to SPARKS is the ability to manipulate character strings. If *x*, *y* are variables of type character, then we might like to implement the procedures:

Implement these procedures using the array facility.

**14.** Write a SPARKS procedure which is given an argument STRING, whose value is a character string of length *n*. Copy STRING into the variable FILE so that every sequence of blanks is reduced to a single blank. The last character of STRING is nonblank.

**15.** Design a program that counts the number of occurrences of each character in the string STRING of length *n*. Represent your answer in the array ANS(1:*k*,1:2) where ANS(*i*,l) is the *i*-th character and ANS(*i*,2) is the number of times it occurs in STRING.

**16.** Trace the action of the procedure below on the elements 2, 4, 6, 8, 10, 12, 14, 16, 18, 20 searching for l, 3, 13 and 21.

i1;jn

repeatk(i+j)/2

ifA(k)xthenik+ 1

elsejk- 1

untili>j

What is the computing time for this segment in terms of *n*?

**18.** List as many rules of style in programming that you can think of that you would be willing to follow yourself.

**19.** Using the notation introduced at the end of section 1.1, define the structure Boolean with operations AND, OR, NOT, IMP and EQV (equivalent) using only the **if-then-else** statement. e.g. NOT (*X*) :: = **if ****X**** then false else true.**

**20.** Give a version of a binary search procedure which initializes lower to zero and upper to *n* + l.

**21.** Take any version of binary search, express it using assignment, **if-then-else** and **go to** and then give an equivalent recursive program.

**22.** Analyze the computing time of procedure SORT as given in section 1.3.

**23.** Write a recursive procedure for computing the binomial coefficient as defined in section 1.3 where . Analyze the time and space requirements of your algorithm.

**24.** Ackermann's function *A*(*m*,*n*) is defined as follows:

**25.** (Tower of Hanoi) There are three towers and sixty four disks of different diameters placed on the first tower. The disks are in order of decreasing diameter as one scans up the tower. Monks were reputedly supposed to move the disks from tower 1 to tower 3 obeying the rules: (i) only one disk can be moved at any time; (ii) no disk can be placed on top of a disk with smaller diameter. Write a recursive procedure which prints the sequence of moves which accomplish this task.

**26.** Write an equivalent recursive version of procedure MAGIC as given in section 1.4.

**27.** The *pigeon hole principle* states that if a function *f* has *n* distinct inputs but less than *n* distinct outputs then there exists two inputs *a*, *b* such that *a* *b* and *f*(*a*) = *f*(*b*). Give an algorithm which finds the values *a*, *b* for which the range values are equal.

**28.** Given *n*, a positive integer determine if *n* is the sum of all of its divisors; i.e. if *n* is the sum of all *t* such that 1 *t* < *n* and *t* divides *n*.

**29.** Consider the function *F*(*x*) defined by

*F*(*x*) **if** *even*(*x*) **then** *x*/2 **else** *F*(*F*(3*x* + 1))

**30.** If *S* is a set of *n* elements the *powerset *of *S* is the set of all possible subsets of *S*. For example if *S* = (*a*,*b*,*c*,) then POWERSET(*S*) = {( ), (*a*), (*b*), (*c*), (*a*,*b*), (*a*,*c*), (*b*,*c*), (*a*,*b*,*c*)}. Write a recursive procedure to compute powerset (*S*).