In the previous chapters, we studied the representation of simple data structures using an array and a sequential mapping. These representations had the property that successive nodes of the data object were stored a fixed distance apart. Thus, (i) if the element *a _{ij}* of a table was stored at location

(BAT, CAT, EAT, FAT, HAT, JAT, LAT, MAT, OAT, PAT, RAT, SAT, TAT, VAT, WAT)

An elegant solution to this problem of data movement in *sequential* representations is achieved by using *linked* representations. Unlike a sequential representation where successive items of a list are located a fixed distance apart, in a linked representation these items may be placed anywhere in memory. Another way of saying this is that in a sequential representation the order of elements is the same as in the ordered list, while in a linked representation these two sequences need not be the same. To access elements in the list in the correct order, with each element we store the address or location of the next element in that list. Thus, associated with each data item in a linked representation is a pointer to the next item. This pointer is often referred to as a link. In general, a *node* is a collection of data, DATA1, ...,DATA*n *and links LINK1, ...,LINK*m*. Each item in a node is called a *field*. A field contains either a data item or a link.

We recognize that we have come to the end when LINK has a value of zero.

(i) get a node which is currently unused; let its address be *X*;

(ii) set the DATA field of this node to GAT;

(iii) set the LINK field of *X* to point to the node after FAT which contains HAT;

(iv) set the LINK field of the node containing FAT to *X*.

(i) A means for dividing memory into nodes each having at least one link field;

(ii) A mechanism to determine which nodes are in use and which are free;

(iii) A mechanism to transfer nodes from the reserved pool to the free pool and vice-versa.

(ii) RET(*X*) which returns node *X* to the storage pool.

**Example 4.1:** Assume that each node has two fields DATA and LINK. The following algorithm creates a linked list with two nodes whose DATA fields are set to be the values 'MAT' and 'PAT' respectively. *T* is a pointer to the first node in this list.

procedureCREATE2(T)

callGETNODE(I) //get an available node//

TI;DATA(I)'MAT' //store information into the node//

callGETNODE(I) //get a second available node//

LINK(T)I //attach first node to the second//

LINK(I)0:DATA(I)'PAT'

endCREATE2

The resulting list structure is

**Example 4.2:** Let *T* be a pointer to a linked list as in Example 4.1. *T*= 0 if the list has no nodes. Let *X* be a pointer to some arbitrary node in the list *T*. The following algorithm inserts a node with DATA field 'OAT' following the node pointed at by *X*.

procedureINSERT(T,X)

callGETNODE(I)

DATA(I) 'OAT'

ifT= 0then[TI;LINK(I)0] //insert into empty list//

else[LINK(I)LINK(X) //insert afterX//

LINK(X)I]

endINSERT

The resulting list structure for the two cases *T* = 0 and *T* 0 is

**Example 4.3:** Let *X* be a pointer to some node in a linked list *T* as in example 4.2. Let *Y* be the node preceding *X. Y* = 0 if *X* is the first node in *T* (i.e., if *X* = *T*). The following algorithm deletes node *X* from *T*.

procedureDELETE(X, Y, T)

ifY= 0thenTLINK(T) //remove the first node//

elseLINK(Y)LINK(X) //remove an interior

node//

callRET(X) //return node to storage pool//

endDELETE

We have already seen how to represent stacks and queues sequentially. Such a representation proved efficient if we had only one stack or one queue. However, when several stacks and queues co-exist, there was no efficient way to represent them sequentially. In this section we present a good solution to this problem using linked lists. Figure 4.5 shows a linked stack and a linked queue.

T(i) = Top ofith stack 1in

F(i) = Front ofith queue 1im

R(i) = Rear ofith queue 1im

T(i) = 0 1in

F(i) = 0 1im

T(i) = 0 iff stackiempty

F(i) = 0 iff queueiempty

procedureADDS(i, Y)

//add elementYonto stacki//

callGETNODE(X)

DATA(X)Y//store data valueYinto new node//

LINK(X)T(i) //attach new node to top ofi-th stack//

T(i)X//reset stack pointer//

endADDS

procedureDELETES(i, Y)

//delete top node from stackiand setYto be theDATAfield of

this node//

ifT(i) = 0then callSTACK__EMPTY

XT(i) //setXto top node of stacki//

YDATA(X) //Ygets new data//

T(i)LINK(X) //remove node from top of stacki//

callRET(X) //return node to storage pool//

endDELETES

procedureADDQ(i,Y)

//addYto theith queue//

callGETNODE(X)

DATA(X)Y:LINK(X) 0

ifF(i) = 0then[F(i)R(i)X] //the queue was empty//

else[LlNK(R(i))X;R(i)X] //the queue was

not empty//

endADDQ

procedureDELETEQ(i, Y)

//delete the first node in theith queue, setYto itsDATAfield//

ifF(i) = 0thencallQUEUE__EMPTY

else[XF(i);F(i)LINK(X)

//setXto front node//

YDATA(X);callRET(X)] //remove data

and return node//

endDELETEQ

The solution presented above to the *n-*stack, *m-*queue problem is seen to be both computationaliy and conceptually simple. There is no need to shift stacks or queues around to make space. Computation can proceed so long as there are free nodes. Though additional space is needed for the link fields, the cost is no more than a factor of 2. Sometimes the DATA field does not use the whole word and it is possible to pack the LINK and DATA fields into the same word. In such a case the storage requirements for sequential and linked representations would be the same. For the use of linked lists to make sense, the overhead incurred by the storage for the links must be overriden by: (i) the virtue of being able to represent complex lists all within the same array, and (ii) the computing time for manipulating the lists is less than for sequential representation.

Now all that remains is to explain how we might implement the GETNODE and RET procedures.

The storage pool contains all nodes that are not currently being used. So far we have assumed the existence of a RET and a GETNODE procedure which return and remove nodes to and from the pool. In this section we will talk about the implementation of these procedures.

procedureGETNODE(I)

//Iis set as a pointer to the next available node//

ifAV>nthen callNO__MORE__NODES

IAV

AVAV+ 1

endGETNODE

procedureINIT(n)

//initialize the storage pool, through theLlNKfield, to contain nodes

with addresses 1,2,3, ...,nand setAVto point to the first node

in this list//

fori1ton- 1do

LINK(i)i+ 1

end

LlNK(n) 0

AV1

endINIT

This procedure gives us the following list:

procedureGETNODE(X)

//Xis set to point to a free node if there is one onAV//

ifAV= 0then callNO__MORE__NODES

XAV

AVLINK(AV)

endGETNODE

procedureRET(X)

//Xpoints to a node which is to be returned to the available space

list//

LINK(X)AV

AVX

endRET

2) Is *ptr* equal to the value of another variable of type pointer, e.g., is *ptr* = SAMPLE?

The only legal operations we can perform on pointer variables is:

2) Set *ptr* to point to a node.

ptrSAMPLE

whileptr5do

ptr))

ptrptr+ 1

end

Let us tackle a reasonable size problem using linked lists. This problem, the manipulation of symbolic polynomials, has become a classical example of the use of list processing. As in chapter 2, we wish to be able to represent any number of different polynomials as long as their combined size does not exceed our block of memory. In general, we want to represent the polynomial

*A*(*x*) = *a _{m}x^{em}* + ... +

_____________________

| | | |

|COEF | EXP | LINK |

|______|______|_______|

For instance, the polynomial *A*= 3*x*^{14} + 2*x*^{8} + 1 would be stored as

while *B* = 8*x*^{14} - 3*x*^{10} + 10*x*^{6} would look like

procedureATTACH(C,E,d)

//create a new term withCOEF=CandEXP=Eand attach it

to the node pointed at byd//

callGETNODE(I)

EXP(I)E

COEF(I)C

LINK(d)I//attach this node to the end of this list//

dI//move pointerdto the new last node//

endATTACH

(iii) additions/deletions to available space;

procedurePADD(A,B,C)

//polynomialsAandBrepresented as singly linked lists are

summed to form the new list namedC//

1pA;qB//p,qpointers to next term ofA, B//

2callGETNODE(C);dC//initial node forC, returned

later//

3whilep0andq0do//while there are more terms in

AandB//

4case

5 :EXP(p)= EXP(q): //equal exponents//

6xCOEF(p)+ COEF(q)

7ifx0then callATTACH(x, EXP(p),d)

8pLINK(p); qLINK(q) //advance to next

terms//

9 :EXP(p)<EXP(q):

10callATTACH(COEF(q),EXP(q),d)

11qLINK(q) //advance to next term//

12 :else: callATTACH(COEF(p),EXP(p),d)

13pLINK(p) //advance to next term ofA//

14end

15end

16whilep0do//copy remaining terms ofA//

17callATTACH(COEF(p),EXP(p),d)

18pLINK(p)

19end

20whileq 0do//copy remaining terms of B//

21callATTACH(COEF(q),EXP(q),d)

22qLINK(q)

23end

24LINK(d)0; tC; C LINK(C) //delete extra initial

node//

25callRET(t)

26endPADD

(iv) creation of new nodes for *C*.

*A(x) = a _{m}x^{em} + ... + a_{1}x^{e1}, B(x) = b_{n}x^{fn} + ... + b_{1}x^{f1}*

*a _{i},b_{i }*

Then clearly the number of coefficient additions can vary as

0 coefficient additions min {*m*, *n*}.

e_{m} > f_{n} > e_{m-1} > f_{n-1} > ... > f_{2} > e_{n-m+2} > ... > e_{1} > f_{1}.

callREAD(A)

callREAD(B)

callREAD(C)

TPMUL(A, B)

DPADD(T, C)

callD)

Now our user may wish to continue computing more polynomials. At this point it would be useful to reclaim the nodes which are being used to represent T(x). This polynomial was created only as a partial result towards the answer D(x). By returning the nodes of T(x), they may be used to hold other polynomials.

procedureERASE(T)

//return all the nodes ofTto the available space list avoiding repeated

calls to procedureRET//

ifT =0then return

pT

whileLINK(p) 0do//find the end ofT//

pLINK(p)

end

LlNK(p)AV//ppoints to the last node ofT//

AVT//available list now includesT//

endERASE

Study this algorithm carefully. It cleverly avoids using the RET procedure to return the nodes of *T* one node at a time, but makes use of the fact that the nodes of *T* are already linked. The time required to erase *T*(*x*) is still proportional to the number of nodes in *T*. This erasing of entire polynomials can be carried out even more efficiently by modifying the list structure so that the LINK field of the last node points back to the first node as in figure 4.8. A list in which the last node points back to the first will be termed a *circular list*. A *chain *is a singly linked list in which the last node has a zero link field.

procedureCERASE(T)

//return the circular listTto the available pool//

ifT= 0then return;

XLINK(T)

LINK(T)AV

AVX

endCERASE

Figure 4.9 is a schematic showing the link changes involved in erasing a circular list.

A direct changeover to the structure of figure 4.8 however, causes some problems during addition, etc., as the zero polynomial has to be handled as a special case. To avoid such special cases one may introduce a head node into each polynomial; i.e., each polynomial, zero or non-zero, will contain one additional node. The EXP and COEF fields of this node will not be relevant. Thus the zero polynomial will have the representation:

while *A* = 3*x*^{14} + 2*x*^{8} + 1 will have the representation

(i) at line 1 define *p, q* by *p* *LINK* (*A*); *q* *LINK*(*B*)

(ii) at line 3: **while** *p* *A* **and** *q* *B* **do**

(iii) at line 16: **while** *p* *A* **do**

(v) at line 24: replace this line by *LINK*(*d*) *C*

procedureCPADD(A, B, C)

//polynomialsAandBare represented as circular lists with head

nodes so thatEXP(A) =EXP(B) = -1.Cis returned as their sum,

represented as a circular list//

pLINK(A); qLINK(B)

callGETNODE(C);EXP(C) -1 //set up head node//

dC//last node inC//

loop

case

:EXP(p) =EXP(q):ifEXP(p)=-1then[LINK(d)C;return]

xCOEF(p) +COEF(q)

ifx 0then callATTACH(x,EXP(p),d

pLINK(p);qLINK(q)

:EXP(p) <EXP(q):callATTACH(COEF(q),EXP(q),d)

qLINK(q)

:else:callATTACH(COEF(p),EXP(p),d)

pLINK(p)

end

forever

endCPADD

procedureINVERT(X)

//a chain pointed at byXis inverted so that ifX= (a_{1}, ...,a)_{m}

then after executionX= (a, ...,_{m}a_{1})//

pX;q0

whilep0do

rq;qp//rfollowsq; qfollowsp//

pLINK(p) //pmoves to next node//

LINK(q)r//linkqto previous node//

end

Xq

endINVERT

Another useful subroutine is one which concatenates two chains *X *and *Y*.

procedureCONCATENATE(X, Y, Z)

//X= (a_{1}, ...,a),_{m}Y= (b_{1}, ...,b),_{n}m,n0, produces a new chain

Z= (a_{1}, ...,a,_{m}b_{1}, ...,b)//_{n}

ZX

ifX= 0then[ZY;return]

ifY= 0then return

pX

whileLINK(p) 0do//find last node ofX//

pLINK(p)

end

LINK(p)Y//link last node of X to Y//

endCONCATENATE

procedureCONCATENATE(X, Y, Z)

case

:X = 0 :ZY

:Y = 0 : ZX

:else:pX; ZX

whileLINK(p) 0do

pLINK(p)

end

LINK(p)Y

end

endCONCATENATE

Now let us take another look at circular lists like the one below:

Now we can write procedures which insert a node at the front or at the rear of a circular list and take a fixed amount of time.

procedureINSERT__FRONT(A, X)

//insert the node pointed at by X to the front of the circular list

A, where A points to the last node//

ifA = 0then[AX

LINK(X)A]

else[LINK(X)LINK(A)

LINK(A)X]

endINSERT--FRONT

procedureLENGTH(A)

//find the length of the circular listA//

i0

ifA 0then[ptrA

repeat

ii+ 1;ptrLINK(ptr)

untilptr=A]

return(i)

endLENGTH

Let us put together some of these ideas on linked and sequential representations to solve a problem which arises in the translation of computer languages, the processing of equivalence relations. In FORTRAN one is allowed to share the same storage among several program variables through the use of the EQUIVALENCE statement. For example, the following pair of Fortran statements

EQUIVALENCE (*A*(2), *B*(1,2), *C*(4)), (*A*(1),*D*), (*D*,*E*,*F*), (*G*,*H*)

would result in the following storage assignment for these variables:

(i) determine whether there are any conflicts;

(ii) determine the total amount of storage required;

(i) For any variable *x, x ** x, e.g. x* is to be assigned the same location as itself. Thus is *reflexive.*

(ii) For any two variables *x* and *y*, if *x ** y* then *y * x, e.g.* assigning *y* the same location as *x* is the same as assigning *x* the same location as *y*. Thus, the relation is *symmetric.

(iii) For any three variables *x*, *y* and *z*, if *x * y* and *y * z* then *x * z*, e.g. if *x* and *y* are to be assigned the same location and *y* and *z* are also to be assigned the same location, then so also are *x* and *z*. The relation is *transitive.

**Definition:** A relation, , over a set *S*, is said to be an *equivalence relation* over *S* iff it is symmetric, reflexive and transitive over *S*.

Examples of equivalence relations are numerous. For example, the "equal to" (=) relationship is an equivalence relation since: (i)* x *=* x, *(ii)* x *=* y* implies *y *=* x*, and (iii) *x *=* y* and *y *=* z* implies *x *=* z.* One effect of an equivalence relation is to partition the set *S* into equivalence classes such that two members *x* and *y* of *S* are in the same equivalence class iff *x ** y*. For example, if we have 12 variables numbered 1 through 12 and the following equivalences were defined via the EQUIVALENCE statement:

1 5, 4 2, 7 11, 9 10, 8 5, 7 9, 4 6, 3 12 and 12 1

{1, 3, 5, 8, 12}; {2, 4, 6}; {7, 9, 10, 11}.

The first design for this algorithm might go this way:

procedureEQUIVALENCE(m,n)

initialize

fork1tom do

readthe next pair(i,j)

process this pair

end

initialize for output

repeat

output a new equivalence class

untildone

endEQUIVALENCE

procedureEQUIVALENCE(m,n)

declareSEQ(1:n),DATA(1:2m),LINK(1:2m),BIT(1:n)

initialize BIT,SEQ to zero

fork1tomdo

readthe next pair(i,j)

put j on the SEQ(i)list

put i on the SEQ(j) list

end

index1

repeat

ifBIT(index) = 0

then[BIT(index) 1

output this new equivalence class]

indexindex+ 1

untilindex>n

end

procedureEQUIVALENCE (m,n)

//Input:m, the number of equivalence pairs

n, the number of variables

Output: variables 1, ...,nprinted in equivalence classes//

declareSEQ(1:n),BIT(1:n)

DATA(1: 2m),LINK(1: 2m);

fori1tondoSEQ(i)BIT(i) 0end

av1

fork1tomdo//phase 1: process all input//

readnext equivalence pair (i, j)

DATA(av)j;LlNK(av)SEQ(i) //addjto listi//

SEQ(i) av; av av + 1

DATA(av)i;LINK(av)SEQ(j) //addito listj//

SEQ(j)av;avav+ 1

end

index1

repeat//phase 2: output all classes//

ifBIT(index) = 0

then[A new class', index)

BIT(index) 1 //mark class as output//

ptrSEQ(index);top0 //initialize stack//

loop//find the entire class//

whileptr0 do //process a list//

jDATA(ptr)

ifBIT(j) = 0

then[j);BIT(j) 1

tLINK(ptr); LINK(ptr)top

topptr; ptrt]

elseptrLINK(ptr)

end

iftop= 0then exit//stack empty//

ptrSEQ(DATA(top))

topLINK(top)

forever]

indexindex+ 1

untilindex>n

endEQUIVALENCE

In Chapter 2, we saw that when matrices were sparse (i.e. many of the entries were zero), then much space and computing time could be saved if only the nonzero terms were retained explicitly. In the case where these nonzero terms did not form any "nice" pattern such as a triangle or a band, we devised a sequential scheme in which each nonzero term was represented by a node with three fields: row, column and value. These nodes were sequentially organized. However, as matrix operations such as addition, subtraction and multiplication are performed, the number of nonzero terms in matrices will vary, matrices representing partial computations (as in the case of polynomials) will be created and will have to be destroyed later on to make space for further matrices. Thus, sequential schemes for representing sparse matrices suffer from the same inadequacies as similar schemes for polynomials. In this section we shall study a very general linked list scheme for sparse matrix representation. As we have already seen, linked schemes facilitate efficient representation of varying size structures and here, too, our scheme will overcome the aforementioned shortcomings of the sequential representation studied in Chapter 2.

For each nonzero term of *A*, we have one five field node which is in exactly one column list and one row list. The head nodes are marked Hl-H7. As can be seen from the figure, the VALUE field of the head nodes for each column list is used to link to the next head node while the DOWN field links to the first nonzero term in that column (or to itself in case there is no nonzero term in that column). This leaves the RIGHT field unutilized. The head nodes for the row lists have the same ROW and COL values as the head nodes for the column lists. The only other field utilized by the row head nodes is the RIGHT field which is not used in the column head nodes. Hence it is possible to use the same node as the head node for row *i* as for column *i*. It is for this reason that the row and column head nodes have the same labels. The head nodes themselves, linked through the VALUE field, form a circularly linked list with a head node pointed to by *A*. This head node for the list of row and column head nodes contains the dimensions of the matrix. Thus, ROW(*A*) is the number of rows in the matrix *A* while COL(*A*) is the number of columns. As in the case of polynomials, all references to this matrix are made through the variable *A*. If we wish to represent an *n *x* m* sparse matrix with *r* nonzero terms, then the number of nodes needed is *r* + max {*n,m*} + 1. While each node may require 2 to 3 words of memory (see section 4.12), the total storage needed will be less than *nm* for sufficiently small *r*.

Before closing this section, let us take a look at an algorithm to return all nodes of a sparse matrix to the available space list.

procedureMERASE(A)

//return all nodes ofAto available space list. Assume that the available

space list is a singly linked list linked through the field RIGHT

withAVpointing to the first node in this list.//

RIGHT(A)AV; AVA; NEXTVALUE(A)

whileNEXTAdo//erase circular lists by rows//

TRIGHT(NEXT)

RIGHT(NEXT) AV

AVT

NEXTVALUE(NEXT)

end

endMERASE

procedureMREAD(A)

//read in a matrixAand set up its internal representation as

discussed previously. The triples representing nonzero terms

are assumed ordered by rows and within rows by columns.

An auxiliary array HDNODE is used.//

1read(n,m,r) //m, nare assumed positive.ris the number

of nonzero elements//

2pmax{m,n}

3fori1topdo//getpheadnodes for rows and

columns//

4callGETNODE(X); HDNODE(i) X

5 ROW(X) COL(X) 0

6 RIGHT(X) VALUE(X) X //these fields point to

themselves//

7end

8current_row1;LASTHDNODE(1)

9fori1tordo

10read(rrow, ccol, val) //get next triple//

11ifrrow > current__row//current row is done; close it and

begin another//

12then[RIGHT(LAST)HDNODE(current__row)

13current__rowrrow;LASTHDNODE(rrow)]

//LASTpoints to rightmost node//

14callGETNODE(X)

15ROW(X)rrow;COL(X)ccol;VALUE(X)val

//store triple into new node//

16RIGHT(LAST)X;LASTX//link into row list//

17DOWN(VALUE(HDNODE(ccol)))X;

//link into column list//

18VALUE(HDNODE(ccol))X

19

20end

21 //close last row//

ifr0thenRIGHT(LAST)HDNODE(current__row)

22fori1tomdo//close all column lists//

23DOWN(VALUE(HDNODE(i)))HDNODE(i)

24end

25 //set up list of headnodes linked throughVALUEfield//

26callGETNODE(A): ROW(A)n; COL(A)m

//set up headnode of matrix//

27fori1top- 1doVALUE(HDNODE(i))HDNODE(i +1)

end

28ifp =0thenVALUE(A) A

29else[VALUE(HDNODE(p))A

VALUE(A) HDNODE(1)]

30endMREAD

So far we have been working chiefly with singly linked linear lists. For some problems these would be too restrictive. One difficulty with these lists is that if we are pointing to a specific node, say *P*, then we can easily move only in the direction of the links. The only way to find the node which precedes *P* is to start back at the beginning of the list. The same problem arises when one wishes to delete an arbitrary node from a singly linked list. As can be seen from example 4.3, in order to easily delete an arbitrary node one must know the preceding node. If we have a problem where moving in either direction is often necessary, then it is useful to have doubly linked lists. Each node now has two link fields, one linking in the forward direction and one in the backward direction.

*P* = RLINK (LLINK(*P*)) = LLINK (RLINK(*P*)).

procedureDDLETE(X, L)

ifX=Lthen callNO__MORE__NODES

//Lis a list with at least one node//

RLINK(LLINK(X))RLINK(X)

LLINK(RLINK(X))LLINK(X)

callRET(X)

endDDLETE

Insertion is only slightly more complex.

procedureDINSERT(P, X)

//insert nodePto the right of nodeX//

LLINK(P)X //set LLINKandRLINKfields of nodeP//

RLINK(P)RLINK(X)

LLINK(RLINK(X))P

RLINK(X)P

endDINSERT

In a multiprocessing computer environment, several programs reside in memory at the same time. Different programs have different memory requirements. Thus, one program may require 60K of memory, another 100K, and yet another program may require 300K. Whenever the operating system needs to request memory, it must be able to allocate a block of contiguous storage of the right size. When the execution of a program is complete, it releases or frees the memory block allocated to it and this freed block may now be allocated to another program. In a dynamic environment the request sizes that will be made are not known ahead of time. Moreover, blocks of memory will, in general, be freed in some order different from that in which they were allocated. At the start of the computer system no jobs are in memory and so the whole memory, say of size *M* words, is available for allocation to programs. Now, jobs are submitted to the computer and requests are made for variable size blocks of memory. Assume we start off with 100,000 words of memory and five programs *P*1, *P*2, *P*3, *P*4 and *P*5 make requests of size 10,000, 15,000, 6,000, 8,000 and 20,000 respectively. Figure 4.14 indicates the status of memory after storage for *P*5 has been allocated. The unshaded area indicates the memory that is currently not in use. Assume that programs *P*4 and *P*2 complete execution, freeing the memory used by them. Figure 4.15 show the status of memory after the blocks for *P*2 and *P*4 are freed. We now have three blocks of contiguous memory that are in use and another three that are free. In order to make further allocations, it is necessary to keep track of those blocks that are not in use. This problem is similar to the one encountered in the previous sections where we had to maintain a list of all free nodes. The difference between the situation then and the one we have now is that the free space consists of variable size blocks or nodes and that a request for a block of memory may now require allocation of only a portion of a node rather than the whole node. One of the functions of an operating system is to maintain a list of all blocks of storage currently not in use and then to allocate storage from this unused pool as required. One can once again adopt the chain structure used earlier to maintain the available space list. Now, in addition to linking all the free blocks together, it is necessary to retain information regarding the size of each block in this list of free nodes. Thus, each node on the free list has two fields in its first word, i.e., SIZE and LINK. Figure 4.16 shows the free list corresponding to figure 4.15. The use of a head node simplifies later algorithms.

If we now receive a request for a block of memory of size *N*, then it is necessary to search down the list of free blocks finding the first block of size *N* and allocating *N* words out of this block. Such an allocation strategy is called *first fit*. The algorithm below makes storage allocations using the first fit strategy. An alternate strategy, *best fit*, calls for finding a free block whose size is as close to *N* as possible, but not less than *N*. This strategy is examined in the exercises.

procedureFF(n, p)

//AV points to the available space list which is searched for a node

of size at leastn. pis set to the address of a block of sizen

that may be allocated. If there is no block of that size thenp= 0.

It is assumed that the free list has a head node with SIZE field = 0//

pLINK(AV);qAV

whilep0do

ifSIZE(p)nthen[SIZE(p)SIZE(p) -n

ifSIZE(p) = 0thenLINK(q)LINK(p)

elsepp+SIZE(p)

return]

qp;pLINK(p)

end

//no block is large enough//

endFF

(i) the left adjacent block is free, i.e., the block ending at *p* - 1;

(ii) the right adjacent block is free, i.e., the block beginning at *p* + *n.*

The first and last words of each block are reserved for allocation information. The first word of each free block has four fields: LLINK, RLINK, TAG and SIZE. Only the TAG and SIZE field are important for a block in use. The last word in each free block has two fields: TAG and UPLINK. Only the TAG field is important for a block in use. Now by just examining the tag at *p* - 1 and *p* + *n* one can determine whether the adjacent blocks are free. The UPLINK field of a free block points to the start of the block. The available space list will now be a doubly linked circular list, linked through the fields LLINK and RLINK. It will have a head node with SIZE = 0. A doubly linked list is needed, as the return block algorithm will delete nodes at random from *AV*. The need for UPLINK will become clear when we study the freeing algorithm. Since the first and last nodes of each block have TAG fields, this system of allocation and freeing is called the *Boundary Tag method*. It should be noted that the TAG fields in allocated and free blocks occupy the same bit position in the first and last words respectively. This is not obvious from figure 4.19 where the LLINK field precedes the TAG field in a free node. The labeling of fields in this figure has been done so as to obtain clean diagrams for the available space list. The algorithms we shall obtain for the boundary tag method will assume that memory is numbered 1 to *m* and that TAG(0) = TAG(*m* + 1) = 1. This last requirement will enable us to free the block beginning at 1 and the one ending at *m* without having to test for these blocks as special cases. Such a test would otherwise have been necessary as the first of these blocks has no left adjacent block while the second has no right adjacent block. While the TAG information is all that is needed in an allocated block, it is customary to also retain the size in the block. Hence, figure 4.19 also includes a SIZE field in an allocated block.

While these algorithms may appear complex, they are a direct consequence of the doubly linked list structure of the available space list and also of the node structure in use. Notice that the use of a head node eliminates the test for an empty list in both algorithms and hence simplifies them. The use of circular linking makes it easy to start the search for a large enough node at any point in the available space list. The UPLINK field in a free block is needed only when returning a block whose left adjacent block is free (see lines 18 and 24 of algorithm FREE). The readability of algorithm FREE has been greatly enhanced by the use of the **case** statement. In lines 20 and 27 *AV* is changed so that it always points to the start of a free block rather than into

procedureALLOCATE(n,p)

//Use next fit to allocate a block of memory of size at least

n, n> 0. The available space list is maintained as described

above and it is assumed that no blocks of size < are to

be retained.pis set to be the address of the first word in

the block allocated.AVpoints to a node on the available list.//

1pRLINK(AV) //begin search atp//

2repeat

3ifSIZE(p)nthen//block is big enough//

4 [diffSlZE(p) -n

5ifdiff<then//allocate whole block//

6 [RLINK(LLINK(p))RLINK(p) //delete node

fromAV//

7LLINK(RLlNK(p))LLINK(p)

8TAG(p)TAG(p+SlZE(p) - 1) 1 //set

tags//

9AVLLINK(p) //set starting point of next

search//

10return]

11else//allocate lowernwords//

12 [SIZE(p)diff

13UPLINK(p+diff- 1)p

14TAG(p+diff- 1) 0 //set upper portion as

unused//

15AVp//position for next search//

16pp+diff//setpto point to start of allocated

block//

17SlZE(p)n

18TAG(p)TAG(p+n- 1) 1

//set tags for allocated block//

19return]]

20pRLlNK(p) //examine next node on list//

21untilp=RLINK(AV)

22 //no block large enough//

23p0;

24endALLOCATE

procedureFREE(p)

//return a block beginning atpand of size SlZE(p)//

1nSlZE(p)

2case

3 :TAG(p - 1) = 1andTAG(p+n) = 1:

//both adjacent blocks in use//

4TAG(p)TAG(p+n- 1)0 //set up a free

block//

5UPLINK(p+n- 1)p

6LLINK(p)AV;RLINK(p)RLINK(AV)

//insert at right ofAV//

7LLINK(RLlNK(p))p;RLINK(AV)p

8 :TAG(p+n) = 1andTAG(p- 1) = 0: //only left block

free//

9qUPLINK(p- 1 ) //start of left block//

10SIZE(q)SlZE(q) +n

11UPLINK(p+n- 1)q;TAG(p+n- 1) 0

12 :TAG(p + n) = 0andTAG(p- 1) = 1:

//only right adjacent block free//

13RLINK(LLINK(p+n))p//replace block

beginning//

14LLINK(RLINK(p+n))p//atp+nby one//

15LLINK(p)LLINK(p+n) //beginning atp//

16RLINK(p)RLINK(p+n)

17SlZE(p)n+SlZE(p+n)

18UPLINK(p+SIZE(p) - 1)p

19TAG(p) 0

20AVp

21:else://both adjacent blocks free//

22 //delete right free block fromAVlist//

RLINK(LLINK(p+n))RLINK(p+n)

23LLINK(RLINK(p+n))LLINK(p+n)

24qUPLINK(p- 1) //start of left free

block//

25SIZE(q)SIZE(q) +n+SIZE(p+n)

26UPLINK(q+SIZE(q) - 1)q

27AVLLINK(p+n)

28end

29endFREE

An alternative scheme for storage allocation, the Buddy System, is investigated in the exercises.

In Chapter 3, a linear list was defined to be a finite sequence of *n* 0 elements, _{1},...,* _{n}*, which we write as

(i) D = ( ) the null or empty list, its length is zero.

(ii) A = (a, (b,c)) a list of length two; its first element is

the atom 'a' and its second element is the

linear list (b,c).

(iii) B = (A,A,( )) a list of length three whose first two

elements are the lists A, the third element

the null list.

(iv) C = (a, C) a recursive list of length two. C corresponds

to the infinite list C = (a,(a,(a, ...).

Example one is the empty list and is easily seen to agree with the definition. For list A, we have

head (*A*) = '*a*', tail (*A*) = ((*b,c*)).

head (*B*)) = *A*, tail (*B*)) = (*A*, ( ))

head (tail(*B*)) = *A*, tail (tail(*B*)) = (( ))

First, let us restrict ourselves to the situation where the lists being represented are neither shared nor recursive. To see where this notion of a list may be useful, consider how to represent polynomials in several variables. Suppose we need to devise a data representation for them and consider one typical example, the polynomial *P*(*x,y,z*) =

x^{10} y^{3} z^{2} + 2x^{8} y^{3} z^{2} + 3x^{8} y^{2} z^{2} + x^{4} y^{4} z + 6x^{3} y^{4} z + 2yz

One can easily think of a sequential representation for *P*, say using

((*x*^{10} + 2*x*^{8})*y*^{3} + 3*x*^{8}*y*^{2})*z*^{2} + ((*x*^{4} + 6*x*^{3})*y*^{4} + 2*y*)*z*

It is a little surprising that every generalized list can be represented using the node structure:

________________________

| | | |

|TAG = 0/1 | DATA | LINK |

|__________|______|______|

Now that we have seen a particular example where generalized lists are useful, let us return to their definition again. Whenever a data object is defined recursively, it is often easy to describe algorithms which work on these objects recursively. If our programming language does not allow recursion, that should not matter because we can always translate a recursive program into a nonrecursive version. To see how recursion is useful, let us write an algorithm which produces an exact copy of a nonrecursive list *L* in which no sublists are shared. We will assume that each node has three fields, TAG, DATA and LINK, and also that there exists the procedure GETNODE(*X*) which assigns to *X* the address of a new node.

procedureCOPY(L)

//Lpoints to a nonrecursive list with no common sublists.COPY

returns a pointer to a new list which is a duplicate ofL//

ptr0

ifL0then

[ifTAG(L) = 0thenqDATA(L) //save an atom//

elseqCOPY(DATA(L)) //recursion//

rCOPY(LINK(L)) //copy tail//

callGETNODE(ptr) //get a node//

DATA(ptr)q;LINK(ptr)r//combine head and tail//

TAG(ptr)TAG(L)]

return(ptr)

endCOPY

(ii) The label L1 is attached to the first executable statement.

Now, each recursive call is replaced by a set of instructions which do the following:

(vi) Insert an unconditional branch to the beginning of the procedure.

(viii) If the stack is empty then execute a normal return.

(xiii) Use the index of the label of the return address to execute a branch to that label.

procedureCOPY(L)

//nonrecursive version//

i0 //initialize stack index//

1:ptr0

ifL0then

[ifTAG(L) = 0

thenqDATA(L)

else[STACK(i+ 1)q//stack local variables//

STACK(i+ 2)r

STACK(i+ 3)ptr

STACK(i+ 4)L//stack parameter//

STACK(i+ 5)2//stack return address//

ii+ 5;

LDATA(L);go to1 //set parameter and

begin//

2:qSTACK(i) //remove function value//

ii- 1]

STACK(i+ 1)q;STACK(i+ 2)r//stack

variables and//

STACK(i+ 3)ptr;STACK(i+ 4)L

//parameter for second//

STACK(i+ 5) 3;ii+5 //recursive call//

LLINK(L);go to1

3:rSTACK(i);ii1//remove function value//

callGETNODE(ptr)

DATA(ptr) q; LlNK(ptr) r

TAG(ptr) TAG(L)]

ifi0then[addrSTACK(i); L STACK(i- 1)

//execute a return//

t STACK(i- 2); r STACK(i- 3)

q STACK(i- 4);

STACK(i- 4)ptr;ptrt //store function

value//

ii- 4;go toaddr] //branch to 2 or 3//

return(ptr)

endCOPY

Now let us consider the computing time of this algorithm. The null

list takes a constant amount of time. For the list

which has the representation of figure 4.24 *L* takes on the values given in figure 4.25.

Levels of Continuing Continuing

recursion Value of L Levels L Levels L

1 q 2 r 3 u

2 s 3 u 4 v

3 t 4 w 5 o

4 o 5 x 4 v

3 t 6 o 3 u

2 s 5 x 2 r

1 q 4 w 3 o

2 r

1 q

procedureEQUAL(S,T)

//Sand T are nonrecursive lists, each node having three fields:TAG,

DATAandLINK.The procedure returns the valuetrueif the

lists are identical elsefalse//

ansfalse

case

:S =0andT =0:anstrue

:S0Tand0:

ifTAG(S) =TAG(T)

then[ifTAG(S)=0

thenansDATA (S) = DATA(T)

elseansEQUAL(DATA(S),DATA(T))

ifansthen

ansEQUAL(LINK(S),LINK(T))]

end

return(ans)

endEQUAL

Procedure DEPTH is a very close transformation of the definition which is itself recursive.

procedureDEPTH(S)

//Sis a nonrecursive list having nodes with fieldsTAG, DATAan

LINK. The procedure returns the depth of the list//

max0

ifS= 0then return(max) //null list has zero depth//

ptrS

whileptr0do

ifTAG(ptr) = 0thenans0

elseansDEPTH(DATA(ptr)) //recursion//

ifans>maxthenmaxans//find a new maximum//

ptrLINK(ptr)

end

return(max+ 1)

endDEPTH

In this section we shall consider some of the problems that arise when lists are allowed to be shared by other lists and when recursive lists are permitted. Sharing of sublists can in some situations result in great savings in storage used, as identical sublists occupy the same space. In order to facilitate ease in specifying shared sublists, we extend the definition of a list to allow for naming of sublists. A sublist appearing within a list definition may be named through the use of a list name preceding it. For example, in the list *A* = (*a*,(*b,c*)), the sublist (*b,c*) could be assigned the name *Z* by writing *A* = (*a,Z*(*b,c*)). In fact, to be consistent we would then write *A*(*a,Z*(*b,c*)) which would define the list *A* as above.

(i) REF(*X*) = 1 accessible only via *X*

(ii) REF(*Y*) = 3 pointed to by *Y* and two points from *Z*

(iii) REF(*Z*) = 1 accessible only via *Z*

(iv) REF(*W*) = 2 two pointers to list *C*

Now a call to LERASE(*T*) (list erase) should result only in a

procedureLERASE(X)

//recursive algorithm to erase a list assuming aREFfield in each

head node which has the number of pointers to this list and a

TAGfield such thatTAG(X) = 0 ifDATA(X) is actually an atom

andTAG(X) = 1 ifDATA(X) is a link to a sublist. The storage

pool is assumed linked through the fieldLINKwithAVpointing

to the first node in the pool//

REF(X)REF(X) - 1 //decrement reference count//

ifREF(X) 0then return

YX//Ywill traverse top level ofX//

whileLINK(Y) 0do

Y LINK(Y)

ifTAG(Y) = 1then callLERASE (DATA (Y))// recursion//

end

LINK(Y)AV//attach top level nodes to avail list//

AVX

endLERASE

(i) reference count of Z becomes zero;

(ii) next node is processed and REF(Y) reduces to 1;

(iv) the top level nodes of Z are linked into the available space list

As remarked at the close of the last section, garbage collection is the process of collecting all unused nodes and returning them to available space. This process is carried out in essentially two phases. In the first phase, known as the marking phase, all nodes in use are marked. In the second phase all unmarked nodes are returned to the available space list. This second phase is trivial when all nodes are of a fixed size. In this case, the second phase requires only the examination of each node to see whether or not it has been marked. If there are a total of *n* nodes, then the second phase of garbage collection can be carried out in *O*(*n*) steps. In this situation it is only the first or marking phase that is of any interest in designing an algorithm. When variable size nodes are in use, it is desirable to compact memory so that all free nodes form a contiguous block of memory. In this case the second phase is referred to as memory compaction. Compaction of disk space to reduce average retrieval time is desirable even for fixed size nodes. In this section we shall study two marking algorithms and one compaction algorithm.

procedureDRIVER

//driver for marking algorithm.nis the number of nodes in the

system//

1fori 1 tondo //unmark all nodes//

2MARK(i) 0

3end

4MARK(0) 1;TAG(0) 0 //boundary conditions//

5foreach pointer variableXwithMARK (X) = 0do

6 MARK (X) 1

7ifTAG(X) = 1then callMARK1(X) //X is a list node//

8end

9endDRIVER

procedureMARK1 (X)

//Xis a list node. Mark all nodes accessible fromX. It is assumed

that MARK(0) = 1 and TAG(0) = 0.ADDandDELETE

perform the standard stack operations//

1PX; initialize stack

2loop//follow all paths fromP, Pis a marked list node//

3loop//move downwards stackingRLINK's if needed//

4QRLINK(P)

5ifTAG(Q) = 1andMARK(Q) = 0then callADD(Q)

//stackQ//

6MARK(Q) 1 //Qmay be atomic//

7PDLINK(P)

//any unmarked nodes accessible fromP?//

8ifMARK(P) = 1orTAG(P) = 0thenexit

9MARK(P) 1

10forever

11MARK(P) 1 //Pmay be an unmarked atomic node//

12ifstack emptythen return//all accessible nodes marked//

13callDELETE(P) //unstack P//

14forever

15endMARK1

A B C D F G F D C B H B A I J I A

The tag will be reset to 1 when the node gets off the *T* - X path list.

procedureMARK2(X)

//same function as MARK1//

//it is assumed thatMARK(0) = 1andTAG(0) = 0//

1PX;T0 //initializeT-Xpath list//

2repeat

3QDLINK(P) //go down list//

4case

5 :MARK(Q) = 0andTAG(Q) = 1: //Qis an unexamined

list node//

6MARK(Q) 1;TAG(P) 0

7DLINK(P)T;TP//addPtoT-Xpath list//

8PQ//move down to exploreQ//

9:else://Qis an atom or already examined//

10MARK(Q) 1

11 L1:QRLINK(P) //move right ofP//

12case

13 :MARK(Q) = 0andTAG(Q) = 1: //exploreQ

further//

14MARK(Q) 1

15RLINK(P)T;TP

16PQ

17:else:MARK(Q) 1 //Qis not to be explored;

back up on path list//

18whileT0do//while path list not empty//

19QT

20ifTAG(Q) = 0 //link toPthroughDLINK//

21then[T<DLINK(Q);DLINK(Q)P

22TAG(Q) 1;PQ;go toL1]

//Pis node to right ofQ//

23TRLINK(Q);RLINK(Q)P

24PQ

25end

26end

27end

28untilT= 0

29endMARK2

When all requests for storage are of a fixed size, it is enough to just link all unmarked (i.e., free) nodes together into an available space list. However, when storage requests may be for blocks of varying sizes, it is desirable to compact storage so that all the free space forms one contiguous block. Consider the memory configuration of figure 4.30. Nodes in use have a MARK bit = 1 while free nodes have their MARK bit = 0. The nodes are labeled 1 through 8, with *n _{i}*, 1

procedureCOMPACT(MEMORY,MARK,SlZE,M,NEW__ADDR)

//compact storage following the marking phase of garbage collection.

Nodes in use haveMARKbit=1.SlZE(i)= number of words

in that node.NEW__ADDRis a free field in each node. Memory

is addressed 1 to M and is an arrayMEMORY.//

//phase I: Scan memory from left assigning new addresses to nodes

in use. AV = next available word//

AV 1;i1 //variableiwill scan all nodes left to right//

whileiMdo

ifMARK(i) = 1then[//relocate node//

NEW__ADDR(i) AV

AV AV + SIZE(i)]

ii+ SIZE(i) //next node//

end

//phase II: update all links. Assume the existence of a fictitious

node with address zero andNEW__ADDR(0) = 0//

i1

whileiMdo

ifMARK(i) = 1then[//update all links to reflect new

addresses//

LINK1(i) NEW_ADDR(LINK1(i))

LlNK2(i) NEW_ADDR(LINK2(i))]

ii + SIZE(i)

end

//phase III: relocate nodes//

i1

whileiMdo

ifMARK (i) = 1then[//relocate to NEW_ADDR(i)//

kNEW_ADDR(i); lk

forjitoi+ SIZE(i) - 1do

MEMORY(k) MEMORY(j)

k k+ 1

end

ii+ SIZE(l)]

elseii+SIZE(i)

end

endCOMPACT

Suppose we have two character strings *S = *'*x _{1 }... x_{m}*' and

We begin by defining the data structure STRING using the axiomatic notation. For a set of operations we choose to model this structure after the string operations of *PL/I.* These include:

(i) NULL produces an instance of the null string;

(ii) ISNULL returns true if the string is null else false;

(iii) IN takes a string and a character and inserts it at the end of the string;

(iv) LEN returns the length of a string;

(v) CONCAT places a second string at the end of the first string;

(vi) SUBSTR returns any length of consecutive characters;

(vii) INDEX determines if one string is contained within another.

We now formally describe the string data structure.

structureSTRING

declareNULL( )string; ISNULL (string)boolean

IN (string, char)string; LEN (string)integer

CONCAT (string, string) string

SUBSTR (string, integer, integer)string

LINK_DEST 3184string, string)integer;

for allS,Tstring,i,jinteger,c,dcharlet

ISNULL (NULL) :: = true; ISNULL (IN(S,c)) :: =false

LEN (NULL) :: = 0; LEN (IN(S,c)) :: = 1 + LEN(S)

CONCAT (S,NULL):: = S

CONCAT (S,IN(T,c)) :: = IN (CONCAT (S,T),c)

SUBSTR (NULL,i,j):: = NULL

SUBSTR (IN(S,c),i,j) :: =

ifj = 0ori+ j - 1 > LEN(IN(S,c))thenNULL

else ifi+ j - 1 = LEN(IN(S,c))

thenIN(SUBSTR (S,i,j - 1 ),c)

elseSUBSTR(S,i,j)

INDEX (S,NULL) :: = LEN (S) + 1

INDEX (NULL,IN(T,d)) :: = 0

INDEX (IN(S,c),IN(T,d)) :: =

ifc=dandINDEX(S,T) = LEN(S) - LEN(T) + 1

thenINDEX(S,T)elseINDEX(S,IN(T,d))

end

endSTRING

As an example of how these axioms work, let *S = `abcd'.* This will be represented as

Now suppose we follow the axioms as they apply to SUBSTR(*S*,2,1). By the SUBSTR axioms we get

SUBSTR(S,2,1) = SUBSTR(IN(IN(IN(NULL,a),b),c),2,1)

= SUBSTR(IN(IN(NULL,a),b),2,1)

= IN(SUBSTR(IN(NULL,a),2,0),b)

= IN(NULL,b)

='b'

Suppose we try another example, SUBSTR(*S*,3,2)

= IN(SUBSTR(IN(IN(IN(NULL,a),b),c),3,1),d)

= IN(IN(SUBSTR(IN(IN(NULL,a),b),3,0),c),d)

= IN(IN(NULL,c),d)

= 'cd'

procedureSINSERT(S,T,i)

//insert string T after the i-th character of S destroying the original//

//strings S,T and creating a new string S//

1case//degenerate cases//

2 :i< 0ori> LENGTH(S):stop

3 : T = 0:return

4 : S = 0: S T;return

5end

//at this point LENGTH(S) > 0, LENGTH(T) > 0, 0 i

LENGTH(S)//

6 ptr S; j 1

7whilej < ido//find i-th character of S//

8ptrLINK (ptr); j j + 1 //ptr points to i-th node//

9end

10ifi = 0then[save S; ptr T; S T] //save i + 1 character//

11else[save LINK (ptr)

12 LINK (ptr) T] //attach list T to list S//

13whileLINK (ptr) 0do//find end of T//

14 ptr LINK (ptr)

15end

16 LINK (ptr) save //point end of T to//

//i + 1-st character of S//

17endSINSERT

Examine how the algorithm works on the strings below.

ptrS;j0

whilej<ido

ptrLINK(ptr);jj+ 1 //findi-th character of S//

end

saveLINK(ptr) //savei+ 1-st character of S//

elseLINK(ptr)LINK(T)

LINK(T)save//attach end ofTtoS//

ifptr=Sandi0thenST

Now let us develop an algorithm for a more sophisticated application of strings. Given two strings *S* and PAT we regard the value of PAT as a pattern to be searched for in *S*. If it occurs, then we want to know the node in *S* where PAT begins. The following procedure can be used.

*S* = '*aaa* ... *a*'; PAT = '*aaa* ... *ab*'

procedureFIND(S,PAT,i)

//find in stringSthe first occurrence of the stringPATand return

ias a pointer to the node inSwherePATbegins. Otherwise return

ias zero//

i0

ifPAT= 0orS= 0then return

pS

repeat

savep;qPAT//save the starting position//

whilep0andq 0andDATA(p) =DATA(q)do

pLINK(p);qLINK(q) //characters match,

move to next pair//

end

ifq= 0then[isave;return] //a match is found//

pLINK(save) //start at next character inS//

untilp= 0 //loop until no more elements inS//

endFIND

procedureNFIND(S,PAT,i)

//stringSis searched forPATandiis set to the first node inS

wherePAToccurs else zero. Initially the last and first characters

ofPATare checked for inS//

i0

ifPAT= 0orS= 0then return

pqPAT;t 0

whileLINK(q) 0do//qpoints to last node ofPAT//

qLINK(q);tt+ 1

end//t+ 1 is the length ofPAT//

jrsaveS

fork1totwhilej0do//findt+ 1-st node of S//

jLINK(j)

end

whilej0do//whileShas more chars to inspect//

pPAT;rsave

ifDATA(q) =DATA(j) //check last characters//

then[whileDATA(p) =DATA(r)andpqdo

pLINK(p);rLINK(r) //check pattern//

end

ifp=qthen[isave;return]] //success//

saveLINK(save);jLINK(j)

end

endNFIND

PAT = 'a b c a b c a c a b'

S= ' -a b? ? ? . . . . ?'

PAT = 'a b c a b c a c a b'

S= ' -a b c a? ? . . . ?'

PAT = 'a b c a b c a c a b'

For the example pattern above, PAT = abcabcacab, we have

j1 2 3 4 5 6 7 8 9 10

PAT a b c a b c a c a b

f 0 0 0 1 2 3 4 0 1 2

With this definition for NEXT, the pattern matching rule translates to the following algorithm:

procedurePMATCH(S,PAT)

//determine ifPATis a substring ofS//

1iS//iwill move throughS//

2jPAT//jwill move throughPAT//

3whilei0andj 0do

4ifDATA(i) =DATA(j)

5then[//match//

6iLINK(i)

7jLINK(j)]

8else[ifj=PATtheniLINK(i)

9elsejNEXT(j]

10end

11ifj= 0then print('amatch')

12else print('no match')

13endPMATCH

(note that *f*^{1} (*j*) = *f*(*j*) and f* ^{m}*(

This directly yields the following algorithm to compute* f.*

procedureFAIL(PAT,f)

//compute the failure function forPAT=p_{1}p_{2}... p//_{n}

1f(1) 0

2forj2tondo//computef(j)//

3if(j- 1)

4whilep_{j}_{p}i+ 1_{ and i > 0 do}

5if(i)

6end

7ifp=_{j}p+ 1_{i}thenf(j)i+ 1

8elsef(j) 0

9end

10endFAIL

INTEGER EXP(200), COEF(200), LINK(200)

SUBROUTINE INIT(N)

C LINK N NODES TO FORM THE INITIAL AVAILABLE

C SPACE LIST AND SET AV TO POINT TO FIRST NODE.

C ARRAYS EXP, COEF AND LINK AND VARIABLE AV

C ARE IN BLANK COMMON.

INTEGER EXP(200),COEF(200),LINK(200),AV

COMMON EXP,COEF,LINK,AV

M = N - 1

DO 10 I = 1, M

10 LINK(I) = I + 1

LINK(N) = 0

AV= 1

RETURN

END

PROCEDURE INIT(N: INTEGER);

{This procedure initializes the available space list.LINKis a global

array of type integer.AVis a global variable of type integer.}

VAR I: INTEGER; {Local variable I}

BEGIN FOR I: = 1 TO N - 1 DO LINK(I): = I + 1;

LINK(N): = 0;

AV: = 1

END;

30 bits 15 bits 15 bits

|---------------------------------------|

| COEF | EXP | LINK |

|---------------------------------------|

node structure using one 60 bit word per node

INTEGER FUNCTION COEF(J) SUBROUTINE SCOEF(J,I)

C EXTRACT COEF FIELD OF J BY C SET THE COEF FIELD OF J TO I BY

C USING A RIGHT SHIFT BY 30 BITS. C FIRST ZEROING OUT OLD

COEF = SHIFT(J,-30) C CONTENTS AND THEN USE AN OR

RETURN C TO PUT IN NEW VALUE. ASSUME I IS

END C IN RIGHT RANGE.

J = (J.AND.7777777777B).OR.

SHIFT(I.AND.7777777777B,30)

RETURN

END

INTEGER FUNCTION EXP(J) SUBROUTINE SEXP(J,I )

C EXTRACT EXP FIELD BY SHIFTING C SET EXP FIELD OF J TO I

C RIGHT AND ZEROING OUT EXTRA

BITSJ= (J.AND.77777777770000077777B)

EXP = SHIFT(J, - 15).AND.77777B .OR.SHIFT(I,15)

RETURN RETURN

END END

FUNCTION LINK(J) SUBROUTINE SLINK(J,I)

C EXTRACT LINK FIELD OF J C SET LINK FIELD OF J TO I

LINK = J.AND.77777B J = (J.AND.77777777777777700000B)

RETURN .OR. I

END RETURN

END

16 bits 16 bits

|---------------------------|

| EXP | LINK |

|---------------------------|

INTEGER FUNCTION EXP(J) SUBROUTINE SEXP(J,I)

C EXTRACT EXP FIELD OF J C SET EXP FIELD OF J TO I

EXP = SHIFT(J, - 16).AND.177777B J = (J.AND.177777B).OR.SHIFT(I,16)

RETURN RETURN

END END

(NOTE: the AND operation is needed

to mask out sign bit extension)

INTEGER FUNCTION LINK(J) SUBROUTINE SLINK(J,I)

C EXTRACT LINK FIELD OF J C SET LINK FIELD OF J TO I

LINK = J.AND. 177777B J = (J.AND.37777600000B).OR.I

RETURN RETURN

END END

These subroutines and functions assume that negative numbers are stored using '1s complement arithmetic. A "B' following a constant means that the constant is an octal constant.

VAR LINK: ARRAY [1 . . 200] OF 0 . . 32767;

EXP: ARRAY [1 . . 200] OF 0 . . 32767;

COEF: ARRAY [1 . . 200] OF - 1073741823 . . 1073741823;

TYPE POLY = PACKED RECORD COEF: -1073741823 . . 1073741823

EXP: 0 . . 32767

LINK: 0 . . 32767

END;

VAR NODE: ARRAY [1 . . 200] OF POLY

usage ... NODE [subscript]. COEF; NODE [subscript]. EXP; NODE [subscript]. LINK

"Multiword list items" by W. T. Comfort, *CACM*, vol. 7, no. 6, June 1964, pp. 357-362.

**1. **Write an algorithm LENGTH(*P*) to count the number of nodes in a singly linked list *P*, where *P* points to the first node in the list. The last node has link field 0.

**2. **Let *P* be a pointer to the first node in a singly linked list and *X* an arbitrary node in this list. Write an algorithm to delete this node from the list. If *X* = *P*, then *P* should be reset to point to the new first node in the list.

**3. **Let *X = *(*x _{1}, x_{2}, ..., x_{n}) and *Y =

**4. **Do exercise 1 for the case of circularly linked lists.

**5. **Do exercise 2 for the case of circularly linked lists.

**6. **Do exercise 3 for the case of circularly linked lists.

**7. **Devise a representation for a list where insertions and deletions can be made at either end. Such a structure is called a deque. Write a procedure for inserting and deleting at either end.

**8. **Consider the hypothetical data object *X*2. *X*2 is a linear list with the restriction that while additions to the list may be made at either end, deletions can be made from one end only.

**9. **Give an algorithm for a singly linked circular list which reverses the direction of the links.

**10. **Let *P* be a pointer to a circularly linked list. Show how this list may be used as a queue. I.e., write algorithms to add and delete elements. Specify the value for *P* when the queue is empty.

**11. **It is possible to traverse a singly linked list in both directions (i.e., left to right and a restricted right to left traversal) by reversing the links during the left to right traversal. A possible configuration for a list *P* under this scheme would be:

i) Write an algorithm to move *P, n* nodes to the right from a given position (*L,P*).

ii) Write an algorithm to move *P, n* nodes left from any given position (*L,P*).

**12. **Consider the operation XOR (exclusive OR, also written as ) defined as below (for *i,j* binary):

This differs from the usual OR of logic in that

**13.** Write an algorithm PREAD(*X*) to read in *n* pairs of coefficients and exponents, (*c _{i},e_{i}*) 1

**14.** Let *A* and *B* be pointers to the head nodes of two polynomials represented as in exercise 13. Write an algorithm to compute the product polynomial *C = A * B*. Your algorithm should leave *A* and *B* unaltered and create *C* as a new list. Show that if *n* and *m* are the number of terms in *A* and *B* respectively, then this multiplication can be carried out in time *O*(*nm ^{2}*) or

**15.** Let *A* be a pointer to the head node of a univariate polynomial as in section 4.4. Write an algorithm, PEVAL(*A,x*) to evaluate the polynomial *A* at the point *x*, where *x* is some real number.

**16.** Extend the equivalence algorithm of section 4.6 to handle the case when the variables being equivalenced may be subscripted. The input now consists of 4-tuples (*i*, *ioff*, *j*, *joff*) where *i* and *j* as before represent the variables. Ioff and joff give the positions in *i* and *j* being equivalenced relative to the start of *i* or *j*. For example, if *i* represents the array *A* dimensioned *A*(-6:10) and *j*, the array *B* dimensioned *B*(1:20), the equivalence EQUIVALENCE(*A*(5),*B*(6)) will be represented by 4-tuple (*i*,12, *j*,6). Your algorithm will begin by reading in these 4-tuples and setting them up in lists as in section 4.6. Each node on a list will have three fields: IOFF, JVAR, JOFF. Thus, for the equivalence 4-tuple above, the node will be put into the list for *i* and the node onto the list for *j*. Now process these lists outputting equivalence classes. With each class, output the relative position of each member of the class and check for conflicting equivalences .

In exercises 17-21 the sparse matrices are represented as in section 4.7

**17.** Let *A* and *B* be two sparse matrices represeted as in section 4.7. Write an algorithm. MADD(*A,B,C*) to create the matrix *C = A + B*. Your algorithm should leave the matrices *A* and *B* unchanged and set up *C* as a new matrix in accordance with this data representation. Show that if *A* and *B* are *n *x* m* matrices with *r _{A}* and

**18.** Let *A *and* B* be two sparse matrices. Write an algorithm MMUL(*A,B,C*) to set up the structure for *C = A * B*. Show that if *A* is a *n *x* m* matrix with *r _{A}* nonzero terms and if

**19.** Write an algorithm to write out the terms of a sparse matrix *A* as triples (*i, _{j},a_{ij}*). The terms are to be output by rows and within rows by columns. Show that this operation can be performed in time

**20.** Write an algorithm MTRP(*A,B*) to compute the matrix *B = A ^{T}*, the transpose of the sparse matrix

**21.** Design an algorithm to copy a sparse matrix. What is the computing time of your method?

**22.** A simpler and more efficient representation for sparse matrices can be obtained when one is restricted to the operations of addition, subtraction and multiplication. In this representation nodes have the same fields as in the representation of section 4.7. Each nonzero term is represented by a node. These nodes are linked together to form two circular lists. The first list, the rowlist, is made up by linking nodes by rows and within rows by columns. This is done via the RIGHT field. The second list, the column list, is made up by linking nodes via the DOWN field. In this list, nodes are linked by columns and within columns by rows. These two lists share a common head node. In addition, a node is added to contain the dimensions of the matrix. The matrix *A* of figure 4.11 has the representation shown on page 206.

**23.** For the representation of exercise 22 write algorithms to

* solid links represent row list

dashed links represent column list

**24. **Compare the sparse representations of exercise 22 and section 4.7 with respect to some other operations. For example how much time is needed to output the entries in an arbitrary row or column?

**25. **(a) Write an algorithm *BF*(*n*,*p*) similar to algorithm* FF* of section 4.8 to allocate a block of size *n* using a *best fit* strategy. Each block in the chain of available blocks has a SIZE field giving the number of words in that block. The chain has a head node, *AV* (see Figure 4.16). The best fit strategy examines each block in this chain. The allocation is made from the smallest block of size *n*. *P* is set to the starting address of the space allocated.

(b) Which of the algorithms BF and FF take less time?

(d) Do (c) for a sequence that can be met by FF but not by BF.

**26. **Which of the two algorithms ALLOCATE and FREE of section 4.8 require the condition TAG(0) = TAG(*m* + 1) = 1 in order to work right? Why?

**27. **The boundary tag method for dynamic storage management maintained the available space list as a doubly linked list. Is the XOR scheme for representing doubly linked lists (see exercise 12) suitable for this application? Why?

**28. **Design a storage management scheme for the case when all requests for memory are of the same size, say *k*. Is it necessary to coalesce adjacent blocks that are free? Write algorithms to free and allocate storage in this scheme.

**29. **Consider the dynamic storage management problem in which requests for memory are of varying sizes as in section 4.8. Assume that blocks of storage are freed according to the LAFF discipline (Last Allocated First Freed).

i) Design a structure to represent the free space.

ii) Write an algorithm to allocate a block of storage of size *n*.

iii) Write an algorithm to free a block of storage of size *n* beginning at *p*.

ii) Under the maximization criteria of (i), how small can the ratio

storage allocated

----------------- get?

M

**31. **[Buddy System] The text examined the boundary tag method for dynamic storage management. The next 6 exercises will examine an alternative approach in which only blocks of size a power of 2 will be allocated. Thus if a request for a block of size *n* is made, then a block of size 2^{}log^{2}n^{ }is allocated. As a result of this, all free blocks are also of size a power of 2. If the total memory size is 2* ^{m} *addressed from 0 to 2

**32. **[Buddy System Allocation] Using the available space list structure of exercise 31 write an algorithm to meet a request of size *n *if possible. Note that a request of size *n *is to be met by allocating a block of size 2^{k} k = [log_{2}*n*]. To do this examine the available space lists AVAIL(i), *k* *i* *m* finding the smallest *i* for which AVAIL(*i*) is not empty. Remove one block from this list. Let *P *be the starting address of this block. If *i* > *k*, then the block is too big and is broken into two blocks of size 2* ^{i}*-1 beginning at

**33. **[Locating Buddies] Two nodes in the tree of exercise 32 are said to be buddies if they are sibling nodes. Prove that two nodes starting at *x* and *y* respectively are buddies iff:

i) the KVALS for *x* and *y* are the same: and

**34. **[Freeing and Coalescing Blocks] When a block with KVAL *k* becomes free it is to be returned to the available space list. Free blocks are combined into bigger free blocks iff they are buddies. This combining follows the reverse process adopted during allocation. If a block beginning at *P* and of size *k* becomes free, it is to be combined with its buddy *P* 2* ^{k} *if the buddy is free. The new free block beginning at

**35.** i) Does the freeing algorithm of exercise 34 always combine adjacent free blocks? If not, give a sequence of allocations and freeings of storage showing this to be the case.

ii) How small can the ratio------------------------be for the Buddy System?

iv) How much time does the freeing algorithm take in the worst case to free a block of storage?

**36. **[Buddy system when memory size is not a power of 2]

**37. **Write a nonrecursive version of algorithm LERASE(*X*) of section 4.9.

**38. **Write a nonrecursive version of algorithm EQUALS(*S,T*) of section 4.9.

**39. **Write a nonrecursive version of algorithm DEPTH(*S*) of section 4.9.

**40. **Write a procedure which takes an arbitrary nonrecursive list *L* with no shared sublists and inverts it and all of its sublists. For example, if *L* = (*a*, (*b,c*)), then inverse (*L*) = ((*c,b*),*a*).

**41. **Devise a procedure that produces the list representation of an arbitrary list given its linear form as a string of atoms, commas, blanks and parentheses. For example, for the input *L* = (*a*,(*b,c*)) your procedure should produce the structure:

**42. **One way to represent generalized lists is through the use of two field nodes and a symbol table which contains all atoms and list names together with pointers to these lists. Let the two fields of each node be named ALINK and BLINK. Then BLINK either points to the next node on the same level, if there is one, or is a zero. The ALINK field either points to a node at a lower level or, in the case of an atom or list name, to the appropriate entry in the symbol table. For example, the list *B*(*A*,(*D,E*),( ),*B*) would have the representation:

iv) GETNODE(*X*) ... gets a node for use.

**43. **Rewrite algorithm MARK1 of section 4.10 using the conventions of section 4.9 for the tag field.

**44. **Rewrite algorithm MARK1 of section 4.10 for the case when each list and sublist has a head node. Assume that the DLINK field of each head node is free and so may be used to maintain a linked stack without using any additional space. Show that the computing time is still *O*(*m*).

**45. **When the DLINK field of a node is used to retain atomic information as in section 4.9, implementing the marking strategy of MARK2 requires an additional bit in each node. In this exercise we shall explore a marking strategy which does not require this additional bit. Its worst case computing time will however be *O*(*mn*) where *m *is the number of nodes marked and *n *the total number of nodes in the system. Write a marking algorithm using the node structure and conventions of section 4.9. Each node has the fields: MARK, TAG, DLINK and RLINK. Your marking algorithm will use variable *P* to point to the node currently being examined and NEXT to point to the next node to be examined. If *L *is the address of the as yet unexplored list node with least address and *P* the address of the node currently being examined then the value of NEXT will be min {*L,P* + 1 }. Show that the computing time of your algorithm is *O*(*mn*).

**46. **Prove that MARK2(*X*) marks all unmarked nodes accessible from *X.*

**47. **Write a composite marking algorithm using MARK1, MARK2 and a fixed amount *M* of stack space. Stack nodes as in MARK1 until the stack is full. When the stack becomes full, revert to the strategy of MARK2. On completion of MARK2, pick up a node from the stack and explore it using the composite algorithm. In case the stack never overflows, the composite algorithm will be as fast as MARK1. When *M* = 0, the algorithm essentially becomes MARK2. The computing time will in general be somewhere in between that of MARK1 and MARK2.

**48. **Write a storage compaction algorithm to be used following the marking phase of garbage collection. Assume that all nodes are of a fixed size and can be addressed NODE(*i*), 1 *i* *m*. Show that this can be done in two phases, where in the first phase a left to right scan for free nodes and a right to left scan for nodes in use is carried out. During this phase, used nodes from the right end of memory are moved to free positions at the left end. The relocated address of such nodes is noted in one of the fields of the old address. At the end of this phase all nodes in use occupy a contiguous chunk of memory at the left end. In the second phase links to relocated nodes are updated.

**49.** Write a compaction algorithm to be used in conjunction with the boundary tag method for storage management. Show that this can be done in one left to right scan of memory blocks. Assume that memory blocks are independent and do not reference each other. Further assume that all memory references within a block are made relative to the start of the block. Assume the start of each in-use block is in a table external to the space being allocated.

**50.** Design a dynamic storage management system in which blocks are to be returned to the available space list only during compaction. At all other times, a TAG is set to indicate the block has become free. Assume that initially all of memory is free and available as one block. Let memory be addressed 1 through *M*. For your design, write the following algorithms:

(ii) FREE(*n*,*p*) ... free a block of size *n* beginning at *p*.

**51.** Write an algorithm to make an in-place replacement of a substring of *X *by the string *Y*. Assume that strings are represented using fixed size nodes and that each node in addition to a link field has space for 4 characters. Use to fill up any unused space in a node.

**52.** Using the definition of STRING given in section 4.11, simulate the axioms as they would apply to (i) CONCAT(*S,T*) (ii) SUBSTR(S,2,3), (iii) INDEX(*S,T*) where *S* = 'abcde' and *T*= 'cde'.

**53.** If *X* = (*x*_{1}, ...,*x _{m}*) and

**54.** Let *X* and* Y* be strings represented as singly linked lists. Write a procedure which finds the first character of *X *which does not occur in the string *Y*.

**55.** Show that the computing time for procedure NFIND is still *O*(*mn*). Find a string and a pattern for which this is true.

**56.** (a) Compute the failure function for the following patterns:

**57.** The definition of the failure function may be strengthened to

(a) Obtain the new failure function for the pattern PAT of the text.

______________________

| | |

| Exponent | Link |

|_____________|________|

| Coefficient |

|______________________|

iii) RET(*X*) ... return node *X* to the storage pool.

Write and test (using the routines i-iv above) the following routines:

vii) PADD(*X, Y, Z*) ... *Z* = *X* + *Y*

viii) PSUB(*X, Y, Z*) ... *Z* = *X* - *Y*

ix) PMUL(*X, Y, Z*) ... *Z* = *X* * *Y*

xi) PERASE(*X*) ... return the circular list *X* to the storage pool.

Use the routine LENGTH to check that all unused nodes are returned to the storage pool.

* LN* = LENGTH(*Z*) + LENGTH(AVAIL)

**59. **[Programming Project] In this project, we shall implement a complete linked list system to perform arithmetic on sparse matrices using the representation of section 4.7. First, design a convenient node structure assuming VALUE is an integer for the computer on which the program is to be run. Then decide how many bits will be used for each field of the node. In case the programming language you intend to use is FORTRAN, then write function subprograms to extract various fields that may be packed into a word. You will also need subroutine subprograms to set certain fields. If your language permits use of structure variables, then these routines would not be needed. To begin with, write the basic list processing routines:

b) GETNODE(*X*) ... provide node *X* for use.

c) RET(*X*) ... return node *X* to the available space list.

Test these routines to see if they work. Now write and test these routines for matrix operations:

line 1:n m rn= # or rows

m= # or columns

r= # of nonzero terms

line 2

triples of (row, columns, value) on each line

liner+ 1

f) MERASE(*A*) . . . return all nodes of the sparse matrix *A* to the available space list.

g) MADD(*A,B,C*) ... create the sparse matrix *C* = *A* + *B*. *A* and *B* are to be left unaltered.

i) MMUL(*A,B,C*) ... create the sparse matrix *C* = *A* * *B*. *A* and *B* are to be left unaltered.

j) MTRP(*A,B*) ... create the sparse matrix *B *= *A ^{T}*.

**60.** [Programming Project] Do the project of exercise 59 using the matrix representation of exercise 22.