A file, as described in earlier chapters, is a collection of records where each record consists of one or more fields. For example, the records in an employee file could contain these fields:

Degree (Highest Degree Obtained)

Sample data for such a file is provided in figure 10.1.

Record E# Name Occupation Degree Sex Location MS Salary

-----------------------------------------------------------------------

A 800 HAWKINS programmer B.S. M Los Angeles S 10,000

B 510 WILLIAMS analyst B.S. F Los Angeles M 15,000

C 950 FRAWLEY analyst M.S. F Minneapolis S 12,000

D 750 AUSTIN programmer B.S. F Los Angeles S 12,000

E 620 MESSER programmer B.S. M Minneapolis M 9,000

The primary objective of file organization is to provide a means for record retrieval and update. The update of a record could involve its deletion, changes in some of its fields or the insertion of an entirely new record. Certain fields in the record are designated as key fields. Records may be retrieved by specifying values for some or all of these keys. A combination of key values specified for retrieval will be termed a *query*. Let us assume that in the employee file of figure 10.1. the fields Employee Number, Occupation, Sex and Salary have been designated as key fields. Then, some of the valid queries to the file are:

Retrieve the records of all employees with

Q3: Salary > average salary of all employees

Q4: (Sex = M *and* Occupation = Programmer) *or* (Employee Number > 700 and Sex = F)

Q1: Simple Query: The value of a single key is specified.

Q2: Range Query: A range of values for a single key is specified

Q3: Functional Query: Some function of key values in the file is specified (e.g. average or median)

Q4: Boolean Query: A boolean combination of Q1 - Q3 using logical operators *and, or, not.*

The mode of update may, again, be either real time or batched. Real time update is needed, for example, in a reservation system. As soon as a seat on a flight is reserved, the file must be changed to indicate the new status. Batched update would be suitable in a bank account system where all deposits and withdrawals made on a particular day could be collected on a transaction file and the updates made at the end of the day. In the case of batched update one may consider two files: the Master File and the Transactions File. The Master File represents the file status after the previous update run. The transaction file holds all update requests that haven't yet been reflected in the master file. Thus, in the case of batched update, the master file is always 'out of date' to the extent that update requests have been batched on the transaction file. In the case of a bank file using real time retrieval and batched update, this would mean that only account balances at the end of the previous business day could be determined, since today's deposits and withdrawals haven't yet been incorporated into the master file.

The simplest situation is one in which there is only one key, the only queries allowed are of type Q1 (simple query), and all retrievals and updates are batched. For this situation tapes are an adequate storage medium. All the required functions can be carried out efficiently by maintaining the master file on a tape. The records in the file are ordered by the key field. Requests for retrieval and update are batched onto a transaction tape. When it is time to process the transactions, the transactions are sorted into order by the key field and an update process similar to algorithm VERIFY2 of section 7.1 is carried out, creating a new master file. All records in the old master file are examined, changed if necessary and then written out onto a new master file. The time required for the whole process is essentially *O*(*n + m *log* m*) where *n* and *m* are the number of records in the master and transaction files respectively (to be more accurate this has to be multiplied by the record length). This procedure is good only when the number of transactions that have been batched is reasonably large. If *m = *1 and *n = *10* ^{6}* then clearly it is very wasteful to process the entire master file. In the case of tapes, however, this is the best we can do since it is usually not possible to alter a record in the middle of a tape without destroying information in an adjacent record. The file organization described above for tape files will be referred to as:

When records are of variable size, binary search can no longer be used as given the address of the first and last records in a file one can no longer calculate the address of the middle record. The retrieval time can be considerably reduced by maintaining an index to guide the search. An index is just a collection of key value and address pairs. In the case of the file of figure 10.1 stored in the physical sequence *B,E,D,A,C* at addresses *t*_{1,2}, *t*_{2,2}, *t*_{3,2}, *t*_{4,2} and *t*_{5,2} the index could contain five entries, one for each record in the file. The entries would be the pairs (510,*t*_{1,2}) (620,*t*_{2,2}) (750,*t*_{3,2}) (800,*t*_{4,2}) (900,*t*_{6,2}). An index which contains one entry for every record in the file will be referred to as a *dense index*. If a dense index is maintained for the primary key then retrieval of a record with primary key = *x* could be carried out by first looking into the index and finding the pair (*x,*addr). The desired record would then be retrieved from the location addr. The total number of accesses needed to retrieve a record would now be one plus the number of accesses needed to locate the tuple (*x*,addr) in the index. In section 10.2 we shall look at efficient indexing techniques that allow index searches to be carried out using at most three accesses even for reasonably large files. This means that a retrieval from the file of 10^{5} records discussed earlier could be carried out making at most four accesses rather than the seventeen accesses needed by a binary search. Since all addresses are kept in the index, it is not necessary to have fixed size records.

Let us summarize the ideas we have discussed. File organization is concerned with representing data records on external storage media. The choice of a representation depends on the environment in which the file is to operate, e.g., real time, batched, simple query, one key, multiple keys, etc. When there is only one key, the records may be sorted on this key and stored sequentially either on tape or disk. This results in a sequentially ordered file. This organization is good for files operating in batched retrieval and update mode when the number of transactions batched is large enough to make processing the entire file cost effective. When the number of keys is more than one or when real time responses are needed, a sequential organization in itself is not adequate. In a general situation several indexes may have to be maintained. In these cases, file organization breaks down into two more or less distinct aspects: (i) the directory (i.e. collection of indexes) and (ii) the physical organization of the records themselves. This will be referred to as the physical file. We have already discussed one possible physical organization i.e. sequential (ordered and unordered). In this general framework, processing a query or update request would proceed in two steps. First, the indexes would be interrogated to determine the parts of the physical file that are to be searched. Second, these parts of the physical file will be searched. Depending upon the kinds of indexes maintained, this second stage may involve only the accessing of records satisfying the query or may involve retrieving nonrelevant records too.

One of the important components of a file is the directory. A directory is a collection of indexes. The directory may contain one index for every key or may contain an index for only some of the keys. Some of the indexes may be dense (i.e., contain an entry for every record) while others may be nondense (i.e., contain an entry for only some of the records). In some cases all the indexes may be integrated into one large index. Whatever the situation, the index may be thought of as a collection of pairs of the form (key value, address). If the records *A,B,C,D,E *of figure 10.1 are stored on disk addresses* a*_{l},*a*_{2}, ...,*a*_{5} respectively then an index for the key Employee Number would have entries (800,*a*_{1}); (510,*a*_{2}); (950,*a*_{3}); (750,*a*_{4}) and (620,*a*_{5}). This index would be dense since it contains an entry for each of the records in the file. We shall assume that all the key values in an index are distinct. This may appear to be restrictive since several records may have the same key value as in the case of the Occupation key in figure 10.1. Records *A,D,E* all have the value 'programmer' for the Occupation key. This difficulty can be overcome easily by keeping in the address field of each distinct key value a pointer to another address where we shall maintain a list of addresses of all records having this value. If at address b_{1}, we stored the list of addresses of all programmer records i.e. *a*_{1}, *a*_{4} and *a*_{5}, and at *b*_{2} we stored the address list for all analysts, i.e*. a*_{2} and *a*_{3} then we could achieve the effect of a dense index for Occupation by maintaining an index with entries ('programmer', *b*_{1}) and ('analyst', *b*_{2}). Another alternative would be to change the format of entries in an index to (key value, address 1. address 2, ...address *n*). In both cases, different entries will have distinct key values. The second alternative would require the use of variable size nodes. The use of variable size nodes calls for complex storage management schemes (see section 4.8), and so we would normally prefer the first alternative. An index, then, consists of pairs of the type (key value, address), the key values being distinct. The functions one wishes to perform on an index are: search for a key value; insert a new pair; delete a pair from the index; modify or update an existing entry. These functions are the same as those one had to perform on a dynamic table (section 9.2). An index differs from a table essentially in its size. While a table was small enough to fit into available internal memory, an index is too large for this and has to be maintained on external storage (say, a disk). As we shall see, the techniques for maintaining an index are rather different from those used in chapter 9 to maintain a table. The reason for this is the difference in the amount of time needed to access information from a disk and that needed to access information from internal memory. Accessing a word of information from internal memory takes typically about 10^{-8 }seconds while accessing the same word from a disk could take about 10^{-1 }seconds.

The simplest type of index organization is the cylinder-surface index. It is useful only for the primary key index of a sequentially ordered file. It assumes that the sequential interpretation of disk memory is that of figure 10.2 and that records are stored sequentially in increasing order of the primary key. The index consists of a cylinder index and several surface indexes. If the file requires *c* cylinders (1 through *c*) for storage then the cylinder index contains *c* entries. There is one entry corresponding to the largest key value in each cylinder. Figure 10.4 shows a sample file together with its cylinder index (figure 10.4(b) ). Associated with each of the *c* cylinders is a surface index. If the disk has *s* usable surfaces then each surface index has *s* entries. The *i*'th entry in the surface index for cylinder *j*, is the value of the largest key on the *j*'th track of the *i*'th surface. The total number of surface index entries is therefore *c ** s.* Figure 10.4(c) shows the surface index for cylinder 5 of the file of figure 10.4(a). A search for a record with a particular key value *X* is carried out by first reading into memory the cylinder index. Since the number of cylinders in a disk is only

Aus. Australia HF heavy fighter

AB attack bomber LB light bomber

Br bomber MB medium bomber

C cylinder NB night bomber

FB fighter bomber NF night fighter

Fr fighter N no

GB Great Britain S surface

Ger. Germany y yes

HB heavy bomber

(Condensed from *Airplanes *by Enzo Angelucci, McGraw-Hill Book Co., New York, 1971.)

cylinder highest key value

----------------------------------

1 Bregeut 691

2 Heinkel III H

3 Junkers 188 E-l

4 Messerschmitt Sturmvogel

5 Spitfire Mk XVI

6 Vengeance

surface highest key value

--------------------------

1 Mitsubishi G 4M1

2 Mosquito MkV1

3 Nakajima B5N2

4 Spitfire MkXVI

The overflow handling techniques discussed in section 9.3.2 are:

(ii) open addressing (a) random

= loading factor = (number of entries)/(number of slots in hash table)

*Note *For the same C uses more space than L as overflows are handled in a separate area.

The AVL tree of section 9.2 provided a means to search, insert and delete entries from a table of size *n* using at most *O*(log *n*) time. Since these same functions are to be carried out in an index, one could conceivably use AVL trees for this application too. The AVL tree would itself reside on a disk. If nodes are retrieved from the disk, one at a time, then a search of an index with *n* entries would require at most 1.4 log *n* disk accesses (the maximum depth of an AVL tree is 1.4 log *n*). For an index with a million entries, this would mean about 23 accesses in the worst case. This is a lot worse than the cylinder sector index scheme of section 10.2.1. In fact, we can do much better than 23 accesses by using a balanced tree based upon an *m*-way search tree rather than one based on a binary search tree (AVL trees are balanced binary search trees).

**Definition:** An *m-way search tree, T*, is a tree in which all nodes are of degree *m*. If *T* is empty, (i.e., *T* = 0) then *T* is an *m*-way search tree. When *T* is not empty it has the following properties:

*n*, *A*_{o}, (*K*_{1},*A*_{1}), (*K*_{2},*A*_{2}), ..., (*K _{n}*,

(iii) All key values in the subtree *A _{i}* are less than the key value

(iv) All key values in the subtree *A _{n}* are greater than

(v) The subtrees *A _{i}*, 0

schematic

node format

------------------------

a 2,b,(20,c),(40,d)

b 2,0,(10,0),(15,0)

c 2,0,(25,0),(30,e)

d 2,0,(45,0),(50,0)

e 1,0,(35,0)

procedureMSEARCH(T,X)

//Search them-way search treeTresiding on disk for the key

valueX. Individual node format isn,A_{0}, (K_{1},A_{1}), ...,(K,_{n}A),_{n}

n<m. A triple (P,i,j) is returned.j= 1 impliesXis found

at nodeP, keyK. Else_{i}j= 0 andPis the node into which

Xcan be inserted//

1PT; K_{0}[-];Q0 //Qis the parent ofP//

2whileP0do

3inputnode P from disk

4Let P define n,A_{0},(K_{1},A_{1}), ..., (K,_{n}A)_{n}

5K1 [+]_{n+}

6Let i be such that K_{i}X<K1_{i+}

7ifX=K_{i}then[//Xhas been found//return(P,i,1)]

8QP;PA_{i}

9end

//Xnot inT;return node into which insertion can take place//

10return(Q,i,0)

11endMSEARCH

Clearly, the potentials of high order search trees are much greater than those of low order search trees. To achieve a performance close to that of the best *m*-way search trees for a given number of entries *n*, it is necessary that the search tree be balanced. The particular variety of balanced *m*-way search trees we shall consider here is known as a *B*-tree. In defining a *B*-tree it is convenient to reintroduce the concept of failure nodes as used for optimal binary search trees in section 9.1. A failure node represents a node which can be reached during a search only if the value *X* being searched for is not in the tree. Every subtree with root = 0 is a point that is reached during the search iff *X* is not in the tree. For convenience, these empty subtrees will be replaced by hypothetical nodes called failure nodes. These nodes will be drawn square and marked with an *F*. The actual tree structure does not contain any such nodes but only the value 0 where such a node occurs. Figure 10.11 shows the 3-way search tree of figure 10.9 with failure nodes. Failure nodes are the only nodes that have no children.

(i) the root node has at least 2 children

(ii) all nodes other than the root node and failure nodes have at least m*/*2 children

(iii) all failure nodes are at the same level.

The 3-way search tree of figure 10.11 is not a *B*-tree since it has failure nodes at levels 3 and 4. This violates requirement (iii). One possible *B*-tree of order 3 for the data of figure 10.9 is shown in figure 10.12. Notice that all nonfailure nodes are either of degree 2 or 3. In fact, for a *B*-tree of order 3, requirements (i), (ii) and the definition

Number of Key Values in a B-Tree

N+ 1 = number of failure nodes inT

= number of nodes at levell+ 1

2m/2- 1^{l}

and so,N2m/2- 1^{l}_{ }-_{ }1_{,}l1

t=_{i}t_{s}+t+_{l}m( + 2)t_{c}

=a+bm

wherea=t_{s}+t= seek time + latency time_{l}

b= ( + 2)t_{c}andt_{c}= transmission time per character

For an index with *N* entries, the number of levels, *l*, is bounded by:

The maximum search time is therefore given by (10.2).

m,A_{0,}(K_{1},A_{1}) ...,(K,_{m}A)_{m}

andK<_{i}K+1, 1_{i}i<m.

This node may be split into two nodes *P* and *P*' with the following formats

nodeP:m/2 - 1,A_{0},(K_{1},A_{1}), ...,(K_{}_{m/2}_{-1},A_{}_{m/2}_{-1})

nodeP':m-m/2,A_{}_{m/2}, (K_{}_{m/2}_{+1},A_{}_{m/2}_{+1}), ...,(K,_{m}A)_{m}

procedureINSERTB(T,X)

//Key valueXis inserted into theB-tree,T, of orderm.T

resides on a disk.//

1A0;KX//(K,A) is tuple to be inserted//

2 (P,i,j,)MSEARCH(T,X) //Pis node for insertion//

3ifj0then return//Xalready inT//

4whileP0do

5insert (K,A)into appropriate position in P. Let the resulting

node have the form: n,A_{0}, (K_{1},A_{1}), ..., (K_{n},A_{n}).

6ifnm- 1then[//resulting node is not too big//

outputP onto disk;return]

//Phas to be split//

7Let P and P'be defined as in equation 10.4

8outputP and P'onto the disk

9KK_{}_{m/2};AP';PPARENT(P)

10end

//new root is to be created//

11create a new node R with format1,T,(K,A)

12TR;outputT onto disk

13endINSERTB

s= (total number of splittings)/N

(p- 2)/ {1 + (m/2 - 1)(p- 1)}

< 1/(m/2 - 1).

procedureDELETEB(T,X)

//deleteXfrom theB-tree,T, of orderm. It is assumed that

Treside on a disk//

1 (P,i,j)MSEARCH(T,X)

2ifj 1then return//Xnot inT//

3let P be of the form:,(_{n,}A_{o}K_{1},A_{1}), ..., (K) and_{n},A_{n}K=_{i }X

4ifA_{o }0then[//deletion from non leaf, find key to move up//

5QA//move to right subtree//_{i}

6loop

7let Q be of the form:

n',A'_{0}, (K'_{1},A'_{1}) ..., (K')_{n},A'_{n'}

8ifA'_{0}= 0then exit//Qis a leaf

node//

9QA'_{0}

10forever

11replace K_{i}in node P by K'_{1}and write the

altered nodePonto disk.

12PQ,i1;

13let n,A_{0},(K_{1},A_{1}), ...,(K,_{n}A)_{n}be as

defined by the new P]

//delete K_{i }from nodeP, a leaf//

14from P: n,A_{0},(K_{1},A_{1}), ...,(K)_{n},A_{n}delete(K)_{i},A_{i}and replace n

byn- 1

15whilen<m/2 - 1andPTdo

16ifP has a nearest right sibling, Y,then

17 [let Y: n",A"_{0}, (K"_{1,}A"_{1}), ...,(K"",_{n}A"")_{n}and

18let Z: n',A'_{0}, (K'_{1},A'_{1}),...,(K',_{n'}A') be the parent of_{n'}P

and Y

19let A'_{j }=Y and A'-1 =_{j}P

20ifn"m/2then[//redistribute key values//

21 //updateP// (K+1,_{n}A+1) (_{n}K'_{j},A"_{0});

n n + 1

22 //updateZ//K'_{j }K"_{1}

23 //updateY// (n",A"_{0},(K"_{1},A"_{1}),...)

(n"- 1,A"_{1,}(K"_{2,}A"_{2})...)

24outputnodes P,Z and Y onto

disk

25return]

//combineP, K'and_{j}Y//

26r2m/2 - 2

27outputr,A_{0,}(K_{1,}A_{1}) ... (K),(_{n},A_{n}K'_{j,}A"_{0}),(K"_{1,}A"_{1}),

... (K"_{n",}A"_{n"}) as new node P

28 //update// (n,A_{0}, ...,) (n'- 1,A.'_{0},...,(K'1,_{j-}A'1),_{j-}

(K'+1,_{j}A'+1)...)_{j}

29PZ]

else

30 [//P must have a left sibling//

this is symmetric to lines 17-29

and is left as an exercise]

31end

32ifn0then[outputP:(n,A_{0},...,(K)]_{n},A_{n}

33else[TA_{0}]

34endDELETEB

For a variation of *B*-trees see exercises 10-14 through 10-20.

An index structure that is particularly useful when key values are of varying size is the trie. A *trie *is a tree of degree* m * 2 in which the branching at any level is determined not by the entire key value but by only a portion of it. As an example consider the trie of figure 10.18. The trie contains two types of nodes. The first type we shall call a *branch node *and the second an *information node*. In the trie of figure 10.18 each branch node contains 27 link fields. All characters in the key values are assumed to be one of the 26 letters of the alphabet. A blank is used to terminate a key value. At level 1 all key values are partitioned into 27 disjoint classes depending on their first character. Thus, LINK(*T*,*i*) points to a subtrie containing all key values beginning with the *i*-th letter (*T *is the root of the trie). On the *j *th level the branching is determined by the* j *th character. When a subtrie contains only one key value, it is replaced by a node of type information. This node contains the key value, together with other relevant information such as the address of the record with this key value, etc. In the figure, branch nodes are represented by rectangles while ovals are used for information nodes.

The search algorithm for tries is very straightforward and one may readily verify that the worst case search time is *O*(*l*) where *l *is the number of levels in the trie (including both branch and information

procedureTRIE(T,X)

//Search a trieTfor key valueX. It is asssumed that branching

on thei-th level is determinded by thei-th character of the key

value.//

KX' ' //concatenate a blank to the end ofX//

i1;PT

whileP is a branch nodedo

Ci-th character ofK

PLINK (P,C);i+ 1

end

ifP= 0orKEY(P)Xthen return(0) //Xnot in trie//

return(P)

endTRIE

(a) SAMPLE(*X,i*) = *i*'th character of *X*.

Some other possible choices for this function are (*X *= *x*_{1} *x*_{2} ... *x _{n}*)

(c) SAMPLE(X,i) = xr(Xi) for r(X,i) a randomization function

The problems associated with sequential organizations were discussed earlier. The most popular sequential organization scheme is ISAM, in which a cylinder-surface index is maintained for the primary key. In order to efficiently retrieve records based on other keys, it is necessary to maintain additional indexes on the remaining keys (i.e. secondary keys). The structure of these indexes may correspond to any of the alternative index techniques discussed in the previous section. The use of secondary key indexes will be discussed in greater detail in connection with inverted files.

In this organization, records are stored at random locations on disk. This randomization could be achieved by any one of several techniques. Some of these techniques are direct addressing, directory lookup and hashing.

In direct addressing with equal size records, the available disk space is divided out into nodes large enough to hold a record. The numeric value of the primary key is used to determine the node into which a particular record is to be stored. No index on this key is needed. With primary key = Employee #, the record for Employee # = 259 will be stored in node 259. With this organization, searching and deleting a record given its primary key value requires only one disk access. Updating a record requires two (one to read and another to write back the modified record). When variable size records are being used an index can be set up with pointers to actual records on disk (see figure 10.24). The number of accesses needed using this scheme is one more than for the case when memory was divided into fixed size nodes. The storage management scheme becomes more complex (section 10.4). The space efficiency of direct accessing depends on the identifier density *n*/*T* (*n* = number of distinct primary key values in the file, *T* = total number of possible primary key values). In the case of internal tables, this density is usually very low and direct addressing was very space inefficient.

This is very similar to the scheme of figure 10.24 for the case of direct addressing with variable size records. Now, however, the index is not of direct access type but is a dense index maintained using a structure suitable for index operations. Retrieving a record involves searching the index for the record address and then accessing the record itself. The storage management scheme will depend on whether fixed size or variable size nodes are being used. Except when the identifier density is almost 1, this scheme makes a more efficient utilization of space than does direct addressing. However it requires more accesses for retrieval and update, since index searching will generally require more than 1 access. In both direct addressing and directory lookup, some provision must be made for collisions (i.e. when two or more records have the same primary key value). In many applications the possibility of collisions is ruled out since the primary key value uniquely identifies a record. When this is not the case, some of the schemes to be discussed in section 10.3.3 and section 10.3.4 may be used.

The principles of hashed file organization are essentially the same as those for a hashed index. The available file space is divided into buckets and slots. Some space may have to be set aside for an overflow area in case chaining is being used to handle overflows. When variable size records are present, the number of slots per bucket will be only a rough indicator of the number of records a bucket can hold. The actual number will vary dynamically with the size of records in a particular bucket.

Linked organizations differ from sequential organizations essentially in that the logical sequence of records is generally different from the physical sequence. In a sequential organization, if the *i*'th record of the file is at location *l _{i}* then the

The logical order of records in any particular list may or may not

Notice that in addition to key values and pointers to lists, each index entry also contains the length of the corresponding list. This information is useful when retrieval on boolean queries is required. In order to meet a query of the type, retrieve all records with Sex = female and Occupation = analyst, we search the Sex and Occupation indexes for female and analyst respectively. This gives us the pointers *B* and *B*. The length of the list of analysts is less than that of the list of females, so the analyst list starting at *B* is searched. The records in this list are retrieved and the Sex key examined to determine if the record truly satisfies the query. Retaining list lengths enables us to reduce search time by allowing us to search the smaller list. Multilist structures provide a seemingly satisfactory solution for simple and range queries. When boolean queries are involved, the search time may bear no relation to the number of records satisfying the query. The query *K*1 = *XX* *and K*2 = *XY* may lead to a *K*1 list of length *n* and a *K*2 list of length *m*. Then, min{*n*,*m*} records will be retrieved and tested against the query. It is quite possible that none or only a very small number of these min{*n*,*m*} records have both *K*1 = *XX* and *K*2 = *XY*. This situation can be remedied to some extent by the use of *compound keys*. A compound key is obtained by combining two or more keys together. We could combine the Sex and Occupation keys to get a new key Sex-Occupation. The values for this key would be: female analyst, female programmer, male analyst and male programmer. With this compound key replacing the two keys Sex and Occupation, we can satisfy queries of the type, all male programmers or all programmers, by retrieving only as many records as actually satisfy the query. The index size, however, grows rapidly with key compounding. If we have ten keys *K*_{1}, ...,*K*_{10}, the index for *K _{i}* having

The coral ring structure is an adaptation of the doubly linked multilist structure discussed above. Each list is structured as a circular list with a headnode. The headnode for the list for key value *K _{i}* =

Conceptually, inverted files are similar to multilists. The difference is that while in multilists records with the same key value are linked together with link information being kept in individual records, in the case of inverted files this link information is kept in the index itself. Figure 10.28 shows the indexes for the file of figure 10.1. A slightly different strategy has been used in the *E*# and salary indexes than was used in figure 10.26, though the same strategy could have been used here too. To simplify further discussion, we shall assume that the index for every key is dense and contains a value entry for each distinct value in the file. Since the index entries are variable length (the number of records with the same key value is variable), index maintenance becomes more complex than for multilists. However, several benefits accrue from this scheme. Boolean queries require only one access per record satisfying the query (plus some accesses to process the indexes). Queries of the type *K*1 = *XX* or *K*2 = *XY* can be processed by first accessing the indexes and obtaining the address lists for all records with *K*1 = *XX* and *K*2 = *XY*. These two lists are then merged to obtain a list of all records satisfying the query. *K*1 = *XX* *and* *K*2 = *XY* can be handled similarly by intersecting the two lists. *K*1 = .not. *XX* can be handled by maintaining a universal list, *U*, with the addresses of all records. Then, *K*1 = .not. *XX* is just the difference between *U* and the list for *K*1 = *XX*. Any complex boolean query may be handled in this way. The retrieval works in two steps. In the first step, the indexes are processed to obtain a list of records satisfying the query and in the second, these records are retrieved using this list. The number of disk accesses needed is equal to the number of records being retrieved plus the number to process the indexes. Exercise 34 explores the time savings that can result using this structure rather than multilists.

In order to reduce file search times, the storage media may be divided into cells. A cell may be an entire disk pack or it may simply be a cylinder. Lists are localised to lie within a cell. Thus if we had a multilist organization in which the list for KEY1 = PROG included records on several different cylinders then we could break this list into several smaller lists where each PROG list included only those records in the same cylinder. The index entry for PROG will now contain several entries of the type (addr, length), where addr is a pointer to the start of a list of records with KEY1 = PROG and length is the number of records on this list. By doing this, all records in the same cell (i.e. cylinder) may be accessed without moving the read/write heads. In case a cell is a disk pack then using cellular partitions it is possible to search different cells in parallel (provided the system hardware permits simultaneous reading/writing from several disk drives).

procedureGETNODE(I)

ifAV= 0then[stop]

IAV;XGET(AV);AVLINK(X)

endGETNODE

procedureRET(I)

LINK(I)AV;WRITE(I);AVI

endRET

For additional material on file structures and data base management systems see:

*File structures for on line systems* by D. Lefkovitz, Hayden Book Co., New Jersey, 1969.

*Data Management for on line systems* by D. Lefkovitz, Hayden Book Co., New Jersey, 1974.

*Computer data base organization *by J. Martin, Prentice-Hall, Englewood Cliffs, 1975.

Additional material on indexes can be found in:

1. A file of employee records is being created. Each record has the following format:

E# NAME Occupation Location

All four fields have been designated as keys. The file is to consist of the following 5 records:

A 10 JAMES PROG MPLS

B 27 SHERRY ANAL NY

C 39 JEAN PROG NY

D 50 RODNEY KEYP MPLS

E 75 SUSAN ANAL MPLS

d) Sequential ordered by name, inverted on *E*# and location, ringed on occupation.

**2. **Write an algorithm to process a tape file in the batched mode. Assume the master file is ordered by increasing primary key value and that all such values are distinct. The transaction file contains transactions labeled: update, delete and insert. Each such transaction also contains the primary key value of the record to be updated, deleted or inserted. A new updated master file is to be created. What is the complexity of your algorithm?

**3. **Show that all *B*-trees of order 2 are full binary trees.

**4. **(a) Into the following 2-3 tree insert the key 62 using algorithm INSERTB.

**5.** Complete line 30 of algorithm DELETEB.

**6.** Write insertion and deletion algorithms for *B-*trees assuming that with each key value is associated an additional field *F* such that *F* = 1 iff the corresponding key value has not been deleted. Deletions should be accomplished by simply setting the corresponding *F* = 0 and insertions should make use of deleted space whenever possible without restructuring the tree.

**7.** Write algorithms to search and delete keys from a *B*-tree by position. I.e., SEARCH(*k*) finds the *k-*th smallest key and DELETE(*k*) deletes the *k-*th smallest key in the tree. (*Hint:* In order to do this efficiently additional information must be kept in each node. With each pair (K* _{i},A_{i}*) keep (number of key values in the subtree A

* 8.* Program the AVL tree insertion algorithm of section 9.2 and also the

**9.** Modify algorithm INSERTB so that in case *n = m* in line 6 then we first check to see if either the nearest left sibling or the nearest right sibling of *P* has fewer than *m* - 1 key values. If so, then no additional nodes are created. Instead, a rotation is performed moving either the smallest or largest key in *P* to its parent. The corresponding key in the parent together with a subtree is moved to the sibling of *P* which has space for another key value.

**10.** [Bayer and McCreight] The idea of exercise 9 can be extended to obtain improved *B-*tree performance. In case the nearest sibling, *P*', of *P* already has* m *- 1 key values then we can split both *P* and *P*' to obtain three nodes *P*, *P*' and *P*\" with each node containing (2*m - *2)/3, (2*m - *1)/3 and 2*m*/3 key values. Figure 10.29 below describes this splitting procedure when *P*' is *P*'s nearest right sibling.

Rewrite algorithm INSERTB so that node splittings occur only as described above.

**11.** A* B**-tree, *T*, of order *m* is a search tree that is either empty or is of height 1. When *T* is not empty then the extended tree *T* (i.e., *T* with failure nodes added) satisfies the following conditions:

(i) The root node has at least 2 and at most 2(2*m *- 2)/3 + 1 children.

(ii) The remaining nonfailure nodes have at most *m* and at least (2*m - *1)/3 children each.

(iii) All failure nodes are on the same level.

For a *B**-tree of order *m* and containing *N* key values show that if *x= *(2*m - *1)/3 then

(a) The depth, *d*, of *T* satisfies:

(b) the number of nodes *p* in *T* satisfies:

What is the average number of splittings if *T* is built up starting from an empty* B**-tree?

**12.** Using the splitting technique of exercise 10 write an algorithm to insert a new key value *X* into a *B**-tree, *T*, of order *m*. How many disk accesses are made in the worst case and on the average? Assume that *T* was initially of depth *l* and that *T* is maintained on a disk. Each access retrieves or writes one node.

**13.** Write an algorithm to delete the identifier *X* from the *B**-tree, *T*, of order *m*. What is the maximum number of accesses needed to delete *X* from a *B**-tree of depth *l.* Make the same assumptions as in exercise 12.

**14.** The basic idea of a *B*-tree may be modified differently to obtain a *B*'-tree. A *B*'-tree of order *m* differs from a *B-*tree of order *m* only in that in a *B*'-tree identifiers may be placed only in leaf nodes. If *P *is a nonleaf node in a *B*'-tree and is of degree *j *then the node format for *P* is: *j*, *L*(1), *L*(2), ... *L*(*j - *1) where *L*(*i*), 1 *i *< *j *is the value of the largest key in the *i*'th subtree of *P*. Figure 10.30 shows two *B*'-trees of order 3. Notice that in a *B*'-tree the key values in the leaf nodes will be increasing left to right. Only the leaf nodes contain such information as the address of records having that key value. If there are *n* key values in the tree then there are *n *leaf nodes. Write an algorithm to search for *X* in a *B*'-tree *T* of order 3. Show that the time for this is *O*(log* n*).

**15.** For a *B*'-tree of order 3 write an algorithm to insert *X*. Recall that all non leaf nodes in a *B*'-tree of order 3 are of degree 2 or 3. Show that the time for this is *O*(log *n*).

**16.** Write an algorithm to delete *X* from a *B*'-tree, *T*, of order 3. Since all key values are in leaf nodes, this always corresponds to a deletion from a leaf. Show that if *T *has *n* leaf nodes then this requires only *O*(log *n*) time.

**17.** Let *T* and *T*' be two *B*'-trees of order 3. Let *T*\" be a *B*'-tree of order 3 containing all key values in *T* and *T*\". Show how to construct *T*\" from *T* and *T*' in *O*(log *n*) time.

**18.** Write computer programs to insert key values into AVL trees, *B*-trees of order 3, *B**-trees of order 3 and *B*'-trees of order 3. Evaluate the relative performance of these four representations of internal tables.

**19.** Do exercise 18 when the required operations are search for *X*, insert *X* and delete *X*.

**20.** Obtain SEARCH, INSERT and DELETE algorithms for *B*'-trees of order *m*. If the tree resides on disk, how many disk accesses are needed in the worst case for each of the three operations. Assume the tree has *n* leaf nodes.

**21.** Draw the trie obtained for the following data: Sample the keys left to right one character at a time. Using single character sampling obtain an optimal trie for the above data (an optimal trie is one with the fewest number of levels).

AMIOT, AVENGER, AVRO, HEINKEL, HELLDIVER, MACCHI, MARAUDER, MUSTANG, SPITFIRE, SYKHOI

**22.** Write an algorithm to insert a key value *X* into a trie in which the keys are sampled left to right, one character at a time.

**23.** Do exercise 22 with the added assumption that the trie is to have no more than 6 levels. Synonyms are to be packed into the same information node.

**24.** Write an algorithm to delete *X* from a trie *T* under the assumptions of exercise 22. Assume that each branch node has a count field equal to the number of information nodes in the subtrie for which it is the root.

**25.** Do exercise 24 for the trie of exercise 23.

**26.** In the trie of Figure 10.23 the nodes _{1} and _{2} each have only one child. Branch nodes with only one child may be eliminated from tries by maintaining a SKIP field with each node. The value of this field equals the number of characters to be skipped before obtaining the next character to be sampled. Thus, we can have SKIP (_{3}) = 2 and delete the nodes _{1} and _{2}. Write algorithms to search, insert and delete from tries in which each branch node has a skip field.

**27.** Write an algorithm to insert a record, *Z*, into a multilist file with *n* keys. Assume that the order of records in individual lists is irrelevant. Use the primitives SEARCH(*X*) and UPDATE(*X,A*) to search and update an index. UPDATE(*X,A*) changes the address pointer for *X* to *A*. How many disk accesses (in addition to those needed to update the index) are needed?

**28.** Write an algorithm to delete an arbitrary record *Z* from a multilist file (see exercise 27). How many disk accesses are needed?

**29.** Write an algorithm to insert record *Z *into a multilist file assuming that individual lists are ordered by the primary key KEY(*Z,*1). How many accesses are needed for this (exclude index accesses)?

**30.** Assuming that each list in a multilist file is a doubly linked list, write an algorithm to delete an arbitrary record *Z*. The forward link for key *i* is ALINK(*Z*,*i*) while the corresponding backward link is BLINK(*Z*,*i)*.

**31.** Write an algorithm to output all key values for record *Z* in a ring structure. How many accesses are needed for this? How many accesses are needed to do the same in a multilist file?

**32.** Write an algorithm to delete an arbitrary record *Z* from a ring file.

**33.** Describe briefly how to do the following:

(iii) How could the following be carried out using an inverted organization:

a) output all records with KEY1 = PROG *and* KEY2 = *NY*

b) output all records with KEY1 = PROG *or* KEY2 = *NY*

c) output all records with KEY1 = PROG *and* KEY2 *NY*

How many accesses are needed in each case (exclude accesses to get indexes)?

**34.** A 10^{5} record file is maintained as an inverted file on a disk with track capacity 5000 characters. This disk has 200 tracks on each of its 10 surfaces. Each record in the file is 50 characters long and has five key fields. Each key is binary (i.e., has only two distinct values) and so the index for each key can be maintained as a binary bit string of length 10^{5} bits. If 1 character is 6 bits long, then each index takes about 4 tracks. How should the 5 indexes be stored on disk so as to minimize total seek time while processing the indexes in order to determine which records satisfy a given boolean query *Q*? This processing involves reading in 1 track of each index and testing the query against records represented by this track. Then the next set of index tracks is input and so on. How much time does it take to process all the indexes in order to determine which records are to be retrieved? Assume a seek time of 1/10 sec and a latency time of 1/40 sec. Also assume that only the input time is significant. If *k *records satisfy this query, how much more time is needed to retrieve these *k *records? Using other file structures it may be necessary to read in the whole file. What is the minimum time needed to read in the entire file of 10^{5} records? How does this compare with the time needed to retrieve *k *records using an inverted file structure?

**35.** Determine how many disk accesses are needed to allocate and free storage using algorithms ALLOCATE and FREE of section 4.8 for the boundary tag method when the storage being managed is disk storage.

**36.** Write storage management algorithms for disk storage maintaining a list of free blocks in internal memory. Use first fit for allocation. Adjacent free blocks should be coalesced when a block is freed. Show that both these tasks can be accomplished in time *O*(*n*) where *n *is the number of free blocks.