
CHAPTER 17: SPECIAL-PURPOSE HARDWARE FOR INFORMATION RETRIEVAL

Lee A. Hollaar

Department of Computer Science, University of Utah

Abstract

Many approaches for using special purpose hardware to enhance the performance of an information retrieval system have been proposed or implemented. Rather than using conventional digital computers or parallel processors developed for general applications, these special purpose processors have been designed specifically for particular facets of information retrieval, such as the rapid searching of unstructured data or manipulating surrogates or indices for the information in the database. After discussing the reasons why a special purpose processor may be desirable, three different techniques for implementing a text scanner are presented--associative memories, finite state machines, and cellular arrays. The problems of using a conventional microprocessor are discussed, along with possible hardware enhancements to improve its performance. System performance considerations, especially when the use of an index or other surrogate shifts the operation from text scanning to a "seek and search" mode, are covered. Finally, special purpose processors for handling surrogates and the use of optical processing are presented.

17.1 INTRODUCTION

The previous chapters have described techniques for implementing information retrieval systems on general purpose computing systems, including parallel machines designed to handle a variety of applications. However, there are times when an information retrieval system implemented using general purpose hardware may not be cost-effective or provide adequate performance. For example, to achieve satisfactory performance most information retrieval systems use some form of index. However, for a database consisting of messages that must be available as soon as they are received, there is no time available to construct, update, and maintain an index. Instead, special purpose searching hardware can provide the desired solution.

Special purpose hardware development for information retrieval has generally fallen into two areas--systems for rapidly searching unstructured data and systems for manipulating surrogates or indices for the information in the database. While most special purpose systems have been implemented using conventional digital logic, recently the use of optical processing has been suggested. The availability of fast microprocessors may also lead to special purpose systems that resemble a conventional workstation, but have additional logic to enhance their processing of information, more efficient connection and use of peripheral devices such as disk drives, or operating system software optimized for the particular application.

17.1.1 Search Patterns

The design of a special purpose processor is heavily influenced by the types of operations that will be performed. In the case of text searching, these are the patterns and operators that are allowed when specifying a query. The most basic operation is performing an exact match search for a specified string of characters, which may represent either a single word or phrase. Proper handling of phrases may be more complex than simply searching for the words of the phrase each separated by a blank. Unless the text in the database has been carefully edited so that it fits a standard form, the words of the desired phrase may not be separated only by a single blank, but could be separated by a newline sequence (if the phrase started on one line and ended on another), more than one blank or some other space character (like a horizontal tab), typographic markup (such as changing from roman to italic type), a footnote reference, or other complications. Most special purpose text searchers can only handle the most basic phrase structures.

In addition to exactly matching a string of characters, it is often desirable to specify a more complex match pattern. A regular expression can specify not only a specific character to match at a pattern location, but can indicate a set of characters (such as an upper- or lowercase A, any vowel, the start of a line, or a word delimiter). It also can indicate that a pattern element can optionally occur or can occur more than once, or indicate two or more alternative patterns that are acceptable in a location.

While regular expressions could be used to specify the search pattern, often an information retrieval system will allow only a specific set of operations based on the nature of searching text. The most common of these operations is the don't care or wildcard, which specifies that any character is acceptable in a particular location in the search pattern (single-character or fixed-length don't care, FLDC) or an arbitrary number of characters are acceptable at the location (variable-length don't care, VLDC). The VLDC can come at the start of a word (an initial VLDC, matching a pattern with any prefix), at the end of a word (a terminal VLDC, matching a pattern with any suffix), at both the start and end of a word (matching a specified stem), embedded within a word (an embedded VLDC), or any combination of these. Some text searchers cannot handle VLDCs, or can handle only initial or terminal VLDCs.

One other special pattern may specify that a numeric string should occur at the pattern location, and its value should be within a given range. Another pattern may specify an error tolerance, where the pattern is matched if there are fewer than n characters that don't match. These options are not commonly available on hardware-based text searchers (Yu et al. 1986).

Search expressions often contain two or more patterns combined with Boolean operators, the most common being and, or, and and not. Word proximity, an extended form of and, indicates that two search terms must be located within a specified number of words of each other in the database for a match to occur. Proximity can be either directed (X must be followed within n words by Y) or undirected (X and Y must be within n words of each other, but in any order). Like phrases, word proximity is complicated by markup or other information embedded in the text.

A search expression also can specify that a term, phrase, or Boolean combination must occur in particular contexts for a match to occur. For example, a search may look for "hardware and text retrieval in abstract" or "patent and software within 2 sentences." Other operators may expand the number of terms in a query. A macro facility allows the user to include a predefined subexpression into a query, while a thesaurus can automatically form the or of all synonyms of a word. For example, "airplane" might expand to "airplane or plane or 747 or DC-10 . . . ."

17.1.2 Hardware Implementations

Many implementations of information retrieval functions using custom hardware have been proposed. Most of these consider only one aspect of information retrieval, without considering its effect on the overall retrieval system and its performance. Simple searchers, capable of exact matching against a small number of patterns (possibly containing don't cares), have been particularly popular as VLSI design projects. However, only a small number of these have been incorporated into complete information retrieval systems.

The algorithms implemented in hardware differ from the algorithms described in the previous chapters in two fundamental ways. They are much simpler but do many things in parallel or inherently in the logic. The reason for the simplicity is that each alternative in an algorithm often requires its own logic or increases the complexity of the other hardware. This not only complicates the design, but can result in slower performance because of longer data paths and increased loading of output drivers. It is also easier to design a simple function that is then replicated many times than a large single unit incorporating a variety of functions.

Many operations that would be cumbersome on a conventional processor are very simple to perform in custom hardware. As an example, the program to reverse the order of the bits of a word takes a number of steps of isolating, aligning, and replacing each bit of the word. In hardware, it simply takes the changing of the connections from straight-through to reversed. No active logic is required and the operation is done in zero time. If operations do not have to share hardware, they can be done simultaneously (for example, the shift-and-add being done in parallel with the decrementing of a counter during a multiplication). Because of this, most hardware solutions are not simply faster implementations of standard information retrieval algorithms.

17.2 TEXT SEARCHERS

The most commonly proposed special purpose hardware system for text information retrieval is a searcher. The rationale for needing a special search engine is that a conventional processor cannot search rapidly enough to give reasonable response time. For example, egrep on a 12 MIPS RISC processor (Sun SPARCstation-1) can search at about 250 KBytes per second. The searching of a 10 GByte database would take 40,000 seconds (over 11 hours). If a large database resides on a number of disk drives, it will be faster and more efficient not to bring the raw information from the disk into a central processor, but instead to search it with a processor closer to the disk. Because each search processor operates only on the data on its disk, the search time will remain the same as the database size increases, and the additional disks (each with a searcher) necessary to hold the database increase the parallelism of the search.

However, because of the slow searching speeds for a central processor, most commercial information retrieval systems use some sort of inverted file index rather than actually searching the documents. The argument in the past against inverted files was that they substantially increase the storage requirements of a database. The index may take more disks than the database itself (Bird et al. 1978), but this may be less of a concern with lower-cost disk drives. This means that to be competitive, the cost of searching must be comparable to the cost of a disk drive.

However, there are a number of problems associated with inverted files or other document surrogates besides their storage requirements. In many cases, information is discarded to reduce the size of the surrogate. For inverted files, commonly occurring words are generally not included in the index. This means that a search for a phrase like "changing of the guard" really finds anything that has "changing," any two words, and "guard." The phrase "To be or not to be, that is the question" matches any document containing the word "question," since all other words in the phrase are discarded. A similar problem exists for superimposed code words, where the combining of the codes for two words makes it seem like another word is present when it is not.

These artifacts caused by the use of a surrogate tend to confuse a user because it is not clear why a document that doesn't match the query was retrieved. Time is spent trying to understand an irrelevant document, perhaps more time than would be spent reviewing a relevant one. But even if this were not the case, there are other problems associated with the use of a surrogate for locating documents. Time must be spent building and maintaining the surrogate, a nontrivial amount of time for very large databases. Also, a document is not retrievable until it has been entered into the surrogate data structure. This may not be acceptable for a message retrieval system, where incoming documents must be available as soon as they have arrived.

17.2.1 Searcher Structure

Most proposed text searchers divide the work into three activities: search control, term matching, and query resolution. Search control includes taking a request from the processor running the retrieval system, configuring the search hardware (typically by loading appropriate tables and registers), managing the reading of the data from the disks, and reporting back the results to the retrieval system. This search control is best performed by a conventional microprocessor (or, in the past, a minicomputer).

Searching for particular terms or patterns is performed by the special hardware, with the results passed to a query resolver that handles operations such as Booleans, contexts, and proximities. This considerably simplifies the search patterns. A good example of this is a query that specifies that five different terms must occur in the same paragraph, in any order. The regular expression for handling this is very complex, consisting of the start-of-paragraph pattern, then all possible orderings of the five terms (120 sequences), then the end-of-paragraph pattern. A better way is for the term matcher to look for the individual terms of the pattern. The query resolver then combines the term matcher results, which indicate that particular terms have been detected, to determine whether the pattern has been matched. Because this latter processing is required only when a term has been found, its real-time requirements are considerably less than those of comparing characters from the disk, and it can be handled by a conventional processor.

In effect, the term matcher converts a document into a list of entries that are a shorthand description of the document as it pertains to the query. The query resolution program can then process this greatly reduced information according to whatever scheme is desired. This could be a Boolean query language enhanced with contexts and word location proximity, weighted term document scoring, or any other technique.
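
To make the division of labor concrete, the following fragment is a minimal sketch of a software query resolver; the hit-record format, query representation, and names are illustrative, not taken from any of the systems described in this chapter. It assumes the term matcher reports each detected term together with its word position in the document.

    # Sketch of a software query resolver. It assumes the term matcher emits
    # (term, word_position) hit records for one document at a time.
    def resolve(hits, query):
        """hits: list of (term, position) tuples from the term matcher.
        query: nested tuples such as ('and', t1, t2), ('or', ...), or
               ('prox', t1, t2, n) for undirected proximity within n words."""
        positions = {}
        for term, pos in hits:
            positions.setdefault(term, []).append(pos)

        def eval_node(node):
            if isinstance(node, str):                    # a bare search term
                return bool(positions.get(node))
            op = node[0]
            if op == 'and':
                return all(eval_node(n) for n in node[1:])
            if op == 'or':
                return any(eval_node(n) for n in node[1:])
            if op == 'and not':
                return eval_node(node[1]) and not eval_node(node[2])
            if op == 'prox':                             # undirected word proximity
                _, a, b, n = node
                return any(abs(pa - pb) <= n
                           for pa in positions.get(a, [])
                           for pb in positions.get(b, []))
            raise ValueError(op)

        return eval_node(query)

    # Example: "hardware" and "retrieval" within 3 words of each other
    hits = [('hardware', 12), ('text', 13), ('retrieval', 14)]
    print(resolve(hits, ('prox', 'hardware', 'retrieval', 3)))    # True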

Three different techniques have been proposed for implementing the term matcher: associative memories, finite state machines, and cellular arrays (Hollaar 1979). As their performance improves, standard microprocessors, perhaps augmented with simple hardware assists, can also be used for the term matcher.

If commodity disk drives are used to keep costs low, rather than special drives like those with parallel reads, the disk transfer and seek rates will determine the speed of the searcher. There are two possible options: match the term comparator to the nominal transfer rate of the disk (raw read bit rate), or match it to the effective transfer rate (average bit rate when seeks and rotational latency are considered). In either case, the same performance will result, since it is based on how many characters come from the disk in a unit of time. Matching to the effective transfer rate permits a lower comparator speed at the expense of buffer memory. How much lower depends on the seek characteristics for the typical or worst-case queries. Matching to the nominal transfer rate was used for many searcher implementations, because of the high cost of memory for buffers at the time they were proposed.

Associative Memories

In an associative memory, each memory word has a comparator. The bit pattern in the comparand register is sent to each of these comparators, which indicate whether it matches their memory word. The comparison can be done in one cycle regardless of the size of the memory, but the cost per bit is considerably higher than for conventional memory because of the logic required. (A dynamic RAM memory cell may need only a single transistor, while an associative memory cell has tens or hundreds of transistors, depending on whether features such as masking to selectively compare bits in a word are implemented.)

If the cost were ignored, an associative memory would seem ideal for finding the locations of a pattern within a database. It is not. First of all, the data must be broken down into fixed-length groups to match the width of the associative memory. This could be done on word delimiters, in which case the width of the associative memory would have to be the length of the longest word and much space would be wasted, or it would have to be every n characters, packing the associative memory. In the latter case, however, words start in arbitrary positions within the width of the associative memory and will not always line up with the pattern. This can be solved by cycling the pattern through the width of the associative memory, but this is complicated by words that start in one memory location and end in another. Handling of VLDCs, especially embedded ones, is also difficult.

A more cost-effective approach is to store the search terms in the associative memory and shift the data through the comparand register. Essentially, this uses the associative memory as a set of comparators, each programmed with one of the search terms. This approach works well for fully specified terms, ones with fixed-length don't cares, or for initial or terminal VLDCs. Embedded VLDCs cannot be easily handled.

Besides the obvious parallel-comparator implementation for an associative memory, two schemes based on hardware hashing of the comparand have been suggested. Bird et al. (1979) proposed a hashing based on the length of the comparand word and its initial characters, while Burkowski (1982) used a small mapping RAM based on subsets of the search terms selected so that each has a unique bit pattern. While these approaches substantially reduce the cost of implementing the associative memory, they both suffer from the inability to handle embedded VLDCs or classes of characters (any delimiter, a number, any letter) in a pattern.

Finite State Machines

Finite state automata are capable of recognizing any regular expression. An FSA can be described as a 5-tuple {A, S, M, B, F}, where A is the input alphabet, S is a set of elements called states, M is a mapping from A × S into S, B is a member of S called the beginning state, and F is a nonempty subset of S called the final states. The operation of the FSA is simple. At the start, it is in state B. Whenever a character arrives, the mapping M is used to determine the next state of the FSA. This process continues until a final state is reached, indicating the term that has been matched. Implementation is equally simple. A memory holds the mapping function M as an array of states. Each state contains an array with one entry for each character in the input alphabet, indicating the next state. A register holds the memory base address for the current state, and is concatenated with the input character to give the memory address to read to find the next state, which is then loaded into the register.
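
The following fragment is a minimal software model of such a mapping-memory matcher, built for only two terms; the table construction is illustrative (and assumes no search term is a suffix of another), not a description of any particular hardware design.

    # Software model of the mapping M: a table indexed by the current state and
    # the input character gives the next state, just as the hardware concatenates
    # the state register with the character to form a memory address.
    def build_tables(terms):
        prefixes = {''} | {t[:i] for t in terms for i in range(1, len(t) + 1)}
        alphabet = {c for t in terms for c in t}
        M, final = {}, {}
        for p in prefixes:
            for c in alphabet:
                s = p + c
                while s not in prefixes:        # fall back to the longest viable suffix
                    s = s[1:]
                M[(p, c)] = s
            final[p] = p if p in terms else None
        return M, final

    def scan(text, terms):
        M, final = build_tables(terms)
        state, matches = '', []
        for i, c in enumerate(text):
            state = M.get((state, c), '')       # characters outside the alphabet reset
            if final[state]:
                matches.append((i, final[state]))   # position of last character, term
        return matches

    print(scan("the dot on the dog", ["dog", "dot"]))   # [(6, 'dot'), (17, 'dog')]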

There is a problem with the straightforward FSA implementation--the size of memory required for the mapping function. For a query with 250 terms (as might occur when a thesaurus is used), about 2000 states would be necessary. For seven-bit input characters, this would require over 3 million bits of memory. Although that does not seem high today, 15 years ago that was the same order of magnitude for the memory found on a mainframe computer, and would cost hundreds of thousands of dollars.

Two ways of reducing the memory requirements of the FSA, at the expense of additional logic, have been proposed. Bird observed that there are two different state transition types in the mapping memory: sequential and index (Mayper et al. 1980). Sequential states are used for matching single characters, and are stored sequentially in memory. Matching the character causes the state register to be advanced by one and a mismatch causes it to go to a specified failure address. For a seven-bit input character, only eight bits are necessary for each sequential state--seven for the match character and one to indicate it is a sequential state.

In those places where there is more than one character of interest, the index state is used. It consists of a code indicating it is an index state and a bit vector indicating the interesting characters in the mapping. A seven-bit input character requires a 128-bit vector. When an index state is processed, the first action is to determine whether the bit corresponding to the input character is set. If not, a transition is made to a default state. If the bit is set, the number of bits set for characters with a lower value is determined and used as an index added to the current state address to give the location of a word that contains the address of the next state. Since there are far more sequential states than index states, the bit requirements are reduced. Extrapolating from the figures in Haskin et al. (1983), approximately 250 Kbits would be necessary for the 250-term query.
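
The following fragment sketches how an index state might resolve its transition; the representation (the bit vector held as an integer plus a compact list of next-state addresses) is illustrative.

    # Sketch of the index-state transition: test the bit for the input character,
    # then count the set bits below it to index the compact next-state table.
    def index_state_next(bitvec, next_addrs, default_state, ch):
        """bitvec: 128-bit integer, bit i set if character code i is of interest.
        next_addrs: next-state addresses, one per set bit, in character-code order."""
        code = ord(ch)
        if not (bitvec >> code) & 1:
            return default_state                               # not of interest
        rank = bin(bitvec & ((1 << code) - 1)).count('1')      # set bits below code
        return next_addrs[rank]

    # Characters of interest: 'g' and 't' (as after matching "do" for DOG and DOT)
    bitvec = (1 << ord('g')) | (1 << ord('t'))
    print(index_state_next(bitvec, ['STATE_G', 'STATE_T'], 'FAIL', 'g'))   # STATE_G
    print(index_state_next(bitvec, ['STATE_G', 'STATE_T'], 'FAIL', 'x'))   # FAIL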

There are a number of difficulties with the Bird approach. The first is the dramatic size difference between the sequential and index states (8 bits vs. over 128 bits). This is mitigated by having more than one sequential state in each memory word, although some parts of the word might end up unused, and by operating on nibbles (half bytes) rather than the full character for index states. This reduces the vector to 16 bits and simplifies the hardware necessary to compute the index, at the expense of two index operations per character and a more complex state table.

Another difficulty is that when any search term has an initial VLDC, all states must be index states, substantially increasing the memory requirements. To solve this problem, Bird suggested having two matchers, one for terms that don't have initial VLDCs and a smaller one for those that have them. He also suggested a third FSA to handle phrases, rather than as part of the query resolver.

The other way of reducing the FSA memory requirements was proposed by Haskin (1980) (further described in Haskin et al. [1983] and Hollaar et al. [1984]). It consists of dividing the FSA state table into a number of partitions that can then be processed in parallel. The partitions are made so that each state must check only a single character, like the Bird sequential state, to determine whether to go to a specified address or a mismatch state. If there is more than one character that could occur simultaneously, each must be in a different partition. This can be easily determined by an examination of all the terms when the state tables are being constructed. A successful match not only specifies the next state in a partition, but can force another partition into a specified state.

For example, if the terms DOG and DOT were specified, the first state would look for a D and transition to state 2 when it is found. That state would look for an O. However, since the next state needs to look for both a G and a T, a successful match of the O would cause a transition to state 3, which would look for a G, and would force another partition into a state looking for a T. This fork operation consists of forcing the address of a neighbor matcher, since the partitioning technique assures that the neighbor is in an idle state.
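
The following fragment is an illustrative simulation of this partitioning for the DOG/DOT example; the state encoding is hypothetical, and the startup mechanism described in the next paragraph is omitted.

    # Two partitions of single-character states for DOG and DOT. Matching the O in
    # partition 0 forks partition 1 (guaranteed idle by the partitioning) into a
    # state that expects T. A mismatch simply restarts the partition.
    IDLE = None
    # state name -> (expected char, next state, (partition, state) to fork, term reported)
    STATES = {
        'P0_D': ('D', 'P0_O', None,        None),
        'P0_O': ('O', 'P0_G', (1, 'P1_T'), None),
        'P0_G': ('G', 'P0_D', None,        'DOG'),
        'P1_T': ('T', IDLE,   None,        'DOT'),
    }
    START = ['P0_D', IDLE]            # partition 0 starts active, partition 1 idle

    def search(text):
        current, found = list(START), []
        for ch in text:
            forks, nxt = [], list(current)
            for p, state in enumerate(current):
                if state is IDLE:
                    continue
                expect, succ, fork, report = STATES[state]
                if ch == expect:
                    nxt[p] = succ
                    if fork:
                        forks.append(fork)
                    if report:
                        found.append(report)
                else:
                    nxt[p] = START[p]         # mismatch: restart this partition
            for p, state in forks:            # force the neighbor partition's state
                nxt[p] = state
            current = nxt
        return found

    print(search("DOT AND DOG"))              # ['DOT', 'DOG']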

A state can be programmed for an exact match, a match ignoring the bit that differentiates between upper- and lowercase characters, or a match if the input character is of a particular class (numeric, alphabetic, delimiter). A startup mechanism is used to simplify the partitioning of the state table and to reduce the matcher requirements by eliminating starts that would terminate after the initial character of the word. All forms of fixed- and variable-length don't cares can be handled. For the set of 250 terms, approximately 20 character matchers of less than 5 Kbits each are necessary, for a total memory requirement of under 100 Kbits, less than half the bits of the Bird FSA and a thirtieth of the conventional FSA requirements. Moreover, the matchers are cycled only once for each input character, allowing the use of slower memories than would be possible with the Bird FSA.

Cellular Arrays

A cellular array uses a large number of very simple matchers connected in a string or as a more complex structure (Hollaar 1979). In its most basic form, the cell is programmed with a single match character. The cell has an input enable line and an output match line, as well as an input that is connected to a bus containing the current input character. Whenever the enable line is true, the cell compares the character on the bus to its match character, and if they are equal, sets its output match line true. FLDCs can be handled by a dummy cell that repeats its enable line delayed by one character time, and VLDCs by a cell that sets its match output whenever its enable line is set and clears it when a word delimiter is found (assuming that VLDCs should not extend across a word). The cells can also be extended to handle matching of classes of characters, like delimiters. Of course, any addition to a cell beyond an exact character match complicates its design and means fewer cells can be placed on an integrated circuit.
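
A minimal simulation of such a string of exact-match cells, with the enable line of each cell driven by the match output of its predecessor, might look like the following (illustrative only):

    # Each cell holds one pattern character, compares it against the broadcast
    # character whenever its enable line is true, and its match output enables the
    # next cell on the following character time.
    def cellular_search(text, pattern):
        cells = list(pattern)                  # one exact-match cell per character
        match_lines = [False] * len(cells)     # match output of each cell
        hits = []
        for i, ch in enumerate(text):
            enables = [True] + match_lines[:-1]    # first cell is always enabled
            match_lines = [en and ch == c for en, c in zip(enables, cells)]
            if match_lines[-1]:
                hits.append(i - len(pattern) + 1)  # start position of the match
        return hits

    print(cellular_search("a dog and a dog", "dog"))   # [2, 12]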

One of the first cellular matchers was General Electric's GESCAN, developed in the mid-1960s (Crew et al. 1967). It was capable of searching data stored on special magnetic tape drives at 120,000 characters per second, based on a query programmed into 80 cells. Copeland (1978) proposed a similar cellular system as an extension to a database machine; it was capable of handling a single bounded-length pattern string. Mukhopadhyay (1979) proposed a complex cell and a variety of ways to interconnect the cells to handle regular expressions and much of the pattern matching capabilities of SNOBOL4. However, the number of gates needed for a general interconnection network for n cells is O(n^2), so for a reasonable number of cells the cost of the interconnections will exceed the cost of the cells. Lee (1985) (also Lee and Lochovsky [1990]) proposed a cellular system as part of a larger hardware-implemented information retrieval system that placed blocks of the data into the cellular array and broadcast programming instructions based on the search pattern to all the cells. Foster and Kung (1980) proposed a systolic pattern matcher, where both the pattern and the data being searched are moved through the cellular array. This eliminates the need to broadcast the data to every cell, avoiding a heavily loaded data bus.

Cellular arrays are used in two currently marketed search machines. A spin-off of General Electric, GESCAN International, has developed a new implementation of GESCAN that can be attached to a minicomputer or workstation. Rather than use a special tape drive, data is read from the host disk system and transferred to the search logic. The host also acts as the query resolver. Each integrated circuit contains 16 cells, and can operate at 5 MBytes per second. The Fast Data Finder from TRW (Yu et al. 1986) uses a cell that can perform not only single character comparisons, but also includes support for ignoring misspellings and numeric comparisons. Each integrated circuit holds 8 cells, and up to 9,216 cells can be included in a system. It can search at 12.5 MBytes per second, although the disk storing the data often limits its speed.

Standard Microprocessors

It was clear a decade ago that available microprocessors could not keep up with the disk for a multiterm match, but the latest RISC-based processors are substantially faster than mainframe computers were only a few years ago, at a fraction of the cost. While their performance depends on the particular algorithm used for searching, looking at the inner loop for the various grep programs, egrep appears to have the tightest loop for complex multiterm searches. On a Sun SPARC, an optimized version takes 10 instructions and 13 cycles. This means that the inner loop searching rate for a 25 MHz SPARCstation-1 is about 1.92 MBytes per second, approximately the nominal disk rate of 1.875 MBytes per second for a 15 MHz drive. A 40 MHz SPARC will run at over 3 MBytes per second.

But even as microprocessors become faster, they still may not be a good choice for handling the basic term matching operation. To achieve a high MIPS rate, the program for the microprocessor must be stored in a high-speed memory. For a 25 MHz machine, it must have a cycle time of less than 40 ns. This means the use of a cache memory to achieve better performance than would be available with dynamic RAM, increasing the cost and complexity of the microprocessor system.

If we look at the inner loop for egrep, we find that it implements a finite state recognizer, using two arrays. The first one is indexed by the current state and the input character and gives the next state. The second indicates whether a match has been found. For a system handling 250 terms, to accommodate large searches generated by a thesaurus, about 2,000 states would be necessary. An FSA for handling 2,000 states and 7-bit characters would take 12 256K RAMs and an 11-bit holding register, costing well under $50.

Of course, it is not necessary for the line between the microprocessor and the special search hardware to be drawn between term matching and query resolution as an alternative to performing the entire search in the microprocessor. Hardware augmentation can be used to improve the search performance of the microprocessor. For example, a simple startup mechanism such as was used in the PFSA could initially examine the input characters, with the processor having to work only when the start of a possible match has been found (Hollaar 1991). As microprocessor cost-performance continues to improve, microprocessors will play a larger role in the implementation of special purpose text searchers. These systems will differ from general purpose systems using the same microprocessor by having low-overhead control programs rather than general operating systems like UNIX, simplified interfaces connecting their peripheral devices, file systems optimized for text searching, and some custom logic to assist in the matching process.

17.2.2 Search Performance

Text searchers were proposed as a means of improving the response time over a conventional computer doing the search, without the problems associated with using a surrogate such as an inverted file. Using a special purpose searcher that operates at disk speed, the time necessary to complete a user's query is the time it takes to read all the data from a disk drive. While this obviously varies with the type of disk, for most low-cost, high-capacity disks it is about five minutes. This is far too long for an interactive system, and complicates the search control by requiring that a number of different user queries must be combined in a batch to give reasonable performance.

One solution to this problem is to make the search go faster by using higher performance disks, but this substantially increases the system cost because of the low production volumes and resulting higher costs for such disks. Furthermore, the speedup is on the order of 10, while a factor of 100 or more is desirable for an interactive system.

A more reasonable solution is to combine the attributes of searching and using a surrogate to overcome the difficulties with each approach. An inexact surrogate can be used to eliminate documents that have no hope of matching a query. Superimposed codewords will provide a list of documents that is a superset of the documents containing the search terms; if the term doesn't really occur in the document, but is an artifact of the superimposed codeword scheme, it will be eliminated by the search. A fully inverted file, where every term is indexed with its location within a document is not necessary; phrases, contexts, and word location proximities can be handled by the following search.

Rather than the 50 percent to 300 percent overhead for a fully inverted file (Bird et al. 1978), a partially inverted file can have an overhead of less than 5 percent (Hollaar 1991). This is because only one entry is necessary in the list of documents containing a word, no matter how many times the word occurs in the document, and information regarding the location of the word within the document is not stored. The partially inverted surrogate also provides quick feedback to the user on whether the follow-up search should even be started. It may indicate that too few documents (and possibly no documents, if a term that is not in the database is used) or too many documents would be searched, based on the user's intuition of what is in the database. The search can then be canceled and the query refined, based on information from the index.

It is not necessary to index a document before it is available. As long as the number of unindexed documents remains low relative to an average search, they can simply be added to the list of documents for every search, making them available as soon as the text is loaded.
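
The following fragment sketches the combined approach for a phrase query: a partially inverted file holding only document identifiers supplies the candidate list, unindexed new documents are always added to that list, and a follow-up scan confirms the phrase. The documents, stopword list, and names are illustrative.

    docs = {
        1: "changing of the guard at the palace",
        2: "the guard was changing his schedule",
        3: "nothing relevant here",
    }
    unindexed = {4: "report on the changing of the guard ceremony"}   # not yet indexed
    STOPWORDS = {"of", "the", "at", "was", "his", "on"}

    def build_index(documents):
        """Partially inverted file: one entry per document per word, no positions."""
        index = {}
        for doc_id, text in documents.items():
            for word in set(text.split()) - STOPWORDS:
                index.setdefault(word, set()).add(doc_id)
        return index

    def phrase_query(phrase, index):
        terms = [w for w in phrase.split() if w not in STOPWORDS]
        candidates = set.intersection(*(index.get(t, set()) for t in terms))
        candidates |= set(unindexed)          # unindexed documents are always searched
        all_docs = {**docs, **unindexed}
        return sorted(d for d in candidates if phrase in all_docs[d])   # confirming scan

    index = build_index(docs)
    print(phrase_query("changing of the guard", index))   # [1, 4]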

Seek and Search Mode

When a surrogate is used to reduce the number of documents to be searched, the search goes from a scan to a seek and search mode. This changes the critical disk parameter from its transfer rate to its seek time relative to document transfer time. To see why this is so, consider an ESDI drive like the Maxtor XT-8760, where the seek time and rotational latency to position to the start of a document is about 20 milliseconds. It takes another 25 msec to read 35,000 characters (the size of the average U. S. Patent, for example) off two tracks, for a total of 45 msec. The effective transfer rate is about 750 KBytes per second, 40 percent of the nominal transfer rate of 1,875 KBytes per second. The effective transfer rate will be less for smaller documents, since the seek time will remain the same but the amount of data read will be less. A disk drive with the same seek characteristics, but which could read the data in zero time, would be only 2.25 times faster.
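
The arithmetic behind these figures can be captured in a small helper (names and units are illustrative):

    def effective_rate(position_s, read_s, doc_bytes):
        """Effective transfer rate, in bytes per second, for one seek-and-read."""
        return doc_bytes / (position_s + read_s)

    # ESDI example from the text: 20 ms seek and latency, 25 ms to read 35,000 characters
    print(effective_rate(0.020, 0.025, 35_000))   # about 778,000 bytes/s ("about 750 KBytes")
    # With a 150 ms positioning time, the seek dominates even for this document size
    print(effective_rate(0.150, 0.025, 35_000))   # about 200,000 bytes/s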

Parallel transfer drives are not particularly effective in a seek and search mode, and optical drives, with their high seek times, are even more devastating to search performance. If the seek time in the small document example above were changed to a 150 ms positioning time (typical of today's optical disks), the effective transfer rate would be only 148 KBytes per second, less than 8 percent of the nominal transfer rate.

Substantial improvements can be made by using an appropriate file system. A randomly organized file system, where blocks are placed in any convenient free location on the disk and a pointer (either in the previous block or in a master block) is used to locate the next data block (as in most file systems), is convenient for a time-sharing system, where files come and go. However, using our example ESDI disk drive, if we have to do an average seek before reading a 512-byte block, we will have an effective transfer rate of under 50 KBytes per second. The use of a large block size improves this, at the expense of higher unusable disk capacity due to internal fragmentation.

The use of a contiguous file system, where each document is stored in consecutive disk blocks, substantially improves disk performance. In one disk revolution after positioning, almost 50 blocks can be read, rather than just one. Since documents are seldom removed (or expanded) after they are loaded into an archival text database, the problems of file reorganization and lost disk space are minimal.

17.3 SURROGATE PROCESSORS

While text searching has been the most commonly proposed information retrieval operation to be implemented using special purpose hardware, a number of people have examined manipulating document surrogates using custom processors. These have included processors for merging the sorted lists of documents in an inverted file index and for searching document signatures to find specified patterns.

17.3.1 List Merging

The index for an inverted file contains pointers to items within the database. These pointers are generally stored in sorted order to ease their combining to determine those documents that match a particular query. The basic merging operation produces a list of pointers that is the union (for an OR operation), intersection (for an AND), or Boolean difference (for an AND NOT). While this is a simple operation to program on a general purpose computer, the overhead associated with aligning data, processing auxiliary fields, and flow of control such as loop counters meant that the mainframe computers of the 1970s were saturated keeping up with disk reads.
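
In software, the basic merge is a single linear pass over the two sorted lists; the following sketch (illustrative only) produces the union, intersection, or difference:

    def merge(a, b, op):
        """a, b: sorted lists of document pointers; op: 'or', 'and', or 'and not'."""
        out, i, j = [], 0, 0
        while i < len(a) and j < len(b):
            if a[i] == b[j]:
                if op in ('or', 'and'):
                    out.append(a[i])
                i += 1; j += 1
            elif a[i] < b[j]:
                if op in ('or', 'and not'):
                    out.append(a[i])
                i += 1
            else:
                if op == 'or':
                    out.append(b[j])
                j += 1
        if op in ('or', 'and not'):
            out.extend(a[i:])                 # leftovers from the first list
        if op == 'or':
            out.extend(b[j:])                 # leftovers from the second list
        return out

    hardware = [2, 5, 9, 14]; retrieval = [5, 7, 14, 20]
    print(merge(hardware, retrieval, 'and'))      # [5, 14]
    print(merge(hardware, retrieval, 'or'))       # [2, 5, 7, 9, 14, 20]
    print(merge(hardware, retrieval, 'and not'))  # [2, 9]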

Stellhorn (1977) proposed a system that took blocks of the sorted lists and merged them in a block-parallel, bit-serial fashion. Logic following the merge operation removed either duplicate entries (for an OR) or duplicate and single entries (for an AND). Hollaar (1978) developed a system based on a simple merge element (both serial and parallel versions were designed) connected by a network or arranged as a binary tree, which was programmed to match the merge operation specified by the query. Both of these systems were 10 to 100 times faster than a conventional computer of the time. However, as processors have become less expensive and much faster, special purpose hardware for list merging does not seem cost-effective today.

17.3.2 Signature Processors

A signature file, as discussed in a previous chapter, provides an alternative to storing a list of document pointers for each word in the database. A signature is developed for each document that attempts to capture the contents of the document. These signatures, which are considerably smaller than the actual documents, are then searched for potential matches. These matches can be directly presented to the user, or can be searched to eliminate false matches caused by information dropped when the signatures were created.

The most common signature is based on superimposed codewords. Keywords or word stems are selected from the document and hashed to produce an n-bit vector that has m bits set. These vectors are then ORed together to form the signature. Determining whether a document contains a particular word simply requires checking to see if the bits corresponding to the hash of that word are set in the signature. Note that superimposed codewords do not readily allow don't cares or context and proximity operations, and may produce "false drops" when the bit vectors representing two words are ORed, and the result has all the bits of another word, not present in the document, set.
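
The following fragment is a minimal sketch of this scheme; the hash construction and the parameters n and m are illustrative choices, not those of any particular system.

    import hashlib

    N_BITS, M_BITS = 256, 4          # n-bit signature, m bits set per word

    def word_vector(word):
        """Hash a word to an N_BITS-bit vector with (up to) M_BITS bits set."""
        vec = 0
        for i in range(M_BITS):
            digest = hashlib.sha256(f"{word}/{i}".encode()).digest()
            vec |= 1 << (int.from_bytes(digest[:4], 'big') % N_BITS)
        return vec

    def make_signature(keywords):
        sig = 0
        for w in keywords:
            sig |= word_vector(w)            # superimpose (OR) the codewords
        return sig

    def maybe_contains(sig, word):
        v = word_vector(word)
        return sig & v == v                  # all bits set: a possible (not certain) match

    sig = make_signature(["special", "purpose", "hardware", "retrieval"])
    print(maybe_contains(sig, "hardware"))   # True
    print(maybe_contains(sig, "banana"))     # almost always False; false drops are possible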

Ahuja and Roberts (1980) proposed storing the superimposed codeword signatures in an associative memory. To reduce the cost, rather than have comparators for every bit in the associative memory, they divided the signatures into blocks and stored each block in a memory module. During the associative search, a codeword is read from the memory and compared against the desired pattern, the result of the comparison is saved, and the next codeword from the memory is examined. This word-serial, bit-parallel approach reduces the hardware requirements to one comparator (for as many bits as are in the codeword) per memory module, plus the associated control logic.

Lee (1985) (also Lee and Lochovsky [1990]) observed that it is not necessary to check every bit position of the signatures, but only those that are set in the hash of the word of interest. Since the number of bits set in a hash is small compared to the number of bits in the codeword, this can substantially improve the search performance. Since the same bits in each codeword are being skipped, a word-parallel, bit-serial architecture is best.
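
A sketch of this idea, reusing the word_vector and make_signature helpers from the previous fragment, stores the signature file as bit slices so that only the bit positions set in the query word's hash need to be examined:

    def bit_slices(signatures, n_bits):
        """slices[i] has bit d set iff the signature of document d has bit i set."""
        slices = [0] * n_bits
        for d, sig in enumerate(signatures):
            for i in range(n_bits):
                if (sig >> i) & 1:
                    slices[i] |= 1 << d
        return slices

    def candidate_docs(slices, word, n_docs):
        result = (1 << n_docs) - 1                # start with every document
        v = word_vector(word)
        for i in range(N_BITS):
            if (v >> i) & 1:                      # examine only the query's set bits
                result &= slices[i]
        return [d for d in range(n_docs) if (result >> d) & 1]

    doc_words = [["special", "hardware"], ["text", "retrieval"], ["hardware", "retrieval"]]
    signatures = [make_signature(ws) for ws in doc_words]
    slices = bit_slices(signatures, N_BITS)
    print(candidate_docs(slices, "hardware", len(doc_words)))   # [0, 2], plus any false drops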

17.4 OPTICAL PROCESSORS

The previously discussed special purpose systems were all based on conventional digital electronics. Spurred by work on laser-based compact disk technology and the use of fiber optics for high-speed data transmission, there has been renewed interest in employing optically based approaches in information retrieval. This has included improving the bandwidth of optical mass storage and optical searching of information.

17.4.1 Improved Optical Storage

As we previously mentioned, most optical storage systems now in use do not have either the data rate or access speed of magnetic disk memories. This is particularly true for compact disks, which were originally designed to handle the relatively low data rates needed for reproducing audio information. However, there are ways of substantially increasing the data rates of optical disks beyond the standard techniques of increasing the bit density or rotational speed of the disk.

It is difficult to build a magnetic read head that can read data from many adjacent, closely spaced tracks at the same time. However, it is possible to have a laser beam that illuminates many adjacent tracks, with the information from each track falling on a different position of a photodetector array. This allows a byte or word of data to be read out with only minimal problems of alignment or skewed data. In the case of a parallel-head magnetic drive, the relative timing of the bits of the word depends on the mechanical alignment of the different heads, which can change with temperature or movement during seeks.

Even more interesting is the use of a holographic memory, rather than a rotating optical disk. This offers the potential of not only improved data rates, but also better storage densities and access times. Berra et al. (1989) discuss a page hologram memory capable of holding 725 MBytes on a 6-inch square card, and volume holograms capable of holding up to 10^12 bits. Since access to information in the holographic memory does not require moving a read head assembly and waiting for a disk to rotate to the start of the data, access times are reduced and rotational latency is eliminated. Instead, the laser beam can be deflected to the proper position in under 100 msec and all the information on the page will be immediately available.

17.4.2 Optical Data Processing

Although it is possible to convert the optical signals from a laser disk or holographic memory to electrical signals and process them using conventional digital electronics, one also can use optical means for comparing the information against a pattern without the need for conversion. Berra et al. (1989) discuss a number of these techniques, and present an optical full text search design (Mitkas et al. 1989).

Optical comparisons are performed by using an exclusive-OR operation on dual rail logic (both the true and complement forms of each bit are available). A 1 bit is stored as a TF pattern, and a 0 bit as FT. A spatial light modulator (SLM), which is a one- or two-dimensional array that can alter some parameter of the optical signals passing through the array, such as intensity or phase, is placed in the light path. In its simplest form, the SLM is programmed to form a mask in which individual bits are either passed (P) or blocked (X). XP is used for comparing against a 1, PX for comparing against a 0, and XX can be used for a don't care in a particular bit position. If all pattern bits match the data bits, no light will be passed. It takes only a lens and a single detector to determine if there is a match.

For example, if the pattern is 10d1, where d is a don't care, the mask will be XPPXXXXP. For data of 1011 (dual rail form of TFFTTFTF) or 1001 (TFFTFTTF), no light will be passed, while for data 0011 (FTFTTFTF), the resulting light pattern will be FTFFFFFF, so light is passed and a mismatch is indicated.
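
The same logic can be modeled in a few lines of software (purely illustrative of the comparison, not of the optics):

    # Dual-rail comparison: data bits are encoded as TF (1) or FT (0), the mask
    # passes (P) or blocks (X) each rail; any light emerging signals a mismatch.
    ENCODE = {'1': 'TF', '0': 'FT'}
    MASK   = {'1': 'XP', '0': 'PX', 'd': 'XX'}    # 'd' marks a don't-care position

    def matches(data_bits, pattern):
        data = ''.join(ENCODE[b] for b in data_bits)
        mask = ''.join(MASK[p] for p in pattern)
        light = ['T' if d == 'T' and m == 'P' else 'F' for d, m in zip(data, mask)]
        return 'T' not in light                   # no light passed means a match

    print(matches('1011', '10d1'))   # True  (no light passes)
    print(matches('1001', '10d1'))   # True
    print(matches('0011', '10d1'))   # False (light passes at the first bit)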

The optical text search machine contains a page composer, which generates optical patterns to be fed to the optical comparator. Its main element is a fast optical scanner that converts the sequential (and possibly byte-parallel) input from the optical disk to two-dimensional patterns consisting of shifting sequences of the bytes of text. The optical comparator masks these patterns with an array programmed with all the search terms. Photodetectors sense whether no light passes, indicating a match of a particular term. It is possible to do 100,000 parallel comparisons using an SLM in a single step. Other optical techniques can be employed to allow the multiple input streams to be handled through the same SLM, further increasing its performance.

17.5 SUMMARY

A number of special-purpose hardware systems have been proposed to provide more cost-effective or faster performance for information retrieval systems. Most of these have been text scanners, and several have been used in special applications.

While high-speed microprocessors and general purpose parallel machines can now provide much of the performance that previously required special purpose logic (such as for index merging), it still appears that hardware augmentation to those processors can improve the performance of a large system at a small cost. Special purpose hardware for information retrieval will continue to be an interesting research and development area.

REFERENCES

AHUJA, S. R., & C. S. ROBERTS. 1980, "An Associative/Parallel Processor for Partial Match Retrieval Using Superimposed Codes." Proceedings of the 7th Annual Symposium on Computer Architecture, May 6-8, 1980 (published as SIGARCH Newsletter, vol. 8, no. 3) pp. 218-27.

BERRA, P. B., A. GHAFOOR, P. A. MITKAS, S. J. MARCINKOWSKI, & M. GUIZANI. 1989. IEEE Transactions on Knowledge and Data Engineering, 1(1), 111-32.

BIRD, R. M., J. B. NEWSBAUM, & J. L. TREFFTZS. 1978, "Text File Inversion: An Evaluation." Fourth Workshop on Computer Architecture for Non-Numeric Processing, Syracuse University, August 1978 (published as SIGIR vol. 13, no. 2; SIGARCH vol. 7, no. 2; and SIGMOD vol. 10, no. 1), pp. 42-50.

BIRD, R. M., & J. C. TU. 1979, "Associative Crosspoint Processor System." U. S. Patent 4,152,762, May 1, 1979.

BURKOWSKI, F. J. 1982, "A Hardware Hashing Scheme in the Design of a Multiterm String Comparator." IEEE Transactions on Computers, C-31(9), 825-34.

COPELAND, G. P. 1978, "String Storage and Searching for Data Base Applications: Implementation of the INDY Backend Kernel." Fourth Workshop on Computer Architecture for Non-Numeric Processing, Syracuse University, August 1978 (published as SIGIR vol. 13, no. 2; SIGARCH vol. 7, no. 2; and SIGMOD vol. 10, no. 1), pp. 8-17.

CREW, B. L., & M. N. GUNZBURG. 1967, "Information Storage and Retrieval System." U. S. Patent 3,358,270, December 12, 1967.

FOSTER, M. J., & H. T. KUNG. 1980. "Design of Special-Purpose VLSI Chips: Examples and Opinions." Proceedings of the 7th Annual Symposium on Computer Architecture, May 6-8, 1980 (published as SIGARCH Newsletter, vol. 8, no. 3) pp. 300-07.

HASKIN, R. L. 1980. "Hardware for Searching Very Large Text Databases." Ph.D. Thesis, University of Illinois at Urbana-Champaign, August.

HASKIN, R. L., & L. A. HOLLAAR. 1983, "Operational Characteristics of a Hardware-based Pattern Matcher." ACM Transactions on Database Systems, 8(1).

HOLLAAR, L. A. 1978. "Specialized Merge Processor Networks for Combining Sorted Lists." ACM Transactions on Database Systems, 3(3), 272-84.

HOLLAAR, L. A. 1979. "Text Retrieval Computers." Computer, 12(3), 40-52.

HOLLAAR, L. A., & R. L. HASKIN. 1984, "Method and System for Matching Encoded Characters." U. S. Patent 4,450,520, May 22, 1984.

HOLLAAR, L. A. 1991. "Special-Purpose Hardware For Text Searching: Past Experience, Future Potential." Information Processing & Management, Vol. 27, No. 4, pp. 371-78.

LEE, D. L. 1985. "The Design and Evaluation of a Text-Retrieval Machine for Large Databases." Ph.D. Thesis, University of Toronto, September.

LEE, D. L., & F. H. LOCHOVSKY. 1990, "HYTREM--A Hybrid Text-Retrieval Machine for Large Databases." IEEE Transactions on Computers, 39(1), 111-23.

MAYPER, V. Jr., A. L. NAGY, R. M. BIRD, J. C. TU, & L. S. MICHAELS. 1980. "Finite State Automaton with Multiple State Types." U. S. Patent 4,241,402, December 23, 1980.

MITKAS, P. A., P. B. BERRA, & P. S. GUILFOYLE. 1989, "An Optical System for Full Text Search." Proceedings of SIGIR 89.

MUKHOPADHYAY, A. 1979, "Hardware Algorithms for Nonnumeric Computation." IEEE Transactions on Computers, C-28(6), 384-89.

ROBERTS, D. C. 1978. "A Specialized Computer Architecture for Text Retrieval." Fourth Workshop on Computer Architecture for Non-Numeric Processing, Syracuse University, August 1978 (published as SIGIR vol. 13, no. 2; SIGARCH vol. 7, no. 2; and SIGMOD vol. 10, no. 1), pp. 51-59.

STELLHORN, W. H. 1977. "An Inverted File Processor for Information Retrieval." IEEE Transactions on Computers, C-26(12), 1258-67.

YU, K.-I, S.-P. HSU, R. E. HEISS, Jr., & L. Z. HASIUK. 1986. "Pipelined for Speed: The Fast Data Finder System." Quest, Technology at TRW, 9(2), Winter 1986/1987, 4-19.
