Data Structures Using C: 9.3.2 Searching a Hash Table

9.3.2 Searching a Hash Table

How would you search a hash table when asked to find a given key, such as 642? If the key were stored in the table, it would either be found at its hashing address or at the end of a linear probe. If its hashing address were empty, or if the search came to an empty element in the linear probe before finding the key, you might conclude that it was not stored in the table. In this case, the hash address of 642 is 21. Element 21 is not empty but does not have 642 as its value. If you do a linear probe, you will eventually come to element 3, which is empty, without finding 642; thus you would conclude that 642 is not in the table.

If the key searched for were 802, you could go to its hash address (20), find that element occupied with a different key, and start the linear probe, finding 802 at element 0.

Notice that if the search key value is present in the table, the search will require exactly as many probes to find it as were required to enter it into the table. The search traces out the same path. Instead, if the search is for a key value that is not present, the search will take exactly as many probes as would be required to enter it into the current table.

A search that finds a key value is called successful, and one that fails to find a key value is unsuccessful. Theoretically, one can calculate the number of probes required for every possible key value. Knowing the frequency with which each possible key value (there are 1,000 in our example) will be searched for, one can determine the average number of probes required to search the table. The maximum number of probes can always be determined given the table. In the example, a successful search would require at most seven probes, while an unsuccessful search (for example, for key value 248) could require nine probes. If one never had to search for a key not stored, and all stored keys were searched for with equal frequency, the average number of probes would be

  1/18 ?/FONT> 1 + 1/18 ?/FONT> 1 + 1/18 ?/FONT> 1 + 1/18 ?/FONT> 1 + 1/18 ?/FONT> 2 + 1/18 ?/FONT> 1

+ 1/18 ?/FONT> 4 + 1/18 ?/FONT> 1 + 1/18 ?/FONT> 1 + 1/18 ?/FONT> 1 + 1/18 ?/FONT> 2 + 1/18 ?/FONT> 1

+ 1/18 ?/FONT> 5 + 1/18 ?/FONT> 4 + 1/18 ?/FONT> 7 + 1/18 ?/FONT> 3 + 1/18 ?/FONT> 1 + 1/18 ?/FONT> 3

or 40/18 = 2.22 probes. This calculation involves multiplying the frequency of search for each record by the number of probes required to enter it, and then adding all these products. In this case, each frequency was 1/18, and the required probes were noted when the hash table was built.

If no collisions occurred in building the table, every successful search would take exactly one probe. If the hash function actually assigned every key value not stored in the table to an empty address, exactly one probe would be required for any search. This would be the ideal situation. It is apparent now that a desirable hashing function will "scatter" or distribute the key values throughout the table, so that relatively few collisions ensue.

The hash table was of length 23 and stored eighteen key values. Its usage ratio is 18/23; the table is 78 percent full. Intuitively, the lower this ratio, the greater the storage wasted, but the lower the likelihood of a collision. Appropriate hashing functions must be selected with care in real applications, but it is possible to achieve very quick searches using hash tables.