Collections

The .NET Framework class library’s System.Collections namespace contains classes that serve as containers for groups, or collections, of data. Hashtable is one example of a System.Collections type. It implements hash tables, which feature lightning-fast lookups. Another example is ArrayList, which represents resizable arrays. The presence of these and other types defined in System.Collections means you can spend more time writing code that makes your application unique and less time writing tedious infrastructural code.

The following table lists the core collection classes defined in System.Collections. Succeeding sections formally introduce the Hashtable and ArrayList classes. Other collection classes, including Stack and SortedList, are used in sample programs presented in this chapter and others.

System.Collections Collection Classes
Class	Implements
ArrayList	Resizable arrays
BitArray	Bit arrays
Hashtable	Tables of key/value pairs structured for fast lookups
Queue	First-in, first-out (FIFO) buffers
SortedList	Tables of sorted key/value pairs accessible by key or index
Stack	Last-in, first-out (LIFO) buffers

One characteristic of all the collection classes in System.Collections (with the exception of BitArray, which stores Boolean values) is that they’re weakly typed. In other words, they store instances of System.Object. Weak typing enables collections to store virtually any kind of data because all .NET Framework data types derive directly or indirectly from System.Object. Unfortunately, weak typing also means that you have to do a lot of casting. For example, if you put a string in a Hashtable in C# and then retrieve it, you have to cast it back to a string to call String methods on it. If you’re morally opposed to casting, you can use the System.Collections classes CollectionBase and DictionaryBase as base classes for strongly typed collections of your own. However, it’s very likely that a future version of the .NET Framework will support something called generics, which are analogous to C++ templates. If you can stomach a moderate amount of casting for now, building type-safe collection classes should be a lot easier in the future.

Hash Tables

Since the dawn of computing, programmers have searched for ways to optimize data retrieval operations. When it comes to fast lookups, nothing beats the hash table. Hash tables store key/value pairs. When an item is inserted, the key is hashed and the resulting value (modulo the table size) is used as an index into the table, specifying where the item should be stored. When a value is retrieved, the key is hashed again. The resulting index reveals the item’s precise location in the table. A well-designed hash table can retrieve items with just one lookup, irrespective of the number of items in the table.

System.Collections.Hashtable is the FCL’s hash table data type. The following code uses a Hashtable object to build a simple French-English dictionary. The values stored in the table are the French words for the days of the week. The keys are the English words for the same:

Hashtable?table?=?new?Hashtable?();
table.Add?("Sunday",??? "Dimanche");
table.Add?("Monday",??? "Lundi");
table.Add?("Tuesday",?? "Mardi");
table.Add?("Wednesday", "Mercredi");
table.Add?("Thursday",? "Jeudi");
table.Add?("Friday",??? "Vendredi");
table.Add?("Saturday",? "Samedi");

With the hash table initialized in this manner, finding the French equivalent of Tuesday requires one simple statement:

string?word?=?(string)?table["Tuesday"];

Items can also be added to a Hashtable using string indexes:

Hashtable?table?=?new?Hashtable?();
table["Sunday"]????= "Dimanche";
table["Monday"]????= "Lundi";
table["Tuesday"]???= "Mardi";
table["Wednesday"]?= "Mercredi";
table["Thursday"]??= "Jeudi";
table["Friday"]????= "Vendredi";
table["Saturday"]??= "Samedi";

Semantically, there’s a difference between adding items with Add and adding them with indexers. Add throws an exception if the key you pass to it is already in the table. Indexers don’t. They simply replace the old value with the new one.

Physically, a Hashtable stores items added to it in System.Collections.DictionaryEntry objects. Each DictionaryEntry object holds a key and a value that are exposed through properties named Key and Value. Because Hashtable implements the FCL’s IDictionary interface, which in turn derives indirectly from IEnumerable, you can use the C# foreach command (or the Visual Basic .NET For Each command) to enumerate the items in a Hashtable. The following code writes all the keys and values stored in a Hashtable named table to the console window:

foreach?(DictionaryEntry?entry?in?table)
????Console.WriteLine?("Key={0},?Value={1}\n",?entry.Key,?entry.Value);

Hashtable also has methods for removing items (Remove), removing all items (Clear), checking for the existence of items (ContainsKey and ContainsValue), and more. To find out how many items a Hashtable contains, read its Count property. To enumerate only a Hashtable’s keys or values, use its Keys or Values property.

Two factors control Hashtable lookup performance: the Hashtable’s size and the uniqueness of the hashes produced from the input keys. A Hashtable’s size is dynamic; the table automatically grows as new items are added to it to minimize the chance of collisions. A collision occurs when two different keys hash to identical table indexes. Hashtable uses a double hashing algorithm to mitigate the negative effect of collisions on performance, but the best performance comes when there are no collisions at all.

Grow operations are expensive because they force the Hashtable to allocate new memory, recompute the table indexes, and copy each item to a new position in the table. By default, a Hashtable is sized for 0 items, meaning that many grow operations are required to grow it to a respectable size. If you know in advance approximately how many items you’ll be adding to a Hashtable, set the table’s initial size by passing a count to the class constructor. The following statement creates a Hashtable whose size is optimized for 1,000 items:

Hashtable?table?=?new?Hashtable?(1000);

Initializing the Hashtable size in this manner doesn’t affect lookup performance, but it can improve insertion speed by a factor of 2 or more.

When a Hashtable grows, it always assumes a size that’s a prime number to minimize the likelihood of collisions. (Statistically, if n is a random number, n modulo m is more likely to produce a unique result if m is a prime number.) By default, a Hashtable expands its memory allocation when the item count exceeds a predetermined percentage of the table size. You can control the percentage by varying the load factor. A load factor of 1.0 corresponds to 72 percent, 0.9 corresponds to 65 percent (0.9 72), and so on. Valid load factors range from 0.1 to 1.0. The following statement sizes a Hashtable for 1,000 items and sets its load factor to 0.8, meaning that the Hashtable will grow when the item count reaches approximately 58 percent of the table size:

Hashtable?table?=?new?Hashtable?(1000,?0.8f);

The default load factor (1.0) is fine for most applications, so chances are you’ll never need to change it.

Maximizing the uniqueness of hash values generated from input keys is also critical to a Hashtable’s performance. By default, Hashtable hashes an input key by calling the key’s GetHashCode method, which all objects inherit from System.Object. If you key values with instances of a class whose GetHashCode method does a poor job of generating unique hash values, do one of the following to optimize performance:

Override GetHashCode in a derived class and provide an implementation that produces unique hash values.
Create a type that implements IHashCodeProvider and pass a reference to an instance of that type to Hashtable’s constructor. Hashtable will respond by calling the object’s IHashCodeProvider.GetHashCode method to hash the input keys.

Many FCL data types, including strings, hash just fine and therefore work well as Hashtable keys right out of the box.

Hashtable calls a key’s Equals method—another method inherited from System.Object—to compare keys. If you use a custom data type as Hashtable keys and the Equals method your type inherits from System.Object doesn’t accurately gauge equality, either override Equals in the derived class or pass Hashtable’s constructor an IComparer interface whose Compare method is capable of comparing keys.

Resizable Arrays

The FCL’s System namespace contains a class named Array that models the behavior of static arrays. System.Collections.ArrayList encapsulates dynamic arrays—arrays that can be sized and resized as needed. ArrayLists are useful when you want to store data in an array but don’t know up front how many items you’ll be storing.

Creating an ArrayList and adding items to it is simplicity itself:

ArrayList?list?=?new?ArrayList?();
list.Add?("John");
list.Add?("Paul");
list.Add?("George");
list.Add?("Ringo");

Add adds an item to the end of the array and grows the array’s memory allocation if necessary to accommodate the new item. The related Insert method inserts an item into an ArrayList at a specified position and moves higher-numbered items upward in the array. Insert also grows the array if that’s necessary.

If you know approximately how many items you’ll be adding to an ArrayList, you should specify a count at creation time to minimize the number of resizing operations performed. The following code creates an ArrayList containing 100,000 integers:

ArrayList?list?=?new?ArrayList?();
for?(int?i=0;?i<100000;?i++)
????list.Add?(i);

The next code sample does the same, but does it in half the time (10 milliseconds versus 20 milliseconds on the machine I tested it on):

ArrayList?list?=?new?ArrayList?(100000);
for?(int?i=0;?i<100000;?i++)
????list.Add?(i);

To retrieve an item from an ArrayList, use a 0-based index:

int?i?=?(int)?list[0];

And to assign a value to an existing array element, do this:

list[0]?=?999;

The Count property reveals how many items an ArrayList contains. Consequently, one way to iterate through the items in an ArrayList is as follows:

for?(int?i=0;?i<list.Count;?i++)
????Console.WriteLine?(list[i]);

You can also iterate with foreach:

foreach?(int?i?in?list)
????Console.WriteLine?(i);

To remove items from an ArrayList, call Remove, RemoveAt, RemoveRange, or Clear. When items are removed, items with higher indexes are automatically shifted down to fill the void. If you delete the item at index 5, for example, the item at index 6 becomes the item at index 5, the item at index 7 becomes the item at index 6, and so on.

Instances of ArrayList automatically allocate memory to accommodate new items. They don’t automatically release that memory when items are removed. To downsize an ArrayList to fit exactly the number of items that it currently contains, call TrimToSize. The following example adds 1000 integers to an ArrayList, deletes the first 500, and then resizes the array to fit the remaining 500:

//?Add?items
ArrayList?list?=?new?ArrayList?(1000);
for?(int?i=0;?i<1000;?i++)
????list.Add?(i);

//?Remove?items
list.RemoveRange?(0,?500);

//?Resize?the?array
list.TrimToSize?();

The number of items that an ArrayList can hold without allocating additional memory is called its capacity. You can find out the capacity of an ArrayList from its Capacity property. Capacity is a get/set property, so you can use it to set the capacity as well as to read it. The ability to increase an ArrayList’s capacity on the fly comes in handy if you don’t know how many items the array will store when you create it, but you do know approximately how many it will store when you start adding items.

The WordCount Application

WordCount is a console application that provides statistics on word usage in text files. To use it, go to a command prompt and type the command name followed by a file name, as in the following:

wordcount?readme.txt

The output consists of an alphabetically sorted list of words found in the file and the number of times that each word appears. WordCount uses a StreamReader, a Hashtable, and an ArrayList to do its work. For good measure, it also throws in a SortedList. Its source code appears in Figure 3-3.

When executed, WordCount opens the input file and reads through it a line at a time with repeated calls to StreamReader.ReadLine. It extracts the words from each line by calling a local method named GetWords, and it uses each word returned by GetWords as a key in the Hashtable. If the key doesn’t exist—meaning the word hasn’t been encountered before—WordCount adds a 1 to the Hashtable and keys it with the word. If the key does exist—meaning the word has been encountered before—WordCount reads the associated integer value from the Hashtable, increments it by one, and writes it back using the same key. By the time WordCount reaches the end of the file, every word it encountered is represented as a key in the Hashtable, and the value associated with each key is a count of the number of times the word appears. Thus, the Hashtable’s purpose is twofold:

It provides a super-fast way to determine whether a word has been encountered before.
It provides a store for the word list and associated occurrence counts.

How do ArrayList and SortedList fit into the picture? When GetWords begins parsing a line of text, it has no idea how many words it will encounter. Because it can’t very well store the results in a static array, it uses a dynamic array—an ArrayList—instead. After parsing is complete, GetWords allocates a static array just large enough to hold the items in the ArrayList and copies the ArrayList to the static array with ArrayList.CopyTo. Then it returns the static array to the caller.

Method Main uses a SortedList object to sort the word list before writing it to the console window. One simple statement copies the Hashtable to the SortedList; a foreach loop extracts items from the SortedList and outputs them to the screen. Because the values used to key the items in the SortedList are strings, the simple act of inserting them into the SortedList sorts them alphabetically.

WordCount.cs

using?System;
using?System.IO;
using?System.Collections;

class?MyApp
{
????static?void?Main?(string[]?args)
????{
????????//?Make?sure?a?file?name?was?entered?on?the?command?line
????????if?(args.Length?==?0)?{
????????????Console.WriteLine?("Error:?Missing?file?name");
????????????return;
????????}

????????StreamReader?reader?=?null;
????????Hashtable?table?=?new?Hashtable?();

????????try?{
????????????//?Iterate?through?the?file?a?word?at?a?time,?creating

????????????//?an?entry?in?the?Hashtable?for?each?unique?word?found
????????????//?and?incrementing?the?count?for?each?word?that's
????????????//?encountered?again
????????????reader?=?new?StreamReader?(args[0]);

????????????for?(string?line?=?reader.ReadLine?();?line?!=?null;
????????????????line?=?reader.ReadLine?())?{
????????????????string[]?words?=?GetWords?(line);

Figure 3-3

WordCount source code.

????????????????foreach?(string?word?in?words)?{
????????????????????string?iword?=?word.ToLower?();
????????????????????if?(table.ContainsKey?(iword))
????????????????????????table[iword]?=?(int)?table[iword]?+?1;
????????????????????else
????????????????????????table[iword]?=?1;
????????????????}
????????????}

????????????//?Sort?the?Hashtable?entries?using?a?SortedList
????????????SortedList?list?=?new?SortedList?(table);

????????????//?Display?the?results
????????????Console.WriteLine?("{0}?unique?words?found?in?{1}",
????????????????table.Count,?args[0]);

????????????foreach?(DictionaryEntry?entry?in?list)
????????????????Console.WriteLine?("{0}?({1})",
????????????????????entry.Key,?entry.Value);
????????}
????????catch?(Exception?e)?{
????????????Console.WriteLine?(e.Message);
????????}
????????finally?{
????????????if?(reader?!=?null)
????????????????reader.Close?();
????????}
????}

????static?string[]?GetWords?(string?line)
????{
????????//?Create?an?ArrayList?to?hold?the?intermediate?results
????????ArrayList?al?=?new?ArrayList?();

????????//?Parse?the?words?from?the?line?and?add?them?to?the?ArrayList
????????int?i?=?0;
????????string?word;
????????char[]?characters?=?line.ToCharArray?();

????????while?((word?=?GetNextWord?(line,?characters,?ref?i))?!=?null)
????????????al.Add?(word);

????????//?Return?a?static?array?that?is?equivalent?to?the?ArrayList
????????string[]?words?=?new?string[al.Count];
????????al.CopyTo?(words);
????????return?words;
????}

????static?string?GetNextWord?(string?line,?char[]?characters,
????????ref?int?i)
????{
????????//?Find?the?beginning?of?the?next?word
????????while?(i?<?characters.Length?&&
????????????!Char.IsLetterOrDigit?(characters[i]))
????????????i++;

????????if?(i?==?characters.Length)
????????????return?null;

????????int?start?=?i;

????????//?Find?the?end?of?the?word
????????while?(i?<?characters.Length?&&
????????????Char.IsLetterOrDigit?(characters[i]))
????????????i++;

????????//?Return?the?word
????????return?line.Substring?(start,?i?-?start);
????}
}