13.5.3 Overview of Storage Reclamation

There is a spectrum of reclamation techniques, but the two basic underlying approaches are garbage collection and reference counters. Garbage collection represents one extreme. Nodes that are no longer needed in a dynamic data structure are deleted from the structure, thus becoming garbage. They are not returned at that time to the list of available space, even though they are then reusable. Reference counters, at the other extreme, return such unused nodes to the list of available space as soon as possible, when their reference counters become zero. These methods of reclamation are done implicitly by the programming language in conjunction with the operating system.

In addition, programmer control allows the programmer to use commands signaling that nodes are no longer needed by a dynamic data structure. The nodes may then be returned to the list of available space directly, left for garbage collection, or, if reference counters are used, their reference counts will be decremented and tested for possible return to the list of available space. In C the standard library function free can be used for such signaling.

In the design of a specific reclamation system, a number of trade-offs are possible. For example, reference counters require additional storage and additional time to increment, decrement, or test when insertions and deletions occur. However, they do not always work. The reason is that garbage can accumulate when loops are present in the dynamic data structures (as in Example 13.9). If the amount of garbage generated in this way is significant, it can limit the size of the problem that can be solved. Still, reference counters allow the reclamation process to be spread over time. Garbage collection, in contrast, occasionally requires a relatively large block of time. Garbage collection is a more complex technique, involving more sophisticated algorithms. Recall that when garbage collection is invoked, even though many available nodes may exist, they are not known. It is not possible simply to look at a node of the heap to determine if it is in use.

To summarize, garbage collection works in two phases. The first phase, or marking phase, requires the traversal of all dynamic data structures in existence. Recall that the heads of such structures are assumed to be known. Initially, whether in use or not, every node has a field that is marked "unused." As each node of a dynamic data structure is accessed in the traversal, it is marked "used." When the marking phase is completed, the heap then consists of nodes that are marked "unused" and nodes that are marked "used." Consequently, it is now possible to look at a node and determine its status. The second phase, or collection phase, involves a traversal through the nodes of the heap. Each node marked "unused" is inserted on the list of available space. Each node marked "used" is merely marked "unused" so that the garbage collector will work correctly the next time it is invoked.

The marking algorithms known are quite elegant. The traversal of the dynamic data structures cannot simply use a stack, since no nodes are known in which to implement the stack. Instead, these algorithms allow the traversals to be accomplished by using tag fields, usually one bit, associated with each node. It is even possible to do the traversals with no tag fields, with a fixed amount of additional storage required. Of course, if we had an infinite amount of storage available, no reclamation would be necessary. Storage management would be trivial! The next section features three traversal algorithms for use in phase I of the garbage collector, or in other applications where storage is at a premium. See Pratt [1975] for more on storage management. The algorithms to be discussed next and generalizations are also given in Standish [1980].