13.7: Pitfalls: Garbage Generation and Dangling References

This chapter has shown how a list of available space can be maintained to keep track of storage known to be currently unused. Storage may be allocated dynamically, during program execution, by deleting appropriate elements from this list when needed.

The need and means for storage reclamation have also been discussed. Garbage collection and reference counter techniques (or some combination) may be used for this purpose. When a node of a binary tree, a list, or other dynamic data structure is not needed, the programmer frequently knows at what point this occurs. Some high-level languages allow commands to be used by the programmer to signal this occurrence. For example, C provides the function free. Like malloc (and other allocation functions), its parameter will be a pointer type variable. When p points to a node that the programmer knows is no longer needed, free(p) may be invoked. This signals that the storage pointed to by p may be relinquished and reclaimed for use for another dynamically allocated data structure. Exactly how free uses this information is determined by the compiler.

High-level languages are usually designed with particular storage management schemes in mind. However, the language specifications, or standards, do not normally specify that these schemes must be used. It is possible for a compiler to use garbage collection or reference counters, to leave the reclamation entirely to the programmer, or to use some combination. If necessary programmers may even write their own programs to manage storage. We will not discuss these problems further except to illustrate two potential dangers.

Consider the following disaster segment as an illustration of these dangers.

(1) p = q;

(2) r = malloc ( sizeof(binarytreerecord));

(3) r -> leftptr = tree;

(4) free(tree);

(5) process(&s);

(6) s -> leftptr = lefttree;

Suppose p, q, r, s, lefttree, and tree are all pointer variables pointing to nodes of the same type. Before the segment is executed, the situation might be as shown in Figure 13.16(a). Statement (1), when executed, results in the situation sketched in Figure 13.16(b).

The tree, pointed to by p originally, now has no pointer pointing to it. Hence there is no way the program can access any of its nodes. Also, the storage allocated to this tree, while no longer in use, has not been returned to the list of available space. Unless the storage management technique uses garbage collection or reference counters, that storage is lost. It cannot be accessed and will never be returned to the list of available space. It truly has become garbage. If free(p) had been invoked before statement (1), this might have been avoided, depending on whether free recognizes that the other nodes of the tree may also be returned. If free merely returns the storage for the node to which p points, this could have been avoided only by the programmer traversing the tree pointed to by p and "freeing" each of its nodes.

Suppose, instead, that free uses reference counters and, recognizing that the tree pointed to by p has no other references, returns its storage to the list of available space. Now statement (2) is executed. The result is shown in Figure 13.16(c).

After statement (4) is executed, assuming that free simply returns the storage allocated to the node pointed to by tree to the list of available space, we get Figure 13.16(e). However, the node pointed to by r?/FONT>leftptr is now on the list of available space. R?/FONT>leftptr is called a dangling reference because it points to a node that is on the list of available space. When process is invoked in statement (5), it is possible that it invokes malloc, so storage is allocated to s. Then we have the situation shown in Figure 13.16(f).

(a) Initial Situation

(b) Result of Statement (1)

(c) Result of Statement (2)

(d) Result of Statement (3)

(e) Result of Statement (4)

(f) Result of Statement (5)

(g) Result of Statement (6)

Figure 13.16 Simulation of the Disaster Segment

Statement (6) produces Figure 13.16(g). The binary tree that r now points to has been changed, because of the dangling reference. The programmer, unaware that this has occurred, may find the program's output incorrect and be unable to determine what went wrong. This is especially difficult to discover, since the next time the program executes, process may not need to invoke malloc. Even if it does, the storage allocated by malloc to s might not cause this problem. Still, the programmer may not realize that an error has occurred.

Of the two problems, garbage generation and dangling reference generation, the latter is more serious. Garbage generation may cause longer execution time, or, at worst, program failure. Dangling references, however, may cause serious errors undetected by the programmer.