*Dr. Dobb's Journal* June 1998

Sidebar:

Starting this month, Jon Bentley will be a regular contributor to "Algorithm Alley." Jon is best known for the "Programming Pearls" column he wrote for *Communications of the ACM* and later turned into a series of often-recommended books.

This month, Jon takes a hard look at a common tradeoff: Should you optimize your code to be easy to read or to be efficient? Recursive algorithms are often elegant and easy to express, but are less efficient than iterative techniques. Or are they?

As Jon shows, the truth is much more complicated than you might think.

-- Tim Kientzle

Too often, I find myself wrestling with a piece of code, beating at it with *while* and *for* loops, all to no avail. The code gets bigger, buggier, and uglier. After hours (or days) of frustration, I finally try recursion and a clean function jumps out in a dozen lines of code.

So why do I make the same mistake over and over again, and blithely ignore recursion? I'll fall back on the old excuse of the way I was raised. I grew up programming in Fortran without recursion. And during my early programming years, recursion was notoriously expensive.

On modern systems, though, recursion can be almost free. It can also be much more expensive (relatively) than it was in the bad old days. I recently confronted this ghost of my childhood by studying the cost of recursion. In this month's "Algorithm Alley," I'll tell the story of my experiments, starting with some examples, and ending with substantial speedups for a new data structure. It is sprinkled along the way with exercises so you can download the programs and try them (details on the program follow shortly).

This analysis has been a wonderfully liberating experience for me. New insights into the cost of recursion allow me to use this powerful tool more comfortably. And when recursion is too expensive, simple techniques let me retrieve the performance while maintaining much of the elegance.

I'll start with two important functions on strings. The *strlen* function returns the number of characters in a null-terminated string. The typical implementation is along the lines of Example 1(a). The loop increments the pointer *s* until it finds the null character, incrementing the integer *n* along the way.

As Example 1(b) shows, you can implement the same function recursively (the leading *r* is for recursion). The length of a null string is zero; the length of a nonnull string is one more than the length of the substring that omits the first character. The two programs perform a similar sequence of computations: test the character that *s* points to for null, add 1 to *s*, and repeat. The different mechanisms for repetition can have very different overheads, though.
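The magazine's listings aren't reproduced here, so treat the following as a sketch of what Examples 1(a) and 1(b) might look like; the names *istrlen* and *rstrlen* follow the text's convention, but the details are reconstructions.

```c
#include <stddef.h>

/* Example 1(a)-style iterative strlen: walk the pointer forward
   until the null terminator, counting characters as we go. */
size_t istrlen(const char *s)
{
    size_t n = 0;
    while (*s) {
        s++;
        n++;
    }
    return n;
}

/* Example 1(b)-style recursive strlen: a null string has length 0;
   otherwise the length is one more than the length of the string
   without its first character. */
size_t rstrlen(const char *s)
{
    if (*s == '\0')
        return 0;
    return 1 + rstrlen(s + 1);
}
```

Both versions test one character, advance one pointer, and repeat; only the repetition mechanism differs.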

**Exercise 1.** Estimate the run times of the two *strlen* functions on your machine. Assume that the input is a very long string. How will the run times vary across hardware, compilers, and optimization level?

The first two lines of Table 1 show the run time of the two functions across three 200-MHz machines, three different compilers, and two extremes of optimization. The first entry on the first line says that on a Pentium Pro with no optimization, the *istrlen* function takes about 30*n* nanoseconds on an input of length *n*; the second entry says that the optimized version is twice as fast. The two rows together show that recursion is expensive: from a factor of four on an (optimized) Pentium Pro to a whopping factor of 29 on an (optimized) UltraSPARC. Caveat hackor!

The next experiment (see Example 2(a)) implements the *strcmp* function that compares two strings. The recursive version (Example 2(b)) peels off the first character of each before recurring.
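Again the original listings aren't shown here; a plausible reconstruction of the two *strcmp* variants, under the usual convention of returning zero for equal strings and the character difference otherwise, is:

```c
/* Example 2(a)-style iterative strcmp: advance both pointers in
   step until the characters differ or the strings end together. */
int istrcmp(const char *s, const char *t)
{
    for (; *s == *t; s++, t++)
        if (*s == '\0')
            return 0;
    return *s - *t;
}

/* Example 2(b)-style recursive strcmp: compare the first characters,
   then recur on the remainders of both strings.  The recursive call
   is the last thing the function does (tail recursion). */
int rstrcmp(const char *s, const char *t)
{
    if (*s != *t)
        return *s - *t;
    if (*s == '\0')
        return 0;
    return rstrcmp(s + 1, t + 1);
}
```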

The third and fourth lines of Table 1 give the coefficient in nanoseconds for comparing two equal strings of length *n*. That data shows that recursion without optimization is more expensive than iteration by about a factor of five. With the highest level of optimization, though, recursion has exactly the same cost as iteration. The clever compilers noticed that the recursion is always the final call in the function (usually called "tail recursion"), and transformed it to iteration.

When I first implemented the recursive length function, I used C's "?:" conditional expression to yield more succinct (but not necessarily clearer) code; see Example 2(c).
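A conditional-expression version in the spirit of Example 2(c) might read as follows (a sketch; the name *rstrlen2* is assumed here):

```c
#include <stddef.h>

/* The whole function body collapses into one ?: expression;
   logically identical to the if/return form of the recursion. */
size_t rstrlen2(const char *s)
{
    return *s == '\0' ? 0 : 1 + rstrlen2(s + 1);
}
```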

**Exercise 2.** Will using a conditional expression change the run time of the program?

I had assumed the answer was no. The final line in Table 1 shows that on the Pentium, the times are similar. On the other systems, though, the difference is enormous. I had originally written most of my recursive functions using conditional expressions; I rewrote the experiments to avoid this vexing anomaly.

All of the experiments in this column (and many others not reported here) are contained in the 300-line C program called "recit.c" (for recursion and iteration), which is available electronically; see "Resource Center," page 3. After compiling the program, run it with the single parameter *n* (500 is a fine starting value). The program produces a single column of Table 1 (reflecting your machine and the optimization level you used on your compiler). The program displays the raw clicks for each operation and the computed coefficient in nanoseconds; double the value of *n* until the clicks are smooth and the coefficients converge.

**Exercise 3.** Estimate the cost on your system of the string functions *strlen* and *strcmp*. After writing down your guesses (no cheating!), download the program and run it on your machine, then experiment with different levels of optimization.

The simple functions you've seen so far have illustrated some of the costs of recursion. They show little of its expressive power, though: The problems are easily solved iteratively. I'll move on now to a data structure that needs recursion.

Each node in a binary search tree contains an integer value and two pointers to subtrees; see Example 3(a). The nodes in the *lo* subtree have values less than *val*, while those in the *hi* subtree have values greater than or equal to *val*. The root of the tree is *Tptr root;*. The recursive search function returns 1 if the value *t* is in the tree and (for similarity to the array search) -1 if the value is not present; see Example 3(b). The iterative version (Example 3(c)) is similar.
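Since Examples 3(a) through 3(c) aren't reproduced here, the following sketch shows one way the node layout and the two search functions could look; the *newnode* helper is added only to keep the sketch self-contained.

```c
#include <stdlib.h>

/* Example 3(a)-style node: lo holds values < val, hi holds values >= val. */
typedef struct Tnode *Tptr;
typedef struct Tnode {
    int  val;
    Tptr lo, hi;
} Tnode;

Tptr root;

static Tptr newnode(int t)          /* helper, not in the original */
{
    Tptr p = malloc(sizeof(Tnode));
    p->val = t;
    p->lo = p->hi = NULL;
    return p;
}

/* Example 3(b)-style recursive search: 1 if found, -1 if absent. */
int rsearch(Tptr p, int t)
{
    if (p == NULL)
        return -1;
    if (t < p->val)
        return rsearch(p->lo, t);
    if (t > p->val)
        return rsearch(p->hi, t);
    return 1;
}

/* Example 3(c)-style iterative search: same tests, loop instead of calls. */
int isearch(Tptr p, int t)
{
    while (p != NULL) {
        if (t < p->val)
            p = p->lo;
        else if (t > p->val)
            p = p->hi;
        else
            return 1;
    }
    return -1;
}
```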

The first two lines of Table 2 are similar to other functions that employ tail recursion. (The nodes of the tree were inserted in increasing order to form a degenerate tree that requires linear time to search; the entries in the table are still the coefficients in nanoseconds.)

The glory of recursion becomes evident in the insertion function, which is originally called as *root = rinsert(root, t);*. If *p* is nonnull, the element is inserted in the correct branch (ties go high). When *p* is null, it makes a new node, as shown in Example 3(d). This recursion gracefully solves the problem of remembering whether to store the new subtree into a *lo*, *hi*, or *root* pointer. To appreciate that service, try doing the job yourself.
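In the spirit of Example 3(d), the recursive insert might look like this (a reconstruction; the *Tnode* layout is repeated here so the sketch stands alone):

```c
#include <stdlib.h>

typedef struct Tnode *Tptr;
typedef struct Tnode { int val; Tptr lo, hi; } Tnode;

Tptr root;

/* Example 3(d)-style recursive insert: a null pointer becomes a new
   node; otherwise recur and store the returned subtree back into the
   proper field.  Ties go high.  Called as root = rinsert(root, t); */
Tptr rinsert(Tptr p, int t)
{
    if (p == NULL) {
        p = malloc(sizeof(Tnode));
        p->val = t;
        p->lo = p->hi = NULL;
    } else if (t < p->val)
        p->lo = rinsert(p->lo, t);
    else
        p->hi = rinsert(p->hi, t);
    return p;
}
```

Returning the (possibly new) subtree and assigning it back is what spares the caller from tracking whether the result belongs in a *lo*, *hi*, or *root* pointer.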

**Exercise 4.** Write an iterative function to insert a new node into a binary search tree.

Example 4, the simplest program I could come up with for this task, is faster than the recursive version, but I find it much less clear. The recursive version ran the first time; the iterative version had several bugs.
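Example 4 itself isn't reproduced here; one plausible iterative insert, which must explicitly remember where to hang the new node, is sketched below (the *Tnode* layout is repeated for self-containment):

```c
#include <stdlib.h>

typedef struct Tnode *Tptr;
typedef struct Tnode { int val; Tptr lo, hi; } Tnode;

Tptr root;

/* Iterative insert in the spirit of Example 4: walk down until the
   child we want is null, then store the new node there.  The empty
   tree and the general case need separate code, which is exactly
   the clutter the recursive version avoids.  Ties go high. */
void iinsert(int t)
{
    Tptr p = malloc(sizeof(Tnode));
    p->val = t;
    p->lo = p->hi = NULL;
    if (root == NULL) {
        root = p;
        return;
    }
    for (Tptr q = root; ; ) {
        if (t < q->val) {
            if (q->lo == NULL) { q->lo = p; return; }
            q = q->lo;
        } else {
            if (q->hi == NULL) { q->hi = p; return; }
            q = q->hi;
        }
    }
}
```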

In situations like this, an old C trick combines the best of both worlds.

**Exercise 5.** Use a pointer to a pointer to keep track of where to store the pointer to the new node.

As a final example, I'll turn to the problem of traversing a binary tree. I'll just scratch the surface of this fascinating topic. If you're interested in more detail, read Donald Knuth's classic "Structured Programming With goto Statements" (*Computing Surveys 6*, 4, December 1974, pp. 261-301). The rest of this section is a brief sketch of Knuth's study.

Some traversals report the values in sorted order; others verify that the tree is sorted. To keep our traversal simple, we'll just count the nodes in the tree. Example 5(a) is a recursive function for the job.

Since you know that recursion is expensive, you can reduce the number of calls from 2*n* to *n* by guarding the recursive call; see Example 5(b). The original *rcount* first calls the function and then tests whether *p* is null; *rcount2* first tests for null, and then makes the call. For one extra *if* statement, you get a speedup of between six and 26 percent.
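Sketches of the two counting functions, reconstructed from the descriptions of Examples 5(a) and 5(b) (the *newnode* helper is added only for self-containment):

```c
#include <stdlib.h>

typedef struct Tnode *Tptr;
typedef struct Tnode { int val; Tptr lo, hi; } Tnode;

static Tptr newnode(int t)          /* helper, not in the original */
{
    Tptr p = malloc(sizeof(Tnode));
    p->val = t;
    p->lo = p->hi = NULL;
    return p;
}

/* Example 5(a)-style count: recur unconditionally; the null test
   happens inside the call, so a tree of n nodes makes about 2n calls. */
int rcount(Tptr p)
{
    if (p == NULL)
        return 0;
    return 1 + rcount(p->lo) + rcount(p->hi);
}

/* Example 5(b)-style count: guard each recursive call, roughly
   halving the number of calls.  Note that it assumes a nonnull
   argument. */
int rcount2(Tptr p)
{
    int n = 1;
    if (p->lo)
        n += rcount2(p->lo);
    if (p->hi)
        n += rcount2(p->hi);
    return n;
}
```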

**Exercise 6.** Function *rcount2* is faster than function *rcount*, but it is potentially buggy. Explain what the problem is and how to avoid it.

The final call in *rcount2* is tail recursive. You can therefore replace it with an assignment and a *while* loop, as in Example 6(a). The systems usually found this tail recursion on their own; converting it by hand gave a 15 percent speedup on just one system.
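Applying that transformation by hand might yield something like this sketch of Example 6(a); only the *lo*-subtree call remains recursive.

```c
#include <stdlib.h>

typedef struct Tnode *Tptr;
typedef struct Tnode { int val; Tptr lo, hi; } Tnode;

static Tptr newnode(int t)          /* helper, not in the original */
{
    Tptr p = malloc(sizeof(Tnode));
    p->val = t;
    p->lo = p->hi = NULL;
    return p;
}

/* Example 6(a)-style count: the tail call on the hi subtree is
   replaced by an assignment and a loop.  As a bonus, this version
   handles a null argument safely. */
int rcount3(Tptr p)
{
    int n = 0;
    while (p) {
        n++;
        if (p->lo)
            n += rcount3(p->lo);
        p = p->hi;
    }
    return n;
}
```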

Knuth uses several straightforward transformations on *rcount3* to remove the first recursion. The resulting *icount* function is shown in Example 6(b). It uses its own stack. I found this version much more difficult to understand, so I was delighted that it was never significantly faster than the (singly) recursive function.
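A fully iterative count in the spirit of Example 6(b) might look like the sketch below, which simulates the remaining recursion with an explicit stack; the fixed stack size is an assumption of this reconstruction.

```c
#include <stdlib.h>

typedef struct Tnode *Tptr;
typedef struct Tnode { int val; Tptr lo, hi; } Tnode;

static Tptr newnode(int t)          /* helper, not in the original */
{
    Tptr p = malloc(sizeof(Tnode));
    p->val = t;
    p->lo = p->hi = NULL;
    return p;
}

#define MAXDEPTH 10000   /* must exceed the tree's height */

/* Example 6(b)-style count: walk down hi links in a loop, pushing
   each nonnull lo link onto a handwritten stack to visit later. */
int icount(Tptr p)
{
    Tptr stack[MAXDEPTH];
    int top = 0;
    int n = 0;

    for (;;) {
        while (p) {
            n++;
            if (p->lo)
                stack[top++] = p->lo;
            p = p->hi;
        }
        if (top == 0)
            return n;
        p = stack[--top];
    }
}
```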

These little experiments have illustrated some big truths about recursion.

- Recursion is expressive. Some of the recursive tree functions are much clearer than their iterative cousins.
- Recursion can be very expensive.
- Tail recursion is sometimes cheap. Smart compilers convert it to iteration.
- Tail recursion is sometimes very expensive. Converting it by hand to iteration can be very effective.
- Guarding recursive calls can be effective.
- Handwritten stacks require lots of code yet yield little (if any) run-time improvement. (Guy Steele said it best: "If the programmer can simulate a construct faster than the construct itself, then the compiler writer has blown it badly.")
- Optimization is usually effective. Enabling compiler optimization can make some little functions an order of magnitude faster. Then again, it makes some functions a bit slower.
- Timing is hard. The factor of 20 in a conditional expression is just one of many mines in this very large field.

The text box, entitled "A Real Program," shows how these principles were applied to speed up search algorithms for a new data structure.

The following are solutions for a couple of the exercises:

- Exercise 5 -- Function *iinsertpp* uses *p* as a pointer to the pointer that is assigned a new node; see Listing One. Note that it is almost as fast as the iterative version.
- Exercise 6 -- You must guard the first call: *if (root) rcount2(root);*.

**DDJ**

**Listing One**

```c
void iinsertpp(int t)
{
    Tptr *p = &root;

    while (*p) {
        if (t < (*p)->val)
            p = &((*p)->lo);
        else
            p = &((*p)->hi);
    }
    *p = (Tptr) malloc(sizeof(Tnode));
    (*p)->lo = (*p)->hi = 0;
    (*p)->val = t;
}
```