
Learning from Past Mistakes

Another project I worked on had one section of troublesome legacy code. Practically every month, someone would report another bug against that area, and we'd hear about it and say, "Oh, that again." The code was a mess. Five different people had owned it over a 2-year period, and the nature of the code made it very difficult to write automated unit tests, which meant we had no way to verify that all the past bugs were still fixed. Every time we fixed one bug, another appeared in its place.

One of the developers took it upon himself to rewrite that section to be more maintainable. Part of his task was to create automated unit tests so we could make sure the code worked before passing it to the quality assurance department. He wrote some great test cases, but as soon as he sent the new code over, the testers immediately started filing dozens of bugs on all the special cases he had forgotten to handle. The developer was amazed the testers were able to find so many bugs so quickly. He himself had spent nearly a week thinking up test cases and writing unit tests. How were the testers able to come up with so many additional test cases practically overnight?

What the developer didn't realize was that the testers were merely going through the defect tracking system and looking at all the bugs that had been logged against this program in the past. The tracking system indicated XYZ had been a problem before and had been fixed before—therefore, the testers knew they should check XYZ with the new version, too. That's when the developer realized he ought to be searching the tracking system, as well. He went back to the drawing board, checked the tracking system for all past bugs on this component, created unit tests for all of them, and fixed the code until it passed every one. After that, we never heard a bug report about this section from testers or customers again.
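
His regression suite isn't reproduced here, but the idea is easy to picture: one automated test per past PR, named after the tracking-system entry so that a failure immediately tells you which old bug has come back. The sketch below is only an illustration; the PR numbers, the passwords module, and the validate_password function are hypothetical stand-ins, not code from the actual project.

import unittest

# Hypothetical module under test; the real project's code isn't shown in the text.
from passwords import validate_password


class PastDefectRegressionTests(unittest.TestCase):
    """One test per bug mined from the defect tracking system."""

    def test_pr_1042_rejects_empty_password(self):
        # PR 1042 (hypothetical): empty passwords were once accepted.
        self.assertFalse(validate_password(""))

    def test_pr_1187_accepts_maximum_length_password(self):
        # PR 1187 (hypothetical): passwords at the documented maximum length were rejected.
        self.assertTrue(validate_password("a" * 128))

    def test_pr_1203_handles_non_ascii_input(self):
        # PR 1203 (hypothetical): non-ASCII characters crashed the validator.
        self.assertIsNotNone(validate_password("pässwörd"))


if __name__ == "__main__":
    unittest.main()

Run a suite like this before every hand-off to the testers, and any reawakened bug shows up as a named test failure instead of another trip through the tracking system.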

But to get maximum benefit out of this system, your developers need to get in the habit of describing the fix on each non-trivial PR they close. It's tempting to merely close the PR without writing any comments on it. For simple bugs—such as forgetting to increment a loop index—that's fine. But many bugs are caused when the code fails to handle some special case (such as a password policy or an unusual set of inputs). In those cases, leave yourself a note describing exactly what the special case was and what you did to work around it. It may seem like extra work at the time, but it's like commenting your code: In 6 months, when you can't remember what was involved in this fix, you'll thank yourself.

Tip 

Some development orgs even go so far as to list on the PR which source files were changed to fix the bug. That way, all the information you could possibly ever need is listed on the PR. I don't follow this practice for every bug (though I probably ought to), but I do follow it in cases where my team might have to hotfix the PR in a few weeks or months. Then, a quick glance at the PR will remind me of exactly what was changed.

Reporting on PRs

Although statistical reports are seldom directly helpful when debugging, they can give a project leader tons of information to help focus the project scope—and they may even give the leader enough data to avoid having to take up valuable developer time with countless questions. That'll let the developers spend more time debugging and less time explaining themselves to managers. Everybody wins that way! So as a lead developer, I've often turned to the defect tracking system to provide estimates of project status.

One of the most difficult parts of a project management role is simply identifying when a program will be ready for release. No program will ever be bug free, but is this program good enough yet? Set a definition for a "must-fix" priority level (see the "Priorities" sidebar) and check the number of must-fix PRs each day. Until that count gets near zero, you know you're a long way from shipping. You'll also want to look at the number of new PRs filed each day and the number of bounce-back PRs. If those values are too high, it may indicate the developers aren't testing their own code enough. Likewise, if your testers are creating too many user-error PRs, that might indicate a problem you need to address, too.
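
Pulling those numbers doesn't require elaborate tooling. Most tracking systems can export their data, and a short script will produce the daily counts. The sketch below is just an illustration under assumptions: it imagines a CSV export named pr_export.csv with status, priority, date_opened, and resolution columns, and it treats priorities 1 and 2 as "must-fix"; substitute whatever fields and definitions your own system actually uses.

import csv
from collections import Counter
from datetime import date

MUST_FIX_PRIORITIES = {"1", "2"}  # assumption: priorities 1 and 2 are the "must-fix" levels


def summarize(export_path):
    """Tally release-readiness numbers from a hypothetical tracking-system export."""
    open_must_fix = 0
    new_by_day = Counter()
    bounce_backs = 0
    user_errors = 0

    with open(export_path, newline="") as f:
        for row in csv.DictReader(f):
            if row["status"] != "closed" and row["priority"] in MUST_FIX_PRIORITIES:
                open_must_fix += 1
            new_by_day[row["date_opened"]] += 1
            if row.get("resolution") == "reopened":
                bounce_backs += 1  # bounce-back: a fix that didn't hold
            if row.get("resolution") == "user error":
                user_errors += 1   # tester misunderstanding rather than a code defect

    today = date.today().isoformat()
    print(f"Open must-fix PRs:   {open_must_fix}")
    print(f"New PRs filed today: {new_by_day[today]}")
    print(f"Bounce-back PRs:     {bounce_backs}")
    print(f"User-error PRs:      {user_errors}")


if __name__ == "__main__":
    summarize("pr_export.csv")

Track those four numbers day after day and the trends tell you far more than any single snapshot: a must-fix count that refuses to drop, or a bounce-back rate that keeps climbing, is a warning long before the ship date arrives.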

