2) Never Forget the Goal Is to Make the Program Work - Fixing the Bug Is the Means, Not the End

Imagine you're building a new word processor. You discover the program crashes when printing a document. An examination of the crash dump logs (see Chapter 5) indicates the problem is occurring after a certain OS library function returns incorrect data. Maybe you're passing bad arguments to the function. Maybe the function has a bug. Maybe the function is fine, but you're using the wrong version of the library. Who knows? All you know is that the library function returns bad data when you call it. Anyway, your boss tells you to fix the problem. Now, what are you trying to do? What is your goal?

If you answered, "Figure out the bug with that library function," then you're missing the big picture. That isn't what your boss told you to do. You are trying to make the print feature work—you don't care about the low-level goal of fixing the library function. Granted, nine times out of ten, fixing that library function is your best bet at resolving the printing bug. But suppose the OS also defined some other function that does pretty much the same thing, except that it works. Rather than debugging the function that doesn't work, shouldn't you at least consider just changing your code to call the function that does work? Then the print feature would be fixed (which is all you really care about anyway), and you could move on to a different task.

Or maybe the call to the library function is merely an optimization, and maybe you're willing to give up a little performance in exchange for making the feature work. Or maybe that function is needed only for a seldom-used advanced printing feature (such as color printing) and the project manager might decide it's better to remove the color printing feature (thereby making this bug go away) so you can have time to fix the other 1,038 bugs in the product rather than potentially spending days on this one, relatively unimportant, issue. In all of these cases, looking at the big picture provides an easier way of getting the print function to work than limiting your focus to fixing the library function.

Don't get me wrong—I'm not saying that whenever code is broken you should throw it away and find a different way of doing things. Figuring out the problem with the original library function is usually the fastest, best solution. But the point is that if fixing the problem starts to involve really nasty details, then instead of spending days solving those details, look for alternative approaches that avoid those details entirely. You might not find any, but you should always at least be on the lookout for them. Don't get so drawn down into the details of the bug that you overlook easier paths.

Can You Really Cut a Feature to Hide a Bug?

Many developers feel dirty when asked to hide a bug by removing a feature. "It's a bug! We need to fix it! We can't just sweep it under the rug! Besides, how will customers feel when we remove this great feature just because we're too lazy to fix the bug?" Certainly, decisions like this shouldn't be made lightly, or else every feature would be removed as soon as a bug shows up. Nor should they be made by any one person; the entire product team needs to agree before removing a feature. But under particular circumstances, removing the feature can be the right thing to do.

When you're behind schedule and still have ten new features to write, you'd defer some of those features to a later version so that you have time to do a good job on the remaining ones, right? The same argument applies to bugs. When one bug looks like it might take a long time to fix, perhaps it would be better to use that time to fix a dozen easier bugs instead, even if it means cutting a feature. Better to ship five completely working features than six half-working features! Besides, if the feature is new to this version, then customers will never even know that you cut the feature at the last minute.

On the other hand, it's much harder to justify cutting a feature that customers have already seen in the previous version of an application. As an example, Visual Basic 6 supported a wonderful Edit-And-Continue feature that allowed you to fix errors while debugging without restarting the program. This feature disappeared in both C# and VB .NET because of some thorny last-minute implementation issues (although Microsoft has indicated Edit-And-Continue may return in a future service pack). Customers have grudgingly accepted this because .NET provides so many other advantages, but there was still a huge outcry over this lost feature.

When Goals Are Dependent on Subgoals, Which Are Dependent on Even More Subgoals…

I once had to write an in-house utility for Microsoft Exchange 2000 that would programmatically create a rule to move incoming e-mail to the Saved Items mail folder. Exchange defines several APIs for doing this. But the catch is that setting up the rule object involves some binary keys that are slightly different depending on whether the code is running under the security credentials of the folder's owner or not. I found that to make the rule work, I needed to set up the binary key as if it were created by the folder's owner.

OK, no problem—all I had to do to solve that was make my code programmatically log on as the folder owner. And this worked! Most of the time. Except due to a flaw in Exchange 2000, I couldn't programmatically give myself permissions to log on as the folder owner, even if I already had the necessary administrator rights to assign myself the permissions in the management GUI. So my code wasn't always able to log on as the owner of the folder, and therefore it wasn't able to create the binary keys of the rule in the right format.

Now what would you do? It makes sense to look for a workaround for the permissions problem. Since I was able to manually grant myself permissions through the management GUI, there must have been some way to do it programmatically with the APIs. Even if this API couldn't do it, then presumably some other API could. And even if there were no such API, presumably Microsoft would someday issue a hotfix for the bug. At this point, I was annoyed and frustrated, but figuring out the logon permissions problem was what I had to do to fix the primary bug, so I stuck with it.

That was the trap I fell into, and I wasted two whole days confirming that the permissions problem really was a bug with Microsoft Exchange and trying to find a workaround. I wasn't directly working on what I really cared about (programmatically creating rules). I was focused on the subgoal: In order to create the rule, I needed to get the "owner" binary key; and in order to do that, I needed to log on with a different account; and in order to do that, I needed to give myself permissions; and in order to do that, I needed to work around this bug in the API. I was working on something four levels removed from what I really wanted to accomplish!

When the Going Gets Tough, the Tough Look for an Easier Way

Focusing on something several levels away from your primary goal isn't necessarily wrong. The devil really is in the details, so you will often have to implement many levels of subgoals. That's fine. The problem comes when you get so focused on a subgoal that you forget to look for alternative solutions. Anytime something becomes dramatically harder than you expected, step back and pause for a moment. Challenge your assumptions and ask if what you're about to do is actually necessary. Look again for an easier way. Don't make the mistake I did. Don't focus on the subgoal. Focus on the big picture. If the subgoal is hard, ask whether there is a different method that would avoid this difficult subgoal entirely.

In my case, it finally occurred to me to compare the "owner" and "non-owner" binary formats in the debugger's memory view window. Microsoft did not document these formats, but I discovered the only difference between them was that one format had 2 bytes set to 0x0c, while the other format had those same bytes set to 0x09. I realized I didn't need to solve the bug with the permissions problem. Instead of all the headaches that came with fetching the owner binary key by logging on as the owner of the folder, I could just take the non-owner key that I already had and edit those 2 bytes of memory to 0x09.
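As a rough illustration of that compare-and-patch step, here is a minimal sketch in Python. It is not the code from the original utility: the function names are hypothetical, and the sample buffers are made-up stand-ins rather than the real (undocumented) Exchange rule format. Only the 0x0c and 0x09 values come from the story above.

```python
def diff_offsets(a: bytes, b: bytes) -> list[int]:
    """Return the offsets at which two equal-length buffers differ."""
    if len(a) != len(b):
        raise ValueError("buffers must be the same length to compare byte by byte")
    return [i for i, (x, y) in enumerate(zip(a, b)) if x != y]


def patch_bytes(buf: bytes, offsets: list[int], expected: int, replacement: int) -> bytes:
    """Return a copy of buf with `replacement` written at each offset.

    Refuses to patch if an offset does not currently hold `expected`, so an
    unexpected layout fails loudly instead of being silently corrupted.
    """
    patched = bytearray(buf)
    for off in offsets:
        if patched[off] != expected:
            raise ValueError(
                f"offset {off}: expected {expected:#04x}, found {patched[off]:#04x}"
            )
        patched[off] = replacement
    return bytes(patched)


# Made-up sample data (assumption): NOT the real Exchange binary key layout.
non_owner_key = bytes([0x01, 0x0C, 0x00, 0x42, 0x0C, 0xFF])
owner_key = bytes([0x01, 0x09, 0x00, 0x42, 0x09, 0xFF])

# Step 1: find where a known-good owner key differs from the non-owner key.
offsets = diff_offsets(non_owner_key, owner_key)

# Step 2: turn a non-owner key into an owner-style key by editing those bytes.
patched_key = patch_bytes(non_owner_key, offsets, expected=0x0C, replacement=0x09)
```

The check against the expected value is a deliberate choice. When you are patching a layout you reverse-engineered, you want the code to stop with an error if the bytes are not what you observed, rather than quietly writing into a format that has changed.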

I know what you're thinking: Bit-banging the memory directly—isn't that far more low-level than most of us want to get? Didn't that require using the unsafe keyword? And reverse-engineering an undocumented memory layout—didn't that run the risk that Microsoft might change the layout in the next version? OK, sure. Those are valid considerations, and maybe the decision that was appropriate for this in-house utility wouldn't be appropriate for a shipping product.

But my hack worked, and it took only a few minutes, whereas the "right" way still wasn't working after two straight days. The point isn't the specific solution I chose here. The point is that by stepping back and looking at the big picture, I belatedly realized there was a much easier, alternative means to accomplish my primary goal. In the back of your mind, you should constantly search for alternative means, too. The alternative means may just be a short-term hack to buy yourself time to fix the problem in the right way, or it may be a permanent solution; only you can decide. But you should at least consider all options.

Never get so hung up on a detail that you overlook the fact that an alternative path would let you bypass this detail entirely.