I once wrote a program that created a temporary user account in the Windows Active Directory. Everything worked beautifully until one customer reported my program failed with the error message, "The server is unwilling to process the request." Talk about a rude server! But I was never able to reproduce the bug on my computers—it worked 100 percent of the time for me.
I quickly discovered which Windows API was causing the error, but I couldn't figure out why. The customer assured me he could manually create user accounts with no problems, and I spent a long time trying to track down why his computer was any different from mine. The customer said he couldn't think of any obvious configuration differences. But finally, after much searching, I discovered his computer had a Windows Password Policy, and then everything fell into place.
A Windows Password Policy ensures every user's password is at least a certain length, or contains at least a certain number of numeric characters. Our application created randomly generated passwords with two non-alphanumeric characters, but the customer's Password Policy required at least three non-alphanumeric characters. That's why the application failed to create the accounts, and the unusual "The server is unwilling to process the request" message was merely a red herring. From the customer's point of view, password policies aren't unusual, which was why he didn't mention he had one. This bug was trivial to fix, but only after my coworkers and I spent days trying to discover the cause.
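To make the mismatch concrete, here is a minimal sketch in Python of a password generator that takes the policy as an input instead of hard-coding two non-alphanumeric characters. It is not the original program, and it does not call any Windows API. The `PasswordPolicy` class, its fields, and the `SYMBOLS` set are hypothetical names invented for this illustration.

```python
import secrets
import string
from dataclasses import dataclass


@dataclass(frozen=True)
class PasswordPolicy:
    # Hypothetical stand-in for a site's password policy (not a Windows type).
    min_length: int = 8
    min_non_alphanumeric: int = 0


# Assumed symbol set for this sketch; a real policy may accept other characters.
SYMBOLS = "!@#$%^&*()-_=+"
ALPHANUMERIC = string.ascii_letters + string.digits


def count_non_alphanumeric(password: str) -> int:
    """Count the characters that are neither letters nor digits."""
    return sum(1 for ch in password if not ch.isalnum())


def satisfies(password: str, policy: PasswordPolicy) -> bool:
    """Return True if the password meets both requirements of the policy."""
    return (
        len(password) >= policy.min_length
        and count_non_alphanumeric(password) >= policy.min_non_alphanumeric
    )


def generate_password(policy: PasswordPolicy, length: int = 12) -> str:
    """Generate a random password that meets the given policy."""
    # Never produce a password shorter than the policy allows.
    length = max(length, policy.min_length, policy.min_non_alphanumeric)
    # Start with exactly as many symbols as the policy demands...
    chars = [secrets.choice(SYMBOLS) for _ in range(policy.min_non_alphanumeric)]
    # ...then fill the remaining positions with letters and digits.
    chars += [secrets.choice(ALPHANUMERIC) for _ in range(length - len(chars))]
    # Shuffle so the symbols do not always sit at the front.
    secrets.SystemRandom().shuffle(chars)
    return "".join(chars)
```

With a policy such as `PasswordPolicy(min_length=10, min_non_alphanumeric=3)`, a password containing only two symbols fails `satisfies`, which is the situation the customer's server was in. Passing the policy in as a parameter is only one possible design; the point is that the requirement becomes visible in the code rather than buried in an assumption.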
Incidentally, this problem turned out to be a common bug. That customer later told me he had nearly switched to my company's chief competitor—until he tried a demo version of their software and discovered their version had the exact same problem as ours did, because they made the exact same mistake!
Even in the best of cases where the customer is willing and able to provide information about the bug, you'll find it's still much more difficult to track down bugs on the customer's machine than on your own. This leads to one inescapable truth about debugging, described in the following tip.
Given sufficient time, you will eventually be able to fix any bug that you can consistently reproduce. But if you cannot reproduce the problem, then no guarantees are possible.
Anytime you hear about a customer bug, your very first step must be to get all the information needed to reproduce it on your own computer. After all, a significant portion of reported bugs turn out to be mere user error, and you'll never be able to fix those bugs no matter how hard you try. But once you can reproduce the bug, you're in the home stretch. You've eliminated the possibility of user error. You no longer have to rely on the customer's description of the problem, which may be incomplete or inaccurate. You can run as many tests as you want without having to annoy the customer by asking her to do it for you (and you don't have to wait overnight for a response, either). You can use your debugger to step through the code.
But the original bug report that the customer filed usually won't contain enough information for you to reproduce the problem. Usually there is a crucial step the customer failed to mention. Some customers will even give you wildly conflicting reports of the repro steps for the bug. Some will be angry about the bug, and will not be willing to run the quick test you need to diagnose the system. You will often need to drag the information out of them.
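One way to make that dragging more systematic is to keep a checklist of what a report must contain before you try to reproduce it. The following Python sketch is only an illustration of the idea. The `BugReport` class, its field names, and the wording of the questions are all hypothetical and would need adapting to your own product.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class BugReport:
    # Every field name here is hypothetical, chosen only for illustration.
    summary: str
    exact_error_message: str = ""
    repro_steps: List[str] = field(default_factory=list)
    # Free-form notes about the customer's machine, e.g. security policies.
    environment_notes: Dict[str, str] = field(default_factory=dict)


def missing_details(report: BugReport) -> List[str]:
    """Return the follow-up questions still to be asked of the customer."""
    questions = []
    if not report.exact_error_message:
        questions.append("What is the exact text of the error message?")
    if not report.repro_steps:
        questions.append("What steps, in order, lead to the problem?")
    if not report.environment_notes:
        questions.append("Are any policies or unusual settings in effect?")
    return questions
```

A report that holds only a summary yields all three questions, while one with every field filled in yields an empty list. In the password story above, the third question is the one that would have saved days.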