
Causes of Common Threading Problems

There are three main types of threading bugs: race conditions, deadlocks, and starvation. In all three cases, the bug occurs only if the threads run in a certain order. It's a fundamental rule of multithreading that you can't predict when one thread will stop running and another will take its place, and this uncertainty is what causes most of the problems. Nearly every threading bug in the world would not have occurred if one thread had been given just a little longer to finish its task before yielding to the other thread, or if that first thread had simply waited until the second thread had acquired a crucial resource. Let's examine how these common threading errors come about.

Race Conditions

Consider the following code snippet. It doesn't do anything at all useful, but it's straightforward and the assert should always be true, right?

class MyTestClass {
  public static int x = 0;
  public static void DoUselessStuff() {
    while (true) {
      int m = x;
      x++;
      System.Diagnostics.Debug.Assert(m + 1 == x);
    }
  }
  public static void Main() {
    DoUselessStuff();
  }
}

In the normal, single-threaded world, that function works perfectly. But suppose we start two threads running that code.

using System.Threading;
class MyTestClass {
  public static int x = 0;
  public static void DoUselessStuff() {
    while (true) {
      int m = x;
      x++;
      System.Diagnostics.Debug.Assert(m + 1 == x);
    }
  }
  public static void Main() {
    Thread t1 = new Thread(new ThreadStart(DoUselessStuff));
    Thread t2 = new Thread(new ThreadStart(DoUselessStuff));
    t1.Start();
    t2.Start();
  }
}

Now it's just a matter of time until we see the assert fail. But how can that be? This function worked perfectly when we ran it in a normal, single-threaded program. Running the same code on two different threads simultaneously shouldn't make any difference; after all, isn't each thread supposed to be independent?

Here's the solution to the mystery: Both threads share the same x variable. Since both threads are modifying x at "the same time," one thread will eventually increment x at an inconvenient moment for the other thread. The exact steps look something like this:

  1. Thread #1 is running. It executes the int m = x; line and then the x++ line. Now x equals 1, and thread #1's local copy of m equals 0.

  2. Thread #1 pauses and thread #2 starts. It executes int m = x; (reading the current value, so thread #2's local m equals 1) and then x++. Now x equals 2.

  3. Thread #1 resumes and runs its assert to check whether m + 1 == x. Its m equals 0 but x now equals 2, so the assert fails. (Note that m is a local variable, so each thread has its own copy; thread #2's assert would have passed.)

This problem is called a race condition, and race conditions are among the toughest bugs you'll ever see. They don't happen all the time, only when the threads line up exactly right. In the preceding sequence, suppose thread #1 hadn't yielded control to thread #2 until after checking the assert; in that case, everything would have been fine. Since this particular example uses a while(true) loop to give disaster plenty of opportunities to strike, the problem surfaces quickly. But in real-world code, the threads often just happen to line up right most of the time, so the code usually works fine even though there's a ticking time bomb inside it. The only way to make the bug appear then is to give a demo to your CEO.

Note 

Of course, you could entirely avoid this problem by eliminating all variables that are shared across threads. If no two threads ever modify the same piece of data, then the whole issue goes away. But how useful is that? To do useful work, your threads will almost always have to communicate with each other, and that means two threads simultaneously reading/modifying the same data.

Every multithreaded program has a window of disaster where the threads could switch in exactly the right order to cause problems. So we need some way to tell the program when it's not allowed to switch threads. That process is called synchronization, and fortunately, the .NET threading API provides plenty of support for it.

Synchronization

Synchronization is how we prevent two threads from mucking with a variable at the same time. There are dozens of different synchronization methods: mutexes, semaphores, monitors, critical sections, etc., and discussing them all would fill up an entire book. But the easiest to use are C#'s lock statement and VB.NET's SyncLock statement. These keywords basically say, "Any section of code that's guarded by this lock can be accessed by only one thread at a time."

We can use the lock statement to fix the DoUselessStuff function. One wrinkle: Because DoUselessStuff is static, there is no this object to lock on, so we lock on the class's type object instead (more on choosing the lock parameter in a moment):

public static void DoUselessStuff() {
  while (true) {
    // DoUselessStuff is static, so there's no "this"; lock on the type object
    lock (typeof(MyTestClass)) {
      int m = x;
      x++;
      System.Diagnostics.Debug.Assert(m + 1 == x);
    }
  }
}

Now when the two threads reach the lock line, only one of them will be allowed to proceed. The other will sleep until the first one leaves the block, and then the second thread will get its chance to enter. Net result: Only one thread modifies the x variable at a time, and the race condition is solved! This example was extremely trivial, of course, but it does illustrate the basic idea: When two threads might try to read and write the same data at the same time, use synchronization to prevent unpredictable behavior.

The lock and SyncLock Statements

The lock and SyncLock statements require a parameter that serves as a key. That key is important: Not only does it prevent multiple threads from simultaneously running the same lines of code, it also lets us prevent multiple threads from simultaneously running completely different pieces of code that touch the same data. Suppose a certain variable is modified in both function A and function B. Using the same key for the lock in both functions ensures that no two threads are ever inside either guarded section at the same time, as the sketch below shows.
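
Here's a minimal sketch of that idea; the Account class, its methods, and the balanceKey field are made-up names, not anything from the .NET libraries. Deposit plays the role of function A and Withdraw plays the role of function B, and both lock on the same key object:

class Account {
  private int balance = 0;
  private readonly object balanceKey = new object(); // the shared key

  public void Deposit(int amount) {    // "function A"
    lock (balanceKey) {
      balance += amount;
    }
  }

  public void Withdraw(int amount) {   // "function B"
    lock (balanceKey) {
      balance -= amount;
    }
  }
}

Because both methods lock on balanceKey, a thread inside Deposit blocks any other thread from entering Withdraw (and vice versa), even though the two methods contain completely different code.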

Because the lock and SyncLock statements require a reference type as their parameter, you need to have an appropriate object handy. One common technique is to use the this keyword as your parameter for lock—if you do that consistently, it ensures that no two threads will be able to simultaneously modify any member data of your object.

However, how do you extend this trick to static methods? A static method has no this object, so at first it's not at all clear what should serve as the parameter. One good choice, and the one we used in the fixed DoUselessStuff earlier, is typeof(MyObjectClass). That's a convenient object that's easy to access from any function, and it works in static methods as well as non-statics. Be aware of the performance implications, though: Because every instance of the class shares the same type object, this prevents even two entirely separate instances from simultaneously entering their locked functions, which may be more aggressive locking than you actually require.
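
To see the two techniques side by side, here's a short sketch; the class, methods, and counters are hypothetical:

class MyObjectClass {
  private int instanceCount = 0;
  private static int globalCount = 0;

  public void BumpInstanceCount() {
    lock (this) {                    // guards this instance's member data
      instanceCount++;
    }
  }

  public static void BumpGlobalCount() {
    lock (typeof(MyObjectClass)) {   // no "this" in a static method; lock the type object
      globalCount++;
    }
  }
}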

Deadlocks

Once you start using thread synchronization, it's easy to mess yourself up in another way. Remember that while one thread is holding a lock, no other thread can proceed on any piece of code locked with the same key. So suppose you have four sections of thread-hot code: Sections 1 and 2 shouldn't be run at the same time, and sections 3 and 4 are mutually exclusive as well; but there's nothing wrong with running either 1 or 2 at the same time as either 3 or 4. You could prevent race conditions with a single lock on all four sections, but that reduces the amount of time that can be spent running multiple threads at once. It's more efficient to create two different locks, one for each pair of hot sections, as in the sketch below.
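
A sketch of that arrangement might look like this; the Worker class and the lockA/lockB key objects are made-up names:

class Worker {
  private readonly object lockA = new object(); // key for sections 1 and 2
  private readonly object lockB = new object(); // key for sections 3 and 4

  public void Section1() { lock (lockA) { /* ...hot code... */ } }
  public void Section2() { lock (lockA) { /* ...hot code... */ } }
  public void Section3() { lock (lockB) { /* ...hot code... */ } }
  public void Section4() { lock (lockB) { /* ...hot code... */ } }
}

Sections 1 and 2 exclude each other, and sections 3 and 4 exclude each other, but a thread in section 1 never blocks a thread in section 3.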

As soon as you start down the path of multiple locks, though, you risk running into an insidious problem known as deadlock. Let's say we have one thread running the following code:

class MyTestClass {
   static readonly object X = new object();
   static readonly object Y = new object();

   public void FirstThread() {
     lock (X) {
       lock (Y) {
         // ...Do some stuff
       }
     }
   }
}

And say our second thread runs this code:

class MyTestClass {
   // ...Code shown above

   public void SecondThread() {
     lock (Y) {
       lock (X) {
         // ...Do some stuff
       }
     }
   }
}

Now let's start two threads running those two functions simultaneously. If we're lucky and the first thread gets both the X and Y locks before the second thread has a chance to start, then there will be no problem at all. The first thread will finish its business and then release both locks, which allows the second thread to acquire the locks and do what it needs. But imagine this scenario:

  1. Thread #1 is running and acquires the X lock.

  2. Before thread #1 can acquire the Y lock, thread #2 starts and grabs it.

  3. Since thread #2 can't proceed without the X lock (which is currently held by thread #1), this thread goes to sleep until thread #1 releases X.

  4. Since thread #1 can't proceed without the Y lock (which is currently held by thread #2), this thread goes to sleep until thread #2 releases Y.

  5. Neither thread can ever finish until the other thread finishes first. In fact, your entire program may be stalled.

The nice thing about deadlocks is that at least they're relatively easy to detect when they happen. You may or may not notice when your code runs incorrectly, but you'll almost certainly notice when a deadlock prevents your code from running at all. Still, as with all threading bugs, the threads might line up to cause a flawless execution in one test, but a deadlock in the next. Just because the code is working doesn't mean there's not a deadlock bug waiting to happen.
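
Incidentally, the classic defense against this particular trap is to make every thread acquire multiple locks in the same order. If SecondThread took X before Y, just as FirstThread does, the deadly interleaving above couldn't occur. A minimal sketch of the repaired function:

public void SecondThread() {
  lock (X) {      // same order as FirstThread: X first, then Y
    lock (Y) {
      // ...Do some stuff
    }
  }
}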

Starvation

One of the fundamental rules of threading is that each thread must be given a chance to run. It's OK to give one thread more chances to run than another, but it must not be the case that any thread so completely monopolizes the CPU that another thread is starved and denied the chance to run at all. If your thread never gets the chance to run, then it will never finish, and that's clearly bad. Unless, of course, that's exactly what you wanted.

Under normal circumstances, no thread should be allowed to starve. But what if you had one thread doing truly urgent work (controlling your life support machine, say) and another thread doing trivial junk (such as running a Star Trek-themed screensaver)? In that situation, wouldn't you want the system to run the life support thread for as long as needed, even if that means the screensaver thread never gets a chance to run? If your computer can find time to run both the screensaver thread and the life support thread, then great! But if the computer has time to run only one, you need to make sure the system chooses the right one. .NET allows you to control this by assigning priorities to each thread.

Note 

The different thread priorities are found in the enum System.Threading.ThreadPriority. They are Lowest, BelowNormal, Normal, AboveNormal, and Highest. These priorities are meaningful only relative to each other: If every thread has AboveNormal priority, then AboveNormal has just become the new norm, right? But of course, a system with both Lowest and AboveNormal threads will give preference to the latter type.

If you want one thread to be substantially more important than another, set the priority of that thread appropriately. Beware, though, that doing this risks thread starvation. The operating system will service the higher-priority threads first, and if those threads have enough work to keep them busy full time, the lower-priority threads may never get a chance to run at all. Maybe that's what you want. Or maybe not. You need to decide whether that's acceptable and modify your code accordingly.
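
Setting a priority is a one-line assignment on the Thread object. Here's a small sketch; the UrgentWork and TrivialWork methods are placeholders for your own thread functions:

using System.Threading;

class PriorityDemo {
  static void UrgentWork()  { /* life-support-style work */ }
  static void TrivialWork() { /* screensaver-style work */ }

  static void Main() {
    Thread urgent  = new Thread(new ThreadStart(UrgentWork));
    Thread trivial = new Thread(new ThreadStart(TrivialWork));

    urgent.Priority  = ThreadPriority.AboveNormal; // favored by the scheduler
    trivial.Priority = ThreadPriority.Lowest;      // may starve while urgent stays busy

    urgent.Start();
    trivial.Start();
  }
}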

Caution 

Using the Highest thread priority is particularly dangerous because it lets your thread compete with even Windows system threads and virtually guarantees starvation of your other threads. Use this priority only if you know what you're doing.

Starvation bugs are relatively easy to recognize and fix. If one of your threads isn't running and you've already ruled out deadlock, it must be starved. To fix it, figure out which of your other threads has a higher priority, and either lower that priority or modify that thread's code to explicitly yield control once in a while; either way, each thread then gets a fair chance to run. For most developers, though, starvation bugs should be extremely rare because the need to change thread priorities should be extremely rare. Microsoft put a great deal of work into its thread scheduling algorithm, and for most applications it works well enough that there's no need to override it with custom thread priorities.
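
If lowering the priority isn't an option, the greedy thread can yield explicitly. A sketch of one conventional approach, inside a hypothetical busy loop:

using System.Threading;

class YieldDemo {
  static void BusyHighPriorityWork() {
    while (true) {
      // ...do one chunk of the urgent work...

      // Sleep(1) gives up the rest of this time slice so that even
      // lower-priority threads get a chance to run; Sleep(0) yields
      // only to threads of equal or higher priority.
      Thread.Sleep(1);
    }
  }
}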

Assume Multithreaded Programs Are Either 100 Percent Working or 0 Percent Working

With most bugs, you can assign a priority based on how often the bug occurs. For example, suppose a certain bug only happens when copying a very large file from a server with a particular version of a particular video driver at the same time as someone unplugs the printer while an archaic version of Microsoft Word is trying to print Sanskrit characters in bold. That may be a legitimate bug, but in the real world, it'll occur so rarely that it's not worth your effort to fix it. Concentrate on fixing bugs that happen more often instead—bugs that actually make a difference to the user.

It's tempting to use the same rationalization with multithreaded programs. Suppose a deadlock happened once on one particular server, but never occurred again. Using the same logic as before, you might conclude this deadlock is too rare to bother fixing. But hold on! The first example was rare because the necessary configuration was rare. People who don't have that exact configuration will never hit the bug, no matter what. On the other hand, the deadlock happens whenever one thread yields to another at exactly the right moment. It may be rare on your machine, but it could potentially happen on any machine. Just because you can't reproduce it doesn't mean it won't show up again later.

I've seen many multithreaded programs get into serious trouble because the programmers got the threads working "most of the time." Once in a while, the program would freeze up, but it occurred seldom enough that the programmers just restarted the program and carried on. The programmers hoped the bug was rare enough to be deferred. Unfortunately, the only reason a threading problem is rare is just dumb luck. Get a slightly faster or slightly slower machine and the bug will suddenly happen all the time because the threads start lining up differently.

Just because a threading problem is rare on your machine doesn't mean it will be rare on everyone else's. "Rare" threading problems are ticking time bombs that may go off every day on your customers' machines, so prioritize thread-related bugs accordingly.

