I l@ve RuBoard

6.1 Introduction

Credit: Greg Wilson, Baltimore Technologies

Thirty years ago, in his classic The Mythical Man-Month: Essays on Software Engineering (Addison-Wesley), Fred Brooks drew a distinction between accidental and intrinsic complexity. Languages such as English and C++, with their inconsistent rules, exceptions, and special cases, are examples of the former: they make communication and programming harder than they need to be. Concurrency, on the other hand, is a prime example of the latter. Most people have to struggle to keep one chain of events straight in their minds. Keeping track of two, three, or a dozen, plus all of their possible interactions, is just plain hard.

Computer scientists began studying ways of running multiple processes safely and efficiently in a single physical address space in the mid-1960s. Since then, a rich theory has been developed in which assertions about the behavior of interacting processes can be formalized and proved, and entire languages devoted to concurrent and parallel programming have been created. Foundations of Multithreaded, Parallel, and Distributed Programming, by Gregory R. Andrews (Addison-Wesley), is not only an excellent introduction to this theory, but also contains a great deal of historical information tracing the development of major ideas.

Over the past 20 years, opportunity and necessity have conspired to make concurrency a part of programmers' everyday lives. The opportunity is for greater speed, which comes from the growing availability of multiprocessor machines. In the early 1980s, these were expensive curiosities; today, many programmers have dual-processor workstations on their desks and four-way or eight-way servers in the back room. If a calculation can be broken down into independent (or nearly independent) pieces, such machines can potentially solve them two, four, or eight times faster than their uniprocessor equivalents. While there are limits to the potential gains from this approach, it works well for problems as diverse as image processing, serving HTTP requests, and recompiling multiple source files.

In today's terminology, processes run in separate logical address spaces that are protected from each other. Using concurrent processing for performance purposes, particularly in multiprocessor machines, is more attractive with threads, which execute simultaneously within the same program, in the same address space, without being protected from each other. The lack of mutual protection allows lower overhead and easier and faster communication, particularly because of the shared address space. Since all threads run code from the same program, there are no special security risks caused by a lack of mutual protection, any more than there are in a single-threaded program. Thus, concurrency used for performance purposes is most often focused on adding threads to a single program.

However, adding threads to a Python program to speed it up is often not a successful strategy. The reason for this is the Global Interpreter Lock (GIL), which protects Python's internal data structures. This lock must be held by a thread before it can safely access Python objects. Without the lock, even simple operations (such as incrementing an integer) could fail.

Therefore, only the thread with the GIL can manipulate Python objects or call Python/C API functions. To make life easier for programmers, the interpreter releases and reacquires the lock every 10 bytecode instructions (a value that can be changed using sys.setcheckinterval). The lock is also released and reacquired around I/O operations, such as reading or writing a file, so that other threads can run while the thread that requests the I/O is waiting for the I/O operation to complete. However, effective performance-boosting exploitation of multiple processors from multiple pure-Python threads of the same process is just not in the cards. Unless the CPU performance bottlenecks in your Python application are in C-coded extensions that release the GIL, you will not observe substantial performance increases by moving your multithreaded application to a multiprocessor machine.

The necessity for concurrent programming is largely because of the ever-growing importance of GUIs and network applications. Graphical interfaces often need to appear to be doing several things at once, such as displaying images while scrolling ads across the bottom of the screen. While it is possible to do the necessary interleaving manually, it is much simpler to code each operation on its own and let the underlying operating system decide on a concrete order of operations. Similarly, network applications often have to listen on several sockets at once or send data on one channel while receiving data on another.

Uniting these two types of applications is the fact that a GUI can't know when the user will press a key or move the mouse, and an HTTP server can't know which datagram will arrive next. Handling each stream of events with a separate control thread is often the simplest way to cope with this unpredictability, even on single-processor machines, and when high throughput is not an overriding concern. Event-driven programming can often be used in these kinds of applications as well, and Python frameworks such as Medusa and asyncore are proof that this approach often delivers excellent performance with complexity that, while different from that inherent in multithreading, is not necessarily larger.

The standard Python library allows programmers to approach multithreaded programming at two different levels. The core module, thread, is a thin wrapper around the basic primitives that any threading library must provide. Three of these primitives are used to create, identify, and end threads; others are used to create, test, acquire, and release simple mutual-exclusion locks (or binary semaphores). In general, programmers should avoid using these primitives directly, and should instead use the tools included in the higher-level threading module, which is substantially more programmer-friendly and has similar performance characteristics.

The most important elements of the threading module are classes that represent threads and various high-level synchronization constructs. The Thread class represents a separate control thread; it can be told what to do by passing a callable object to its constructor or by overriding its run method. One thread can start another by calling its start method or wait for it to complete by calling join. Python also supports daemon threads, which do background processing until all of the nondaemon threads in the program exit and shut themselves down automatically.

The synchronization constructs in the threading module include locks, reentrant locks (which a single thread can safely relock many times without deadlocking), counting semaphores, conditions, and events. Events can be used by one thread to signal others that something interesting has happened (e.g., that a new item has been added to a queue, or that it is now safe for the next thread to modify a shared data structure). The documentation that comes with Python describes each of these classes.

The relatively low number of recipes in this chapter, compared to others in this cookbook, reflects both Python's focus on programmer productivity (rather than absolute performance) and the degree to which other packages (such as httplib and wxWindows) hide the unpleasant details of concurrency in important application areas. This also reflects many Python programmers' tendencies to look for the simplest way to solve any particular problem, which complex threading rarely is.

However, this chapter's brevity may also reflect the Python community's underappreciation of the potential that simple threading has, when used appropriately, to simplify a programmer's life. The Queue module in particular supplies a delightfully self-contained synchronization and cooperation structure that can provide all the interthread supervision services you need. Consider a typical program, which accepts requests from a GUI (or from the network) and, as a result of such requests, will often find itself faced with a substantial chunk of work that might take so long to perform all at once that the program may appear unresponsive to the GUI (or network).

In a purely event-driven architecture, it may take considerable effort on the programmer's part to slice up the chunk into slices of work thin enough so each can be performed in idle time, without ever giving the appearance of unresponsiveness. Then, just a dash of multithreading can help considerably. The main thread pushes the substantial chunk of background work onto a dedicated Queue, then goes back to its task of making the program's interface appear responsive at all times.

At the other end of the Queue, a pool of daemonic worker threads await, each ready to peel a work request off the Queue and run it straight through. This kind of overall architecture combines event-driven and multithreaded approaches in the overarching ideal of simplicity, and is thus maximally Pythonic, even though you may need just a little bit more work if the result of a worker thread's efforts must be presented again to the main thread (via another Queue, of course), which is normally the case with GUIs. If you're willing to cheat just a little and use polling for the mostly event-driven main thread to access the result Queue back from the daemonic worker threads, see Recipe 9.7 to get an idea of how simple that little bit of work can be.

I l@ve RuBoard