Winsock Programmer's FAQ: Articles

Winsock Programmer's FAQ
Section 7: Articles

Which I/O Strategy Should I Use?

by Warren Young

There are several different conventions for communicating with Winsock, and each method has distinct advantages. The question of the hour is, what are these advantages, and how does someone choose the convention that makes the most sense for their application? The choices are:

Blocking sockets - By default, a Winsock call blocks, meaning that it will not return until it has completed its task or has failed while trying.
Non-blocking sockets - Calls on non-blocking sockets return immediately, even if they cannot complete their task immediately. Although this allows the program to do other things while the network operations finish, it requires that the program repeatedly "poll" Winsock to keep apprised of its current state.
Asynchronous sockets - These are similar to non-blocking sockets in that calls on them will return immediately. The difference is that Winsock sends the program a special window message whenever something "interesting" happens.
Event objects - This is a kind of cross between non-blocking and asynchronous sockets. Instead of getting a window message when something interesting happens, Winsock signals a Win32 event object. You associate an event object with a socket using WSAEventSelect(), then block on one or more of these objects with WSAWaitForMultipleEvents(), much like blocking on non-blocking sockets with select().
Overlapped I/O - One of Winsock 2's major features is that it ties sockets into Win32's unified I/O mechanism. In particular, you can now use overlapped I/O on sockets, which is intrinsically more efficient than the above options.

Further confusing the issue are threads, because each of the above mechanisms changes in nature when used with threads.

In trying to find an answer to the "which I/O strategy" question, it becomes apparent that there are only a few major kinds of programs, and the successful ones follow the same patterns. From those patterns and practical experience (some personal and some borrowed), I have derived the following set of heuristics. None of these heuristics are absolute laws, no one isolated heuristic is sufficient, and the heuristics sometimes conflict. When two heuristics conflict, you need to decide which is more important to your application and ignore the other. However, beware of ignoring a heuristic simply because violating it does not create noticeable consequences for your program. If you get into the habit of ignoring a certain heuristic, it becomes useless.

The heuristics are ordered in terms of compatibility, then speed, and finally functionality. Compatibility is first, because if a given I/O strategy won't work on the platforms you need to support, it doesn't matter how fast or functional it is. Speed is next because performance requirements are easy to determine, and often important. Functionality is last, because once you decide the compatibility and speed issues, your choices become much more subjective.

Note: Aside from the compatibility table in Heuristic 1, this article no longer covers Windows 3.1 and Windows NT 3.5x issues, as it did in the past. Hopefully we can call these platforms really and truly dead by now.

Heuristic 1: Narrow your choices by deciding your compatibility requirements.

There are several kinds of I/O strategies mainly because of the large number of platforms involved. Winsock was created as a subset of BSD sockets, and then as new varieties of Windows arrived, Winsock was extended to take advantage of OS features.

                      Win9x   WinCE   WinNT 4/Win2K   WinNT 3.x   Win16   Unix (most)
Blocking Sockets      yes     yes     yes             yes         yes     yes
Non-blocking Sockets  yes     yes     yes             yes         yes     yes
Asynchronous Sockets  yes     no      yes             yes         yes     no
Event Objects         yes     no      yes             no          no      no
Overlapped I/O        yes¹    no      yes             no          no      no²
Threads               yes     yes     yes             yes         no      yes³

¹ The Windows 95 and 98 kernels do not support overlapped I/O, so it is emulated for sockets by the Winsock layer. This means that programs that only use overlapped I/O functionality guaranteed by the Winsock spec will run fine on Win9x. If, on the other hand, you stray into functionality that only WinNT/2000 provides, your application will fail on Win9x. One example of this is calling ReadFile() with a socket: this works fine on NT, but will fail on Win9x.
² If you only need scatter/gather I/O support, BSD sockets provides this functionality in the readv() and writev() calls. There is no standard Unix mechanism that provides similar efficiencies to Win32's overlapped I/O. Some Unixes provide the aio_*() family of functions (called asynchronous I/O, but not related to Winsock's asynchronous I/O), but this is not implemented widely at the moment.
³ The Unix world is standardizing on the pthreads library. Pthreads is roughly equivalent to Win32's thread mechanism, though of course the APIs are totally different. There are still a lot of older Unix machines out there with poor, nonstandard or nonexistent threading. If you want your code to be portable between Windows and several varieties of Unix, you probably will not be able to use threads unless your target platform list is very carefully chosen.
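The following is a minimal sketch of scatter/gather I/O with the BSD readv() and writev() calls. It assumes a connected stream socket on a POSIX system. It sends a header and a payload from two separate buffers with a single call, and receives them back into two buffers the same way. The helper names send_header_and_body() and recv_header_and_body() are hypothetical. Winsock's closest counterpart is passing an array of WSABUF structures to WSASend() or WSARecv().

```c
#include <stddef.h>
#include <sys/socket.h>   /* fd is expected to be a connected stream socket */
#include <sys/types.h>
#include <sys/uio.h>

/* Gather: send a header and a separate body buffer with one system call,
   without first copying them into a single contiguous buffer.
   Returns bytes written or -1. A real program must handle short writes. */
ssize_t send_header_and_body(int fd, const void *hdr, size_t hdr_len,
                             const void *body, size_t body_len)
{
    struct iovec iov[2];
    iov[0].iov_base = (void *)hdr;    /* writev() does not modify the data */
    iov[0].iov_len = hdr_len;
    iov[1].iov_base = (void *)body;
    iov[1].iov_len = body_len;
    return writev(fd, iov, 2);
}

/* Scatter: read incoming bytes into the header buffer first, then the body.
   Returns bytes read, 0 on orderly close, or -1 on error. */
ssize_t recv_header_and_body(int fd, void *hdr, size_t hdr_len,
                             void *body, size_t body_len)
{
    struct iovec iov[2];
    iov[0].iov_base = hdr;
    iov[0].iov_len = hdr_len;
    iov[1].iov_base = body;
    iov[1].iov_len = body_len;
    return readv(fd, iov, 2);
}
```

Both helpers return whatever the single system call managed to do. A production version would loop until all bytes are transferred.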

Heuristic 2: Avoid non-blocking sockets.

Non-blocking sockets are almost never necessary, and a good thing, too: their [lack of] performance makes them a poor architecture choice for Windows programs.

When a socket is set as non-blocking, every Winsock call on that socket will return immediately, whether it was able to do anything or not. This is useful because it lets your program do other things while the network is busy.
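This minimal sketch shows what "return immediately" means in practice. It is written against POSIX sockets so it runs on Unix. Under Winsock, the rough equivalents are:

- ioctlsocket() with FIONBIO to set the mode.
- WSAEWOULDBLOCK from WSAGetLastError() in place of EWOULDBLOCK.

The helpers set_nonblocking() and try_recv() are illustrative names, not part of any API.

```c
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Switch a descriptor to non-blocking mode. Returns 0 on success. */
int set_nonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL, 0);
    if (flags == -1)
        return -1;
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK);
}

/* Attempt a read without waiting. Returns the byte count, 0 on orderly
   close, or -1. When -1 is returned because no data has arrived yet,
   *would_block is set to 1; for any other result it is set to 0. */
ssize_t try_recv(int fd, char *buf, size_t len, int *would_block)
{
    ssize_t n = recv(fd, buf, len, 0);
    *would_block = (n == -1 && (errno == EAGAIN || errno == EWOULDBLOCK));
    return n;
}
```

When try_recv() reports that it would block, the program has to come back and call it again later. That is exactly the polling problem discussed next.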

Most programs don't have something to do all the time: they're usually waiting on user input, or the network, or some other slow thing. For this reason, Winsock provides the select() function, which blocks until something happens on one or more sockets. Without select(), you would have to add busy-loops to your program, which waste CPU time. Unfortunately, select() isn't all that efficient itself. Four of its parameters are structures that you must set up each time you call the function, and three of these require that you loop over them N times after each call, where N is the number of sockets you passed to select().
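The sketch below illustrates the per-call cost described above, using POSIX sockets so it can run on Unix. The fd_set has to be rebuilt from scratch before every select() call. Afterwards, the caller loops over every socket with FD_ISSET() to find the ones that are ready. The function wait_readable() is a hypothetical helper. Winsock's select() has the same shape, though it ignores the first parameter.

```c
#include <stddef.h>
#include <sys/select.h>
#include <sys/socket.h>
#include <sys/time.h>

/* Wait up to timeout_ms for any of the n sockets in fds[] to become
   readable. On return, ready[i] is 1 if fds[i] is readable, else 0.
   Returns the number of ready sockets, 0 on timeout, or -1 on error.
   Assumes every descriptor is below FD_SETSIZE. */
int wait_readable(const int *fds, size_t n, int *ready, long timeout_ms)
{
    fd_set readfds;
    struct timeval tv;
    int maxfd = -1;

    /* select() overwrites the set, so it is rebuilt on every call:
       one pass over all n sockets before the call... */
    FD_ZERO(&readfds);
    for (size_t i = 0; i < n; i++) {
        FD_SET(fds[i], &readfds);
        ready[i] = 0;
        if (fds[i] > maxfd)
            maxfd = fds[i];
    }

    /* Some systems modify the timeout too, so set it fresh each time. */
    tv.tv_sec = timeout_ms / 1000;
    tv.tv_usec = (timeout_ms % 1000) * 1000;

    int rc = select(maxfd + 1, &readfds, NULL, NULL, &tv);
    if (rc <= 0)
        return rc;

    /* ...and another pass over all n sockets afterwards to discover
       which ones are actually ready. */
    for (size_t i = 0; i < n; i++)
        ready[i] = FD_ISSET(fds[i], &readfds) ? 1 : 0;
    return rc;
}
```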

About the only time you should use select() is for compatibility reasons: the only non-synchronous I/O strategy on Unix and Windows CE is non-blocking sockets, and the only way to block on a non-blocking socket in these OSes is with select(). In all other cases, there are better alternatives.

Heuristic 3: Avoid asynchronous sockets in programs that must deal with high volumes of data.

Window messages are the slowest way (aside from select()) to be notified when something happens on a socket. This isn't to say that Windows message queues are inefficient, just that they're not as efficient as other methods presented below. These queues are also fairly short, so they can fill up if you are not promptly handling window messages.

The spec says Winsock will try posting notification messages until it succeeds. Yet, there are persistent reports of window messages being lost in high-traffic situations. I suspect that these are the result of bad asynch I/O code, because there are several optimizations in Microsoft's asynch I/O implementation that make it intolerant of code that doesn't obey the spec. Who's to say you won't make the same mistake these other programmers are making?

On the other hand, there are well-known applications that do handle high volumes of traffic with asynchronous sockets. I assume this is due to very tolerant code, or very well-written code. It probably also helps that these applications are dedicated servers: they mainly sit in the background doing their thing, so they don't have a lot of non-Winsock messages competing for the attention of the program's message loop code.

Heuristic 4: For high-performance servers, prefer overlapped I/O.

Of all the various I/O strategies, overlapped I/O has the highest performance. (I/O completion ports are even more efficient, but are nonstandard vis-a-vis Winsock proper, so I don't cover them in the FAQ.) With careful use of overlapped I/O (and boatloads of memory in the server!) you can support tens of thousands of connections with a single server. No other I/O strategy comes close to the scalability of overlapped I/O.

Heuristic 5: To support a moderate number of connections, consider asynchronous sockets and event objects.

If your server only has to support a moderate number of connections (say, between 100 and 1000), you may not need overlapped I/O. Overlapped I/O is not easy to program, so if you don't need its efficiencies, you can save yourself a lot of trouble by using a simpler I/O strategy.

Programmed correctly, asynchronous sockets are a reasonable choice for a dedicated server supporting a moderate number of connections. The main problem with doing this is that many servers don't have a user interface, and thus no message loop. A server without a UI using asynchronous sockets would have to create an invisible window solely to support its asynchronous sockets. If your program already has a user interface, though, asynchronous sockets can be the least painful way to add a network server feature to it.

Another reasonable choice for handling a moderate number of connections is event objects. These are very efficient in and of themselves. The main problem you run into with them is that you cannot block on more than 64 event objects at a time. To block on more, you need to create multiple threads, each of which blocks on a subset of the event objects. Before choosing this method, consider that handling 1024 sockets requires 16 threads. Any time you have many more active threads than you have processors in the system, you start causing serious performance problems. Thus, call 1024 sockets a hard practical limit.
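The thread count is plain ceiling division. Each waiting thread can hand at most 64 event handles (WSA_MAXIMUM_WAIT_EVENTS in winsock2.h) to WSAWaitForMultipleEvents().

The sketch below only shows that bookkeeping, not the waiting itself. MAX_EVENTS_PER_WAIT is a local stand-in for the Winsock constant so the code compiles anywhere. The contiguous socket-to-thread assignment is just one possible scheme.

```c
#include <stddef.h>

/* Local stand-in for WSA_MAXIMUM_WAIT_EVENTS (64). */
#define MAX_EVENTS_PER_WAIT 64

/* Number of waiting threads needed to cover n_sockets event objects. */
size_t waiter_threads_needed(size_t n_sockets)
{
    return (n_sockets + MAX_EVENTS_PER_WAIT - 1) / MAX_EVENTS_PER_WAIT;
}

/* Under contiguous partitioning, socket number i belongs to thread
   i / 64 and occupies slot i % 64 of that thread's event array. */
void assign_socket(size_t i, size_t *thread, size_t *slot)
{
    *thread = i / MAX_EVENTS_PER_WAIT;
    *slot = i % MAX_EVENTS_PER_WAIT;
}
```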

One caution: it's very easy to underestimate the number of simultaneous connections you will get on a public Internet server. It may make sense to design for massive scalability even if your estimates don't currently predict thousands of simultaneous clients. On the other hand, it's becoming clear that usable-but-weak code today always beats wonderful code next month.

Heuristic 6: Low-traffic servers can use most any I/O strategy.

For low-traffic servers, there isn't much call to be super-efficient. Some servers just don't have to support very many connections, and if you're deploying on Win9x you're already going to be limited to 100 sockets at a time. Suitable strategies for 1-100 connections are event objects, asynchronous sockets, and threads with blocking sockets.

We've covered the first two methods already, so let's consider threads with blocking sockets. This is by far the simplest way to write a server. You just have a main loop that accepts connections and spins each new connection off to its own thread, where it's handled with blocking sockets. Blocking sockets have several advantages. They are efficient, because when a thread blocks, the operating system immediately lets other threads run. Also, synchronous code is more straightforward than equivalent non-synchronous code.
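This is a minimal sketch of the thread-per-connection pattern. It uses POSIX threads and sockets rather than Win32 so that it is portable to Unix. On Windows you would use _beginthreadex() or CreateThread(), and closesocket() instead of close(). The echo behavior and the names serve_client(), send_all() and spawn_client_thread() are illustrative assumptions. They stand in for whatever protocol the server actually speaks.

```c
#include <pthread.h>
#include <stddef.h>
#include <stdlib.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Blocking send that retries until every byte is written. Returns 0 on success. */
static int send_all(int fd, const char *buf, size_t len)
{
    while (len > 0) {
        ssize_t w = send(fd, buf, len, 0);
        if (w <= 0)
            return -1;
        buf += w;
        len -= (size_t)w;
    }
    return 0;
}

/* One client, handled with plain blocking calls: read a chunk, echo it
   back, repeat until the peer closes. The control flow is linear. */
static void *serve_client(void *arg)
{
    int fd = *(int *)arg;
    free(arg);

    char buf[512];
    ssize_t n;
    while ((n = recv(fd, buf, sizeof buf, 0)) > 0)
        if (send_all(fd, buf, (size_t)n) != 0)
            break;

    close(fd);
    return NULL;
}

/* Hand a connected socket (normally from accept() in the main loop)
   to its own detached thread. Returns 0 on success. */
int spawn_client_thread(int fd)
{
    int *arg = malloc(sizeof *arg);
    if (arg == NULL)
        return -1;
    *arg = fd;

    pthread_t tid;
    if (pthread_create(&tid, NULL, serve_client, arg) != 0) {
        free(arg);
        return -1;
    }
    pthread_detach(tid);
    return 0;
}
```

serve_client() reads top to bottom like ordinary sequential code. The price is one thread, with its own stack, per connected client.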

The problem is that this method doesn't scale well at all. Recall the discussion of event objects: if the number of active threads outnumbers the number of processors in the system to a great degree, you run into efficiency problems. So, this method is only suitable for a fairly small number of connections, or a moderate number of connections that are mostly idle.

Heuristic 7: Do not block inside a user interface thread.

This heuristic sounds more like a straightforward rule of Windows programming, but I bring it up because most programs are single-threaded. In a single-threaded GUI program, any time Winsock blocks, buttons can't be pressed, menus won't pull down, scroll bars won't move, keypresses are ignored...your UI freezes.

Heuristic 8: For GUI client programs, prefer asynchronous sockets.

There are two reasons for this Heuristic:

Asynchronous sockets were designed from the start to work well with GUI programs. You already have a message loop going, and you already have window management code in the rest of the program. Adding asynchronous network I/O is about as easy as adding a dialog to your program.
All of the alternatives require at least one additional thread to handle the networking in order to satisfy the previous Heuristic. With asynchronous sockets, you can handle both the network and the UI with a single thread. Since window messages are handled one at a time in the order they arrive, everything is automatically synchronized.

Heuristic 9: Threads are rarely helpful in client programs.

When a programmer first learns about threads, he is eager to try them out in his own programs. He sees that they have several advantages, but he doesn't yet see the drawbacks. Unfortunately for the soon-to-be-educated newbie, these drawbacks can have very significant consequences.

One real benefit of threads is that a thread doing I/O on a blocking socket has a linear control flow, and is therefore easier to understand. Asynchronous code is more spread out, so it is harder to write and debug.

Another perceived benefit of threads is a kind of encapsulation: a programmer can split a program up into a number of threads, each of which has a single well-defined task. But this is only valid if each thread is mostly independent from the rest of the program. If not, the threads will have to share data through a common data structure, destroying any potential encapsulation.

In the end, the biggest problem with threads is also related to shared data structures: synchronization. This issue is covered better elsewhere, so I won't spend many words on it here. In short, synchronization is hard to get right: poorly-synchronized threads are subject to serialization delays, context switching overhead, deadlocks, race conditions and corrupted data. These are hard problems, and for most programs the benefits are not large enough to make them worth overcoming.

A saner alternative is to use asynchronous I/O. This buys you the synchronization benefits described in the previous Heuristic. You can even partition the application in a similar manner to threads by creating an invisible window for each socket. If you have two different types of sockets, each socket can have its notifications sent to a different type of window. In straight API terms it means a separate WndProc() for each type of socket. In terms of frameworks like MFC, you can put the code for each type of socket in a different subclass of CWnd.

Heuristic 10: Use threads only when their effect on the rest of the program is easily contained.

The previous Heuristic cautions that threads are often very hard to program correctly, but the truth is that they are sometimes very useful. You can make an educated guess about whether threads will improve the program by doing a bit of design work: is there a clean interface between each thread and the rest of the program? If so, synchronization becomes simple. If not, you're going to end up with a mess that crashes and destroys data unpredictably.

Examples where threads are viable are:

An FTP server. One way to write an FTP server is to let the main thread accept the incoming network connections, and send each one to a separate thread. Then, each thread can process the incoming FTP commands, send any required replies, and terminate when the session closes. Because each thread never has to interact with any other, and they all act alike, this is an ideal application of threads. (But keep in mind the previous server-related Heuristics: one thread per client severely limits your server's scalability.)
A web browser. When you download a file with a modern web browser, the file comes down in the background, so that you can continue browsing. That download stream is most likely handled by a dedicated thread.
An email program. In an email program, the primary focus is usually on reading and writing email. However, when an email message needs to be sent, it is best not to interrupt the user's work. You can send that message with a separate network thread, since the process affects the rest of the program only minimally.
A stock ticker. Reduced to basics, a stock ticker simply displays a small amount of continuous real-time data in a pleasing and useful format. When the amount of network data involved is low, the thread synchronization overhead becomes negligible. Plus, this kind of application only has a single data structure that needs protection; the really big synchronization problems appear when multiple data structures need to be protected.
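The stock ticker case can be sketched as follows, assuming POSIX threads:

- The network thread calls publish_quote() whenever it parses an update.
- The UI thread calls snapshot_quote() and draws from the private copy it gets back.

The struct layout and function names are hypothetical. Because there is exactly one shared structure guarded by one mutex, each lock is held only for a short copy, and there is no lock ordering to get wrong.

```c
#include <pthread.h>
#include <stddef.h>

/* The single shared data structure: the most recent quote. */
struct quote {
    char symbol[8];
    double price;
    unsigned long seq;   /* incremented on every update */
};

static struct quote latest;
static pthread_mutex_t latest_lock = PTHREAD_MUTEX_INITIALIZER;

/* Network thread: store a freshly parsed update. Symbols longer than
   7 characters are truncated so the field stays NUL-terminated. */
void publish_quote(const char *symbol, double price)
{
    pthread_mutex_lock(&latest_lock);
    size_t i;
    for (i = 0; i + 1 < sizeof latest.symbol && symbol[i] != '\0'; i++)
        latest.symbol[i] = symbol[i];
    latest.symbol[i] = '\0';
    latest.price = price;
    latest.seq++;
    pthread_mutex_unlock(&latest_lock);
}

/* UI thread: copy the quote out under the lock, then draw from the
   copy without holding the lock. */
struct quote snapshot_quote(void)
{
    pthread_mutex_lock(&latest_lock);
    struct quote copy = latest;
    pthread_mutex_unlock(&latest_lock);
    return copy;
}
```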

Conclusion

It is my hope that you find these heuristics helpful. Although you may not agree with each of them, I think that they will at least make you think about your own choices. Design is a highly subjective enterprise, and this list is based mainly on my own thoughts and preferences.

Special thanks go to Philippe Jounin for his comments on the 1998 version of this paper. The 2000 version reflects my greater experience, as well as commentary from David Schwartz and Alun Jones, both of whom expanded my ideas of the proper way to build a Winsock server.

Last modified on 29 April 2000 at 15:52 UTC-7	Please send corrections to tangent@cyberport.com.
