
10.1 Introduction

Credit: Guido van Rossum, creator of Python

Network programming is one of my favorite Python applications. I wrote or started most of the network modules in the Python standard library, including the socket and select extension modules and most of the protocol client modules (such as ftplib), which set an example. I also wrote a popular server framework module, SocketServer, and two web browsers in Python, the first predating Mosaic. Need I say more?

Python's roots lie in a distributed operating system, Amoeba, which I helped design and implement in the late '80s. Python was originally intended to be the scripting language for Amoeba, since it turned out that the Unix shell, while ported to Amoeba, wasn't very useful for writing Amoeba system-administration scripts. Of course, I designed Python to be platform-independent from the start. Once Python was ported from Amoeba to Unix, I taught myself BSD socket programming by wrapping the socket primitives in a Python extension module and then experimenting with them using Python; this was one of the first extension modules.

This approach proved to be a great early testimony to Python's strengths. Writing socket code in C is tedious: the code necessary to do error checking on every call quickly overtakes the logic of the program. Quick: in which order should a server call accept, bind, connect, and listen? This is remarkably difficult to find out if all you have is a set of Unix manpages. In Python, you don't have to write separate error-handling code for each call, making the logic of the code stand out much more clearly. You can also learn about sockets by experimenting in an interactive Python shell, where misconceptions about the proper order of calls and the argument values that each call requires are cleared up quickly through Python's immediate error messages.
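For readers who want the answer to the call-order question spelled out, here is a minimal sketch. It is written for current Python 3 rather than the Python 2 dialect used elsewhere in this chapter, and the function name echo_roundtrip, the loopback address, and the upper-casing "protocol" are all illustrative assumptions, not anything from the standard library. The server side calls bind, listen, and accept, in that order; connect belongs to the client.

```python
import socket
import threading


def echo_roundtrip(payload):
    """Send payload to a one-shot loopback server; return its upper-cased reply.

    Hypothetical helper for illustration only. It assumes the payload is
    small enough to arrive in a single recv() call.
    """
    # Server side: socket -> bind -> listen -> accept (a server never calls connect).
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind(("127.0.0.1", 0))  # port 0: let the OS pick any free port
    server.listen(1)
    address = server.getsockname()  # the (host, port) pair actually bound

    def serve_once():
        conn, _peer = server.accept()  # blocks until the client connects
        with conn:
            conn.sendall(conn.recv(1024).upper())

    worker = threading.Thread(target=serve_once)
    worker.start()

    # Client side: socket -> connect.
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as client:
        client.connect(address)
        client.sendall(payload)
        reply = client.recv(1024)

    worker.join()
    server.close()
    return reply


if __name__ == "__main__":
    print(echo_roundtrip(b"hello"))
```

Any failing call raises an exception on its own, which is why the sketch contains no per-call error checks.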

Python has come a long way since those first days, and now few applications use the socket module directly; most use much higher-level modules such as urllib or smtplib. The examples in this chapter are a varied bunch: there are some that construct and send complex email messages, while others dig in the low-level bowels of the network implementation on a specific platform. My favorite is Recipe 10.13, which discusses PyHeartBeat: it's useful, it uses the socket module, and it's simple enough to be a good educational example.

The socket module itself is still the foundation of all network operations in Python. It's a plain transliteration of the socket APIs (first introduced in BSD Unix and now widespread on all platforms) into the object-oriented paradigm. You create socket objects by calling the socket.socket factory function, then calling methods on these objects to perform typical low-level network operations. Of course, you don't have to worry about allocating and freeing memory for buffers and the like; Python handles that for you automatically. You express IP addresses as (host, port) pairs, in which host is a string in either dotted-quad ('1.2.3.4') or domain-name ('www.python.org') notation. As you can see, even low-level modules in Python aren't as low-level as all that.

But despite the various conveniences, the socket module still exposes the actual underlying functionality of your operating system's network sockets. If you're at all familiar with them, you'll quickly get the hang of Python's socket module, using Python's own Library Reference. You'll then be able to play with sockets interactively in Python to become a socket expert, if that is what you need. The classic work on this subject is UNIX Network Programming, Volume 1: Networking APIs - Sockets and XTI, Second Edition, by W. Richard Stevens (Prentice-Hall), and it is highly recommended. For many practical uses, however, higher-level modules will serve you better.

The Internet uses a sometimes dazzling variety of protocols and formats, and Python's standard library supports many of them. In Python's standard library, you will find dozens of modules dedicated to supporting specific Internet protocols (such as smtplib to support the SMTP protocol to send mail, nntplib to support the NNTP protocol to send and receive Network News, and so on). In addition, you'll find about as many modules that support specific Internet formats (such as htmllib to parse HTML data, the email package to parse and compose various formats related to email, including attachments and encoding, and so on).

Clearly, I cannot even come close to doing justice to the powerful array of tools mentioned in this introduction, nor will you find all of these modules and packages used in this chapter, nor in this book, nor in most programming shops. You may never need to write any program that deals with Network News, for example, so you will not need to study nntplib. But it is reassuring to know it's there (part of the "batteries included" approach of the Python standard library).

Two higher-level modules that stand out from the crowd, however, are urllib and urllib2. Each can deal with several protocols through the magic of URLs: those now-familiar strings, such as http://www.python.org/index.html, that identify a protocol (such as http), a host and port (such as www.python.org, port 80 being the default here), and a specific resource at that address (such as /index.html). urllib is rather simple to use, but urllib2 is more powerful and extensible. HTTP is the most popular protocol for URLs, but these modules also support several others, such as FTP and Gopher. In many cases, you'll be able to use these modules to write typical client-side scripts that interact with any of the supported protocols much more quickly and with less effort than it might take with the various protocol-specific modules.
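To make the anatomy of a URL concrete, the following sketch splits one into the pieces just described. It assumes Python 3, where the URL-splitting functions live in urllib.parse (in the Python 2 of this chapter they are in the separate urlparse module). The describe function and the DEFAULT_PORTS table are hypothetical names introduced here for illustration; the table lists only the well-known default ports for the protocols mentioned above.

```python
from urllib.parse import urlsplit

# Illustrative table of well-known default ports (not part of the library).
DEFAULT_PORTS = {"http": 80, "ftp": 21, "gopher": 70}


def describe(url):
    """Return (protocol, host, port, resource) for a URL string."""
    parts = urlsplit(url)
    # parts.port is None when the URL does not name a port explicitly,
    # so fall back to the protocol's default.
    port = parts.port or DEFAULT_PORTS.get(parts.scheme)
    return parts.scheme, parts.hostname, port, parts.path


if __name__ == "__main__":
    print(describe("http://www.python.org/index.html"))
    # -> ('http', 'www.python.org', 80, '/index.html')
```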

To illustrate, I'd like to conclude with a cookbook example of my own. It's similar to Recipe 10.7, but rather than a program fragment, it's a little script. I call it wget.py because it does everything for which I've ever needed wget. (In fact, I wrote it on a system where wget wasn't installed but Python was; writing wget.py was a more effective use of my time than downloading and installing the real thing.)

import sys, urllib
def reporthook(*a): print a
for url in sys.argv[1:]:
    i = url.rfind('/')
    file = url[i+1:]
    print url, "->", file
    urllib.urlretrieve(url, file, reporthook)

Pass it one or more URLs as command-line arguments; it retrieves those into local files whose names match the last components of the URLs. It also prints progress information of the form:

(block number, block size, total size)

Obviously, it's easy to improve on this; but it's only seven lines, it's readable, and it works, and that's what's so cool about Python.
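As one small improvement of that kind, the raw progress tuple could be turned into a readable status line. This is only a sketch, written for Python 3; format_progress is a hypothetical helper, not part of urllib. It relies on the documented behavior that the hook receives the count of blocks transferred so far, the block size in bytes, and the total size, which may be -1 when the server does not report it.

```python
def format_progress(block_number, block_size, total_size):
    """Render a (block number, block size, total size) triple as text."""
    received = block_number * block_size
    if total_size <= 0:
        # Total size unknown: report only the bytes seen so far.
        return "%d bytes" % received
    # The last block is usually partial, so cap the count at the total.
    received = min(received, total_size)
    percent = 100.0 * received / total_size
    return "%d/%d bytes (%.0f%%)" % (received, total_size, percent)


if __name__ == "__main__":
    print(format_progress(13, 8192, 100000))
    # -> 100000/100000 bytes (100%)
```

In Python 3 the retrieval function is urllib.request.urlretrieve, and a hook such as lambda *a: print(format_progress(*a)) could be passed as its third argument.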

Another cool thing about Python is that you can incrementally improve a program like this, and after it's grown by two or three orders of magnitude, it's still readable, and it still works! To see what this particular example might evolve into, check out Tools/webchecker/websucker.py in the Python source distribution. Enjoy!
