15—
Using the Basic Internet Protocols

Python grew up around the same time as the Internet. For the first few years, both Python and the Internet ran mainly on various flavors of Unix. Therefore, it's no surprise to find that Python has excellent support for most of the common Internet protocols in use today. Fortunately, this heritage has carried over to the Windows platform.

This chapter shows how to use many of the common Internet protocols from Python on Windows. For information on the SMTP mail protocol, see Chapter 14, Working with Email.

HTTP and HTML

The Hypertext Transfer Protocol (HTTP) is one of the most widely used Internet Protocols. Anyone who has ever used a web browser has used HTTP. HTTP is a protocol for moving data across a network. Most often, the data is formatted as Hypertext Markup Language (HTML). Thus, HTTP defines how to obtain the data, and HTML defines how the data is arranged.

An HTTP server program is run on a computer set up to accept connections from client computers. The client computer connects to the HTTP server, issues a request for some content (typically a filename), and tells the server the type of data it wishes to receive. The HTTP server locates the content, and sends the data back to the client computer. The data consists of a number of headers (lines that describe the data) and the data itself. A full description of the HTTP protocol can be found on the Web at http://www.w3.org/hypertext/WWW/Protocols/.
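To make the "headers, then data" layout concrete, here is a minimal sketch that splits a raw response into its parts. The function name parse_response and the SAMPLE text are invented for illustration; a real response would come from a server.

```python
def parse_response(raw):
    """Split raw HTTP response text into (code, reason, headers, body)."""
    # The headers and the data are separated by the first blank line.
    head, _, body = raw.partition("\r\n\r\n")
    lines = head.split("\r\n")
    # The first line is the status line, e.g. "HTTP/1.0 200 OK".
    version, code, reason = lines[0].split(" ", 2)
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        # Header names are case-insensitive, so store them in lowercase.
        headers[name.strip().lower()] = value.strip()
    return int(code), reason, headers, body


# Hypothetical response text, made up for this example.
SAMPLE = (
    "HTTP/1.0 200 OK\r\n"
    "Content-Type: text/html\r\n"
    "Content-Length: 13\r\n"
    "\r\n"
    "<html></html>"
)

code, reason, headers, body = parse_response(SAMPLE)
```

In practice the library modules described in this chapter do this parsing for you; the sketch only shows what they are working with.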

Fetching Data via HTTP

The Python module httplib defines a class for fetching data via HTTP. As is typical with Python, only a few lines of code are needed to fetch a document via HTTP. Let's experiment with it from an interactive Python session.

First, import the Python module and instantiate the HTTP class. The HTTP class requires the name of the server you wish to connect to. Let's connect to the Python home page:

>>> import httplib
>>> http=httplib.HTTP('www.python.org')
>>>

Now you need to tell the remote server the data to retrieve and the data formats to accept. Ask the server for the main index page, and indicate that you accept either HTML or plain text:

>>> http.putrequest('GET', '/index.html')
>>> http.putheader('Accept', 'text/html')
>>> http.putheader('Accept', 'text/plain')
>>> http.endheaders()
>>>
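For reference, the calls above send roughly the following text to the server. This is only a sketch: build_request is a hypothetical helper, not part of httplib, and HTTP/1.0 is assumed as the protocol version.

```python
def build_request(method, path, headers):
    """Return the text of an HTTP/1.0 request as it would go on the wire."""
    # The request line names the method, the path, and the protocol version.
    lines = ["%s %s HTTP/1.0" % (method, path)]
    # Each header goes on its own line as "Name: value".
    for name, value in headers:
        lines.append("%s: %s" % (name, value))
    # A blank line marks the end of the headers.
    return "\r\n".join(lines) + "\r\n\r\n"


request_text = build_request(
    "GET", "/index.html",
    [("Accept", "text/html"), ("Accept", "text/plain")],
)
print(request_text)
```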

All that remains is to ask for the data. The getreply() method does this, and returns three items: the error code, the error message, and the headers sent by the server. Make this call and print the result:

>>> errcode, errmsg, headers = http.getreply()
>>> print errcode, errmsg, headers
200 OK <mimetools.Message instance at 1073680>
>>>

HTTP defines the code 200 as success, and it's reflected in the error message. The headers object retrieved is an instance of another Python class. This Python class can be used in the same way as a Python dictionary, so let's see what it contains:

>>> len(headers)
8

There are eight headers from the server. You can loop and print them all, using standard Python dictionary semantics:

>>> for key, value in headers.items():
...     print key, "=", value
...
server = Apache/1.2.0
content-type = text/html
accept-ranges = bytes
date = Wed, 13 Jan 1999 06:41:15 GMT
connection = close
etag = "f4d6-2d66-369294d0"
content-length = 11622
last-modified = Tue, 05 Jan 1999 22:40:16 GMT
>>>

This reveals some interesting facts about the server, including the date the home page was last modified and the HTTP server software used. The content-length header says how many bytes are in the data itself. The getfile() method returns a file object from which the data can be read:

>>> file=http.getfile()
>>>

But rather than print all 11 KB of data, you can check to see that you do indeed have all the data:

>>> print len(file.read())
11622

Reading the file gives the exact number of bytes expected. Obviously, you can do something useful with this data, such as write it to a local file.
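As a minimal sketch of writing the data to a local file, the following copies a file-like object to disk one block at a time. The name save_stream is hypothetical, and an in-memory io.BytesIO object stands in for the object returned by getfile() so the example runs without a network connection.

```python
import io
import os
import tempfile


def save_stream(source, filename, blocksize=8192):
    """Copy a file-like object to a local file; return the bytes written."""
    total = 0
    # Binary mode stops Windows from translating line endings.
    with open(filename, "wb") as target:
        while True:
            block = source.read(blocksize)
            if not block:
                break  # an empty read means end of data
            target.write(block)
            total += len(block)
    return total


# Stand-in for the object returned by http.getfile().
fake_reply = io.BytesIO(b"<html>" + b"x" * 20000 + b"</html>")
path = os.path.join(tempfile.mkdtemp(), "index_copy.html")
written = save_stream(fake_reply, path)
```

Comparing the returned count with the content-length header is a simple way to confirm that the whole document arrived.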

Serving Data via HTTP

Python can also act as an HTTP server. The standard Python library contains a number of modules to act as the basis for your own HTTP server; in fact, it even comes with a basic HTTP server all ready to go.

SimpleHTTPServer.py

The Python module SimpleHTTPServer.py implements, as its name suggests, a simple HTTP server. For information on how to run this server, open SimpleHTTPServer.py in any text editor, and read the instructions.
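If you would rather start the simple server from your own code, a sketch along the following lines may help. The function make_file_server is a hypothetical name; the import is tried under the module names used by newer Python versions first, then under the names used in this chapter.

```python
try:
    # Module layout in newer Python versions.
    from http.server import HTTPServer, SimpleHTTPRequestHandler
except ImportError:
    # Module names used by the Python described in this chapter.
    from BaseHTTPServer import HTTPServer
    from SimpleHTTPServer import SimpleHTTPRequestHandler


def make_file_server(host="", port=8000):
    """Create (but do not start) a server for the current directory."""
    # An empty host string means "listen on all network interfaces".
    return HTTPServer((host, port), SimpleHTTPRequestHandler)

# Typical use (runs until interrupted with Ctrl-C):
#     make_file_server().serve_forever()
```

Passing a port of 0 asks the operating system to pick any free port, which is handy for experiments; the chosen port is then available as server_address[1] on the returned object.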

Implementing an HTTP redirector

As an example, let's implement our own special HTTP server. Our HTTP server functions similarly to a proxy server: it accepts requests and redirects those requests to another server. For example, if you ask the server to redirect to www.python.org, that server appears to have the same content as www.python.org. Thus, people can access www.python.org via our server.

The implementation is straightforward. Extend the basic Python HTTP server code, but instead of searching for the file, simply open an HTTP connection to the remote server and redirect the data to your own client:

# HTTPRedirector.py
# An HTTP Server that redirects all requests to a named, remote server.

# BaseHTTPServer provides the basic HTTP Server functionality.
import BaseHTTPServer
# httplib establishes our connection to the remote server
import httplib
import socket # For the error!

# The server we are redirecting to.
g_RemoteServerName = "www.python.org"

class HTTPRedirector(BaseHTTPServer.BaseHTTPRequestHandler):
    # This function is called when a client makes a GET request
    # ie, it wants the headers, and the data.
    def do_GET(self):
        srcfile = self.send_headers("GET")
        if srcfile:
            # Copy the data from the remote server
            # back to the client.
            BLOCKSIZE = 8192
            while 1:
                # Read a block from the remote.
                data = srcfile.read(BLOCKSIZE)
                if not data: break
                self.wfile.write(data)
            srcfile.close()

    # This function is called when a client makes a HEAD request
    # i.e., it only wants the headers, not the data.
    def do_HEAD(self):
        srcfile = self.send_headers("HEAD")
        if srcfile:
            srcfile.close()

    # A private function which handles all the redirection logic.
    def send_headers(self, request):
        # Establish a remote connection
        try:
            http = httplib.HTTP(g_RemoteServerName)
        except socket.error, problem:
            print "Error - Cannot connect to %s: %s" \
                  % (g_RemoteServerName, problem)
            return
        # Resend all the headers we retrieved in the request.
        http.putrequest(request, self.path)
        for header, val in self.headers.items():
            http.putheader(header, val)
        http.endheaders()
        # Now get the response from the remote server
        errcode, errmsg, headers = http.getreply()
        self.send_response(errcode, errmsg)
        # Send the headers back to the client.
        for header, val in headers.items():
            self.send_header(header, val)
        self.end_headers()
        if errcode==200:
            return http.getfile()

if __name__=='__main__':
    print "Redirecting HTTP requests to", g_RemoteServerName
    BaseHTTPServer.test(HTTPRedirector)
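The listing resends every client header unchanged, including the Host header, which names the redirector rather than the remote server. Some servers may care about this. The following sketch, which is not part of HTTPRedirector.py, shows one way the headers could be filtered before forwarding; the function name headers_to_forward and the short list of skipped headers are assumptions made for illustration.

```python
# A few headers that describe only the client-to-redirector connection.
# This is a partial list chosen for the example.
CONNECTION_HEADERS = ("connection", "keep-alive", "upgrade")


def headers_to_forward(client_headers, remote_host):
    """Return the (name, value) pairs to send on to the remote server."""
    forwarded = []
    for name, value in client_headers:
        lowered = name.lower()
        if lowered in CONNECTION_HEADERS:
            continue  # meaningful only between client and redirector
        if lowered == "host":
            value = remote_host  # name the server actually being contacted
        forwarded.append((name, value))
    return forwarded


# Hypothetical headers as a browser might send them to the redirector.
client = [
    ("Host", "localhost:8000"),
    ("Accept", "text/html"),
    ("Connection", "keep-alive"),
]
result = headers_to_forward(client, "www.python.org")
```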

To test the server, simply execute the script:

C:\Scripts>python HTTPRedirector.py
Redirecting HTTP requests to www.python.org
Serving HTTP on port 8000 ...

Now you can establish a connection to the server. Note that the server is using port 8000 for requests. Since this is not the default HTTP port, you need to specify it in your URL. Open your browser and enter the following URL: http://localhost:8000/. If you look at the server window, you see the following messages as the page is delivered to the browser:

localhost - - [13/Jan/1999 22:08:31] "GET /pics/PyBanner004.gif HTTP/1.1" 200 -
localhost - - [13/Jan/1999 22:08:47] "GET /pics/PythonPoweredSmall.gif HTTP/1.1" 200 -
localhost - - [13/Jan/1999 22:09:03] "GET /pics/pythonHi.gif HTTP/1.1" 200 -
...

And the Python home page appears in the browser!

FTP

The File Transfer Protocol (FTP) transfers files across a network. The Python module ftplib supports this protocol.

An FTP server program is run on a computer that client computers can connect to. The client computer sends commands to the server, and when a transfer is initiated, a new connection exclusively for the data is established between the client and the server.

Fetching Data via FTP

The ftplib module is used in much the same way as the httplib module: a single class, FTP, provides all of the functionality. The FTP protocol supports a variety of commands, which include such operations as logging in, navigating the filesystem, and retrieving directory listings.

Let's create an FTP session:

>>> import ftplib
>>> ftp = ftplib.FTP('ftp.python.org') # connect to host, default port
>>>

Log on as an anonymous user:

>>> ftp.login('anonymous', 'your@email.address')
"230-WELCOME to python.org, the Python programming language ..."
>>>

Get a directory listing:

>>> ftp.retrlines('LIST') # list directory contents
total 38
drwxrwxr-x 11 root 4127 512 Aug 28 20:23 .
...
-r--r--r-- 1 klm 1000 764 Aug 25 19:32 welcome.msg
'226 Transfer complete.'

Notice there's a file welcome.msg: let's download the file.
Open a local file and arrange for each retrieved line to be written to it. Because retrlines() strips the line terminator from each line before passing it to the callback, the callback adds a newline back:

>>> file=open("welcome.msg", "w")
>>> ftp.retrlines("RETR welcome.msg", lambda line: file.write(line + "\n"))
'226 Transfer complete.'
>>> file.close()

Now reopen the file and print the data:

>>> open("welcome.msg", "r").read()
"WELCOME to python.org, the Python programming language home site. ..."
>>>

To retrieve a binary file (such as an executable), use the method retrbinary(); it takes the same arguments as retrlines(), except it also allows you to specify a block size for the transfer. In this case, remember to open the local file in binary mode, as discussed in Chapter 3, Python on Windows.

NNTP

The Network News Transfer Protocol (NNTP) exchanges news articles over a network. Whenever you run a news reader, it uses the NNTP protocol to read and post news articles.

An NNTP server program is run on a computer that client computers can connect to. The NNTP protocol is text-based: all communications between the client and server use ASCII text. The NNTP protocol is similar to the SMTP mail protocol we discussed in the previous chapter. Clients send requests or news articles, and the server replies with responses and possibly a news article. News articles are structured similarly to Internet mail messages; the body of the article follows a list of headers.

Fetching News Articles via NNTP

It should come as no surprise that a Python module, nntplib, supports the NNTP protocol. Following the style of the other Internet-related modules, a single class, NNTP, implements all functionality.

The NNTP protocol supports a wide variety of commands for determining which articles exist on the server computer. Information on these commands is beyond the scope of this book; you should refer to the NNTP protocol standard or the nntplib module itself for further information.
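Because a news article has the same headers-then-body layout as a mail message, the standard email module can pick one apart. The sketch below assumes an article already held in a string; the SAMPLE_ARTICLE text is invented for illustration, and no news server is contacted.

```python
import email

# Hypothetical article text: a list of headers, a blank line, then the body.
SAMPLE_ARTICLE = (
    "From: someone@example.com\n"
    "Newsgroups: comp.lang.python\n"
    "Subject: Python on Windows\n"
    "Message-ID: <12345@example.com>\n"
    "\n"
    "This is the body of the article.\n"
)

article = email.message_from_string(SAMPLE_ARTICLE)
subject = article["subject"]   # header lookup ignores case
body = article.get_payload()   # everything after the blank line
print(subject)
print(body)
```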
To whet your appetite, though, let's create a sample program that scans a newsgroup for articles with a specific word in their subject. It generates an HTML file, then launches your browser with the news articles hyperlinked:

# SimpleNewsViewer.py
# Finds all news articles in a news group that have a specific word
# in their subject. Then writes the results to an HTML file for
# easy reading.
# eg, running:
# c:\> SimpleNewsViewer.py comp.lang.python python
#
# Will generate "comp.lang.python.htm", and execute your
# browser on this file.

import sys, string
import nntplib
import win32api # to execute our browser.

g_newsserver = 'news-server.c3.telstra-mm.net.au'

def MakeNewsPage(groupname, subjectsearch, outfile ):
    print "Connecting..."
    nntp=nntplib.NNTP(g_newsserver)
    print "Fetching group information"
    # Most functions return the raw server response first.
    resp, numarts, first, last, name = nntp.group(groupname)
    # Get the subject line from these messages.
    print "Getting article information..."
    resp, data = nntp.xover(first, last)
    for artnum, subject, poster, time, id, references, size, numlines in data:
        # We will match on any case!
        subjectlook=string.lower(subject)
        if string.find(subjectlook, string.lower(subjectsearch))>=0:
            # Translate the "<" and ">" chars.
            subject = string.replace(subjectlook, "<", "&lt;")
            poster = string.replace(poster, "<", "&lt;")
            subject = string.replace(subject, ">", "&gt;")
            poster = string.replace(poster, ">", "&gt;")
            # Build a href
            href = "news:%s" % id[1:-1]
            # Write the HTML
            outfile.write('<P>From %s on %s<BR><a HREF="%s">%s</a>\n' \
                          % (poster, time, href, subject))
    outfile.close()

if __name__=='__main__':
    if len(sys.argv)<3:
        print "usage: %s groupname, searchstring" % sys.argv[0]
        sys.exit(1)

    groupname = sys.argv[1]
    search = sys.argv[2]
    outname = groupname + ".htm"

    # Open the outfile file.
    outfile = open(outname, "w")
    MakeNewsPage(groupname, search, outfile)
    print "Done - Executing", outname
    win32api.ShellExecute(0, "open", outname, None, "", 1)

Now run this program using syntax such as:

C:\Scripts>SimpleNewsViewer comp.lang.python python
Connecting...
Fetching group information
Getting article information...
Done - Executing comp.lang.python.htm

You should find your browser opened with a list of news articles that match the search. Clicking on one of the links opens your news-reading software and the article.

Conclusion

In this chapter we have presented a quick look at some of the common Internet protocols and how they can be used from Python. Although we did not discuss any of the protocols in great detail, we demonstrated some of the basic concepts and provided pointers to further information on the relevant protocols.

Python is used extensively in domains that require these and similar tasks. Although we have presented a few of the common Internet protocols, you are almost certain to find that a Python module already exists to help you out regardless of your specific requirements.
