17—
Processes and Files

This chapter covers techniques for working with files and processes. We first look at the facilities Python has for portably working with both files and processes, then we'll look at Windows-specific techniques. If you're an experienced Python developer, you may wish to skip to the later sections; if you're new to Python, this is essential groundwork.

We cover techniques for finding, moving, and rearranging files, look at file objects themselves, and then cover some of the standard Python idioms for reading and writing them. Then we look at techniques for starting, stopping, and generally working with processes.

Portable File Manipulation

Python has excellent built-in file support that works on all platforms supported by Python.

Working with Files on Disk

Most of the key file-manipulation functions live in the os module and an associated module called os.path. To provide a degree of platform independence, os loads in the right module for your platform. os provides basic file-handling functions, and os.path handles operations on paths and filenames. On Windows, these modules are called nt and ntpath respectively, although they should always be referred to as os and os.path. The functions in the os module generally accept the same arguments as their corresponding MS-DOS commands. Table 17-1 depicts the os module's file and directory functions.

Table 17-1. File and Directory Functions
Module and Function	Description
`os.getcwd()`	Gets the current working directory.
`os.chdir(newdir)`	Changes the current working directory.
`os.rmdir(dir)`	Removes a directory, if allowed.
`os.mkdir(newdir)`	Creates a directory; supply either an absolute path or a subdirectory name to go under the current directory.
`os.path.exists(name)`	Says if something exists, but doesn't say if it's a file or directory.
`os.path.isdir(dirname)`	Says that a directory exists.
`os.path.isfile(filename)`	Says that a file exists. `filename` may include a path; if not, it looks in the current directory.
`os.listdir(dirname)`	Returns a list of the files and directories within the given directory.
`glob.glob(pattern)`	Returns a list of files matching the given pattern (using expressions such as `dir *.doc` is known as file globbing on Unix, hence the name). Just like the command prompt and most other Windows tools, the pattern accepts the ? character to match a single character or the * character to match any number of characters. If you need to use true regular expressions^a to match filenames, use `os.listdir()` and the `re` module.
^a A regular expression uses patterns to match strings. The filename-matching capabilities described are a similar concept to regular expressions, although the regular expressions provided by the Python re module have a syntax similar to Perl's and offer far more powerful matching than the simple filename-matching patterns described here.
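
The last row of Table 17-1 suggests combining os.listdir() with the re module when globbing is too weak. A minimal sketch of that idea follows, written for a current Python 3 interpreter (an assumption; the sessions in this chapter use older syntax). The helper name list_matching and the example pattern are hypothetical, chosen only for illustration.

```python
import os
import re

def list_matching(directory, pattern):
    """Return the sorted names in `directory` whose whole name matches `pattern`.

    `list_matching` is a hypothetical helper, not part of the standard library.
    """
    regex = re.compile(pattern, re.IGNORECASE)
    return sorted(name for name in os.listdir(directory)
                  if regex.fullmatch(name))

if __name__ == '__main__':
    # "chapter", one or more digits, then either of two extensions.
    # Glob patterns have no equivalent of '+' or '|'.
    print(list_matching(os.curdir, r'chapter\d+\.(doc|txt)'))
```

Matching the whole name (rather than searching within it) keeps the behavior close to what a glob pattern would do; whether to ignore case is a design choice that suits Windows filenames.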

Here are some quick examples:

    >>> import os
    >>> os.getcwd()
    'C:\\Program Files\\Python'
    >>> os.chdir('C:\\temp')
    >>> os.mkdir('subdirectory1')
    >>> os.rmdir('subdirectory1')
    >>>

What's with the \\? This is Python's literal string syntax. Python lets you directly enter special characters at the interactive prompt or in strings embedded in your code. For example, \n means a newline, \t is a tab, and \123 is the character with octal value 123. If you just want a plain old backslash, you have to type \\. The only place where this feels slightly weird is in manipulating filenames. Remember to double all your backslashes. An alternative is to use a forward slash (like c:/temp); but Python always gives backslashes when you ask for directory lists on Windows:

    >>> mydir = 'c:\\data\\project\\oreilly\\text'
    >>> os.path.exists(mydir)
    1
    >>> os.path.isdir(mydir)
    1
    >>> os.path.isfile(mydir)  #hope not
    0
    >>> os.listdir(mydir)
    ['ChapterXX.doc', '00index.txt', ...]
    >>> import glob
    >>> glob.glob(mydir + '\\' + '*files*.doc')
    ['c:\\data\\project\\oreilly\\text\\Chapter_-_Processes_and_Files1.doc', 'c:\\data\\project\\oreilly\\text\\files.doc', 'c:\\data\\project\\oreilly\\text\\Chapter_-_PythonFiles.doc']
    >>>

Note that if you don't want full paths from glob, chdir into the directory first.
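
As an alternative to changing directory, the directory part can be stripped from each match afterward. The sketch below assumes a current Python 3 interpreter, and glob_names is a hypothetical helper name; it builds the pattern with os.path.join so that it also runs on platforms other than Windows.

```python
import glob
import os

def glob_names(directory, pattern):
    """Return the bare filenames in `directory` matching the glob `pattern`.

    Hypothetical helper: glob.glob() returns each match with the directory
    part attached, so os.path.basename() strips it off again.
    """
    full_paths = glob.glob(os.path.join(directory, pattern))
    return sorted(os.path.basename(path) for path in full_paths)
```

This leaves the current working directory untouched, which matters if other code in the same program relies on it.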

Working with Paths and Filenames

The os.path module provides platform-independent routines for chopping up and putting together filenames. os.path.split(path) separates a full path into the directory and filename components; os.path.splitext(filename) separates the filename (and path, if present) from the extension.

As discussed, DOS and Windows use a backslash to separate directories. We shouldn't have used the line glob.glob(mydir + '\\' + '*files*.doc') in the previous example; use the variable os.sep instead. On a Unix platform, this is a forward slash:

    >>> os.path.split('c:\\windows\\system\\gdi.exe')
    ('c:\\windows\\system', 'gdi.exe')
    >>> os.path.splitext('gdi.exe')
    ('gdi', '.exe')
    >>> os.path.splitext('c:\\windows\\system\\gdi.exe')
    ('c:\\windows\\system\\gdi', '.exe')
    >>> (root, ext) = os.path.splitext('c:\\mydata\\myfile.txt')
    >>> newname = root + '.out'
    >>> newname
    'c:\\mydata\\myfile.out'
    >>>

Names for Temporary Files

The function tempfile.mktemp() returns a filename suitable for temporary use; this function is available on every platform, but it's smart enough to know where your \temp directory is on Windows:

    >>> import tempfile
    >>> tempfile.mktemp()
    'C:\\WINDOWS\\TEMP\\~-304621-1'
    >>>

Note that mktemp() returns only a name; creating and deleting the file is up to you. If you instead create the file with tempfile.TemporaryFile(), it's automatically deleted when it is closed, assisting in the housekeeping that often goes with working with temporary files.

Getting Information about Files

The function os.stat(filename) returns information about files or directories without opening them. It returns a tuple of ten items. With a tuple of this size, it can be hard to recall what each element is for, so the standard Python module stat contains a number of constants and functions to assist in working with these entries. Table 17-2 lists the entries returned by os.stat().

Table 17-2. os.stat() Return Values
Index	Constant	Description
0	`stat.ST_MODE`	Bit mask for file-mode information. The stat.S_IFDIR bit is set if path specifies a directory; the stat.S_IFREG bit is set if path specifies an ordinary file or a device.
1	`stat.ST_INO`	Not used on Windows filesystems.
2	`stat.ST_DEV`	Drive number of the disk containing the file.
3	`stat.ST_NLINK`	The Visual C++ documentation is not very helpful on this one. It simply states "Always 1 on non-NTFS filesystems."
4	`stat.ST_UID`	Not used on Windows.
5	`stat.ST_GID`	Not used on Windows.
6	`stat.ST_SIZE`	Size of the file in bytes. This is limited to 64 bits, so for large files you should use the win32file.GetFileSize() function, which returns large file sizes as a long integer.
7	`stat.ST_ATIME`	The time the file was last accessed or zero if the filesystem doesn't support this information.
8	`stat.ST_MTIME`	The time the file was last modified or zero if the filesystem doesn't support this information.
9	`stat.ST_CTIME`	The time the file was created or zero if the filesystem doesn't support this information.

Some of these aren't used on Windows, but contain useful information when used on other operating systems. Also, note that all dates are returned as integers compatible with the Python time module. Depending on the format of the disk holding the file, some of these time values may not be available.

Let's see an example of using the stat() function:

    >>> os.stat('c:\\autoexec.bat')
    (33279, 0, 2, 1, 0, 0, 640, 916444800, 915484932, 915484930)
    >>>

Here's a function to decode it:

    import os, stat, time

    def getfileinfo(filename):
        stats = os.stat(filename)
        size = stats[stat.ST_SIZE]
        print 'File size is %d bytes' % size
        accessed = stats[stat.ST_ATIME]
        modified = stats[stat.ST_MTIME]
        print 'Last accessed: ' + time.ctime(accessed)
        print 'Last modified: ' + time.ctime(modified)

And the output:

    >>> decode_stat.getfileinfo('c:\\autoexec.bat')
    File size is 640 bytes
    Last accessed: Sat Jan 16 00:00:00 1999
    Last modified: Mon Jan 04 21:22:12 1999
    >>>

Unfortunately, there's no portable Python module for working with file permissions. Modules exist for working with permissions on various operating systems, including Windows and Unix, but the differences between the various schemes make a simple and unified model difficult. Windows NT permissions are themselves complex and beyond the scope of this book; indeed, it would require a book of this size to cover them in detail. There is a brief example of working with permissions in Chapter 16, Windows NT Administration.

Walking through a Directory Tree

Often you need to move through a directory tree looking at all the subdirectories or files in turn.
The Python library provides a powerful generic routine to do this: os.path.walk(). The general idea is that you specify a directory, and os.path.walk() calls a function (that you write) once for each subdirectory of the main directory. Each time your function is called, it's passed a list of all filenames in that directory. Thus, your function can examine every file in every directory under the starting point you specify.

The function you write to perform the desired operation on the file is of the form myfunc(arg, dirname, filenames). The first argument can be anything you want; we will see examples later. The second argument contains the name of the current directory being examined, starting with the directory you specify in the argument to os.path.walk(); the third is the list of filenames in the directory.

Once you have written the function, call os.path.walk() with three parameters: the name of the directory in which to begin the walking, your callback function, and any third parameter you choose. This third parameter is passed unchanged in your callback function's first parameter, as described previously.

This first example lists the directories examined and how many files are present in each. This makes the callback function simple: you print the dirname parameter, and the length of the filenames parameter.
Then call os.path.walk(), passing a directory from the Python installation and the simple function as the callback:

    >>> def walker1(arg, dirname, filenames):
    ...     #List directories and numbers of files
    ...     print dirname, 'contains', len(filenames), 'files'
    ...
    >>> os.path.walk('c:\\program files\\python\\win32', walker1, None)
    c:\program files\python\win32 contains 24 files
    c:\program files\python\win32\lib contains 39 files
    c:\program files\python\win32\Help contains 3 files
    c:\program files\python\win32\demos contains 19 files
    c:\program files\python\win32\demos\service contains 8 files
    c:\program files\python\win32\demos\service\install contains 3 files
    >>>

That was easy! Note that you don't need the extra argument and so use the value None.

Now let's try something a bit more practical and write a program to scan for recent changes. This is useful for archiving or for trying to figure out which new application just ate your registry. The callback function becomes slightly more complex as you loop over the list of files. The example then checks the Windows system directory for all files changed in the last 30 days:

    >>> import time
    >>> def walker2(arg, dirname, filenames):
    ...     "Lists files modified in last ARG days"
    ...     cutoff = time.time() - (arg * 24 * 60 * 60)
    ...     for filename in filenames:
    ...         stats = os.stat(dirname + os.sep + filename)
    ...         modified = stats[8]
    ...         if modified >= cutoff:
    ...             print dirname + os.sep + filename
    ...
    >>> os.path.walk('c:\\windows\\system', walker2, 30)
    c:\windows\system\FFASTLOG.TXT
    c:\windows\system\MSISYS.VXD
    c:\windows\system\HwInfoD.vxd
    c:\windows\system\ws552689.ocx
    >>>

So far you haven't returned anything; indeed, if walker2 returned a value, you'd have no easy way to grab it. This is another common use for the "extra argument". Let's imagine you want to total the size of all files in a directory.
It's tempting to try this:

    def walker3(arg, dirname, filenames):
        "Adds up total size of all files"
        for filename in filenames:
            stats = os.stat(dirname + os.sep + filename)
            size = stats[6]
            arg = arg + size

    def compute_size(rootdir):
        "uses walker3 to compute the size"
        total = 0
        os.path.walk(rootdir, walker3, total)
        return total

Here, a walker function does the work, and a controlling function sets up the arguments and returns the results. This is a common pattern when dealing with recursive functions.

Unfortunately this returns zero. You can't modify a simple numeric argument in this way, since arg within the function walker3() is a local variable. However, if arg were a mutable object, you could modify its contents. One of the simplest answers is to use a list; it's passed around, and the walker function is free to modify its contents. Let's rewrite the function to generate a list of sizes:

    # these two work
    def walker4(arg, dirname, filenames):
        "Appends the size of each file to the list ARG"
        for filename in filenames:
            stats = os.stat(dirname + os.sep + filename)
            size = stats[6]
            arg.append(size)

    def compute_size(rootdir):
        "uses walker4 to compute the size"
        sizes = []
        os.path.walk(rootdir, walker4, sizes)
        # now add them up
        total = 0
        for size in sizes:
            total = total + size
        return total

When run, this code behaves as desired:

    >>> compute_size('c:\\program files\\python')
    26386305
    >>> # well, I do have a lot of extensions installed

There are numerous uses for this function, and it can save a lot of lines of code. Some possibilities include:

• Archiving all files older than a certain date
• Building a list of filenames meeting certain criteria for further processing
• Synchronizing two file trees efficiently across a network, copying only the changes
• Keeping an eye on users' storage requirements

We've started to see what makes Python so powerful for manipulating filesystems. It's not just the walk function: that could have been done in many languages.
The key point is how walk interacts with Python's higher-level data structures, such as lists, to make these examples simple and straightforward.

Working with Python File Objects

Now we've had a good look at moving files around; it's time to look inside them. Python has a built-in file object, which is available on all Python platforms. Any Python program you hope to run on platforms other than Windows should use the standard file objects. Once you have a Python file object, you can use the methods to read data from the file, write data to the file, and perform various other operations.

Opening a File

The function open(filename, mode='r') returns a file object. If mode is omitted, the file is opened read-only. Mode is a string, and can be r for reading, w for writing, or a for appending. Add the letter b for binary (as discussed in Chapter 3, Python on Windows), and w+ opens it for updating. See the Python Library Reference (included in HTML format in the standard Python distribution) for further details.

Table 17-3 shows the most important methods for file objects. C programmers will note the similarity to the STDIO routines; this should be no surprise, as they are implemented using the C STDIO routines of the same names.

Table 17-3. Methods of File Objects
Method	Description
`close()`	Closes the file.
`flush()`	Flushes to disk. Windows caches disk activity; if you write a file, you can hear the lag between writing a file and the disk clicking. This ensures it's written immediately.
`isatty()`	Nonzero if the input is a terminal-type device (e.g., standard input when using Python from the console).
`read([size])`	Reads up to [size] bytes and returns a string. Omit [size], and the whole file is read into memory. When end of file is reached, returns an empty string.
`readline()`	Returns a string up to and including the next newline character.
`readlines()`	Returns a list of strings containing all lines in the file. Each string includes the trailing newline character.
`seek(offset, [whence])`	Jumps to the location offset in the file. whence is optional and specifies a mode: if zero, offset is an absolute position, if 1, relative to the current position, and if 2, relative to the end of the file.
`tell()`	Returns the current location in the file.
`truncate([size])`	Truncates the file at the current position or at size if it's provided.
`write(str)`	Writes the string to the file.
`writelines(list)`	Writes a list of strings to the file. It doesn't insert any newlines or other delimiters.

Every language has functions, such as read and write, and many have readline. Python's ability to handle lists and strings is what really makes file processing a joy. Let's run through a few common idioms.

Reading a Text File into a List

Here readlines loads the entire file into a list of strings in memory:

    >>> f = open('c:\\config.sys', 'r')
    >>> lines = f.readlines()
    >>> f.close()
    >>> from pprint import pprint
    >>> pprint(lines[0:3])
    ['DEVICEHIGH = A:\\CDROM\\CDROM.SYS /D:CD001\012',
     'device=C:\\WINDOWS\\COMMAND\\display.sys con=(ega,,1)\012',
     'Country=044,850,C:\\WINDOWS\\COMMAND\\country.sys\012']
    >>>

The pprint function (short for pretty-print) lets you display large data structures on several lines. Note also that each line still ends in a newline character (octal 012, decimal 10). Because the file is opened in text mode (by omitting the binary specification), you see a single newline character terminating each line, even if the actual file is terminated with carriage-return/linefeed pairs.

You can follow this with a call to string.split() to parse each line.
Here's a generic function to parse tab-delimited data:

    import string

    def read_tab_delimited_file(filename):
        "returns a list of rows, each a list of strings"
        # we can compress the file opening down to a one-liner -
        # the file will be closed automatically
        lines = open(filename).readlines()
        table = []
        for line in lines:
            # chop off the final newline
            line = line[:-1]
            # split up the row on tab characters
            row = string.split(line, '\t')
            table.append(row)
        return table

And here's what it can do:

    >>> data = read_tab_delimited_file('c:\\temp\\sales.txt')
    >>> pprint(data)
    [['Sales', '1996', '1997', '1998'],
     ['North', '100', '115', '122'],
     ['South', '176', '154', '180'],
     ['East', '130', '150', '190']]
    >>>

Note once again how useful pprint is! This is another of Python's key strengths: you can work at the interactive prompt, looking at your raw data, which helps you to get your code right early in the development process.

Reading a Line at a Time

The previous example is suitable only for files that definitely fit into memory. If they might get bigger, you should loop a line at a time. Here is the common idiom for doing this:

    f = open(filename, 'r')
    s = f.readline()
    while s <> '':
        # do something with string 's'
        s = f.readline()
    f.close()

The Fileinput Module

A number of people have complained about having to type readline() twice, while Perl has a one-line construction to loop over files. The standard library now includes a module called fileinput to save you this minimal amount of extra typing. The module lets you do the following:

    import fileinput
    for line in fileinput.input([filename]):
        process(line)

If no filename is provided, the module loops over standard input, which is useful in script processing. Pass the filename parameter as a single-item list; fileinput iterates automatically over any number of files if you simply include more items in this list.
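
To illustrate looping over several files in one pass, here is a small sketch. It assumes a current Python 3 interpreter (where fileinput.input() can be used in a with statement), and count_nonblank_lines is a hypothetical helper name chosen for illustration.

```python
import fileinput

def count_nonblank_lines(filenames):
    """Count the non-blank lines across all the named files.

    Hypothetical helper: fileinput.input() chains the files together, so a
    single loop reads them all, one line at a time. Note that an empty
    list would make fileinput read standard input instead.
    """
    count = 0
    with fileinput.input(files=filenames) as lines:
        for line in lines:
            if line.strip():
                count = count + 1
    return count
```

Because only one line is held in memory at a time, the same loop works for files far too large for readlines().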
fileinput also lets you access the name of the file and the current line number and provides a mechanism to modify files in place (with a backup) in case something goes wrong.

Reading Binary Data

The read() command loads an entire file into memory if you don't specify a size. You often see the one-liner:

    >>> mystring = open('c:\\temp\\sales.txt').read()
    >>>

This code uses the fact that file objects are closed just before they get garbage-collected. You didn't assign the file object to a variable, so Python closes it and deletes the object (but not the file!) after the line executes. You can slurp an entire file into a string in one line.

Python strings are eight-bit safe and are the easiest means to manipulate binary data. In addition to this, the struct module lets you create C-compatible structures and convert them to and from strings; and the array module efficiently handles arrays of data, which it can convert to and from strings and files.

More information on working with files and the other various Python modules we discussed here can be found in either of these fine O'Reilly Python books we've mentioned before: Programming Python and Learning Python.

Native File Manipulation: The Win32file Module

There are times when the standard Python file objects can't meet your requirements, and you need to use the Windows API to manipulate files. This can happen in a number of situations, such as:

• You need to read or write data to or from a Windows pipe.
• You need to set custom Windows security on a file you are creating.
• You need to perform advanced techniques for performance reasons, such as "Overlapped" operations or using completion ports.

Python file objects are integrated closely with Python. You should use the win32file module only when standard Python file objects can't meet your requirements. Using the win32file module is a good deal more complex than using native Python files.
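
Before reaching for win32file, it is worth seeing how far standard file objects go with binary data. The following sketch uses the struct module mentioned above to write and read a fixed-size record. It assumes a current Python 3 interpreter, where binary data is a bytes object rather than a string; the record layout and the names RECORD, write_record, and read_record are hypothetical, chosen only for illustration.

```python
import struct

# Hypothetical record layout: little-endian, a 4-byte unsigned integer
# followed by an 8-byte float (12 bytes in total, no padding).
RECORD = struct.Struct('<Id')

def write_record(filename, item_id, price):
    """Write one packed record to a file opened in binary mode."""
    with open(filename, 'wb') as f:
        f.write(RECORD.pack(item_id, price))

def read_record(filename):
    """Read the record back and unpack it into an (item_id, price) tuple."""
    with open(filename, 'rb') as f:
        data = f.read(RECORD.size)
    return RECORD.unpack(data)
```

Opening the file with the b flag matters on Windows; in text mode, newline translation could corrupt the packed bytes.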
Opening and Creating Files

The win32file.CreateFile() function opens or creates standard files, returning a handle to the file. Standard files come in many flavors, including synchronous files (where read or write operations don't return until the operation has completed); asynchronous (or overlapped I/O) files, where read and write operations return immediately; and temporary files that are automatically deleted when the handle is closed. Files may also be opened requesting that Windows not cache any file operations, that no buffering is performed, etc. All the variations that CreateFile() can use are too numerous to list here. For full details, please see the Windows API documentation for CreateFile().

The CreateFile() function takes the following parameters:

• Name of the file
• Integer indicating the type of access requested on the file
• Integer indicating the sharing options for the file
• Security attributes for the new file or None
• A flag, indicating what action to take depending on whether the file exists
• A set of flags and attributes for the file itself
• Another file to act as a template or None

This function returns a PyHANDLE object. PyHANDLEs are simply objects that wrap standard Win32 HANDLEs. When a PyHANDLE object goes out of scope, it's automatically closed; thus, it's generally not necessary to close these HANDLEs as it is when using them from C or C++.

Let's see how these parameters interact and test out some of the documented semantics. Here's a small script that uses the win32file module to work with Win32 file handles. The code creates a file, then checks that other attempts to open the file either succeed or fail, based on the flags passed to CreateFile(). You will also find that auto-delete files behave as expected; i.e., after the last handle is closed, the file no longer exists on disk:

    # CheckFileSemantics.py
    # Demonstrate the semantics of CreateFile.

    # To keep the source code small,
    # we import all win32file objects.
    from win32file import *

    import win32api
    import os

    # First, lets create a normal file
    h1 = CreateFile(
        "\\file1.tst",          # The file name
        GENERIC_WRITE,          # we want write access.
        FILE_SHARE_READ,        # others can open for read
        None,                   # No special security requirements
        CREATE_ALWAYS,          # File to be created.
        FILE_ATTRIBUTE_NORMAL,  # Normal attributes
        None )                  # No template file.

    # now we will print the handle,
    # just to prove we have one!
    print "The first handle is", h1

    # Now attempt to open the file again,
    # this time for read access
    h2 = CreateFile(
        "\\file1.tst",          # The same file name.
        GENERIC_READ,           # read access
        FILE_SHARE_WRITE | FILE_SHARE_READ,
        None,                   # No special security requirements
        OPEN_EXISTING,          # expect the file to exist.
        0,                      # Not creating, so attributes dont matter.
        None )                  # No template file

    # Prove we have another handle
    print "The second handle is", h2

    # Now attempt yet again, but for write access.
    # We expect this to fail.
    try:
        h3 = CreateFile(
            "\\file1.tst",      # The same file name.
            GENERIC_WRITE,      # write access
            0,                  # No special sharing
            None,               # No special security requirements
            CREATE_ALWAYS,      # attempting to recreate it!
            0,                  # Not creating file, so no attributes
            None )              # No template file
    except win32api.error, (code, function, message):
        print "The file could not be opened for write mode."
        print "Error", code, "with message", message

    # Close the handles.
    h1.Close()
    h2.Close()

    # Now lets check out the FILE_FLAG_DELETE_ON_CLOSE
    fileAttributes = FILE_ATTRIBUTE_NORMAL | \
                     FILE_FLAG_DELETE_ON_CLOSE

    h1 = CreateFile(
        "\\file1.tst",          # The file name
        GENERIC_WRITE,          # we want write access.
        FILE_SHARE_READ,        # others can open for read
        None,                   # no special security requirements
        CREATE_ALWAYS,          # file to be created.
        fileAttributes,
        None )                  # No template file.

    # Do a stat of the file to ensure it exists.
    print "File stats are", os.stat("\\file1.tst")

    # Close the handle
    h1.Close()

    try:
        os.stat("\\file1.tst")
    except os.error:
        print "Could not stat the file - file does not exist"

When you run this script, you see the following output:

    The first handle is <PyHANDLE at 8344464 (80)>
    The second handle is <PyHANDLE at 8344400 (112)>
    The file could not be opened for write mode.
    Error 32 with message The process cannot access the file because it is being used by another process.
    File stats are (33206, 0, 11, 1, 0, 0, 0, 916111892, 916111892, 916111892)
    Could not stat the file - file does not exist

Thus, the semantics are what you'd expect:

• A file opened to allow shared reading can be opened again for reading.
• A file opened to disallow shared writing can't be opened again for writing.
• A file opened for automatic delete is indeed deleted when the handle is closed.

Reading and Writing Files

The win32file module has functions for reading and writing files. Not surprisingly, win32file.ReadFile() reads files, and win32file.WriteFile() writes files.

win32file.ReadFile() takes the following parameters:

• The file handle to read from
• The size of the data to read (see the reference for further details)
• Optionally, an OVERLAPPED or None

win32file.ReadFile() returns two pieces of information in a Python tuple: the error code for ReadFile and the data itself. The error code is either zero or the value winerror.ERROR_IO_PENDING if overlapped I/O is being performed. All other error codes are trapped and raise a Python exception.

win32file.WriteFile() takes the following parameters:

• A file handle opened to allow writing
• The data to write
• Optionally, an OVERLAPPED or None

win32file.WriteFile() returns the error code from the operation. This is either zero or winerror.ERROR_IO_PENDING if overlapped I/O is used. All other error codes are converted to a Python exception.

Overlapped I/O

Windows provides a number of techniques for high-performance file I/O. The most common is overlapped I/O.
Using overlapped I/O, the win32file.ReadFile() and win32file.WriteFile() operations are asynchronous and return before the actual I/O operation has completed. When the I/O operation finally completes, a Windows event is signaled.

Overlapped I/O does have some requirements normal I/O operations don't:

• The operating system doesn't automatically advance the file pointer. When not using overlapped I/O, a ReadFile or WriteFile operation automatically advances the file pointer, so the next operation automatically reads the subsequent data in the file. When using overlapped I/O, you must manage the location in the file manually.
• The standard technique of returning a Python string object from win32file.ReadFile() doesn't work. Because the I/O operation has not completed when the call returns, a Python string can't be used.

As you can imagine, the code for performing overlapped I/O is more complex than when performing synchronous I/O. Chapter 18, Windows NT Services, contains some sample code that uses basic overlapped I/O on a Windows named pipe.

Pipes

Pipes are a concept available in most modern operating systems. Typically, these are a block of shared memory set up much like a file. Typically, one process writes information to a pipe, and another process reads it. They are often used as a form of interprocess communication or as a simple queue implementation. Windows has two flavors of pipes: anonymous pipes and named pipes. Python supports both via the win32pipe module.

Anonymous Pipes

Anonymous pipes are simple and lightweight pipes, designed for use between the process that creates them and its child processes. Since they are unnamed, the only way to use an anonymous pipe is to communicate its handles; there's no name that other processes can use to obtain access to the pipe. This typically makes anonymous pipes unsuitable for interprocess communication between unrelated processes (for example, between a client and a server process).
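
The idea of a pipe with a read end and a write end can be tried without any Windows-specific modules. The sketch below uses the portable os.pipe() function from the standard library, not the win32pipe module discussed in this chapter, and assumes a current Python 3 interpreter; pipe_round_trip is a hypothetical helper name. For simplicity, both ends are used within a single process.

```python
import os

def pipe_round_trip(message):
    """Write `message` (a bytes object) into an anonymous pipe and read it back."""
    read_fd, write_fd = os.pipe()       # two descriptors, one for each end
    try:
        os.write(write_fd, message)     # a small write fits in the pipe buffer
        os.close(write_fd)              # closing the write end signals end of data
        write_fd = None
        return os.read(read_fd, len(message))
    finally:
        os.close(read_fd)
        if write_fd is not None:
            os.close(write_fd)
```

In real use the two ends would normally be held by different processes, typically a parent and a child that inherits one of the descriptors.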
Anonymous pipes are simple to create and use. The function win32pipe.CreatePipe() creates an anonymous pipe and returns two handles: one for reading from the pipe, and one for writing to the pipe. The win32pipe.CreatePipe() function takes the following parameters:

• The security attributes for the pipe or None for the default.
• The buffer size or zero for the default.

It then returns a tuple of (readHandle, writeHandle).

A demonstration of anonymous pipes is quite simple. Let's create an anonymous pipe (obtaining the two handles), then write some data to the pipe, and read the same data back:

    >>> import win32pipe
    >>> # Create the pipe
    >>> readHandle, writeHandle = win32pipe.CreatePipe(None, 0)
    >>> import win32file # This module contains the ReadFile/WriteFile functions.
    >>> # Write a string to the pipe
    >>> win32file.WriteFile(writeHandle, "Hi from the pipe")
    (0, 16)
    >>> # Now read data from it
    >>> win32file.ReadFile(readHandle, 16)
    (0, 'Hi from the pipe')
    >>>

Named Pipes

Named pipes are similar to anonymous pipes, except they have a unique name. Typically, a server process creates a named pipe with a known name, and other client processes connect to this pipe simply by specifying the name. The key benefit of named pipes is that unrelated processes can use them, even from over the network. All a process needs is the name of the pipe, possibly the name of the host server, and sufficient security to open it. This makes named pipes suitable for simple communication between a server and many clients.

Named pipes can be created only by Windows NT. Windows 95/98 can create a client connection to an existing named pipe, but can't create a new named pipe.

Creating and using named pipes is a complex subject and beyond the scope of this book. However, an example using named pipes can be found in Chapter 18. The win32pipe module supports all pipe operations supported by Windows.
For further information on named pipes, please see the Windows SDK documentation or one of the pipe samples that comes with the Python for Windows Extensions.

Processes

Every program running under Windows runs in the context of a process. A process is an executing application and has a single virtual address space, a list of valid handles, and other Windows resources. A process consists of at least one thread, but may contain a large number of threads.

Python has the ability to manage processes from a fairly high level, right down to the low level defined by the Win32 API. This section discusses some of these capabilities.

Portable Process Control: The os Module

Python itself defines a few process-manipulation functions that are portable across all platforms, including Windows. As they are portable to Unix and other operating systems, they operate at a high level and don't cover the range of functionality provided natively. The Python os module provides a number of techniques for starting new processes.

os.system

os.system provides the most rudimentary support for new processes. It takes a single argument (the command line of the process to execute) and returns an integer "error code." For example:

    >>> import os
    >>> os.system("notepad.exe C:\\autoexec.bat")
    0
    >>>

starts an instance of notepad, editing your autoexec.bat file. The exit code from the program is zero. Unfortunately, the result of zero is often misleading; the Windows command processor responsible for executing these commands usually refuses to pass the actual error code on, always reporting a success code of zero.

The single parameter can be anything that typically works from a Windows command prompt. Thus, the system path is searched for the program.

There are, however, a number of other limitations to this approach. First, if you execute this code from PythonWin (or any other GUI Python environment) you will notice that an empty command prompt opens.
Windows knows you are running from a GUI, but isn't smart enough to look at the program to execute to determine if it too is a GUI program; so it creates a new console for the program. This works well when executing command-line tools, but not so well for GUI programs such as notepad.

Second, notice that Python waits until the new process has terminated before returning. Depending on your requirements, this may or may not be appropriate.

os.execv

os.execv provides an interesting (although often useless) way to create new processes. The program you specify effectively replaces the calling process. Technically, the process to be created is a new process (i.e., it has a different process ID), so the new process doesn't replace the old process; the old process simply terminates immediately after the call to os.execv. In effect, the new process appears to overwrite the current process, almost as if the old process becomes the new process. Because the calling program never regains control, os.execv is rarely used.

os.execv takes two arguments: a string containing the program to execute, and a tuple containing the program arguments. For example, execute the following code:

    >>> import os
    >>> os.execv("c:\\Winnt\\notepad.exe", ("c:\\autoexec.bat",) )

Notice that your existing Python or PythonWin instance immediately terminates (no chance to save anything!) and is replaced by an instance of notepad. Also notice that os.execv doesn't search your system path. Therefore, you need to specify the full path to notepad. You will probably need to change the example to reflect your Windows installation.

Another function, os.execve, is similar but allows a custom environment for the new process to be defined.

os.popen

os.popen is also supposed to be a portable technique for creating new processes and capturing their output. os.popen takes three parameters: the command to execute, the default mode for the pipe, and the buffer size.
Only the first parameter is required; the others have reasonable defaults (see the Python Library Reference for details). The following code shows that the function returns a Python file object, which can be read to receive the data:

    >>> import os
    >>> file = os.popen("echo Hello from Python")
    >>> file.read()
    'Hello from Python\012'
    >>>

If you try this code from Python.exe, you will notice it works as expected. However, if you attempt to execute this code from a GUI environment, such as PythonWin, you receive this error:

    >>> os.popen("echo Hello from Python")
    Traceback (innermost last):
      File "<interactive input>", line 0, in ?
    error: (0, 'No error')
    >>>

Unfortunately, a bug in the Windows popen function prevents this from working in a GUI environment.

Attempting to come to the rescue is the win32pipe module, which provides a replacement popen that works in a GUI environment under Windows NT; see the following code:

    >>> import win32pipe
    >>> file=win32pipe.popen("echo Hello from Python")
    >>> file.read()
    'Hello from Python\012'
    >>>

Better Process Control: The win32api Module

The module win32api provides some additional techniques for manipulating processes. These allow you to perform many of the common requirements for starting new processes, but still don't provide the ultimate in low-level control.

win32api.WinExec

The WinExec function behaves similarly to the os.system function, as described previously, but it provides some concessions for GUI programs; namely, no console is created, and the function doesn't wait until the new process has completed. The function takes two parameters:

• The command to execute
• Optionally, the initial state for the application's window

For example, to execute notepad, using the default window state, you can execute the following code:

    >>> import win32api
    >>> win32api.WinExec("notepad")
    >>>

notepad should appear in a normal window, and Python continues executing commands before you close notepad.
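
The difference between a call that waits for the new process (as os.system does) and one that returns immediately (as WinExec does) can be seen without any Windows-specific module. The following is a minimal sketch, assuming Python 3 and its standard subprocess module, neither of which is used elsewhere in this chapter; it is a portable stand-in, not a replacement for WinExec. A child Python interpreter that sleeps for a second plays the part of notepad.

```python
import subprocess
import sys
import time

# Hypothetical stand-in for a long-running program such as notepad:
# a child Python interpreter that sleeps briefly and then exits.
child_command = [sys.executable, "-c", "import time; time.sleep(1)"]

started = time.time()
child = subprocess.Popen(child_command)  # returns once the child has started
launch_seconds = time.time() - started

# poll() returns None while the child has not yet produced an exit code.
still_running = child.poll() is None
print("launch returned after %.2f seconds" % launch_seconds)
print("child still running:", still_running)

# We block only when we decide to, and only then learn the exit code.
exit_code = child.wait()
print("child exit code:", exit_code)
```

The parent carries on while the child is still running and collects the exit code only when it chooses to wait.
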
To show notepad maximized:

    >>> import win32api, win32con
    >>> win32api.WinExec("notepad", win32con.SW_SHOWMAXIMIZED)
    >>>

win32api.ShellExecute

The win32api module also provides another useful function for creating new processes. The ShellExecute function is primarily used to open documents, rather than start arbitrary processes. For example, you can tell ShellExecute to "open MyDocument.doc." Windows itself determines which process to use to open .doc files and starts it on your behalf. This is the same function Windows Explorer uses when you click (or double-click) on a .doc file: it calls ShellExecute, and the correct program is started. The ShellExecute function takes these parameters:

• The handle to the parent window or zero for no parent.
• The operation to perform on the file.
• The name of the file or program to execute.
• Optional parameters for the new process.
• The initial directory for the application.
• A flag indicating if the application should be shown.

Let's try this function. Start Python or PythonWin from a directory with a .doc file in it, then execute the following commands:

    >>> import win32api
    >>> win32api.ShellExecute(0, "open", \
    ...                       "MyDocument.doc", None, "", 1)
    33
    >>>

Assuming Microsoft Word is installed, this code opens the document MyDocument.doc. If you instead wish to print this document, execute this:

    >>> import win32api
    >>> win32api.ShellExecute(0, "print", \
    ...                       "MyDocument.doc", None, "", 1)
    33
    >>>

Microsoft Word then opens and prints the document.

Ultimate Process Control: The win32process Module

The win32process module provides the ultimate in process-level control; it exposes most of the native Windows API for starting, stopping, controlling, and waiting for processes. But before we delve into the win32process module, some definitions are in order.

Handles and IDs

Every thread and process in the system can be identified by a Windows handle, and by an integer ID.
A process or thread ID is a unique number allocated for the process or thread and is valid across the entire system. An ID is invariant while the thread or process is running and serves no purpose other than to uniquely identify the thread or process. IDs are reused, so while two threads or processes will never share the same ID while running, the same ID may be reused by the system once the thread or process has terminated. Further, IDs are not secure. Any user can obtain the ID for a thread or process. This is not a security problem, as the ID is not sufficient to control the thread or process.

A handle provides additional control capabilities for the thread or process. Using a handle, you can wait for a process to terminate, force the termination of a process, or change the characteristics of a running process.

While a process can have only a single ID, there may be many handles to it. The handle to a process determines the rights a user has to perform operations on the process or thread.

Given a process ID, the function win32api.OpenProcess() can obtain a handle. The ability to use this handle is determined by the security settings for both the current user and the process itself.

Creating Processes

The win32process module contains two functions for creating new processes: CreateProcess() and CreateProcessAsUser(). These functions are identical, except CreateProcessAsUser() accepts an additional parameter indicating the user under which the process should be created.
CreateProcess() accepts a large number of arguments that allow very fine control over the new process:

• The program to execute
• Optional command-line parameters
• Security attributes for the new process or None
• Security attributes for the main thread of the new process or None
• A flag indicating if handles are inherited by the new process
• Flags indicating how the new process is to be created
• A new environment for the new process or None for the current environment
• The current directory for the new process
• Information indicating how the new window is to be positioned and shown

It returns a tuple with four elements:

• A handle to the new process
• A handle to the main thread of the new process
• An integer identifying the new process
• An integer identifying the main thread of the new process

Terminating Processes

To terminate a process, the win32process.TerminateProcess() function is used. This function takes two parameters:

• A handle to the process to be terminated
• The exit code to associate with the process

If you initially created the new process, it's quite easy to get the handle to the process; you simply remember the result of the win32process.CreateProcess() call.

But what happens if you didn't create the process? If you know the process ID, you can use the function win32api.OpenProcess() to obtain a handle. But how do you find the process ID? There's no easy answer to that question. The file killProcName.py that comes with the Python for Windows Extensions shows one method of obtaining the process ID given the process name. It also shows how to use the win32api.OpenProcess() function to obtain a process handle suitable for terminating the process.

Controlling Processes

Once a process is running, there are two process properties that can be set: the priority and the affinity mask. The priority of the process determines how the operating system schedules the threads in the process. The win32process.SetPriorityClass() function can set the priority.
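
Of the two properties, the affinity mask is the less self-explanatory: it is an integer in which each bit stands for one processor, with bit 0 for the first processor. The following is a minimal sketch in Python 3 of how such a mask can be built and read back. The helper names affinity_mask and processors_in are invented for this illustration and aren't part of win32process.

```python
def affinity_mask(processors):
    """Return a bit mask with one bit set per 0-based processor index."""
    mask = 0
    for index in processors:
        if index < 0:
            raise ValueError("processor indexes start at 0")
        mask |= 1 << index
    return mask


def processors_in(mask):
    """Inverse helper: list the processor indexes selected by a mask."""
    return [bit for bit in range(mask.bit_length()) if mask & (1 << bit)]


print(affinity_mask([0]))      # first processor only
print(affinity_mask([0, 2]))   # first and third processors
print(processors_in(5))        # which processors does mask 5 select?
```

We assume an integer built this way is the kind of value you would pass as the mask, together with the process handle, when setting the affinity; the arithmetic itself runs on any platform.
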
A process's affinity mask defines which processor the process runs on, which obviously makes this useful only in a multiprocessor system. The win32process.SetProcessAffinityMask() function allows you to define this behavior.

Putting It All Together

This section presents a simple example that demonstrates how to use the CreateProcess API and process handles. In the interests of allowing the salient points to come through, this example won't really do anything too useful; instead, it's restricted to the following functionality:

• Creates two instances of notepad with their window positions carefully laid out.
• Waits roughly 10 seconds, in one-second steps, for these instances to terminate.
• If the instances haven't terminated in that time, kills them.

This functionality demonstrates the win32process.CreateProcess() function, how to use win32process.STARTUPINFO() objects, and how to wait on process handles using the win32event.WaitForMultipleObjects() function.

Note that instead of waiting in one block, you actually wait for one second at a time in a loop. This is so you can print a message out once per second, so it's obvious the program is working correctly. As written, the countdown runs from 9 down to 1, so the loop makes nine one-second waits:

    # CreateProcess.py
    #
    # Demo of creating two processes using the CreateProcess API,
    # then waiting for the processes to terminate.

    import win32process
    import win32event
    import win32con
    import win32api

    # Create a process specified by commandLine.
    # The process's window should be at position rect.
    # Returns the handle to the new process.
    def CreateMyProcess( commandLine, rect):
        # Create a STARTUPINFO object
        si = win32process.STARTUPINFO()
        # Set the position in the startup info.
        si.dwX, si.dwY, si.dwXSize, si.dwYSize = rect
        # And indicate which of the items are valid.
        si.dwFlags = win32process.STARTF_USEPOSITION | \
                     win32process.STARTF_USESIZE
        # Rest of startup info is default, so we leave alone.
        # Create the process.
        info = win32process.CreateProcess(
                          None, # AppName
                          commandLine, # Command line
                          None, # Process Security
                          None, # ThreadSecurity
                          0, # Inherit Handles?
                          win32process.NORMAL_PRIORITY_CLASS,
                          None, # New environment
                          None, # Current directory
                          si) # startup info.
        # Return the handle to the process.
        # Recall info is a tuple of (hProcess, hThread, processId, threadId)
        return info[0]

    def RunEm():
        handles = []
        # First get the screen size to calculate layout.
        screenX = win32api.GetSystemMetrics(win32con.SM_CXSCREEN)
        screenY = win32api.GetSystemMetrics(win32con.SM_CYSCREEN)
        # First instance will be on the left hand side of the screen.
        rect = 0, 0, screenX/2, screenY
        handle = CreateMyProcess("notepad", rect)
        handles.append(handle)
        # Second instance of notepad will be on the right hand side.
        rect = screenX/2+1, 0, screenX/2, screenY
        handle = CreateMyProcess("notepad", rect)
        handles.append(handle)

        # Now we have the processes, wait for them both
        # to terminate.
        # Rather than waiting the whole time, we loop,
        # waiting for one second each time, printing a message
        # each time around the loop.
        # range(1,10) reversed counts 9, 8, ... 1: nine waits in all.
        countdown = range(1,10)
        countdown.reverse()
        for i in countdown:
            print "Waiting %d seconds for apps to close" % i
            rc = win32event.WaitForMultipleObjects(
                                  handles, # Objects to wait for.
                                  1, # Wait for them all
                                  1000) # timeout in milli-seconds.
            if rc == win32event.WAIT_OBJECT_0:
                # Our processes closed!
                print "Our processes closed in time."
                break
            # else just continue around the loop.
        else:
            # We didn't break out of the for loop!
            print "Giving up waiting - killing processes"
            for handle in handles:
                try:
                    win32process.TerminateProcess(handle, 0)
                except win32process.error:
                    # This one may have already stopped.
                    pass

    if __name__=='__main__':
        RunEm()

You should run this example from a command prompt rather than from PythonWin.
Under PythonWin, the script works correctly, but due to the complications of running in a GUI environment, PythonWin appears to hang until either the wait expires or the applications close. Although PythonWin is printing the messages once per second, they can't be seen until the script closes.

You run this example from a command prompt as you would any script. Running the script creates two instances of notepad that together take up the entire screen. If you switch back to the command prompt, notice the following messages:

    C:\Scripts>python CreateProcess.py
    Waiting 9 seconds for apps to close
    ...
    Waiting 2 seconds for apps to close
    Waiting 1 seconds for apps to close
    Giving up waiting - killing processes

    C:\Scripts>

If instead of switching back to the command prompt, you simply close the new instances of notepad, you'll see the following:

    C:\Scripts>python CreateProcess.py
    Waiting 9 seconds for apps to close
    Waiting 8 seconds for apps to close
    Our processes closed in time.

    C:\Scripts>

Conclusion

In this chapter, we have looked at the various techniques we can use in Python for working with files and processes. We discussed how Python's standard library has a number of modules for working with files and processes in a portable way, and a few of the problems you may encounter when using these modules.

We also discussed the native Windows API for dealing with these objects and the Python interface to this API. We saw how Python can be used to work with and exploit the Windows-specific features of files and processes.
