I l@ve RuBoard

9.3 Manipulating Programs

9.3.1 Calling Other Programs

Python can be used like a shell scripting language, to steer other tools by calling them with arguments the Python program determines at runtime. So, if you have to run a specific program (call it analyzeData) with various data files and various parameters specified on the command line, you can use the os.system() call, which takes a string specifying a command to run in a subshell. Specifically:

import os

for datafname in ['data.001', 'data.002', 'data.003']:
  for parameter1 in range(1, 10):
    # vars() supplies datafname and parameter1 to the format string
    os.system("analyzeData -in %(datafname)s -param1 %(parameter1)d" % vars())

If analyzeData is a Python program, you're better off doing it without invoking a subshell; simply use the import statement up front and a function call in the loop. Not every useful program out there is a Python program, though.
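As a hedged sketch of the same steering loop in a current interpreter (assuming Python 3.7 or later, not the older dialect of the listings here), the subprocess module can run each command from an argument list without a subshell. The helper names build_commands and run_all are made up for illustration, and because analyzeData is only a placeholder program, the demonstration runs the Python interpreter itself as a stand-in.

```python
import subprocess
import sys

def build_commands(datafiles, parameters, program="analyzeData"):
    """Return one argument list per (data file, parameter) pair."""
    return [
        [program, "-in", fname, "-param1", str(param)]
        for fname in datafiles
        for param in parameters
    ]

def run_all(commands):
    """Run each command without a subshell and collect the exit codes."""
    return [subprocess.run(cmd).returncode for cmd in commands]

commands = build_commands(["data.001", "data.002", "data.003"], range(1, 10))
print(len(commands), "commands, first:", commands[0])

# analyzeData is the text's placeholder, so the demo runs the Python
# interpreter itself as a stand-in child process.
print(run_all([[sys.executable, "-c", "pass"]]))
```

Passing a list of arguments rather than one formatted string means filenames containing spaces or shell metacharacters need no extra quoting.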

In the preceding example, the output of analyzeData most likely goes either to a file or to standard output. If it goes to standard output, it would be nice to be able to capture it. The popen() function call is an almost standard way to do this. We'll show it off in a real-world task.
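Before the larger task, here is a minimal sketch of the capture itself, assuming Python 3, where os.popen is still available. The command is a stand-in chosen so that the example runs anywhere: it asks the running interpreter to print one line.

```python
import os
import sys

# Stand-in command: the running interpreter prints one line.
# The interpreter path is quoted in case it contains spaces.
cmd = '"%s" -c "print(6 * 7)"' % sys.executable

pipe = os.popen(cmd)        # default mode reads the command's output
output = pipe.read()        # everything the command wrote to standard output
status = pipe.close()       # None when the command exited with status 0

print(repr(output))
```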

When we were writing this book, we were asked to avoid using tabs in source-code listings and use spaces instead. Tabs can wreak havoc with typesetting, and since indentation matters in Python, incorrect typesetting has the potential to break examples. But since old habits die hard (at least one of us uses tabs to indent his own Python code), we wanted a tool to find any tabs that may have crept into our code before it was shipped off for publication. The following script, findtabs.py, does the trick:

#!/usr/bin/env python
# find files, search for tabs

import string, os
cmd = 'find . -name "*.py" -print'         # find is a standard Unix tool

for file in os.popen(cmd).readlines():     # run find command
    num  = 1
    name = file[:-1]                       # strip '\n'
    for line in open(name).readlines():    # scan the file
        pos = string.find(line, "\t")
        if  pos >= 0:
            print name, num, pos           # report tab found
            print '....', line[:-1]        # [:-1] strips final \n
            print '....', ' '*pos + '*', '\n'
        num = num+1

This script uses two nested for loops. The outer loop uses os.popen to run a find shell command, which returns a list of all the Python source filenames accessible in the current directory and its subdirectories. The inner loop reads each line in the current file, using string.find to look for tabs. But the real magic in this script is in the built-in tools it employs:

os.popen: Takes a shell command passed in as a string (called cmd in the example) and returns a file-like object connected to the command's standard input or output streams. Output is the default if you don't pass an explicit "r" or "w" mode argument. By reading the file-like object, you can intercept the command's output as we did here—the result of the find. It turns out that there's a module in the standard library called find.py that provides a function that does a very similar thing to our use of popen with the find Unix command. As an exercise, you could rewrite findtabs.py to use it instead.
string.find: Returns the index of the first occurrence of one string in another, searching from left to right. In the script, we use it to look for a tab, passed in as an (escaped) one-character string ('\t').
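The find module is not part of the Python 3 standard library, so the exercise suggested above needs a different tool today. One possible sketch, assuming Python 3, walks the directory tree with os.walk and uses the find method of string objects in place of string.find. The function name find_tabs and the temporary demo file are illustrative only.

```python
import os
import tempfile

def find_tabs(root="."):
    """Yield (path, line number, column, line) for each line containing a tab."""
    for dirpath, dirnames, filenames in os.walk(root):
        for filename in filenames:
            if not filename.endswith(".py"):
                continue
            path = os.path.join(dirpath, filename)
            with open(path, errors="replace") as source:
                for num, line in enumerate(source, 1):
                    pos = line.find("\t")      # -1 when there is no tab
                    if pos >= 0:
                        yield path, num, pos, line.rstrip("\n")

# Demo on a throwaway directory holding one file with a tab on line 2.
with tempfile.TemporaryDirectory() as root:
    with open(os.path.join(root, "happyfingers.py"), "w") as f:
        f.write("for i in range(10):\n\tprint(i)\n")
    hits = list(find_tabs(root))

print([(os.path.basename(path), num, pos) for path, num, pos, _ in hits])
```

Because the directory walk happens inside Python, this version does not depend on a find utility being installed.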

When a tab is found, the script prints the matching line, along with a pointer to where the tab occurs. Notice the use of string repetition: the expression ' '*pos moves the print cursor to the right, up to the index of the first tab. Also notice that cmd uses double quotes inside a single-quoted string, so no backslash escapes are needed. Here is the script at work, catching illegal tabs in the unfortunately named file happyfingers.py:

C:\python\book-examples> python findtabs.py
./happyfingers.py 2 0
....   for i in range(10):
.... *

./happyfingers.py 3 0
....           print "oops..."
.... *

./happyfingers.py 5 5
.... print      "bad style"
....      *
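The pointer lines in this report come from string repetition. A small sketch of just that step, assuming Python 3 and a hypothetical helper named pointer_line, looks like this; the sample line also shows double quotes sitting inside a single-quoted string without escapes.

```python
def pointer_line(pos, marker="*"):
    """Return `pos` spaces followed by a marker, to point at column `pos`."""
    return " " * pos + marker

line = 'print\t"bad style"'      # double quotes inside single quotes
pos = line.find("\t")            # index of the first tab in the line

print("....", line)
print("....", pointer_line(pos))
```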

A note on portability: the find shell command used in the findtabs script is a Unix command, which may or may not be available on other platforms (it ran under Windows in the listing above because a find utility program was installed). os.popen functionality is available as win32pipe.popen in the win32 extensions to Python for Windows.^[5] If you want to write code that catches shell command output portably, use something like the following code early in your script:

import sys
if sys.platform == "win32":                # on a Windows port
    try:
        import win32pipe
        popen = win32pipe.popen
    except ImportError:
        raise ImportError, "The win32pipe module could not be found"
else:                                      # else on POSIX box
    import os
    popen = os.popen
# ...and use popen in blissful platform ignorance

^[5] Two important compatibility comments: the win32pipe module also has a popen2 call, which is like the popen2 call on Unix, except that it returns the read and write pipes in swapped order (see the documentation for popen2 in the posix module for details on its interface). There is no equivalent of popen on Macs, since pipes don't exist on that operating system.

The sys.platform attribute is always preset to a string that identifies the underlying platform (and hence the Python port you're using). Although the Python language isn't platform-dependent, some of its libraries may be; checking sys.platform is the standard way to handle cases where they are. Notice the nested import statements here; as we've seen, import is just an executable statement that assigns a variable name.
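As an illustration of branching on sys.platform in a current interpreter, here is a hedged sketch assuming Python 3.7 or later. The function names listing_command and capture are hypothetical, and the choice of dir and ls as the two commands is an assumption made for the example. Output is captured with subprocess.run.

```python
import subprocess
import sys

def listing_command(platform=sys.platform):
    """Pick a directory-listing command for the given platform string."""
    if platform == "win32":
        return ["cmd", "/c", "dir", "/b"]   # dir is a cmd.exe built-in
    return ["ls"]                           # assume a POSIX-style system

def capture(command):
    """Run `command` and return its standard output as text."""
    result = subprocess.run(command, capture_output=True, text=True, check=True)
    return result.stdout

print(listing_command())
# Stand-in child process: the interpreter prints a single word.
print(capture([sys.executable, "-c", "print('hello')"]).strip())
```

Taking the platform string as a parameter keeps the branch easy to exercise on any machine, since a test can pass "win32" explicitly.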
