5.6 Odds and Ends

In this section, we introduce a few module-related ideas that seem important enough to stand on their own (or obscure enough to defy our organizational skills).

5.6.1 Module Compilation Model

As currently implemented, the Python system is often called an interpreter, but it's really somewhere between a classic interpreter and compiler. As in Java, Python programs are compiled to an intermediate form called bytecode, which is then executed on something called a virtual machine. Since the Python virtual machine interprets the bytecode form, we can get away with saying that Python is interpreted, but it still goes through a compile phase first.

Luckily, the compile step is completely automated and hidden in Python. Python programmers simply import modules and use the names they define; Python takes care to automatically compile modules to bytecode when they are first imported. Moreover, Python tries to save a module's bytecode in a file, so it can avoid recompiling in the future if the source code hasn't been changed. In effect, Python comes with an automatic make system to manage recompiles.^[5]

^[5] For readers who have never used C or C++, a make system is a way to automate compiling and linking programs. make systems typically use file modification dates to know when a file must be recompiled (just like Python).

Here's how this works. You may have noticed .pyc files in your module directories after running programs; these are the files Python generates to save a module's bytecode (provided you have write access to source directories). When a module M is imported, Python loads an M.pyc bytecode file instead of the corresponding M.py source file, as long as the M.py file hasn't been changed since the M.pyc bytecode was saved. If you change the source code file (or delete the .pyc), Python is smart enough to recompile the module the next time it is imported; if not, the saved bytecode files make your program start quicker by avoiding recompiles at runtime.
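To make the save step concrete, here is a minimal sketch that compiles a module to bytecode by hand with the standard py_compile module. It assumes a current Python 3 interpreter rather than the 1.5 release described in this chapter (Python 3 normally saves automatic bytecode in a __pycache__ subdirectory, so the sketch asks for the file to be written beside the source instead). The module name demo_mod and the temporary directory are invented for the example.

```python
# Assumes Python 3; demo_mod is a hypothetical module made up for this sketch.
import os
import py_compile
import tempfile

workdir = tempfile.mkdtemp()
source_path = os.path.join(workdir, "demo_mod.py")
with open(source_path, "w") as f:
    f.write("answer = 42\n")

# Compile explicitly; cfile puts the bytecode next to the source,
# mirroring the M.py / M.pyc layout described in the text.
pyc_path = py_compile.compile(source_path, cfile=source_path + "c", doraise=True)

print(os.path.basename(pyc_path))    # demo_mod.pyc
print(os.path.exists(pyc_path))      # True
```

In normal use you never need to call py_compile yourself; importing the module triggers the same compilation automatically.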

Why You Will Care: Shipping Options

Incidentally, compiled .pyc bytecode files also happen to be one way to ship a system without source code. Python happily loads a .pyc file if it can't find a .py source file for a module on its module search path, so all you really need to ship to customers are the .pyc files. Moreover, since Python bytecode is portable, you can usually run a .pyc file on multiple platforms. To force pre-compilation into .pyc files, simply import your modules (also see the compileall utility module).
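As a rough illustration of shipping without source, the sketch below compiles a module, deletes its .py file, and then imports it from the .pyc alone. It assumes a current Python 3 interpreter; the module name shipped_mod and its greet function are hypothetical.

```python
# Assumes Python 3; shipped_mod and greet are hypothetical names.
import importlib
import os
import py_compile
import sys
import tempfile

workdir = tempfile.mkdtemp()
source_path = os.path.join(workdir, "shipped_mod.py")
with open(source_path, "w") as f:
    f.write("def greet():\n    return 'hello from bytecode'\n")

# Write shipped_mod.pyc beside the source, then remove the source file.
py_compile.compile(source_path, cfile=source_path + "c", doraise=True)
os.remove(source_path)

sys.path.insert(0, workdir)          # make the directory importable
try:
    importlib.invalidate_caches()
    shipped_mod = importlib.import_module("shipped_mod")
    result = shipped_mod.greet()
finally:
    sys.path.remove(workdir)         # undo the search-path change

print(result)                        # hello from bytecode
```

One caveat: bytecode moves between platforms, but a .pyc file is tied to the Python version that wrote it, so the customer's interpreter must match the one used to compile.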

It's also possible to "freeze" Python programs into a C executable; the standard freeze tool packages your program's compiled bytecode, any Python utilities it uses, and as much of the C code of the Python interpreter as needed to run your program. It produces a C program, which you compile with a generated makefile to produce a standalone executable program. The executable works the same as the Python files of your program. Frozen executables don't require a Python interpreter to be installed on the target machine and may start up faster; on the other hand, since the bulk of the interpreter is included, they aren't small. A similar tool, squeeze, packages Python bytecode in a Python program; search Python's web site for details.

5.6.2 Data Hiding Is a Convention

As we've seen, Python modules export all names assigned at the top level of their file. There is no notion of declaring which names should and shouldn't be visible outside the module. In fact, there's no way to prevent a client from changing names inside a module if they want to.

In Python, data hiding in modules is a convention, not a syntactical constraint. If you want to break a module by trashing its names, you can (though we have yet to meet a programmer who would want to). Some purists object to this liberal attitude towards data hiding and claim that it means Python can't implement encapsulation. We disagree (and doubt we could convince purists of anything in any event). Encapsulation in Python is more about packaging than restricting.^[6]

^[6] Purists would probably also be horrified by the rogue C++ programmer who types #define private public to break C++'s hiding mechanism in a single blow. But then those are rogue programmers for you.

As a special case, prefixing names with an underscore (e.g., _X) prevents them from being copied out when a client imports with a from * statement. This is intended only to minimize namespace pollution; since from * copies out all names, you may get more than you bargained for (including names that overwrite names in the importer). But underscores aren't "private" declarations: you can still see and change such names with other import forms.
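The underscore rule is easy to check for yourself. The sketch below assumes a current Python 3 interpreter and builds a hypothetical module called hidden_demo in memory (rather than in a file), then runs a from * import in a scratch namespace to see which names are copied.

```python
# Assumes Python 3; hidden_demo and its attributes are hypothetical.
import sys
import types

hidden_demo = types.ModuleType("hidden_demo")
hidden_demo.public_name = "visible"
hidden_demo._private_name = "skipped by from *"
sys.modules["hidden_demo"] = hidden_demo     # register it, as an import would

# Run "from hidden_demo import *" in a scratch namespace and inspect it.
client = {}
exec("from hidden_demo import *", client)
copied = sorted(name for name in client if not name.startswith("__"))
print(copied)                                # ['public_name']

# The underscore name is still reachable by qualification.
print(hidden_demo._private_name)             # skipped by from *

del sys.modules["hidden_demo"]               # clean up the registration
```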

5.6.3 Mixed Modes: __name__ and __main__

Here's a special module-related trick that lets you both import a module from clients and run it as a standalone program. Each module has a built-in attribute called __name__, which Python sets as follows:

If the file is being run as a program, __name__ is set to the string "__main__" when it starts.
If the file is being imported, __name__ is set to the module's name as known by its clients.

The upshot is that a module can test its own __name__ to determine whether it's being run or imported. For example, suppose we create the module file below, to export a single function called tester:

def tester():
    print "It's Christmas in Heaven..."

if __name__ == '__main__':         # only when run
    tester()                       # not when imported

This module defines a function for clients to import and use as usual:

% python
>>> import runme
>>> runme.tester()
It's Christmas in Heaven...

But the module also includes code at the bottom that is set up to call the function when this file is run as a program:

% python runme.py
It's Christmas in Heaven...

Perhaps the most common place you'll see the __main__ test applied is for self-test code: you can package code that tests a module's exports in the module itself by wrapping it in a __main__ test at the bottom. This way, you can use the file in clients and test its logic by running it from the system shell.
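A self-test of this kind might look something like the following sketch. It assumes a current Python 3 interpreter, and the function names double and self_test are invented for illustration; any module with exports worth checking could follow the same shape.

```python
# Assumes Python 3; double and self_test are hypothetical names.
def double(value):
    """Return twice the argument; stands in for a module's real exports."""
    return value * 2

def self_test():
    """Exercise the module's exports and return how many checks ran."""
    checks = [(double(2), 4), (double("ab"), "abab"), (double([1]), [1, 1])]
    for actual, expected in checks:
        assert actual == expected, (actual, expected)
    return len(checks)

if __name__ == "__main__":           # only when run, not when imported
    print("passed %d checks" % self_test())
```

Clients that import the file get double without triggering the tests; running the file from the system shell runs them.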

5.6.4 Changing the Module Search Path

We've mentioned that the module search path is a list of directories in the PYTHONPATH environment variable. What we haven't told you is that a Python program can actually change the search path, by assigning to a built-in list called sys.path (the path attribute in the built-in sys module). sys.path is initialized from PYTHONPATH (plus compiled-in defaults) on startup, but thereafter, you can delete, append, and reset its components however you like:

>>> import sys
>>> sys.path
['.', 'c:\\python\\lib', 'c:\\python\\lib\\tkinter']

>>> sys.path = ['.']                            # change module search path
>>> sys.path.append('c:\\book\\examples')       # escape backslashes as "\\"
>>> sys.path
['.', 'c:\\book\\examples']

>>> import string
Traceback (innermost last):
  File "<stdin>", line 1, in ?
ImportError: No module named string

You can use this to dynamically configure a search path inside a Python program. Be careful, though; if you delete a critical directory from the path, you may lose access to critical utilities. In the last command above, for example, we no longer have access to the string module, since we deleted the Python source library's directory from the path.
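One way to stay out of trouble is to extend the path rather than replace it, and to put it back when you're done. The sketch below assumes a current Python 3 interpreter; the module name pathdemo and the temporary directory are made up for the example.

```python
# Assumes Python 3; pathdemo is a hypothetical module made up for this sketch.
import importlib
import os
import sys
import tempfile

# Create a throwaway directory holding one module.
extra_dir = tempfile.mkdtemp()
with open(os.path.join(extra_dir, "pathdemo.py"), "w") as f:
    f.write("where = 'found via sys.path'\n")

saved_path = list(sys.path)          # keep a copy so the change can be undone
sys.path.append(extra_dir)           # extend the search path at runtime
try:
    importlib.invalidate_caches()
    pathdemo = importlib.import_module("pathdemo")
    print(pathdemo.where)            # found via sys.path
finally:
    sys.path[:] = saved_path         # restore the original search path
```

Because the standard library directories stay on the path throughout, imports of other modules keep working while the extra directory is in place.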

5.6.5 Module Packages (New in 1.5)

Packages are an advanced tool, and we debated whether to cover them in this book. But since you may run across them in other people's code, here's a quick overview of their machinery.

In short, Python packages allow you to import modules using directory paths; qualified names in import statements reflect the directory structure on your machine. For instance, if some module C lives in a directory B, which is in turn a subdirectory of directory A, you can say import A.B.C to load the module. Only directory A needs to be found in a directory listed in the PYTHONPATH variable, since the path from A to C is given by qualification.

Packages come in handy when integrating systems written by independent developers; by storing each system's set of modules in its own subdirectory, we can reduce the risk of name clashes. For instance, if each developer writes a module called spam.py, there's no telling which will be found on PYTHONPATH first if package qualifier paths aren't used. If another subsystem's directory appears on PYTHONPATH first, a subsystem may see the wrong one.

Again, if you're new to Python, make sure that you've mastered simple modules before stepping up to packages. Packages are more complex than we've described here; for instance, each directory used as a package must include an __init__.py module to identify itself as such. See Python's reference manuals for the whole story.
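To see the directory structure at work, the sketch below builds the A/B/C layout from the example above in a temporary directory and imports module C by its qualified name. It assumes a current Python 3 interpreter; the spam attribute and the temporary root directory are invented for illustration.

```python
# Assumes Python 3; the temporary root and the spam attribute are hypothetical.
import importlib
import os
import sys
import tempfile

root = tempfile.mkdtemp()                     # plays the role of a PYTHONPATH entry
package_dir = os.path.join(root, "A", "B")
os.makedirs(package_dir)

# Each package directory gets an __init__.py file (empty here).
for directory in (os.path.join(root, "A"), package_dir):
    open(os.path.join(directory, "__init__.py"), "w").close()

with open(os.path.join(package_dir, "C.py"), "w") as f:
    f.write("spam = 'A.B.C.spam'\n")

sys.path.insert(0, root)                      # only A's parent needs to be listed
try:
    importlib.invalidate_caches()
    module_c = importlib.import_module("A.B.C")   # loads the same module as: import A.B.C
    print(module_c.__name__, module_c.spam)       # A.B.C A.B.C.spam
finally:
    sys.path.remove(root)                     # undo the search-path change
```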

Why You Will Care: Module Packages

Now that packages are a standard part of Python, you're likely to start seeing third-party extensions shipped as a set of package directories, rather than a flat list of modules. The PythonWin port of Python for MS-Windows was one of the first to jump on the package bandwagon. Many of its utility modules reside in packages, which you import with qualification paths; for instance, to load client-side COM tools, we say:

from win32com.client import constants, Dispatch

which fetches names from the client module of the PythonWin win32com package (an install directory). We'll see more about COM in Chapter 10.

5.6.6 Module Design Concepts

Like functions, modules present design tradeoffs: deciding which functions go in which module, module communication mechanisms, and so on. Here too, it's a bigger topic than this book allows, so we'll just touch on a few general ideas that will become clearer when you start writing bigger Python systems:

You're always in a module in Python: There's no way to write code that doesn't live in some module. In fact, code typed at the interactive prompt really goes in a built-in module called __main__.
Minimize module coupling: global variables: Like functions, modules work best if they're written to be closed boxes. As a rule of thumb, they should be as independent of global names in other modules as possible.
Maximize module cohesion: unified purpose: You can minimize a module's couplings by maximizing its cohesion; if all the components of a module share its general purpose, you're less likely to depend on external names.
Modules should rarely change other modules' variables: It's perfectly okay to use globals defined in another module (that's how clients import services, after all), but changing globals in another module is usually a symptom of a design problem. There are exceptions of course, but you should try to communicate results through devices such as function return values, not cross-module changes.

5.6.7 Modules Are Objects: Metaprograms

Finally, because modules expose most of their interesting properties as built-in attributes, it's easy to write programs that manage other programs. We usually call such manager programs metaprograms, because they work on top of other systems. This is also referred to as introspection, because programs can see and process object internals.

For instance, to get to an attribute called name in a module called M, we can either use qualification, or index the module's attribute dictionary exposed in the built-in __dict__ attribute. Further, Python also exports the list of all loaded modules as the sys.modules dictionary (that is, the modules attribute of the sys module), and provides a built-in called getattr that lets us fetch attributes from their string names. Because of that, all the following expressions reach the same attribute and object:

M.name                          # qualify object
M.__dict__['name']             # index namespace dictionary manually
sys.modules['M'].name           # index loaded-modules table manually
getattr(M, 'name')              # call built-in fetch function
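The equivalence of these four expressions can be checked directly. The sketch below assumes a current Python 3 interpreter and creates a hypothetical module M in memory, registering it in sys.modules the way an import would.

```python
# Assumes Python 3; M and its name attribute are hypothetical.
import sys
import types

M = types.ModuleType("M")
M.name = "spam"
sys.modules["M"] = M                 # register it, as an import would

results = [
    M.name,                          # qualify object
    M.__dict__["name"],              # index namespace dictionary manually
    sys.modules["M"].name,           # index loaded-modules table manually
    getattr(M, "name"),              # call built-in fetch function
]
print(all(r is M.name for r in results))     # True

del sys.modules["M"]                 # clean up the registration
```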

By exposing module internals like this, Python helps you build programs about programs.^[7] For example, here is a module that puts these ideas to work, to implement a customized version of the built-in dir function. It defines and exports a function called listing, which takes a module object as an argument and prints a formatted listing of the module's namespace:

^[7] Notice that because a function can access its enclosing module by going through the sys.modules table like this, it's possible to emulate the effect of the global statement we met in Chapter 4. For instance, the effect of global X; X=0 can be simulated by saying, inside a function: import sys; glob=sys.modules[__name__]; glob.X=0 (albeit with much more typing). Remember, each module gets a __name__ attribute for free; it's visible as a global name inside functions within a module. This trick provides a way to change both local and global variables of the same name, inside a function.

# a module that lists the namespaces of other modules

verbose = 1

def listing(module):
    if verbose:
        print "-"*30
        print "name:", module.__name__, "file:", module.__file__
        print "-"*30

    count = 0
    for attr in module.__dict__.keys():      # scan namespace
        print "%02d) %s" % (count, attr),
        if attr[0:2] == "__":
            print "<built-in name>"          # skip __file__, etc.
        else:
            print getattr(module, attr)      # same as .__dict__[attr]
        count = count+1

    if verbose:
        print "-"*30
        print module.__name__, "has %d names" % count
        print "-"*30

if __name__ == "__main__":
    import mydir
    listing(mydir)      # self-test code: list myself

We've also provided self-test logic at the bottom of this module, which narcissistically imports and lists itself. Here's the sort of output produced:

C:\python> python mydir.py
------------------------------
name: mydir file: mydir.py
------------------------------
00) __file__ <built-in name>
01) __name__ <built-in name>
02) listing <function listing at 885450>
03) __doc__ <built-in name>
04) __builtins__ <built-in name>
05) verbose 1
------------------------------
mydir has 6 names
------------------------------

We'll meet getattr and its relatives again. The point to notice here is that mydir is a program that lets you browse other programs. Because Python exposes its internals, you can process objects generically.^[8]
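For readers on a current interpreter, here is a rough Python 3 adaptation of the same idea. It is a sketch, not the book's listing: the function name listing_lines is invented, it returns the formatted lines instead of printing them, and it sorts the names so the order is predictable. The module it browses, sample, is hypothetical.

```python
# Assumes Python 3; listing_lines and sample are hypothetical names.
import types

def listing_lines(module):
    """Return one formatted line per name in the module's namespace."""
    lines = []
    for count, attr in enumerate(sorted(vars(module))):
        if attr.startswith("__"):
            value = "<built-in name>"            # don't show __name__, etc.
        else:
            value = getattr(module, attr)        # same as vars(module)[attr]
        lines.append("%02d) %s %s" % (count, attr, value))
    return lines

# A small in-memory module to browse.
sample = types.ModuleType("sample")
sample.verbose = 1

for line in listing_lines(sample):
    print(line)
```

The set of double-underscore names that appears depends on the Python release, so the exact listing will vary from one interpreter to the next.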

^[8] By the way, tools such as mydir.listing can be preloaded into the interactive namespace, by importing them in the file referenced by the PYTHONSTARTUP environment variable. Since code in the startup file runs in the interactive namespace (module __main__), imports of common tools in the startup file can save you some typing. See Chapter 1 for more details.