
5.7 Module Gotchas

Finally, here is the usual collection of boundary cases, which make life interesting for beginners. Some are so obscure it was hard to come up with examples, but most illustrate something important about Python.

5.7.1 Importing Modules by Name String

As we've seen, the module name in an import or from statement is a hardcoded variable name; you can't use these statements directly to load a module given its name as a Python string. For instance:

>>> import "string"
  File "<stdin>", line 1
    import "string"
                  ^
SyntaxError: invalid syntax

5.7.1.1 Solution

You need to use special tools to load modules dynamically, from a string that exists at runtime. The most general approach is to construct an import statement as a string of Python code and pass it to the exec statement to run:

>>> modname = "string"
>>> exec "import " + modname       # run a string of code
>>> string                         # imported in this namespace
<module 'string'>

The exec statement (and its cousin, the eval function) compiles a string of code, and passes it to the Python interpreter to be executed. In Python, the bytecode compiler is available at runtime, so you can write programs that construct and run other programs like this. By default, exec runs the code in the current scope, but you can get more specific by passing in optional namespace dictionaries. We'll say more about these tools later in this book.
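
As a rough illustration of the optional namespace dictionaries mentioned above, here is a minimal sketch written for current Python 3, where exec is a built-in function rather than a statement (an assumption beyond this chapter's own interpreter sessions). The name namespace is ours, chosen only for the example:

```python
modname = "string"                      # module name known only at runtime
namespace = {}                          # explicit dictionary that receives the import

# Python 3 form: exec is a function; the second argument is the globals dictionary.
exec("import " + modname, namespace)

loaded = namespace[modname]             # the module object bound by the import
```

Passing a dictionary keeps the imported name out of the current scope, which is often easier to reason about than letting exec bind names wherever it happens to run.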

The only real drawback to exec is that it must compile the import statement each time it runs; if it runs many times, you might be better off using the built-in __import__ function to load from a name string instead. The effect is similar, but __import__ returns the module object, so we assign it to a name here:

>>> modname = "string"
>>> string = __import__(modname)
>>> string
<module 'string'>
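
In current Python 3 (again, an assumption beyond the sessions shown here), the standard library's importlib.import_module is the usual way to load a module from a name string. The sketch below compares it with __import__; the dotted name "xml.dom" is simply a convenient standard-library package picked for illustration:

```python
import importlib

modname = "string"
via_importlib = importlib.import_module(modname)
via_builtin = __import__(modname)
same_object = via_importlib is via_builtin      # both return the cached module

# For dotted names the two differ: __import__ returns the top-level package,
# while import_module returns the module that was actually named.
top = __import__("xml.dom")
leaf = importlib.import_module("xml.dom")
names = (top.__name__, leaf.__name__)
```

For simple names like "string" the two calls are interchangeable; the difference only shows up with package paths.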

5.7.2 from Copies Names but Doesn't Link

Earlier, we mentioned that the from statement is really an assignment to names in the importer's scope: a name-copy operation, not name aliasing. The implications of this are the same as for all assignments in Python, but they are subtle, especially given that the code that shares the objects lives in different files. For instance, suppose we define a module nested1 as follows:

X = 99
def printer(): print X

Now, if we import its two names using from in another module (nested2.py here), we get copies of those names, not links to them. Changing a name in the importer resets only the binding of the local version of that name, not the name in nested1:

from nested1 import X, printer    # copy names out
X = 88                            # changes my "X" only!
printer()                         # nested1's X is still 99

% python nested2.py
99

5.7.2.1 Solution

On the other hand, if you use import to get the whole module and assign to a qualified name (as in nested3.py here), you change the name in nested1. Qualification directs Python to a name in the module object, rather than a name in the importer:

import nested1                    # get module as a whole
nested1.X = 88                    # okay: change nested1's X
nested1.printer() 

% python nested3.py
88
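
Both behaviors can be seen side by side in one self-contained sketch. It is written for Python 3 and builds a stand-in for nested1 in memory with types.ModuleType instead of a real file; printer also returns X rather than printing it, purely so the results can be inspected. These are our choices for the example, not part of the original listings:

```python
import sys
import types

# In-memory stand-in for nested1.py (an assumption made to keep this self-contained).
nested1 = types.ModuleType("nested1")
exec("X = 99\ndef printer(): return X", nested1.__dict__)
sys.modules["nested1"] = nested1

from nested1 import X, printer      # copies two bindings into this namespace
X = 88                              # rebinds only the importer's X
after_local_change = printer()      # nested1's X is untouched

nested1.X = 88                      # qualified assignment changes nested1's X
after_qualified_change = printer()
```

The first call still sees 99 and the second sees 88, because printer always looks X up in the module where it was defined.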

5.7.3 Statement Order Matters in Top-Level Code

As we also saw earlier, when a module is first imported (or reloaded), Python executes its statements one by one, from the top of file to the bottom. This has a few subtle implications regarding forward references that are worth underscoring here:

Code at the top level of a module file (not nested in a function) runs as soon as Python reaches it during an import; because of that, it can't reference names assigned lower in the file.
Code inside a function body doesn't run until the function is called; because names in a function aren't resolved until the function actually runs, they can usually reference names anywhere in the file.

In general, forward references are only a concern in top-level module code that executes immediately; functions can reference names arbitrarily. Here's an example that illustrates forward reference dos and don'ts:

func1()               # error: "func1" not yet assigned

def func1():
    print func2()     # okay:  "func2" looked up later

func1()               # error: "func2" not yet assigned

def func2():
    return "Hello"

func1()               # okay:  "func1" and "func2" assigned

When this file is imported (or run as a standalone program), Python executes its statements from top to bottom. The first call to func1 fails because the func1 def hasn't run yet. The call to func2 inside func1 works as long as func2's def has been reached by the time func1 is called (it hasn't when the second top-level func1 call is run). The last call to func1 at the bottom of the file works, because func1 and func2 have both been assigned.
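
The same sequence can be run without aborting at the first failure by catching the error. This is a small Python 3 sketch of ours (the names early_error and late_result are illustrative only), in which func1 returns its value instead of printing it:

```python
def func1():
    return func2()            # "func2" is looked up only when func1 runs

try:
    func1()                   # func2's def has not been reached yet
except NameError as exc:
    early_error = str(exc)    # the message names the missing "func2"

def func2():
    return "Hello"

late_result = func1()         # works: both names are now assigned
```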

5.7.3.1 Solution

Don't do that. Mixing defs with top-level code is not only hard to read, it's dependent on statement ordering. As a rule of thumb, if you need to mix immediate code with defs, put your defs at the top of the file and top-level code at the bottom. That way, your functions are defined and assigned by the time code that uses them runs.

5.7.4 Recursive "from" Imports May Not Work

Because imports execute a file's statements from top to bottom, we sometimes need to be careful when using modules that import each other (something called recursive imports). Since the statements in a module have not all been run when it imports another module, some of its names may not yet exist. If you use import to fetch a module as a whole, this may or may not matter; the module's names won't be accessed until you later use qualification to fetch their values. But if you use from to fetch specific names, you only have access to names already assigned.

For instance, take the following modules recur1 and recur2. recur1 assigns a name X, and then imports recur2, before assigning name Y. At this point, recur2 can fetch recur1 as a whole with an import (it already exists in Python's internal modules table), but it can see only name X if it uses from; the name Y below the import in recur1 doesn't yet exist, so you get an error:

module recur1.py

X = 1
import recur2             # run recur2 now if it doesn't exist
Y = 2

module recur2.py

from recur1 import X      # okay: "X" already assigned
from recur1 import Y      # error: "Y" not yet assigned

>>> import recur1
Traceback (innermost last):
  File "<stdin>", line 1, in ?
  File "recur1.py", line 2, in ?
    import recur2
  File "recur2.py", line 2, in ?
    from recur1 import Y   # error: "Y" not yet assigned
ImportError: cannot import name Y

Python is smart enough to avoid rerunning recur1's statements when they are imported recursively from recur2 (or else the imports would send the script into an infinite loop), but recur1's namespace is incomplete when imported by recur2.

5.7.4.1 Solutions

Don't do that... really! Python won't get stuck in a cycle, but your programs will once again be dependent on the order of statements in modules. There are two ways out of this gotcha:

You can usually eliminate import cycles like this by careful design; maximizing cohesion and minimizing coupling are good first steps.
If you can't break the cycles completely, postpone module name access by using import and qualification (instead of from), or running your froms inside functions (instead of at the top level of the module).
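
The second option can be sketched as follows. To stay self-contained, this Python 3 example writes the two modules to a temporary directory before importing them; the helper write_module and the function get_y are hypothetical names introduced only for the illustration:

```python
import importlib
import os
import sys
import tempfile
import textwrap

tmpdir = tempfile.mkdtemp()

def write_module(name, source):
    """Write a small module file into the temporary directory."""
    with open(os.path.join(tmpdir, name + ".py"), "w") as f:
        f.write(textwrap.dedent(source))

write_module("recur1", """
    X = 1
    import recur2             # recur2 runs while recur1 is only half built
    Y = 2
""")
write_module("recur2", """
    import recur1             # whole-module import works even mid-cycle

    def get_y():
        return recur1.Y       # resolved only when get_y is called
""")

sys.path.insert(0, tmpdir)
importlib.invalidate_caches()

import recur1
result = recur1.recur2.get_y()   # Y exists by the time get_y runs
```

Because recur2 never touches recur1.Y while it is being imported, the cycle no longer depends on where Y is assigned in recur1.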

5.7.5 reload May Not Impact from Imports

The from statement is the source of all sorts of gotchas in Python. Here's another: because from copies (assigns) names when run, there's no link back to the module where the names came from. Names imported with from simply become references to objects, which happen to have been referenced by the same names in the importee when the from ran. Because of this behavior, reloading the importee has no effect on clients that use from; the client's names still reference the objects fetched with from, even though names in the original module have been reset:

from module import X       # X may not reflect any module reloads!
. . .
reload(module)             # changes module, not my names
X                          # still references old object

5.7.5.1 Solution

Don't do it that way. To make reloads more effective, use import and name qualification, instead of from. Because qualifications always go back to the module, they will find the new bindings of module names after calling reload:

import module              # get module, not names
. . .
reload(module)             # changes module in-place
module.X                   # get current X: reflects module reloads
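
Here is a runnable sketch of the difference, assuming Python 3, where reload lives in importlib rather than being a built-in. The module name reload_demo and the helper write_source are hypothetical, and bytecode caching is switched off so that the rewritten source file is always reread:

```python
import importlib
import os
import sys
import tempfile

sys.dont_write_bytecode = True          # keep the demo independent of cached bytecode

tmpdir = tempfile.mkdtemp()
path = os.path.join(tmpdir, "reload_demo.py")

def write_source(text):
    """Overwrite the demo module's source file."""
    with open(path, "w") as f:
        f.write(text)

write_source("X = 'old'\n")
sys.path.insert(0, tmpdir)
importlib.invalidate_caches()

import reload_demo
from reload_demo import X               # copies the current binding of X

write_source("X = 'changed'\n")
importlib.reload(reload_demo)           # reruns the file in the same module object

copied = X                              # still the object fetched by from
qualified = reload_demo.X               # reflects the reload
```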

5.7.6 reload Isn't Applied Transitively

When you reload a module, Python only reloads that particular module's file; it doesn't automatically reload modules that the file being reloaded happens to import. For example, if we reload some module A, and A imports modules B and C, the reload only applies to A, not B and C. The statements inside A that import B and C are rerun during the reload, but they'll just fetch the already loaded B and C module objects (assuming they've been imported before):

% cat A.py
import B                   # not reloaded when A is
import C                   # just an import of an already loaded module

% python
>>> . . . 
>>> reload(A)

5.7.6.1 Solution

Don't depend on that. Use multiple reload calls to update subcomponents independently. If desired, you can design your systems to reload their subcomponents automatically by adding reload calls in parent modules like A.^[9]

^[9] You could also write a general tool to do transitive reloads automatically, by scanning module __dict__s (see Section 5.6.7), and checking each item's type() to find nested modules to reload recursively. This is an advanced exercise for the ambitious.
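
One possible shape for such a tool is sketched below. It is only a starting point under several assumptions: it targets Python 3 (importlib.reload), the function name transitive_reload and its root parameter are ours, and it reloads only modules whose files live under root so that the standard library is left alone. The throwaway modules demo_a and demo_b exist just to exercise it:

```python
import importlib
import os
import sys
import tempfile
import types

def transitive_reload(module, root, visited=None):
    """Reload module, then any modules in its namespace that live under root."""
    if visited is None:
        visited = set()
    filename = getattr(module, "__file__", None)
    if module.__name__ in visited or not filename:
        return visited
    if not os.path.realpath(filename).startswith(os.path.realpath(root)):
        return visited                   # skip modules outside the chosen directory
    visited.add(module.__name__)
    importlib.reload(module)
    for value in list(vars(module).values()):
        if isinstance(value, types.ModuleType):
            transitive_reload(value, root, visited)
    return visited

# Demo with two throwaway modules written to a temporary directory.
sys.dont_write_bytecode = True           # always reread the rewritten source
root = tempfile.mkdtemp()

def write_module(name, source):
    with open(os.path.join(root, name + ".py"), "w") as f:
        f.write(source)

write_module("demo_a", "import demo_b\n")
write_module("demo_b", "VALUE = 'old'\n")
sys.path.insert(0, root)
importlib.invalidate_caches()

import demo_a
write_module("demo_b", "VALUE = 'changed'\n")

importlib.reload(demo_a)                 # plain reload: demo_b is not rerun
after_plain_reload = demo_a.demo_b.VALUE
reloaded = transitive_reload(demo_a, root)
after_transitive = demo_a.demo_b.VALUE
```

Note that this sketch reloads a parent before its children, so names a parent copied with from are not refreshed; handling that case would need a second pass or a different ordering.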
