
C.8 Chapter 8

Describing a directory. There are several solutions to this exercise, naturally. One simple solution is:

import os, sys, stat

def describedir(start):
    def describedir_helper(arg, dirname, files):
        """ Helper function for describing directories """
        print "Directory %s has files:" % dirname
        for file in files:
            # find the full path to the file (directory + filename)
            fullname = os.path.join(dirname, file)
            if os.path.isdir(fullname):
                # if it's a directory, say so; no need to find the size
                print '  '+ file + ' (subdir)' 
            else: 
                # find out the size, and print the info.
                size = os.stat(fullname)[stat.ST_SIZE]
                print '  '+file+' size='  + `size`

    # Start the 'walk'.
    os.path.walk(start, describedir_helper, None)

which uses the walk function in the os.path module, and works just fine:

>>> import describedir
>>> describedir.describedir('testdir')
Directory testdir has files:
  describedir.py size=939
  subdir1 (subdir)
  subdir2 (subdir)
Directory testdir\subdir1 has files:
  makezeros.py size=125
  subdir3 (subdir)
Directory testdir\subdir1\subdir3 has files:
Directory testdir\subdir2 has files:

Note that you could also find the size of a file with len(open(fullname, 'rb').read()), but this works only when you have read access to every file, and it is quite inefficient because it reads each file's entire contents. The stat call in the os module returns a tuple of information about a file, and the stat module defines names (such as ST_SIZE) for the positions in that tuple, so you don't have to remember their order. See the Library Reference for details.
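For readers on Python 3, where os.path.walk is no longer available, the same idea can be sketched with os.walk and the st_size attribute of the os.stat result. This is a minimal sketch rather than a port of the solution above: the name describe_tree and the temporary test directory are made up for illustration, and it lists subdirectories before files instead of mixing them.

```python
import os
import tempfile

def describe_tree(start):
    """Return report lines for every directory at or below start."""
    lines = []
    for dirname, subdirs, files in os.walk(start):
        subdirs.sort()  # sorting in place makes the traversal order predictable
        lines.append("Directory %s has files:" % dirname)
        for name in subdirs:
            # directories are only named; no size is looked up
            lines.append("  %s (subdir)" % name)
        for name in sorted(files):
            fullname = os.path.join(dirname, name)
            size = os.stat(fullname).st_size  # same value as [stat.ST_SIZE]
            lines.append("  %s size=%d" % (name, size))
    return lines

# Build a small throwaway tree (hypothetical names) and describe it.
with tempfile.TemporaryDirectory() as root:
    os.mkdir(os.path.join(root, "subdir1"))
    with open(os.path.join(root, "zeros.bin"), "wb") as f:
        f.write(b"\0" * 125)
    report = describe_tree(root)

print("\n".join(report))
```

Returning the lines instead of printing them directly is only a convenience for checking the result; printing inside the loop would work just as well.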

Modifying the prompt. The key to this exercise is to remember that the ps1 and ps2 attributes of the sys module can be anything, including a class instance with a __repr__ or __str__ method. For example:

import sys, os
class MyPrompt:
    def __init__(self, subprompt='>>> '):
        self.lineno = 0
        self.subprompt = subprompt
    def __repr__(self):
        self.lineno = self.lineno + 1
        return os.getcwd()+'|%d'%(self.lineno)+self.subprompt

sys.ps1 = MyPrompt()
sys.ps2 = MyPrompt('... ')

This code works as shown (use the -i option of the Python interpreter so that it runs the script and then stays in interactive mode with the new prompts in place):

h:\David\book> python -i modifyprompt.py
h:\David\book|1>>> x = 3
h:\David\book|2>>> y = 3
h:\David\book|3>>> def foo():
h:\David\book|3...   x = 3                # the secondary prompt is supported
h:\David\book|3...
h:\David\book|4>>> import os
h:\David\book|5>>> os.chdir('..')
h:\David|6>>>                             # note the prompt changed!
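The mechanism can be checked without an interactive session, because the interpreter simply converts a non-string prompt object to a string each time it needs a prompt. The following Python 3 sketch uses a hypothetical class name, CountingPrompt, and calls str() by hand to stand in for the interpreter:

```python
import os

class CountingPrompt:
    """Hypothetical prompt object; each str() call yields the next prompt."""
    def __init__(self, subprompt=">>> "):
        self.lineno = 0
        self.subprompt = subprompt

    def __str__(self):
        # Called once per displayed prompt, so the counter advances each time.
        self.lineno += 1
        return "%s|%d%s" % (os.getcwd(), self.lineno, self.subprompt)

prompt = CountingPrompt()
first = str(prompt)    # what the interpreter would show for the first prompt
second = str(prompt)   # ... and for the second
print(first)
print(second)

# To try it for real, in an interactive session:
#   import sys; sys.ps1 = CountingPrompt()
```

Because each instance keeps its own counter, two separate instances assigned to ps1 and ps2 would count independently.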

Avoiding regular expressions. This program is long and tedious, but not especially complicated. See if you can understand how it works. Whether this is easier for you than regular expressions depends on many factors, such as your familiarity with regular expressions and your comfort with the functions in the string module. Use whichever type of programming works for you.

import string
file = open('pepper.txt')
text = file.read()
paragraphs = string.split(text, '\n\n')

def find_indices_for(big, small):
    indices = []
    cum = 0
    while 1:
        index = string.find(big, small)
        if index == -1:
            return indices
        indices.append(index+cum)
        big = big[index+len(small):]
        cum = cum + index + len(small)

def fix_paragraphs_with_word(paragraphs, word):
    lenword = len(word)
    for par_no in range(len(paragraphs)):
        p = paragraphs[par_no]
        wordpositions = find_indices_for(p, word)
        # work from right to left so that a replacement doesn't shift
        # the positions we have yet to examine
        wordpositions.reverse()
        for start in wordpositions:
            # look for 'pepper' after the word
            indexpepper = string.find(p, 'pepper', start+lenword)
            if indexpepper == -1: continue
            if string.strip(p[start+lenword:indexpepper]) != '':
                # something other than whitespace in between!
                continue
            where = indexpepper+len('pepper')
            if p[where:where+len('corn')] == 'corn':
                # it's immediately followed by 'corn'!
                continue
            if string.find(p, 'salad', where) == -1:
                # it's not followed by 'salad'
                continue
            # Finally! we get to do a change!
            p = p[:start] + 'bell' + p[start+lenword:]
            paragraphs[par_no] = p         # change mutable argument!

fix_paragraphs_with_word(paragraphs, 'red')
fix_paragraphs_with_word(paragraphs, 'green')

for paragraph in paragraphs:
    print paragraph+'\n'

We won't repeat the output here; it's the same as that of the regular expression solution.
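The index-scanning helper is the heart of this approach. As a rough Python 3 sketch (string methods replace the string module, and the sample sentence is made up), the same scan can be written by passing a start offset to find instead of slicing the string and keeping a running total:

```python
def find_indices_for(big, small):
    """Return the start index of each non-overlapping occurrence of small."""
    if not small:
        # An empty search string would never advance; reject it up front.
        raise ValueError("small must be a non-empty string")
    indices = []
    start = 0
    while True:
        index = big.find(small, start)  # search from 'start' onward
        if index == -1:
            return indices
        indices.append(index)
        start = index + len(small)      # resume just past this match

sample = "red pepper, green pepper, red peppercorn"
positions = find_indices_for(sample, "red")
print(positions)
```

Since find reports positions relative to the whole string even when given a start offset, no cumulative counter is needed.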

Wrapping a text file with a class. This one is surprisingly easy, if you understand classes and the split function in the string module. The following is a version that has one little twist above and beyond what we asked for:

import string

class FileStrings:
    def __init__(self, filename=None, data=None):
        if data is None:
            self.data = open(filename).read()
        else:
            self.data = data
        self.paragraphs = string.split(self.data, '\n\n')
        self.lines = string.split(self.data, '\n')
        self.words = string.split(self.data)
    def __repr__(self):
        return self.data
    def paragraph(self, index):
        return FileStrings(data=self.paragraphs[index])
    def line(self, index):
        return FileStrings(data=self.lines[index])
    def word(self, index):
        return self.words[index]

This solution, when applied to the file pepper.txt, gives:

>>> from FileStrings import FileStrings
>>> bigtext = FileStrings('pepper.txt')
>>> print bigtext.paragraph(0)
This is a paragraph that mentions bell peppers multiple times.  For
one, here is a red Pepper and dried tomato salad recipe.  I don't like
to use green peppers in my salads as much because they have a harsher
flavor.
>>> print bigtext.line(0)
This is a paragraph that mentions bell peppers multiple times.  For
>>> print bigtext.line(-4)
aren't peppers, they're chilies, but would you rather have a good cook
>>> print bigtext.word(-4)
botanist

How does it work? The constructor simply reads the whole file into one big string (the instance attribute data) and then splits it according to the various criteria, keeping the results of the splits in instance attributes that are lists of strings. The paragraph and line accessor methods wrap the piece they return in a new FileStrings object, while word returns a plain string. The wrapping isn't required by the assignment, but it's nice because it means you can chain the operations, so that to find out what the last word of the third line of the third paragraph is, you can just write:

>>> print bigtext.paragraph(2).line(2).word(-1)
cook
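The chaining can also be tried without pepper.txt by passing text in through the data argument. Here is a minimal Python 3 sketch of the same class (str methods replace the string module, __str__ replaces __repr__, and the sample text is made up):

```python
class FileStrings:
    def __init__(self, filename=None, data=None):
        if data is None:
            with open(filename) as f:
                data = f.read()
        self.data = data
        self.paragraphs = data.split("\n\n")  # blank line separates paragraphs
        self.lines = data.split("\n")
        self.words = data.split()             # any run of whitespace

    def __str__(self):
        return self.data

    def paragraph(self, index):
        # Wrap the piece so further accessors can be chained on it.
        return FileStrings(data=self.paragraphs[index])

    def line(self, index):
        return FileStrings(data=self.lines[index])

    def word(self, index):
        # Words are returned as plain strings; the chain ends here.
        return self.words[index]

sample = "one two\nthree four\n\nfive six\nseven eight"
text = FileStrings(data=sample)
last = text.paragraph(1).line(1).word(-1)
print(last)
```

Each accessor re-splits only the piece it was given, which is why line(1) here means the second line of the chosen paragraph rather than of the whole text.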
