
4.4 Searching and Replacing Text in a File

Credit: Jeff Bauer

4.4.1 Problem

You need to change one string into another throughout a file.

4.4.2 Solution

String substitution is most simply performed by the replace method of string objects. The work here is to support reading from the specified file (or standard input) and writing to the specified file (or standard output):

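#!/usr/bin/env python
# Python 2 script: it uses the print statement and the xreadlines method.
import os, sys

nargs = len(sys.argv)

if not 3 <= nargs <= 5:
    # Wrong number of arguments: show how the script should be called.
    print "usage: %s search_text replace_text [infile [outfile]]" % \
        os.path.basename(sys.argv[0])
else:
    stext = sys.argv[1]
    rtext = sys.argv[2]
    # Default to standard input and output unless files are named.
    input = sys.stdin
    output = sys.stdout
    if nargs > 3:
        input = open(sys.argv[3])
    if nargs > 4:
        output = open(sys.argv[4], 'w')
    # Copy the input line by line, performing the substitution on each line.
    for s in input.xreadlines():
        output.write(s.replace(stext, rtext))
    output.close()
    input.close()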

4.4.3 Discussion

This recipe is really simple, but that's what's beautiful about it: why do complicated stuff when simple stuff suffices? The recipe is a simple main script, as indicated by the leading "shebang" line. The script looks at its arguments to determine the search text, the replacement text, the input file (defaulting to standard input), and the output file (defaulting to standard output). Then, it loops over each line of the input file, writing to the output file a copy of the line with the substitution performed on it. That's all! For tidiness, it closes both files at the end.

As long as it fits comfortably in memory in two copies (one before and one after the replacement, since strings are immutable), we could, with some speed gain, operate on the whole input file's contents at once instead of looping. With today's PCs typically coming with 256 MB of memory, handling files of up to about 100 MB should not be a problem. It suffices to replace the for loop with one single statement:

output.write(input.read().replace(stext, rtext))

As you can see, that's even simpler than the loop used in the recipe.
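As a rough illustration of the whole-file variant in present-day Python 3 (the recipe itself targets Python 2), the sketch below wraps that single statement in a function. The name replace_whole and the use of io.StringIO objects as stand-ins for real files are assumptions made for this example, not part of the recipe.

```python
import io

def replace_whole(src, dst, stext, rtext):
    """Read all of src, replace stext with rtext, and write the result to dst."""
    # Two copies of the text exist briefly: the original and the replaced one.
    dst.write(src.read().replace(stext, rtext))

# In-memory text streams stand in for the input and output files.
src = io.StringIO("spam and eggs\nmore spam\n")
dst = io.StringIO()
replace_whole(src, dst, "spam", "ham")
result = dst.getvalue()
print(result, end="")
```

Because the whole text is held in one string, this form can also match search text that spans a line break, which a line-by-line loop cannot do.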

If you're stuck with an older version of Python, such as 1.5.2, you may still be able to use this recipe. Change the import statement to:

import os, sys, string

and change the two lines of the for loop into:

for s in input.readlines():
    output.write(string.replace(s, stext, rtext))

The xreadlines method used in the recipe was introduced with Python 2.1. It takes precautions not to read all of the file into memory at once, while readlines must do so, and thus may have problems with truly huge files.

In Python 2.2, the for loop can also be written more directly as:

for s in input:
    output.write(s.replace(stext, rtext))

This offers the fastest and simplest approach.
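For readers on Python 3, where the print statement and xreadlines no longer exist, the same loop might look like the sketch below. The function name replace_lines and the temporary file names are hypothetical. The with statement is used so that both files are closed even if an error occurs. This is one reasonable arrangement, not the recipe's original code.

```python
import os
import tempfile

def replace_lines(inpath, outpath, stext, rtext):
    """Copy inpath to outpath line by line, replacing stext with rtext."""
    with open(inpath) as infile, open(outpath, "w") as outfile:
        for line in infile:  # the file object yields one line at a time
            outfile.write(line.replace(stext, rtext))

# Small demonstration using files in a temporary directory.
workdir = tempfile.mkdtemp()
inpath = os.path.join(workdir, "in.txt")
outpath = os.path.join(workdir, "out.txt")
with open(inpath, "w") as f:
    f.write("one fish\ntwo fish\n")

replace_lines(inpath, outpath, "fish", "bird")

with open(outpath) as f:
    result = f.read()
print(result, end="")
```

As in the recipe, only one line is held in memory at a time, so the size of the input file is not a concern.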

4.4.4 See Also

Documentation for the open built-in function and file objects in the Library Reference.
