4.4 Searching and Replacing Text in a File
Credit: Jeff Bauer
String substitution is most simply performed by the replace method of string objects. The work here is to support reading from the specified file (or standard input) and writing to the specified file (or standard output):
    #!/usr/bin/env python
    import os, sys

    nargs = len(sys.argv)

    if not 3 <= nargs <= 5:
        print "usage: %s search_text replace_text [infile [outfile]]" % \
            os.path.basename(sys.argv[0])
    else:
        stext = sys.argv[1]
        rtext = sys.argv[2]
        # Default to the standard streams when no files are named
        input = sys.stdin
        output = sys.stdout
        if nargs > 3:
            input = open(sys.argv[3])
        if nargs > 4:
            output = open(sys.argv[4], 'w')
        # Copy each line to the output with the substitution applied
        for s in input.xreadlines():
            output.write(s.replace(stext, rtext))
        output.close()
        input.close()
This recipe is really simple, but that's what's beautiful about it: why do complicated stuff when simple stuff suffices? The recipe is a simple main script, as indicated by the leading "shebang" line. The script looks at its arguments to determine the search text, the replacement text, the input file (defaulting to standard input), and the output file (defaulting to standard output). Then, it loops over each line of the input file, writing to the output file a copy of the line with the substitution performed on it. That's all! For accuracy, it closes both files at the end.
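For readers on a current interpreter, the same flow can be sketched in Python 3 roughly as follows. This is an illustrative adaptation rather than the recipe's original code. The `main` function and its `stdin`/`stdout` parameters are assumptions introduced here so the logic can be exercised with in-memory streams, and the script name `replace.py` is hypothetical. Unlike the recipe, the sketch writes the usage message to standard error, returns an exit status, and leaves the standard streams open.

```python
import io
import os
import sys
from contextlib import ExitStack


def main(argv, stdin=None, stdout=None):
    """Python 3 sketch of the recipe's script; returns an exit status."""
    stdin = sys.stdin if stdin is None else stdin
    stdout = sys.stdout if stdout is None else stdout
    nargs = len(argv)
    if not 3 <= nargs <= 5:
        print("usage: %s search_text replace_text [infile [outfile]]"
              % os.path.basename(argv[0]), file=sys.stderr)
        return 2
    stext, rtext = argv[1], argv[2]
    with ExitStack() as stack:
        # Files named on the command line are opened here and closed when
        # the block exits; the standard streams are left open.
        infile = stack.enter_context(open(argv[3])) if nargs > 3 else stdin
        outfile = stack.enter_context(open(argv[4], "w")) if nargs > 4 else stdout
        for line in infile:
            outfile.write(line.replace(stext, rtext))
    return 0


# Demonstration with in-memory streams standing in for stdin and stdout.
source = io.StringIO("spam and eggs\nmore spam\n")
result = io.StringIO()
status = main(["replace.py", "spam", "ham"], stdin=source, stdout=result)
print(status, repr(result.getvalue()))
```

In a real script, the demonstration lines would give way to a call such as `sys.exit(main(sys.argv))`.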
As long as it fits comfortably in memory in two copies (one before and one after the replacement, since strings are immutable), we could, with some speed gain, operate on the whole input file's contents at once instead of looping. With today's PCs typically coming with 256 MB of memory, handling files of up to about 100 MB should not be a problem. It suffices to replace the for loop with one single statement:
    output.write(input.read().replace(stext, rtext))
As you can see, that's even simpler than the loop used in the recipe.
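One behavioral difference between the two forms is worth keeping in mind: a line-by-line loop can never match search text that contains a newline, while the whole-file version can. The Python 3 sketch below illustrates this with in-memory text. The helper names `replace_by_line` and `replace_whole` are hypothetical and exist only for this comparison.

```python
import io


def replace_by_line(text, stext, rtext):
    """Apply the substitution one line at a time, like the recipe's loop."""
    return "".join(line.replace(stext, rtext) for line in io.StringIO(text))


def replace_whole(text, stext, rtext):
    """Apply the substitution to the entire contents in a single call."""
    return text.replace(stext, rtext)


data = "red fish\nblue fish\n"

# Search text confined to a single line: both approaches agree.
same = replace_by_line(data, "fish", "bird") == replace_whole(data, "fish", "bird")

# Search text containing a newline: only the whole-file version can match.
by_line = replace_by_line(data, "fish\nblue", "and")
whole = replace_whole(data, "fish\nblue", "and")
print(same, repr(by_line), repr(whole))
```

For search text that stays within one line, which is the usual command-line case, either form produces the same output.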
If you're stuck with an older version of Python, such as 1.5.2, you may still be able to use this recipe. Change the import statement to:
import os, sys, string
and change the two lines of the for loop in the recipe into:
    for s in input.readlines():
        output.write(string.replace(s, stext, rtext))
The xreadlines method used in the recipe was introduced with Python 2.1. It takes precautions not to read all of the file into memory at once, while readlines must do so, and thus may have problems with truly huge files.
In Python 2.2, the for loop can also be written more directly as:
    for s in input:
        output.write(s.replace(stext, rtext))
This offers the fastest and simplest approach.
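The same distinction carries over to Python 3, where xreadlines no longer exists: readlines still builds the full list of lines, while iterating over the file object reads them one at a time. The following is a small sketch of that contrast, assuming Python 3 and a temporary file whose name and contents are made up for the demonstration.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as workdir:
    path = os.path.join(workdir, "sample.txt")  # hypothetical file name
    with open(path, "w") as f:
        f.write("one two\ntwo three\n")

    # readlines() builds the complete list of lines in memory up front.
    with open(path) as f:
        all_lines = f.readlines()

    # Iterating over the file object yields one line at a time, so only
    # the current line (plus an internal buffer) is held while reading.
    # The results are collected in a list here purely for inspection.
    replaced = []
    with open(path) as f:
        for line in f:
            replaced.append(line.replace("two", "2"))

print(all_lines, replaced)
```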
4.4.4 See Also
Documentation for the open built-in function and file objects in the Library Reference.