4.2 Reading from a File
Credit: Luther Blissett
Here's the most convenient way to read all of the file's contents at once into one big string:
all_the_text = open('thefile.txt').read()       # all text from a text file
all_the_data = open('abinfile', 'rb').read()    # all data from a binary file
However, it is better to bind the file object to a variable so that you can call close on it as soon as you're done. For example, for a text file:
file_object = open('thefile.txt')
all_the_text = file_object.read()
file_object.close()
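There are four ways to read a text file's contents at once as a list of strings, one per line (each is an alternative, applied to an open file_object):
list_of_all_the_lines = file_object.readlines()
list_of_all_the_lines = file_object.read().splitlines(1)
list_of_all_the_lines = file_object.read().splitlines()
list_of_all_the_lines = file_object.read().split('\n')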
The first two ways leave a '\n' at the end of each line (i.e., in each string item in the result list), while the other two ways remove all trailing '\n' characters. The first of these four ways is the fastest and most Pythonic. In Python 2.2 and later, there is a fifth way that is equivalent to the first one:
list_of_all_the_lines = list(file_object)
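The difference between these approaches is easiest to see side by side. The following is a minimal sketch, written for a current Python 3 interpreter rather than the Python 2 of the recipe; the temporary sample file and the read_with helper are hypothetical names introduced only for illustration. Note one detail it brings out: when the file ends with a newline, split('\n') strips the terminators but leaves an empty string as the last item.

```python
import os
import tempfile

# Hypothetical sample file, created only so the sketch is self-contained.
fd, path = tempfile.mkstemp(suffix='.txt')
with os.fdopen(fd, 'w', newline='\n') as f:
    f.write('first\nsecond\n')

def read_with(how):
    """Open the sample file afresh, apply `how` to the file object, close it."""
    file_object = open(path)
    try:
        return how(file_object)
    finally:
        file_object.close()

# These three keep the trailing '\n' on each line.
kept_1 = read_with(lambda f: f.readlines())
kept_2 = read_with(lambda f: f.read().splitlines(True))
kept_3 = read_with(list)

# These two strip the line terminators.
stripped_1 = read_with(lambda f: f.read().splitlines())
stripped_2 = read_with(lambda f: f.read().split('\n'))

os.remove(path)

print(kept_1)       # ['first\n', 'second\n']
print(stripped_1)   # ['first', 'second']
print(stripped_2)   # ['first', 'second', '']
```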
Unless the file you're reading is truly huge, slurping it all into memory in one gulp is fastest and generally most convenient for any further processing. The built-in function open creates a Python file object. With that object, you call the read method to get all of the contents (whether text or binary) as a single large string. If the contents are text, you may choose to immediately split that string into a list of lines, with the split method or with the specialized splitlines method. Since such splitting is a frequent need, you may also call readlines directly on the file object, for slightly faster and more convenient operation. In Python 2.2, you can also pass the file object directly as the only argument to the built-in type list.
On Unix and Unix-like systems, such as Linux and BSD variants, there is no real distinction between text files and binary data files. On Windows and Macintosh systems, however, line terminators in text files are encoded not with the standard '\n' separator, but with '\r\n' and '\r', respectively. Python translates the line-termination characters into '\n' on your behalf, but this means that you need to tell Python when you open a binary file, so that it won't perform the translation. To do that, use 'rb' as the second argument to open. This is innocuous even on Unix-like platforms, and it's a good habit to distinguish binary files from text files even there, although it's not mandatory in that case. Such a good habit will make your programs more directly understandable, as well as letting you move them between platforms more easily.
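The effect of the mode argument can be sketched with a small file that uses Windows-style line terminators. This example assumes a current Python 3 interpreter, where text mode translates '\r\n' to '\n' on every platform by default and binary mode returns a bytes object; the sample file is hypothetical and exists only for the demonstration.

```python
import os
import tempfile

# Hypothetical sample file containing Windows-style '\r\n' terminators.
fd, path = tempfile.mkstemp()
with os.fdopen(fd, 'wb') as f:
    f.write(b'one\r\ntwo\r\n')

# Binary mode: the bytes come back exactly as stored on disk.
binary_file = open(path, 'rb')
raw_data = binary_file.read()
binary_file.close()

# Text mode: line terminators are translated to '\n' while reading.
text_file = open(path, 'r', encoding='ascii')
text = text_file.read()
text_file.close()

os.remove(path)

print(repr(raw_data))   # b'one\r\ntwo\r\n'
print(repr(text))       # 'one\ntwo\n'
```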
You can call methods such as read directly on the file object produced by the open function, as shown in the first snippet of the solution. When you do this, as soon as the reading operation finishes, you no longer have a reference to the file object. In practice, Python notices the lack of a reference at once and immediately closes the file. However, it is better to bind a name to the result of open, so that you can call close yourself explicitly when you are done with the file. This ensures that the file stays open for as short a time as possible, even on platforms such as Jython and hypothetical future versions of Python on which more advanced garbage-collection mechanisms might delay the automatic closing that Python performs.
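To make sure close is called even if the read raises an exception, the call can be wrapped in try/finally. The sketch below shows that form next to the with statement, which is not part of the original recipe but is available in later Python versions (2.5 and up) and closes the file automatically when the block ends. The sample file is a hypothetical stand-in for thefile.txt.

```python
import os
import tempfile

# Hypothetical sample file standing in for 'thefile.txt'.
fd, path = tempfile.mkstemp(suffix='.txt')
with os.fdopen(fd, 'w') as f:
    f.write('some text\n')

# Explicit close, guaranteed to run even if read() fails.
file_object = open(path)
try:
    all_the_text = file_object.read()
finally:
    file_object.close()

# Equivalent form with a with statement: the file is closed on block exit.
with open(path) as file_object_2:
    same_text = file_object_2.read()

os.remove(path)

print(file_object.closed, file_object_2.closed)   # True True
```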
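If you would rather read the file a little at a time than all at once, the idioms are different. For example, to read a binary file 100 bytes at a time until you reach the end of the file:
file_object = open('abinfile', 'rb')
while 1:
    chunk = file_object.read(100)
    if not chunk:
        break
    do_something_with(chunk)
file_object.close()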
Passing an argument N to the read method ensures that read will read only the next N bytes (or fewer, if the file is closer to the end). read returns the empty string when it reaches the end of the file.
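One way to package the chunked-read loop for reuse is a generator function. This is only a sketch under stated assumptions: read_in_chunks and its chunk_size parameter are hypothetical names, the 250-byte sample file is created purely for the demonstration, and the code targets a current Python 3 interpreter, where a binary read at end of file returns an empty bytes object.

```python
import os
import tempfile

def read_in_chunks(path, chunk_size=100):
    """Yield successive chunks of at most chunk_size bytes from a binary file."""
    file_object = open(path, 'rb')
    try:
        while True:
            chunk = file_object.read(chunk_size)
            if not chunk:          # empty result means end of file
                break
            yield chunk
    finally:
        file_object.close()

# Hypothetical sample file: 250 bytes, so the last chunk is shorter than 100.
fd, path = tempfile.mkstemp()
with os.fdopen(fd, 'wb') as f:
    f.write(b'x' * 250)

chunk_sizes = [len(chunk) for chunk in read_in_chunks(path)]
os.remove(path)

print(chunk_sizes)   # [100, 100, 50]
```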
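To read a text file one line at a time in Python 2.2 and later, you can loop directly over the file object:
for line in open('thefile.txt'):
    do_something_with(line)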
Several idioms were common in older versions of Python. The one idiom you can be sure will work even on extremely old versions of Python, such as 1.5.2, is quite similar to the idiom for reading a binary file a chunk at a time:
file_object = open('thefile.txt')
while 1:
    line = file_object.readline()
    if not line:
        break
    do_something_with(line)
file_object.close()
readline, like read, returns the empty string when it reaches the end of the file. Note that the end of the file is easily distinguished from an empty line because the latter is returned by readline as '\n', which is not an empty string but rather a string with a length of 1.
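A short sketch makes the distinction concrete. It assumes a current Python 3 interpreter and a hypothetical three-line sample file whose middle line is blank; the blank line comes back as '\n' and does not end the loop, while the empty string at end of file does.

```python
import os
import tempfile

# Hypothetical sample file with a blank line in the middle.
fd, path = tempfile.mkstemp(suffix='.txt')
with os.fdopen(fd, 'w') as f:
    f.write('first\n\nlast\n')

lines_seen = []
file_object = open(path)
while True:
    line = file_object.readline()
    if not line:               # '' only at end of file; a blank line is '\n'
        break
    lines_seen.append(line)
file_object.close()
os.remove(path)

print(lines_seen)   # ['first\n', '\n', 'last\n']
```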
4.2.4 See Also
Recipe 4.3; documentation for the open built-in function and file objects in the Library Reference.