I l@ve RuBoard

4.15 Updating a Random-Access File

Credit: Luther Blissett

4.15.1 Problem

You want to read a binary record from somewhere inside a large file of fixed-length records, change the values, and write the record back.

4.15.2 Solution

Read the record, unpack it, perform whatever computations you need for the update, pack the fields back into the record, seek to the start of the record again, and write it back. Phew. Faster to code than to say:

import struct

thefile = open('somebinfile', 'r+b')
record_size = struct.calcsize(format_string)

thefile.seek(record_size * record_number)
buffer = thefile.read(record_size)
fields = list(struct.unpack(format_string, buffer))

# Perform computations, suitably modifying fields, then:

buffer = struct.pack(format_string, *fields)
thefile.seek(record_size * record_number)
thefile.write(buffer)

thefile.close(  )

4.15.3 Discussion

This approach works only on files (generally binary ones) defined in terms of records that are all the same, fixed size; it doesn't work on normal text files. Furthermore, the size of each record must be that defined by a struct's format string, as shown in the recipe's code. A typical format string, for example, might be "8l", to specify that each record is made up of eight four-byte integers, each to be interpreted as a signed value and unpacked into a Python int. In this case, the fields variable in the recipe would be bound to a list of eight ints. Note that struct.unpack returns a tuple. Because tuples are immutable, the computation would have to rebind the entire fields variable. A list is not immutable, so each field can be rebound as needed. Thus, for convenience, we explicitly ask for a list when we bind fields. Make sure, however, not to alter the length of the list. In this case, it needs to remain composed of exactly eight integers, or the struct.pack call will raise an exception when we call it with a format_string that is still "8l". Also note that this recipe is not suitable for working with records that are not all of the same, unchanging length.

To seek back to the start of the record, instead of using the record_size*record_number offset again, you may choose to do a relative seek:

thefile.seek(-record_size, 1)

The second argument to the seek method (1) tells the file object to seek relative to the current position (here, so many bytes back, because we used a negative number as the first argument). seek's default is to seek to an absolute offset within the file (i.e., from the start of the file). You can also explicitly request this default behavior by calling seek with a second argument of 0.

Of course, you don't need to open the file just before you do the first seek or close it right after the write. Once you have a file object that is correctly opened (i.e., for update, and as a binary rather than a text file), you can perform as many updates on the file as you want before closing the file again. These calls are shown here to emphasize the proper technique for opening a file for random-access updates and the importance of closing a file when you are done with it.

The file needs to be opened for updating (i.e., to allow both reading and writing). That's what the 'r+b' argument to open means: open for reading and writing, but do not implicitly perform any transformations on the file's contents, because the file is a binary one (the 'b' part is unnecessary but still recommended for clarity on Unix and Unix-like systems梙owever, it's absolutely crucial on other platforms, such as Macintosh and Windows). If you're creating the binary file from scratch but you still want to be able to reread and update some records without closing and reopening the file, you can use a second argument of 'w+b' instead. However, I have never witnessed this strange combination of requirements; binary files are normally first created (by opening them with 'wb', writing data, and closing the file) and later opened for update with 'r+b'.

4.15.4 See Also

The sections of the Library Reference on file objects and the struct module; Perl Cookbook Recipe 8.13.

I l@ve RuBoard