
8.8 Storing a BLOB in a PostgreSQL Database

Credit: Luther Blissett

8.8.1 Problem

You need to store a binary large object (BLOB) in a PostgreSQL database.

8.8.2 Solution

PostgreSQL 7.2 supports large objects, and the psycopg module supplies a Binary escaping function:

import psycopg, cPickle

# Connect to a DB, e.g., the test DB on your localhost, and get a cursor
connection = psycopg.connect("dbname=test")
cursor = connection.cursor()

# Make a new table for experimentation
cursor.execute("CREATE TABLE justatest (name TEXT, ablob BYTEA)")

try:
    # Prepare some BLOBs to insert in the table
    names = 'aramis', 'athos', 'porthos'
    data = {}
    for name in names:
        datum = list(name)
        datum.sort()
        data[name] = cPickle.dumps(datum, 1)

    # Perform the insertions
    sql = "INSERT INTO justatest VALUES(%s, %s)"
    for name in names:
        cursor.execute(sql, (name, psycopg.Binary(data[name])))

    # Recover the data so you can check back
    sql = "SELECT name, ablob FROM justatest ORDER BY name"
    cursor.execute(sql)
    for name, blob in cursor.fetchall():
        print name, cPickle.loads(blob), cPickle.loads(data[name])
finally:
    # Done. Remove the table, commit (psycopg runs in a transaction,
    # so without a commit the DROP would be rolled back), and close.
    cursor.execute("DROP TABLE justatest")
    connection.commit()
    connection.close()

8.8.3 Discussion

PostgreSQL supports binary data (BYTEA and variations thereof), but you need to be careful when communicating such data via SQL. Specifically, when you use a normal INSERT SQL statement and need to have binary strings among the VALUES you're inserting, you need to escape some characters in the binary string according to PostgreSQL's own rules. Fortunately, you don't have to figure out those rules for yourself: PostgreSQL supplies functions that do all the needed escaping, and psycopg exposes such a function to your Python programs as the Binary function. This recipe shows a typical case: the BYTEAs you're inserting come from cPickle.dumps, so they may represent almost arbitrary Python objects (although, in this case, we're just using them for a few lists of characters). The recipe is purely demonstrative and works by creating a table and dropping it at the end (using a try/finally statement to ensure finalization is performed even if the program terminates because of an uncaught exception).
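To give a feel for the kind of work Binary does on your behalf, here is a rough sketch of octal-escaping a byte string along the lines of PostgreSQL's classic bytea literal rules. Note that the escape_bytea name and the exact rules shown are simplifications for illustration only; in real code, always rely on psycopg.Binary rather than rolling your own escaping:

```python
def escape_bytea(data):
    # Illustrative only: octal-escape each byte outside printable ASCII,
    # plus the single quote and the backslash, which are special inside
    # a bytea literal. Not a substitute for psycopg.Binary.
    out = []
    for b in bytearray(data):      # bytearray yields ints on any Python
        if b == 0x27:              # single quote
            out.append("\\'")
        elif b == 0x5C:            # backslash
            out.append("\\\\")
        elif 0x20 <= b <= 0x7E:    # printable ASCII passes through
            out.append(chr(b))
        else:
            out.append("\\%03o" % b)
    return "".join(out)
```

For example, escape_bytea(b"\x00abc") yields the text \000abc, which PostgreSQL's bytea parser turns back into the original four bytes.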

Earlier PostgreSQL releases put limits of a few KB on the amount of data you could store in a normal field of the database. To store really large objects, you needed to use roundabout techniques to load the data into the database (such as PostgreSQL's nonstandard SQL function LO_IMPORT to load a datafile as an object, which requires superuser privileges and datafiles that reside on the machine running the PostgreSQL server) and store a field of type OID in the table to be used later for indirect recovery of the data. Fortunately, none of these techniques are necessary anymore: since Release 7.1 (the current release at the time of writing is 7.2.1), PostgreSQL embodies the results of project TOAST, which removes the limitations on field-storage size and therefore the need for peculiar indirection. psycopg supplies the handy Binary function to let you escape any binary string of bytes into a form acceptable for placeholder substitution in INSERT and UPDATE SQL statements.

8.8.4 See Also

Recipe 8.7 for a MySQL-oriented solution to the same problem; PostgreSQL's home page (http://www.postgresql.org/); the Python/PostgreSQL module (http://initd.org/software/psycopg).
