
16.3 Translating a Python Sequence into a C Array with the PySequence_Fast Protocol

Credit: Luther Blissett

16.3.1 Problem

You have an existing C function that takes as an argument a C array of C-level values (e.g., doubles), and you want to wrap it into a Python-callable C extension that takes as an argument a Python sequence (or iterator).

16.3.2 Solution

The easiest way to accept an arbitrary Python sequence in the Python C API is with the PySequence_Fast function, which builds and returns a tuple when needed, but simply returns its argument (with the reference count incremented) if the argument is already a list or tuple:

#include <Python.h>

/* a preexisting C-level function you want to expose -- e.g: */
static double total(double* data, int len)
{
    double total = 0.0;
    int i;
    for(i=0; i<len; ++i)
        total += data[i];
    return total;
}

/* here is how you expose it to Python code: */
static PyObject *totalDoubles(PyObject *self, PyObject *args)
{
    PyObject* seq;
    double *dbar;
    double result;
    int seqlen;
    int i;

    /* get one argument as a sequence */
    if(!PyArg_ParseTuple(args, "O", &seq))
        return 0;
    seq = PySequence_Fast(seq, "argument must be iterable");
    if(!seq)
        return 0;

    /* prepare data as an array of doubles */
    seqlen = PySequence_Fast_GET_SIZE(seq);
    /* malloc(0) may legitimately return NULL, so request at least one slot */
    dbar = malloc((seqlen ? seqlen : 1) * sizeof(double));
    if(!dbar) {
        Py_DECREF(seq);
        return PyErr_NoMemory();
    }
    for(i=0; i < seqlen; i++) {
        PyObject *fitem;
        PyObject *item = PySequence_Fast_GET_ITEM(seq, i);
        if(!item) {
            Py_DECREF(seq);
            free(dbar);
            return 0;
        }
        fitem = PyNumber_Float(item);
        if(!fitem) {
            Py_DECREF(seq);
            free(dbar);
            PyErr_SetString(PyExc_TypeError, "all items must be numbers");
            return 0;
        }
        dbar[i] = PyFloat_AS_DOUBLE(fitem);
        Py_DECREF(fitem);
    }    

    /* clean up, compute, and return result */
    Py_DECREF(seq);
    result = total(dbar, seqlen);
    free(dbar);
    return Py_BuildValue("d", result);
}

static PyMethodDef totalMethods[] = {
    {"total", totalDoubles, METH_VARARGS, "Sum a sequence of numbers."},
    {0} /* sentinel */
};

void
inittotal(void)
{
    (void) Py_InitModule("total", totalMethods);
}

16.3.3 Discussion

The two best ways for your C-coded, Python-callable extension functions to accept generic Python sequences as arguments are PySequence_Fast and PyObject_GetIter (the latter is available only in Python 2.2 and later). PyObject_GetIter can often save memory, but it is appropriate only when the rest of your C code can consume the items one at a time, without knowing in advance how many items there will be in total. Often, however, you have preexisting C functions from an existing library that you want to expose to Python code, and those functions most often require their input sequences to be C arrays. This recipe therefore shows how to build a C array (in this case, an array of double) from a generic Python sequence argument, so you can pass the array (and the integer that gives the array's length) to your existing C function (represented here, as an example, by the total function at the start of the recipe).
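For contrast, here is a rough Python-level model of the PyObject_GetIter style. The loop pulls items one at a time, the way a C loop calls PyIter_Next until it returns NULL. It therefore never needs the length in advance and never builds an array of all the items. The function name total_iter is hypothetical and is not part of the recipe's module. The sketch assumes present-day Python 3.

```python
def total_iter(iterable):
    """Sum numbers pulled one at a time from any iterable (hypothetical helper).

    Models a C loop built on PyObject_GetIter/PyIter_Next: no length is
    needed up front, and no array holding all the items is ever built.
    """
    it = iter(iterable)          # like PyObject_GetIter; TypeError if not iterable
    result = 0.0
    while True:
        try:
            item = next(it)      # like PyIter_Next
        except StopIteration:    # like PyIter_Next returning NULL with no error set
            break
        result += float(item)    # like PyNumber_Float followed by PyFloat_AS_DOUBLE
    return result


# A generator has no len(), yet it can still be totalled item by item.
print(total_iter(x * 0.5 for x in range(5)))   # prints 5.0
```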

PySequence_Fast takes two arguments: a Python object to be presented as a sequence, and a string to use as the error message if the object cannot be presented as a sequence. In that case, PySequence_Fast returns 0 (the null pointer, which signals an error). If the Python object is already a list or tuple, PySequence_Fast returns the same object with its reference count increased by one. If the Python object is any other kind of sequence (or, in Python 2.2, any iterator or iterable), PySequence_Fast builds and returns a new tuple with all the items already in place. Either way, PySequence_Fast returns an object on which you can call PySequence_Fast_GET_SIZE to learn the sequence length (as the recipe does, to malloc the right amount of storage for the C array) and PySequence_Fast_GET_ITEM to get the item at a valid index (from 0, inclusive, to the sequence length, exclusive).
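The contract just described can be modeled in a few lines of Python. This is only a behavioral sketch in present-day Python 3, not the C implementation. The helper name sequence_fast is hypothetical. Returning the same object stands in for "same object, reference count incremented", and the new tuple follows the text's description of Python 2.2.

```python
def sequence_fast(obj, message):
    """Rough Python-level model of PySequence_Fast (hypothetical helper).

    Lists and tuples come back unchanged; any other iterable is copied
    into a new tuple. Non-iterables raise TypeError with the caller's message.
    """
    if isinstance(obj, (list, tuple)):
        return obj                       # C: same object, refcount + 1
    try:
        it = iter(obj)
    except TypeError:
        # The C function reports this failure using the caller-supplied message.
        raise TypeError(message) from None
    return tuple(it)                     # C: a new tuple holding every item


fast = sequence_fast((x * x for x in range(4)), "argument must be iterable")
n = len(fast)                            # like PySequence_Fast_GET_SIZE
items = [fast[i] for i in range(n)]      # like PySequence_Fast_GET_ITEM, 0 <= i < n
print(n, items)                          # prints 4 [0, 1, 4, 9]
```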

The recipe requires quite a bit of care, which is typical of all C-coded Python extensions (and, more generally, of any C code), to deal with memory and error conditions properly. For C-coded Python extensions, it's imperative that you know which functions return new references (which you must Py_DECREF when you are done with them) and which return borrowed references (which you must not Py_DECREF, but on the contrary must Py_INCREF if you want to hold on to them for longer). In this specific case, you have to know the following (by reading the Python documentation):

PyArg_ParseTuple always gives you borrowed references.
PySequence_Fast returns a new reference.
PySequence_Fast_GET_ITEM returns a borrowed reference.
PyNumber_Float returns a new reference.

There is method to this madness, even though, as you start your career as a coder of C API Python extensions, you'll no doubt have to double-check each case. Python's C API strives to return borrowed references, for performance, whenever it knows it can do so safely (i.e., when the reference it returns necessarily refers to an already existing object). It has to return a new reference when it's possible (or certain) that a new object may have to be created.

For example, in the above list, PyNumber_Float and PySequence_Fast may be able to return the same object they were given as an argument, but it's also quite possible that they may have to create a new object for this purpose to ensure that the returned object has the correct type. Therefore, these two functions are specified as always returning new references. PyArg_ParseTuple and PySequence_Fast_GET_ITEM, on the other hand, will always return references to objects that already exist elsewhere (as items in the arguments' tuple or items in the fast-sequence container, respectively), and therefore, these two functions can afford to return borrowed references and are thus specified as doing so.
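In CPython, the Python-level float() and tuple() constructors show the same pattern. Given an object that already has exactly the right type, they hand back that very object. Otherwise they must build a new one of the correct type. Object identity here is a CPython implementation detail, not a language guarantee, so treat this present-day Python 3 snippet as an illustration only. The MyFloat class is hypothetical.

```python
class MyFloat(float):
    """A float subclass, used only to show a type-correcting conversion."""


x = 2.5
t = (1, 2)
print(float(x) is x)     # exact float: CPython returns the same object (True)
print(tuple(t) is t)     # exact tuple: the same object again (True)

n = 3
print(type(float(n)))    # an int must become a brand-new float object

m = MyFloat(1.5)
f = float(m)
print(type(f), f is m)   # a subclass instance yields a new object of exactly type float
```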

One last note: as soon as we get an item from the fast-sequence container, we try to transform it into a Python float object and deal with the possibility that the transformation will fail (e.g., if we're passed a sequence containing a complex number, or a string that does not represent a number). It is usually futile to attempt a check first (with PyNumber_Check), because the check might succeed and the later transformation attempt might fail anyway (e.g., with a complex-number item).
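The same "just try the conversion" approach looks like this at the Python level. This is a sketch in present-day Python 3. The helper name to_doubles is hypothetical. float(item) plays the role of PyNumber_Float, and array("d") plays the role of the malloc'ed C array of doubles. The sketch catches TypeError and ValueError, the errors float() normally raises for unsuitable items. The recipe's C code, by contrast, replaces any conversion error.

```python
import numbers
from array import array


def to_doubles(seq):
    """Build a C-style array of doubles from any iterable (hypothetical helper).

    Like the recipe's C loop, it attempts the conversion on each item and
    reports failure, rather than checking the item's type beforehand.
    """
    result = array("d")                     # contiguous buffer of C doubles
    for item in seq:
        try:
            result.append(float(item))      # float(item) ~ PyNumber_Float(item)
        except (TypeError, ValueError):
            raise TypeError("all items must be numbers") from None
    return result


# A type check would not help: complex numbers pass it, yet cannot become floats.
print(isinstance(1j, numbers.Number))       # True
# Conversely, a numeric string is not a Number, yet it converts fine.
print(to_doubles([1, 2.5, "3"]).tolist())   # [1.0, 2.5, 3.0]
```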

As usual, the best way to build this extension (assuming you've saved it to a file named total.c) is with the distutils package. Place a setup.py file such as the following in the same directory as the C source:

from distutils.core import setup, Extension

setup(name="total",
      maintainer="Luther Blissett",
      maintainer_email="situ@tioni.st",
      ext_modules=[Extension("total", sources=["total.c"])])

Then build and install the extension by running:

$ python setup.py install
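Once installed, the module is imported and called like any other. The snippet below is a usage sketch in present-day Python 3. The pure-Python fallback used when the compiled module cannot be imported is an assumption added for illustration. It is not part of the recipe and only approximates the C function (for example, it does not rewrite conversion errors as "all items must be numbers").

```python
try:
    from total import total          # the compiled extension, if installed
except ImportError:
    def total(seq):
        # Hypothetical pure-Python stand-in with the same calling convention.
        return sum((float(item) for item in seq), 0.0)


print(total([1, 2, 3.5]))            # a list
print(total(x for x in range(4)))    # any iterable works too
```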

The nice thing about this is that it works on any platform (assuming you have Python 2.0 or later and have access to the same C compiler used to build your version of Python).

16.3.4 See Also

The Extending and Embedding manual is available as part of the standard Python documentation set at http://www.python.org/doc/current/ext/ext.html; documentation on the Python C API is at http://www.python.org/doc/current/api/api.html; the Distributing Python Modules section of the standard Python documentation set is still incomplete, but it is the best source of information on the distutils package.
