I l@ve RuBoard

1.13 Flattening a Nested Sequence

Credit: Luther Blissett

1.13.1 Problem

You have a sequence, such as a list, some of whose items may in turn be lists, and so on. You need to flatten it out into a sequence of its scalar items (the leaves, if you think of the nested sequence as a tree).

1.13.2 Solution

Of course, we need to be able to tell which of the elements we're handling are to be deemed scalar. For generality, say we're passed as an argument a predicate that defines what is scalar梐 function that we can call on any element and that returns 1 if the element is scalar or 0 otherwise. Given this, one approach is:

def flatten(sequence, scalarp, result=None):
    if result is None: result = []
    for item in sequence:
        if scalarp(item): result.append(item)
        else: flatten(item, scalarp, result)
    return result

In Python 2.2, a simple generator is an interesting alternative, and, if all the caller needs to do is loop over the flattened sequence, may save the memory needed for the result list:

from _ _future_ _ import generators
def flatten22(sequence, scalarp):
    for item in sequence:
        if scalarp(item):
            yield item
        else:
            for subitem in flatten22(item, scalarp):
                yield subitem

1.13.3 Discussion

The only problem with this recipe is that determining what is a scalar is not as obvious as it might seem, which is why I delegated that decision to a callable predicate argument that the caller is supposed to pass to flatten. Of course, we must be able to loop over the items of any non-scalar with a for statement, or flatten will raise an exception (since it does, via a recursive call, attempt a for statement over any non-scalar item). In Python 2.2, that's easy to check:

def canLoopOver(maybeIterable):
    try: iter(maybeIterable)
    except: return 0
    else: return 1

The built-in function iter, new in Python 2.2, returns an iterator, if possible. for x in s implicitly calls the iter function, so the canLoopOver function can easily check if for is applicable by calling iter explicitly and seeing if that raises an exception.

In Python 2.1 and earlier, there is no iter function, so we have to try more directly:

def canLoopOver(maybeIterable):
    try:
        for x in maybeIterable:
            return 1
        else:
            return 1
    except:
        return 0

Here we have to rely on the for statement itself raising an exception if maybeIterable is not iterable after all. Note that this approach is not fully suitable for Python 2.2: if maybeIterable is an iterator object, the for in this approach consumes its first item.

Neither of these implementations of canLoopOver is entirely satisfactory, by itself, as our scalar-testing predicate. The problem is with strings, Unicode strings, and other string-like objects. These objects are perfectly good sequences, and we could loop on them with a for statement, but we typically want to treat them as scalars. And even if we didn't, we would at least have to treat any string-like objects with a length of 1 as scalars. Otherwise, since such strings are iterable and yield themselves as their only items, our flatten function would not cease recursion until it exhausted the call stack and raised a RuntimeError due to "maximum recursion depth exceeded."

Fortunately, we can easily distinguish string-like objects by attempting a typical string operation on them:

def isStringLike(obj):
    try: obj+''
    except TypeError: return 0
    else: return 1

Now, we finally have a good implementation for the scalar-checking predicate:

def isScalar(obj):
    return isStringLike(obj) or not canLoopOver(obj)

By simply placing this isScalar function and the appropriate implementation of canLoopOver in our module, before the recipe's functions, we can change the signatures of these functions to make them easier to call in most cases. For example:

def flatten22(sequence, scalarp=isScalar):

Now the caller needs to pass the scalarp argument only in those (hopefully rare) cases where our definition of what is scalar does not quite meet the caller's application-specific needs.

1.13.4 See Also

The Library Reference section on sequence types.

I l@ve RuBoard