I l@ve RuBoard Previous Section Next Section

2.8 Sorting by Item or by Attribute

Credit: Matthew Wood

2.8.1 Problem

You need to sort a list of (x, y) coordinates by item y, or, more generally, sort a list of objects by any attribute or item.

2.8.2 Solution

You might first think of something like the following class, based on the simple but slow approach of passing a comparison function to the sort method:

class Sorter:
    # Notice how _ _compare is dependent on self._ _whichindex
    def _ _compare(self, x, y):
        return cmp(x[self._ _whichindex], y[self._ _whichindex])

    # Pass the sort function the _ _compare function defined above
    def _ _call_ _(self, data, whichindex = None):
        if whichindex is None :
            data.sort(  )
        else :
            self._ _whichindex = whichindex
            data.sort(self._ _compare)
        return data               # handy for inlining, and low-cost

The trick is to use a bound method that accesses instance state as the comparison function to determine which item (or attribute, but this version deals only with items) to fetch. Unfortunately, this makes the approach nonreentrant and not thread-safe.

Thanks to the faster, more robust, and more flexible DSU idiom, it's not hard to make a more general version that allows attributes as well as items, guarantees a stable sort, is both reentrant and thread-safe, and is speedier to boot:

class Sorter:
    def _helper(self, data, aux, inplace):
        aux.sort(  )
        result = [data[i] for junk, i in aux]
        if inplace: data[:] = result
        return result

    def byItem(self, data, itemindex=None, inplace=1):
        if itemindex is None:
            if inplace:
                data.sort(  )
                result = data
                result = data[:]
                result.sort(  )
            return result
            aux = [(d[i][itemindex], i) for i in range(len(data))]
            return self._helper(data, aux, inplace)

    # a couple of handy synonyms
    sort = byItem
    _ _call_ _ = byItem

    def byAttribute(self, data, attributename, inplace=1):
        aux = [(getattr(d[i],attributename),i) for i in range(len(data))]
        return self._helper(data, aux, inplace)

Of course, since the second version doesn't use its "classhood" in any way, making it a class is somewhat peculiar. It would be far more Pythonic, and clearer, to use a module with free-standing functions, rather than an artificial class with instances that have no state.

2.8.3 Discussion

How do you efficiently sort a list of (x, y) coordinates by y? More generally, how do you sort a list of dictionaries, lists, or class instances by a particular item or attribute? I hate not being able to sort by any item or attribute, so I wrote an auxiliary class to do it for me.

The DSU idiom is much faster than passing sort a comparison function, as discussed in other recipes. The second version of Sorter in this recipe uses DSU because of this, as well as auxiliary flexibility advantages. This second version gets no benefit from being a class rather than just a couple of functions, but casting it as a class makes it drop-in compatible with the first, inferior version, which did use some state as a trick (losing reentrancy and thread-safety in the process).

Here is some example code (note that it instantiates the Sorter class only once, another hint that it is not at all an optimal architecture to wrap this functionality as a class):

sort = Sorter(  )

if _ _name_ _ == '_ _main_ _' :
      list = [(1, 2), (4, 8), (0, 3)]
      dict = [{'a': 3, 'b': 4}, {'a': 5, 'b': 2}, {'a': 0, 'b': 0},
          {'a': 9, 'b': 9}]
      dumb = [1, 4, 6, 7, 2, 5, 9, 2, 4, 6]

      print 'list normal:', list
      sort(list, 0)
      print 'sort by [0]:', list
      sort(list, 1)
      print 'sort by [1]:', list


      print "dict normal:", dict
      sort(dict, 'a')
      print "sort by 'a':",  dict
      sort(dict, 'b')
      print "sort by 'b':",  dict


      print 'dumb normal:', dumb
      print 'normal sort:', dumb

Returning the sorted list is cheap (it's just an extra reference, since Python, fortunately, never does any copying of data unless you specifically request it) and offers uniform behavior between in-place and non-in-place cases. Often, we only want to do:

for x in sort(something, inplace=0):

Returning a reference to the sorted data gives us just the right amount of flexibility for this.

2.8.4 See Also

Recipe 2.4 and Recipe 4.10.

    I l@ve RuBoard Previous Section Next Section