I l@ve RuBoard Previous Section Next Section

5.18 Implementing a Set Class

Credit: Thomas Heller

5.18.1 Problem

You need to model a set (i.e., a collection of items, with no particular ordering and no duplicates).

5.18.2 Solution

A Python dictionary is a good representation for a set abstract data type, but by wrapping it into a class, we can use it more naturally:

class Set:
    def _ _init_ _(self, *args):
        self._dict = {}
        for arg in args:
            self.add(arg)

    def _ _repr_ _(self):
        import string
        elems = map(repr, self._dict.keys(  ))
        elems.sort(  )
        return "%s(%s)" % (self._ _class_ _._ _name_ _, string.join(elems, ', '))

    def extend(self, args):
        """ Add several items at once. """
        for arg in args:
            self.add(arg)

    def add(self, item):
        """ Add one item to the set. """
        self._dict[item] = item

    def remove(self, item):
        """ Remove an item from the set. """
        del self._dict[item]

    def contains(self, item):
        """ Check whether the set contains a certain item. """
        return self._dict.has_key(item)

    # High-performance membership test for Python 2.0 and later
    _ _contains_ _ = contains

    def _ _getitem_ _(self, index):
        """ Support the 'for item in set:' protocol. """
        return self._dict.keys(  )[index]

    def _ _iter_ _(self):
        """ Better support of 'for item in set:' via Python 2.2 iterators """
        return iter(self._dict.copy(  ))

    def _ _len_ _(self):
        """ Return the number of items in the set """
        return len(self._dict)

    def items(self):
        """ Return a list containing all items in sorted order, if possible """
        result = self._dict.keys(  )
        try: result.sort(  )
        except: pass
        return result

5.18.3 Discussion

Sets are such fundamental constructs that you often find yourself needing one. Python dictionaries model them well, and the Set class in this recipe is basically a thin veneer of nice-looking syntactic sugar over a dictionary.

Of course, an important limitation (both of bare dictionaries and of this Set class) is that the items must be hashable (which typically means immutable). As with other recipes involving dictionaries, to work around this one can sometimes use cPickle.dumps(item) as the dictionary key corresponding to item, but this may be inapplicable or too heavyweight in some cases.

It's not hard to make this Set into a Bag, if that's what you need. A Bag differs from a Set in that each item is in a Bag a certain number of times (i.e., duplications are counted rather than ignored or disallowed).

For Bag-modeling purposes, rather than keeping the item as both key and value, use the item's membership count as the value. Adding an item becomes:

self._dict[item]=1+self.get(item,0)

and so on. You probably need both a .remove method (decrements the count by one) and a .wipeout method (removes the entry totally, no matter what). Similar duplications (and hard choices about which version to use as the basic special-method) loom for other functionality (e.g., what's _ _len_ _, the number of different items or the total?). Python gives you all the bricks, but it's still up to you to put them together to form the right shape.

Another extension worth considering is the possibility of adding set operations. Rather than overloading operators, which might lead to confusion, it's probably best to define union, intersection, and so on as either methods or standalone functions. Both choices are quite defensible. Functions have the advantage of being able to coerce both arguments more naturally. Of course, apart from performance issues, set operations can be implemented in terms of the abstract primitives supplied by the Set class, another consideration that argues for using free-standing functions rather than methods. For example:

def union(s1, s2):
    import copy
    result = copy.copy(s1)
    for item in s2:
        result.add(item)
    return result

This allows highly polymorphic use of such operations and amounts to a decisive advantage for free-standing functions over methods in most cases. It is for similar reasons that many of Python's most fundamental built-ins, such as len, are free-standing functions (which may call a special method if present but still afford polymorphic use on many objects that lack the special method).

5.18.4 See Also

The Library Reference section on sequence types.

    I l@ve RuBoard Previous Section Next Section