3.16 Converting Between Different Naming Conventions

Credit: Sami Hangaslammi

3.16.1 Problem

You have a body of code whose identifiers use one of the common naming conventions to represent multiple words in a single identifier (CapitalizedWords, mixedCase, or under_scores), and you need to convert the code to another naming convention in order to merge it smoothly with other code.

3.16.2 Solution

re.sub covers the two hard cases, converting underscore to and from the others:

import re

def cw2us(x): # capwords to underscore notation
    return re.sub(r'(?<=[a-z])[A-Z]|(?<!^)[A-Z](?=[a-z])',
        r"_\g<0>", x).lower()

def us2mc(x): # underscore to mixed-case notation
    return re.sub(r'_([a-z])', lambda m: m.group(1).upper(), x)

Mixed-case to underscore is just like capwords to underscore (the case-lowering of the first character becomes redundant, but it does no harm):

def mc2us(x): # mixed-case to underscore notation
    return cw2us(x)

Underscore to capwords can similarly exploit the underscore to mixed-case conversion, but it needs an extra twist to uppercase the start:

def us2cw(x): # underscore to capwords notation
    s = us2mc(x)
    return s[0].upper() + s[1:]

Conversion between mixed-case and capwords is, of course, just an issue of lowercasing or uppercasing the first character, as appropriate:

def mc2cw(x): # mixed-case to capwords
    return x[0].upper() + x[1:]

def cw2mc(x): # capwords to mixed-case
    return x[0].lower() + x[1:]

3.16.3 Discussion

Here are some usage examples:

>>> cw2us("PrintHTML")
'print_html'
>>> cw2us("IOError")
'io_error'
>>> cw2us("SetXYPosition")
'set_xy_position'
>>> cw2us("GetX")
'get_x'
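The converters that go the other way can be exercised in the same manner. The sketch below repeats the relevant definitions from the Solution so that it runs on its own; the sample identifiers are made up for illustration. Note that a round trip does not restore the capitalization of acronyms: "SetXYPosition" becomes "set_xy_position", which converts back to "SetXyPosition".

```python
import re

def cw2us(x):  # capwords to underscore notation
    return re.sub(r'(?<=[a-z])[A-Z]|(?<!^)[A-Z](?=[a-z])',
                  r'_\g<0>', x).lower()

def us2mc(x):  # underscore to mixed-case notation
    return re.sub(r'_([a-z])', lambda m: m.group(1).upper(), x)

def us2cw(x):  # underscore to capwords notation
    s = us2mc(x)
    return s[0].upper() + s[1:]

print(us2mc("set_xy_position"))         # setXyPosition
print(us2cw("set_xy_position"))         # SetXyPosition
print(us2cw(cw2us("SetXYPosition")))    # SetXyPosition (acronym case is lost)
```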

The functions in this recipe are practical if you need to homogenize naming styles across a body of code, but the approach may be a bit obscure. In the interest of clarity, you might prefer a more general stance: to convert among several formats, pick a neutral format and write conversions from each of the N formats into the neutral one and back again. This means writing 2N conversion functions rather than N x (N-1), a big win for large N, but the point here (in which N is only three) is really one of clarity.
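The counting argument can be checked with a few lines of arithmetic. The function names below are hypothetical, chosen only for this illustration. For N = 3 the two counts happen to coincide, which is why the benefit in this recipe is clarity rather than less code.

```python
def direct_converters(n):
    # One function for every ordered pair of distinct formats
    return n * (n - 1)

def neutral_converters(n):
    # One function into the neutral format and one out of it, per format
    return 2 * n

for n in (3, 5, 10):
    print(n, direct_converters(n), neutral_converters(n))
# 3 6 6
# 5 20 10
# 10 90 20
```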

Clearly, the underlying neutral format that each identifier style is encoding is a list of words. Let's say, for definiteness and without loss of generality, that they are lowercase words:

import string, re
def anytolw(x):  # any format of identifier to list of lowercased words

    # First, see if there are underscores:
    lw = x.split('_')
    if len(lw)>1: return [w.lower() for w in lw]

    # No. Then uppercase letters are the splitters:
    pieces = re.split('([A-Z])', x)

    # Ensure first word follows the same rules as the others:
    if pieces[0]: pieces = [''] + pieces
    else: pieces = pieces[1:]

    # Join two by two, lowercasing the splitters as you go
    return [pieces[i].lower() + pieces[i+1] for i in range(0, len(pieces), 2)]
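A few sample calls show what the word-list form looks like. This sketch restates anytolw using string methods so that it runs unchanged on current Python versions; the inputs are made up for illustration. Because every uppercase letter acts as a splitter, a run of capitals such as "HTML" is broken into one-letter words, unlike cw2us, which keeps such a run together.

```python
import re

def anytolw(x):  # any format of identifier to list of lowercased words
    lw = x.split('_')
    if len(lw) > 1:
        return [w.lower() for w in lw]
    pieces = re.split('([A-Z])', x)
    # Make the list start with a (possibly empty) splitter, like every other word
    if pieces[0]:
        pieces = [''] + pieces
    else:
        pieces = pieces[1:]
    return [pieces[i].lower() + pieces[i+1] for i in range(0, len(pieces), 2)]

print(anytolw("set_xy_position"))  # ['set', 'xy', 'position']
print(anytolw("setXyPosition"))    # ['set', 'xy', 'position']
print(anytolw("SetXyPosition"))    # ['set', 'xy', 'position']
print(anytolw("PrintHTML"))        # ['print', 'h', 't', 'm', 'l']
```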

There's no need to specify the format, since it's self-describing. Conversely, when translating from our internal form to an output format, we do need to specify the format we want, but on the other hand, the functions are very simple:

def lwtous(x): return '_'.join(x)
def lwtocw(x): return ''.join([w.capitalize() for w in x])
def lwtomc(x): return x[0]+''.join([w.capitalize() for w in x[1:]])

Any other combination is a simple issue of functional composition:

def anytous(x): return lwtous(anytolw(x))
cwtous = mctous = anytous
def anytocw(x): return lwtocw(anytolw(x))
ustocw = mctocw = anytocw
def anytomc(x): return lwtomc(anytolw(x))
cwtomc = ustomc = anytomc
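The same composition can also be driven by a lookup table, so that one entry point serves every target style. The following is a minimal sketch, not part of the original recipe: the names WRITERS and convert are hypothetical, and the helper functions are restated with string methods so the block is self-contained.

```python
import re

def anytolw(x):  # any format of identifier to list of lowercased words
    lw = x.split('_')
    if len(lw) > 1:
        return [w.lower() for w in lw]
    pieces = re.split('([A-Z])', x)
    if pieces[0]:
        pieces = [''] + pieces
    else:
        pieces = pieces[1:]
    return [pieces[i].lower() + pieces[i+1] for i in range(0, len(pieces), 2)]

def lwtous(x): return '_'.join(x)
def lwtocw(x): return ''.join(w.capitalize() for w in x)
def lwtomc(x): return x[0] + ''.join(w.capitalize() for w in x[1:])

# Hypothetical dispatch table: target style code -> writer function
WRITERS = {'us': lwtous, 'cw': lwtocw, 'mc': lwtomc}

def convert(name, style):
    # Parse into the neutral word list, then emit the requested style
    return WRITERS[style](anytolw(name))

print(convert("SetXyPosition", "us"))    # set_xy_position
print(convert("set_xy_position", "mc"))  # setXyPosition
print(convert("setXyPosition", "cw"))    # SetXyPosition
```

Adding a fourth naming style under this design would mean writing one new writer function and, if the style is not self-describing, one new branch in the parser.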

The specialized approach is slimmer and faster, but this generalized stance may ease understanding as well as offering wider application.

3.16.4 See Also

The Library Reference sections on the re and string modules.