### 3.16 Converting Between Different Naming Conventions

Credit: Sami Hangaslammi

#### 3.16.1 Problem

You have a body of code whose identifiers use one of the common naming conventions to represent multiple words in a single identifier (CapitalizedWords, mixedCase, or under_scores), and you need to convert the code to another naming convention in order to merge it smoothly with other code.

#### 3.16.2 Solution

re.sub covers the two hard cases, converting underscore to and from the others:

```python
import re

def cw2us(x): # capwords to underscore notation
    return re.sub(r'(?<=[a-z])[A-Z]|(?<!^)[A-Z](?=[a-z])',
                  r"_\g<0>", x).lower()

def us2mc(x): # underscore to mixed-case notation
    return re.sub(r'_([a-z])', lambda m: (m.group(1).upper()), x)
```
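
The pattern that cw2us passes to re.sub is dense, so as a reading aid here is a sketch of the same pattern written out with re.VERBOSE. The names BOUNDARY, cw2us_verbose, and boundaries are not part of the recipe; they are introduced here only for illustration.

```python
import re

# Same pattern as in cw2us, spelled out with re.VERBOSE so that each
# alternative can carry its own comment.
BOUNDARY = re.compile(r'''
    (?<=[a-z])[A-Z]         # an uppercase letter right after a lowercase one
  |                         # or
    (?<!^)[A-Z](?=[a-z])    # an uppercase letter, not at the start of the
                            # string, that is followed by a lowercase letter
''', re.VERBOSE)

def cw2us_verbose(x):
    # Prefix every matched letter with an underscore, then lowercase it all.
    return BOUNDARY.sub(r'_\g<0>', x).lower()

def boundaries(x):
    # Hypothetical helper: the letters that get an underscore put before them.
    return BOUNDARY.findall(x)

print(boundaries("SetXYPosition"))     # ['X', 'P']
print(cw2us_verbose("SetXYPosition"))  # set_xy_position
```

The first alternative catches the start of a word that follows a lowercase letter; the second catches the last letter of a run of capitals when a lowercase letter follows, which is what keeps an acronym such as XY together as one word.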

Mixed-case to underscore is just like capwords to underscore (the case-lowering of the first character becomes redundant, but it does no harm):

```python
def mc2us(x): # mixed-case to underscore notation
    return cw2us(x)
```

Underscore to capwords can similarly exploit the underscore to mixed-case conversion, but it needs an extra twist to uppercase the start:

```python
def us2cw(x): # underscore to capwords notation
    s = us2mc(x)
    return s[0].upper()+s[1:]
```

Conversion between mixed-case and capwords is, of course, just an issue of lowercasing or uppercasing the first character, as appropriate:

```python
def mc2cw(x): # mixed-case to capwords
    return x[0].upper()+x[1:]

def cw2mc(x): # capwords to mixed-case
    return x[0].lower()+x[1:]
```

#### 3.16.3 Discussion

Here are some usage examples:

```python
>>> cw2us("PrintHTML")
'print_html'
>>> cw2us("IOError")
'io_error'
>>> cw2us("SetXYPosition")
'set_xy_position'
>>> cw2us("GetX")
'get_x'
```
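
The conversions out of underscore notation can be exercised the same way. The sketch below repeats the relevant definitions so that it runs on its own, and it also shows a limitation that follows from the code: underscore notation does not record which words were acronyms, so a round trip through it does not restore the original capitalization.

```python
import re

def cw2us(x):  # capwords (or mixed-case) to underscore notation
    return re.sub(r'(?<=[a-z])[A-Z]|(?<!^)[A-Z](?=[a-z])',
                  r'_\g<0>', x).lower()

def us2mc(x):  # underscore to mixed-case notation
    return re.sub(r'_([a-z])', lambda m: m.group(1).upper(), x)

def us2cw(x):  # underscore to capwords notation
    s = us2mc(x)
    return s[0].upper() + s[1:]

print(us2mc("set_xy_position"))   # setXyPosition
print(us2cw("set_xy_position"))   # SetXyPosition
print(cw2us(us2cw("io_error")))   # io_error
print(us2cw(cw2us("IOError")))    # IoError, not IOError
```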

The set of functions in this recipe is useful, and very practical, if you need to homogenize naming styles in a bunch of code, but the approach may be a bit obscure. In the interest of clarity, you might want to adopt a more general conceptual stance: to convert a bunch of formats into each other, find a neutral format and write conversions from each of the N formats into the neutral one and back again. This means having 2N conversion functions rather than N x (N-1), a big win for large N, but the point here (in which N is only three, so both counts come to six) is really one of clarity.
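
The arithmetic behind this argument can be checked with a few lines. The sketch below counts the functions needed under each scheme and then shows the neutral-format idea on two toy formats; all the names in it, and the toy formats themselves, are hypothetical and chosen only for illustration.

```python
def direct_converters(n):
    # One function for every ordered (source, target) pair of distinct formats.
    return n * (n - 1)

def neutral_converters(n):
    # One parser into the neutral form and one writer out of it, per format.
    return 2 * n

for n in (3, 5, 10):
    print(n, direct_converters(n), neutral_converters(n))
# 3 6 6
# 5 20 10
# 10 90 20

# Hypothetical toy formats: the same fields joined by different separators.
# The neutral form is simply a list of fields.
parsers = {'comma': lambda s: s.split(','), 'colon': lambda s: s.split(':')}
writers = {'comma': ','.join, 'colon': ':'.join}

def convert(text, source, target):
    # Every conversion is a parser followed by a writer.
    return writers[target](parsers[source](text))

print(convert('a,b,c', 'comma', 'colon'))  # a:b:c
```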

Clearly, the underlying neutral format that each identifier style is encoding is a list of words. Let's say, for definiteness and without loss of generality, that they are lowercase words:

```python
import re

def anytolw(x):  # any format of identifier to list of lowercased words

    # First, see if there are underscores:
    lw = x.split('_')
    if len(lw) > 1: return [w.lower() for w in lw]

    # No. Then uppercase letters are the splitters:
    pieces = re.split('([A-Z])', x)

    # Ensure first word follows the same rules as the others:
    if pieces[0]: pieces = [''] + pieces
    else: pieces = pieces[1:]

    # Join two by two, lowercasing the splitters as you go
    return [pieces[i].lower()+pieces[i+1] for i in range(0, len(pieces), 2)]
```

There's no need to specify the input format, since it's self-describing. Conversely, when translating from our internal form to an output format, we do need to specify the format we want, but on the other hand, the functions are very simple:

```python
def lwtous(x): return '_'.join(x)
def lwtocw(x): return ''.join([w.capitalize() for w in x])
def lwtomc(x): return x[0]+''.join([w.capitalize() for w in x[1:]])
```

Any other combination is a simple issue of functional composition:

```python
def anytous(x): return lwtous(anytolw(x))
cwtous = mctous = anytous
def anytocw(x): return lwtocw(anytolw(x))
ustocw = mctocw = anytocw
def anytomc(x): return lwtomc(anytolw(x))
cwtomc = ustomc = anytomc
```
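
A short, self-contained run of the word-list approach might look like the sketch below. It restates the functions with string methods so that it runs as is. The last line shows a difference from cw2us that follows from splitting on every uppercase letter: a run of capitals is treated as a series of one-letter words.

```python
import re

def anytolw(x):  # any identifier style to a list of lowercased words
    lw = x.split('_')
    if len(lw) > 1:
        return [w.lower() for w in lw]
    pieces = re.split('([A-Z])', x)
    if pieces[0]:
        pieces = [''] + pieces
    else:
        pieces = pieces[1:]
    return [pieces[i].lower() + pieces[i+1] for i in range(0, len(pieces), 2)]

def lwtous(x): return '_'.join(x)
def lwtocw(x): return ''.join([w.capitalize() for w in x])
def lwtomc(x): return x[0] + ''.join([w.capitalize() for w in x[1:]])

print(anytolw('getXPosition'))            # ['get', 'x', 'position']
print(lwtous(anytolw('GetXPosition')))    # get_x_position
print(lwtocw(anytolw('get_x_position')))  # GetXPosition
print(lwtomc(anytolw('get_x_position')))  # getXPosition
print(lwtous(anytolw('PrintHTML')))       # print_h_t_m_l
```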

The specialized approach is slimmer and faster, but this generalized stance may ease understanding as well as offering wider application.