I l@ve RuBoard

3.24 Module: Roman Numerals

Credit: Paul M. Winkler

There are many algorithms for creating Roman numerals. Example 3-2 presents the easiest-to-read algorithm that I've been able to find for this purpose: it establishes a mapping between integer values and Roman numerals, then counts how many of each value can fit into the input integer. The code uses two tuples for the mapping, instead of a dictionary, because it needs to go through them sequentially and doesn't care about random access. Thus, a dictionary would be more hindrance than help.

Example 3-2. Roman numerals

def int_to_roman(input):
    """ Convert an integer to a Roman numeral. """

    if not isinstance(input, type(1)):
        raise TypeError, "expected integer, got %s" % type(input)
    if not 0 < input < 4000:
        raise ValueError, "Argument must be between 1 and 3999"
    ints = (1000, 900,  500, 400, 100,  90, 50,  40, 10,  9,   5,  4,   1)
    nums = ('M',  'CM', 'D', 'CD','C', 'XC','L','XL','X','IX','V','IV','I')
    result = []
    for i in range(len(ints)):
        count = int(input / ints[i])
        result.append(nums[i] * count)
        input -= ints[i] * count
    return ''.join(result)

def roman_to_int(input):
    """ Convert a Roman numeral to an integer. """

    """
    if not isinstance(input, type("")):
        raise TypeError, "expected string, got %s" % type(input)
    input = input.upper(  )
    nums = {'M':1000, 'D':500, 'C':100, 'L':50, 'X':10, 'V':5, 'I':1}
    sum = 0
    for i in range(len(input)):
        try:
            value = nums[input[i]]
            # If the next place holds a larger number, this value is negative
            if i+1 < len(input) and nums[input[i+1]] > value:
                sum -= value
            else: sum += value
        except KeyError:
            raise ValueError, 'input is not a valid Roman numeral: %s' % input
    # easiest test for validity...
    if int_to_roman(sum) == input:
        return sum
    else:
        raise ValueError, 'input is not a valid Roman numeral: %s' % input

Here are some usage examples of converting integers to Roman numerals and vice versa:

>>> print int_to_roman(2002)
MMII
>>> print int_to_roman(1999)
MCMXCIX
>>> roman_to_int('XLII')
42
>>> roman_to_int('VVVIV')
Traceback (most recent call last):
 ...
ValueError: input is not a valid Roman numeral: VVVIV

The rules for Roman numerals are as follows:

I = 1, V = 5, X = 10, L = 50, C = 100, D = 500, M = 1000.
Zero is not represented.
Numbers greater than 3,999 are not represented.
Roman numerals are repeated to add value: III is equivalent to 1 +1 +1 = 3.
Only powers of 10 may be repeated in this way. Thus, VV is invalid; 5 + 5 would instead be expressed as X.
No more than three repetitions of a numeral can be used. Five repetitions can be represented with a single, larger numeral; to represent four, use the next larger numeral, but precede it with a numeral to subtract from it. Thus, IIII is invalid and would instead be written as IV (one less than five). Likewise, XC represents 90 (10 less than 100), and XL represents 40 (10 less than 50).
A numeral used for subtraction in this way must be the largest power of 10 that is less than the numeral it precedes. Thus, XC is valid but IC is invalid.

In my first attempt at int_to_roman, my approach was simply to follow, as closely as I could, the plain English description of these rules. I rejected that version, because it ended up being longer and more complicated than the version given here. It's actually easier to forcibly assign values to IV and its friends than it is to implement the rule that determines the values.

A different approach to a Roman-numeral-to-integer converter can be found in Mark Pilgrim's Dive Into Python (http://diveintopython.org/roman_divein.html), an online book containing lots of useful information, all free for use under the Python license. Mark relies on a regular expression to validate the input. This is a fine idea that makes his function nice and short, but it puts a lot of the logic in the regular expression, which may be easier to misunderstand than the slightly longer function in this recipe.

Here is another approach, based on Mark's, but with an additional field in each tuple to enforce the maximum number of repetitions allowed for a numeral. It relies on the ordering of the tuple to enforce the correct ordering of numerals, so it doesn't need a regular expression (or any double-checking in the end through int_to_roman, as in Example 3-2):

def roman_to_int(input):
    try: input = input.upper(  )
    except AttributeError:
        raise TypeError, 'expected string, got %s' % type(input)
    # map of (numeral, value, maxcount) tuples
    roman_numeral_map = (('M', 1000, 3), ('CM', 900, 1),
        ('D', 500, 1), ('CD', 400, 1),
        ('C', 100, 3), ('XC', 90, 1),
        ('L', 50, 1), ('XL', 40, 1),
        ('X', 10, 3), ('IX', 9, 1),
        ('V', 5, 1), ('IV', 4, 1), ('I', 1, 3))
    result, index = 0, 0
    for numeral, value, maxcount in roman_numeral_map:
        count = 0
        while input[index: index+len(numeral)] == numeral:
            count += 1 # how many of this numeral we have
            if count > maxcount:
                raise ValueError, \
                    'input is not a valid roman numeral: %s' % input
            result += value
            index += len(numeral)
    if index < len(input): # There are characters unaccounted for
        raise ValueError, 'input is not a valid roman numeral: %s'%input
    return result

However, this version is not quite rigid enough in diagnosing malformed Roman numerals. For example, this version accepts XCXL, translating it into 130, while the version in Example 3-2 properly rejects it. The canonical way to represent 130 as a Roman numeral is CXXX, but it's not easy to capture the fact that XCXL is invalid (indeed, although it should be forbidden, none of the rules appears to forbid it). The version in Example 3-2, by checking that the string it has just parsed is indeed the canonical (and thus, the only allowed) representation for the resulting integer, gains a substantial measure of solidity in rejecting plausible but noncanonical strings.

This leads to a general idea that you should keep in mind whenever you are coding bidirectional transformation functions between two formats, where the functions are inverses of each other. When one of the directions has a more clearly specified transformation algorithm, you can verify the function that implements the more loosely specified transformation by checking that the other function does indeed result in the original input value when applied to the candidate result. If only the canonical form is to be accepted, this pattern lets you easily reject plausible but noncanonical inputs that it might otherwise be difficult to detect.

3.24.1 See Also

Mark Pilgrim's Dive Into Python (http://diveintopython.org).

I l@ve RuBoard