
17.20 Module: Parsing a String into a Date/Time Object Portably

Credit: Brett Cannon

Python's time module supplies the parsing function strptime only on some platforms, and not on Windows. Example 17-2 shows a pure Python implementation of strptime that stays as close as possible to the behavior documented for time.strptime in the standard Python documentation. In addition to the string and format arguments, it accepts two optional arguments, as shown in the following signature:

strptime(string, format="%a %b %d %H:%M:%S %Y", option=AS_IS, locale_setting=ENGLISH)

option's default value of AS_IS gets time information from the string, without any checking or filling-in. You can pass option as CHECK, so that the function makes sure that whatever information it gets is within reasonable ranges (raising an exception otherwise), or FILL_IN (like CHECK, but also tries to fill in any missing information that can be computed). locale_setting accepts a locale tuple (as created by LocaleAssembly) to specify names of days, months, and so on. Currently, ENGLISH and SWEDISH locale tuples are built into this recipe's strptime module.
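
To make the difference between CHECK and FILL_IN concrete, here is a minimal standalone sketch. It does not use the recipe's code: the names RANGES, check_fields, and fill_in_fields are hypothetical, the parsed fields live in a plain dictionary, and the standard datetime module stands in for the recipe's own calendar arithmetic.

```python
import datetime

# Inclusive ranges per field; a hypothetical table for illustration only.
RANGES = {
    'month': (1, 12), 'day': (1, 31), 'hour': (0, 23),
    'minute': (0, 59), 'second': (0, 61),
}

def check_fields(fields):
    """CHECK-style pass: reject any present field outside its range."""
    for name, (low, high) in RANGES.items():
        value = fields.get(name)
        if value is not None and not low <= value <= high:
            raise ValueError('%s out of range: %r' % (name, value))

def fill_in_fields(fields):
    """FILL_IN-style pass: check first, then derive what can be computed."""
    check_fields(fields)
    filled = dict(fields)
    if all(filled.get(k) is not None for k in ('year', 'month', 'day')):
        date = datetime.date(filled['year'], filled['month'], filled['day'])
        # setdefault only adds keys that are absent from the dictionary.
        filled.setdefault('day_of_year', date.timetuple().tm_yday)
        filled.setdefault('weekday', date.weekday())  # Monday is 0
    return filled

parsed = {'year': 2002, 'month': 7, 'day': 15}
print(fill_in_fields(parsed))
```

An AS_IS-style call would simply return the parsed dictionary untouched, so out-of-range values would pass through silently.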

Although this recipe's strptime cannot be as fast as the version in the standard Python library, that's hardly ever a major consideration for typical strptime use. This recipe does offer two substantial advantages. First, it runs on any platform supporting Python and gives identical results on all of them, while time.strptime exists only on some platforms and tends to have different quirks on each platform that supplies it. Second, the optional checking and filling-in of information that this recipe provides is quite handy.

The locale-setting support of this version of strptime was inspired by that in Andrew Markebo's own strptime, which you can find at http://www.fukt.hk-r.se/~flognat/hacks/strptime.py. However, this recipe has a more complete implementation of strptime's specification that is based on regular expressions, rather than relying on whitespace and miscellaneous characters to split strings. For example, this recipe can correctly parse strings based on a format such as "%Y%m%d".
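
The regular-expression approach can be sketched in a few lines. The following standalone example is an illustration, not the recipe's code: DIRECTIVES is a hypothetical, trimmed-down table of three directives, and format_to_regex is a hypothetical helper name. It shows why a format with no separators, such as "%Y%m%d", poses no problem when each directive becomes a named group.

```python
import re

# Hypothetical, trimmed-down directive table; the recipe's BasicDict has more.
DIRECTIVES = {
    'Y': r'(?P<Y>\d\d\d\d)',   # year with century
    'm': r'(?P<m>[01]\d)',     # month number
    'd': r'(?P<d>[0-3]\d)',    # day of the month
}

def format_to_regex(fmt):
    """Translate a strptime-style format into a compiled regex."""
    parts = []
    chars = iter(fmt)
    for char in chars:
        if char == '%':
            directive = next(chars, None)  # character following the '%'
            if directive not in DIRECTIVES:
                raise ValueError('Invalid format %r' % directive)
            parts.append(DIRECTIVES[directive])
        else:
            parts.append(re.escape(char))  # literal text must match itself
    return re.compile(''.join(parts))

match = format_to_regex('%Y%m%d').match('20020715')
print(match.groupdict())
```

Because every directive has a fixed-width pattern, the regex engine can split the digits without relying on whitespace. Escaping literal characters with re.escape is a choice made in this sketch so that characters such as "." in a format are not interpreted as regex syntax.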

Example 17-2. Parsing a string into a date/time object portably
""" A pure-Python version of strptime.

As close as possible to time.strptime's specs in the official Python docs.
Locales supported via LocaleAssembly -- examples supplied for English and
Swedish, follow the examples to add your own locales.

Thanks to Andrew Markebo for his pure Python version of strptime, which
convinced me to improve locale support -- and, of course, to Guido van Rossum
and all other contributors to Python, the best language I've ever used!
"""
import re
__all__ = ['strptime', 'AS_IS', 'CHECK', 'FILL_IN',
           'LocaleAssembly', 'ENGLISH', 'SWEDISH']
# module metadata
__author__ = 'Brett Cannon'
__email__ = 'drifty@bigfoot.com'
__version__ = '1.5cb'
__url__ = 'http://www.drifty.org/'

# global settings and parameter constants
CENTURY = 2000
AS_IS = 'AS_IS'
CHECK = 'CHECK'
FILL_IN = 'FILL_IN'

def LocaleAssembly(DirectiveDict, MonthDict, DayDict, am_pmTuple):
    """ Creates locale tuple for use by strptime.

    Accepts the dictionaries DirectiveDict (locale-specific regexes for
    extracting info from time strings), MonthDict (locale-specific full and
    abbreviated month names), and DayDict (locale-specific full and
    abbreviated weekday names), plus the am_pmTuple tuple (locale-specific
    valid representations of AM and PM, as a two-item tuple). Look at how
    the ENGLISH locale is created for an example; make sure your locale has
    values corresponding to each entry in the ENGLISH one. You can override
    any value in the BasicDict with an entry in DirectiveDict.
    """
    BasicDict={'%d':r'(?P<d>[0-3]\d)', # Day of the month [01,31]
        '%H':r'(?P<H>[0-2]\d)', # Hour (24-h) [00,23]
        '%I':r'(?P<I>[01]\d)', # Hour (12-h) [01,12]
        '%j':r'(?P<j>[0-3]\d\d)', # Day of the year [001,366]
        '%m':r'(?P<m>[01]\d)', # Month [01,12]
        '%M':r'(?P<M>[0-5]\d)', # Minute [00,59]
        '%S':r'(?P<S>[0-6]\d)', # Second [00,61]
        '%U':r'(?P<U>[0-5]\d)', # Week in the year, Sunday first [00,53]
        '%w':r'(?P<w>[0-6])', # Weekday [0(Sunday),6]
        '%W':r'(?P<W>[0-5]\d)', # Week in the year, Monday first [00,53]
        '%y':r'(?P<y>\d\d)', # Year without century [00,99]
        '%Y':r'(?P<Y>\d\d\d\d)', # Year with century
        '%Z':r'(?P<Z>(\D+ Time)|([\S\D]{3,3}))', # Timezone name or empty
        '%%':r'(?P<percent>%)' # Literal "%" (ignored, in the end)
        }
    BasicDict.update(DirectiveDict)
    return BasicDict, MonthDict, DayDict, am_pmTuple

# helper function to build locales' month and day dictionaries
def _enum_with_abvs(start, *names):
    result = {}
    for i in range(len(names)):
        result[names[i]] = result[names[i][:3]] = i+start
    return result

""" Built-in locales """
ENGLISH_Lang = (
    {'%a':r'(?P<a>[^\s\d]{3,3})', # Abbreviated weekday name
     '%A':r'(?P<A>[^\s\d]{6,9})', # Full weekday name
     '%b':r'(?P<b>[^\s\d]{3,3})', # Abbreviated month name
     '%B':r'(?P<B>[^\s\d]{3,9})', # Full month name
      # Appropriate date and time representation.
     '%c':r'(?P<m>\d\d)/(?P<d>\d\d)/(?P<y>\d\d) '
          r'(?P<H>\d\d):(?P<M>\d\d):(?P<S>\d\d)',
     '%p':r'(?P<p>(a|A|p|P)(m|M))', # Equivalent of either AM or PM
      # Appropriate date representation
     '%x':r'(?P<m>\d\d)/(?P<d>\d\d)/(?P<y>\d\d)',
      # Appropriate time representation
     '%X':r'(?P<H>\d\d):(?P<M>\d\d):(?P<S>\d\d)'},
    _enum_with_abvs(1, 'January', 'February', 'March', 'April', 'May', 'June',
        'July', 'August', 'September', 'October', 'November', 'December'),
    _enum_with_abvs(0, 'Monday', 'Tuesday', 'Wednesday', 'Thursday',
        'Friday', 'Saturday', 'Sunday'),
    (('am','AM'),('pm','PM'))
    )
ENGLISH = LocaleAssembly(*ENGLISH_Lang)

SWEDISH_Lang = (
    {'%a':r'(?P<a>[^\s\d]{3,3})',
     '%A':r'(?P<A>[^\s\d]{6,7})',
     '%b':r'(?P<b>[^\s\d]{3,3})',
     '%B':r'(?P<B>[^\s\d]{3,8})',
     '%c':r'(?P<a>[^\s\d]{3,3}) (?P<d>[0-3]\d) '
          r'(?P<b>[^\s\d]{3,3}) (?P<Y>\d\d\d\d) '
          r'(?P<H>[0-2]\d):(?P<M>[0-5]\d):(?P<S>[0-6]\d)',
     '%p':r'(?P<p>(a|A|p|P)(m|M))',
     '%x':r'(?P<m>\d\d)/(?P<d>\d\d)/(?P<y>\d\d)',
     '%X':r'(?P<H>\d\d):(?P<M>\d\d):(?P<S>\d\d)'},
    _enum_with_abvs(1, 'Januari', 'Februari', 'Mars', 'April', 'Maj', 'Juni',
        'Juli', 'Augusti', 'September', 'Oktober', 'November', 'December'),
    _enum_with_abvs(0, 'Måndag', 'Tisdag', 'Onsdag', 'Torsdag',
        'Fredag', 'Lördag', 'Söndag'),
    (('am','AM'),('pm','PM'))
    )
SWEDISH = LocaleAssembly(*SWEDISH_Lang)


class StrptimeError(Exception):
    """ Exception class for the module """

def _g2j(y, m, d):
    """ Gregorian-to-Julian utility function, used by _StrpObj """
    a = (14-m)//12  # floor division throughout: the formula needs integers
    y = y+4800-a
    m = m+12*a-3
    return d+((153*m+2)//5)+365*y+y//4-y//100+y//400-32045

class _StrpObj:
    """ An object with basic time-manipulation methods """
    def __init__(self, year=None, month=None, day=None, hour=None, minute=None,
        second=None, day_week=None, julian_date=None, daylight=None):
        """ Sets up instance variables. All values can be set at
        initialization. Any info left out is automatically set to None. """
        def _set_vars(_adict, **kwds): _adict.update(kwds)
        _set_vars(self.__dict__, **vars(  ))

    def julianFirst(self):
        """ Calculates the Julian date for the first day of year self.year """
        return _g2j(self.year, 1, 1)

    def gregToJulian(self):
        """ Converts the Gregorian date to day within year (Jan 1 == 1) """
        julian_day = _g2j(self.year, self.month, self.day)
        return julian_day-self.julianFirst(  )+1

    def julianToGreg(self):
        """ Converts the Julian date to the Gregorian date """
        julian_day = self.julian_date+self.julianFirst(  )-1
        a = julian_day+32044
        b = (4*a+3)//146097
        c = a-((146097*b)//4)
        d = (4*c+3)//1461
        e = c-((1461*d)//4)
        m = (5*e+2)//153
        day = e-((153*m+2)//5)+1
        month = m+3-12*(m//10)
        year = 100*b+d-4800+(m//10)
        return year, month, day

    def dayWeek(self):
        """ Figures out the day of the week using self.year, self.month, and
        self.day. Monday is 0. """
        a = (14-self.month)//12
        y = self.year-a
        m = self.month+12*a-2
        day_week = (self.day+y+(y//4)-(y//100)+(y//400)+((31*m)//12))%7
        if day_week==0: day_week = 6
        else: day_week = day_week-1
        return day_week

    def FillInInfo(self):
        """ Based on the current time information, it figures out what other
        info can be filled in. """
        if self.julian_date is None and self.year and self.month and self.day:
            julian_date = self.gregToJulian(  )
            self.julian_date = julian_date
        if (self.month is None or self.day is None
                ) and self.year and self.julian_date:
            gregorian = self.julianToGreg(  )
            self.month = gregorian[1] # year ignored, must already be okay
            self.day = gregorian[2]
        if self.day_week is None and self.year and self.month and self.day:
            self.day_week = self.dayWeek(  ) # store the result, don't discard it

    def CheckIntegrity(self):
        """ Checks info integrity based on the range that a number can be.
        Any invalid info raises StrptimeError. """
        def _check(value, low, high, name):
            # bounds are inclusive, so e.g. months 1 and 12 are both valid
            if value is not None and not low<=value<=high:
                raise StrptimeError("%s incorrect"%name)
        _check(self.month, 1, 12, 'Month')
        _check(self.day, 1, 31, 'Day')
        _check(self.hour, 0, 23, 'Hour')
        _check(self.minute, 0, 59, 'Minute')
        _check(self.second, 0, 61, 'Second')  # 61 covers leap seconds
        _check(self.day_week, 0, 6, 'Day of the Week')
        _check(self.julian_date, 1, 366, 'Julian Date')
        _check(self.daylight, -1, 1, 'Daylight Savings')

    def return_time(self):
        """ Returns a tuple of numbers in the format used by time.gmtime(  ).
        All instances of None in the information are replaced with 0. """
        temp_time = (self.year, self.month, self.day, self.hour, self.minute, 
            self.second, self.day_week, self.julian_date, self.daylight)
        return tuple([t or 0 for t in temp_time])

    def RECreation(self, format, DIRECTIVEDict):
        """ Creates re based on format string and DIRECTIVEDict """
        Directive = 0
        REString = []
        for char in format:
            if char=='%' and not Directive:
                Directive = 1
            elif Directive:
                try: REString.append(DIRECTIVEDict['%'+char])
                except KeyError: raise StrptimeError("Invalid format %s"%char)
                Directive = 0
            else:
                REString.append(re.escape(char)) # literal text, not regex syntax
        return re.compile(''.join(REString), re.IGNORECASE)

    def convert(self, string, format, locale_setting):
        """ Gets time info from string based on format string and a locale
        created by LocaleAssembly(  ) """
        DIRECTIVEDict, MONTHDict, DAYDict, AM_PM = locale_setting
        REComp = self.RECreation(format, DIRECTIVEDict)
        reobj = REComp.match(string)
        if reobj is None: raise StrptimeError("Invalid string (%s)"%string)
        for found in reobj.groupdict().keys(  ):
            if found in ('y','Y'): # year
                if found=='y': # without century
                    self.year = CENTURY+int(reobj.group('y'))
                else: # with century
                    self.year = int(reobj.group('Y'))
            elif found in ('b','B','m'): # month
                if found=='m': # month number
                    self.month = int(reobj.group(found))
                else: # month name
                    try:
                        self.month = MONTHDict[reobj.group(found)]
                    except KeyError:
                        raise StrptimeError('Unrecognized month')
            elif found=='d': # day of the month
                self.day = int(reobj.group(found))
            elif found in ('H','I'): # hour
                hour = int(reobj.group(found))
                if found=='H': # hour number
                    self.hour = hour
                else: # AM/PM format
                    try:
                        if reobj.group('p') in AM_PM[0]: AP = 0
                        else: AP = 1
                    except IndexError: # re raises IndexError for a missing group
                        raise StrptimeError('Lacking needed AM/PM information')
                    if AP:
                        if hour==12: self.hour = 12
                        else: self.hour = 12+hour
                    else:
                        if hour==12: self.hour = 0
                        else: self.hour = hour
            elif found=='M': # minute
                self.minute = int(reobj.group(found))
            elif found=='S': # second
                self.second = int(reobj.group(found))
            elif found in ('a','A','w'): # Day of the week
                if found=='w': # DOW number
                    day_value = int(reobj.group(found))
                    if day_value==0: self.day_week = 6
                    else: self.day_week = day_value-1
                else: # DOW name
                    try:
                        self.day_week = DAYDict[reobj.group(found)]
                    except KeyError:
                        raise StrptimeError('Unrecognized day')
            elif found=='j': # Julian date
                self.julian_date = int(reobj.group(found))
            elif found=='Z': # daylight savings
                TZ = reobj.group(found)
                if len(TZ)==3:
                    if TZ[1] in ('D','d'): self.daylight = 1
                    else: self.daylight = 0
                elif TZ.find('Daylight')!=-1: self.daylight = 1
                else: self.daylight = 0

def strptime(string, format='%a %b %d %H:%M:%S %Y',
        option=AS_IS, locale_setting=ENGLISH):
    """ Returns a tuple representing the time represented in 'string'.
    Valid values for 'option' are AS_IS, CHECK, and FILL_IN. 'locale_setting'
    accepts locale tuples created by LocaleAssembly(  ). """
    Obj = _StrpObj(  )
    Obj.convert(string, format, locale_setting)
    if option in (FILL_IN, CHECK):
        Obj.CheckIntegrity(  )
    if option == FILL_IN:
        Obj.FillInInfo(  )
    return Obj.return_time(  )
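
The calendar arithmetic behind gregToJulian can be checked in isolation. The following standalone sketch restates the Julian Day Number formula used by _g2j with explicit floor division and compares the resulting day-of-year against the standard datetime module. The function names gregorian_to_jdn and day_of_year are hypothetical and are not part of the recipe.

```python
import datetime

def gregorian_to_jdn(year, month, day):
    """Julian Day Number of a Gregorian date (same formula as _g2j)."""
    a = (14 - month) // 12          # 1 for January and February, else 0
    y = year + 4800 - a
    m = month + 12 * a - 3          # March becomes month 0 of the shifted year
    return (day + (153 * m + 2) // 5 + 365 * y
            + y // 4 - y // 100 + y // 400 - 32045)

def day_of_year(year, month, day):
    """Day within the year, counting January 1 as 1."""
    return gregorian_to_jdn(year, month, day) - gregorian_to_jdn(year, 1, 1) + 1

# Cross-check against the standard library for every day of a leap year.
date = datetime.date(2000, 1, 1)
while date.year == 2000:
    assert day_of_year(date.year, date.month, date.day) == date.timetuple().tm_yday
    date += datetime.timedelta(days=1)
print(day_of_year(2000, 12, 31))
```

Subtracting the Julian Day Number of January 1 from that of the target date, then adding 1, is how the recipe turns an absolute day count into the day-within-year value that time tuples expect.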

17.20.1 See Also

The most up-to-date version of strptime is always available at http://www.ocf.berkeley.edu/~bac/Askewed_Thoughts/HTML/code/index.php3#strptime, where you will also find a test suite using PyUnit; Andrew Markebo's version of strptime is at http://www.fukt.hk-r.se/~flognat/hacks/strptime.py.
