Solution

The answers to the three questions are as follows.

  1. What does "case-insensitive" mean?

    What "case-insensitive" actually means depends entirely on your application and language. For example, many languages do not have cases at all. For those that do, you still have to decide whether you want accented characters to compare equal to unaccented characters, and so on. This Item provides guidance on how to implement case-insensitivity for standard strings in whatever sense applies to your situation.

  2. Write a ci_string class that is identical to the standard std::string class but that is case-insensitive in the same way as the commonly provided extension stricmp().

    The "how can I make a case-insensitive string?" question is so common that it probably deserves its own FAQ; hence this Item.

    Here's what we want to achieve:

    
    
    ci_string s( "AbCdE" );

    // case insensitive
    //
    assert( s == "abcde" );
    assert( s == "ABCDE" );

    // still case-preserving, of course
    //
    assert( strcmp( s.c_str(), "AbCdE" ) == 0 );
    assert( strcmp( s.c_str(), "abcde" ) != 0 );

    The key here is to understand what a string actually is in Standard C++. If you look in your trusty string header, you'll see something like this:

    
    
    typedef basic_string<char> string;

    So string isn't really a class; it's a typedef of a template. In turn, the basic_string<> template is declared as follows, possibly with additional implementation-specific template parameters:

    
    
    template<class charT,
             class traits = char_traits<charT>,
             class Allocator = allocator<charT> >
    class basic_string;

    So "string" really means "basic_string<char, char_traits<char>, allocator<char> >", possibly with additional defaulted template parameters specific to the implementation you're using. We don't need to worry about the allocator part, but the key here is the char_traits part, because char_traits defines how characters interact, and compare!
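
The equivalence between the typedef and the fully spelled-out template can be checked at compile time. This is only a small sketch, and it assumes a C++11 or later compiler (static_assert and <type_traits> are newer than the code in this Item):

```cpp
#include <memory>
#include <string>
#include <type_traits>

// std::string and the fully spelled-out basic_string instantiation
// name the same type, so this assertion holds at compile time.
static_assert(
    std::is_same< std::string,
                  std::basic_string< char,
                                     std::char_traits<char>,
                                     std::allocator<char> > >::value,
    "std::string is basic_string<char, char_traits<char>, allocator<char> >" );
```

Changing any one of the three template arguments produces a different, unrelated type, which is what the rest of this Item relies on.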

    So let's compare strings. basic_string supplies useful comparison functions that let you compare whether one string is equal to another, less than another, and so on. These string comparison functions are built on top of character comparison functions supplied in the char_traits template. In particular, the char_traits template supplies character comparison functions named eq() and lt() for equality and less-than comparisons, and compare() and find() functions to compare and search sequences of characters.
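
To make those four operations concrete, here is a rough sketch that calls the default std::char_traits<char> functions directly. The function name default_traits_demo is made up for this illustration:

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Exercises the default (case-sensitive) character operations.
// default_traits_demo is a hypothetical name used only for this sketch.
inline void default_traits_demo()
{
    typedef std::char_traits<char> traits;

    // Single-character comparisons.
    assert(  traits::eq( 'a', 'a' ) );
    assert( !traits::eq( 'a', 'A' ) );   // case matters by default
    assert(  traits::lt( 'a', 'b' ) );

    // Sequence comparison: negative, zero, or positive, like memcmp().
    assert( traits::compare( "abc", "abd", 3 ) <  0 );
    assert( traits::compare( "abc", "abc", 3 ) == 0 );

    // Search: pointer to the first match, or null if there is none.
    const char* s = "hello";
    assert( traits::find( s, 5, 'l' ) == s + 2 );
    assert( traits::find( s, 5, 'L' ) == 0 );
}
```

Every comparison and search that basic_string performs is ultimately expressed through functions like these, so swapping the traits class changes the behavior of the whole string.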

    If we want these to behave differently, all we have to do is provide a different char_traits template. Here's the easiest way:

    
    
    struct ci_char_traits : public char_traits<char>
                  // just inherit all the other functions
                  //  that we don't need to replace
    {
      static bool eq( char c1, char c2 )
        { return toupper(c1) == toupper(c2); }

      static bool lt( char c1, char c2 )
        { return toupper(c1) <  toupper(c2); }

      static int compare( const char* s1,
                          const char* s2,
                          size_t n )
        { return memicmp( s1, s2, n ); }
               // if available on your platform,
               //  otherwise you can roll your own

      static const char*
      find( const char* s, size_t n, char a )
               // n is a size_t, matching char_traits
      {
        for( ; n > 0; --n, ++s )
        {
          if( toupper(*s) == toupper(a) )
          {
            return s;
          }
        }
        return 0;
      }
    };

    And finally, the key that brings it all together:

    
    
    typedef basic_string<char, ci_char_traits> ci_string;

    All we've done is create a typedef named ci_string that operates exactly like the standard string (after all, in most respects it is the standard string), except that it uses ci_char_traits instead of char_traits<char> to get its character comparison rules. Since we've handily made the ci_char_traits rules case-insensitive, we've made ci_string itself case-insensitive, without any further surgery; that is, we have a case-insensitive string without having touched basic_string at all. Now that's extensibility.
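
Because memicmp() is not standard, the following is one possible portable variant of the same idea, offered as a sketch rather than as the definitive implementation. The helper ci_upper and the function ci_string_demo are names invented for this example. The cast to unsigned char is an addition: it avoids passing a negative value to toupper() on platforms where plain char is signed.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical helper: upper-case one char without passing a
// negative value to toupper().
inline int ci_upper( char c )
{
    return std::toupper( static_cast<unsigned char>( c ) );
}

struct ci_char_traits : public std::char_traits<char>
{
    static bool eq( char c1, char c2 ) { return ci_upper(c1) == ci_upper(c2); }
    static bool lt( char c1, char c2 ) { return ci_upper(c1) <  ci_upper(c2); }

    // Hand-rolled stand-in for the nonstandard memicmp().
    static int compare( const char* s1, const char* s2, std::size_t n )
    {
        for( ; n > 0; --n, ++s1, ++s2 )
        {
            const int u1 = ci_upper( *s1 );
            const int u2 = ci_upper( *s2 );
            if( u1 != u2 ) return u1 < u2 ? -1 : 1;
        }
        return 0;
    }

    // Returns a pointer to the first case-insensitive match, or null.
    static const char* find( const char* s, std::size_t n, char a )
    {
        for( ; n > 0; --n, ++s )
            if( ci_upper( *s ) == ci_upper( a ) ) return s;
        return 0;
    }
};

typedef std::basic_string<char, ci_char_traits> ci_string;

// The assertions from the start of this Item, plus a member find().
inline void ci_string_demo()
{
    ci_string s( "AbCdE" );
    assert( s == "abcde" );
    assert( s == "ABCDE" );
    assert( std::strcmp( s.c_str(), "AbCdE" ) == 0 );  // case preserved
    assert( s.find( 'c' ) == 2 );   // searches go through the traits too
}
```

Note that only comparisons and searches change; the stored characters are untouched, which is why c_str() still yields "AbCdE".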

  3. Is making the case sensitivity a property of the object a good idea?

    It's often more useful to have case sensitivity be a property of the comparison function instead of a property of the object as shown here. For example, consider the following code:

    
    
    string    a = "aaa";
    ci_string b = "aAa";
    if( a == b ) /* ... */

Given a suitable operator==(), should the expression "a == b" evaluate to true, or to false? We could easily take the view that if either side were case-insensitive, the comparison should be case-insensitive. But what if we change the example just a little and introduce yet another instantiation of basic_string that does comparisons a third way:



    typedef basic_string<char, yz_char_traits> yz_string;

    ci_string b = "aAa";
    yz_string c = "AAa";
    if( b == c ) /* ... */

Now, consider the question again: Should the expression "b == c" evaluate to true or to false? In this case, I think you'll agree that it's less obvious that we should arbitrarily prefer one object's ordering over another's.

Instead, consider how much clearer the examples would be when written as follows:



    string a = "aaa";
    string b = "aAa";
    if( stricmp( a.c_str(), b.c_str() ) == 0 ) /* ... */

    string c = "AAa";
    if( EqualUsingYZComparison( b, c ) ) /* ... */

In many cases, it's more useful to have the case sensitivity be a characteristic of the comparison operation. But I've encountered cases in practice in which making it a characteristic of the object (especially when most or all comparisons were with C-style char* strings) is much more useful because you can simply compare values "naturally" (i.e., "if ( a == "text" ) ...") without having to remember to use the case-insensitive comparison every time.
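
When case sensitivity belongs to the comparison rather than to the object, the strings stay ordinary std::string values and only the comparison function differs. Here is a minimal sketch of that approach using standard facilities instead of stricmp(). The names ci_char_equal and ci_equal are hypothetical, chosen for this example:

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <string>

// Hypothetical helper: true if two chars match when case is ignored.
inline bool ci_char_equal( char c1, char c2 )
{
    return std::toupper( static_cast<unsigned char>( c1 ) )
        == std::toupper( static_cast<unsigned char>( c2 ) );
}

// Hypothetical helper: case-insensitive equality on ordinary strings.
// The size check comes first so std::equal never reads past b's end.
inline bool ci_equal( const std::string& a, const std::string& b )
{
    return a.size() == b.size()
        && std::equal( a.begin(), a.end(), b.begin(), ci_char_equal );
}
```

With this in place, "a == b" keeps its usual case-sensitive meaning, and callers opt in to the case-insensitive rule explicitly by writing ci_equal( a, b ).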

This Item should give you a flavor for how the basic_string template works and how flexible it is in practice. If you want different comparisons than the ones memicmp() and toupper() provide, just replace the four functions shown here (eq(), lt(), compare(), and find()) with your own code that performs character comparisons that are appropriate to your application.
