
12.5. Matching Anything

Always use the /s flag.

At this point, you might be starting to detect a pattern. Once again, the problem is that the dot metacharacter (.) doesn't mean what most people think it means. Most people, even those who actually know better, habitually think of it as meaning: "match any character".

It's easy to forget that it doesn't really mean that, and accidentally write something like:

    
    # Capture the source of a Perl program...

    $text =~ m{\A                # From start of string...
               (.*?)             # ...match and capture any characters
               ^__END__$         # ...until the first __END__ line
              }xm;

But the dot metacharacter doesn't match newlines, so the only strings this regex will match are those that start with '__END__'. That's because the ^ (start-of-line) metacharacter can match only at the start of the string or after a newline. But the preceding dot metacharacter can never match a newline, so the only way the ^ can match is if the preceding dot matches a start-of-string. But the dot metacharacter never matches start-of-string, because dot always matches exactly one character and start-of-string isn't a character.
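The failure described above is easy to reproduce. The following is a minimal sketch, not part of the original guideline: the helper name `matches_without_s` and the two sample strings are made up for illustration. It applies the /xm version of the regex to a typical program text and to a string that begins with `__END__`.

```perl
use strict;
use warnings;

# Hypothetical helper: apply the regex from the text WITHOUT the /s flag.
# Returns 1 if the pattern matches, 0 otherwise.
sub matches_without_s {
    my ($text) = @_;
    return $text =~ m{\A            # From start of string...
                      (.*?)         # ...any characters except newline
                      ^__END__$     # ...until the first __END__ line
                     }xm ? 1 : 0;
}

# Illustrative inputs (assumptions, not from the original text)
my $typical    = "print 'hi';\n__END__\nsome data\n";
my $degenerate = "__END__\nsome data\n";

print matches_without_s($typical),    "\n";   # the dot cannot cross the first newline
print matches_without_s($degenerate), "\n";   # matches, but captures an empty string
```

Only the second string matches, because there the lazy `.*?` can match zero characters and the `^` can match at start-of-string.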

In other words, as with ^ and $, the default behaviour of the dot metacharacter fails to be unreasonable (i.e., to be what most people expect). Fortunately, however, dot can be made to conform to the typical programmer's unreasonable expectations, simply by adding the /s flag. Under /s mode, a dot really does match every character, including newline:


    

    # Capture the source of a Perl program...

    $text =~ m{\A                # From start of string...
               (.*?)             # ...match and capture any characters (including newlines!)
               ^__END__$         # ...until the first __END__ line
              }xms;
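As a rough illustration of the /xms version in use, the sketch below wraps the regex in a small subroutine. The name `source_before_end` and the sample program text are hypothetical; the regex itself is the one shown above.

```perl
use strict;
use warnings;

# Hypothetical helper: return everything before the first __END__ line,
# or undef if the text contains no such line.
sub source_before_end {
    my ($text) = @_;
    my ($source) = $text =~ m{\A            # From start of string...
                              (.*?)         # ...capture any characters, newlines included
                              ^__END__$     # ...until the first __END__ line
                             }xms;
    return $source;
}

# Illustrative input (an assumption, not from the original text)
my $program = "use strict;\nprint 'hi';\n__END__\nsome data\n";
my $source  = source_before_end($program);

print defined $source ? $source : "no __END__ line found\n";
```

Because the match is evaluated in list context, a failed match leaves `$source` undefined, which lets the caller distinguish "no `__END__` line" from "empty source before `__END__`".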

Of course, the question then becomes: if you always use /s, how do you get the normal "any char but newline" meaning of dot when you actually need it? As with many of these guidelines, you do it by saying explicitly what you mean. If you need to match any character that isn't a newline, that's just the complemented character class [^\n]:


    

    # Delete comments....

    $source_code =~ s{                # Substitute...
                      \#              # ...a literal octothorpe
                      [^\n]*          # ...followed by any number of non-newlines
                     }
                     {$SPACE}gxms;    # Replacing it with a single space
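To see why the explicit `[^\n]` matters once /s is always on, the sketch below compares it with a bare dot under the same flags. The subroutine names and the sample input are hypothetical, and `$SPACE` is assumed to be a single-space string, since the text uses it without showing its definition.

```perl
use strict;
use warnings;

my $SPACE = q{ };    # Assumed definition; not shown in the original text

# Hypothetical helper: the substitution from the text, using [^\n]
sub strip_comments {
    my ($source_code) = @_;
    $source_code =~ s{ \# [^\n]* }{$SPACE}gxms;
    return $source_code;
}

# Hypothetical counter-example: the same substitution with a bare dot,
# which under /s also consumes newlines
sub strip_comments_with_dot {
    my ($source_code) = @_;
    $source_code =~ s{ \# .* }{$SPACE}gxms;
    return $source_code;
}

# Illustrative input (an assumption, not from the original text)
my $code = "a; # first\nb; # second\n";

print strip_comments($code);             # both statements survive
print strip_comments_with_dot($code);    # everything after the first # is lost
print "\n";
```

With `[^\n]*` each comment stops at the end of its own line. With `.*` under /s, the first comment swallows the rest of the string, including the second statement.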
