12.2. Line Boundaries
In addition to always using the /x flag, always use the /m flag. In every regular expression you ever write. The normal behaviour of the ^ and $ metacharacters is unintuitive to most programmers, especially if they're coming from a Unix background. Almost all of the Unix utilities that feature regular expressions (e.g., sed, grep, awk) are intrinsically line-oriented. So in those utilities, ^ and $ naturally mean "match at the start of any line" and "match at the end of any line", respectively. But they don't mean that in Perl. In Perl, ^ and $ mean "match at the start of the entire string" and "match at the end of the entire string" (strictly, $ may also match just before a newline that ends the string). That's a crucial difference, and one that leads to a very common type of mistake:
# Find the end of a Perl program...
$text =~ m{ [^\0]*?     # match the minimal number of non-null chars
            ^__END__$   # until a line containing only an end-marker
          }x;
In fact, what that code really does is:
$text =~ m{ [^\0]*?     # match the minimal number of non-null chars
            ^           # until the start of the string
            __END__     # then match the end-marker
            $           # then match the end of the string
          }x;
The minimal number of characters until the start of the string is, of course, zero. Then the regex has to match '__END__'. And then it has to be at the end of the string. So the only strings that this pattern matches are those that consist of '__END__' (optionally followed by a single newline). That is clearly not what was intended.
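To see the failure concretely, here is a small Perl sketch that runs the original /x-only pattern against a multi-line program and against the bare string '__END__'. The sample text and the variable names ($program, $buggy) are hypothetical, chosen only for illustration. Only the bare string matches; the program whose end-marker sits on a line of its own does not.

```perl
use strict;
use warnings;

# Hypothetical sample: a tiny program, then an end-marker line, then data.
my $program = "print 'hello';\n__END__\nsome trailing data\n";

# The original pattern: /x but no /m, so ^ and $ anchor only at the string's ends.
my $buggy = qr{ [^\0]*?     # match the minimal number of non-null chars
                ^__END__$   # meant: a line containing only an end-marker
              }x;

for my $text ($program, '__END__') {
    (my $shown = $text) =~ s/\n/\\n/g;   # show newlines as \n when printing
    printf "%-48s %s\n", $shown, ($text =~ $buggy ? 'matches' : 'no match');
}
```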
The /m mode makes ^ and $ work "naturally". Under /m, ^ means "match at the start of any line" and $ means "match at the end of any line".
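A minimal sketch of the difference, using a made-up three-line string: it collects the word found at every position where ^ (or $) is allowed to match, first without /m and then with it. The string and the array names are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical three-line string, used only for illustration.
my $text = "alpha\nbeta\ngamma\n";

# Words that immediately follow a position where ^ can match.
my @without_m = $text =~ m{ ^ (\w+) }xg;    # ^ : start of the string only
my @with_m    = $text =~ m{ ^ (\w+) }xmg;   # ^ : start of any line

# Words that immediately precede a position where $ can match.
my @ends_without_m = $text =~ m{ (\w+) $ }xg;    # $ : end of string (or before a final newline)
my @ends_with_m    = $text =~ m{ (\w+) $ }xmg;   # $ : end of any line

print "^ without /m: @without_m\n";        # alpha
print "^ with /m:    @with_m\n";           # alpha beta gamma
print "\$ without /m: @ends_without_m\n";  # gamma
print "\$ with /m:    @ends_with_m\n";     # alpha beta gamma
```

Note that without /m the only $ match lands on the last word, because the final newline is the one place besides the true end of the string where $ may match.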
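The end-marker example above could be fixed by making those two metacharacters actually mean what the original developer thought they meant, simply by adding a /m:
# Find the end of a Perl program...
$text =~ m{ [^\0]*?     # match the minimal number of non-null chars
            ^__END__$   # until a line containing only an end-marker
          }xm;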
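Which now really means:
$text =~ m{ [^\0]*?     # match the minimal number of non-null chars
            ^           # until the start of any line
            __END__     # then match the end-marker
            $           # then match the end of that line
          }xm;
Consistently using the /m flag on every regex makes Perl's behaviour consistently conform to your unreasonable expectations. So you don't have to unreasonably change your expectations to conform to Perl's behaviour.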
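As a quick check, this sketch applies the /m version of the end-marker pattern to a few hypothetical inputs (the %sample labels and strings are invented for illustration). The marker on its own line now matches, a string consisting of just the marker still matches, and a line that merely mentions __END__ among other text does not, because ^ and $ must still bracket the whole line.

```perl
use strict;
use warnings;

# The end-marker pattern, now with /m so ^ and $ anchor at line boundaries.
my $fixed = qr{ [^\0]*?     # match the minimal number of non-null chars
                ^__END__$   # until a line containing only an end-marker
              }xm;

# Hypothetical inputs, for illustration only.
my %sample = (
    'marker on its own line' => "print 'hello';\n__END__\nsome trailing data\n",
    'marker alone'           => '__END__',
    'marker inside a line'   => "print '__END__ is special';\n",
);

for my $label (sort keys %sample) {
    my $result = $sample{$label} =~ $fixed ? 'matches' : 'no match';
    printf "%-24s %s\n", $label, $result;
}
```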