
B.3 Anchors

Anchors align a pattern for more specific matching. A pattern such as ba(na)+ matches banana but also cabana or bananarama. As long as text matching ba(na)+ is somewhere in a string, the pattern matches. An anchor, however, matches a pattern at the beginning or end of a string. The ^ anchor matches the beginning of a string and the $ anchor matches the end of a string. For example, this pattern matches strings that begin with Gre:

^Gre

The pattern matches Green, Grey Lantern, and Grep is my favorite, but not GGreen VVegetables, gre, or InGres.
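The examples above can be checked with a short script. This is a minimal sketch that assumes Python's standard re module rather than any particular language used elsewhere in this book. The names pattern, candidates, and matched are illustrative only.

```python
import re

# ^ ties the match to the start of the string.
pattern = re.compile(r"^Gre")

candidates = [
    "Green",
    "Grey Lantern",
    "Grep is my favorite",
    "GGreen VVegetables",
    "gre",
    "InGres",
]

# re.search scans the whole string, so only the ^ anchor restricts the match.
matched = [s for s in candidates if pattern.search(s)]
print(matched)  # ['Green', 'Grey Lantern', 'Grep is my favorite']
```

The string gre fails only because matching is case-sensitive by default.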

This pattern matches strings that end with an exclamation point:

!$

It matches "Zip!", "Zoom!", and "Pow! Kablam!", but not "Kerfloofie.", "! is the negation operator", "Pow! Oh.", or "!!!!!!!!?".
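The same kind of check works for the end anchor. This sketch again assumes Python's re module, and the variable names are hypothetical. One caveat specific to Python: $ also matches just before a trailing newline, which does not matter here because none of the test strings contains one.

```python
import re

# $ ties the match to the end of the string.
pattern = re.compile(r"!$")

candidates = [
    "Zip!",
    "Zoom!",
    "Pow! Kablam!",
    "Kerfloofie.",
    "! is the negation operator",
    "Pow! Oh.",
    "!!!!!!!!?",
]

# Keep only the strings whose last character is an exclamation point.
matched = [s for s in candidates if pattern.search(s)]
print(matched)  # ['Zip!', 'Zoom!', 'Pow! Kablam!']
```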

You can use both anchors in a single pattern to match an entire string. The pattern ^ba(na)+ matches banana and bananarama but not cabana. Similarly, ba(na)+$ matches banana and cabana but not bananarama. Anchored on both ends, however, ^ba(na)+$ matches only banana (and bananana, banananana, and so on). This pattern matches various nicknames for the name William:

^(w|W|b|B)illy?$

It matches Will, will, Bill, bill, Willy, willy, Billy, and billy, but not Willa, billo, twill, handbill, or William.
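To see how the two anchors combine, the following sketch runs the banana patterns and the nickname pattern against the words mentioned above. It assumes Python's re module. The helper matches and the list names are hypothetical, introduced only for this illustration.

```python
import re

def matches(pattern, words):
    """Return the words in which the pattern is found anywhere."""
    return [w for w in words if re.search(pattern, w)]

fruit = ["banana", "bananana", "cabana", "bananarama"]

start_only = matches(r"^ba(na)+", fruit)   # anchored at the start
end_only = matches(r"ba(na)+$", fruit)     # anchored at the end
both_ends = matches(r"^ba(na)+$", fruit)   # must span the whole string

print(start_only)  # ['banana', 'bananana', 'bananarama']
print(end_only)    # ['banana', 'bananana', 'cabana']
print(both_ends)   # ['banana', 'bananana']

names = ["Will", "will", "Bill", "bill", "Willy", "willy", "Billy",
         "billy", "Willa", "billo", "twill", "handbill", "William"]

nicknames = matches(r"^(w|W|b|B)illy?$", names)
print(nicknames)
# ['Will', 'will', 'Bill', 'bill', 'Willy', 'willy', 'Billy', 'billy']
```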

In addition to the ^ and $ anchors, there are anchor metacharacters that deal with word boundaries. The \b anchor matches at a word boundary and \B matches everywhere that isn't a word boundary. A word boundary is between one character that is a letter, digit, or underscore and another character that is none of those.[3] So, in the phrase It's not a_tumor. (including its final period), the word boundaries are before the I, before and after the apostrophe, before and after each space, and before the period. There is no boundary next to the underscore, because it counts as a word character, and none after the period when the period ends the string, because no word character follows it.

[3] More specifically, a word boundary is a position with something that matches \w on one side and something that does not match \w on the other. This includes the beginning of strings that start with word characters and the end of strings that end with word characters. The \w metacharacter is discussed in Section B.4.
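One way to inspect word boundaries is to ask a regex engine where \b matches. The sketch below assumes Python's re module, whose \b follows the \w-based definition in the footnote. The names phrase and boundaries are hypothetical. Each reported number is a zero-based position between characters, so 0 means "before the first character".

```python
import re

phrase = "It's not a_tumor."

# \b matches an empty position, so each match's start() is a boundary index.
boundaries = [m.start() for m in re.finditer(r"\b", phrase)]
print(boundaries)  # [0, 2, 3, 4, 5, 8, 9, 16]

# Show the text on either side of each boundary.
for pos in boundaries:
    print(repr(phrase[:pos]), "|", repr(phrase[pos:]))
```

Under Python's rules, position 17 (the end of the string, after the period) is not reported, because the period is not a word character and nothing follows it. Position 10 (before the underscore) is also absent, because the underscore matches \w.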

The word boundary anchors are useful for matching a string that could occur as part of another word. For example, this pattern matches fish only where it begins a word, so it skips compound words in which fish follows other letters:

\b[fF]ish

The pattern matches fish, Go fish!, and Hamilton Fish High School, but not bluefish, sportfishing, or swordfish. However, it also matches sport-fishing, because there is a word boundary between - and f.
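The fish examples can be verified the same way. This is a sketch assuming Python's re module, with illustrative variable names.

```python
import re

# \b requires a word boundary immediately before the f or F.
pattern = re.compile(r"\b[fF]ish")

candidates = [
    "fish",
    "Go fish!",
    "Hamilton Fish High School",
    "bluefish",
    "sportfishing",
    "swordfish",
    "sport-fishing",
]

matched = [s for s in candidates if pattern.search(s)]
print(matched)
# ['fish', 'Go fish!', 'Hamilton Fish High School', 'sport-fishing']
```

The hyphen in sport-fishing is not a word character, so the position between - and f counts as a boundary and the pattern matches there.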
