
Appendix B. Regular Expression Basics

Behind the innocuous and generic phrase regular expression lives an intricate and powerful world of text pattern matching. With regular expressions, you can check that what a user entered in a form field has the shape of a ZIP Code or an email address, or find all the HTML <a> tags in a page. If your web site relies on data feeds that come in text files, such as sports scores, news articles, or frequently updated headlines, regular expressions can help you make sense of them.

This appendix provides an overview of the most useful and commonly encountered parts of the regular expression menagerie. By learning the special meanings of 5 or 10 symbols and 2 or 3 PHP functions, you can use regular expressions to solve most of the text-processing problems you run into when building a web site with PHP. There are some dark corners and steep ravines of the regular expression landscape that are not covered here, however, such as locale support, lookahead and lookbehind assertions, and conditional subpatterns. To learn more about regular expressions, see the PCRE section of the PHP Manual, at http://www.php.net/pcre, or read the comprehensive Mastering Regular Expressions by Jeffrey E.F. Friedl (O'Reilly).

To work with regular expressions in PHP, use the functions in the PCRE (Perl-compatible regular expressions) extension.[1] These functions are included with PHP by default and are described in the online manual at http://www.php.net/pcre. Section B.6, later in this appendix, gives an overview of the PCRE functions. If you're already familiar with regular expression basics, read that section to learn the language-specific details of using regular expressions in PHP.

[1] Generally, it's best to avoid the POSIX regular expression functions: ereg( ) and friends. They are not as capable as the PCRE functions, and they were deprecated in PHP 5.3.0 and removed in PHP 7.0.0.

A regular expression is a string. That string defines a pattern that matches other strings. For example, the regular expression \d{5}(-\d{4})? matches U.S. ZIP or ZIP+4 Codes:


\d

A digit (0-9)


{5}

A total of five of the previous item (a digit)


-

A literal - character


\d

A digit


{4}

A total of four of the previous item (a digit)


( )?

Makes what's inside the parentheses optional

So, the regular expression \d{5}(-\d{4})? matches "five digits, optionally followed by a hyphen and four digits."
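As a minimal sketch of how this pattern might be used to check form input, the following example wraps it in a call to preg_match( ). The function name is_zip_code( ) is hypothetical and chosen only for illustration. Two details here go beyond the pattern as discussed so far: the slashes around the pattern are the delimiters that the PCRE functions require, and the ^ and $ anchors are added so that the pattern has to cover the entire input rather than just appear somewhere inside it. The type declarations assume PHP 7 or later.

```php
<?php
// Hypothetical helper, not part of PHP: reports whether $input has the
// shape of a U.S. ZIP or ZIP+4 Code.
function is_zip_code(string $input): bool
{
    // The / characters are pattern delimiters. ^ anchors the match to the
    // start of the string and $ to the end (note that $ also tolerates a
    // single trailing newline).
    return preg_match('/^\d{5}(-\d{4})?$/', $input) === 1;
}

var_dump(is_zip_code('12345'));       // bool(true)
var_dump(is_zip_code('12345-6789'));  // bool(true)
var_dump(is_zip_code('1234'));        // bool(false): only four digits
var_dump(is_zip_code('12345-67'));    // bool(false): incomplete +4 part
```

Without the anchors, the pattern would also succeed on a longer string such as "ZIP 12345 or so", because preg_match( ) looks for the pattern anywhere in its input.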

Here's another regular expression: </?[bBiI]>. This one matches opening or closing HTML <b> or <i> tags:


<

A literal < character


/

A literal / character


?

Makes the previous item (the /) optional


[bBiI]

One of anything inside the square brackets: b, B, i, or I


>

A literal > character

The regular expression </?[bBiI]> means "A less-than sign, followed by an optional forward slash, followed by a b, B, i, or I, followed by a greater-than sign." This matches eight HTML tags: <b>, <B>, </b>, </B>, <i>, <I>, </i>, and </I>.
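To see this pattern at work, here is a small sketch that uses preg_match_all( ) to collect every matching tag from a string. The sample HTML in $html is made up for illustration. Because the pattern itself contains a /, the example uses @ as the pattern delimiter so that the slash does not need to be escaped.

```php
<?php
// Made-up sample input containing bold, italic, and other tags.
$html = '<p>Some <b>bold</b>, <I>italic</I>, and <em>emphasized</em> text.</p>';

// preg_match_all() returns the number of matches and stores the matched
// text in $matches[0].
$count = preg_match_all('@</?[bBiI]>@', $html, $matches);

print "Found $count tags: " . implode(' ', $matches[0]) . "\n";
// Found 4 tags: <b> </b> <I> </I>
```

The <p> and <em> tags are skipped: the character class [bBiI] accepts exactly one character, and it must be followed immediately by the closing >.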
