6.3 Regular Expression Support

Regular expressions have been used in various programming languages and tools for many years. The FCL includes a set of regular expression classes that are designed to be compatible with Perl 5 regular expressions. The classes also add functionality of their own, such as named capture groups, right-to-left pattern matching, and expression compilation.

Regular expressions seem to be one of those topics that most programmers have learned and then forgotten more than once. For the purposes of this chapter, we presume some previous experience with regular expressions in Perl 5. The .NET regular expression classes offer a superset of Perl 5 functionality, so that experience serves as a good conceptual starting point.

For readers new to regular expressions, we suggest starting with some basic Perl 5 introductions. The perl.com site has some great resource materials and introductory tutorials. In addition, the definitive work on regular expressions is Mastering Regular Expressions, by Jeffrey E. F. Friedl (O'Reilly & Associates). For those who want to get the most out of working with regular expressions, this book is highly recommended.

6.3.1 The Regex Class

The Regex class is the heart of the FCL regular expression support. Used both as an object instance and as a static type, the Regex class represents an immutable, compiled instance of a regular expression that can be applied to a string via a matching process.

Internally, the regular expression is stored either as a sequence of internal regular expression bytecodes that are interpreted at match time or, when the RegexOptions.Compiled option is specified, as compiled MSIL opcodes that are JIT-compiled by the CLR at runtime. This lets you trade slower regular expression startup and higher memory utilization for higher raw match performance at runtime.
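The two storage forms can be sketched as follows. This is a minimal illustration rather than a benchmark; the class name CompiledRegexSketch, the helper FirstNumber( ), and the digit pattern are hypothetical and chosen only for the example.

```csharp
using System;
using System.Text.RegularExpressions;

public static class CompiledRegexSketch
{
    // Hypothetical pattern used only for illustration: a run of digits.
    private const string Pattern = @"\d+";

    // Default construction: the pattern is interpreted at match time.
    public static readonly Regex Interpreted = new Regex(Pattern);

    // RegexOptions.Compiled requests compilation to MSIL, which costs more
    // at construction time in exchange for faster matching.
    public static readonly Regex Compiled =
        new Regex(Pattern, RegexOptions.Compiled);

    // Returns the first run of digits in the input, or the empty string
    // if there is none (a failed Match has an empty Value).
    public static string FirstNumber(Regex regex, string input)
    {
        return regex.Match(input).Value;
    }
}
```

Both instances return the same results; only the construction cost and match speed differ. Compilation tends to pay off for expressions that are created once and applied many times.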

The Regex class contains several static methods:

Method         Purpose
Escape( )      Escapes regex metacharacters within a string so that they are matched literally
IsMatch( )     Methods that return a bool indicating whether the supplied regular expression matches within the string
Match( )       Methods that return a Match instance
Matches( )     Methods that return a list of Match instances as a collection
Replace( )     Methods that replace the matched regular expressions with replacement strings
Split( )       Methods that return an array of strings determined by the expression
Unescape( )    Unescapes any escaped characters within a string
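As a rough sketch of how these static methods are called, the following wraps several of them in small helpers. The class name StaticRegexSketch, the helper names, and the digit pattern are assumptions made for illustration only.

```csharp
using System;
using System.Text.RegularExpressions;

public static class StaticRegexSketch
{
    // Escape( ): treat user-supplied text as a literal, not as a pattern.
    public static bool ContainsLiteral(string input, string literal)
    {
        return Regex.IsMatch(input, Regex.Escape(literal));
    }

    // Matches( ): collect every run of digits in the input.
    public static string[] ExtractNumbers(string input)
    {
        MatchCollection matches = Regex.Matches(input, @"\d+");
        string[] result = new string[matches.Count];
        for (int i = 0; i < matches.Count; i++)
        {
            result[i] = matches[i].Value;
        }
        return result;
    }

    // Replace( ): substitute each run of digits with a '#'.
    public static string MaskNumbers(string input)
    {
        return Regex.Replace(input, @"\d+", "#");
    }

    // Split( ): break the input apart wherever a run of digits occurs.
    public static string[] SplitOnNumbers(string input)
    {
        return Regex.Split(input, @"\d+");
    }
}
```

Without the call to Escape( ), a literal such as "1+1" would be read as the pattern "one or more 1s followed by a 1" and would match the string "11".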

For more information on the regular-expression options, supported character escapes, substitution patterns, character sets, positioning assertions, quantifiers, grouping constructs, backreferences, and alternation, see Appendix A.

6.3.2 The Match and MatchCollection Classes

The Match class represents the result of applying a regular expression to a string, looking for the first successful match. The MatchCollection class contains a collection of Match instances that represent the result of applying a regular expression to a string repeatedly, each attempt starting where the previous match ended, until the first unsuccessful match occurs.
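The relationship between the two classes can be sketched by walking a string one Match at a time and comparing the result with a MatchCollection. The class MatchWalkSketch, its method names, and the word pattern are hypothetical.

```csharp
using System;
using System.Text.RegularExpressions;

public static class MatchWalkSketch
{
    // Hypothetical pattern for illustration: a run of word characters.
    private static readonly Regex Word = new Regex(@"\w+");

    // Steps through the input one match at a time. NextMatch( ) resumes
    // where the previous match ended; Success turns false when no
    // further match is found.
    public static int CountWithNextMatch(string input)
    {
        int count = 0;
        for (Match m = Word.Match(input); m.Success; m = m.NextMatch())
        {
            count++;
        }
        return count;
    }

    // Matches( ) gathers the same sequence of matches into a collection.
    public static int CountWithMatches(string input)
    {
        return Word.Matches(input).Count;
    }

    // Reports the text, position, and length of the first match.
    public static string DescribeFirst(string input)
    {
        Match m = Word.Match(input);
        return m.Success ? m.Value + "@" + m.Index + "+" + m.Length : "no match";
    }
}
```

Both counting methods visit the same matches; the collection form is simply more convenient when all of the results are needed.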

6.3.3 The Group Class

The Group class represents the results from a single grouping expression. From this class, it is possible to drill down to the individual subexpression matches with the Captures property.
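A minimal sketch of reading Group instances from a Match, using named capture groups. The key=value pattern and the names GroupSketch and ParsePair( ) are assumptions made for the example.

```csharp
using System;
using System.Text.RegularExpressions;

public static class GroupSketch
{
    // Hypothetical pattern: two named groups separated by '='.
    private static readonly Regex Pair =
        new Regex(@"(?<key>\w+)=(?<value>\w+)");

    // Returns { key, value } for the first pair in the input,
    // or an empty array if the pattern does not match.
    public static string[] ParsePair(string input)
    {
        Match m = Pair.Match(input);
        if (!m.Success)
        {
            return new string[0];
        }

        // Groups can be indexed by name or by number;
        // Groups[0] always holds the entire match.
        Group key = m.Groups["key"];
        Group value = m.Groups["value"];
        return new string[] { key.Value, value.Value };
    }
}
```

For the input "color=red", ParsePair( ) yields the two strings "color" and "red".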

6.3.4 The Capture and CaptureCollection Classes

The CaptureCollection class contains a collection of Capture instances, each representing the results of a single subexpression match.
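The difference between a Group and its Captures shows up when a group sits inside a repeated construct. The following sketch assumes a hypothetical comma-separated list of numbers; the names CaptureSketch, AllNumbers( ), and LastNumber( ) are illustrative only.

```csharp
using System;
using System.Text.RegularExpressions;

public static class CaptureSketch
{
    // Hypothetical pattern: group 1 captures a run of digits and is
    // repeated by the enclosing non-capturing group.
    private static readonly Regex NumberList = new Regex(@"(?:(\d+),?)+");

    // Captures holds every substring that group 1 matched, in order.
    public static string[] AllNumbers(string input)
    {
        Match m = NumberList.Match(input);
        CaptureCollection captures = m.Groups[1].Captures;
        string[] result = new string[captures.Count];
        for (int i = 0; i < captures.Count; i++)
        {
            result[i] = captures[i].Value;
        }
        return result;
    }

    // Group.Value reports only the last substring the group captured.
    public static string LastNumber(string input)
    {
        return NumberList.Match(input).Groups[1].Value;
    }
}
```

For the input "1,22,333", the group's Value is "333", while its Captures collection holds "1", "22", and "333".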
