12.16. Capture Variables

Always give captured substrings proper names.

$1, $2, etc. are dreadful names for variables. Like the parameter variables $_[0], $_[1], etc. (see "Named Arguments" in Chapter 9), they convey absolutely nothing about the values they store, except the order in which they occurred. They produce unreadable code like this:

    CONFIG_LINE:
    while (my $config = <>) {
        # Ignore lines that are unrecognisable...
        next CONFIG_LINE
            if $config !~ m/ \A  (\S+)  \s* = \s*  ([^;]+) ;  \s* \# (.*)/xms;

        # Verify the option makes sense...
        debug($3);
        croak "Unknown option ($1)"
            if not exists $option{$2};

        # Record the configuration option...
        $option{$2} = $1;
    }

As the capture variables don't have meaningful names, it's much harder to work out what this code is actually doing, and to verify that it's correct. (It's not.)

Because numbered variables suffer from the same drawbacks as numbered arguments, it's not surprising that the solution is the same, too: simply unpack $1, $2, etc. into sensibly named variables immediately after a successful match. Doing that makes the purpose (and the errors) much more obvious:


    CONFIG_LINE:
    while (my $config = <>) {
        # Ignore lines that are unrecognisable...
        next CONFIG_LINE
            if $config !~ m/ \A  (\S+)  \s* = \s*  ([^;]+) ;  \s* \# (.*)/xms;

        # Name captured components...
        my ($opt_name, $opt_val, $comment) = ($1, $2, $3);

        # Verify the option makes sense...
        debug($comment);
        croak "Unknown option ($opt_name)"
            if not exists $option{$opt_val};    # Oops: value used as key

        # Record the configuration option...
        $option{$opt_val} = $opt_name;          # Oops*2: value as key; name as value
    }

That, in turn, makes the code far easier to correct:


    CONFIG_LINE:
    while (my $config = <>) {
        # Ignore lines that are unrecognisable...
        next CONFIG_LINE
            if $config !~ m/ \A  (\S+)  \s* = \s*  ([^;]+) ;  \s* \# (.*)/xms;

        # Name captured components...
        my ($opt_name, $opt_val, $comment) = ($1, $2, $3);

        # Verify that the option makes sense...
        debug($comment);
        croak "Unknown option ($opt_name)"
            if not exists $option{$opt_name};   # Name used as key

        # Record the configuration option...
        $option{$opt_name} = $opt_val;          # Name as key; value as value
    }
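The loop above depends on `%option`, `debug()`, and `croak` from its surrounding program, and reads from `<>`. To make the corrected version runnable on its own, here is a minimal sketch under a few illustrative assumptions: a hypothetical `parse_config()` wrapper receives the known options as a hash reference and the config lines as a list, and `debug()` is a stub that merely records each comment.

```perl
use strict;
use warnings;
use Carp qw(croak);

# Hypothetical stand-in for the book's debug(): it just records each comment...
my @debug_log;
sub debug { push @debug_log, @_; return; }

# Hypothetical wrapper: update the known options in %{$option_ref} from @lines...
sub parse_config {
    my ($option_ref, @lines) = @_;

    CONFIG_LINE:
    for my $config (@lines) {
        # Ignore lines that are unrecognisable...
        next CONFIG_LINE
            if $config !~ m/ \A  (\S+)  \s* = \s*  ([^;]+) ;  \s* \# (.*)/xms;

        # Name captured components...
        my ($opt_name, $opt_val, $comment) = ($1, $2, $3);

        # Verify that the option makes sense...
        debug($comment);
        croak "Unknown option ($opt_name)"
            if not exists $option_ref->{$opt_name};

        # Record the configuration option...
        $option_ref->{$opt_name} = $opt_val;
    }
    return;
}

# Usage: only 'verbose' and 'logfile' are known (illustrative) options...
my %option = (verbose => 0, logfile => 'app.log');
parse_config(\%option,
    "verbose = 1;  # turn on chatter\n",
    "this line is ignored\n",
);
```

With the book's original misordering restored (`$option{$opt_val}`), the `exists` check would look up the key `'1'` and croak on the very first line, which is just the sort of slip the named variables make easy to spot.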

Naming the captures improves maintainability in another way too. If it later became necessary to capture some other piece of the match, some of the numbered variables might change number. For example, suppose you needed to support appending to an option as well as assigning. Then you'd need to capture the operator as well. The original code would have become:

    CONFIG_LINE:
    while (my $config = <>) {
        # Ignore lines that are unrecognisable...
        next CONFIG_LINE
            if $config !~ m/\A (\S+) \s* (=|[+]=) \s* ([^;]+) ; \s* \# (.*)/xms;

        # Verify that the option makes sense...
        debug($4);
        croak "Unknown option ($1)"
            if not exists $option{$1};

        # Replace or append value depending on specified operator...
        if ($2 eq '=') {
            $option{$1} = $3;
        }
        else {
            $option{$1} .= $3;
        }
    }

The Variable Formerly Known As $2 is now $3, and the old $3 is now $4. The odds of correctly managing that code change diminish rapidly as the size of the if block (or the number of captures) increases. But, if the captures are unpacked into named variables, then none of the previous names needs to change when a new capture is added:


    CONFIG_LINE:
    while (my $config = <>) {
        # Ignore lines that are unrecognisable...
        next CONFIG_LINE
            if $config !~ m/\A (\S+) \s* (=|[+]=) \s* ([^;]+) ; \s* \# (.*)/xms;

        # Unpack the components of the config line...
        my ($opt_name, $operator, $opt_val, $comment) = ($1, $2, $3, $4);

        # Verify that the option makes sense...
        debug($comment);
        croak "Unknown option ($opt_name)"
            if not exists $option{$opt_name};

        # Replace or append value depending on specified operator...
        if ($operator eq '=') {
            $option{$opt_name} = $opt_val;
        }
        else {
            $option{$opt_name} .= $opt_val;
        }
    }
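The renumbering described earlier is easy to observe. This sketch (the sample config line is illustrative) matches the same assignment line against the original pattern and against the extended one that also captures the operator, then reports what ends up in `$2` and `$3`:

```perl
use strict;
use warnings;

my $line = "width = 20;  # default margin\n";

# Original pattern: the option value is the second capture...
my @before;
if ($line =~ m/\A (\S+) \s* = \s* ([^;]+) ; \s* \# (.*)/xms) {
    @before = ($1, $2, $3);
}

# Extended pattern: capturing the operator pushes the value along to $3...
my @after;
if ($line =~ m/\A (\S+) \s* (=|[+]=) \s* ([^;]+) ; \s* \# (.*)/xms) {
    @after = ($1, $2, $3, $4);
}

print "before: \$2 is '$before[1]'\n";   # the value
print "after:  \$2 is '$after[1]'\n";    # now the operator
print "after:  \$3 is '$after[2]'\n";    # the value has moved here
```

Any code that still says `$2` for the value keeps compiling and running after the change; it just quietly reads the operator instead.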

Better still, Perl provides a way to assign captured substrings directly to named variables, without ever mentioning the numbered variables explicitly. If a regex match is performed in a list context, it returns the list of captures that it made. That is, a successful match in a list context returns the list ($1, $2, $3, etc.), whereas a failed match returns an empty list (which leaves every variable being assigned to undefined). Those captures can then be unpacked directly, like so:


    CONFIG_LINE:
    while (my $config = <>) {
        # Match config line in list context, capturing components into named vars...
        my ($opt_name, $operator, $opt_val, $comment)
            = $config =~ m/\A (\S+) \s* (=|[+]=) \s* ([^;]+) ; \s* \# (.*)/xms;

        # Process line only if it was recognizable...
        next CONFIG_LINE if !defined $opt_name;

        # Verify that the option makes sense...
        debug($comment);
        croak "Unknown option ($opt_name)"
            if not exists $option{$opt_name};

        # Replace or append value depending on specified operator...
        if ($operator eq '=') {
            $option{$opt_name} = $opt_val;
        }
        else {
            $option{$opt_name} .= $opt_val;
        }
    }
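Like the earlier fragments, this loop assumes `%option`, `debug()`, and input from `<>`. A self-contained sketch, assuming a hypothetical `parse_config()` wrapper that takes a hash reference of known options plus a list of lines (and leaving out the `debug()` call), shows the list-context unpacking handling both `=` and `+=`, and skipping a line that doesn't match:

```perl
use strict;
use warnings;
use Carp qw(croak);

# Hypothetical wrapper: update %{$option_ref} from a list of config lines...
sub parse_config {
    my ($option_ref, @lines) = @_;

    CONFIG_LINE:
    for my $config (@lines) {
        # Match in list context, capturing components into named vars
        # ($comment is unused here; the book passes it to debug())...
        my ($opt_name, $operator, $opt_val, $comment)
            = $config =~ m/\A (\S+) \s* (=|[+]=) \s* ([^;]+) ; \s* \# (.*)/xms;

        # A failed match returns an empty list, so $opt_name is undef...
        next CONFIG_LINE if !defined $opt_name;

        # Verify that the option makes sense...
        croak "Unknown option ($opt_name)"
            if not exists $option_ref->{$opt_name};

        # Replace or append value depending on specified operator...
        if ($operator eq '=') {
            $option_ref->{$opt_name} = $opt_val;
        }
        else {
            $option_ref->{$opt_name} .= $opt_val;
        }
    }
    return;
}

# Usage: 'path' is the only known (illustrative) option...
my %option = (path => '/usr/bin');
parse_config(\%option,
    "path = /bin;  # reset the search path\n",
    "path += :/usr/local/bin;  # then extend it\n",
    "# a bare comment line is skipped\n",
);
```

After these three lines, `$option{path}` holds `/bin:/usr/local/bin`: the first line replaces the default, the second appends to it, and the third fails to match and is skipped.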

Capturing directly to named variables in this way avoids the possibility of introducing subtle unpacking mistakes such as:

    
    # Ignore lines that are unrecognisable...
    next CONFIG_LINE
        if $config !~ m/ \A  (\S+)  \s* (=|[+]=) \s*  ([^;]+) ;  \s* \# (.*)/xms;

    # Unpack the components of the config line...
    my ($opt_name, $operator, $opt_val, $comment) = ($1, $2, $3);    # Missing $4!

because a match in a list context always returns all of its captures, not just the ones you remembered to specify explicitly.
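The list-context behaviour this relies on can be checked in isolation. In the sketch below (the sample strings are illustrative), a successful match yields all four captures, a failed match yields an empty list, and a pattern with no capturing parentheses yields the list `(1)` on success rather than any matched text, which is worth remembering before unpacking such a match into a variable:

```perl
use strict;
use warnings;

my $pattern = qr/\A (\S+) \s* (=|[+]=) \s* ([^;]+) ; \s* \# (.*)/xms;

# Success: the list holds every capture, in order ($1, $2, $3, $4)...
my @captures = "depth += 3; # deeper\n" =~ $pattern;

# Failure: the list is empty, so unpacked variables are left undef...
my @nothing = "not a config line\n" =~ $pattern;
my ($name)  = "not a config line\n" =~ $pattern;

# No capturing parentheses: success returns (1), not the matched text...
my @flag = "depth += 3;" =~ m/[+]=/;

print scalar(@captures), " captures: $captures[0] $captures[1] $captures[2]\n";
print scalar(@nothing), " captures on failure\n";
print "name is ", (defined $name ? $name : 'undef'), "\n";
print "no-capture match returned (@flag)\n";
```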

List-context captures are the least error-prone way of extracting information from pattern matches and, hence, strongly recommended. Note, however, that list-context captures aren't appropriate for regexes that use the /gc modifier (see the following guideline, "Piecewise Matching").
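Part of the reason for that caution is that adding /g changes what a list-context match returns: rather than the captures from a single match, it returns the captures from every successive match in the string, all in one evaluation. A short sketch (the sample text is illustrative):

```perl
use strict;
use warnings;

my $text = 'a=1 b=2 c=3';

# Without /g: one match, so the list holds just that match's two captures...
my @one = $text =~ m/(\w+)=(\d+)/;

# With /g: matching repeats to the end of the string, and the captures
# from all three matches come back flattened into one list...
my @all = $text =~ m/(\w+)=(\d+)/g;

print "without /g: @one\n";
print "with /g:    @all\n";
```

Unpacking such a match into a fixed list of named variables would keep only the first match's captures, and by then the entire string has already been scanned, which defeats the one-step-at-a-time matching that /gc is normally used for.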
