
5.8. Dollar-Underscore

Beware of any modification via $_.

One particularly easy way to introduce subtle bugs is to forget that $_ is often an alias for some other variable. Any assignment to $_ or any other form of transformation on it, such as a substitution or transliteration, is probably changing some other variable. So any change applied to $_ needs to be scrutinized particularly carefully.
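Here is a minimal sketch of that aliasing, using a hypothetical @names array. Inside a for loop with no explicit loop variable, $_ is the current array element itself rather than a copy. A substitution applied to $_ therefore rewrites the array.

```perl
use strict;
use warnings;

# Hypothetical data, for illustration only
my @names = ('  alice ', ' bob');

for (@names) {
    # $_ is an alias to the current element, not a copy...
    s{\A \s+}{}xms;    # ...so this strips leading whitespace from @names itself
}

print join('|', @names), "\n";    # prints "alice |bob"
```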

This problem can be especially insidious when $_ isn't actually being named explicitly. For example, suppose you needed a subroutine that would return a copy of any string passed to it, with the leading and trailing whitespace trimmed from the copy. Suppose, too, that you wanted that subroutine to default to trimming $_ if no explicit argument were provided (just as the built-in chomp defaults to $_). You might write such a subroutine like this:

    sub trimmed_copy_of {
        # Trim explicit arguments...
        if (@_ > 0) {
            my ($string) = @_;
            $string =~ s{\A \s* (.*?) \s* \z}{$1}xms;
            return $string;
        }
        # Otherwise, trim the default argument (i.e. $_)...
        else {
            s{\A \s* (.*?) \s* \z}{$1}xms;
            return $_;
        }
    }

and then use it like so:

    print trimmed_copy_of($error_mesg);

    for (@diagnostics) {
        print trimmed_copy_of;
    }

Unfortunately, that implementation of trimmed_copy_of( ) is fatally flawed. After the preceding code runs, the contents of $error_mesg are unchanged (as they should be), but every element of @diagnostics has been unexpectedly shaved. That's because trimmed_copy_of( ) correctly handles explicit arguments by copying them into a separate variable and then changing only that copy:

        if (@_ > 0) {
            my ($string) = @_;
            $string =~ s{\A \s* (.*?) \s* \z}{$1}xms;
            return $string;
        }

But the subroutine applies its substitution directly to the (implicit) $_, without first copying its contents:

        else {
            s{\A \s* (.*?) \s* \z}{$1}xms;
            return $_;
        }

Within the for loop, the $_ variable is sequentially aliased to each element of the array:

    for (@diagnostics) {
        print trimmed_copy_of;
    }

which means that the substitution applied to $_ inside trimmed_copy_of( ) will alter the original array elements.
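The following sketch reproduces the bug with the flawed version above. The sample values for $error_mesg and @diagnostics are hypothetical, and the results are collected into an array instead of being printed so that they can be inspected afterwards.

```perl
use strict;
use warnings;

# The flawed version from the text: the default-argument branch edits $_ in place
sub trimmed_copy_of {
    if (@_ > 0) {
        my ($string) = @_;
        $string =~ s{\A \s* (.*?) \s* \z}{$1}xms;
        return $string;
    }
    else {
        s{\A \s* (.*?) \s* \z}{$1}xms;    # modifies whatever $_ is aliased to
        return $_;
    }
}

# Hypothetical sample data
my $error_mesg  = '  disk full  ';
my @diagnostics = ('  retry  ', ' abort ');

my $copy = trimmed_copy_of($error_mesg);    # $error_mesg is left untouched
my @copies;
for (@diagnostics) {
    push @copies, trimmed_copy_of;          # ...but each element is trimmed in place
}

print "[$error_mesg]\n";               # prints "[  disk full  ]"
print join('|', @diagnostics), "\n";   # prints "retry|abort"
```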

Something has clearly gone wrong in the design or the implementation: either trimmed_copy_of( ) should never change the string it's trimming, or it should always change it. If it should never alter the original, the subroutine should be written like this:

    sub trimmed_copy_of {
        my $string = (@_ > 0) ? shift : $_;
        $string =~ s{\A \s* (.*?) \s* \z}{$1}xms;
        return $string;
    }
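As a quick check, this sketch runs the corrected version over the same kind of hypothetical data. Because $_ is copied into $string before the substitution, the caller's array comes through intact.

```perl
use strict;
use warnings;

# The corrected version from the text: always work on a copy
sub trimmed_copy_of {
    my $string = (@_ > 0) ? shift : $_;    # copy the explicit arg, or $_
    $string =~ s{\A \s* (.*?) \s* \z}{$1}xms;
    return $string;
}

# Hypothetical sample data
my @diagnostics = ('  retry  ', ' abort ');

my @trimmed;
for (@diagnostics) {
    push @trimmed, trimmed_copy_of;    # $_ is copied, so @diagnostics is left alone
}

print join('|', @trimmed), "\n";        # prints "retry|abort"
print join('|', @diagnostics), "\n";    # prints "  retry  | abort "
```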

On the other hand, if the intention was that the subroutine consistently modify its (explicit or implicit) argument, then it should have been written like so:

    use Carp qw( croak );

    sub trim_str {
        croak 'Useless use of trim_str( ) in non-void context'
            if defined wantarray;

        for my $orig_arg ( @_ ? @_ : $_ ) {               # all args or just $_
            $orig_arg =~ s{\A \s* (.*?) \s* \z}{$1}xms;   # change the actual args
        }

        return;
    }

in which case it would be used differently, too:


    for my $warning ($error_mesg, @diagnostics) {
        trim_str $warning;
        print $warning;
    }
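This sketch shows the in-place version in action. It assumes Carp (a core module) supplies croak, and it uses hypothetical sample data. Because the foreach variable $warning aliases $error_mesg and each element of @diagnostics in turn, and @_ inside trim_str( ) aliases $warning, the trimming reaches the original variables.

```perl
use strict;
use warnings;
use Carp qw( croak );

# The in-place version from the text
sub trim_str {
    croak 'Useless use of trim_str( ) in non-void context'
        if defined wantarray;

    for my $orig_arg ( @_ ? @_ : $_ ) {               # all args or just $_
        $orig_arg =~ s{\A \s* (.*?) \s* \z}{$1}xms;   # change the actual args
    }

    return;
}

# Hypothetical sample data
my $error_mesg  = '  disk full  ';
my @diagnostics = ('  retry  ', ' abort ');

for my $warning ($error_mesg, @diagnostics) {
    trim_str $warning;    # void context: trims the variable $warning is aliased to
    print "$warning\n";
}
# $error_mesg is now 'disk full'; @diagnostics is now ('retry', 'abort')
```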

There are several features of this second version of the subroutine that are worth noting. First, because the behaviour of the subroutine changed, its name also needs to change. trimmed_copy_of( ) returns a trimmed copy, so it's named with a past participle that describes the state of the value it returns. trim_str( ) does something to its actual argument, so it's named with an imperative verb indicating the action to be carried out.

Next, there's the rather unusual test and exception in this second version:


    croak 'Useless use of trim_str( ) in non-void context'
        if defined wantarray;

You're probably more familiar with warnings about the useless use of constructs in void context, but here the subroutine dies if the context specifically isn't void. That's because trim_str( ) exists solely to modify its arguments. It doesn't return a useful value, so anyone using it in a scalar context:

    $tidy_text = trim_str $raw_text;

or a list context:

    print trim_str $message;

is making a mistake. Killing them for it immediately is probably a kindness.
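The sketch below exercises that guard with a hypothetical $raw_text. It uses eval to catch the croak in scalar and list context, and then makes one legitimate void-context call. wantarray returns a defined false value in scalar context, a true value in list context, and undef only in void context.

```perl
use strict;
use warnings;
use Carp qw( croak );

# Same guard as trim_str( ) in the text
sub trim_str {
    croak 'Useless use of trim_str( ) in non-void context'
        if defined wantarray;
    for my $orig_arg ( @_ ? @_ : $_ ) {
        $orig_arg =~ s{\A \s* (.*?) \s* \z}{$1}xms;
    }
    return;
}

my $raw_text = '  hello  ';    # hypothetical input

# Scalar context: wantarray is false but defined, so this croaks
my $scalar_ok  = eval { my $tidy_text = trim_str $raw_text; 1 };
my $scalar_err = $@;

# List context: wantarray is true, so this croaks too
my $list_ok = eval { my @words = (trim_str $raw_text); 1 };

# Void context: wantarray is undef, so the call is allowed
trim_str $raw_text;

print "scalar context died: $scalar_err" unless $scalar_ok;
print "raw_text is now [$raw_text]\n";    # prints "raw_text is now [hello]"
```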

Finally, the heart of the trimming operation is:


        for my $orig_arg ( @_ ? @_ : $_ ) {             # all args or just $_
            $orig_arg =~ s{\A \s* (.*?) \s* \z}{$1}xms;
        }

In other words, if there is at least one element in the subroutine's argument list (@_), then iterate through those arguments, changing each of them. Otherwise, iterate through only $_, changing it. The use of (@_ ? @_ : $_) to generate the for loop's list is sufficiently unusual and line-noisy that it warrants clarification with an end-of-line comment.
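Both branches of (@_ ? @_ : $_) can be seen in this sketch, which again defines trim_str( ) with croak from the core Carp module. The variable names are hypothetical. The first call passes two explicit arguments. The calls inside the for loop pass none, so trim_str( ) falls back to $_, which the loop has aliased to each element of @lines.

```perl
use strict;
use warnings;
use Carp qw( croak );

sub trim_str {
    croak 'Useless use of trim_str( ) in non-void context'
        if defined wantarray;
    for my $orig_arg ( @_ ? @_ : $_ ) {               # all args or just $_
        $orig_arg =~ s{\A \s* (.*?) \s* \z}{$1}xms;
    }
    return;
}

# Hypothetical data
my ($first, $second) = ('  a  ', ' b ');
my @lines = ('  x', 'y  ');

trim_str $first, $second;    # @_ is non-empty: both caller variables are trimmed

for (@lines) {
    trim_str;                # @_ is empty: falls back to $_, i.e. each element of @lines
}
```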

Note too that that loop could have been written as:

        for ( @_ ? @_ : $_ ) {                          # all args or just $_
            s{\A \s* (.*?) \s* \z}{$1}xms;
        }

but it would then almost certainly have been harder to comprehend and maintain. In that version, the loop's implicit $_ would be aliased either sequentially to the elements of @_ (which are themselves aliases to the subroutine's actual arguments) or to the $_ outside the loop, which may itself be an alias to some other variable. At which point your brain explodes.
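To make the alias-of-an-alias chain concrete, here is a sketch of that implicit-$_ variant under the hypothetical name trim_str_implicit( ). It behaves the same as trim_str( ), but only because foreach localizes $_: the inner loop's alias is discarded on exit and the caller's $_ is restored.

```perl
use strict;
use warnings;
use Carp qw( croak );

# Hypothetical name: the implicit-$_ variant of trim_str( ) discussed above
sub trim_str_implicit {
    croak 'Useless use of trim_str_implicit( ) in non-void context'
        if defined wantarray;
    for ( @_ ? @_ : $_ ) {    # inner $_ aliases each element of @_, or the outer $_
        s{\A \s* (.*?) \s* \z}{$1}xms;
    }
    return;    # the inner loop has ended, so the caller's $_ alias is back in effect
}

my @lines = ('  x  ', ' y ');    # hypothetical data
for (@lines) {
    trim_str_implicit;    # inner $_ is an alias of an alias of the current element
    print "[$_]\n";       # prints "[x]" then "[y]"
}
```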

Similar problems caused by unintended modifications via $_ can also crop up within the block of a map or grep. See "List Processing Side Effects" in Chapter 6 for specific advice on avoiding that particular kind of pedesagittry.
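For a rough preview of the map case, the sketch below uses two hypothetical input arrays. Inside a map block, $_ aliases each element of the input list, so a substitution there silently edits the source array. Copying $_ into a lexical first avoids that.

```perl
use strict;
use warnings;

# Hypothetical inputs, one array per approach
my @raw_a = ('  a  ', ' b ');
my @raw_b = ('  a  ', ' b ');

# Risky: inside map, $_ is an alias to each element of @raw_a,
# so the substitution trims @raw_a itself, not just the result
my @bad = map { $_ =~ s{\A \s* (.*?) \s* \z}{$1}xms; $_ } @raw_a;

# Safer: copy $_ into a lexical and trim only the copy
my @good = map { my $copy = $_; $copy =~ s{\A \s* (.*?) \s* \z}{$1}xms; $copy } @raw_b;
```

On Perl 5.14 or later, the /r modifier, which makes s/// return a modified copy instead of changing its target, is another way to avoid the side effect.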
