6.14. List Processing Side Effects

Never modify $_ in a list function.

One particular feature of the way the map, grep, and first functions work can easily become a source of subtle errors. These functions all use the $_ variable to pass each list element into their associated block. But, for better efficiency, these functions alias $_ to each list value they're iterating, rather than copying each value into $_.

You probably don't often think of map, grep, and first as creating aliases. You probably just think of those functions as taking a list and returning a second, independent list. And, most importantly, you almost certainly don't expect them to change the original list.

However, if the block you give to a map, grep, or first modifies $_ in any way, then it's actually modifying an alias to some element of the function's list. That means it's actually modifying the original element itself, which is almost certainly an error.
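
The aliasing is easy to observe directly. The following is a minimal sketch, assuming a made-up @raw_lines array (the data and variable names are illustrative, not from the text above):

```perl
use strict;
use warnings;

# Hypothetical data, purely for illustration...
my @raw_lines = ("alpha  ", "beta ");

# The block trims $_, which is an alias for each element of @raw_lines...
my @trimmed_lines = map { s/\s+\z//xms; $_ } @raw_lines;

# The "input" array has been trimmed as well...
print "[$_]\n" for @raw_lines;    # prints "[alpha]" then "[beta]"
```

Even though only @trimmed_lines is assigned to, the trailing whitespace has disappeared from @raw_lines too.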

This kind of mistake commonly occurs in code like this:

    # Select .pm files for which no corresponding .pl file exists...
    @pm_files_without_pl_files
        = grep { s/.pm\z/.pl/xms && !-e } @pm_files;

The intention here is almost certainly virtuous. The thought process was probably something like:

The implicit $_ successively holds a copy of each of the filenames in @pm_files. I'll replace the .pm suffix of that copy with .pl, then see if the resulting file exists. If it doesn't, then the original (.pm) filename will be passed through the grep to be collected in @pm_files_without_pl_files.

The mistake is simple, but deadly: $_ doesn't successively hold a copy of anything. It successively holds aliases. So the actual effect of the grep is far more sinister. $_ is an alias (that is, just another name) for each of the filenames in @pm_files. So the substitution in the grep block replaces the .pm suffix of each original filename with .pl; then the -e checks whether the resulting file exists. If the file doesn't exist, then the filename (now ending in .pl) will be passed through to @pm_files_without_pl_files. And, regardless of whether that name is passed through or not, the block will have modified the original element in @pm_files.

Not only did that grep statement unintentionally mess up the contents of @pm_files, it didn't even do the job it was supposed to do. Because it changes each $_ on the way through, what you actually get back are the names of the .pl files that were M.I.A., not the .pm files that were looking for them.

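
Both failures can be reproduced in a scratch directory. This is only a sketch: the filenames Foo.pm, Foo.pl, and Bar.pm are hypothetical, and the core modules File::Temp and File::Basename are used purely to set up the files and tidy the output:

```perl
use strict;
use warnings;
use File::Temp     qw( tempdir );
use File::Basename qw( basename );

# Hypothetical setup: a scratch directory holding Foo.pm, Foo.pl, and Bar.pm...
my $dir = tempdir( CLEANUP => 1 );
for my $name (qw( Foo.pm Foo.pl Bar.pm )) {
    open my $fh, '>', "$dir/$name" or die "Can't create $dir/$name: $!";
    close $fh or die "Can't close $dir/$name: $!";
}

my @pm_files = ("$dir/Foo.pm", "$dir/Bar.pm");

# The buggy selection discussed above...
my @pm_files_without_pl_files
    = grep { s/.pm\z/.pl/xms && !-e } @pm_files;

# Show only the filenames, without the scratch directory prefix...
print join(' ', map { basename($_) } @pm_files), "\n";
    # prints "Foo.pl Bar.pl"
print join(' ', map { basename($_) } @pm_files_without_pl_files), "\n";
    # prints "Bar.pl"
```

The original array no longer contains any .pm names, and the one name that was selected is the missing Bar.pl rather than Bar.pm.
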
This kind of error can occur anywhere that the block of any list-processing function uses any of Perl's numerous $_-modifying features. For example:

    # Find the first "chunk" that spans more than one line
        = first { chomp; m/\n/xms; } @file_chunks;

Here, the first block chomps the actual elements of @file_chunks, because the raw chomp chomps $_, which is successively aliased to each element of @file_chunks. But the first stops calling its block as soon as one of those post-chomped elements is found to still have a newline in it (m/\n/xms).

So, after this assignment statement executes, the first will have surreptitiously chomped each element in @file_chunks, up to and including the first element that contained a newline. But, because first will have stopped checking at that point, none of the elements after the first match will have been modified at all. So @file_chunks is left in a state that is simultaneously unexpected (it's not obvious that the array was being modified at all), inconsistent (only part of the array has been modified), and unpredictable (how much was modified depends on the contents of the array).

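
The partial modification can be seen in a small sketch like the following, which assumes four made-up chunks and two illustrative variable names ($multi_line_chunk and $chomped_count) that are not part of the text above:

```perl
use strict;
use warnings;
use List::Util qw( first );

# Hypothetical chunks: the third is the first to span more than one line...
my @file_chunks = ("one\n", "two\n", "three\nfour\n", "five\n");

my $multi_line_chunk = first { chomp; m/\n/xms; } @file_chunks;

# Count how many chunks have lost their trailing newline...
my $chomped_count = grep { !m/\n\z/xms } @file_chunks;

print "$chomped_count of ", scalar @file_chunks, " chunks were chomped\n";
    # prints "3 of 4 chunks were chomped"
```

With different data, a different number of chunks would be affected, which is exactly what makes the side effect so hard to reason about.
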
Of course, there is no limit to human iniquity, and occasionally that kind of subtle nastiness is actually intentional. For example:

    use List::MoreUtils qw( uniq );

    # Remove directory pathnames from filenames and collect separately...
    @dir_paths = uniq map { s{ \A (.*/) }{}xms ? $1 : './' } @file_paths;

In this case, the sneaky substitution within the map block is deliberate. The implementer genuinely wants to chop off any leading 'some/path/here/' from each element of @file_paths, and at the same time collect those off-cuts into the @dir_paths array. Ten-out-of-ten for Perl savoir-faire, but minus several million for maintainability.

The rule here is simple: no map, grep, or first block should ever have a side effect. In particular, no map, grep, or first block should ever modify $_.

If your block really does need to modify a copy of each list element, then create the copy explicitly within the block:

    @pm_files_without_pl_files
        = grep {
              my $file = $_;
              $file =~ s/.pm\z/.pl/xms;
              !-e $file;
          } @pm_files;

In this version, the substitution is applied to an explicit copy of the filename (in $file), so the original strings in @pm_files will be unchanged, and the filenames that flow through to @pm_files_without_pl_files will retain their original .pm suffixes.

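
On Perl 5.14 or later, the non-destructive /r flag of the substitution operator is another way to get a modified copy without touching $_. The sketch below is a variation under stated assumptions, not the approach shown above: it replaces the -e file test with a lookup in a hypothetical %existing_file hash so that it runs without a filesystem, and it writes the dot as [.] so that it matches only a literal dot:

```perl
use 5.014;    # s///r requires Perl 5.14 or later
use strict;
use warnings;

# Hypothetical stand-ins for the filesystem and the list of module files...
my %existing_file = ( 'Foo.pl' => 1 );
my @pm_files      = ( 'Foo.pm', 'Bar.pm' );

# s///r returns the modified copy and leaves $_ (and so @pm_files) alone...
my @pm_files_without_pl_files
    = grep {
          my $pl_file = s/[.]pm\z/.pl/xmsr;
          !$existing_file{$pl_file};
      } @pm_files;

print "@pm_files_without_pl_files\n";    # prints "Bar.pm"
print "@pm_files\n";                     # prints "Foo.pm Bar.pm"
```

Either form keeps the block free of side effects; the explicit copy has the advantage of working on older Perls as well.
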
On the other hand, if you find you really do need side effects in your map, grep, or first block, then don't use map or grep or first at all. Rewrite the code as a for loop instead. That way, the side effects in the loop can easily be detected and understood:


    # Track directory paths to ensure uniqueness...
    my %seen_dir;

    FILE_PATH:
    for my $file (@file_paths) {
        # Default to current directory...
        my $dir_path = './';

        # Capture and remove any actual directory path and use it as the path...
        if ($file =~ s{ \A (.*/) }{}xms) {
            $dir_path = $1;
        }

        # Reject repeated directory paths...
        next FILE_PATH if $seen_dir{$dir_path}++;

        # Record the extracted path...
        push @dir_paths, $dir_path;
    }
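
To check what the loop does to both arrays, it can be run against some sample data. The four paths below are hypothetical and chosen only to exercise the repeated-directory and no-directory cases:

```perl
use strict;
use warnings;

# Hypothetical input, purely for illustration...
my @file_paths = ('lib/Foo.pm', 'lib/Bar.pm', 'README', 't/basic.t');
my @dir_paths;

# Track directory paths to ensure uniqueness...
my %seen_dir;

FILE_PATH:
for my $file (@file_paths) {
    # $file is an alias, so the substitution edits @file_paths in place...
    my $dir_path = './';
    if ($file =~ s{ \A (.*/) }{}xms) {
        $dir_path = $1;
    }

    next FILE_PATH if $seen_dir{$dir_path}++;
    push @dir_paths, $dir_path;
}

print "@file_paths\n";    # prints "Foo.pm Bar.pm README basic.t"
print "@dir_paths\n";     # prints "lib/ ./ t/"
```

The loop still modifies @file_paths through the $file alias, but here the modification is an explicit statement in the loop body rather than something hidden inside a map block.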
