
10.9. Simple Slurping

Slurp a filehandle with a do block for purity.

Whenever you do need to read in an entire file at once, the syntax shown in the final example of the previous guideline is the right way to do it:


    my $code = do { local $/; <$in> };

Localizing the global $/ variable (a.k.a. $RS or $INPUT_RECORD_SEPARATOR, under use English) temporarily replaces it with a version whose value is undef. When the input record separator is undefined, there is effectively no input record separator, so Perl treats the input as a single, unseparated record, and the single <> (or readline) reads in the entire input stream as a single "line".
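The following is a minimal, self-contained sketch of that behaviour, not a definitive implementation. It assumes an in-memory filehandle and some made-up sample text (both hypothetical, chosen only so the example runs without an external file), and it checks that $/ has its default value again once the do block has finished:

```perl
use strict;
use warnings;

# Hypothetical sample data, read through an in-memory filehandle
my $source = "line one\nline two\nline three\n";
open my $in, '<', \$source
    or die "Cannot open in-memory file: $!";

# Inside the block $/ is undef, so one readline returns the whole stream
my $code = do { local $/; <$in> };

# Outside the block $/ has its previous value again (a newline by default)
my $restored = defined $/ && $/ eq "\n";

close $in;

print "Slurped ", length($code), " characters\n";
print "Input record separator restored\n" if $restored;
```

Because the local is confined to the do block, any read performed after this statement still sees the usual newline separator.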

Reading in a complete file or stream this way is much more efficient than "concatenative" approaches such as:

    my $code;
    while (my $line = <$in>) {
        $code .= $line;
    }

or:

    my $code = join $EMPTY_STR, <$in>;

That second alternative is particularly bad because, like the for (<>) discussed earlier, the join evaluates the read operation in a list context, constructs a list of individual lines, and then joins them back together to create a single string. This process requires about three times as much memory as:


    my $code = do { local $/; <$in> };

It's also appreciably slower, and doesn't scale nearly as well as the size of the input text increases[*].

[*] By the way, for all its virtues, the do {...} approach isn't the fastest way to slurp a file of known (and very large) length. The very quickest way to do that is with a low-level system read:

        
        sysread $fh, $text, -s $fh;

But then, of course, you have to live with the cryptic syntax, and with any idiosyncrasies that low-level I/O might be subject to on your particular platform. If you do need to use this highest-speed approach to slurping files, at least consider using the File::Slurp CPAN module, which encapsulates that messy sysread in a tidy read_file( ) subroutine.
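As a rough illustration of those idiosyncrasies, here is a hedged sketch of a slightly more defensive sysread. It assumes that a single sysread call may return fewer bytes than were requested, so it loops until the expected size has been read. The temporary file, its contents, and the variable names are all hypothetical, introduced only to keep the example self-contained:

```perl
use strict;
use warnings;
use File::Temp qw( tempfile );

# Hypothetical test file, created only so the sketch can run on its own
my ($tmp_fh, $filename) = tempfile( UNLINK => 1 );
binmode $tmp_fh;
my $content = "some text\n" x 1000;
print {$tmp_fh} $content;
close $tmp_fh or die "Cannot close $filename: $!";

open my $fh, '<', $filename or die "Cannot open $filename: $!";
binmode $fh;

my $size   = -s $fh;
my $text   = q{};
my $offset = 0;

# sysread may deliver fewer bytes than requested, so keep reading
# (appending at $offset) until the whole file has arrived
while ($offset < $size) {
    my $bytes_read = sysread $fh, $text, $size - $offset, $offset;
    die "Read error on $filename: $!" if !defined $bytes_read;
    last if $bytes_read == 0;    # End of file reached earlier than expected
    $offset += $bytes_read;
}

close $fh;
```

The extra bookkeeping is part of the cost of working at this level, which is one reason to prefer a module that hides it.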

Note that it's important to put that localization-and-read inside a do {...} or in some other small block. A common mistake is to write this instead:

    $/ = undef;
    my $text = <$in>;

That works perfectly well, in itself, but it also undefines the global input record separator, rather than its temporary localized replacement. And the global input record separator controls the read behaviour of every filehandle, even those that are lexically scoped or in other packages. So, if you don't localize the change in $/ to some small scope, you're dooming every subsequent read everywhere in your program to vile slurpitude.
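The difference can be seen in a small sketch. It assumes a hypothetical helper, count_records(), that reads an in-memory filehandle line by line; the helper and the sample strings are illustrative only:

```perl
use strict;
use warnings;

my $first  = "a\nb\nc\n";
my $second = "x\ny\nz\n";

# Hypothetical helper that expects to read its input one line at a time
sub count_records {
    my ($text) = @_;
    open my $fh, '<', \$text or die "Cannot open in-memory file: $!";
    my $count = 0;
    while (my $line = <$fh>) {
        $count++;
    }
    close $fh;
    return $count;
}

# Localized slurp: the change to $/ ends with the do block
open my $in, '<', \$first or die "Cannot open in-memory file: $!";
my $text = do { local $/; <$in> };
close $in;
my $count_after_local = count_records($second);     # 3 records

# Unlocalized change: $/ stays undefined for every later read
$/ = undef;
my $count_after_global = count_records($second);    # 1 record

print "After localized slurp:  $count_after_local\n";
print "After global \$/ change: $count_after_global\n";
```

The helper itself never touches $/, yet its result changes once the global assignment has run, which is exactly the action-at-a-distance problem the do block avoids.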
