10.7. Input Loops

Use while (<>), not for (<>).

Programmers are occasionally tempted to write input loops using a for, like this:

    use Readonly;
    use Regexp::Common;
    Readonly my $EXPLETIVE => $RE{profanity};

    for my $line (<>) {
        $line =~ s/$EXPLETIVE/[DELETED]/gxms;
        print $line;
    }

That's presumably because for loops are inherently finite in their number of iterations, and hence intrinsically more robust. Or perhaps it's just that the keyword is two characters shorter.

Whatever the reason, using a for loop to iterate input is a very inefficient and brittle solution. The iteration list of a for loop is (obviously) a list context. So in the example, the <> operator is called in a list context. Evaluating <> in list context causes it to read in every line it can, building a temporary list as it does. Once the input is complete, that list becomes the list to be iterated by the for.
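The difference between the two contexts can be seen in a small sketch. It reads from an in-memory filehandle rather than from <> so that it is self-contained; the variable names and the three-line sample text are assumptions made purely for illustration:

```perl
use strict;
use warnings;

# Hypothetical three-line input, held in a string for illustration
my $text = "alpha\nbeta\ngamma\n";

# List context: a single read returns every remaining line at once
open my $in, '<', \$text or die "Can't open in-memory file: $!";
my @all_lines = <$in>;
close $in;

# Scalar context: each read returns only the next line
open $in, '<', \$text or die "Can't open in-memory file: $!";
my $first_line  = <$in>;
my $second_line = <$in>;
close $in;

print scalar(@all_lines), " lines read in list context\n";
print "First scalar read returned: $first_line";
```

A for loop's iteration list behaves like the assignment to @all_lines here: the whole input is consumed before anything else happens.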

There are several problems with that approach. For a start, it means the for loop won't start to iterate until the entire input stream has been read and an end-of-file encountered. This means that the previous code can't be used interactively. Moreover, constructing a (potentially very long) list of the input lines is expensive, both in terms of the memory required to store the entire list and in terms of the time required to allocate that memory and to actually build the list.

Worst of all, the for input loop doesn't scale well. Its memory requirements are linearly proportional to the total size of the input, at roughly twice the size of the data itself (an overhead of about 100%)^[*]. That means that a sufficiently large input might actually break the input loop with a memory allocation failure (Out of memory!), or at least slow it down intolerably with excessive memory allocation and swapping overheads.

^[*] Under Perl 5.8, for example, to read in 100,000 lines of 30 characters each (i.e., 3 MB of data) in a for loop requires just under 6 MB of allocated memory for the initial list. Reading in a file of one million such lines requires 59 MB of allocated memory, before the loop even starts. In contrast, the equivalent while loop never uses more than 55 bytes for either file.

In contrast, an equivalent while loop:


    while (my $line = <>) {
        $line =~ s/$EXPLETIVE/[DELETED]/gxms;
        print $line;
    }

reads and processes only one line at a time. This version can be used interactively, and never allocates more memory than is needed to accommodate the longest individual line. So use a while instead of a for when reading input.
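One detail that makes the while form safe is that Perl implicitly tests an assignment from a line read for definedness, not for truth, so a final line such as "0" does not end the loop early. The following is a minimal sketch of that behaviour, again using an in-memory filehandle and hypothetical sample data instead of <>:

```perl
use strict;
use warnings;

# Hypothetical input whose last line is "0" with no trailing newline
my $text = "3\n2\n1\n0";

open my $in, '<', \$text or die "Can't open in-memory file: $!";

my $count = 0;

# Perl treats this condition as: defined(my $line = <$in>)
while (my $line = <$in>) {
    chomp $line;
    $count++;
    print "Read: $line\n";
}

close $in;
print "$count lines processed\n";
```

All four lines are processed, including the last one, because the loop stops only when the read returns undef at end-of-file.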

By the way, the same problems don't arise when iterating large ranges:


    for my $n (2..1_000_000_000) {
        my @factors = factors_of($n);

        if (@factors == 2) {
            print "$n is prime\n";
        }
        else {
            print "$n is composite with factors: @factors\n";
        }
    }

In modern versions of Perl, a range used as the iteration list of a for loop is evaluated lazily, so the previous code doesn't first have to build a list of 999,999,999 consecutive integers before the for can start iterating.
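A quick way to see this is a loop over the same range that exits early. The divisibility test and the $first_match variable below are illustrative assumptions, chosen only so that the sketch runs without a factors_of() subroutine:

```perl
use strict;
use warnings;

my $first_match;

# The range is iterated lazily, so the loop can stop
# long before the upper bound is ever reached
for my $n (2..1_000_000_000) {
    if ($n % 7 == 0 && $n % 11 == 0) {
        $first_match = $n;
        last;
    }
}

print "First multiple of both 7 and 11: $first_match\n";
```

The loop finishes after only 76 iterations, which would not be possible if the full list of integers had to be constructed first.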