Section 19.10. Caching

Look for opportunities to use caches.

It makes sense not to do the same calculation twice, if the result is small enough that it can reasonably be stored for reuse. The simplest form of that is putting a result into an interim variable whenever it will be used more than once. That is, instead of calling the same functions twice on the same data:

    print form(
        'hash alone: {>>>,>>>,>>} bytes', size(\%lookup),
        'data alone: {>>>,>>>,>>} bytes', total_size(\%lookup)-size(\%lookup),
        '==============================',
        'total:      {>>>,>>>,>>} bytes', total_size(\%lookup),
    );

call them once, store the results temporarily, and retrieve them each time they're needed:


    my $hash_mem  = size(\%lookup);
    my $total_mem = total_size(\%lookup);
    my $data_mem  = $total_mem - $hash_mem;

    print form(
        'hash alone: {>>>,>>>,>>} bytes',  $hash_mem,
        'data alone: {>>>,>>>,>>} bytes',  $data_mem,
        '==============================',
        'total:      {>>>,>>>,>>} bytes',  $total_mem,
    );

This often has the additional benefit of allowing you to name the interim values in ways that make the code more comprehensible.

Subroutines like size( ) and total_size( ) and functions like rand( ) or readline( ) don't always return the same result when called with the same arguments. Such subroutines are good candidates for temporary and localized reuse of results, but not for longer-term caching.
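As a minimal sketch of that distinction (the subroutine roll_die( ) is hypothetical, introduced here only for illustration), a value derived from rand( ) can safely be held in a variable for the duration of a single decision, whereas caching it across calls would make every later "roll" identical:

```perl
use strict;
use warnings;

# Hypothetical helper: one roll of a six-sided die
sub roll_die {
    return 1 + int rand 6;
}

# Localized reuse: call once, then use the stored value consistently...
my $roll = roll_die();

print "You rolled $roll\n";
print "That's the maximum!\n" if $roll == 6;

# Calling roll_die() twice here instead could report one value
# and then test a different one.
```

The stored value is only meaningful for this one roll; the next decision needs a fresh call.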

On the other hand, pure functions like sqrt( ) and int( ) and crypt( ) do always return the same result for the same list of arguments, so their return values can be stored long-term and reused whenever they're needed again. For example, if you have a subroutine that returns a case-insensitive SHA-512 digest:

    sub lc_digest {
        my ($text) = @_;

        use Digest::SHA qw( sha512 );
        return sha512(lc $text);
    }

then you could (potentially) speed it up over many calls by giving it a private look-up table in which results can be cached as they're computed, as shown in Example 19-9.

Example 19-9. Adding a cache to a digest subroutine


{
    my %cache;

    sub lc_digest {
        my $text = lc shift;

        # Compute the answer only if it's not already known...
        if (!exists $cache{$text}) {
            use Digest::SHA qw( sha512 );
            $cache{$text} = sha512($text);
        }

        return $cache{$text};
    }
}
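Under Perl 5.10 or later, the same private cache could plausibly be written with a state variable instead of an enclosing block. The following is only a sketch of that alternative, not the version the example above uses; it assumes the digest returned by sha512( ) is always defined, so that //= behaves like the exists test:

```perl
use strict;
use warnings;
use feature 'state';            # state variables require Perl 5.10 or later
use Digest::SHA qw( sha512 );

sub lc_digest {
    my $text = lc shift;

    # Declared once; persists across calls, visible only inside this sub
    state %cache;

    # Compute the answer only if it's not already known...
    $cache{$text} //= sha512($text);

    return $cache{$text};
}

# Differently cased inputs share a single cache entry...
print "Same digest\n"
    if lc_digest('Hello World') eq lc_digest('HELLO WORLD');
```

Either form keeps the cache invisible to the rest of the program; the choice between them is mostly a matter of which Perl versions the code has to support.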

Alternatively, if the range of possible data for a computation is small and the number of computations is large, then it's often simpler and more efficient to precompute the entire look-up table and then access it directly, thereby eliminating the cost of a subroutine call. For example, suppose you were doing some kind of image processing and needed square roots for pixel intensity values in the range 0 to 255. You could write:


    for my $row (@image_rows) {
        for my $pixel_value (@{$row}) {
            $pixel_value = sqrt($pixel_value);
        }
    }

or you could dramatically reduce the number of sqrt operations by precomputing all possible values and creating a look-up table:


    my @sqrt_of = map { sqrt $_ } 0..255;

    for my $row (@image_rows) {
        for my $pixel_value (@{$row}) {
            $pixel_value = $sqrt_of[$pixel_value];
        }
    }
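A look-up table like this is valid only while every input really is an integer in the precomputed range: an index past the end of the array yields undef, and a fractional index is silently truncated. The sketch below makes that assumption explicit; the sample data in @image_rows and the validity check are illustrative additions, not part of the original example:

```perl
use strict;
use warnings;

# Precompute square roots for every possible intensity value
my @sqrt_of = map { sqrt $_ } 0..255;

# Hypothetical sample data: two rows of integer intensities in 0..255
my @image_rows = ( [ 0, 1, 4, 9 ], [ 16, 100, 255, 255 ] );

for my $row (@image_rows) {
    for my $pixel_value (@{$row}) {
        # The table covers only the integers 0..255, so verify that first
        die "Unexpected pixel value: $pixel_value\n"
            if $pixel_value !~ m/\A \d+ \z/xms || $pixel_value > 255;

        # $pixel_value aliases the array element, so this updates in place
        $pixel_value = $sqrt_of[$pixel_value];
    }
}

print "@{$_}\n" for @image_rows;
```

The per-pixel check costs time of its own, so in practice it would more likely be done once when the image is loaded, or dropped entirely when the data is known to be valid.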

For a thorough discussion of the many applications and advantages of caching, see Chapter 3 of Higher-Order Perl, by Mark Jason Dominus (Morgan Kaufmann, 2005).