Section 19.12. Caching for Optimization

19.12. Caching for Optimization

Benchmark any caching strategy you use.

Caching is a strategy that would seem to have no downside. After all, computing a value only once is obviously always going to be quicker than recomputing it many times^[]. That's true, of course, but it isn't the whole story. Occasionally, caching can backfire and actually make a computation slower.

^[] At least, until your calculations turn quantum.

It's certainly the case that computing once is always quicker than recomputing every time. However, caching isn't quite a case of computing-once; it's actually a case of computing-once-and-forever-after-rechecking-whether-you've-already-computed-and-if-so-then-accessing-the-previously-computed-value. That more complicated process may not always be quicker than recomputing every time. Searching and then accessing a look-up table has an intrinsic cost, which can occasionally be greater than redoing the entire calculation. Especially if the look-up table is a hash.

So, whenever you decide to add caching to a computation, it's essential to benchmark the resulting code, to make sure that the cache look-up costs aren't more expensive that the computation itself. For example, for the pixel square roots from the previous guideline, a simple speed comparison:


    use Benchmark qw( cmpthese );

    my @sqrt_of = map {sqrt $_} 0..255;

    cmpthese -30, {
        recompute      => q{ for my $n (0..255) { my $res = sqrt $n      } },
        look_up_array  => q{ for my $n (0..255) { my $res = $sqrt_of[$n] } },
    };

reveals that, in this instance, using a look-up table is only about 9% faster than just calling sqrt directly every time:


                     Rate         recompute   look_up_array
    recompute       3951/s            --             -8%
    look_up_array   4291/s            9%             --

You then need to decide whether that marginal performance improvement is enough to warrant the additional complexity in the code.

By the way, if you're using Memoize to install your caching, you can tell it to install the memoized version of your subroutine under a different name by using the INSTALL option. This makes it easy to benchmark the relative performance of two versions:


    
    # Install a memoized version of lc_digest() as fast_lc_digest(  )...
    memoize( 'lc_digest', INSTALL=>'fast_lc_digest' );

    # See if it is actually any faster on real data sets...
    cmpthese -30, {
        nomemo => q{ for my $text (@real_data) { my $res = lc_digest($text); }      },
        memo   => q{ for my $text (@real_data) { my $res = fast_lc_digest($text); } },
    };