Previous Page
Next Page

19.13. Profiling

Don't optimize applicationsprofile them.

In the previous guideline, the benchmarked comparison between repeatedly computing sqrt $pixel_value and repeatedly looking up $sqrt_of[$pixel_value] indicated that caching provided a 9% improvement:

                     Rate         recompute   look_up_array
    recompute       3951/s            --             -8%
    look_up_array   4291/s            9%             --

That sounds impressive, but it's important to keep those numbers in perspective. Each iteration of the test did 256 square root retrievals. So, overall, the test was achieving 1,011,456 (i.e., 3951 x 256) sqrt calls per second, compared to 1,098,496 @sqrt_of look-ups per second.

Suppose you were processing the 786,432 pixels of a typical 1024 x 768 image. Using the example performance figures, the repeated sqrt calls would require around 0.78 seconds to process that many pixels, whereas the look-up table would take only about 0.72 seconds. Adding a cache to this section of your code would save you a grand total of 0.06 seconds per image.

That's an all-too-common outcome when code is optimized: developers focus their efforts on those components that are easy to optimize, rather than on those components in which improvements will produce the greatest benefit.

How do you find those places where optimization will do the most good? By understanding where your application spends most of its time. And the easiest way to do that is to profile your program using the standard Devel::DProf module, which can determine how long your application spends within each subroutine in your source code. That is, instead of running your program in the usual way:

    > perl datafile

run it under the auspices of the profiler module:

    > perl -d:DProf datafile

The -d: debugging flag is a shorthand for -MDevel::, so specifying -d:DProf is the same as specifying -MDevel::DProf. That tells perl to include that profiling module before the start of the source.

The module itself simply watches every subroutine call within your code, noting how much time elapses between the invocation and return, and adding that duration to a record of the total time spent in each subroutine. At the end of the program, the module creates a file called tmon.out in the current directory.

It's possible to directly interpret the raw data in the file (see the module docs for details), but much easier to understand it by passing it through the standard dprofpp application.

For example, you could use Devel::DProf to profile a call to your new autoformat application, and then summarize the results with dproffpp like so:

    > perl -d:DProf ~/bin/autoformat  < Ch_19.txt  > Ch_19_formatted.txt

dprofpp tmon.out Total Elapsed Time = 3.212516 Seconds User+System Time = 0.722516 Seconds Exclusive Times %Time ExclSec CumulS #Calls sec/call Csec/c Name 16.6 0.120 0.416 3 0.0399 0.1387 main::BEGIN 11.0 0.080 0.138 9 0.0089 0.0153 Text::Autoformat::BEGIN 8.30 0.060 0.066 1 0.0600 0.0665 Getopt::Declare::parse 7.89 0.057 0.057 221 0.0003 0.0003 Text::Balanced::_failmsg 6.09 0.044 0.075 1 0.0437 0.0749 Text::Autoformat::autoformat 5.54 0.040 0.089 3 0.0133 0.0298 Getopt::Declare::Arg::BEGIN 5.26 0.038 0.252 1 0.0381 0.2516 Getopt::Declare::new 4.01 0.029 0.059 26 0.0011 0.0023 Getopt::Declare::BEGIN 3.88 0.028 0.111 70 0.0004 0.0016 Text::Balanced::extract_codeblock 3.60 0.026 0.074 133 0.0002 0.0006 Text::Balanced::_match_codeblock 2.77 0.020 0.020 1 0.0200 0.0199 Text::Balanced::ErrorMsg::BEGIN 2.77 0.020 0.030 3 0.0066 0.0099 vars::BEGIN 2.77 0.020 0.020 5 0.0040 0.0040 Exporter::as_heavy 2.77 0.020 0.020 17 0.0012 0.0012 Exporter::import 1.38 0.010 0.010 1 0.0100 0.0100 AutoLoader::AUTOLOAD

For each subroutine, the table produced by dprofpp shows (amongst other details):

  • The total time spent in the subroutine (%Time), as a percentage of the total time spent running the application

  • The actual amount of time spent within the subroutine (ExclSec)

  • The actual amount of time spent within the subroutine and any subroutines it called (CumulS)

  • The number of calls made to the subroutine (#Calls)

Looking at the output of the previous example, you can see that the program spends about 8% of its time in Text::Balanced::_failmsg( ), making a total of 221 calls to that subroutine. In contrast, it spends only about half that amount of time in 70 calls to Text::Balanced::extract_codeblock( ). So it probably makes sense to focus any optimization efforts on _failmsg( ) first.

If you need finer-grained profiling, the Devel::SmallProf CPAN module allows you to count how many times each line of your program is executed, which makes it easier to determine precisely what statement is causing a particular subroutine to be expensive:

    > perl -d:SmallProf datafile

The result is a file named smallprof.out, which is actually a copy of your source code with each line prefixed by the number of times it was executed, the total amount of time spent executing the line, and the line number. Although the module lacks a utility like dprofpp to summarize its output, it is an excellent investigative tool once you have identified the main suspects using Devel::DProf.

    Previous Page
    Next Page