Section 15.10. Accessors

Provide separate read and write accessors.

Most developers who write classes in Perl provide access to an object's attributes in the way that's demonstrated in Example 15-6.

That is, they write a single method^[*] for each attribute, giving that method the same name as the attribute. Each accessor method always returns the current value of its corresponding attribute, and each can be called with an extra argument, in which case it also updates the attribute to that new value. For example:

^[*] Sometimes referred to as a mutator.

    
    # Create the new military record...
    my $dogtag = Dogtag->new({ serial_num => 'AGC10178B' });

    $dogtag->name( 'MacArthur', 'Dee' );    # Called with args, so store name attr
    $dogtag->rank( 'General' );             # Called with arg, so store rank attr

    
    # Called without arg, so just retrieve attribute values...
    print 'Your new commander is: ',
          $dogtag->rank(), $SPACE, $dogtag->name( )->{surname},
          "\n";

    print 'Her serial number is:  ', $dogtag->serial_num( ), "\n";

This approach has the advantage of requiring only a single, obviously named method per attribute, which means less code to maintain. It also has the advantage that it's a widely known convention, used both throughout Perl's OO-related manpages and in numerous books.

However, despite those features, it's clearly not the best way to write accessor methods.

Example 15-6. The usual way accessor methods are implemented

package Dogtag;
use Class::Std::Utils;
{
    # Attributes...
    my %name_of;
    my %rank_of;
    my %serial_num_of;

    # The usual inside-out constructor...
    sub new {
        my ($class, $arg_ref) = @_;

        my $new_object = bless anon_scalar( ), $class;

        $serial_num_of{ident $new_object} = $arg_ref->{serial_num};

        return $new_object;
    }

    # Control access to the name attribute...
    sub name {
        my ($self, $new_surname, $new_first_name) = @_;
        my $ident = ident($self);          # Factor out repeated calls to ident( )

        
        # No argument means return the current value...
        return $name_of{$ident} if @_ == 1;

        # Otherwise, store the two components of the new value...
        $name_of{$ident}{surname}    = $new_surname;
        $name_of{$ident}{first_name} = $new_first_name;

        return;
    }

    
    # Same deal for accessing the rank attribute...
    sub rank {
        my ($self, $new_rank) = @_;

        return $rank_of{ident $self} if @_ == 1;

        $rank_of{ident $self} = $new_rank;

        return;
    }

    
    # Serial numbers are read-only, so this accessor is much simpler...
    sub serial_num {
        my ($self) = @_;

        return $serial_num_of{ident $self};
    }

    
    # [Other methods of the class here]

    sub DESTROY {
        my ($self) = @_;
        my $ident = ident($self);     # Factor out repeated calls to ident( )

        for my $attr_ref (\%name_of, \%rank_of, \%serial_num_of) {
            delete $attr_ref->{$ident};
        }

        return;
    }
}

For a start, these dual-purpose methods suffer from some of the same drawbacks as the dual-purpose constructors that were advised against earlier (see the "Cloning" guideline). For example, this might or might not change the dogtag's name:

    $dogtag->name(@curr_soldier);

depending on whether @curr_soldier is empty. That might sometimes be very desirable behaviour, but it can also mask some very subtle bugs if it's not what was intended. Either way, a dual-purpose accessor doesn't always give you the ability to encode your intentions unambiguously.
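A minimal sketch of that ambiguity follows. It assumes a cut-down, hypothetical Dogtag::Sketch class with only the name( ) accessor, and uses the core Scalar::Util::refaddr( ) as a stand-in for ident( ) so that it runs without Class::Std::Utils. The second soldier's name is invented for illustration:

```perl
use strict;
use warnings;

package Dogtag::Sketch;                  # Hypothetical cut-down class
use Scalar::Util qw( refaddr );          # Stand-in for ident( )
{
    my %name_of;

    sub new {
        my ($class) = @_;
        return bless \do { my $anon_scalar }, $class;
    }

    # Dual-purpose accessor, following the pattern of Example 15-6...
    sub name {
        my ($self, $new_surname, $new_first_name) = @_;
        my $ident = refaddr $self;

        return $name_of{$ident} if @_ == 1;

        $name_of{$ident}{surname}    = $new_surname;
        $name_of{$ident}{first_name} = $new_first_name;

        return;
    }

    sub DESTROY {
        my ($self) = @_;
        delete $name_of{refaddr $self};
        return;
    }
}

package main;

my $dogtag = Dogtag::Sketch->new();
$dogtag->name('MacArthur', 'Dee');

# Empty list: the call quietly becomes a "get" and changes nothing...
my @curr_soldier = ();
$dogtag->name(@curr_soldier);
my $surname_after_empty = $dogtag->name()->{surname};

# Non-empty list: the very same call is now a "set"...
@curr_soldier = ('Patton', 'Georgia');
$dogtag->name(@curr_soldier);
my $surname_after_full = $dogtag->name()->{surname};

print "After empty list: $surname_after_empty\n";
print "After full list:  $surname_after_full\n";
```

Nothing in the two calls to name(@curr_soldier) distinguishes the retrieval from the update; only the run-time contents of the array decide which one happens.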

The combined store/retrieve methods are also marginally less efficient than they could be, as they have to perform an extra conditional test every time they're called, in order to work out what they're supposed to do. Comparisons of this kind are very cheap, so it's not a big deal, at least not until your system scales to the point where you're doing a very large number of accesses.

The final problem with this approach is subtler and more profound; in fact, it's psychological. There's actually a nasty flaw in the code of one of the accessors shown in Example 15-6. It's comparatively hard to see because it's a sin of omission. It bites developers because of the way they naturally think.

The problem is in the serial_num( ) method: unlike the other two accessors, it isn't dual-purpose. The consistent get/set behaviour of the name( ) and rank( ) methods^[*] sets up and then reinforces a particular expectation: pass an argument, update the attribute.

^[*] And of the future billet( ) and company( ) and platoon( ) and assignment( ) and service_history( ) and fitrep( ) and medical_record( ) and citations( ) and shoesize( ) methods.

So it's natural to expect that the following will also work as intended:

    
    # convert from old serial numbers to the new prefixed scheme...
    for my $dogtag (@division_personnel) {
        my $old_serial_num = $dogtag->serial_num( );
        $dogtag->serial_num( $division_code . $old_serial_num );
    }

But, of course, it doesn't work at all. Worse, it fails silently. The call to serial_num( ) completely ignores any arguments passed to it, and quietly goes about its sole task of returning the existing serial number, which is then silently thrown away. Debugging these kinds of problems can be exceptionally difficult, because your brain gets in the way. Having subliminally recognized the "pass argument; set attribute" pattern, your brain will have filed that belief away as one of the axioms of the class, and when it later sees:

    $dogtag->serial_num( $division_code . $old_serial_num );

it automatically excludes the possibility that that statement could possibly be the cause of the program's misbehaviour. You're passing an argument, so it must be updating the attribute. That's a given. The problem must be somewhere else.

Of course, none of this happens at a conscious level. You just automatically ignore the offending line and start debugging the hard way, tracing the data back to see where it "got corrupted" and then forward to see where it "gets erased". Finally, after a couple of fruitless, frustrating hours, some weedy intern on his very first day, being shown around by your boss, will glance over your shoulder, look straight at the serial_num( ) call, and point out your "obvious" error.

The real problem here isn't your brain's psychological blind-spot; the real problem is that the rest of the dual-purpose accessors are guessing your intention from the data sent to them. But the single-purpose serial_num( ) doesn't need to guess; it always knows exactly what to do. The natural, human response is to rejoice in that certainty and simply code for what you know the method should always do, rather than catering for what others might potentially think it could do.

The problem isn't hard to solve, of course. You simply rewrite serial_num( ) to anticipate and avoid the inevitable psychological trap:

        
        # Serial numbers are read-only, so reject any attempt to update them
        # (croak( ) is provided by the standard Carp module)...
        sub serial_num {
            my ($self) = @_;

            croak q{Can't update serial number} if @_ > 1;

            return $serial_num_of{ident $self};
        }

Unfortunately, very few developers ever do that. It's easier not to write the extra line. And it's much easier not to have to ponder the gestalt psychodynamic ramifications of the class on the collective developer consciousness in order to work out that you needed to write that extra line in the first place.
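For reference, here is a self-contained sketch of that guarded accessor in action. The Dogtag::Guarded class is a hypothetical cut-down version of Dogtag, and it again assumes Scalar::Util::refaddr( ) in place of ident( ) so that only core modules are needed:

```perl
use strict;
use warnings;

package Dogtag::Guarded;                 # Hypothetical cut-down class
use Carp;
use Scalar::Util qw( refaddr );          # Stand-in for ident( )
{
    my %serial_num_of;

    sub new {
        my ($class, $arg_ref) = @_;

        my $new_object = bless \do { my $anon_scalar }, $class;
        $serial_num_of{refaddr $new_object} = $arg_ref->{serial_num};

        return $new_object;
    }

    # Read-only accessor that refuses to be mistaken for a setter...
    sub serial_num {
        my ($self) = @_;

        croak q{Can't update serial number} if @_ > 1;

        return $serial_num_of{refaddr $self};
    }

    sub DESTROY {
        my ($self) = @_;
        delete $serial_num_of{refaddr $self};
        return;
    }
}

package main;

my $dogtag = Dogtag::Guarded->new({ serial_num => 'AGC10178B' });

# Attempt the mistaken update, trapping the exception it now provokes...
my $updated = eval {
    $dogtag->serial_num( 'X' . $dogtag->serial_num() );
    1;
};
my $error = $@;

print "Update rejected: $error" if !$updated;
print 'Serial number is still: ', $dogtag->serial_num(), "\n";
```

The mistaken call now dies loudly at the point of the error instead of silently discarding its argument.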

Under the dual-purpose accessor idiom, the natural inclination to omit that "unnecessary" code leaves the interpreter unable to diagnose a common mistake. Fortunately, it isn't difficult to turn those consequences around, so that leaving unnecessary code out causes the interpreter to diagnose the mistake. All you need to do is split the two distinct access tasks into two distinct methods, as shown in Example 15-7.

Example 15-7. A better way to implement class accessors


    
    # Control access to the name attribute...
    sub set_name {
        my ($self, $new_surname, $new_first_name) = @_;

        # Check that all arguments are present and accounted for...
        croak( 'Usage: $obj->set_name($new_surname, $new_first_name)' )
            if @_ < 3;

        # Store components of new value in a hash...
        $name_of{ident $self}{surname}    = $new_surname;
        $name_of{ident $self}{first_name} = $new_first_name;

        return;
    }

    sub get_name {
        my ($self) = @_;
        return $name_of{ident $self};
    }

    
    # Same deal for accessing the rank attribute...
    sub set_rank {
        my ($self, $new_rank) = @_;

        $rank_of{ident $self} = $new_rank;

        return;
    }

    sub get_rank {
        my ($self) = @_;
        return $rank_of{ident $self};
    }

    # Serial numbers are read-only, so there's no set_serial_num( ) accessor...
    sub get_serial_num {
        my ($self) = @_;
        return $serial_num_of{ident $self};
    }

Here, each accessor that returns a value just returns that value, whereas each accessor that stores a value expects the new value as one or more extra arguments, uses it to update the attribute, and then returns nothing.

Any code that uses these accessors will now explicitly record the developer's intention for each accessor call:


    
    # Create the new military record...
    my $dogtag = Dogtag->new( {serial_num => 'AGC10178B'} );

    $dogtag->set_name( 'MacArthur', 'Dee' );
    $dogtag->set_rank( 'General' );

    # Retrieve attribute values...
    print 'Your new commander is: ',
          $dogtag->get_rank(), $SPACE, $dogtag->get_name( )->{surname}, "\n";

    print 'Her serial number is:  ',
          $dogtag->get_serial_num( ), "\n";

The code is also now slightly easier to read, because you can tell at a glance whether a particular accessor call is updating or retrieving an attribute value. So the former "reminder" comments (# Called with arg, so store name attr) are no longer necessary; the code is now self-documenting in that respect.

More importantly, no-one is ever going to mistakenly write:

    $dogtag->get_serial_num( $division_code . $old_serial_num );

Human brains don't misbehave that particular way, which means you don't have to remember to have get_serial_num( ) test for that possibility.

That's not to say that developers who use the class won't still misgeneralize the getting-vs-storing axiom. They will. But now, having successfully called set_name( ) and set_rank( )^[*], the rule they'll mistakenly devise is: "call set_whatever( ); update an attribute". Hence when they erroneously try to update the serial number, what they'll write is:

^[*] And set_billet( ) and set_company( ) and set_platoon( ) and...aw, you get the idea.

    $dogtag->set_serial_num( $division_code . $old_serial_num );

At which point the interpreter will immediately shoot to kill:


    Can't locate object method "set_serial_num" via package "Dogtag"
    at rollcall.pl line 99

Now the natural programmer tendency to leave out extraneous code is actually working in your favour. By not implementing set_serial_num( ), you've ensured that any erroneous attempts to use it are automatically detected, and loudly reported.
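The following sketch shows that automatic detection at work. It assumes a hypothetical Dogtag::Split class that implements only get_serial_num( ), uses Scalar::Util::refaddr( ) in place of ident( ), and invents a $division_code value purely for illustration:

```perl
use strict;
use warnings;

package Dogtag::Split;                   # Hypothetical cut-down class
use Scalar::Util qw( refaddr );          # Stand-in for ident( )
{
    my %serial_num_of;

    sub new {
        my ($class, $arg_ref) = @_;

        my $new_object = bless \do { my $anon_scalar }, $class;
        $serial_num_of{refaddr $new_object} = $arg_ref->{serial_num};

        return $new_object;
    }

    # Serial numbers are read-only, so there's deliberately no setter...
    sub get_serial_num {
        my ($self) = @_;
        return $serial_num_of{refaddr $self};
    }

    sub DESTROY {
        my ($self) = @_;
        delete $serial_num_of{refaddr $self};
        return;
    }
}

package main;

my $dogtag        = Dogtag::Split->new({ serial_num => 'AGC10178B' });
my $division_code = 'D7-';               # Invented value for this sketch

# The mistaken update now fails at the method lookup itself...
my $updated = eval {
    $dogtag->set_serial_num( $division_code . $dogtag->get_serial_num() );
    1;
};
my $error = $@;

print "Caught: $error" if !$updated;

# Callers can also ask in advance whether an attribute is writable...
my $has_setter = Dogtag::Split->can('set_serial_num') ? 'yes' : 'no';
print "Has set_serial_num( )? $has_setter\n";
```

No extra code was needed inside the class to produce the error; the absence of the method is the check.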

Implementing separate "get" and "set" accessors for attributes offers a significant improvement in readability and self-documentation, and even a marginal boost in performance. By using distinct method names for distinct operations, you can better encode your intentions in your source code, use one human frailty (under-exertion) to guard against another (overgeneralization) and, best of all, convince the compiler to debug your colleagues' miswired brains for you.