It is nearly a triviality that one can multiply by a constant with a sequence of shift left and add instructions. For example, to multiply x by 13 (binary 1101), one can code
where r gets the result.
In this section, left shifts are denoted by multiplication by a power of 2, so the above plan is written r 8x + 4x + x, which is intended to show four instructions on the basic RISC and most machines.
What we want to convey here is that there is more to this subject than meets the eye. First of all, there are other considerations besides simply the number of shift's and add's required to do a multiplication by a given constant. To illustrate, below are two plans for multiplying by 45 (binary 101101).
The plan on the left uses a variable t that holds x shifted left by a number of positions that corresponds to a 1-bit in the multiplier. Each shifted value is obtained from the one before it. This plan has these advantages:
?span style='font:7.0pt "Times New Roman"'> It requires only one working register other than the input x and the output r.
?span style='font:7.0pt "Times New Roman"'> Except for the first two, it uses only 2-address instructions.
?span style='font:7.0pt "Times New Roman"'> The shift amounts are relatively small.
The same properties are retained when the plan is applied to any multiplier.
The scheme on the right does all the shift's first, with x as the operand. It has the advantage of increased parallelism. On a machine with sufficient instruction-level parallelism, the scheme on the right executes in three cycles, whereas the scheme on the left, running on a machine with unlimited parallelism, requires four.
In addition to these details, it is nontrivial to find the minimum number of operations to accomplish multiplication by a constant, where by an "operation" we mean an instruction from a typical computer's set of add and shift instructions. In what follows, we assume this set consists of add, subtract, shift left by any constant amount, and negate. We assume the instruction format is three-address. However, the problem is no easier if one is restricted to only add (adding a number to itself, and then adding the sum to itself, and so on, accomplishes a shift left of any amount), or if one augments the set by instructions such as the HP PA-RISC's shift and add instructions. (These shift the contents of a register left by one, two, or three positions, add it to a second register, and put the result in a third register. Thus, it can multiply by 3, 5, or 9 in a single, presumably fast, instruction.) We also assume that only the least significant 32 bits of the product are wanted.
The first improvement to the basic binary decomposition scheme suggested above is to use subtract to shorten the sequence when the multiplier contains a group of three or more consecutive 1-bits. For example, to multiply by 28 (binary 11100), we can compute 32x - 4x (three instructions) rather than 16x + 8x + 4x (five instructions). On two's-complement machines, the result is correct even if the intermediate result of 32x overflows and the final result does not.
To multiply by a constant m with the basic binary decomposition scheme (using only shift's and add's) requires
instructions, where d = 1 if m ends in a 1-bit (is odd), and d = 0 otherwise. If subtract is also used, it requires
instructions, where g(m) is the number of groups of two or more consecutive 1-bits in m, s(m) is the number of "singleton" 1-bits in m, and d has the same meaning as before.
For a group of size 2, it makes no difference which method is used.
The next improvement is to treat specially groups that are separated by a single 0-bit. For example, consider m = 55 (binary 110111). The group method calculates this as (64x - 16x) + (8x - x), which requires six instructions. Calculating it as 64x - 8x - x, however, requires only four. Similarly, we can multiply by binary 110111011 as illustrated by the formula 512x - 64x - 4x - x (six instructions).
The formulas above give an upper bound on the number of operations required to multiply a variable x by any given number m. Another bound can be obtained based on the size of m in bits梩hat is, on
Theorem. Multiplication of a variable x by an n-bit constant m, m 1, can be accomplished with at most n instructions of the type add, subtract, and shift left by any given amount.
Proof. (Induction on n.) Multiplication by 1 can be done in 0 instructions, so the theorem holds for n = 1. For n > 1, if m ends in a 0-bit, then multiplication by m can be accomplished by multiplying by the number consisting of the left n - 1 bits of m (that is, by m/2), in n - 1 instructions, followed by a shift left of the result by one position. This uses n instructions altogether.
If m ends in binary 01, then mx can be calculated by multiplying x by the number consisting of the left n - 2 bits of m, in n - 2 instructions, followed by a left shift of the result by 2, and an add of x. This requires n instructions altogether.
If m ends in binary 11, then consider the cases in which it ends in 0011, 0111, 1011, and 1111. Let t be the result of multiplying x by the left n - 4 bits of m. If m ends in 0011, then mx = 16t + 2x + x, which requires (n - 4) + 4 = n instructions. If m ends in 0111, then mx = 16t + 8x - x, which requires n instructions. If m ends in 1111, then mx = 16t + 16x - x, which requires n instructions. The remaining case is that m ends in 1011.
It is easy to show that mx can be calculated in n instructions if m ends in 001011, 011011, or 111011. The remaining case is 101011.
This reasoning can be continued, with the "remaining case" always being of the form 101010?0101011. Eventually, the size of m will be reached, and the only remaining case is the number 101010?0101011. This n-bit number contains n/2 + 1 1-bits. By a previous observation, it can multiply x with 2(n/2 + 1) - 2 = n instructions.
Thus, in particular, on a 32-bit machine multiplication by any constant can be done in at most 32 instructions, by the method described above. By inspection, it is easily seen that for n even, the n-bit number 101010?01011 requires n instructions, and for n odd, the n-bit number 1010101?10110 requires n instructions, so the bound is tight.
The methodology described so far is not too hard to work out by hand, or to incorporate into an algorithm such as might be used in a compiler. But such an algorithm would not always produce the best code, because further improvement is sometimes possible. This can result from factoring the multiplier m or some intermediate quantity along the way of computing mx. For example, consider again m = 45 (binary 101101). The methods described above require six instructions. Factoring 45 as 5 ?9, however, gives a four-instruction solution:
Factoring may be combined with additive methods. For example, multiplication by 106 (binary 1101010) requires seven instructions by the additive methods, but writing it as 7 ?15 + 1 leads to a five-instruction solution.
With factoring, the maximum number of instructions needed to multiply by an n-bit constant is, to the writer's knowledge, an open problem. For large n it may be less than the bound of n proved above. For example, m = 0xAAAAAAAB requires 32 instructions without factoring, but writing this value as 2 ?5 ?17 ?257 ?65537 + 1 gives a ten-instruction solution. (Ten instructions, however, is probably not typical of large numbers. The factorization reflects the simple bit pattern of alternate 1's and 0's.)
This should give an idea of the combinatorics involved in this seemingly simple problem. Knuth [Knu2, sec. 4.6.3] discusses the closely related problem of computing am using a minimum number of multiplications. This is analogous to the problem of multiplying by m using only addition instructions. A compiler algorithm for computing mx is described in [Bern].