At this point you may wonder if other divisors present other problems. We see in this section that they do not; the three examples given illustrate the only cases that arise (for d 2).
Some of the proofs are a bit complicated, so to be cautious, the work is done in terms of a general word size W.
Given a word size W 3 and a divisor d, 2 d < 2W - 1, we wish to find the least integer m and integer p such that
with 0 m < 2W and p W.
The reason we want the least integer m is that a smaller multiplier may give a smaller shift amount (possibly zero) or may yield code similar to the "divide by 5" example, rather than the "divide by 7" example. We must have m 2W - 1 so the code has no more instructions than that of the "divide by 7" example (that is, we can handle a multiplier in the range 2W - 1 to 2W - 1 by means of the add that was inserted in the "divide by 7" example, but we would rather not deal with larger multipliers). We must have p W because the generated code extracts the left half of the product mn, which is equivalent to shifting right W positions. Thus, the total right shift is W or more positions.
There is a distinction between the multiplier m and the "magic number," denoted M. The magic number is the value used in the multiply instruction. It is given by
Because (1b) must hold for which implies
Let nc be the largest (positive) value of n such that (nc, d) = d - 1. nc exists because one possibility is nc = d - 1. It can be calculated from nc is one of the highest d admissible values of n, so
Because (1a) must hold for n = nc,
Combining this with (2) gives
Because m is to be the least integer satisfying (4), it is the next integer greater than 2p/d; that is,
Combining this with the right half of (4) and simplifying gives
Thus, the algorithm to find the magic number M and the shift amount s from d is to first compute nc, and then solve (6) for p by trying successively larger values. If p < W, set p = W (the theorem below shows that this value of p also satisfies (6)). When the smallest p W satisfying (6) is found, m is calculated from (5). This is the smallest possible value of m, because we found the smallest acceptable p, and from (4) clearly smaller values of p yield smaller values of m. Finally, s = p - W and M is simply a reinterpretation of m as a signed integer (which is how the mulhs instruction interprets it).
Forcing p to be at least W is justified by the following:
Theorem DC1. If (6) is true for some value of p, then it is true for all larger values of p.
Proof. Suppose (6) is true for p = p0. Multiplying (6) by 2 gives
From Theorem D5, Combining gives
Therefore, (6) is true for p = p0 + 1, and hence for all larger values.
Thus, one could solve (6) by a binary search, although a simple linear search (starting with p = W) is probably preferable, because usually d is small, and small values of d give small values of p.
We must show that (6) always has a solution and that 0 m < 2W. (It is not necessary to show that p W, because that is forced.)
We show that (6) always has a solution by getting an upper bound on p. As a matter of general interest, we also derive a lower bound under the assumption that p is not forced to be at least W. To get these bounds on p, observe that for any positive integer x, there is a power of 2 greater than x and less than or equal to 2x. Hence from (6),
Because 0 rem(2p, d) d - 1,
From (3a) and (3b), nc max(2W - 1 - d, d - 1). The lines f1(d) = 2W - 1 - d and f2(d) = d - 1 cross at d = (2W - 1 + 1)/2. Hence nc (2W - 1 - 1)/2. Because nc is an integer, nc 2W - 2. Because nc, d 2W - 1 - 1, (7) becomes
The lower bound p = W - 1 can occur (e.g., for W = 32, d = 3), but in that case we set p = W.
If p is not forced to equal W, then from (4) and (7),
Using (3b) gives
Because nc 2W - 1 - 1(3a),
If p is forced to equal W, then from (4),
Because 2 d 2W - 1 - 1 and nc 2W - 2,
Hence in either case m is within limits for the code schema illustrated by the "divide by 7" example.
Equation (5) and inequality (6) are easily seen to imply (4). (In the case that p is forced to be equal to W, (6) still holds, as shown by Theorem DC1.) In what follows, we consider separately the following five ranges of values of n:
From (4), because m is an integer,
Multiplying by n/2p, for n 0 this becomes
For 0 n nc, 0 (2p - 1)n/(2pdnc) < 1/d, so by Theorem D4,
Hence (1a) is satisfied in this case (0 n nc).
For n > nc, n is limited to the range
because n nc + d contradicts the choice of nc as the largest value of n such that rem(nc, d) = d - 1 (alternatively, from (3a), n nc + d implies n 2W - 1). From (4), for n 0,
By elementary algebra, this can be written
From (9), 1 n - nc d - 1, so
Because nc d - 1 (by (3b)) and (nc + 1)/nc has its maximum when nc has its minimum,
In (10), the term (nc + 1)/d is an integer. The term (n - nc)(nc + 1)/dnc is less than or equal to 1. Therefore, (10) becomes
For all n in the range (9), Hence (1a) is satisfied in this case (nc + 1 n nc + d - 1).
For n < 0, from (4) we have, because m is an integer,
Multiplying by n/2p for n < 0 this becomes
Using Theorem D2 gives
Because n + 1 0, the right inequality can be weakened, giving
For -nc n - 1,
Hence by Theorem D4,
so that (1b) is satisfied in this case (-nc n -1).
For n < -nc, n is limited to the range
(From (3a), n < - nc - d implies that n < -2W - 1, which is impossible.) Performing elementary algebraic manipulation of the left comparand of (11) gives
For -nc - d n - nc - 1,
The ratio (nc + 1)/nc is a maximum when nc is a minimum; that is, nc = d - 1. Therefore,
From (13), because (- nc - 1)/d is an integer and the quantity added to it is between 0 and -1,
For n in the range - nc - d + 1 n - nc - 1,
Hence 梩hat is, (1b) is satisfied.
The last case, n = - nc - d, can occur only for certain values of d. From (3a), - nc - d - 2W - 1, so if n takes on this value, we must have n = -nc - d = -2W - 1, and hence nc = 2W - 1 - d. Therefore, rem(2W - 1, d)= rem(nc + d, d) = d - 1 (that is, d divides 2W - 1 + 1).
For this case (n = - nc - d), (6) has the solution p = W - 1(the smallest possible value of p), because for p = W - 1,
Then from (5),
so that (1b) is satisfied.