10-6 Incorporation into a Compiler

For a compiler to change division by a constant into a multiplication, it must compute the magic number M and the shift amount s, given a divisor d. The straightforward computation is to evaluate (6) or (18) for p = W, W + 1, ?until it is satisfied. Then, m is calculated from (5) or (17). M is simply a reinterpretation of m as a signed integer, and s = p - W.

The scheme described below handles positive and negative d with only a little extra code, and it avoids doubleword arithmetic.

Recall that nc is given by



Hence |nc| can be computed from



The remainder must be evaluated using unsigned division, because of the magnitude of the arguments. We have written rem(t, |d|) rather than the equivalent rem(t, d), to emphasize that the program must deal with two positive (and unsigned) arguments.

From (6) and (18), p can be calculated from

Equation 19



and then |m| can be calculated from (c.f. (5) and (17)):

Equation 20



Direct evaluation of rem(2p,|d|) in (19) requires "long division" (dividing a 2W-bit dividend by a W-bit divisor, giving a W-bit quotient and remainder), and in fact it must be unsigned long division. There is a way to solve (19), and in fact to do all the calculations, that avoids long division and can easily be implemented in a conventional HLL using only W-bit arithmetic. We do, however, need unsigned division and unsigned comparisons.

We can calculate rem(2p, |d|) incrementally, by initializing two variables q and r to the quotient and remainder of 2p divided by |d| with p = 2W - 1, and then updating q and r as p increases.

As the search progresses梩hat is, when p is incremented by 1?span class=docemphasis1>q and r are updated from (see Theorem D5(a))

q = 2*q; 
r = 2*r; 
if (r >= abs(d)) {
牋 q = q + 1; 
牋爎 = r - abs(d);} 

The left half of inequality (4) and the right half of (16), together with the bounds proved for m, imply that graphics/10icon90.gifso q is representable as a W-bit unsigned integer. Also, 0 r < |d| so r is representable as a W-bit signed or unsigned integer. (Caution: The intermediate result 2r can exceed 2W - 1 - 1, so r should be unsigned and the comparison above should also be unsigned.)

Next, calculate d = |d| - r. Both terms of the subtraction are representable as W-bit unsigned integers, and the result is also (1 d |d|), so there is no difficulty here.

To avoid the long multiplication of (19), rewrite it as



The quantity graphics/10icon92.gifis representable as a W-bit unsigned integer (similarly to (7), from (19) it can be shown that graphics/10icon93.gifand, for d = -2W - 1, nc = - 2W - 1 + 1 and p = 2W - 2 so that graphics/10icon94.giffor W 3). Also, it is easily calculated incrementally (as p increases) in the same manner as for rem(2p, |d|). The comparison should be unsigned, for the case graphics/10icon95.gif(which can occur, for large d).

To compute m, we need not evaluate (20) directly (which would require long division). Observe that



The loop closure test graphics/10icon97.gifis awkward to evaluate. The quantity graphics/10icon98.gifis available only in the form of a quotient q1 and a remainder graphics/10icon99.gifmay or may not be an integer (it is an integer only for d = 2W - 2 + 1 and a few negative values of d). The test graphics/10icon100.gifmay be coded as



The complete procedure for computing M and s from d is shown in Figure 10-1, coded in C, for W = 32. There are a few places where overflow can occur, but the correct result is obtained if overflow is ignored.

Figure 10-1 Computing the magic number for signed division.
struct ms {int M;牋牋牋牋?// Magic number 
牋牋牋牋牋int s;};牋牋牋牋 // and shift amount. 
struct ms magic(int d) {牋 // Must have 2 <= d <= 2**31-1 
牋牋牋牋牋牋牋牋牋牋牋牋牋?/ or牋 -2**31 <= d <= -2. 
牋爄nt p; 
牋爑nsigned ad, anc, delta, q1, r1, q2, r2, t; 
牋燾onst unsigned two31 = 0x80000000;牋牋 // 2**31. 
牋爏truct ms mag; 
牋 ad = abs(d); 
牋爐 = two31 + ((unsigned)d >> 31); 
牋燼nc = t - 1 - t%ad;牋牋 // Absolute value of nc. 
牋爌 = 31;牋牋牋牋牋牋牋牋 // Init. p. 
牋爍1 = two31/anc;牋牋牋牋 // Init. q1 = 2**p/|nc|. 
牋爎1 = two31 - q1*anc;牋?// Init. r1 = rem(2**p, |nc|). 
牋爍2 = two31/ad;牋牋牋牋?// Init. q2 = 2**p/|d|. 
牋爎2 = two31 - q2*ad;牋牋 // Init. r2 = rem(2**p, |d|). 
牋燿o {
牋牋?p = p + 1; 
牋牋牋q1 = 2*q1;牋牋牋牋牋 // Update q1 = 2**p/|nc|. 
牋牋牋r1 = 2*r1;牋牋牋牋牋 // Update r1 = rem(2**p, |nc|. 
牋牋牋if (r1 >= anc) {牋牋 // (Must be an unsigned 
牋牋牋牋爍1 = q1 + 1;牋牋?// comparison here). 
牋牋牋牋爎1 = r1 - anc;} 
牋牋牋q2 = 2*q2;牋牋牋牋牋 // Update q2 = 2**p/|d|. 
牋牋牋r2 = 2*r2;牋牋牋牋牋 // Update r2 = rem(2**p, |d|. 
牋牋牋if (r2 >= ad) {牋牋?// (Must be an unsigned 
牋牋牋牋爍2 = q2 + 1;牋牋?// comparison here). 
牋牋牋牋爎2 = r2 - ad;} 
牋牋牋delta = ad - r2; 
牋爙 while (q1 < delta || (q1 == delta && r1 == 0)); 
mag.M = q2 + 1; 
牋爄f (d < 0) mag.M = -mag.M; // Magic number and 
牋爉ag.s = p - 32;牋牋牋牋牋?// shift amount to return. 
牋爎eturn mag; 

To use the results of this program, the compiler should generate the li and mulhs instructions, generate the add if d > 0 and M < 0, or the sub if d < 0 and M > 0, and generate the shrsi if s > 0. Then, the shri and final add must be generated.

For W = 32, handling a negative divisor may be avoided by simply returning a precomputed result for d = 3 and d = 715,827,883, and using m(-d) = -m(d) for other negative divisors. However, that program would not be significantly shorter, if at all, than the one given in Figure 10-1.