10-11 Miscellaneous Topics (Unsigned)

Theorem DC2U. The least multiplier m is odd if p is not forced to equal W.

Theorem DC3U. For a given divisor d, there is only one multiplier m having the minimal value of p, if p is not forced to equal W.

The proofs of these theorems follow very closely the corresponding proofs for signed division.

The Divisors with the Best Programs (Unsigned)

For unsigned division, to find the divisors (if any) with optimal programs of two instructions to obtain the quotient (li, mulhu), we can do an analysis similar to that of the signed case (see "The Divisors with the Best Programs" on page 175). The result is that such divisors are the factors of 2W or 2W + 1, except for d = 1. For the common word sizes, this leaves very few nontrivial divisors that have optimal programs for unsigned division. For W = 16, there are none. For W = 32, there are only two: 641 and 6,700,417. For W = 64, again there are only two: 274,177 and 67,280,421,310,721.

The case d = 2k, k = 1, 2, ? deserves special mention. In this case, algorithm magicu produces p = W (forced), m = 232 - k. This is the minimal value of m, but it is not the minimal value of M. Better code results if p = W + k is used, if sufficient simplifications are done. Then, m = 2W, M = 0, a = 1, and s - k. The generated code involves a multiplication by 0 and can be simplified to a single shift right k instruction. As a practical matter, divisors that are a power of 2 would probably be special-cased without using magicu. (This phenomenon does not occur for signed division, because for signed division m cannot be a power of 2. Proof: For d > 0, inequality (4) combined with (3b) implies that d - 1 < 2p/m < d. Therefore, 2p/m cannot be an integer. For d < 0, the result follows similarly from (16) combined with (15b)).

For unsigned division, the code for the case m 2W is considerably worse than the code for the case m < 2W, if the machine does not have shrxi. Hence it is of interest to have some idea of how often the large multipliers arise. For W = 32, among the integers less than 100, there are 31 "bad" divisors: 1, 7, 14, 19, 21, 27, 28, 31, 35, 37, 38, 39, 42, 45, 53, 54, 55, 56, 57, 62, 63, 70, 73, 74, 76, 78, 84, 90, 91, 95, and 97.

Using Signed in Place of Unsigned Multiply, and the Reverse

If your machine does not have mulhu, but it does have mulhs (or signed long multiplication), the trick given in "High-Order Product Signed from/to Unsigned," on page 132, might make our method of doing unsigned division by a constant still useful.

That section gives a seven-instruction sequence for getting mulhu from mulhs. However, for this application it simplifies, because the magic number M is known. Thus, the compiler can test the most significant bit of the magic number, and generate code such as the following for the operation "mulhu q,M,n." Here t denotes a temporary register.

?M31 = 0牋牋牋牋牋 M31 = 1 
mulhs q,M,n牋牋牋 mulhs q,M,n 
shrsi t,n,31牋牋?shrsi t,n,31 
and牋 t,t,M牋牋牋 and牋 t,t,M 
add牋 q,q,t牋牋牋 add牋 t,t,n 
牋牋牋牋牋牋牋牋牋add牋 q,q,t 

Accounting for the other instructions used with mulhu, this uses a total of six to eight instructions to obtain the quotient of unsigned division by a constant, on a machine that does not have unsigned multiply.

This trick may be inverted, to get mulhs in terms of mulhu. The code is the same as that above except the mulhs is changed to mulhu and the final add in each column is changed to sub.

A Simpler Algorithm (Unsigned)

Dropping the requirement that the magic number be minimal yields a simpler algorithm. In place of (27) we can use

Equation 30



and then use (26) to compute m, as before.

It should be clear that this algorithm is formally correct (that is, that the value of m computed does satisfy equation (22)), because its only difference from the previous algorithm is that it computes a value of p that, for some values of d, is unnecessarily large. It can be proved that the value of m computed from (30) and (26) is less than 2W + 1. We omit the proof and simply give the algorithm (Figure 10-3).

Figure 10-3 Simplified algorithm for computing the magic number, unsigned division.
struct mu {unsigned M;牋牋 // Magic number, 
牋牋牋牋牋int a;牋牋牋牋牋 // "add" indicator, 
牋牋牋牋牋int s;};牋牋牋牋 // and shift amount. 
struct mu magicu2(unsigned d) {
牋牋牋牋牋牋牋牋牋牋牋牋牋 // Must have 1 <= d <= 2**32-1. 
牋爄nt p; 
牋爑nsigned p32, q, r, delta; 
牋爏truct mu magu; 
牋爉agu.a = 0;牋牋牋牋牋牋 // Initialize "add" indicator. 
牋爌 = 31;牋牋牋牋牋牋牋牋 // Initialize p. 
牋爍 = 0x7FFFFFFF/d;牋牋牋 // Initialize q = (2**p - 1)/d. 
牋爎 = 0x7FFFFFFF - q*d;牋 // Init. r = rem(2**p - 1, d). 
牋燿o {
牋牋?p = p + 1; 
牋牋牋if (p == 32) p32 = 1;牋牋 // Set p32 = 2**(p-32). 
牋牋牋else p32 = 2*p32; 
牋牋牋if (r + 1 >= d - r) {
牋牋牋牋 if (q >= 0x7FFFFFFF) magu.a = 1; 
牋牋牋牋爍 = 2*q + 1;牋牋牋牋牋 // Update q. 
牋牋牋牋爎 = 2*r + 1 - d;牋牋牋 // Update r. 
牋牋牋else {
牋牋牋牋 if (q >= 0x80000000) magu.a = 1; 
牋牋牋牋?/span>q = 2*q; 
牋牋牋牋爎 = 2*r + 1; 
牋牋牋delta = d - 1 - r; 
lang=EN-GB>} while (p < 64 && p32 < delta); 
牋爉agu.M = q + 1;牋牋牋牋 // Magic number and 
牋爉agu.s = p - 32;牋牋牋?// shift amount to return 
牋爎eturn magu;牋牋牋牋牋?// (magu.a was set above). 

Alverson [Alv] gives a much simpler algorithm, discussed in the next section, but it gives somewhat large values for m. The point of algorithm magicu2 is that it nearly always gives the minimal value for m when d 2W - 1. For W = 32, the smallest divisor for which magicu2 does not give the minimal multiplier is d = 102,807, for which magicu calculates m = 2,737,896,999 and magicu2 calculates m = 5,475,793,997.

There is an analog of magicu2 for signed division by positive divisors, but it does not work out very well for signed division by arbitrary divisors.