Theorem DC2U. The least multiplier m is odd if p is not forced to equal W.

Theorem DC3U. For a given divisor d, there is only one multiplier m having the minimal value of p, if p is not forced to equal W.

The proofs of these theorems follow very closely the corresponding proofs for signed division.

For unsigned division, to find the divisors (if any) with
optimal programs of two instructions to obtain the quotient (`li`, `mulhu`), we can do an analysis similar to
that of the signed case (see "The Divisors with the Best Programs" on
page 175). The result is that such divisors are the factors of 2^{W} or 2^{W}
+ 1, except for d = 1. For the common word
sizes, this leaves very few nontrivial divisors that have optimal programs for
unsigned division. For W = 16, there are none.
For W = 32, there are only two: 641 and
6,700,417. For W = 64, again there are only
two: 274,177 and 67,280,421,310,721.

The case d = 2^{k}, k = 1,
2, ? deserves special mention. In this case, algorithm magicu produces p = W (forced), m = 2^{32
-} ^{k}. This is the minimal value
of m, but it is not the minimal value of M. Better code results if p
= W + k is
used, if sufficient simplifications are done. Then, m
= 2^{W}, M
= 0, a = 1, and s
- k. The generated code involves a
multiplication by 0 and can be simplified to a single shift
right k instruction. As a practical matter, divisors that are a power of
2 would probably be special-cased without using magicu.
(This phenomenon does not occur for signed division, because for signed
division m cannot be a power of 2. Proof: For d > 0, inequality (4) combined with (3b) implies
that d - 1 < 2^{p}/m < d. Therefore,
2^{p}/m
cannot be an integer. For d < 0, the result
follows similarly from (16) combined with (15b)).

For unsigned division, the code for the case m 2^{W} is
considerably worse than the code for the case m
< 2^{W}, if the machine does not
have `shrxi`. Hence it is of
interest to have some idea of how often the large multipliers arise. For W = 32, among the integers less than 100, there are
31 "bad" divisors: 1, 7, 14, 19, 21, 27, 28, 31, 35, 37, 38, 39, 42,
45, 53, 54, 55, 56, 57, 62, 63, 70, 73, 74, 76, 78, 84, 90, 91, 95, and 97.

If your machine does not have `mulhu`,
but it does have `mulhs` (or
signed long multiplication), the trick given in "High-Order Product Signed from/to Unsigned,"
on page 132, might make our method of doing unsigned division by a constant
still useful.

That section gives a seven-instruction sequence for getting `mulhu` from `mulhs`. However, for this application it simplifies, because
the magic number M is known. Thus, the compiler
can test the most significant bit of the magic number, and generate code such
as the following for the operation "`mulhu`
`q,M,n`." Here `t` denotes a temporary register.

?M_{31}= 0牋牋牋牋牋 M_{31}= 1

mulhs q,M,n牋牋牋 mulhs q,M,n

shrsi t,n,31牋牋?shrsi t,n,31

and牋 t,t,M牋牋牋 and牋 t,t,M

add牋 q,q,t牋牋牋 add牋 t,t,n

牋牋牋牋牋牋牋牋牋add牋 q,q,t

Accounting for the other instructions used with `mulhu`, this uses a total of six to eight
instructions to obtain the quotient of unsigned division by a constant, on a
machine that does not have unsigned multiply.

This trick may be inverted, to get `mulhs` in terms of `mulhu`.
The code is the same as that above except the `mulhs`
is changed to `mulhu` and the
final `add` in each column is
changed to `sub`.

Dropping the requirement that the magic number be minimal yields a simpler algorithm. In place of (27) we can use

and then use (26) to compute m, as before.

It should be clear that this algorithm is formally correct
(that is, that the value of m computed does
satisfy equation (22)), because its only difference
from the previous algorithm is that it computes a value of p that, for some values of d,
is unnecessarily large. It can be proved that the value of m computed from (30) and (26) is less than 2^{W} ^{+ 1}. We omit the proof and
simply give the algorithm (Figure 10-3).

struct mu {unsigned M;牋牋 // Magic number,

牋牋牋牋牋int a;牋牋牋牋牋 // "add" indicator,

牋牋牋牋牋int s;};牋牋牋牋 // and shift amount.

struct mu magicu2(unsigned d) {

牋牋牋牋牋牋牋牋牋牋牋牋牋 // Must have 1 <= d <= 2**32-1.

牋爄nt p;

牋爑nsigned p32, q, r, delta;

`牋爏truct mu magu; `

`牋爉agu.a = 0;牋牋牋牋牋牋 // Initialize "add" indicator. `

`牋爌 = 31;牋牋牋牋牋牋牋牋 // Initialize p. `

`牋爍 = 0x7FFFFFFF/d;牋牋牋 // Initialize q = (2**p - 1)/d. `

`牋爎 = 0x7FFFFFFF - q*d;牋 // Init. r = rem(2**p - 1, d). `

`牋燿o {`

`牋牋?p = p + 1; `

`牋牋牋if (p == 32) p32 = 1;牋牋 // Set p32 = 2**(p-32). `

`牋牋牋else p32 = 2*p32; `

`牋牋牋if (r + 1 >= d - r) {`

`牋牋牋牋 if (q >= 0x7FFFFFFF) magu.a = 1; `

`牋牋牋牋爍 = 2*q + 1;牋牋牋牋牋 // Update q. `

`牋牋牋牋爎 = 2*r + 1 - d;牋牋牋 // Update r. `

`牋牋牋} `

`牋牋牋else {`

`牋牋牋牋 if (q >= 0x80000000) magu.a = 1; `

`牋牋牋牋?/span>q = 2*q; `

牋牋牋牋爎 = 2*r + 1;

牋牋牋}

牋牋牋delta = d - 1 - r;

牋?span lang=EN-GB>} while (p < 64 && p32 < delta);

`牋爉agu.M = q + 1;牋牋牋牋 // Magic number and `

`牋爉agu.s = p - 32;牋牋牋?// shift amount to return `

`牋爎eturn magu;牋牋牋牋牋?// (magu.a was set above). `

`} `

Alverson [Alv] gives a much
simpler algorithm, discussed in the next section, but it gives somewhat large
values for m. The point of algorithm magicu2 is that it nearly always gives the minimal
value for m when d
2^{W} ^{- 1}. For W = 32, the smallest divisor for which magicu2 does not give the minimal multiplier is d = 102,807, for which magicu
calculates m = 2,737,896,999 and magicu2 calculates m =
5,475,793,997.

There is an analog of magicu2 for signed division by positive divisors, but it does not work out very well for signed division by arbitrary divisors.