10-11 Miscellaneous Topics (Unsigned)

Theorem DC2U. The least multiplier m is odd if p is not forced to equal W.

Theorem DC3U. For a given divisor d, there is only one multiplier m having the minimal value of p, if p is not forced to equal W.

The proofs of these theorems follow very closely the corresponding proofs for signed division.

The Divisors with the Best Programs (Unsigned)

For unsigned division, to find the divisors (if any) with optimal programs of two instructions to obtain the quotient (li, mulhu), we can do an analysis similar to that of the signed case (see "The Divisors with the Best Programs" on page 175). The result is that such divisors are the factors of 2^W or 2^W + 1, except for d = 1. For the common word sizes, this leaves very few nontrivial divisors that have optimal programs for unsigned division. For W = 16, there are none. For W = 32, there are only two: 641 and 6,700,417. For W = 64, again there are only two: 274,177 and 67,280,421,310,721.
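As an illustration for W = 32, the following C sketch models the two-instruction program for the divisors 641 and 6,700,417. It relies on the factorization 2^32 + 1 = 641 × 6,700,417, so that each factor serves as the multiplier for the other with no shift. The helper names mulhu32, div641, and div6700417 are hypothetical and chosen only for this example; mulhu32 stands in for the mulhu instruction.

```c
#include <assert.h>
#include <stdint.h>

/* High-order 32 bits of the 64-bit unsigned product (models mulhu). */
uint32_t mulhu32(uint32_t a, uint32_t b) {
    return (uint32_t)(((uint64_t)a * b) >> 32);
}

/* n / 641 modeled as "li M; mulhu q,M,n" with M = (2^32 + 1)/641. */
uint32_t div641(uint32_t n) {
    const uint32_t M = 6700417u;
    /* Sanity check of the factorization 641 * 6700417 = 2^32 + 1. */
    assert((uint64_t)M * 641u == (((uint64_t)1 << 32) + 1));
    return mulhu32(M, n);
}

/* n / 6700417 modeled the same way, with M = (2^32 + 1)/6700417 = 641. */
uint32_t div6700417(uint32_t n) {
    return mulhu32(641u, n);
}
```

For example, div641(641) yields 1 and div6700417(0xFFFFFFFF) yields 640, in agreement with ordinary unsigned division.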

The case d = 2^k, k = 1, 2, ..., deserves special mention. In this case, algorithm magicu produces p = W (forced), m = 2^(32 - k). This is the minimal value of m, but it is not the minimal value of M. Better code results if p = W + k is used, if sufficient simplifications are done. Then, m = 2^W, M = 0, a = 1, and s = k. The generated code involves a multiplication by 0 and can be simplified to a single shift right k instruction. As a practical matter, divisors that are a power of 2 would probably be special-cased without using magicu. (This phenomenon does not occur for signed division, because for signed division m cannot be a power of 2. Proof: For d > 0, inequality (4) combined with (3b) implies that d - 1 < 2^p/m < d. Therefore, 2^p/m cannot be an integer. For d < 0, the result follows similarly from (16) combined with (15b).)
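The two code shapes for d = 2^k can be compared with a small C sketch for W = 32. The function names are hypothetical; div_pow2_magic models the multiply by M = 2^(32 - k) with no shift, and div_pow2_shift models the simplified single shift. The sketch assumes 1 <= k <= 31 so that M fits in a 32-bit word.

```c
#include <assert.h>
#include <stdint.h>

/* High-order 32 bits of the 64-bit unsigned product (models mulhu). */
uint32_t mulhu32(uint32_t a, uint32_t b) {
    return (uint32_t)(((uint64_t)a * b) >> 32);
}

/* n / 2^k using the multiplier M = 2^(32 - k) and a shift amount of 0. */
uint32_t div_pow2_magic(uint32_t n, int k) {
    assert(k >= 1 && k <= 31);
    uint32_t M = (uint32_t)1 << (32 - k);
    return mulhu32(M, n);
}

/* n / 2^k after simplification: a single logical shift right by k. */
uint32_t div_pow2_shift(uint32_t n, int k) {
    assert(k >= 1 && k <= 31);
    return n >> k;
}
```

Both functions return the same quotient for every n; the second simply avoids the multiplication.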

For unsigned division, the code for the case m ≥ 2^W is considerably worse than the code for the case m < 2^W, if the machine does not have shrxi. Hence it is of interest to have some idea of how often the large multipliers arise. For W = 32, among the integers less than 100, there are 31 "bad" divisors: 1, 7, 14, 19, 21, 27, 28, 31, 35, 37, 38, 39, 42, 45, 53, 54, 55, 56, 57, 62, 63, 70, 73, 74, 76, 78, 84, 90, 91, 95, and 97.
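A count of this kind can be reproduced by brute force for small divisors. The C sketch below is only an illustration: it assumes that the minimal p is the smallest p >= W with 2^p > nc*(d - 1 - rem(2^p - 1, d)), where nc is the largest n with rem(n, d) = d - 1, and that m = (2^p + d - 1 - rem(2^p - 1, d))/d. These are taken to be the forms of (27) and (26) referred to in this chapter, which are not reproduced here. The function names are hypothetical, and the divisor is restricted to d < 2^16 so that every intermediate value fits in 64 bits.

```c
#include <assert.h>
#include <stdint.h>

/* Returns 1 if the minimal multiplier m for unsigned division by d
   (W = 32) satisfies m >= 2^32, and 0 otherwise. */
int is_bad_divisor(uint32_t d) {
    assert(d >= 1 && d < 65536u);
    const uint64_t two32 = (uint64_t)1 << 32;
    /* Assumed definition: nc is the largest n < 2^32 with rem(n, d) = d - 1. */
    uint64_t nc = two32 - (two32 % d) - 1;
    for (int p = 32; p < 63; p++) {
        uint64_t two_p = (uint64_t)1 << p;
        uint64_t delta = d - 1 - ((two_p - 1) % d);
        if (two_p > nc * delta) {             /* assumed form of (27) */
            uint64_t m = (two_p + delta) / d; /* assumed form of (26) */
            return m >= two32;
        }
    }
    return -1; /* not expected to be reached for d < 2^16 */
}

/* Number of "bad" divisors d with 1 <= d < limit. */
int count_bad_divisors(uint32_t limit) {
    int count = 0;
    for (uint32_t d = 1; d < limit; d++)
        if (is_bad_divisor(d) == 1) count++;
    return count;
}
```

Under these assumptions, is_bad_divisor reports d = 1 and d = 7 as bad and d = 3 as good, consistent with the list above.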

Using Signed in Place of Unsigned Multiply, and the Reverse

If your machine does not have mulhu, but it does have mulhs (or signed long multiplication), the trick given in "High-Order Product Signed from/to Unsigned," on page 132, might make our method of doing unsigned division by a constant still useful.

That section gives a seven-instruction sequence for getting mulhu from mulhs. However, for this application it simplifies, because the magic number M is known. Thus, the compiler can test the most significant bit of the magic number, and generate code such as the following for the operation "mulhu q,M,n." Here t denotes a temporary register.

M₃₁ = 0           M₃₁ = 1
mulhs q,M,n       mulhs q,M,n
shrsi t,n,31      shrsi t,n,31
and   t,t,M       and   t,t,M
add   q,q,t       add   t,t,n
                  add   q,q,t

Accounting for the other instructions used with mulhu, this uses a total of six to eight instructions to obtain the quotient of unsigned division by a constant, on a machine that does not have unsigned multiply.
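The two instruction sequences can be modeled in C as a rough check of the identity behind them. In this sketch, mulhs32 and mulhu32 stand in for the machine instructions and work on 32-bit patterns; mulhu_via_mulhs is a hypothetical name. The test of the most significant bit of M is written as a run-time if here, whereas a compiler would resolve it when generating code.

```c
#include <assert.h>
#include <stdint.h>

/* Bit pattern of the high 32 bits of the signed product (models mulhs). */
uint32_t mulhs32(uint32_t a, uint32_t b) {
    /* Sign-extend both operands to 64 bits, then multiply modulo 2^64. */
    uint64_t sa = (uint64_t)a - ((uint64_t)(a >> 31) << 32);
    uint64_t sb = (uint64_t)b - ((uint64_t)(b >> 31) << 32);
    return (uint32_t)((sa * sb) >> 32);
}

/* High 32 bits of the unsigned product (models mulhu), for comparison. */
uint32_t mulhu32(uint32_t a, uint32_t b) {
    return (uint32_t)(((uint64_t)a * b) >> 32);
}

/* "mulhu q,M,n" built from mulhs, following the two columns above. */
uint32_t mulhu_via_mulhs(uint32_t M, uint32_t n) {
    uint32_t q = mulhs32(M, n);      /* mulhs q,M,n                     */
    uint32_t t = 0u - (n >> 31);     /* shrsi t,n,31 (all 1s if n < 0)  */
    t &= M;                          /* and   t,t,M                     */
    if (M >> 31)                     /* column chosen by the compiler   */
        t += n;                      /* add   t,t,n  (M31 = 1 only)     */
    return q + t;                    /* add   q,q,t                     */
}
```

For any M and n, mulhu_via_mulhs(M, n) should agree with mulhu32(M, n).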

This trick may be inverted, to get mulhs in terms of mulhu. The code is the same as that above except the mulhs is changed to mulhu and the final add in each column is changed to sub.
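The inverted form can be sketched in C in the same way. As before, mulhu32 and mulhs32 model the instructions on 32-bit patterns, and mulhs_via_mulhu is a hypothetical name; the only changes from the unsigned case are the multiply used and the final subtraction.

```c
#include <assert.h>
#include <stdint.h>

/* High 32 bits of the unsigned product (models mulhu). */
uint32_t mulhu32(uint32_t a, uint32_t b) {
    return (uint32_t)(((uint64_t)a * b) >> 32);
}

/* Reference: bit pattern of the high 32 bits of the signed product. */
uint32_t mulhs32(uint32_t a, uint32_t b) {
    uint64_t sa = (uint64_t)a - ((uint64_t)(a >> 31) << 32);
    uint64_t sb = (uint64_t)b - ((uint64_t)(b >> 31) << 32);
    return (uint32_t)((sa * sb) >> 32);
}

/* "mulhs q,M,n" built from mulhu: same shape, final add becomes sub. */
uint32_t mulhs_via_mulhu(uint32_t M, uint32_t n) {
    uint32_t q = mulhu32(M, n);          /* mulhu q,M,n                 */
    uint32_t t = (0u - (n >> 31)) & M;   /* shrsi t,n,31; and t,t,M     */
    if (M >> 31)
        t += n;                          /* add   t,t,n  (M31 = 1 only) */
    return q - t;                        /* sub   q,q,t                 */
}
```

For instance, with M = 0xFFFFFFFF (that is, -1) and n = 5, the result is 0xFFFFFFFF, the high word of the 64-bit value -5.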

A Simpler Algorithm (Unsigned)

Dropping the requirement that the magic number be minimal yields a simpler algorithm. In place of (27) we can use

2^p ≥ 2^W (d - 1 - rem(2^p - 1, d))          (30)

and then use (26) to compute m, as before.

It should be clear that this algorithm is formally correct (that is, that the value of m computed does satisfy equation (22)), because its only difference from the previous algorithm is that it computes a value of p that, for some values of d, is unnecessarily large. It can be proved that the value of m computed from (30) and (26) is less than 2^(W + 1). We omit the proof and simply give the algorithm (Figure 10-3).

Figure 10-3 Simplified algorithm for computing the magic number, unsigned division.

struct mu {unsigned M;     // Magic number,
          int a;           // "add" indicator,
          int s;};         // and shift amount.

struct mu magicu2(unsigned d) {
                           // Must have 1 <= d <= 2**32-1.
   int p;
   unsigned p32, q, r, delta;
   struct mu magu;
   magu.a = 0;             // Initialize "add" indicator.
   p = 31;                 // Initialize p.
   q = 0x7FFFFFFF/d;       // Initialize q = (2**p - 1)/d.
   r = 0x7FFFFFFF - q*d;   // Init. r = rem(2**p - 1, d).
   do {
      p = p + 1;
      if (p == 32) p32 = 1;     // Set p32 = 2**(p-32).
      else p32 = 2*p32;
      if (r + 1 >= d - r) {
         if (q >= 0x7FFFFFFF) magu.a = 1;
         q = 2*q + 1;           // Update q.
         r = 2*r + 1 - d;       // Update r.
      }
      else {
         if (q >= 0x80000000) magu.a = 1;
         q = 2*q;               // Update q.
         r = 2*r + 1;           // Update r.
      }
      delta = d - 1 - r;
   } while (p < 64 && p32 < delta);
   magu.M = q + 1;         // Magic number and
   magu.s = p - 32;        // shift amount to return
   return magu;            // (magu.a was set above).
}

Alverson [Alv] gives a much simpler algorithm, discussed in the next section, but it gives somewhat large values for m. The point of algorithm magicu2 is that it nearly always gives the minimal value for m when d ≤ 2^(W - 1). For W = 32, the smallest divisor for which magicu2 does not give the minimal multiplier is d = 102,807, for which magicu calculates m = 2,737,896,999 and magicu2 calculates m = 5,475,793,997.
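To show how the three results of magicu2 fit together, the following C sketch repeats the algorithm of Figure 10-3 with fixed-width types and adds a hypothetical helper, divide_by_magic, that applies them. The helper is an assumption made for illustration: it takes the multiplier to be m = 2^32*a + M and the quotient to be the product m*n shifted right by 32 + s, and it uses 64-bit arithmetic rather than the instruction sequences a compiler would emit.

```c
#include <assert.h>
#include <stdint.h>

struct mu { uint32_t M; int a; int s; };  /* magic number, "add", shift */

/* Same steps as Figure 10-3; must have 1 <= d <= 2**32-1. */
struct mu magicu2(uint32_t d) {
    int p = 31;
    uint32_t p32 = 0, q, r, delta;
    struct mu magu;
    assert(d >= 1);
    magu.a = 0;
    q = 0x7FFFFFFFu / d;              /* q = (2**p - 1)/d     */
    r = 0x7FFFFFFFu - q * d;          /* r = rem(2**p - 1, d) */
    do {
        p = p + 1;
        if (p == 32) p32 = 1;         /* p32 = 2**(p-32)      */
        else p32 = 2 * p32;
        if (r + 1 >= d - r) {
            if (q >= 0x7FFFFFFFu) magu.a = 1;
            q = 2 * q + 1;
            r = 2 * r + 1 - d;
        } else {
            if (q >= 0x80000000u) magu.a = 1;
            q = 2 * q;
            r = 2 * r + 1;
        }
        delta = d - 1 - r;
    } while (p < 64 && p32 < delta);
    magu.M = q + 1;
    magu.s = p - 32;
    return magu;
}

/* Hypothetical helper: quotient n/d from the values returned above. */
uint32_t divide_by_magic(uint32_t n, struct mu mg) {
    uint64_t hi = ((uint64_t)mg.M * n) >> 32;  /* mulhu(M, n)           */
    if (mg.a) hi += n;                         /* account for m >= 2^32 */
    return (uint32_t)(hi >> mg.s);
}
```

For d = 102,807 this sketch gives M = 1,180,826,701 with a = 1 and s = 17, that is, m = 2^32 + M = 5,475,793,997, the value quoted above.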

There is an analog of magicu2 for signed division by positive divisors, but it does not work out very well for signed division by arbitrary divisors.