For cube roots, Newton's method does not work out very well. The iterative formula is a bit complex:
and there is of course the problem of getting a good starting value x0.
However, there is a hardware algorithm, similar to the hardware algorithm for square root, that is not too bad for software. It is shown in Figure 11-5.
int icbrt(unsigned x) {
牋 int s;
牋爑nsigned y, b;
牋 s = 30;
牋爕 = 0;
牋爓hile(s >= 0) {牋牋牋牋牋牋?// Do 11 times.
牋牋牋y = 2*y;
牋牋牋b = (3*y*(y + 1) + 1) << s;
牋牋牋s = s - 3;
牋牋牋if (x >= b) {
牋牋牋牋 x = x - b;
牋牋牋牋爕 = y + 1;
牋牋牋}
牋爙
牋爎eturn y;
}
The three add's of 1 can be replaced by or's of 1, because the value being incremented is even. Even with this change, the algorithm is of questionable value for implementation in hardware, mainly because of the multiplication y * (y + 1).
This multiplication is easily avoided by applying the compiler optimization of strength reduction to the y-squared term. Introduce another unsigned variable y2 that will have the value of y-squared, by updating y2 appropriately wherever y receives a new value. Just before y = 0 insert y2 = 0. Just before y = 2*y insert y2 = 4*y2. Change the assignment to b to b = (3*y2 + 3*y + 1) << s (and factor out the 3). Just before y = y + 1, insert y2 = y2 + 2*y + 1. The resulting program has no multiplications except by small constants, which can be changed to shift's and add's. This program has three add's of 1, which can all be changed to or's of 1. It is faster unless your machine's multiply instruction takes only two or fewer cycles.
Caution: [GLS1] points out that the code of Figure 11-5, and its strength-reduced derivative, do not work if adapted in the obvious way to a 64-bit machine. The assignment to b can then overflow. This problem can be avoided by dropping the shift left of s from the assignment to b, inserting after the assignment to b the assignment bs = b << s, and changing the two lines if (x >= b) {x = x - b ?to if (x >= bs && b == (bs >> s)) {x = x - bs ?