### 14-2 Coordinates from Distance along the Hilbert Curve

To find the (x, y) coordinates of a point located at a distance s along the order n Hilbert curve, observe that the most significant two bits of the 2n-bit integer s determine which major quadrant the point is in. This is because the Hilbert curve of any order follows the overall pattern of the order 1 curve. If the most significant two bits of s are 00, the point is somewhere in the lower left quadrant, if 01 it is in the upper left quadrant, if 10 it is in the upper right quadrant, and if 11 it is in the lower right quadrant. Thus, the most significant two bits of s determine the most significant bits of the n-bit integers x and y, as follows:

 Most significant two bits of s Most significant bits of (x, y) 00 (0, 0) 01 (0, 1) 10 (1, 1) 11 (1, 0)

In any Hilbert curve, only four of the eight possible U-shapes occur. These are shown in Table 14-1 as graphics and as maps from two bits of s to a single bit of each of x and y.

Observe from Figure 14-2 that in all cases, the U-shape represented by map becomes, at the next level of detail, a U-shape represented by maps B, A, A, or D, depending on whether the length traversed in the first-mentioned map A is 0, 1, 2, or 3, respectively. Similarly, a U-shape represented by map becomes, at the next level of detail, a U-shape represented by maps A, B, B, or C, depending on whether the length traversed in the first-mentioned map B is 0, 1, 2, or 3, respectively.

These observations lead to the state transition table shown in Table 14-2, in which the states correspond to the mappings shown in Table 14-1.

A

B

C

D

00 (0, 0)

00 (0, 0)

00 (1, 1)

00 (1, 1)

01 (0, 1)

01 (1, 0)

01 (1, 0)

01 (0, 1)

10 (1, 1)

10 (1, 1)

10 (0, 0)

10 (0, 0)

11 (1, 0)

11 (0, 1)

11 (0, 1)

11 (1 0)

##### Table 14-2. State transition table for computing (X, Y) from S

If the current state is

and the next (to right) two bits of s are

then append to (x, y)

and enter state

A

00

(0, 0)

B

A

01

(0, 1)

A

A

10

(1, 1)

A

A

11

(1, 0)

D

B

00

(0, 0)

A

B

01

(1, 0)

B

B

10

(1, 1)

B

B

11

(0, 1)

C

C

00

(1, 1)

D

C

01

(1, 0)

C

C

10

(0, 0)

C

C

11

(0, 1)

B

D

00

(1, 1)

C

D

01

(0, 1)

D

D

10

(0, 0)

D

D

11

(1, 0)

A

To use the table, start in state A. The integer s should be padded with leading zeros so that its length is 2n, where n is the order of the Hilbert curve. Scan the bits of s in pairs from left to right. The first row of Table 14-2 means that if the current state is A and the currently scanned bits of s are 00, then output (0, 0) and enter state B. Then, advance to the next two bits of s. Similarly, the second row means that if the current state is A and the scanned bits are 01, then output (0, 1) and stay in state A.

The output bits are accumulated in left-to-right order. When the end of s is reached, the n-bit output quantities x and y are defined.

As an example, suppose n = 3 and

Because the process starts in state A and the initial bits scanned are 11, the process outputs (1, 0) and enters state D (fourth row). Then, in state D and scanning 01, the process outputs (0, 1) and stays in state D. Lastly, the process outputs (1, 1) and enters state C, although the state is now immaterial.

Thus, the output is (101, 011)梩hat is, x = 5 and y = 3.

A C program implementing these steps is shown in Figure 14-5. In this program, the current state is represented by an integer from 0 to 3 for states A through D, respectively. In the assignment to variable row, the current state is concatenated with the next two bits of s, giving an integer from 0 to 15, which is the applicable row number in Table 14-2. Variable row is used to access integers (expressed in hexadecimal) that are used as bit strings to represent the rightmost two columns of Table 14-2; that is, these accesses are in-register table lookups. Left-to-right in the hexadecimal values corresponds to bottom-to-top in Table 14-2.

##### Figure 14-5 Program for computing (x, y) from s.
`void hil_xy_from_s(unsigned s, int n, unsigned *xp, `
`牋牋牋牋牋牋牋牋牋牋牋牋牋牋牋牋牋牋牋unsigned *yp) {`
` `
`牋 int i; `
`牋爑nsigned state, x, y, row; `
` `
`牋 state = 0;牋牋牋牋牋牋牋牋牋牋牋牋牋 ?/ Initialize. `
`牋爔 = y = 0; `
` `
`牋 for (i = 2*n - 2; i >= 0; i -= 2) {牋 // Do n times. `
`牋牋牋row = 4*state | (s >> i) & 3;牋牋?// Row in table. `
`牋牋牋x = (x << 1) | (0x936C >> row) & 1; `
`牋牋牋y = (y << 1) | (0x39C6 >> row) & 1; `
`牋牋牋state = (0x3E6B94C1 >> 2*row) & 3; // New state. `
`牋爙 `
`牋?xp = x;牋牋牋牋牋牋牋牋牋牋牋牋牋牋?// Pass back `
`牋?yp = y;牋牋牋牋牋牋牋牋牋牋牋牋牋牋?// results. `
`} `

[L&S] give a quite different algorithm. Unlike the algorithm of Figure 14-5, it scans the bits of s from right to left. It is based on the observation that one can map the least significant two bits of s to (x, y) based on the order 1 Hilbert curve, and then test the next two bits of s to the left. If they are 00, the values of x and y just computed should be interchanged, which corresponds to reflecting the order 1 Hilbert curve about the line x = y. (Refer to the curves of orders 1 and 2 shown in Figure 14-1 on page 241.) If these two bits are 01 or 10, the values of x and y are not changed. If they are 11, the values of x and y are interchanged and complemented. These same rules apply as one progresses leftward along the bits of s. They are embodied in Table 14-3 and the code of Figure 14-6. It is somewhat curious that the bits can be prepended to x and y first, and then the swap and complement operations can be done, including these newly prepended bits; the results are the same.

##### Figure 14-6 Lam and Shapiro method for computing (x, y) from s.
`void hil_xy_from_s(unsigned s, int n, unsigned *xp, `
`牋牋牋牋牋牋牋牋牋牋牋牋牋牋牋牋牋牋牋unsigned *yp) {`
` `
`牋 int i, sa, sb; `
`牋爑nsigned x, y, temp; `
` `
`牋 for (i = 0; i < 2*n; i += 2) {`
`牋牋?sa = (s >> (i+1)) & 1;牋牋?// Get bit i+1 of s. `
`牋牋牋sb = (s >> i) & 1;牋牋牋牋?// Get bit i of s. `
` `
`牋牋?if ((sa ^ sb) == 0) {牋牋牋 // If sa,sb = 00 or 11, `
`牋牋牋牋爐emp = x;牋牋牋牋牋牋牋?// swap x and y, `
`牋牋牋牋爔 = y^(-sa);牋牋牋牋牋牋 // and if sa = 1, `
`牋牋牋牋爕 = temp^(-sa);牋牋牋牋?// complement them. `
`牋牋牋} `
`牋牋牋x = (x >> 1) | (sa << 31);?// Prepend sa to x and `
`牋牋牋y = (y >> 1) | ((sa ^ sb) << 31); // (sa^sb) to y. `
`牋爙 `
`牋?xp = x >> (32 - n);牋牋牋牋牋 // Right-adjust x and y `
`牋?yp = y >> (32 - n);牋牋牋牋牋 // and return them to `
`}牋牋牋牋牋牋牋牋牋牋牋牋牋牋牋牋 // the caller. `

In Figure 14-6, variables x and y are uninitialized, which might cause an error message from some compilers. But the code functions correctly for whatever values x and y have initially.

The branch in the loop of Figure 14-6 can be avoided by doing the swap operation with the "three exclusive or" trick given in Section 2-19 on page 38.

##### Table 14-3. Lam and Shapiro method for computing (X, Y) from S

If the next (to left) two bits of s are

then

and prepend to (x, y)

00

Swap x and y

(0, 0)

01

No change

(0, 1)

10

No change

(1, 1)

11

Swap and complement x and y

(1, 0)

The if block can be replaced by the following code, where swap and cmpl are unsigned integers:

`swap = (sa ^ sb) - 1;?// -1 if should swap, else 0. `
`cmpl = -(sa & sb);牋牋 // -1 if should compl't, else 0. `
`x = x ^ y; `
`y = y ^ (x & swap) ^ cmpl; `
`x = x ^ y; `

However, this is nine instructions, versus about two or six for the if block, so the branch cost would have to be quite high for this to be a good choice.

The "swap and complement" idea of [L&S] suggests a logic circuit for generating the Hilbert curve. The idea behind the circuit described below is that as you trace along the path of an order n curve, you basically map pairs of bits of s to (x, y) according to map A of Table 14-1. As the trace enters various regions, however, the mapping output gets swapped, complemented, or both. The circuit of Figure 14-7 keeps track of the swap and complement requirements of each stage, uses the appropriate mapping to map two bits of s to (xi, yi), and generates the swap and complement signals for the next stage.

##### Figure 14-7. Logic circuit for incrementing (x, y) by one step along the Hilbert curve.

Assume there is a register containing the path length s and circuits for incrementing it. Then, to find the next point on the Hilbert curve, first increment s and then transform it as described in Table 14-4. This is a left-to-right process, which is a bit of a problem because incrementing s is a right-to-left process. Thus, the time to generate a new point on an order n Hilbert curve is proportional to 2n (for incrementing s) plus n (for transforming s to (x, y)).

Figure 14-7 shows this computation as a logic circuit. In this figure, S denotes the swap signal and C denotes the complement signal.

##### Table 14-4. Logic for computing (X, Y) from S

If the next (to right) two bits of s are

then append to (x, y)

and set

00

(0, 0)[*]

01

(0, 1)[*]

No change

10

(1, 1)[*]

No change

11

(1, 0)[*]

[*] Possibly swapped and/or complemented

The logic circuit of Figure 14-7 suggests another way to compute (x, y) from s. Notice how the swap and complement signals propagate from left to right through the n stages. This suggests that it may be possible to use the parallel prefix operation to quickly (in log2n steps rather than n - 1) propagate the swap and complement information to each stage, and then do some word-parallel logical operations to compute x and y, using the equations in Figure 14-7. The values of x and y are intermingled in the even and odd bit positions of a word, so they have to be separated by the unshuffle operation (see page 107). This might seem a bit complicated, and likely to pay off only for rather large values of n, but let us see how it goes.

A procedure for this operation is shown in Figure 14-8 [GLS1]. The procedure operates on fullword quantities, so it first pads the input s on the left with '01' bits. This bit combination does not affect the swap and complement quantities. Next, a quantity cs (complement-swap) is computed. This word is of the form cscs...cs, where each c (a single bit), if 1, means that the corresponding pair of bits is to be complemented, and each s means that the corresponding pair of bits is to be swapped, following Table 14-3. In other words, these two statements map each pair of bits of s as follows:

 s2i + 1 s2i cs 0 0 01 0 1 00 1 0 00 1 1 11
##### Figure 14-8 Parallel prefix method for computing (x, y) from s.
`void hil_xy_from_s(unsigned s, int n, unsigned *xp, `
`牋牋牋牋牋牋牋牋牋牋牋牋牋牋牋牋牋牋牋unsigned *yp) {`
`牋 unsigned comp, swap, cs, t, sr; `
` `
`牋 s = s | (0x55555555 << 2*n); // Pad s on left with 01 `
`牋爏r = (s >> 1) & 0x55555555;?// (no change) groups. `
`牋燾s = ((s & 0x55555555) + sr) // Compute complement & `
`牋牋牋牋^ 0x55555555;牋牋牋牋牋 // swap info in two-bit `
`牋牋牋牋牋牋牋牋牋牋牋牋牋牋牋牋// groups. `
`牋?/ Parallel prefix xor op to propagate both complement `
`牋?/ and swap info together from left to right (there is `
`牋?/ no step "cs ^= cs >> 1", so in effect it computes `
`牋?/ two independent parallel prefix operations on two `
`牋?/ interleaved sets of sixteen bits). `
` `
`牋 cs = cs ^ (cs >> 2); `
`牋燾s = cs ^ (cs >> 4); `
`牋燾s = cs ^ (cs >> 8); `
`牋燾s = cs ^ (cs >> 16); `
`牋爏wap = cs & 0x55555555;牋牋?// Separate the swap and `
`牋燾omp = (cs >> 1) & 0x55555555;?// complement bits. `
` `
`牋 t = (s & swap) ^ comp;牋牋牋 // Calculate x and y in `
`牋爏 = s ^ sr ^ t ^ (t << 1);牋 // the odd & even bit `
`牋牋牋牋牋牋牋牋牋牋牋牋牋牋牋牋// positions, resp. `
`牋爏 = s & ((1 << 2*n) - 1);牋?// Clear out any junk `
`牋牋牋牋牋牋牋牋牋牋牋牋牋牋牋牋// on the left (unpad). `
` `
`牋 // Now "unshuffle" to separate the x and y bits. `
` `
`牋 t = (s ^ (s >> 1)) & 0x22222222; s = s ^ t ^ (t << 1); `
`牋爐 = (s ^ (s >> 2)) & 0x0C0C0C0C; s = s ^ t ^ (t << 2); `
`牋爐 = (s ^ (s >> 4)) & 0x00F000F0; s = s ^ t ^ (t << 4); `
`牋爐 = (s ^ (s >> 8)) & 0x0000FF00; s = s ^ t ^ (t << 8); `
` `
`牋 *xp = s >> 16;牋牋牋牋牋牋牋 // Assign the two halves `
`牋?yp = s & 0xFFFF;牋牋牋牋牋?// of t to x and y. `
`} `

This is the quantity to which we want to apply the parallel prefix operation. PP-XOR is the one to use, going from left to right, because successive 1-bits meaning to complement or to swap have the same logical properties as exclusive or: Two successive 1-bits cancel each other.

Both signals (complement and swap) are propagated in the same PP-XOR operation, each working with every other bit of cs.

The next four assignment statements have the effect of translating each pair of bits of s into (x, y) values, with x being in the odd (leftmost) bit positions, and y being in the even bit positions. Although the logic may seem obscure, it is not difficult to verify that each pair of bits of s is transformed by the logic of the first two Boolean equations in Figure 14-7. (Suggestion: Consider separately how the even and odd bit positions are transformed, using the fact that t and sr are 0 in the odd positions.)

The rest of the procedure is self-explanatory. It executes in 66 basic RISC instructions (constant, branch-free), versus about 19n + 10 (average) for the code of Figure 14-6 (based on compiled code; includes prologs and epilogs, which are essentially nil). Thus, the parallel prefix method is faster for n 3.