A very old trick is that of exchanging the contents of two registers without using a third [IBM]:

This works well on a two-address machine. The trick also works if is replaced by the logical operation (complement of exclusive or), and can be made to work in various ways with add's and subtract's:

Unfortunately, each of these has an instruction that is unsuitable for a two-address machine, unless the machine has "reverse subtract."

This little trick can actually be useful in the application of double buffering, in which two pointers are swapped. The first instruction can be factored out of the loop in which the swap is done (although this negates the advantage of saving a register):

The problem here is to exchange the contents
of two registers x and y wherever a mask bit m_{i} =
1, and to leave x and y unaltered wherever m_{i} =
0. By "corresponding" fields, we mean that no shifting is required. The
1-bits of m need not be contiguous. The
straightforward method is as follows:

By using "temporaries" for the four and expressions, this can be seen to require seven instructions, assuming that either m or m?/span> can be loaded with a single instruction and the machine has and not as a single instruction. If the machine is capable of executing the four (independent) and expressions in parallel, the execution time is only three cycles.

A method that is probably better (five instructions, but four cycles on a machine with unlimited instruction-level parallelism) is shown in column (a) below. It is suggested by the "three exclusive or" code for exchanging registers.

The steps in column (b) do the same exchange as that of column (a), but column (b) is useful if m does not fit in an immediate field but m?/span> does, and the machine has the equivalence instruction.

Still another method is shown in column (c) above [GLS1]. It also takes five instructions (again assuming one instruction must be used to load m into a register), but executes in only three cycles on a machine with sufficient instruction-level parallelism.

Assume a register x has two fields (of the same length) that are to be swapped, without altering other bits in the register. That is, the object is to swap fields B and D, without altering fields A, C, and E, in the computer word illustrated below. The fields are separated by a shift distance k.

Straightforward code would shift D and B to their new positions, and combine the words with and and or operations, as follows:

Here, m is a mask with 1's in field D (and 0's elsewhere), and m' is a mask with 1's in fields A, C, and E. This code requires nine instructions and four cycles on a machine with unlimited instruction-level parallelism, allowing for two instructions to load the two masks.

A method that requires only seven instructions and executes in five cycles, under the same assumptions, is shown below [GLS1]. It is similar to the code in column (c) on page 39 for interchanging corresponding fields of two registers. Again, m is a mask that isolates field D.

The idea is that t_{1} contains B D in position D (and 0's elsewhere),
and t_{2} contains B D in position B. This code, and the
straightforward code given earlier, work correctly if B
and D are "split fields"梩hat is, if
the 1-bits of mask m are not contiguous.

The exchange methods of the preceding two sections, which are based on exclusive or, degenerate into no-operations if the mask m is 0. Hence, they can perform an exchange of entire registers, or of corresponding fields of two registers, or of two fields of the same register, if m is set to all 1's if some condition c is true, and to all 0's if c is false. This gives branch-free code if m can be set up without branching.