ALGORITHM ALLEY

Genetic Annealing

Kenneth V. Price

Ken holds a BS in physics from Rensselear Polytechnic Institute. He is currently engaged in research on artificial intelligence and system modeling. Ken can be reached at kprice@solano.community.net.

by Bruce Schneier

How do you determine the best machine for a job?

Step #1. Express all the parameters of your machine using some kind of coding machine. Each particular machine will then be expressed as a list of these parameters.

Step #2. Generate some machines with random parameters.

Step #3. Test the machines against each other and select the few best ones.

Step #4. Generate new machines by combining parameters from the ones selected in Step #3, occasionally making random modifications in some of the parameters.

Step #5. Repeat Steps #3 and #4 until you're tired of watching the show.

What I've just described is a genetic algorithm, and it's done a pretty good job with life on this planet. Generation after generation of iterations, selecting the best few to reproduce and occasionally throwing in the odd mutation, has resulted in millions of different species. The end result is that there are lifeforms suited for every niche in every environment.

Genetic algorithms might not be the fastest way to generate the "best" machine for a job (it would be a lot easier to design a sentient lifeform from scratch than to wait for evolution to stumble across a human), but their main advantage is that they require no knowledge about the system. For instance, the traditional way of generating an aircraft wing is to study aerodynamics and spend months calculating airflows, lifts, stresses, and the like. Alternatively, you could just express a generic wing as a string of parameters (size, shape, weight, and so on), create some random wings, test them against each other, and let the fittest reproduce. A weekend of iteration on a supercomputer would probably yield a pretty good design.

Solving problems such as these falls into the category of "combinatorial" or "global" optimization. Other approaches to finding the "best" (according to some predefined criteria) configuration involve simulated annealing, a stochastic-search technique familiar to chip designers, who must determine the best geometrical arrangement for millions of circuits on microprocessors.

DDJ has examined both genetic algorithms and simulated annealing before (see "Genetic Algorithms" by Mike Morrow, April 1991 and "Simulated Annealing" by Michael P. McLaughlin, September 1989). In this month's column, however, Kenneth Price combines the two techniques, resulting in an approach called "genetic annealing" which takes the best from both the genetic and annealing worlds.

Genetic annealing is a hybrid, random-search technique that fuses simulated annealing and genetic methodologies into a more efficient algorithm. Like its genetic counterpart, the genetic-annealing algorithm creates new solutions by exchanging genetic material between members of a population. Instead of using a competitive tournament to decide which members will survive into the next generation, however, genetic annealing uses an adaptive, thermodynamic criterion based on a simple feedback scheme. With this strategy, the genetic-annealing algorithm can outperform the algorithms whose principles it embodies even though it is hardly more complex than running a suite of greedy algorithms in parallel.

Bipartitioning a Necklace

To evaluate the effectiveness of a random-search algorithm, it helps to use a problem whose global minimum is already known. To this end, I have chosen a simple graph-bipartitioning problem to illustrate how genetic annealing works. A graph is a collection of vertices connected by edges. Bipartitioning is the process of assigning each vertex to one of two sets. The goal of the bipartitioning problem is to assign half of a graph's vertices to each set so that the number of edges connecting the two sets is minimized. In many ways, this problem resembles the real-world task of the circuit designer, who wants to divide a set of components equally between two chips so that the number of lines needed to connect the chips is minimal.

The particular graph used in this example is commonly called a "necklace" because of its obvious resemblance to neckwear; see Figure 1(a). An [M,2] necklace is a ring of M vertices ("beads") each of which is connected radially to a companion vertex for a total of 2M vertices. As Figure 1(b) illustrates, any diameter across the necklace constitutes a bipartition since it divides the necklace into two sets, each of which contains M vertices. The bipartition generated by a diameter is optimal because only two edges cross the diameter to connect opposing sides. Because of its rotational symmetry, an [M,2] necklace has M optimal bipartitions in all.

For the purposes of computation, you can represent an [M,2] necklace as a string of 2M bits. Bits at even positions are the beads on the ring itself, and bits at odd positions represent the dangling beads; see Figure 2. The actual binary value assigned to a bead indicates the set to which it belongs.

To count the number of edges that span sets, perform an XOR operation on every pair of bits that represent connected vertices and sum the results. When the vertices at the ends of an edge belong to the same set, the bits they contain will be identical, and the XOR operation will return a 0. Conversely, a pair of connected vertices with different binary values represents an edge with a vertex in each set. The XOR operation counts these spans by returning a 1.

To simulate a heat bath, you will need an entire population of bit strings. The number of bit strings depends on the problem: for this example a population of 20 strings is adequate. For convenience, you can arrange the bit strings into a two-dimensional array so that one coordinate locates a string within the population while the other coordinate gives you the position of a bit within a string.

Next, you initialize the bit strings by filling each one with an equal number of 1s and 0s. In a genetic-annealing experiment, you must randomly select the initial population to ensure that the system starts out like a "white-hot" thermodynamic ensemble. Starting from this condition makes it less likely that the population will prematurely cool to a suboptimal minimum. To randomize a bit string while keeping its populations of 1s and 0s equal, try swapping each bit with another bit from a randomly selected location within the same string.

The number of edges having a vertex in each set measures the quality of a bipartition as a solution. In the context of a genetic algorithm, this number reflects the "fitness" of a bit string to survive as a "gene" in a competitive selection procedure. By contrast, simulated annealing uses the fitness of a configuration-like energy in order to exploit the laws of statistical mechanics. Genetic annealing adopts the annealing metaphor, treating a configuration's fitness-like energy, even though the annealing process itself is driven by population dynamics.

You can also use the genetic-annealing algorithm to "cool" a population of configurations to a condition of maximum energy. All you need to do is replace the "greater than" symbol with a "less than" symbol in the portion of code that determines whether or not a mutant is acceptable. This has the effect of reversing the sign of the spontaneous energy so that during the reheating cycle each threshold is lowered by the amount dE=C*DE/N. This feature lets you use a problem's natural measure of fitness without having to invert it. Figure 3 shows what the maximum energy solution to the necklace problem looks like for M=8.

This energy bank approach to simulated annealing can substantially reduce the time you need to design and execute difficult combinatorial optimizations. Traditional annealing methods invoke the venerable Metropolis algorithm to forge a link between the laws of statistical mechanics and combinatorial optimization. Despite its great utility, the Metropolis algorithm is computationally expensive because the decision of whether or not to accept a mutant usually requires that you generate a random number and compare it to an exponential term. Configurations that are not much worse than their progenitors may be rejected depending on the random number generated. The genetic-annealing algorithm accepts every configuration that is not much worse than its progenitor based on the result of a simple comparison that requires neither a random-number generator nor the use of acceptance probabilities.

Traditional annealing methods also rely on an empirically derived "annealing schedule" to control the rate at which the Metropolis temperature should be lowered. The advantage of using thresholds to track the time-averaged loss of spontaneous energy from individual configurations is that you can maintain equilibrium at any temperature without using an annealing schedule just by restoring energy to each threshold at the same average rate that the ensemble loses it. In most cases, this reduces the researcher's challenge to finding the smallest value of C that will allow the population to maintain equilibrium as it anneals.

Mutations

Choosing the right mutation scheme is just as important to the success of a random search as determining which mutants are acceptable. A mutation can be an elementary operation like flipping a bit, or it can be a more complex procedure like a symmetry operation or crossbreeding. In general, each problem will have its own menu of mutations. Frequently, mutations are subject to constraints based on a problem's symmetries. For example, mutations used in the bipartitioning problem must preserve the equal number of 1s and 0s in each bit string.

Of all the ways you can alter a bit string without changing the number of 1s (or 0s) that it contains, swapping bits is perhaps the easiest. To swap a pair of bits, just randomly select two bits with different binary values and replace each with its one's complement.

While it may be the simplest form of mutation, swapping a single pair of bits provides you with only a limited search capability. You can explore remote regions of solution space more effectively if you occasionally swap more than one pair of bits at a time. For example, you can usually enhance the efficiency of a random search by drawing the number of elements involved in a mutation from an exponential distribution. You can characterize an exponential distribution by its decay constant, EX, where EX is a real number in the half-open interval [0,1). In the case of swapping bits, EX=.5 means that a single pair of bits is swapped half of the time, two pairs of bits are swapped one quarter of the time, three pairs of bits are swapped one eighth of the time, and so on. EX=0 means that you never swap more than one pair of bits. With the right decay constant, an exponential distribution of mutation sizes makes distant points in solution space more accessible without sacrificing the ability to efficiently fine tune a configuration.

Symmetry Operations as Mutations

In most problems, a well-chosen symmetry operation can make a valuable addition to your mutation scheme. When appropriate, operations like rotations and reflections can generate large-scale variations of a configuration without des-troying crucial local relationships. For example, a reflection operation in which the first four bits of the sequence: 0000111100001111 are exchanged with the second four bits, produces the optimal configuration: 0000000011111111. Despite altering the target string on a relatively large scale, the reflection procedure not only conserves the number of 1s, but also maintains the XOR relationships between the bits within the reflected substring. To accomplish the same transformation in a series of random swap operations would require considerable good fortune. Despite their utility, symmetry mutations reduce the novelty of a "random" search, so you should not rely on them exclusively.

Splicing

Perhaps the greatest advantage that cooling a population in parallel affords you is the opportunity to employ crossbreeding as a form of mutation. Actually, instead of mating pairs of bit strings in a separate cross-breeding procedure (as you do in the genetic algorithm), the genetic-annealing algorithm transfers genetic information between bit strings by "splicing" it. Splicing tentatively replaces a portion of the target string with the corresponding section from another bit string chosen at random from the population. In the splicing scenario, the randomly chosen string remains unaltered while it donates a copy of a segment of its "genetic material" for the target string to use as a trial mutation; see Figure 5. By drawing upon the success of other configurations in the population, splicing brings to the genetic-annealing algorithm all of the problem-solving power that crossbreeding imparts to the genetic algorithm.

You must be careful when implementing a splicing procedure for the bipartitioning problem because the number of 1s in the substring being donated must be the same as the number of 1s in the target substring. To ensure that 1s are conserved, the size of the donated substring is allowed to grow until the number of 1s it contains equals the number of 1s in the corresponding target substring and until the two substrings differ in at least one bit position, or until the whole string is used.

The mutation scheme in GENNEAL.C (described later) includes all three forms of mutation: random bit swapping, a reflection operation, and splicing. PS, PR, and PX are the probabilities that a mutation will swap random bits, reflect a substring, or splice a substring, respectively. Of course, PS+PR+PX=1. The size of a swap mutation is controlled by EXS, while EXR determines the size of a reflection mutation.

Putting it All Together

Along with the mutation probabilities and their respective decay constants, other control variables include the population size, N, and the cooling constant, C. Given a necklace of size M (for a minimum energy of 2, make M even), what combination of these variables will repeatedly produce an optimal partition with a minimum of computational effort?

Since run times on a sequential computer are proportional to population size, you will usually want to reduce N until further reductions begin to jeopardize the population's ability to simulate a heat bath. A population containing anywhere from 10 to 40 configurations should be sufficient for most problems. Smaller populations hamper the annealing process by failing to provide energy when it is needed, while larger populations increase execution times while enhancing the thermal simulation only incrementally.

The robust flexibility of the genetic-annealing algorithm makes it easy to experiment with a variety of computational approaches to find the one that works best. You can transform the genetic-annealing algorithm into a greedy algorithm, a suite of annealing programs, or a genetic-style algorithm simply by changing a few control constants. When you approach a new problem, try starting with N=20 and run a suite of greedy algorithms in parallel by setting C=0 and PX=0 (no annealing or splicing). These results will provide you with a convenient performance benchmark.

Results

Although it occasionally turns up an optimal configuration, the greedy algorithm performs poorly because it is not consistently successful. The true performance of the greedy algorithm, like any stochastic search, is more reliably gauged by the average and variance of an ensemble of results.

In a series of ten trials with N=20 and M=80, the greedy version of the genetic-annealing algorithm (in which C, EXS, PR, and PX are all zero), produced an average minimum energy of 6.62 and a variance of 4.63. In each trial, the program was allowed to run for 256,000 generations. When each mutation swapped only one pair of bits (EXS=0), just four out of the total of 200 configurations (10 trialsx20 strings per trial) reached the optimal energy of 2. Increasing EXS produced a modest improvement. A series of ten trials that used EXS=.35 turned up six optimal bit strings and dropped the average final energy to 6.37. For the sake of simplicity, the random-swap procedure uses a constant value of EXS=.35 throughout the remainder of this demonstration.

As expected, results improved significantly once the population was annealed. With N and C both greater than 0 and no splicing, this version of the genetic-annealing algorithm resembles a suite of annealing algorithms running in parallel. Initial trials, which used C=.84, drove the average energy down to 4.83, reduced the variance to 2.61, and produced 21 optimal bit strings. The next set of trials used C=.92, which cooled the population at about half the rate that C=.84 did, because annealing times are roughly proportional to 1/(1--C). With C=.92, the average final energy dropped to 4.51, and 28 bit strings found their way to an optimal state. Cutting the cooling rate in half again gave still better results, but the returns were clearly diminishing. C=.96 lowered the average final energy to 4.43 and produced a total of 38 optimal bit strings. When the cooling rate was halved yet again to C=.98, the average final energy rose to 4.52, indicating that cooling was so slow that even 256,000 generations were not enough to completely quench the population.

In the absence of the reflection mutation, schemes that used only a minuscule fraction of splicing performed best. If splicing is used too frequently, the population will misconverge. Annealing, however, can overcome misconvergence. For example, the "greedy-genetic" scheme: C=0, PX=.0002, PS=.9998 produced optimal results in only five of ten trials with convergence occurring at about 100,000 generations. Annealing with C=.84 took a little longer but misconvergence abated. By 120,000 generations, all 200 bit strings had found an optimal state.

The real surprise comes when you use a scheme that uses both long reflections and splicing. In particular, the scheme EXR=.95, PR=.3, PX=.7 not only produced perfect results in ten out of ten trials without annealing, but also converged on average in only 280 generations! This level of performance is somewhat atypical, due primarily to the reflection mutation's ability to speed equilibration across a bit string when large values of EXR are used. Some annealing is needed when you bipartition larger necklaces and/or use a less effective mutation scheme.

Computational Synergy

In the genetic-annealing approach, a synergy exists between splicing and annealing. Annealing controls the rate of convergence, tolerates error, and can endure high rates of random mutation. All these factors help alleviate the tendency of the genetic algorithm to misconverge. Furthermore, splicing genetic material empowers the annealing algorithm by providing the means to efficiently compare competing bit strings that are widely separated in solution space. Run times benefit from this synthesis, since you can exploit the superior search capabilities of splicing without having to maintain a large population to ensure genetic diversity. Add to this the increase in computational speed made possible by using thresholds instead of the Metropolis algorithm, and you can improve run times substantially when compared to traditional genetic and annealing techniques. Of course, you can also improve performance by running the genetic-annealing algorithm on a parallel computer, especially one using the SIMD architecture, since each configuration can then be assigned its own processor.

The GENNEAL.C Program

GENNEAL.C is a bipartitioning program available electronically; see "Availability," page 3. I've kept code simple for the sake of clarity and as a result, several routines have not been optimized. For instance, GENNEAL.C computes the energy of a mutant by performing an XOR operation on every pair of connected vertices, even though most mutations affect only a small part of the target string. It is faster to compute the energy of a mutant by reevaluating just those links of the target string that have been altered. Similarly, if a mutation fails, you need to restore only that part of the target string affected by the mutation. GENNEAL.C copies the whole bit string and then restores it in its entirety, if necessary.

Conclusion

You should now have enough information about how genetic annealing works to experiment on your own. If you currently use either a traditional genetic or simulated annealing algorithm, give genetic annealing a try. I would enjoy hearing from those of you who do the comparison.

I would also like to acknowledge Margaret E. Burwell for her valuable assistance in the preparation of this article.

Figure 1 (a) The "necklace" graph; (b) any diameter across the necklace constitutes an optimal bipartition.
Figure 2 Bits at even positions are the beads on the ring itself; bits at odd positions represent the dangling beads.
Figure 3 Maximum-energy solution to the necklace problem for M=8.

Figure 4:The genetic-annealing algorithm (minimization version).

 1. Randomly select an initial population of N configurations.
 2. For i=1 to N: Initialize the ith threshold,
    Th[i], with the energy of the ith configuration.
 3. DE=0                       /* Empty the energy bank */.
 4. For i=1 to N:              /* Begin cooling loop */.
 5. Splice or randomly mutate the ith configuration.
 6. Compute the energy, E, of the resulting mutant.
 7. If E>Th[i] then restore the old configuration.
 8. If E <= Th[i] then:
    a.DE=DE+Th[i]--E   /* Add energy difference to DE */.
    b.Th[i]=E                  /* Reset threshold */.
    c. Replace old configuration with successful mutant.
 9. Mutate next configuration or end cooling loop.
10.dE=DE*C/N                   /* Compute reheating increment, dE. 0<C<1 */
11.For i=1 to N:               /* Begin reheating loop */.
12.Th[i]+dE                    /* Add dE to each threshold */.
13.Return to step 3 once all thresholds have been reheated.

Figure 5 The splicing scenario.

Back to Table of Contents

ALGORITHM ALLEY

by Bruce Schneier

Figure 1 (a) The "necklace" graph; (b) any diameter across the necklace constitutes an optimal bipartition. Figure 2 Bits at even positions are the beads on the ring itself; bits at odd positions represent the dangling beads. Figure 3 Maximum-energy solution to the necklace problem for M=8.

Figure 5 The splicing scenario.

Figure 1 (a) The "necklace" graph; (b) any diameter across the necklace constitutes an optimal bipartition.
Figure 2 Bits at even positions are the beads on the ring itself; bits at odd positions represent the dangling beads.
Figure 3 Maximum-energy solution to the necklace problem for M=8.