Thinking In .NET
A place to learn about C# and the .NET Platform by Larry O'Brien. But mostly the obligatory braindump cross-linking that characterizes the blogosphere.
        

C# Genetic Algorithms For Problem-Solving

I had a good time over the holidays just jazzing around; my brother gave me a puzzle for Christmas and I wrote a genetic algorithm in C# that solves it, and then I wrote a version of the puzzle for the Pocket PC. The puzzle is a 3 x 3 matrix of tiles; you have to arrange the tiles and rotate them so that all the images match. It's hard to do: the solution space contains about 95 * 10^9 configurations. I wrote a naive brute-force approach just to see how long it would take -- about ten days on my machine.
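That solution-space figure falls out of simple counting: 9 tiles can be placed in 9! orders, and each tile can sit in any of 4 rotations. A quick back-of-the-envelope check (the variable names here are mine, not from the puzzle code):

```python
from math import factorial

# 9 tiles can be arranged in 9! orders, and each tile has 4 rotations.
arrangements = factorial(9)      # 362,880 tile orderings
rotations = 4 ** 9               # 262,144 rotation combinations
solution_space = arrangements * rotations
print(f"{solution_space:,}")     # -> 95,126,814,720
```

At roughly 95 billion configurations, checking even 100,000 configurations per second by brute force would indeed take on the order of ten days.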

I'm sure there are smarter ways to solve the puzzle than evolving a solution, but I've always had fun with genetic algorithms and have been wanting to do them in C#. Plus, I did a trick I'd thought of but never implemented that monitors the Shannon entropy of the genome and measures it against evolutionary "progress."

My genetic algorithm starts with a population of totally random arrangements of tiles. The number of matched images is counted as the "fitness" of the individual "critters." Typically, a totally random arrangement has fewer than 2 matched images. Individual critters are selected to breed based on their relative fitness (for instance, if you had a population of 3, and they had 1, 2, and 3 matches each, the respective odds of being chosen as a parent would be 1/6, 2/6, 3/6).
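Fitness-proportionate selection of this kind is usually implemented as "roulette-wheel" sampling. A minimal Python sketch of the idea (the names here are mine, not from my C# source):

```python
import random

def select_parent(population, fitnesses):
    """Pick a critter with probability proportional to its fitness."""
    total = sum(fitnesses)
    spin = random.uniform(0, total)   # where the roulette ball lands
    running = 0.0
    for critter, fitness in zip(population, fitnesses):
        running += fitness
        if spin <= running:
            return critter
    return population[-1]             # guard against floating-point drift

# With fitnesses of 1, 2, and 3, the odds work out to 1/6, 2/6, 3/6,
# matching the example above.
```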

This differential selection yields two parents, which then need to be combined into a single offspring. But it wouldn't work to just say "Okay, you get the top-left tile from the mother and the top-center from the father, etc." since it's unlikely that the two parents share the same tiles in the same order. The algorithm I use to generate an offspring goes like this:

  1. Set activeParent to the "mother" arrangement
  2. Set activeTile to the center tile
  3. Does activeTile have any matches?
  4. If so, select the next tile from among the matches
  5. If not, are any of its bordering tiles unused?
  6. If so, select the next tile from among these unused tiles
  7. If not, select the next tile randomly
  8. Set activeTile to the next chosen tile
  9. Is a random number less than crossoverProbability?
  10. If so, switch activeParent between the two parents
  11. If there are any tiles left, loop to step 3 above
  12. Is a random number less than mutationProbability?
  13. If so, swap two tiles at random and rotate them arbitrarily
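The parent-switching control flow of steps 1 through 11 can be sketched independently of the puzzle's matching rules. In this simplified Python skeleton, the puzzle-specific "does this tile match?" test is elided (the real C# version also tracks positions and rotations, and all names here are mine):

```python
import random

def breed(mother, father, crossover_probability=0.2):
    """Build one offspring by walking tiles, switching parents stochastically.

    mother and father are permutations of the same tile ids. The real
    algorithm first tries tiles that match the current one; this sketch
    only keeps the unused-tile and parent-switching logic.
    """
    active_parent = mother
    offspring = []
    used = set()
    for _ in range(len(mother)):
        # Steps 5-7 (simplified): prefer an unused tile from the active
        # parent, falling back to a random choice.
        candidates = [t for t in active_parent if t not in used]
        tile = candidates[0] if candidates else random.choice(mother)
        offspring.append(tile)
        used.add(tile)
        # Steps 9-10: occasionally switch which parent we copy from.
        if random.random() < crossover_probability:
            active_parent = father if active_parent is mother else mother
    return offspring
```

Because both parents contain the same nine tiles, the offspring always comes out a valid permutation of the tile set.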

Typical values would be a population of 50, a crossoverProbability of .2, and a mutationProbability of .1. Note that only an average of 5 critters in a population would be mutants, and mutations generally produce lower-than-average fitness. However, mutations are necessary in order to keep the population from getting stuck in a good-but-not-perfect solution. Recombination, the commingling of "good partial solutions" from the two parents, is the real driving force of evolution. Look at how fast the highest number of matches in an individual (red plot) and the average number (green plot) jump from the 1-2 of a random arrangement:
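The mutation step itself (step 13 above: swap two tiles and re-rotate them) is simple. A Python sketch, under my assumption that a critter is a list of (tile_id, rotation) pairs -- the representation, like the names, is mine:

```python
import random

def mutate(critter, mutation_probability=0.1):
    """With probability mutation_probability, swap two random tiles
    and give each a fresh random rotation (0-3 quarter turns)."""
    if random.random() >= mutation_probability:
        return critter
    mutant = list(critter)
    i, j = random.sample(range(len(mutant)), 2)
    mutant[i], mutant[j] = mutant[j], mutant[i]
    mutant[i] = (mutant[i][0], random.randrange(4))
    mutant[j] = (mutant[j][0], random.randrange(4))
    return mutant
```

With a mutationProbability of .1 and a population of 50, this fires about 5 times per generation, matching the "average of 5 mutants" figure above.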

However, on a longer timescale, these early rates of increase flatten out, as the "needle in the haystack" of the solution challenges even evolution. The population "discovers" an 11-match solution, just 1 short of the full solution, before 1,000 generations have passed, but it takes more than 4 times that to solve the puzzle (on this run, the solution was found in generation 4,071):

The small population, small genome, and "needle in the haystack" solution mean that the population is in danger of settling into a local optimum, in which a "pretty good" partial solution of 6 or more matches comes to dominate the genome of the population, even if that partial solution is not part of the ultimate solution. The only way out of such a predicament is a fortuitous mutation, which has a vanishingly small chance of occurring. To watch for this, I tracked the Shannon entropy of each tile (essentially, a measure of how many different ways that tile was being used across the population). When the variability in the genome descended below stagnationLevel, I declared the effort a "genetic dead-end" and restarted from random tiles. I've been using a stagnationLevel of 15% to end an effort (in other words, 85% of the population must share the same tile with the same rotation in the same place). The above graph of a successful effort came after 104 genetic dead-ends. The total number of critters created was approximately 7,000,000. That may seem like a lot, but it is less than 1% of 1% of the solution space.
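The per-position Shannon entropy can be computed from how often each (tile, rotation) value appears at a given board position across the population. A minimal sketch of that stagnation measure (names and representation are mine):

```python
from collections import Counter
from math import log2

def position_entropy(population, position):
    """Shannon entropy, in bits, of the gene at one board position.

    population is a list of critters; each critter is a list whose
    entries are hashable genes, e.g. (tile_id, rotation) pairs.
    """
    counts = Counter(critter[position] for critter in population)
    total = len(population)
    return -sum((n / total) * log2(n / total) for n in counts.values())

# Entropy is 0 bits when every critter shares the same gene at that
# position, and maximal when the genes are spread uniformly. Low entropy
# across all nine positions signals the "genetic dead-end" stagnation
# described above.
```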

Those who oppose evolutionary theory often point to the common-sense unlikelihood of "random processes" generating highly ordered structure. One of the sadder things in this debate is that biologists defending evolution are generally unaware of the work pioneered by John Holland on the mathematical basis for evolution and fall back on "While most mutations are harmful, over the course of millions or billions of years...." But it's clear from an algorithmic standpoint that evolution is driven by recombination, not mutation. Mutation is necessary to introduce new building blocks for recombination to work with, but the optimization engine is the differential selection process and the recombination process. Evolutionary algorithms can be shockingly fast and efficient. I'm not laying out my genetic approach as the optimal way to solve this particular puzzle, but on the other hand, finding a solution by exploring 0.0073586% of the solution space while using no problem-specific heuristics isn't particularly shameful either.



© Copyright 2003 Larry O'Brien. Click here to send an email to the editor of this weblog.
Last update: 1/2/2003; 2:25:01 PM.