Sex in C++
Some time ago I was clicking around Wikipedia's content on artificial intelligence and encountered the article on genetic algorithms, and decided I would use one to solve a Project Euler problem. The first task was to find a suitable problem. To be amenible to a genetic algorithm, the problem should have certain characteristics:
Read on for specific information about my algorithm for Problem #61, including spoilers if you're inclined to try this approach yourself.
The problem involves "figurate" numbers: triangle numbers like 1, 3, 6, and 10; squares; pentagonals; and so on, up to octagonal numbers. The formulas are given for these number series, and the problem is restricted to four-digit figurate numbers. The problem is to find the unique set of six figurate numbers, one of each shape, which form a "cycle." The cycle pattern occurs when the numbers can be arranged so that the last two digits of each number match the first two digits of the next number -- like the pattern in 1234, 3488, 8890, 9056, 5656, 5612.
The first step is to generate all the figurate values in the range 1000 to 9999, discarding any with a 0 in the tens place.
At this point a traditional strategy might be to start with the first triangle number, then test numbers from the other groups recursively until the solution is found. One enterprising solver dumped the figurate numbers into database tables and wrote a convoluted SQL query to identify the solution. I was determined to use the genetic hammer on the problem, and had to force it into the shape of a suitable nail. These were the steps in my code:
1. Create a base population of possible solutions, each one a set of six numbers, one from each shape, chosen randomly. The size of the population is a significant variable, and numbers as low as 20-30 have been recommended. I wanted to be the master of a larger world, and made a population of 1000 individuals.
2. Score each member of the population, being sure to stop if the solution is found. I gave a point for each two-digit pair at the beginning of a number that matched a two-digit pair at the end of a different number, with the restrictions implied by the problem so a perfect score would be 6. In the first generation, most members scored zero, with some 1s and an occasional 2 sprinkled in.
3. Copy a few of the highest-scoring parents into the next generation. This elitism may help the algorithm home in on the solution.
4. Copy a few of the worst-scoring parents into the next generation, to encourage diversity and avoid having the population converge quickly on a good but incorrect candidate.
5. Select a mother and father from the population to breed, preferring those with higher scores. This is often done with a "roulette" selection method, in which the liklihood of selecting a parent is proportional to its score. My approach was cruder: I set aside a small fraction of the parents with the highest scores and chose randomly among them.
6. Combine the elements of the mother and father to form a member of the next generation. A simple method is to randomly choose a "crossover" point among the six numbers. Any to the left of the crossover point will come from the mother, those to the right will come from the father. (Figurate numbers in candidates are always stored in order -- triangle, square, pentagon, etc.)
7. Repeat steps 5 and 6 to fill a new generation of the same size as the initial population.
8. Mutation: randomly change one number in a randomly selected member of the new generation to a different number of the same shape. This should happen to a small number of members, around 1%.
9. Discard the old generation, replacing it with the new generation, and repeat from step 2 until the solution is found.
After a good deal of tinkering, I had an algorithm that homed in very quickly on candidates that were almost perfect, like 8646 4624 2415 1541 4141 4187, but no winners. After watching millions of generations rise and fall under a variety of starting conditions, I decided to cheat a bit. I wrapped the algorithm in a loop that would annihilate a world if it didn't produce a champion in 200 generations and start from scratch. This modification caused the program to find the solution pretty reliably in at most a few hundred thousand generations.
- It must be possible to rate candidate solutions, with the correct solution getting the highest possible score, and incorrect candidates scoring higher if they are somehow closer to the solution. (Most AI problems, like analyzing chess positions, do not have a single correct answer, so the highest-scoring candidate found in a certain time would be the result.)
- A candidate solution should be expressable as a series of values, something like a chromosome. It helps if all candidates have the same length and format, so the individual parts are easily interchangeable.
Read on for specific information about my algorithm for Problem #61, including spoilers if you're inclined to try this approach yourself.
The problem involves "figurate" numbers: triangle numbers like 1, 3, 6, and 10; squares; pentagonals; and so on, up to octagonal numbers. The formulas are given for these number series, and the problem is restricted to four-digit figurate numbers. The problem is to find the unique set of six figurate numbers, one of each shape, which form a "cycle." The cycle pattern occurs when the numbers can be arranged so that the last two digits of each number match the first two digits of the next number -- like the pattern in 1234, 3488, 8890, 9056, 5656, 5612.
The first step is to generate all the figurate values in the range 1000 to 9999, discarding any with a 0 in the tens place.
At this point a traditional strategy might be to start with the first triangle number, then test numbers from the other groups recursively until the solution is found. One enterprising solver dumped the figurate numbers into database tables and wrote a convoluted SQL query to identify the solution. I was determined to use the genetic hammer on the problem, and had to force it into the shape of a suitable nail. These were the steps in my code:
1. Create a base population of possible solutions, each one a set of six numbers, one from each shape, chosen randomly. The size of the population is a significant variable, and numbers as low as 20-30 have been recommended. I wanted to be the master of a larger world, and made a population of 1000 individuals.
2. Score each member of the population, being sure to stop if the solution is found. I gave a point for each two-digit pair at the beginning of a number that matched a two-digit pair at the end of a different number, with the restrictions implied by the problem so a perfect score would be 6. In the first generation, most members scored zero, with some 1s and an occasional 2 sprinkled in.
3. Copy a few of the highest-scoring parents into the next generation. This elitism may help the algorithm home in on the solution.
4. Copy a few of the worst-scoring parents into the next generation, to encourage diversity and avoid having the population converge quickly on a good but incorrect candidate.
5. Select a mother and father from the population to breed, preferring those with higher scores. This is often done with a "roulette" selection method, in which the liklihood of selecting a parent is proportional to its score. My approach was cruder: I set aside a small fraction of the parents with the highest scores and chose randomly among them.
6. Combine the elements of the mother and father to form a member of the next generation. A simple method is to randomly choose a "crossover" point among the six numbers. Any to the left of the crossover point will come from the mother, those to the right will come from the father. (Figurate numbers in candidates are always stored in order -- triangle, square, pentagon, etc.)
7. Repeat steps 5 and 6 to fill a new generation of the same size as the initial population.
8. Mutation: randomly change one number in a randomly selected member of the new generation to a different number of the same shape. This should happen to a small number of members, around 1%.
9. Discard the old generation, replacing it with the new generation, and repeat from step 2 until the solution is found.
After a good deal of tinkering, I had an algorithm that homed in very quickly on candidates that were almost perfect, like 8646 4624 2415 1541 4141 4187, but no winners. After watching millions of generations rise and fall under a variety of starting conditions, I decided to cheat a bit. I wrapped the algorithm in a loop that would annihilate a world if it didn't produce a champion in 200 generations and start from scratch. This modification caused the program to find the solution pretty reliably in at most a few hundred thousand generations.
The solution is [redacted].
------------------