Genetic Algorithms - An Introduction

Genetic algorithms are one of the best ways to solve a problem for which little is known. They are a very general algorithm and so will work well in any search space. All you need to know is what you need the solution to be able to do well, and a genetic algorithm will be able to create a high quality solution. Genetic algorithms use the principles of selection and evolution to produce several solutions to a given problem.

When are Genetic Algorithms Useful?

There are at least three situations where genetic algorithms are useful:

The objective function is not smooth (i.e., not differentiable).
There are multiple local optima.
There is a large number of parameters (the meaning of "large" keeps changing).

A typical genetic algorithm requires:

1) A genetic representation (e.g. Array of Bits) of the solution domain,
2) A fitness function to evaluate the solution domain. A fitness function is a particular type of objective function that is used to summarize, as a single figure of merit, how close a given design solution is to achieving the set aims.

Once the genetic representation and the fitness function are defined, a GA proceeds to initialize a population of solutions and then to improve it through repetitive application of the mutation, crossover, inversion and selection operators.

Simple Genetic Algorithm Pseudo-code
function SimpleGeneticAlgorithm ()
{
 Initialize population;
 Calculate fitness function;
 While (fitness value != termination criteria)
 {
  Selection;
  Crossover;
  Mutation;
  Calculate fitness function;
 }
}

Genetic Algorithms steps:

1. Initialization 

The population size depends on the nature of the problem, but typically contains several hundreds or thousands of possible solutions. Often, the initial population is generated randomly, allowing the entire range of possible solutions (the search space). Occasionally, the solutions may be "seeded" in areas where optimal solutions are likely to be found.

2. Selection

During each successive generation, a portion of the existing population is selected to breed a new generation. Individual solutions are selected through a fitness-based process, where fitter solutions (as measured by a fitness function) are typically more likely to be selected. The fitness function is defined over the genetic representation and measures the quality of the represented solution. The fitness function is always problem dependent.

3. Genetic operators

The next step is to generate a second generation population of solutions from those selected through a combination of genetic operators: crossover (also called recombination), and mutation.

a) Crossover (also called as Recombination)

Crossover is a genetic operator used to vary the programming of a chromosome or chromosomes from one generation to the next. It is analogous to reproduction and biological crossover, upon which genetic algorithms are based. Cross over is a process of taking more than one parent solutions and producing a child solution from them.



Single-point crossover

Two-point crossover

Cut and splice


b) Mutation

Mutation alters one or more gene values in a chromosome from its initial state. In mutation, the solution may change entirely from the previous solution. Hence GA can come to a better solution by using mutation. Mutation should allow the algorithm to avoid local minima by preventing the population of chromosomes from becoming too similar to each other, thus slowing or even stopping evolution.
The mutation of bit strings ensue through bit flips at random positions.
Example:
1 0 1 0 0 1 0
1 0 1 0 1 1 0

4. Termination

This generational process is repeated until a termination condition has been reached. Common terminating conditions are:
  • A solution is found that satisfies minimum criteria
  • Fixed number of generations reached
  • Allocated budget (computation time/money) reached
  • The highest ranking solution's fitness is reaching or has reached a plateau such that successive iterations no longer produce better results
  • Manual inspection
  • Combinations of the above

Case Study : Roulette Wheel Selection

Parents are selected according to their fitness. The better the chromosomes are, the more chances to be selected they have. Imagine a roulette wheel where all the chromosomes in the population are placed. The size of the section in the roulette wheel is proportional to the value of the fitness function of every chromosome - the bigger the value is, the larger the section is. See the following picture for an example.



A marble is thrown in the roulette wheel and the chromosome where it stops is selected. Clearly, the chromosomes with bigger fitness value will be selected more times.

This process can be described by the following algorithm.

[Sum] Calculate the sum of all chromosome fitnesses in population - sum S.
[Select] Generate random number from the interval (0,S) - r.
[Loop] Go through the population and sum the fitnesses from 0 - sum s. When the sum s is greater then r, stop and return the chromosome where you are. Of course, the step 1 is performed only once for each population.

Python code for Roulette wheel selection 

#!/usr/bin/env python

import sys, time, numpy, random

# Individual has a genome and fitness and knows how to print itself
class Individual:
    def __init__(self, genome):
        if genome is None:
            self.genome = numpy.array(numpy.random.random_integers(0, 1, LEN), dtype='bool')
        else:
            self.genome = genome
        self.fitness = FITNESS(self.genome)
    def __str__(self):
        return "".join(str(int(i)) for i in self.genome)
        
# Uniform crossover
def xover(a, b):
    g, h = a.genome.copy(), b.genome.copy()
    for pt in range(len(g)):
        if numpy.random.random() < 0.5:
            g[pt], h[pt] = h[pt], g[pt]
    return (Individual(g), Individual(h))

# Per-gene bit-flip mutation
def mutate(a):
    g = a.genome.copy()
    for pt in range(len(g)):
        if numpy.random.random() < PMUT:
            g[pt] = not g[pt]
    return Individual(g)

# Print statistics, and return True if we have succeeded already.
def stats(pop, gen):
    best = max(pop, key=lambda x: x.fitness)
    print("{0} {1:.2f} {2} {3}".format(gen, numpy.mean([i.fitness for i in pop]), best.fitness, str(best)))
    return (best.fitness >= SUCCESS_THRESHOLD)

# Roulette-wheel (fitness-proportionate) selection. Tricky but fast.
# http://stackoverflow.com/questions/2140787/select-random-k-elements-from-a-list-whose-elements-have-weights
def roulette(items, n):
    total = float(sum(w.fitness for w in items))
    i = 0
    w, v = items[0].fitness, items[0]
    while n:
        x = total * (1 - numpy.random.random() ** (1.0 / n))
        total -= x
        while x > w:
            x -= w
            i += 1
            w, v = items[i].fitness, items[i]
        w -= x
        yield v
        n -= 1

# Use many tournaments to get parents
def tournament(items, n, tsize=5):
    for i in range(n):
        candidates = random.sample(items, tsize)
        yield max(candidates, key=lambda x: x.fitness)

# Run one generation
def step(pop):
    newpop = []
    parents = SELECTION(pop, len(pop) + 1) # one extra for final xover    
    while len(newpop) < len(pop):
        if numpy.random.random() < CROSSOVER_PROB:
            # crossover and mutate to get two new individuals
            newpop.extend(map(mutate, xover(parents.next(), parents.next())))
        else:
            # clone and mutate to get one new individual
            newpop.append(mutate(parents.next()))
    return newpop
    
def main():
    if len(sys.argv) > 1:
        numpy.random.seed(int(sys.argv[1]))
    print("GENERATIONS {0}, POPSIZE {1}, LEN {2}, CROSSOVER_PROB {3}, PMUT {4}".format(GENERATIONS, POPSIZE, LEN, CROSSOVER_PROB, PMUT))
    pop = [Individual(None) for i in range(POPSIZE)]
    stats(pop, 0)
    for gen in range(1, GENERATIONS):
        pop = step(pop)
        if stats(pop, gen):
            print("Success")
            sys.exit()
    print("Failure")

# parameters
GENERATIONS, POPSIZE, LEN, CROSSOVER_PROB, PMUT = (100, 100, 100, 0.5, 0.1)
FITNESS, SUCCESS_THRESHOLD = (numpy.sum, LEN)
SELECTION = roulette # roulette sucks: tournament is better

main()