Computing and Information Systems
Volume 3 Number 1
We investigate a recently developed abstraction of Genetic Algorithms in which the population of a GA in any epoch is represented by a single vector whose elements are the probabilities of the corresponding bit positions being equal to 1. The process of evolution is then represented by learning the elements of this probability vector. In this paper we extend the method to the representation of homeotic genes, which are environmentally driven and turn other genes on and off. We use this method to simultaneously optimise conflicting criteria within a single population.
Genetic algorithms are widely used in function approximation problems. Much attention has recently been given to the problem of learning solutions in non-stationary environments (e.g. [4, 8]). It is usually found that the rate of change in the environment determines the success of the algorithm in finding optimal solutions: the less stable the environment, the poorer the optimisation results. GAs which use diploid genes have shown some success in problems whose parameters do not change too quickly.
In addition, there is the problem of multi-objective optimisation when such optimisation involves conflicting criteria. This type of problem is of growing importance in, e.g., classifier systems, which require a set of rules that must be collectively optimised to perform a task. It is well known that, with a finite population, the simple GA settles on a single optimum. Strategies such as fitness sharing, mating restrictions and progressively setting preferences have been shown to be effective, but each requires some form of global information outwith the confines of the simple GA.
We review the Structured Genetic Algorithm (SGA), which was shown to substantially outperform the simple GA in tasks where the environment changed abruptly from one steady state to another. We apply the new method of Population-Based Incremental Learning (PBIL) to the SGA and derive an algorithm which will learn optimal solutions to two different problems even when the problems are presented to the algorithm in a random manner, i.e. with no steady-state period of learning one problem followed by a period of learning the other. We further extend the method to deal with multi-modal function approximation using standard problems.
Our extension is based on applying standard ANN methods to the PBIL algorithm.
The process is initialised with a probability vector each of whose elements is 0.5 and terminated when each element of the vector approaches 1 or 0. The probability vector's elements are updated using a supervised learning rule
Δp_{i} = η ( E_{best}(chromosome_{i}) - p_{i} )
where E_{best}(chromosome_{i}) is the mean value of the i^{th} chromosome bit taken over the fittest chromosome(s), p_{i} is the probability of a 1 in position i in the current generation, and η is a learning rate. We note that this representation of evolution uses a learning rule for the probability vector which is very similar to the competitive learning rules in Artificial Neural Networks.
Therefore, to transmit information from generation to generation, we need only a single vector - the probability vector - from which we can generate instances (chromosomes) from the appropriate distribution. Notice that
Δp_{i} -> 0 => p_{i} -> E_{best}(chromosome_{i})
Therefore if all the fittest chromosomes in each generation have a 1 (0) in the i^{th} position, the probability vector will also tend to a 1 (0) in that position.
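The PBIL loop just described can be sketched in a few lines of Python; the population size, the number of selected chromosomes and the learning rate below are illustrative choices, not settings from the paper:

```python
import random

def pbil_step(p, fitness, pop_size=100, n_best=2, eta=0.1):
    """One PBIL generation: sample, select the fittest, nudge p."""
    # Generate chromosomes from the current probability vector.
    pop = [[1 if random.random() < pi else 0 for pi in p]
           for _ in range(pop_size)]
    # Keep the n_best fittest chromosomes of this generation.
    best = sorted(pop, key=fitness, reverse=True)[:n_best]
    # E_best(chromosome_i): mean of bit i over the fittest chromosomes.
    e_best = [sum(c[i] for c in best) / n_best for i in range(len(p))]
    # Delta p_i = eta * (E_best(chromosome_i) - p_i)
    return [pi + eta * (ei - pi) for pi, ei in zip(p, e_best)]

# On "one-max" (fitness = number of 1s) every element drifts towards 1.
random.seed(0)
p = [0.5] * 8
for _ in range(200):
    p = pbil_step(p, fitness=sum)
```

Once all elements are close to 0 or 1, the sampled chromosomes are essentially identical and the process has converged.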
The Structured Genetic Algorithm [3] envisaged this structure as a two-layered genetic algorithm such as shown in Figure 1.
Genes at either level can be active or passive; high-level genes activate or deactivate the sets of lower-level genes, i.e. only if gene a1 is active will bits (a11 a12 a13) determine a phenotype. A single change in a gene at the high level therefore represents multiple changes at the second level in terms of which genes are active. Genes which are not active (passive genes) do not, however, disappear: they remain in the chromosome structure and are carried invisibly to subsequent generations with the individual's string of genes. Thus a single change at the top level of the network can create a huge difference in the expressed phenotype, whereas with the simple GA we would require a number of changes in the chromosome to create the same effect; as the number of changes required increases, the probability of such a set of changes becomes vanishingly small. Note that one drawback of the algorithm is that we must ensure that one and only one of the top-level genes is ``on'' at any one time.
Figure 1: The Structured GA
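To make the activation scheme concrete, a minimal decoder for a two-level structured chromosome might look as follows; the flat bit layout and the block size are our assumptions for illustration, not a layout taken from the paper:

```python
def express(chromosome, n_top=2, block=3):
    """Return the expressed low-level block of a structured chromosome.

    The first n_top bits are the high-level genes; exactly one is 'on'
    and selects which low-level block forms the phenotype.
    """
    top = chromosome[:n_top]
    assert sum(top) == 1, "exactly one top-level gene must be active"
    which = top.index(1)
    low = chromosome[n_top:]
    # Unselected blocks stay passive but are still carried in the
    # chromosome, invisibly, to subsequent generations.
    return low[which * block:(which + 1) * block]

# The second top-level gene is active, so the second block is expressed.
print(express([0, 1, 1, 0, 1, 0, 0, 1]))  # -> [0, 0, 1]
```

Flipping a single top-level bit swaps the whole expressed block, which is the mechanism the text describes.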
In the simple knapsack problem, we are allowed only one of each type of object, so we can model a solution as a binary string which has value 1 in the i^{th} bit if object i is selected for the knapsack and 0 otherwise. An example 7-object knapsack problem is shown in Table 1.
Object | Value | Weight | Best vector (W=50) |
1 | 70 | 31 | 1 |
2 | 20 | 10 | 0 |
3 | 39 | 20 | 0 |
4 | 37 | 19 | 1 |
5 | 7 | 4 | 0 |
6 | 5 | 3 | 0 |
7 | 10 | 6 | 0 |
Total | 188 | 93 | 2 |
¹ The figure of 93Kg was chosen because such a weight enables all objects to be placed in the knapsack, i.e. the solution completely dominates the solution for the 50Kg criterion.
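A fitness function for this encoding can be written directly from Table 1; the zero score for overweight selections is one simple penalty choice of ours, since the paper does not state how infeasible strings are scored:

```python
def knapsack_fitness(bits, values, weights, limit):
    """Total value of the selected objects, or 0 if over the weight limit."""
    weight = sum(w for b, w in zip(bits, weights) if b)
    if weight > limit:
        return 0          # simple penalty: infeasible selections score 0
    return sum(v for b, v in zip(bits, values) if b)

# Table 1's data: the best 50kg vector selects objects 1 and 4.
values  = [70, 20, 39, 37, 7, 5, 10]
weights = [31, 10, 20, 19, 4, 3, 6]
print(knapsack_fitness([1, 0, 0, 1, 0, 0, 0], values, weights, 50))  # -> 107
print(knapsack_fitness([1] * 7, values, weights, 93))                # -> 188
```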
The Structured Genetic Algorithm has been shown to perform well [4] in dynamic situations, i.e. when the environment is changing within the course of an experiment.
For example, in the knapsack problem let us allow the simulation to converge, so that the population has settled on a solution which maximises value under the constraint that the weight in the knapsack does not exceed 50Kg. Now change the constraint so that individuals must optimise their value subject to the weight not exceeding 93Kg¹. If we allow such changes to happen regularly, the simple GA will not perform well, since the individuals of the population have been optimised for one environment and will not perform well in a different one. The Structured GA, on the other hand, retains enough genetic diversity within its string to converge quickly to optimal values. Further, if we set up an experiment in which the environment changes systematically and regularly, the structured GA will quickly adapt to these conditions, since even the phenotypes which are fittest with respect to the 50Kg constraint may retain (as passive genes) genetic material which would make their offspring fit with respect to the 93Kg constraint, and vice versa; recall that these genes can be switched on by a single change at the top level.
Let us consider a vector in which element p_{0} corresponds to the homeotic gene; we treat it as a simple switch with a stochastic element: p_{0} is the probability that the phenotype corresponding to gene 1 (in bit positions p_{1} to p_{m}) is expressed, and 1 - p_{0} is the probability that the phenotype corresponding to gene 2 (in bit positions p_{m+1} to p_{2m}) is expressed. We then generate a random number from the uniform distribution on [0,1]; if it is less than p_{0} we evaluate the fitness of the gene in positions p_{1} to p_{m}, otherwise we evaluate the fitness of the gene in positions p_{m+1} to p_{2m}. The learning rules are
Δp_{0} = η_{0} ( E_{best}(chromosome_{0}) - p_{0} )
Δp_{i} = η ( E_{best}(chromosome_{i}) - p_{i} ) * E_{best}(chromosome_{0}), 1 <= i <= m
Δp_{i} = η ( E_{best}(chromosome_{i}) - p_{i} ) * ( 1 - E_{best}(chromosome_{0}) ), m+1 <= i <= 2m
where η and η_{0} are small learning rates which can be adjusted during the course of the simulation. The last two equations are similar to the PBIL equation; the sole difference is that each change in the probabilities is gated by the expected value that the appropriate gene's phenotype was expressed during that generation.
Notice that the fact that different genes are using different rules is not implausible from a biological perspective; in particular homeotic genes are believed to be relatively more stable in most cases than other expressed genes since their effect can be so major.
The first equation changes the homeotic gene's probability. Notice that unless this probability reaches 0 or 1, each of the two "lower-level" genes has a non-zero probability of being expressed. It is simple to ensure that this is always the case by putting bounds on the maximum and minimum values of p_{0}; however, it has been found that a better solution is to use a lower learning rate η_{0} for this gene, which allows it to track the environment but at a slower rate.
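One generation of this structured PBIL can be sketched as below; the population size, selection size and the two learning rates are illustrative values of ours, not the paper's settings:

```python
import random

def spbil_step(p, fit1, fit2, pop=100, n_best=2, eta=0.1, eta0=0.02):
    """One generation: p[0] is the homeotic probability, p[1:m+1] codes
    gene 1 and p[m+1:] codes gene 2 (m bits each)."""
    m = (len(p) - 1) // 2
    chroms = [[1 if random.random() < pi else 0 for pi in p]
              for _ in range(pop)]
    # A chromosome's fitness comes only from its expressed gene.
    def fitness(c):
        return fit1(c[1:m + 1]) if c[0] == 1 else fit2(c[m + 1:])
    best = sorted(chroms, key=fitness, reverse=True)[:n_best]
    e = [sum(c[i] for c in best) / n_best for i in range(len(p))]
    new = list(p)
    new[0] += eta0 * (e[0] - p[0])            # homeotic gene: slower rate
    for i in range(1, m + 1):                 # gene 1, gated by E(chrom_0)
        new[i] += eta * (e[i] - p[i]) * e[0]
    for i in range(m + 1, len(p)):            # gene 2, gated by 1 - E(chrom_0)
        new[i] += eta * (e[i] - p[i]) * (1 - e[0])
    return new

# If gene 1's problem always pays more, the homeotic bit drifts towards 1.
random.seed(0)
p = [0.5] * 7
for _ in range(300):
    p = spbil_step(p, fit1=sum, fit2=lambda g: 0)
```

Because gene 2's updates are gated by 1 - E(chromosome_{0}), its probabilities are left near 0.5 as the homeotic bit converges, preserving that gene's diversity.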
The average fitness of each chromosome is shown in Figure 2. A comparison with the single-chromosome PBIL algorithm shows that the structured PBIL (SPBIL) algorithm outperforms it on this task. We can see the reason for this if we look at the best realised chromosome in each generation: a maximally fit chromosome is present in every generation (see Figure 3).
Figure 2: Average fitness of the population over 1000 generations.
Figure 3: Fitness of the best gene over 1000 generations.
SIMULTANEOUS OPTIMISATION

A further advantage of the SPBIL is that we can perform simultaneous optimisation of more than one criterion: let us attempt to simultaneously optimise the knapsack value under the pair of criteria used in the last section by randomly choosing, in each generation, which of the criteria to use in fitness evaluation.
We show, in Figure 4, the convergence of the first two bit positions in each of gene 1 and gene 2 towards their final value of 1. We can clearly see the interference between the two criteria's respective populations. This time each niche population is formed at the same time as the other: the weight criterion under which the value was optimised was selected randomly in each generation from the possible values. This corresponds to a dynamically changing environment in which the measure of success is chosen randomly. We see here a simultaneous optimisation of both criteria.
Figure 4: Probabilities of the first bit position of each of the realisable genes. Typically gene 1 is optimised first while the second finds a niche when the first is too specialised.
F1(x) = sin^{6}(5πx)
This function has 5 peaks when defined for 0 <= x <= 1, which occur at x = 0.1, 0.3, 0.5, 0.7 and 0.9 as shown in Figure 6. We follow Ryan's [8] coding: a 30-bit chromosome is used to represent x. Ryan follows the usual practice of initialising the chromosomes equally within the space of possible values; however, such a method is not guaranteed to work since, in real problems, several solutions can be arbitrarily close to one another. Also, as the dimensionality of the data increases, the method's expense increases exponentially. We simply wish to begin with the probability vectors of all the realisable genes (the lowest-level genes) initialised with a common probability of 0.5. Further, we do not wish to build in pre-knowledge of the solution by making the number of realisable genes equal to the number of optimal solutions (5 in this case). One possible extension to the SPBIL is shown in Figure 7. We now consider a multi-layered sequence of homeotic genes.
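With a 30-bit coding, a chromosome is mapped to x and scored with F1; the uniform binary decoding below is one common choice, since the text does not spell out Ryan's exact mapping:

```python
import math

def decode(bits):
    """Map a 30-bit chromosome to x in [0, 1] (uniform binary coding)."""
    n = int("".join(str(b) for b in bits), 2)
    return n / (2 ** len(bits) - 1)

def f1(x):
    """F1(x) = sin^6(5 pi x): five equal peaks at x = 0.1, 0.3, 0.5, 0.7, 0.9."""
    return math.sin(5 * math.pi * x) ** 6

x = decode([1] * 30)          # the all-ones chromosome decodes to x = 1.0
```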
Figure 5: The number of individuals in a population of 100 during each generation who reached the peak fitness level.
Figure 6: The graph of F1(x) = sin^{6}(5πx)
Figure 7: Homeotic gene 0 determines whether homeotic genes 1 or 2 will be realised. They in turn determine which of genes 3-6 will be realised. Genes 3-6 determine which of the lowest level genes will be realised.
The top level represents bit 0 which determines whether bit 1 or bit 2 is turned on. Both bit 1 and bit 2 are further homeotic genes so that bit 1 determines whether bits 3 or 4 are turned on and bit 2 determines whether bits 5 or 6 are turned on. Similarly bits 3 - 6 are also homeotic genes and determine which of the realisable genes (each of 30 bits) are activated. There are therefore 8 sets of realisable genes each of which may determine the fitness of the chromosome. The learning rules are as above:
Δp_{0} = η_{0} ( E_{best}(chromosome_{0}) - p_{0} )
Δp_{1} = η_{1} ( E_{best}(chromosome_{1}) - p_{1} ) * E_{best}(chromosome_{0})
Δp_{2} = η_{1} ( E_{best}(chromosome_{2}) - p_{2} ) * (1 - E_{best}(chromosome_{0}))
Δp_{3} = η_{2} ( E_{best}(chromosome_{3}) - p_{3} ) * E_{best}(chromosome_{1})
Δp_{4} = η_{2} ( E_{best}(chromosome_{4}) - p_{4} ) * (1 - E_{best}(chromosome_{1}))
Δp_{i} = η_{3} ( E_{best}(chromosome_{i}) - p_{i} ) * E_{best}(chromosome_{3}), 8 <= i < 38
Δp_{i} = η_{3} ( E_{best}(chromosome_{i}) - p_{i} ) * (1 - E_{best}(chromosome_{3})), 38 <= i < 68
where all expectations, E(), are taken over the fittest chromosomes; the obvious extension is made to the other genes. We use lower learning rates for the higher-level homeotic genes. With this set of learning rules, we found that convergence to the optimal solution was possible but required a very low learning rate, i.e. there is very slow convergence to the optima. If the learning rate is increased, a subset of the probability vectors converges to the optimal values, but since that subset will typically have very accurate realisations in each generation, the whole population becomes dominated by the most accurate niche solution. This fairly quickly feeds back to the homeotic genes and predisposes them to always select the most fit sub-population.
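The routing through Figure 7's tree of homeotic genes can be made explicit with a small helper; the heap-style indexing and the convention that a 1 bit selects the first child are our own choices for compactness:

```python
def select_leaf(bits):
    """Follow three levels of homeotic bits (genes 0-6) down to one of
    the 8 realisable genes, returning its index 0-7."""
    node = 0
    for _ in range(3):
        # Children of node i sit at 2i+1 and 2i+2; a 1 bit selects the
        # first child, a 0 bit the second.
        node = 2 * node + (1 if bits[node] else 2)
    return node - 7                       # leaves occupy nodes 7-14

print(select_leaf([1, 1, 0, 1, 0, 0, 0]))  # -> 0
```

Only three of the seven homeotic bits are consulted on any one path, which is why a single high-level flip can swap in an entirely different 30-bit realisable gene.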
In order to maintain diversity in the population, we introduce an environmental feedback mechanism:
when an instance of the realised genes is one of the fittest chromosomes, i.e. one used in updating the probability vector, we add a small bias which handicaps that gene when its fitness is next calculated. This models the fact that competition for resources in a population is greatest between similar individuals. So in each generation we calculate
Δbias_{i} = -0.016, for the best genes
Δbias_{i} = +0.002, ∀i
bias_{i} = bias_{i} + Δbias_{i}
fitness = F1(x) + bias_{i}
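The handicap can be sketched as follows; the function and variable names are ours, while the constants are those quoted above:

```python
def biased_fitnesses(raw, bias, winners, up=0.002, down=0.016):
    """Update each gene's bias in place and return handicapped fitnesses.

    Every gene drifts up by +0.002 per generation; the genes just used
    to update the probability vector are handicapped by -0.016.
    """
    for i in range(len(bias)):
        bias[i] += up                 # Delta bias_i = +0.002, for all i
    for i in winners:
        bias[i] -= down               # Delta bias_i = -0.016, best genes
    return [f + b for f, b in zip(raw, bias)]

bias = [0.0, 0.0, 0.0]
scores = biased_fitnesses([1.0, 1.0, 1.0], bias, winners=[0])
```

A repeatedly winning gene accumulates a growing handicap, so other niches eventually win selection and keep their probability vectors learning.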
Using this method we can have an extremely high learning rate and accurate convergence to all optima. At the optima, probability elements representing the homeotic genes will be approximately 0.5, while all probability elements representing realisable genes will tend to either 0 or 1. Using a training run of only 200 iterations and a learning rate η_{3} of 0.2, we have the results of Table 2.
Gene | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 |
Value | 0.001 | 0.500 | 0.300 | 0.700 | 0.300 | 0.902 | 0.900 | 0.900 |
Table 2: Converged values represented by the probability vectors after only 100 generations when the learning rate η_{3} was 0.2
The similarity between the use of a bias and that method known as a "conscience mechanism" in Artificial Neural Networks is obvious.
Figure 8: Deb's F2 function has decreasing peaks. The fitness function is F2(x) = exp(-2 log(2) * (x-0.1)^{2} / 0.64) * sin^{6}(5πx)
F2(x) = exp(-2 log(2) * (x-0.1)^{2} / 0.64) * sin^{6}(5πx)
Experimental findings have shown that, with the system defined above, the fitness of the highest peak is predominant: all realised genes converge to that peak, leaving the other peaks unexplored. What we wish to do is to allow only one niche population to occupy the highest peak and force the other populations to occupy the lower peaks. To do so we use a system very similar to Kohonen's Learning Vector Quantization (LVQ) [6]. At each update of the probability vector, we move it towards the best vector as before but also include a means of moving all vectors (including the winning one) away from the best:
Δp_{i} = -η_{0} ( E_{best}(chrom_{i}) - p_{i} ), ∀i
Δp_{i} = η ( E_{best}(chrom_{i}) - p_{i} ), for the realised chromosome.
Clearly we require η_{0} < η or we would get no convergence to any peak. A ratio η_{0}/η of 0.1 was found sufficient in the experiments described herein.
With this simple addition, all peaks were consistently found.
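This repulsion step, applied to several niche probability vectors at once, can be sketched as follows (the function name and the niche-list structure are ours):

```python
def lvq_update(vectors, e_best, winner, eta=0.1, eta0=0.01):
    """Pull the realising niche towards the best chromosome and push
    every niche (the winner included) slightly away from it.

    The winner's net step is (eta - eta0); the ratio eta0/eta = 0.1
    follows the text.
    """
    out = []
    for k, p in enumerate(vectors):
        step = (eta - eta0) if k == winner else -eta0
        out.append([pi + step * (ei - pi) for pi, ei in zip(p, e_best)])
    return out

niches = lvq_update([[0.5], [0.5]], e_best=[1.0], winner=0)
```

The losing niches are pushed off the occupied peak and so drift towards the remaining, unoccupied optima.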
Functions F3 and F4 did provide more of a challenge to the algorithm and are shown diagrammatically below.
Figure 9: The F3() function
Figure 10: The F4() function
F3(x) = sin^{6}(5π(x^{3/4} - 0.05))
The F4 function has unevenly spaced and decreasing peaks determined by
F4(x) = exp(-2 log(2) * (x-0.08)^{2} / 0.854) * sin^{6}(5π(x^{3/4} - 0.05))
Functions F3 and F4 proved more difficult for the algorithm in that we often found more than one population occupying the broader-based peaks and such populations interfered with one another causing poorer accuracy in the converged results.
It was found that the simple expedient of decreasing the learning rate to 0 during the course of the experiment was sufficient to give any required degree of accuracy. This method is obviously suggested by the needs of many stochastic approximation methods, such as the Robbins-Monro stochastic approximation algorithm [7], which requires the learning rate to satisfy
η(k) >= 0,    Σ_{k} η(k) = ∞,    Σ_{k} η^{2}(k) < ∞
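For instance, the familiar 1/k schedule meets these conditions; the constant c below is an arbitrary illustrative choice, not a value from the paper:

```python
def eta(k, c=0.2):
    """eta(k) = c / k: the sum of eta(k) diverges while the sum of
    eta(k)^2 converges (to c^2 * pi^2 / 6), as Robbins-Monro requires."""
    return c / k

# Partial sums over 10,000 generations: the plain sum keeps growing,
# the squared sum has essentially converged.
plain = sum(eta(k) for k in range(1, 10001))
squared = sum(eta(k) ** 2 for k in range(1, 10001))
```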
It is interesting that such a learning rate produces a population which has found its niche but is no longer capable of evolving; this agrees with Huxley's ([5], p. 11) comment that ``there is no certain case on record of a line showing a high degree of specialization giving rise to a new type''.
We find ourselves in some difficulty in providing a numerical report of our results. We will have to content ourselves with stating that, using the methods described in this paper, we can cause convergence of the realised genes to the correct optimal values to any degree of accuracy required.
The SPBIL algorithm also allowed simultaneous optimisation of conflicting criteria. Simple extensions of the SPBIL were used when developing a multi-layered gene so that the niche populations represent all optima.
Finally, when we use a population of more than one chromosome to update the probability vector, it is possible to use a weighted expected value in the learning equations, in which the weights are proportional to the fitnesses of the corresponding phenotypes.
Future work will investigate the convergence of structured PBIL simulations on such standard problems as the Travelling Salesman Problem.
Dr. C Fyfe is a Lecturer at the University of Paisley