How Much Should You Select for Evolvability?

We consider whether selection for evolvability leads to greater adaptive progress than selection for adaptedness alone. Our treatment bears on longstanding discussions of selection for evolvability in the literature, which to date have been largely limited to conceptual and qualitative arguments. We study a simple mathematical model of a population of individuals whose adaptedness and evolvability (here modelled as the standard deviation of mutations affecting adaptedness) are both under selective forces. In the special case of a population of size two, we show that the optimal amount of selection for evolvability depends on the ratio between the initial evolvability and the amount that evolvability can increase in the time given. Our result shows that, to maximize the amount of adaptation, it never pays to weight selection for evolvability more heavily than selection for adaptedness itself. We have not answered the question of to what degree evolvability is selected for in nature; however, we have taken a small step in the quantitative modelling of the evolution of evolvability and proved the existence of conditions under which selection for evolvability has a demonstrably positive effect.


Introduction
The definition of evolvability has been hard to pin down (Kirschner and Gerhart, 1998). One popular definition is that it is a property of an individual, or an aggregate property of a lineage, that increases the expected rate of adaptation of the lineage. For example, for Hansen (2006) evolvability is "the ability of the genetic system to produce and maintain potentially adaptive genetic variants." All measures of this type of evolvability depend in some way on the probability distribution of fitness effects of mutations (Altenberg, 1995). Proposed measures include the likelihood of a mutation being beneficial (Smith et al., 2002) and the variance of the fitness of offspring prior to selection (Gallagher, 2009).
The evolution of evolvability has become a popular research topic for biologists (Dawkins, 2003; Pigliucci, 2008), researchers of evolutionary computation (Altenberg, 1994; Reisinger and Miikkulainen, 2006), and in the artificial life community (McMullin, 2012; Webb and Knowles, 2014). The questions being asked fall into two broad categories:

1. Has evolvability increased through evolution in life on Earth? If so, by what mechanisms?
2. How can we encourage the evolution of evolvability in evolutionary algorithms and artificial life simulations?
In this paper, we ask to what extent we should select for evolvability in simulated evolution in order to maximize the degree of adaptation overall. To our knowledge, this has not been done before. We have in mind a scenario in which adaptedness and evolvability are simultaneously under selective forces, and where the two are inextricably linked such that there is a trade-off; an individual cannot inherit its adaptedness from one parent and its evolvability from another. This would be the case, for example, if an individual's evolvability were determined by the way in which its traits are encoded in the genetic material that undergoes variation. If one individual is well adapted, but another, less well adapted, individual is more evolvable due to having a more suitable encoding, in general it is not straightforward to re-encode the more adapted individual using the better encoding.
An answer to the question of to what extent we should select for evolvability will primarily be useful in evolutionary computation and artificial life simulations, where often the goal is to maximize the degree of adaptation. It may be less useful in answering questions about natural evolution, where there is no such forward planning, though it still might provide a useful piece in a larger puzzle.
Our contributions are as follows.
• We introduce a simple model of a population in which adaptedness and evolvability are simultaneously under selective forces, and in which we control the relative importance of adaptedness and evolvability during selection.
• For the special case that the population is of size two, we answer exactly to what extent evolvability should be selected for (to the exclusion of selecting for adaptedness) to maximize the total amount of adaptation.
• We list the ways in which the model might be extended in order to better reflect realistic scenarios.

The Model
We have a population of N individuals, each with two traits A and B. The A value of an individual represents adaptedness to some environment, and as such it is a value to be maximized. The B value of an individual represents the "evolvability with respect to trait A"; it is the key factor in determining the rate of increase of A in the course of evolution. Here, that means that the Bs determine the standard deviations of mutations affecting the As.
In each generation, we rank the population by the value γA + (1 − γ)B, where the As and Bs have first been normalized by dividing by the standard deviations of those traits in the population. The weighting parameter γ is under our control and takes values in the range [0, 1]; it represents a trade-off between selecting for adaptedness and selecting for evolvability. The top proportion p of the population become the parents of the next generation; we select with replacement from this set of parents to form the next generation.
The reason we include the normalization step is that, without it, as the variance of one of the traits in the population becomes large, evolution stops acting on the other trait, regardless of the value of γ. This is illustrated in Figs. 1 and 2. These show, for a large population with normally distributed A and B traits, the expected increase in the population mean values of A and B if we select the top half by value A + B, where the standard deviation of the A values is 1 and the standard deviation of the B values is parameterized by β. Without normalization, as Fig. 1 shows, the expected increase of each trait is a nonlinear function of both of the trait standard deviations. As the standard deviation of the Bs, β, tends to infinity, the expected increase in trait A due to selection tends to zero. With normalization, as Fig. 2 shows, the expected increase of each trait is proportional to the standard deviation of just that trait.
After the selection step, we mutate the A and B values as follows. We add Gaussian noise to each individual's B value with a constant standard deviation, β. We add Gaussian noise to each individual's A value with standard deviation αB (i.e., a constant times that individual's B value). Since standard deviations must be positive, we prevent the Bs from taking negative values; when a mutation makes a B value negative, we set it to a small positive value ε.
Algorithm 1 shows the process of evolution in this model. Table 1 lists the parameters and their roles.
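One generation of this process can also be sketched directly in code. The following is a minimal illustration only, not the reference implementation; the function name `generation` and the guard against zero trait variance are our own additions.

```python
import numpy as np

def generation(A, B, gamma, p, alpha, beta, eps=1e-6, rng=None):
    """One generation of the model: rank by the normalized weighted sum
    gamma*A + (1 - gamma)*B, keep the top proportion p as parents,
    resample with replacement, then mutate both traits."""
    rng = np.random.default_rng() if rng is None else rng
    N = len(A)
    # Normalize each trait by its population standard deviation so that
    # neither trait's variance swamps the other during ranking.
    A_n = A / A.std() if A.std() > 0 else A
    B_n = B / B.std() if B.std() > 0 else B
    score = gamma * A_n + (1 - gamma) * B_n
    # Keep the top proportion p as parents...
    parents = np.argsort(score)[::-1][: max(1, int(p * N))]
    # ...and resample from them with replacement to refill the population.
    chosen = rng.choice(parents, size=N, replace=True)
    A, B = A[chosen], B[chosen]
    # Mutate: B gets constant-scale noise beta; A gets noise with
    # per-individual standard deviation alpha * B.
    B = B + rng.normal(0.0, beta, size=N)
    B[B < 0] = eps  # clamp negative evolvabilities to a small epsilon
    A = A + rng.standard_normal(N) * alpha * B
    return A, B
```

Iterating this function t_end times from constant initial vectors reproduces the evolutionary process described above.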
Our question, stated in terms of the model, is as follows. What value of the parameter γ maximizes, at some particular future time t_end, the expected mean value of A? We answer this question exactly in the special case that the population size N = 2 and the proportion selected as the parents of the next generation p = 1/2.

Derivation of the Result
We restrict ourselves to the special case that the population size N = 2 and the proportion kept during selection p = 1/2. We can restate the problem so that we only have to keep track of one A value and one B value in each generation; let A(t), B(t) be the A and B values of the parent for generation t, with A(0) = A_0 and B(0) = B_0. In each generation, we duplicate the parent, mutate both copies, and then, after normalizing by dividing the As and Bs by their population standard deviations, we select as the parent for the next generation the individual with the maximum value of γA + (1 − γ)B. Since the parents are identical before mutation, we are essentially selecting between mutation events. The normalization step means that, as far as the selection operator is concerned, all mutations have a standard deviation of 1.
Algorithm 1: Our model of simultaneous evolution of adaptedness and evolvability. [Listing: initialize the vectors A(t = 0) and B(t = 0) with N elements of value A_0 and B_0 respectively; then, for each t ← 0..t_end, normalize the traits, select the top proportion p by γA + (1 − γ)B, resample the parents with replacement, mutate the Bs with standard deviation β and the As with standard deviation αB, and set any B(t + 1)_n < 0 to ε.]

We can achieve the same result by drawing four 'normalized' mutations from the standard normal distribution N(0, 1). M_A1 and M_A2 are the normalized mutations affecting the As, and M_B1 and M_B2 are the normalized mutations affecting the Bs. We will then select the pair of mutations with the largest value of γM_A + (1 − γ)M_B, and multiply each by the desired mutational standard deviation, undoing the normalization step. We then apply these mutations to the parent to get the A and B values of the parent of the next generation. Let

    S_i = γM_Ai + (1 − γ)M_Bi ,  i ∈ {1, 2}. (1)

The two components in the sum in (1) are distributed as

    γM_Ai ∼ N(0, γ²)  and  (1 − γ)M_Bi ∼ N(0, (1 − γ)²).

Using the results (21), (22) from the appendix, we obtain the expected values of these components in the pair with the maximum sum, which are

    E[γM_A^+] = γ²/√(π(γ² + (1 − γ)²))  and  E[(1 − γ)M_B^+] = (1 − γ)²/√(π(γ² + (1 − γ)²)).

From one generation to the next, A will increase by αB(t)M_A^+, and B will increase by βM_B^+. These are the 'unnormalized' mutations, which have the expected values (taking the expectation over the possible values of M_A^+, M_B^+)

    E[M_A^+] = γ/√(π(γ² + (1 − γ)²))  and  E[M_B^+] = (1 − γ)/√(π(γ² + (1 − γ)²)).

The nonlinear trade-off between the rates of increase of A and B, determined by the parameter γ, is shown in Fig. 3, and is given by the pair of expressions above. Since αB(t)M_A^+ and βM_B^+ represent the change in A and B from generation t to t + 1, we have the recurrence relations

    A(t + 1) = A(t) + αB(t)M_A^+  and  B(t + 1) = B(t) + βM_B^+,

and the expected values of A(t) and B(t) (now taking the expectation over the mutation events in every generation) satisfy the recurrence relations

    E[A(t + 1)] = E[A(t)] + αE[B(t)] γ/√(π(γ² + (1 − γ)²)),
    E[B(t + 1)] = E[B(t)] + β(1 − γ)/√(π(γ² + (1 − γ)²)).

Solving the second recurrence relation gives the expected value of B at time t, which is

    E[B(t)] = B_0 + βt(1 − γ)/√(π(γ² + (1 − γ)²)),

and solving the first recurrence relation gives the expected value of A at time t, which is

    E[A(t)] = A_0 + αγ/√(π(γ² + (1 − γ)²)) · (B_0 t + βt(t − 1)(1 − γ)/(2√(π(γ² + (1 − γ)²)))).

To find a value of γ that maximizes the expected value of A at time t_end, we set the derivative of the above expression with respect to γ equal to zero and take the solution γ* in the range [0, 1]. The optimal value of the trade-off parameter, γ*, depends only on B_0, β, and t_end, and is an increasing function of B_0/(β(t_end − 1)).
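The solved recurrences are easy to evaluate numerically. The sketch below assumes the per-generation expected normalized mutations g(γ) = γ/√(π(γ² + (1 − γ)²)) and h(γ) = (1 − γ)/√(π(γ² + (1 − γ)²)) (the appendix results with σ_A = γ and σ_B = 1 − γ); the function names are our own, and a grid search stands in for the closed-form root.

```python
import numpy as np

def g(gamma):
    """Expected normalized A-mutation of the selected pair."""
    return gamma / np.sqrt(np.pi * (gamma**2 + (1 - gamma)**2))

def h(gamma):
    """Expected normalized B-mutation of the selected pair."""
    return (1 - gamma) / np.sqrt(np.pi * (gamma**2 + (1 - gamma)**2))

def expected_A(gamma, A0, B0, alpha, beta, t_end):
    """E[A(t_end)] from the solved recurrences: E[B] grows linearly in t,
    so E[A] accumulates a quadratic-in-t term."""
    return A0 + alpha * g(gamma) * (
        B0 * t_end + beta * h(gamma) * t_end * (t_end - 1) / 2)

def optimal_gamma(B0, beta, t_end, grid=100001):
    """Grid search for the gamma in [0, 1] maximizing E[A(t_end)]."""
    gammas = np.linspace(0.0, 1.0, grid)
    return gammas[np.argmax(expected_A(gammas, 0.0, B0, 1.0, beta, t_end))]
```

Consistent with the result, the search returns values near 1 when B_0/(β(t_end − 1)) is large and values near 1/2 when it is small.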
Figure 4 shows γ* as a function of B_0/(β(t_end − 1)), while Fig. 5 shows γ* as a function of the reciprocal, β(t_end − 1)/B_0. We show both so that each asymptote is clear.
That γ* depends on this quantity makes sense; it is the ratio between the initial evolvability B_0 and β(t_end − 1), which is related to the amount that evolvability can increase in the course of evolution in the time given. If the initial evolvability is large compared to the capacity to increase evolvability, then it pays off to focus more on increasing the trait A. As we look further into the future and t_end becomes large, the initial evolvability value has less of an effect and γ* tends towards 1/2. The optimal value γ* is never less than 1/2; it never pays off to select more for evolvability than for adaptedness.

Figure 4: The optimal value of the trade-off parameter γ as an increasing function of B_0/(β(t_end − 1)). As B_0 becomes large, the optimal value asymptotically approaches 1.

Figure 5: The optimal value of the trade-off parameter γ as a decreasing function of β(t_end − 1)/B_0. As β or t_end become large, the optimal value asymptotically approaches 1/2.

Figure 6 shows the optimal value of γ found by numerical methods for a range of values of B_0, β, and t_end. For each setting of the parameters, we plot the value of γ with the highest mean value of A at time t_end, measured over one hundred thousand trials (the low population size makes the outcome noisy). The numerical results closely agree with the answer obtained here, verifying the result.

Figure 6: A comparison between the optimal value of γ obtained by numerical methods and the exact result.

Figure 7 shows, for a particular setting of the parameters, the expected value of A over time for three strategies: setting γ = 1 (so that only A is selected for), setting γ = 1/2 (so that we select equally for A and B), and setting γ = γ* (the optimal value). It can be seen that the γ* strategy dominates. Figure 8 shows the same with a different setting of the parameter β. Note that for the γ* strategy, the plots do not show A over time for a particular value of γ; for each time t, the plot shows the expected value of A when using the (constant) value of γ that maximizes A(t).
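The numerical check for N = 2 can be sketched as a vectorized Monte Carlo simulation. In the function below (`simulate` is a name of our choosing, and 1e-6 stands in for the small clamp value ε), two normalized mutation pairs are drawn per generation and the pair maximizing γM_A + (1 − γ)M_B is kept:

```python
import numpy as np

def simulate(gamma, A0, B0, alpha, beta, t_end, trials, seed=0):
    """Run `trials` independent copies of the N = 2 process and return
    the mean A value at time t_end."""
    rng = np.random.default_rng(seed)
    A = np.full(trials, float(A0))
    B = np.full(trials, float(B0))
    rows = np.arange(trials)
    for _ in range(t_end):
        # Two candidate normalized mutation pairs per trial.
        MA = rng.standard_normal((trials, 2))
        MB = rng.standard_normal((trials, 2))
        # Select the pair with the larger weighted normalized sum.
        pick = np.argmax(gamma * MA + (1 - gamma) * MB, axis=1)
        # Undo the normalization: scale by the mutational std deviations.
        A = A + alpha * B * MA[rows, pick]
        B = np.maximum(B + beta * MB[rows, pick], 1e-6)
    return A.mean()
```

As a sanity check, with β = 0 the expected mean of A after t generations is αB_0 t γ/√(π(γ² + (1 − γ)²)), so selecting only on A (γ = 1) should yield roughly t/√π.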

Larger Population Sizes
With a larger population size, the model is harder to analyze for two reasons. The first is that the trait variances in the population depend on mutations accumulated over multiple generations; because more than one parent is selected in each generation there will be residual variation from the previous generation. Moreover, this residual (post-selection) variation will not be normally distributed. The result is that the traits A and B will no longer be normally distributed, but will be skewed by an amount depending on the trade-off parameter γ and the proportion kept during selection p. The trait distributions will change over time, approaching an equilibrium shape.
The second problem is that, because more than one parent is selected in each generation, correlation builds up between the A and B values in the population; individuals selected for having high A values are likely to have inherited large B values. The result of this correlation is that there is indirect selection for trait B when selecting for trait A. Figure 9 shows (with N = 100, p = 1/2), as functions of γ, the measured mean increase in traits A and B during selection in generation 10, in units of the mutational standard deviation of A and B, respectively. Figure 10 shows the same in generation 50. The asymmetry is due to indirect selection for trait B, and the mean increase of each trait due to selection changes over time because both the trait distributions and the correlation between the traits are changing over time. Compare these with the stationary (in time) and symmetric functions giving the expected per-generation increase of traits A and B for a population of size two, shown in Fig. 3. Without an exact expression for the expected increase of traits A and B due to selection in a larger population, we cannot deduce the optimal value of the trade-off parameter γ. Figure 11 shows the optimal value of γ found by numerical methods, with a population size N = 100 and p = 1/2. The exact result from the population size N = 2 case is shown for comparison.
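The correlation buildup described above can be observed directly in simulation. The following minimal sketch (our own construction, not the authors' code) selects purely on trait A (γ = 1) under the same truncation scheme and returns the sample correlation between A and B in the final population; because B sets the scale of A mutations, survivors of selection on A tend to carry large B values.

```python
import numpy as np

def corr_after(T, N=100, p=0.5, alpha=1.0, beta=0.05, B0=1.0, seed=1):
    """Evolve for T generations selecting only on A and return the
    sample A-B correlation in the final population."""
    rng = np.random.default_rng(seed)
    A = np.zeros(N)
    B = np.full(N, B0)
    for _ in range(T):
        # Truncation selection on A alone (gamma = 1), then resampling.
        parents = np.argsort(A)[::-1][: int(p * N)]
        chosen = rng.choice(parents, size=N, replace=True)
        A, B = A[chosen], B[chosen]
        # Mutation, with B setting the scale of A mutations.
        B = np.maximum(B + rng.normal(0.0, beta, N), 1e-6)
        A = A + rng.standard_normal(N) * alpha * B
    return np.corrcoef(A, B)[0, 1]
```

Averaging this statistic over many seeds gives a rough picture of the indirect selection for B described in the text.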

Related Work
Here we review research related to selection for evolvability from the research fields of evolutionary biology, evolutionary computation, and artificial life. There has been much debate about how to define evolvability, and about whether it is a property of individuals or populations. In our work we have followed Conrad (1972), Altenberg (1994), Kirschner and Gerhart (1998), and others in defining evolvability to be a property of individuals, related to their amenability to adaptive evolution or their capacity to produce potentially better-adapted offspring.
Even amongst those who agree with this definition, there is no consensus about exactly how to measure evolvability. Proposed measures include the likelihood of a beneficial mutation, the expected fitness of offspring after selection, and the expected variance in fitness of offspring prior to selection. Gallagher (2009) gives a summary of these and other measures. It is this last measure, the pre-selection variance of offspring, that is closest to ours; in our model, the evolvability of an individual is the standard deviation of mutations affecting the individual's offspring.
In biology, questions have been asked about whether evolvability has evolved in life on Earth, and if it has, whether it evolved by natural selection or by some additional mechanisms (Pigliucci, 2008). The first question is motivated by the fact that evolvability confers a future, rather than present, advantage, and it is not obvious that evolution has selected, or is able to select, for evolvability even if doing so would be beneficial in the long term (Kirschner and Gerhart, 1998). Amongst those who believe evolvability has evolved, proposed mechanisms include indirect selection due to correlation between fitness and evolvability (Altenberg, 1994), direct selection for evolvability as an adaptation or as a byproduct of selection for environmental robustness (Hansen, 2006; Visser et al., 2003), higher-level selection between clades and groups (Alberch, 1991), and a process of repeated extinctions and radiations (Dawkins, 2003).

Figure 11: A comparison between the optimal value of γ obtained by numerical methods (with population size N = 100) and the exact result (with N = 2).
In recent experiments in artificial life, the underlying encodings of self-replicators (i.e., the way in which the replicators are encoded in their heritable genetic information) have been allowed to evolve. The hope is that more evolvable encodings (with respect to the environment) will emerge. Many of these simulations were implemented in Avida and its variants; Avida is an artificial life platform in which assembly-like computer programs self-replicate (Ofria and Wilke, 2004). Baugh and McMullin (2013) and Hasegawa and McMullin (2013) have, respectively, designed replicators for the Tierra and Avida self-replication platforms in which part of the self-replicating program is interpreted as genetic information, and another part is interpreted as a decoding mechanism that decodes the genetic information. By allowing both to evolve, the way that the replicators are encoded in the genetic portion can change over time.
Egri-Nagy and Nehaniv (2003) have implemented a variant of Avida in which each replicating program has its own, different, instruction set, which itself can evolve. The goal is similar; the way that a replicator's behaviour is encoded can change over time, and more evolvable encodings might be discovered. Webb and Knowles (2014) aimed to study the differing capacities to evolve evolvability between 'non-self-encoding' and 'self-encoding' replicators. In both cases, each replicator implements a decoder that interprets its genetic information, and the decoder itself can evolve over time. In 'self-encoders', the decoder determines the way in which it itself is encoded in the genetic information. The authors concluded that there may have been insufficient selection for evolvability in their simulations to distinguish between the two types of replicator.
In the evolutionary computation field, there is a general concern with evolvability in the sense of mechanisms that might improve the capacity for adaptive evolution. Placing the degree of mutational variation under evolutionary control has been studied by Eiben et al. (1999), mostly from an empirical perspective, though important theoretical work in this area has also been done (Rudolph, 2001). Reisinger and Miikkulainen (2006) list some ways in which evolvability has been allowed to evolve in evolutionary algorithms. For example, Ebner et al. (2002) study the evolution of evolvability in neutral networks, in which neighbouring genotypes can encode the same phenotype. Neutral mutations, whose relevance to evolvability was first outlined by Maynard Smith (1970), are those that leave the phenotype unchanged, while possibly changing the phenotypic neighbourhood. Reisinger and Miikkulainen give as other examples evolutionary algorithms with indirect encodings (Stanley and Miikkulainen, 2003) and Estimation-of-Distribution algorithms (Pelikan et al., 2002). Altenberg (1994) shows that in genetic programming, evolvability can evolve through what he calls 'constructional selection': the implicit selection of blocks of code for their ability to improve programs in the population when inserted into them.

Conclusion and Future Work
We have introduced a model of a population simultaneously evolving an adaptive trait A and a trait B that is the "evolvability with respect to A", with the aim of answering to what extent we should select for evolvability in order to maximize the total amount of adaptation over a given time period.
We have answered this question exactly in the special case that the population size is two, with one individual selected as the parent of the next generation. We find that the optimal weighting parameter γ is never less than 1/2 and is an increasing function of B_0/(β(t_end − 1)), asymptotically approaching 1, where B_0 is the initial evolvability, β is the standard deviation of mutations affecting B, and t_end is the time at which we want to maximize A with respect to γ.
The model is straightforward to analyze with a population size of two, because the two individuals only ever differ in the mutations that happen within the current generation, and the mutations are independent and normally distributed; there are no residual differences between individuals from earlier mutations and there is no correlation between the A and B values within the population. As a result, the expected increase in A due to selection is a function of γ times the standard deviation of A mutations, and the expected increase in B is the same function of 1 − γ times the standard deviation of B mutations (see Fig. 3). For larger populations, the trait distributions are not normal, change over time, and become correlated, making analysis more difficult.
The applicability of our result is limited by the assumptions of the model, which are as follows.
• "Evolvability" is determined wholly by the B value, and is independent of time, environmental variables, and where each individual is on the A-landscape.
• The As and the Bs can increase without limit.
• We have perfect knowledge of the Bs. In practice, we would likely have to rely on estimates of the evolvability of an individual or lineage, derived from observations of the effects of past mutations.
In future work we aim to extend the model to overcome one or more of these limitations.

Appendix
If we have two random variables A and B, both distributed as N(0, σ²), then the maximum of A and B has the expected value

    E[max(A, B)] = 2 ∫_{−∞}^{∞} a φ(a) (∫_{−∞}^{a} φ(b) db) da,

where φ(x) is the pdf of the distribution. This can be understood as follows. We integrate over the possible values of A, multiplying the probability of getting that value by the probability that the B value is less than it (i.e., the probability that A is the maximum of the pair). For each possibility we multiply by the value of A to get the expected value. We then do the same thing for the case where the B value is the greater of the pair. Because A and B have the same distribution, these integrals are equal, so we evaluate the integral once and double the result. Evaluating the integral gives the result

    E[max(A, B)] = σ/√π.

In this paper we make use of the following result, giving the expected values of the pair of numbers (out of two pairs) that has the maximum sum. Suppose we have four normal random variables distributed as

    A_1, A_2 ∼ N(0, σ_A²)  and  B_1, B_2 ∼ N(0, σ_B²).

Let A_m, B_m be the A, B pair with the maximum sum. That is,

    m = argmax_{i ∈ {1,2}} (A_i + B_i).

The expected value of A_m is given by

    E[A_m] = 2 ∫_{−∞}^{∞} ∫_{−∞}^{∞} a φ_A(a) φ_B(b) Φ(a + b) da db,

where

    Φ(a + b) = ∫_{−∞}^{a+b} φ_{A+B}(c) dc,

and φ_A(x) is the pdf of each of A_{1,2}, φ_B(x) is the pdf of each of B_{1,2}, and φ_{A+B} is the pdf of each of A_1 + B_1 and A_2 + B_2.

In words, we integrate over the possible values of A_1 and B_1, multiplying the joint probability of getting those values by the probability that the sum of the other A, B pair takes a value less than A_1 + B_1, and we multiply by the A_1 value to get its expected value. We then integrate over the possible values of A_2 and B_2 (for the case where the sum of the second pair is greater than the sum of the first), which gives the same integral again. Adding the two integrals together gives the expected value of A_m.

Evaluating the above integral gives the value

    E[A_m] = σ_A²/√(π(σ_A² + σ_B²)), (21)

and by symmetry the expected value of B_m is

    E[B_m] = σ_B²/√(π(σ_A² + σ_B²)). (22)
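The appendix results (labelled (21) and (22) in the text) are straightforward to check by Monte Carlo. In the sketch below, `estimate_Am` (a name of our choosing) draws two (A, B) pairs per trial and averages the A value of the pair with the larger sum.

```python
import numpy as np

def estimate_Am(sigma_A, sigma_B, trials=1_000_000, seed=0):
    """Monte Carlo estimate of E[A_m], the A value of the (A, B) pair
    (out of two) whose sum A + B is the maximum."""
    rng = np.random.default_rng(seed)
    A = rng.normal(0.0, sigma_A, (trials, 2))
    B = rng.normal(0.0, sigma_B, (trials, 2))
    pick = np.argmax(A + B, axis=1)  # index of the max-sum pair per trial
    return A[np.arange(trials), pick].mean()

def exact_Am(sigma_A, sigma_B):
    """Closed form (21): sigma_A^2 / sqrt(pi * (sigma_A^2 + sigma_B^2))."""
    return sigma_A**2 / np.sqrt(np.pi * (sigma_A**2 + sigma_B**2))
```

Note that with σ_B = 0 the closed form reduces to σ_A/√π, the expected maximum of two N(0, σ_A²) draws, recovering the first appendix result.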