TL;DR
This paper demonstrates that for the OneMax problem, strategies that maximize expected progress at each step are not always optimal, especially at certain fitness levels, and that more risk-tolerant approaches can lead to better overall performance.
Contribution
It proves that drift maximization is not always optimal for OneMax, revealing that more risk-tolerant mutation strategies can outperform drift-maximizing ones.
Findings
Optimal mutation strengths are larger than drift-maximizing ones at certain fitness levels.
Risk-tolerant strategies outperform expected progress maximization in some cases.
Optimal mutation strengths can be even, unlike drift-maximizing strategies.
| Algorithm / problem dimension | 100 | 500 | 1,000 | 1,500 | 2,000 | 2,500 | 3,000 | 3,500 | 4,000 | 4,500 |
|---|---|---|---|---|---|---|---|---|---|---|
| RLS | 433.4 | 2,975.0 | 6,645.0 | 10,576.7 | 14,678.4 | 18,906.4 | 23,235.2 | 27,647.7 | 32,131.9 | 36,678.8 |
| RLS | 433.6 | 2,975.3 | 6,645.2 | 10,576.9 | 14,678.6 | 18,906.7 | 23,235.5 | 27,648.0 | 32,132.2 | 36,679.0 |
| RLS | 450 | 3,051 | 6,793 | 10,797 | 14,971 | 19,272 | 23,673 | 28,158 | 32,714 | 37,333 |
| (1+1) EA | 437 | 2,979 | 6,665 | 10,606 | 14,717 | 18,954 | 23,292 | 27,714 | 32,207 | 36,763 |
| (1+1) EA | 534 | 3,700 | 8,321 | 13,270 | 18,441 | 23,775 | 29,239 | 34,813 | | |
| (1+1) EA | 663 | 4,666 | 10,548 | 16,865 | 23,473 | 30,298 | 37,296 | 44,438 | | |
| (1+1) EA | 437 | 2,989 | 6,682 | 10,648 | 14,795 | | | | | |
| (1+1) EA | 534 | 3,711 | 8,322 | 13,271 | 18,442 | 23,776 | 29,242 | 34,814 | | |
| (1+1) EA | 664 | 4,682 | 10,549 | 16,865 | 23,473 | 30,298 | 37,297 | 44,438 | | |
| (1+1) EA | 550 | 3,781 | 8,458 | 13,475 | 18,712 | 24,113 | 29,644 | 35,284 | | |
| (1+1) EA | 679 | 4,751 | 10,684 | 17,066 | 23,740 | 30,631 | 37,695 | 44,902 | 52,233 | 59,672 |
| EA | 1,006 | 7,189 | 16,254 | 26,031 | 36,269 | 46,850 | 57,705 | 68,788 | 80,065 | |
| EA | 1,006 | 7,189 | 16,254 | 26,031 | 36,269 | 46,850 | 57,706 | 68,788 | 80,066 | 91,513 |
| (1+1) EA | 1,008 | 7,203 | 16,283 | 26,075 | 36,328 | 46,925 | 57,795 | 68,893 | 80,186 | 91,649 |
| (1+1) EA | 1,058 | 7,461 | 16,807 | 26,867 | 37,389 | 48,257 | 59,398 | | | |
| (1+1) EA | 1,071 | 7,510 | 16,896 | 26,992 | 37,550 | 48,451 | 59,626 | 71,028 | 82,625 | 94,392 |
Maximizing Drift is Not Optimal for Solving OneMax
Nathan Buskulic and Carola Doerr
Sorbonne Université, CNRS, Laboratoire d’Informatique de Paris 6, Paris, France
Abstract
It seems very intuitive that for the maximization of the OneMax problem the best that an elitist unary unbiased search algorithm can do is to store a best so far solution, and to modify it with the operator that yields the best possible expected progress in function value. This assumption has been implicitly used in several empirical works. In [Doerr, Doerr, Yang: Optimal parameter choices via precise black-box analysis, TCS, 2020] it was formally proven that this approach is indeed almost optimal.
In this work we prove that drift maximization is not optimal. More precisely, we show that for most fitness levels between $n/2$ and $2n/3$ the optimal mutation strengths are larger than the drift-maximizing ones. This implies that the optimal RLS is more risk-affine than the variant maximizing the step-wise expected progress. We show similar results for the mutation rates of the classic (1+1) Evolutionary Algorithm (EA) and its resampling variant, the (1+1) EA$_{>0}$.
As a result of independent interest we show that the optimal mutation strengths, unlike the drift-maximizing ones, can be even.
1 Introduction
It is well understood that iterative optimization heuristics like local search variants, evolutionary algorithms, estimation of distribution algorithms, etc. can benefit from non-static choices of the parameters that determine their search radius, population size, or selective pressure. The question how to select these parameters dynamically is the subject of parameter control, which studies different techniques to achieve a good fit between suggested and optimal parameter values.
Complementing a diverse body of empirical works demonstrating advantages of parameter control mechanisms [KHE15, AM16], there is an increasing interest in proving such benefits by mathematical means [DD20]. Among the significant advances in this direction are, in chronological order (with respect to the conference announcements), the analysis of a success-based adaptation strategy for the choice of the offspring population size of the $(1+\lambda)$ EA in distributed models of computation [LS11], the self-adjusting $(1+(\lambda,\lambda))$ Genetic Algorithm (GA) using the one-fifth success rule [DD18], a learning-based selection of the search radii in Randomized Local Search [DDY16], and the self-adjusting [DGWY19] and self-adaptive [DWY18a] mutation rates in a $(1+\lambda)$ and $(1,\lambda)$ Evolutionary Algorithm (EA), respectively. All these references consider the optimization of OneMax, the problem of maximizing the counting-ones function $x \mapsto \sum_{i=1}^{n} x_i$. Only a few theoretical results analyzing algorithms with adaptive parameters consider different functions, e.g., [LOW20, DLOW18, DDK18] (see [DD20] for a complete list of references). OneMax also plays a prominent role in empirical research on parameter control. In both communities, it is argued that the consideration of OneMax provides a “sterile EC-like environment” [FCSS08], in which the optimal parameter values are well understood.
In light of the existing literature it is interesting to note that most works, implicitly or explicitly, assume that for the considered algorithms the optimal strategy for the maximization of OneMax is a greedy selection of the best so far solution, and the variation of the same by the mutation rate/step size that maximizes the expected gain in function value [Bäc92, Bäc93, FCSS08, FCSS09]. Thierens [Thi09] explicitly argues that a particularly useful property of OneMax, which makes this problem a very suitable benchmark for adaptive operator selection, is the fact that the reward of an operator can be computed exactly. He then proceeds by comparing the step-wise expected fitness gains made by different operators, and ranks operators by this value. He thus uses as underlying assumption that drift-maximization is optimal.
That this widely believed-to-be-optimal drift-maximizing strategy is indeed almost optimal was formally proven in [DDY20]. More precisely, it is shown in [DDY20] that the best unary unbiased black-box algorithm for OneMax cannot be better by more than an additive $o(n)$ term than the RLS variant that flips in each iteration the drift-maximizing number of bits in a best-so-far solution. Both algorithms have an expected optimization time of $n \ln(n) - cn \pm o(n)$, for a constant $c$ between $0.2539$ and $0.2665$.
It was conjectured in [DW18, Section 3.1] that the drift-maximizing RLS is not only “almost” optimal, but indeed optimal. As mentioned, this conjecture was also made, explicitly or implicitly, in the empirical works cited above (and in several other works on the OneMax function). We show in this work that this conjecture is false. More precisely, we show that maximizing drift is optimal neither for RLS, nor for the (1+1) EA, nor for its resampling variant, the (1+1) EA$_{>0}$, suggested in [CD18b].
We explain where the difference between optimal and drift-maximizing strategies comes from, define precisely how to obtain the optimal mutation rates, numerically compute these for some selected dimensions up to , and analyze the differences between drift-maximizing and optimal mutation rates. We also compare the performances of optimal and drift-maximizing algorithms, and show that the differences in mutation rates/step sizes—albeit significant—result only in marginal differences in terms of overall running time. Given the above-mentioned results in [DDY20], the last statement is not surprising. The main contribution of our work is therefore not to be found in tremendous performance gains, but in new structural insights for the optimization of OneMax, the arguably most widely used benchmark for parameter control and adaptive operator selection mechanisms.
We note that the argument why drift-maximization is not optimal is quite easy to understand. Basically, our result is built upon the observation that the drift-maximizer values a potential fitness progress by exactly the size of this gain. More precisely, in the computation of the drift, the probability of creating a particular offspring is multiplied by the fitness difference between this offspring and its parent, for each possible offspring. The optimal algorithms, however, value a fitness gain by the resulting gain in the expected remaining running time. Since this difference in expected remaining running time is much larger than the fitness difference, the optimal RLS and EA variants use mutation rates that are larger than the drift-maximizing ones. Put differently, they trade a smaller expected progress for a slightly larger probability of making a larger fitness gain. That is, the optimal algorithms are more risk-affine than the drift-maximizing ones. This quite intuitive fact seems to have been overlooked in the evolutionary computation (EC) community.
Our work has recently been extended to $(1+\lambda)$-type RLS and EAs [BD20]. In that work, not only are the optimal mutation rates computed, but also the expected remaining running times for sub-optimal mutation rates, information that can be used to identify weak spots of parameter control mechanisms.
Precise Running Time Bounds. While we focus in this work on very precise running time bounds for concrete problem dimensions, which we compute numerically, we note that there exists a significant body of related theoretical works, which focus on asymptotically optimal mutation rates and running times. In addition to the works mentioned above, which all deal with adaptive parameter schemes, we consider the following ones particularly interesting in the context of our study. For the classic RLS variant, which always flips exactly one bit in each iteration, the expected running time on OneMax was computed very precisely in [DD16]. For the (1+1) EA with static mutation rate $1/n$, the best known bounds are proven in [HPR+18] and in the recent work [HW19], which are precise up to an additive and term, respectively. For other static mutation rates, the best known results are available in [Wit13].
Online Repository. Code and details for the algorithms described here can be found on the GitHub page of this project at https://github.com/NathanBuskulic/OneMaxOptimal. The interested reader can find there not only the performance data, but also the drift-maximizing and optimal step sizes/mutation rates of the algorithms discussed below, for problem dimensions up to .
2 The OneMax Problem
OneMax, also referred to as counting-ones problem in the early works on evolutionary computation, is the problem of maximizing the function
$$\text{OM}: \{0,1\}^n \to [0..n], \quad x \mapsto \sum_{i=1}^{n} x_i,$$
which simply assigns to each bit string the number of ones in it. OneMax is considered to be one of the “easiest” non-trivial benchmark problems, for two reasons. Firstly, a number of results exist that show that for several (classes of) algorithms the expected optimization time on OneMax is not bigger than that on any other unimodal function of the same dimension, cf. [DJW12, Sud13, CHJ+17] for examples. A second reason to declare OneMax an “easy”, yet useful, benchmark problem is its (presumably) simple structure, which allows us to understand well the optimization process of classical optimization heuristics. One structural property that is particularly useful in runtime analyses is the perfect fitness-distance correlation; i.e., whenever $f(x) > f(y)$ for two search points $x$ and $y$, then the distance of $x$ to the optimum is strictly smaller than that of $y$.
For readers wondering about the usefulness of a single benchmark instance, we note that for most evolutionary algorithms (EAs) and local search variants such as Randomized Local Search (RLS), Simulated Annealing, etc. the OneMax problem is identical to the problem of maximizing any of the functions $f_z: \{0,1\}^n \to [0..n], x \mapsto |\{i \in [n] : x_i = z_i\}|$, since for any $z \in \{0,1\}^n$ the Hamming distance problem $f_z$ has a fitness landscape that is isomorphic to that of OneMax, and the mentioned algorithms are oblivious of the exact problem representation. That is, OneMax is essentially just one representative of the class of Hamming distance problems.
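The equivalence between OneMax and the Hamming distance problems can be checked exhaustively on small instances. The following minimal Python sketch is illustrative only; the function names `onemax`, `agreement`, and `relabel` are hypothetical and not taken from the paper or its repository.

```python
from itertools import product


def onemax(x):
    """Number of ones in the bit string x."""
    return sum(x)


def agreement(x, z):
    """Number of positions in which x agrees with the target string z."""
    return sum(1 for xi, zi in zip(x, z) if xi == zi)


def relabel(x, z):
    """Map x to the string that has a one exactly where x agrees with z."""
    return [1 - (xi ^ zi) for xi, zi in zip(x, z)]


# Illustrative target string (an arbitrary choice, not from the paper).
z = (0, 1, 1, 0)

# For every x, agreeing with z is the same as counting ones after relabeling.
assert all(
    agreement(x, z) == onemax(relabel(x, z))
    for x in product((0, 1), repeat=len(z))
)
```

Since the relabeling is a bijection that preserves Hamming distances, an algorithm that only flips bits and compares function values behaves identically on both problems.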
OneMax is often termed the “drosophila of EC”, because of the vast amount of literature studying this problem, both in empirical and in theoretical works. In the context of our study in particular the works [Bäc92, Bäc93, FCSS08, FCSS09, Thi09, BLS14, DD18, DDY20, DDY16, DGWY19, DWY18a, dPdLDD15, DW18] are worth mentioning, as they all study the benefits of non-static parameter choices on this problem, for different local search variants and evolutionary algorithms. Among these works, the empirical ones focus on operators that maximize the expected progress (“drift”) per round, either without further justifying it, or explicitly mentioning that drift-maximization is optimal (an assumption that we will refute in Section 4). Among the theoretical works, most are interested in deriving asymptotic results only, with the sole exception of [DDY20, DDY16], where very precise bounds for the optimization time of two adaptive RLS variants are proven. Most relevant to our work is the mentioned result from [DDY20] which proves that the drift-maximizing strategy mentioned above is indeed almost optimal. When we show in the next sections that the best possible RLS variant is not the drift-maximizing one, we know by the result from [DDY20] that the gain in expected optimization time cannot be more than an additive $o(n)$ term.
3 Elitist (1+1) Unbiased Algorithms
We are concerned in this work with algorithms following the blueprint given in Algorithm 1. These algorithms start the optimization in a randomly chosen solution $x \in \{0,1\}^n$. In each iteration exactly one offspring $y$ is sampled by first copying $x$ and then flipping the entries of $k$ randomly chosen, pairwise different positions $i_1, \ldots, i_k \in [n]$. The parent $x$ is replaced by its offspring $y$ if and only if $f(y) \ge f(x)$, i.e., if and only if the offspring is at least as good as $x$. Algorithms adhering to this scheme are referred to in the theory of EA literature as elitist unary unbiased black-box algorithms [DL17].
Elitist unary unbiased black-box algorithms differ only in the choice of the mutation strength $k$. The two most commonly studied classes of algorithms are Randomized Local Search (RLS) variants, which use a deterministic choice of $k$, and (1+1) Evolutionary Algorithms (EAs), which sample $k$ from $\text{Bin}(n,p)$, i.e., from a binomial distribution with $n$ trials and success rate $p$. We note that traditionally constant choices, $k=1$ for RLS and $p=1/n$ for the EA, are studied, but in this work we focus on non-static mutation strengths $k$ and mutation rates $p$. More precisely, we study fitness-dependent choices $k(f(x))$ and $p(f(x))$, which take into account the function value (fitness) of the current-best solution. In the terminology proposed in [DD20] such parameter control schemes are classified as state-dependent, since the parameter value depends only on the current-best solution but not on any other information about the optimization process. The objective of our work is to identify the functions $k$ and $p$ that minimize the expected running time of RLS and the EA, respectively, when optimizing OneMax.
We add to our investigation the **(1+1) EA$_{>0}$**, which samples $k$ from a conditional binomial distribution $\text{Bin}_{>0}(n,p)$, which is defined by $\text{Bin}_{>0}(n,p)(0)=0$ and $\text{Bin}_{>0}(n,p)(k)=\text{Bin}(n,p)(k)/(1-(1-p)^n)$ for $k \ge 1$. That is, the probability of the EA$_{>0}$ to flip $k$ bits equals that of the EA conditional on flipping at least one bit. The EA$_{>0}$ was suggested in [CD18b] as an algorithm that more closely resembles common implementations of the EA, cf. also discussions in [CD18a]. The EA$_{>0}$ can be seen as an intermediate algorithm between the RLS variant always flipping one bit and the (unconditional) EA, since for $p$ converging to 0 the distribution $\text{Bin}_{>0}(n,p)$ concentrates on 1, so that for small $p$ the behavior of the EA$_{>0}$ “converges” to that of RLS.
We note that other elitist unary unbiased black-box algorithms have been recently introduced. The fast Genetic Algorithm (GA) suggested in [DLMN17] samples the mutation strength from a power-law distribution, and the mutation strength is sampled from a normal distribution in the normalized EA studied in [YDB19]. We will nevertheless focus in this work on RLS and (1+1) EA variants only, simply because they are still the most commonly studied algorithms in evolutionary computation. We note though that an extension of our work, in particular to results covering the normalized EAs, would be interesting, since this algorithm class can be seen as a meta-model between the class of RLS algorithms and the class of EAs.
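The blueprint described in this section, together with the three ways of choosing the mutation strength, can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the names `run_elitist`, `binomial_strength`, and `conditional_binomial_strength` are hypothetical, the evaluation cap `max_evals` is an added safeguard, and the conditional distribution is sampled by simple rejection, which is slow for very small rates.

```python
import random


def run_elitist(n, strength, rng, max_evals=10**6):
    """Elitist (1+1) unary unbiased scheme on OneMax.

    strength(fx) returns the number of bits to flip at fitness fx.
    Returns (number of function evaluations, final fitness).
    """
    x = [rng.randint(0, 1) for _ in range(n)]
    fx = sum(x)
    evals = 1  # evaluation of the initial solution
    while fx < n and evals < max_evals:
        y = x[:]
        for i in rng.sample(range(n), strength(fx)):
            y[i] ^= 1  # flip pairwise different positions
        fy = sum(y)
        evals += 1
        if fy >= fx:  # elitist selection: keep offspring if at least as good
            x, fx = y, fy
    return evals, fx


def binomial_strength(n, rate, rng):
    """(1+1) EA: sample the strength from Bin(n, rate(fx))."""
    return lambda fx: sum(rng.random() < rate(fx) for _ in range(n))


def conditional_binomial_strength(n, rate, rng):
    """(1+1) EA_{>0}: Bin(n, rate(fx)) conditioned on flipping at least one bit."""
    def sample(fx):
        p = rate(fx)
        if p <= 0.0:
            return 1  # convention: rate 0 is interpreted as flipping one bit
        while True:  # rejection sampling (assumption: p is not tiny)
            k = sum(rng.random() < p for _ in range(n))
            if k > 0:
                return k
    return sample
```

A deterministic RLS variant is obtained by passing any function of the fitness as `strength`, for example `lambda fx: 1` for the classic variant that always flips one bit.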
4 Maximizing Drift is Not Optimal
As mentioned in Section 3, our main interest is in identifying the functions $k_{\text{opt}}$, $p_{\text{opt}}$, and $p_{>0,\text{opt}}$ for which the following three algorithms have a best possible expected optimization time:
- RLS$_{\text{opt}}$, the RLS variant flipping in each iteration exactly $k_{\text{opt}}(f(x))$ bits (i.e., using the deterministic mutation strength $k = k_{\text{opt}}(f(x))$),
- **(1+1) EA$_{\text{opt}}$,** the EA variant using standard bit mutation with mutation rate $p_{\text{opt}}(f(x))$ (i.e., the algorithm sampling the mutation strength from the binomial distribution $\text{Bin}(n, p_{\text{opt}}(f(x)))$), and
- **(1+1) EA$_{>0,\text{opt}}$,** the EA$_{>0}$ variant using conditional standard bit mutation flipping at least one bit with mutation rate $p_{>0,\text{opt}}(f(x))$ (i.e., sampling the mutation strength from the conditional binomial distribution $\text{Bin}_{>0}(n, p_{>0,\text{opt}}(f(x)))$).
Note that, formally, these functions should carry the dimension $n$ as an additional index, since they depend on it. However, we shall often omit the explicit mention of the dimension in order to ease the notation. The same applies to the corresponding drift-maximizing functions $k_{\text{drift}}$, $p_{\text{drift}}$, and $p_{>0,\text{drift}}$.
It may be surprising that, after so many years of research on the OneMax problem, none of the three algorithms above has been explicitly computed. As mentioned in the introduction, there are two main reasons explaining this situation. Firstly, it is widely believed that the functions $k_{\text{drift}}$, $p_{\text{drift}}$, and $p_{>0,\text{drift}}$, which maximize in each step the expected fitness gain (drift) of the respective algorithm, are optimal. As already discussed, such claims can be found quite frequently in the literature [Bäc92, FCSS08, DW18]. We will show in this section that these claims are not correct, by presenting examples which demonstrate that better expected optimization times can be achieved by choosing $k_{\text{opt}}$, $p_{\text{opt}}$, and $p_{>0,\text{opt}}$, respectively. In Section 5 we will quantify the discrepancies between drift-maximizing and optimal (i.e., time-minimizing) functions for dimensions up to . Section 6 discusses the impact of these differences on the overall running time.
4.1 for
We first show that $k_{\text{opt}} \neq k_{\text{drift}}$. That is, we study the drift-maximizing and the time-minimizing variants of RLS, which we call RLS$_{\text{drift}}$ and RLS$_{\text{opt}}$ in the following, and show that they are not identical. Interestingly, it suffices to regard $n=3$ for an example for which the two functions differ. The following table summarizes for $n=3$ the functions $k_{\text{drift}}$, $k_{\text{opt}}$, and the expected remaining running times $T_{\text{drift}}(3,\ell)$ and $T_{\text{opt}}(3,\ell)$ for RLS$_{\text{drift}}$ and RLS$_{\text{opt}}$, respectively, when starting in a solution of fitness $\ell$. In column $p(\ell)$ we list the probability that a random initial solution has fitness value $\ell$. Since uniform random initialization is used, $p(\ell) = \binom{n}{\ell}/2^n$. The last line provides the overall expected optimization time of both algorithms. Note that, by the law of total probability,
$$\mathbb{E}[T] = 1 + \sum_{\ell=0}^{n-1} p(\ell)\, T(\ell),$$
where the “+1”-term accounts for the evaluation of the initial solution.
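| $\ell$ | $p(\ell)$ | $k_{\text{drift}}$ | $T_{\text{drift}}$ | $k_{\text{opt}}$ | $T_{\text{opt}}$ |
|---|---|---|---|---|---|
| 0 | 1/8 | 3 | 1 | 3 | 1 |
| 1 | 3/8 | 3 | 4 | 2 | 3 |
| 2 | 3/8 | 1 | 3 | 1 | 3 |
| 3 | 1/8 | - | 0 | - | 0 |
| Overall expected time | | | 3.75 | | 3.375 |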
As we see from the last line, the overall expected running time of RLS$_{\text{opt}}$ is 3.375 and thus strictly smaller than that of RLS$_{\text{drift}}$, which is 3.75. We briefly explain how the entries in this table are computed.
Computation of RLS$_{\text{drift}}$. We start our explanation with the computation of $k_{\text{drift}}$ and $T_{\text{drift}}$. The function $k_{\text{drift}}$ was defined above to be the one that maps each fitness value $\ell$ to the number of bits that need to be flipped in order to maximize the expected progress in fitness value, i.e., $k_{\text{drift}}(n,\ell)$ is defined to be the value of $k$ that maximizes the expression
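$$\begin{aligned} \mathbb{E}[\Delta(\ell,k)] &= \mathbb{E}\big[\max\{f(\text{flip}_k(x)) - f(x), 0\} \,\big|\, f(x) = \ell\big] \\ &= \sum_{i=\lceil k/2 \rceil}^{k} \frac{\binom{n-\ell}{i}\binom{\ell}{k-i}\,(2i-k)}{\binom{n}{k}}, \end{aligned} \qquad (1)$$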
where we use in the last line the fact that flipping $i$ of the previously incorrect bits implies that we flip $k-i$ of the previously correct bits, which results in a fitness increase of $i-(k-i)=2i-k$. This event occurs with probability $\binom{n-\ell}{i}\binom{\ell}{k-i}/\binom{n}{k}$, since there are $\binom{n-\ell}{i}$ different ways of choosing $i$ previously incorrect bits, $\binom{\ell}{k-i}$ ways of choosing $k-i$ previously correct bits, and $\binom{n}{k}$ ways of choosing $k$ pairwise different bit positions. When two or more values exist that maximize this expression, we follow the convention made in [DDY20] and choose in all our computations below the smallest of these drift-maximizing mutation strengths, i.e., formally, $k_{\text{drift}}(n,\ell)=\min\big\{\arg\max_{k}\mathbb{E}[\Delta(\ell,k)]\big\}$. (Footnote 1: In light of the results presented in this paper, it seems likely that a different tie-breaking rule would be the better choice, but given the small discrepancies in the resulting running times (cf. Section 6) we do not investigate this question further.)
It is easily seen that for $n=3$ and $\ell=2$ flipping one bit is optimal, since this is the only mutation strength yielding positive drift. With this value of $k$ the expected remaining time to find the optimal solution is 3. For $\ell=1$, the expected progress of $\text{flip}_1$, i.e., of flipping one bit, is $2/3$, the expected progress of $\text{flip}_2$ equals $\frac{1}{3}\cdot 2 = 2/3$ (note here that no fitness gain of one is possible), and the expected progress of $\text{flip}_3$ is $1$, since in this case we deterministically obtain an offspring of fitness $2$. The best drift is thus obtained by operator $\text{flip}_3$, which implies $k_{\text{drift}}(3,1)=3$. With this choice of the mutation strength, the expected remaining optimization time equals $T_{\text{drift}}(3,1) = 1 + T_{\text{drift}}(3,2) = 4$. In general, $T_{\text{drift}}(n,\ell)$ can be computed as
[TABLE]
where is the event that and . When $\ell=0$ then flipping all bits, i.e., applying $\text{flip}_3$, is optimal, since it directly produces the optimal solution. Note here that, more generally, the function value of the bitwise complement of a solution $x$ equals $n$ minus the function value of $x$.
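With these values, the expected running time of RLS$_{\text{drift}}$ on the 3-dimensional OneMax problem is equal to $1 + \frac{1}{8}\cdot 1 + \frac{3}{8}\cdot 4 + \frac{3}{8}\cdot 3 = 3.75$.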
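The computation of the drift-maximizing strengths and the resulting running time can be reproduced with exact rational arithmetic. The sketch below is a minimal illustration, assuming the hypergeometric transition probabilities described above; the function names are hypothetical and the code is not taken from the project repository.

```python
from fractions import Fraction
from math import comb


def improving_moves(n, l, k):
    """Map each strictly better fitness j to the probability of reaching it
    when flipping k distinct bits of a string with l ones."""
    moves = {}
    for i in range(k + 1):  # i = number of flipped zero-bits
        ways = comb(n - l, i) * comb(l, k - i)
        if 2 * i > k and ways > 0:
            moves[l + 2 * i - k] = Fraction(ways, comb(n, k))
    return moves


def drift(n, l, k):
    """Expected fitness gain of flipping k bits at fitness l (elitist selection)."""
    return sum(p * (j - l) for j, p in improving_moves(n, l, k).items())


def k_drift(n, l):
    """Smallest mutation strength that maximizes the drift at fitness l."""
    return max(range(1, n + 1), key=lambda k: (drift(n, l, k), -k))


def remaining_times(n, strength):
    """Expected remaining times T[l] when flipping strength(l) bits at fitness l."""
    T = [Fraction(0)] * (n + 1)
    for l in range(n - 1, -1, -1):
        moves = improving_moves(n, l, strength(l))
        T[l] = (1 + sum(p * T[j] for j, p in moves.items())) / sum(moves.values())
    return T


def expected_time(n, T):
    """Law of total probability over the uniformly random initial fitness."""
    return 1 + sum(Fraction(comb(n, l), 2 ** n) * T[l] for l in range(n + 1))
```

For `n = 3`, `remaining_times(3, lambda l: k_drift(3, l))` combined with `expected_time` reproduces the value 3.75 stated above.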
Computation of RLS$_{\text{opt}}$. We next discuss how to compute $k_{\text{opt}}$ and $T_{\text{opt}}$. This time, we start our investigation by recalling that, by the law of total probability, the expected optimization time of RLS$_{\text{opt}}$ equals $1 + \sum_{\ell=0}^{n-1} p(\ell)\, T_{\text{opt}}(n,\ell)$. The best-possible RLS algorithm is hence the one using at each fitness level $\ell$ the mutation strength $k$ which minimizes the expected remaining optimization time of flipping $k$ bits, which is equal to
[TABLE]
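Formally, we again choose the smallest such minimizer. Note that the expression in (2) requires knowledge of the values $T_{\text{opt}}(n,j)$ for $j > \ell$. In order to compute $k_{\text{opt}}$ one therefore has to start with fitness level $\ell = n-1$. Once $k_{\text{opt}}(n,n-1)$ and $T_{\text{opt}}(n,n-1)$ are known, $k_{\text{opt}}(n,n-2)$ and $T_{\text{opt}}(n,n-2)$ can be computed, and one continues in this way until eventually reaching $\ell = 0$, for which $k_{\text{opt}}(n,0) = n$ holds.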
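Applying these computations to our example with $n=3$, we first easily obtain $k_{\text{opt}}(3,2)=1$ and $T_{\text{opt}}(3,2)=3$, as in the drift-maximizing case analyzed above. Given that $k_{\text{opt}}(3,0)=3$, the only interesting case is fitness level $\ell=1$. Writing $T(1,k)$ for the expected remaining time when flipping $k$ bits at this level, the expected remaining time $T(1,1)$ equals $1 + \frac{2}{3}\, T_{\text{opt}}(3,2) + \frac{1}{3}\, T(1,1)$. Since $T_{\text{opt}}(3,2)=3$, a simple algebraic transformation shows $T(1,1)=9/2$. When flipping two bits, we either obtain the optimal solution (this happens with probability $1/3$) or we remain at the current fitness level, which shows that $T(1,2) = 1 + \frac{2}{3}\, T(1,2)$. Thus, $T(1,2)=3$. Finally, we compute that $T(1,3) = 1 + T_{\text{opt}}(3,2) = 4$. We therefore see that $k_{\text{opt}}(3,1)=2$ and $T_{\text{opt}}(3,1)=3$. With these values, we obtain that the expected optimization time of RLS$_{\text{opt}}$ on the 3-dimensional OneMax is $1 + \frac{1}{8}\cdot 1 + \frac{3}{8}\cdot 3 + \frac{3}{8}\cdot 3 = 3.375$.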
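The backward recursion over the fitness levels can be sketched as a small dynamic program. This is an illustrative sketch with hypothetical function names, assuming exact rational arithmetic and the tie-breaking rule that prefers the smallest strength; it is not the code from the project repository.

```python
from fractions import Fraction
from math import comb


def improving_moves(n, l, k):
    """Map each strictly better fitness j to the probability of reaching it
    when flipping k distinct bits of a string with l ones."""
    moves = {}
    for i in range(k + 1):  # i = number of flipped zero-bits
        ways = comb(n - l, i) * comb(l, k - i)
        if 2 * i > k and ways > 0:
            moves[l + 2 * i - k] = Fraction(ways, comb(n, k))
    return moves


def optimal_rls(n):
    """Return (k_opt, T_opt), both indexed by fitness level 0..n."""
    T = [Fraction(0)] * (n + 1)
    k_opt = [None] * (n + 1)
    for l in range(n - 1, -1, -1):  # start at level n-1 and move downwards
        best = None
        for k in range(1, n + 1):
            moves = improving_moves(n, l, k)
            if not moves:
                continue  # this strength can never improve from level l
            # Solve T = 1 + sum_j P(j) T[j] + (1 - sum_j P(j)) T for T.
            t = (1 + sum(p * T[j] for j, p in moves.items())) / sum(moves.values())
            if best is None or t < best:  # strict '<' keeps the smallest k on ties
                best, k_opt[l] = t, k
        T[l] = best
    return k_opt, T


def expected_time(n, T):
    """Law of total probability over the uniformly random initial fitness."""
    return 1 + sum(Fraction(comb(n, l), 2 ** n) * T[l] for l in range(n + 1))
```

For `n = 3` the routine returns the even strength 2 at fitness level 1 and an overall expected time of 27/8 = 3.375, in line with the example above.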
Optimal Mutation Strengths Need Not Be Odd. With this example, we not only prove that $k_{\text{opt}} \neq k_{\text{drift}}$, but we also make another interesting observation, which concerns the parity of the values $k_{\text{opt}}(n,\ell)$. It was proven in [DDY20] that $k_{\text{drift}}$ takes only odd values, since for every $k$ the drift of flipping $2k$ bits is strictly smaller than that of flipping $2k+1$ bits.
The example above shows that the situation is different for $k_{\text{opt}}$. More precisely, we have seen that in the situation $n=3$ and $\ell=1$ flipping 2 bits is optimal.
4.2 for
The only difference between the (1+1) EA and RLS is the random choice of the mutation strength $k$, which the EA samples from a binomial distribution $\text{Bin}(n,p)$. The (1+1) EA$_{\text{drift}}$ is defined by choosing the fitness-dependent mutation rate $p_{\text{drift}}(n,\ell)$ which maximizes the expected progress
$$\sum_{k=1}^{n} \binom{n}{k} p^k (1-p)^{n-k}\; \mathbb{E}[\Delta(\ell,k)],$$
where we recall that $\mathbb{E}[\Delta(\ell,k)]$ had been defined in equation (1).
Following the same arguments as in the definition of RLS$_{\text{opt}}$ in Section 4.1, the (1+1) EA$_{\text{opt}}$ is defined by choosing in each fitness level $\ell$ the mutation rate $p_{\text{opt}}(n,\ell)$ which minimizes the expected remaining time, i.e., the expression
[TABLE]
where we abbreviate by the event that , , and .
As above, we thus need to determine first the values of $p_{\text{opt}}(n,n-1)$ and $T_{\text{opt}}(n,n-1)$, then progress with the computation of $p_{\text{opt}}(n,n-2)$ and $T_{\text{opt}}(n,n-2)$, etc.
For $n=3$ we obtain the following values, which prove that, as for RLS, drift-maximization is also not optimal for the EA.
[TABLE]
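Another interesting observation that we can make by comparing this table with the corresponding one of RLS (Sec. 4.1) is that $n\, p_{\text{drift}}(n,\ell) = k_{\text{drift}}(n,\ell)$ and $n\, p_{\text{opt}}(n,\ell) = k_{\text{opt}}(n,\ell)$. We will discuss this effect in more detail in Section 5.2.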
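The time-minimizing mutation rates of the (1+1) EA can be approximated numerically with the same backward recursion as for RLS. The following sketch is an illustration only: it replaces the bounded scalar optimizer mentioned later in the paper by a plain grid search over the rate (an assumption made here to keep the example dependency-free), uses floating-point arithmetic, and all function names are hypothetical.

```python
from math import comb


def improving_moves(n, l, p):
    """Map each strictly better fitness j to the probability of reaching it
    from fitness l under standard bit mutation with rate p."""
    moves = {}
    for k in range(1, n + 1):
        prob_k = comb(n, k) * p ** k * (1 - p) ** (n - k)  # Bin(n, p)(k)
        for i in range(k + 1):  # i = number of flipped zero-bits
            ways = comb(n - l, i) * comb(l, k - i)
            if 2 * i > k and ways > 0:
                j = l + 2 * i - k
                moves[j] = moves.get(j, 0.0) + prob_k * ways / comb(n, k)
    return moves


def remaining_time(n, l, p, T):
    """Expected remaining time at fitness l when using rate p at this level
    and the already computed times T[j] for all better levels j."""
    moves = improving_moves(n, l, p)
    p_improve = sum(moves.values())
    if p_improve <= 0.0:
        return float("inf")
    return (1 + sum(q * T[j] for j, q in moves.items())) / p_improve


def optimal_ea(n, grid=600):
    """Grid-search approximation of (p_opt, T_opt), indexed by fitness level."""
    T = [0.0] * (n + 1)
    p_opt = [None] * (n + 1)
    candidates = [i / grid for i in range(1, grid + 1)]  # rates in (0, 1]
    for l in range(n - 1, -1, -1):
        p_opt[l] = min(candidates, key=lambda p: remaining_time(n, l, p, T))
        T[l] = remaining_time(n, l, p_opt[l], T)
    return p_opt, T
```

For `n = 3` the grid search returns rates close to 1, 2/3, and 1/3 for fitness levels 0, 1, and 2, i.e., `n` times the rate matches the optimal RLS strengths 3, 2, and 1 in this small example.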
4.3 for
EA$_{>0,\text{drift}}$ and EA$_{>0,\text{opt}}$ are defined by replacing in all definitions in Section 4.2 the binomial distribution $\text{Bin}(n,p)$ by the conditional binomial distribution $\text{Bin}_{>0}(n,p)$, and by replacing the formulas accordingly. We omit a detailed definition for reasons of space. All replacements are straightforward; the only particularity to pay attention to is that both the drift and the expected remaining time may be better for ever smaller values of $p$. This happens when flipping one bit deterministically is better in terms of drift or expected running time, respectively, than using standard bit mutation. In this case we can either use the convention that the conditional standard bit mutation with mutation rate $p=0$ is to be interpreted as the operator $\text{flip}_1$ (i.e., we set $\text{Bin}_{>0}(n,0)(1)=1$ and $\text{Bin}_{>0}(n,0)(k)=0$ for all $k \neq 1$), or we set a lower bound for the mutation rate. The effects of the lower bound will be discussed in Section 6. When using , the situation for the EA$_{>0}$ for OneMax in dimension is given by the following table.
[TABLE]
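The conditional binomial distribution used by the EA$_{>0}$, including the convention for mutation rate zero discussed above, can be written down directly. This is a minimal sketch with hypothetical function names; it is meant to illustrate the definition, not to reproduce the authors' code.

```python
from math import comb


def binomial_pmf(n, p, k):
    """Probability that standard bit mutation with rate p flips exactly k of n bits."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)


def conditional_binomial_pmf(n, p, k):
    """Probability of flipping exactly k bits, conditional on flipping at least one."""
    if k == 0:
        return 0.0
    if p == 0.0:
        # Convention: rate 0 is interpreted as deterministically flipping one bit.
        return 1.0 if k == 1 else 0.0
    return binomial_pmf(n, p, k) / (1 - (1 - p) ** n)
```

For small positive rates almost all probability mass sits on `k = 1`, which is why the EA$_{>0}$ behaves like the one-bit-flip RLS in that regime.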
5 Optimal RLS and (1+1) EA Variants
Using the formulas provided in Section 4 we can compute the optimal RLS, EA, and EA$_{>0}$ algorithms, as well as their drift-maximizing counterparts. Note, though, that the numerical evaluation of the binomial coefficients, as well as the optimization required to determine the drift-maximizing and the optimal mutation rates, is not straightforward. For the latter, we have used the bounded method of the scipy optimization module [JOP+]. The overall expected running times are summarized in Table 1, which can be found at the end of this paper.
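One common way to keep the binomial coefficients numerically manageable in floating-point arithmetic is to work with their logarithms. The sketch below illustrates this idea for the transition probabilities of Section 4; it is an assumption about a workable approach, not a description of how the authors evaluated these terms, and the function names are hypothetical.

```python
from math import comb, exp, lgamma


def log_binomial(n, k):
    """Natural logarithm of the binomial coefficient C(n, k)."""
    return lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)


def transition_prob_log(n, l, k, i):
    """Probability that flipping k distinct bits of a string with l ones flips
    exactly i zero-bits, evaluated in log-space with floating-point numbers."""
    if i < 0 or i > n - l or k - i < 0 or k - i > l:
        return 0.0
    return exp(log_binomial(n - l, i) + log_binomial(l, k - i) - log_binomial(n, k))


def transition_prob_exact(n, l, k, i):
    """Same probability via exact integer binomials (slower for large n)."""
    if i < 0 or k - i < 0:
        return 0.0
    return comb(n - l, i) * comb(l, k - i) / comb(n, k)
```

In Python the exact variant is also safe because integers have arbitrary precision; the log-space variant matters mainly when the computation is vectorized or ported to fixed-width floating-point types.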
5.1 Optimal Mutation Strengths
We start our comparison by considering the differences between the drift-maximizing and the optimal mutation strengths for RLS. Figure 1 plots the interesting region of $k_{\text{drift}}$ and $k_{\text{opt}}$ for ; the overall picture is very similar across all dimensions . In particular it holds for all that the curves cross at fitness level . For smaller values, the optimal mutation strengths are smaller or identical to drift-maximizing ones, and the situation is reversed for fitness levels . This can be explained by the formulas given in Section 4. While the drift-maximizer values a potential progress of by this same value, regardless of the current fitness level, the same potential progress is valued by . RLS$_{\text{drift}}$ is thus more risk-averse than RLS$_{\text{opt}}$. Put differently, the latter makes use of the fact that an unlikely large fitness gain results in a larger reduction of the expected remaining optimization time than a more likely small fitness increase. RLS$_{\text{opt}}$ therefore accepts a smaller probability of an improving move, at the benefit of a potentially larger fitness increase. This observation also explains why the extreme-valued parameter adaptation method proposed in [FCSS08] showed better performance on OneMax than update schemes based on average gains.
It was proven in [DDY20] that an approximated drift-maximizer always flips only one bit when . For the actual drift-maximizer this has not been formally proven, but in all our numerical evaluations for dimensions up to we have for .
For dimension , we see from Fig. 1 that for and for . In this regime it is thus beneficial to invest one iteration to obtain, deterministically, a search point with function value .
The difference between the two functions becomes negligible for .
We do not plot the comparison of with nor that of vs. ; their curves, however, are similar to those of RLS.
5.2 Comparison of and
We have observed in Section 4.2 that for the values of were identical to . Likewise, we had observed that in this example . Figure 2 plots and for and Figure 3 plots and for ; the overall picture is the same for drift-maximizing and optimal functions in both cases.
While Figure 2 gives the global picture, Figure 3 zooms into the region in which is between 3 and 47. We observe that the mutation strength is always smaller than, but very close to, $n$ times the respective mutation rate. At the points at which and change value the difference between and is smallest.
6 Running Times
We now discuss the impact of the differences in mutation strengths and rates on the overall expected running times.
We start our comparison with the RLS and the EA$_{>0}$ variants. Figure 4 plots the optimization times, normalized by $n \ln n$, of five different algorithms for 10 different problem dimensions between 100 and . We denote here and in the following by RLS$_1$ the traditional RLS variant using static mutation strength $k=1$. We see that there is practically no difference between RLS$_{\text{drift}}$ and RLS$_{\text{opt}}$, and this despite the significant differences in the mutation strengths $k_{\text{drift}}$ and $k_{\text{opt}}$. While the asymptotic result from [DDY20] guarantees that the absolute difference is bounded by $o(n)$, the absolute difference between the two algorithms is even less than 1 across all tested problem dimensions. The normalized running times of both algorithms increase from around for to around for . As we know from the theoretical result [DDY20] these values converge to 1 for growing dimension $n$.
The differences between the EA and the EA to RLS are very small. The difference between the first two algorithms seems to be more significant than between drift-maximizing and optimal RLS variants, with a numerical difference between EA and EA of around for . We do not have an explanation for this comparatively large difference, but it may be caused by the numerical precision at which the results have been computed. More details about the EA$_{>0}$ will be discussed in Section 6.1.
Our next chart, Figure 5, compares the expected running times of different EA variants. We first note that we plot two different static versions, one using the asymptotically optimal static mutation rate $1/n$, and the other one using the optimal static mutation rate for each dimension. The latter is slightly larger than $1/n$, as was already proven in [CWA14]. Since they only computed the optimal static rates for , we also had to compute these for larger dimensions (using a direct computation, not the matrix-based approach suggested there). Alternatively, we could have used the approximations suggested in [GW18], which extend the results of [CWA14] to the EA and to larger dimensions. The relative advantage over $1/n$ is not very pronounced, and decreases from around for to around for . The curves of the EA and the EA$_{>0}$ are practically indistinguishable in this plot. As for RLS, the absolute difference between the expected running time of the two algorithms is less than 1 for all tested dimensions, again despite significant differences in the functions and . We add to this chart a comparison with the EA using the fitness-dependent mutation rate (for ) suggested in [Bäc93]; we use for . Bäck obtained this mutation rate from numerical evaluations of in small dimensions . His algorithm performs only slightly worse than the true drift-maximizing EA, and, thus, as the EA.
6.1 Influence of on the EA$_{>0}$
We have briefly mentioned in Section 4.3 that for the EA$_{>0}$ one needs to specify a lower bound for the mutation probability, since in some situations the optimal mutation rate is zero (when using the convention that deterministically returns one). For practical applications such small mutation rates may be undesirable, e.g., when using multiplicative success-based update rules as suggested in [DW18]. We therefore investigate the influence of this lower bound on the expected running times. These normalized running times are plotted for six different algorithms in Figure 6. The drift-maximizing variants would be indistinguishable in this plot from the optimal ones, and are therefore omitted, except for the case , which we have already discussed in Figure 4. Note that the EA$_{>0}$ with optimal static mutation rate uses , and is therefore equal to RLS. The relative disadvantage of increasing to increases from around in dimension to around in dimension 3,500, both for the static and the adaptive variants. Further increasing to results in a relative disadvantage of for the static and from for the dynamic variants.
6.2 Anytime Performance
Fixed-Budget Results. While we have focused above on expected optimization times, we will now follow the suggestion made in [DDY20] and provide a more detailed analysis of the anytime behavior of the algorithms. More precisely, we regard fixed-budget performance of RLS, RLS, and RLS. Only RLS and RLS are plotted in Figure 7, since the curves of RLS and RLS are practically indistinguishable. Note that the numbers underlying the plots in Figures 7 and 8 (discussed in the next section) are the only ones in this paper that are not derived from theoretical bounds. Instead, we have performed a simulation of 500 independent runs of the three algorithms, and we used IOHprofiler [DWY+18b] to analyze the runtime data. We show not only the mean value, but also the standard deviation. The curves are well separated even when considering these, for all budgets up to around . Analyzing the data in more detail, we observe that the relative advantage in average function value decreases from 10% for budget 100 to 1% for budget . For larger budgets, the average fitness value is less than 1% larger for RLS than for RLS. However, as proven to hold in an asymptotic sense for the RLS in [DDY20], the average distance to the optimum is constantly about better for RLS than for RLS, for budgets up to . The average function values at this budget ( function evaluations) are slightly smaller than 990 for all three algorithms, RLS, RLS, and RLS. For larger budgets, the distance to the optimum is hence very small. This, in combination with the variance of our simulation, results in inconsistent relative advantages in terms of distance to the optimum for budgets greater than .
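The kind of simulation described above can be sketched in a few lines of Python. This is a minimal illustration for the static RLS variant that flips exactly one bit per iteration, not a reproduction of our experimental setup: the function names, the seed handling, and the choice not to count the evaluation of the initial point are assumptions made for this example, and the adaptive variants would additionally need their fitness-dependent mutation strengths.

```python
import random
from statistics import mean, stdev

def rls_trace(n, budget, rng):
    """Run RLS with static mutation strength 1 on OneMax.
    Returns the best-so-far fitness after each of the `budget` iterations."""
    x = [rng.randint(0, 1) for _ in range(n)]
    fitness = sum(x)
    trace = []
    for _ in range(budget):
        pos = rng.randrange(n)
        # Flipping a zero-bit gains one and is accepted; flipping a one-bit
        # would lose one and is discarded by the elitist selection.
        if x[pos] == 0:
            x[pos] = 1
            fitness += 1
        trace.append(fitness)
    return trace

def fixed_budget_stats(n, budget, runs=500, seed=0):
    """Mean and sample standard deviation of the best-so-far fitness
    for every budget 1..budget, over `runs` independent runs."""
    rng = random.Random(seed)
    traces = [rls_trace(n, budget, rng) for _ in range(runs)]
    means = [mean(t[b] for t in traces) for b in range(budget)]
    stds = [stdev(t[b] for t in traces) for b in range(budget)]
    return means, stds
```

Calling `fixed_budget_stats(n, budget)` for the dimension and budget of interest yields the two curves (mean and standard deviation) that a fixed-budget plot displays.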
Fixed-Target Results. Using the same runtime data for the 500 runs, we can also compute fixed-target results, i.e., the function mapping each fitness level to the expected time needed to reach a solution of fitness . These values, of course, could also easily be computed theoretically from the results presented in Section 5.1, but we feel that the precision of the simulation suffices to demonstrate the main effects. The results are plotted in Figure 8.
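Deriving fixed-target values from recorded runs amounts to reading off, for each target, the first time at which the best-so-far fitness reaches it. The Python sketch below shows one way to do this; the trace format (`trace[t]` is the best-so-far fitness after `t` iterations, with `trace[0]` the initial point) and the function names are assumptions made for this illustration.

```python
def hitting_times(trace, n):
    """hit[v] = first index t with trace[t] >= v, or None if target v
    is never reached within the recorded run."""
    hit = [None] * (n + 1)
    target = 0
    for t, f in enumerate(trace):
        # Best-so-far traces are non-decreasing, so one pass suffices:
        # every target up to the current fitness is hit at time t at the latest.
        while target <= n and target <= f:
            hit[target] = t
            target += 1
    return hit

def fixed_target_means(traces, n):
    """Average first hitting time per target, over the runs that reached it."""
    all_hits = [hitting_times(tr, n) for tr in traces]
    means = []
    for v in range(n + 1):
        times = [h[v] for h in all_hits if h[v] is not None]
        means.append(sum(times) / len(times) if times else None)
    return means
```

Note that averaging only over the runs that reached a target underestimates the expected hitting time whenever some runs were stopped before reaching it; this is not an issue when every run is continued until the optimum is found.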
It is not difficult to see that RLS is not optimal for minimizing the expected first hitting time of targets , simply because overshooting the target is disadvantageous for this optimization goal. For a similar reason, RLS is also not optimal in terms of maximizing the expected function value at a given budget of , i.e., when the budget is less than the expected overall optimization time of RLS.
6.3 Remaining Optimization Times
Finally, we take a look at the evolution of the expected remaining optimization time for each fitness level. These values, derived from our numerical evaluation of the theoretical bounds presented in Section 4, are plotted in Figure 9. While the algorithms with static mutation rates and strengths are not able to profit from the fact that for each , we see an almost symmetric behavior for the adaptive algorithms. We also see again the influence of the lower bound in the EA$_{>0}$ variants, which is quite significant.
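For the simplest static case the remaining time can be written down directly. In the following worked equation, $T(i)$ and $H_m$ are labels introduced for this illustration: $T(i)$ denotes the expected remaining number of iterations of RLS with static mutation strength 1 started at fitness $i$, and $H_m$ the $m$-th harmonic number. At fitness $j$ this algorithm improves by exactly one with probability $(n-j)/n$ and cannot improve by more, so the waiting times on the individual levels simply add up.

```latex
T(i) \;=\; \sum_{j=i}^{n-1} \frac{n}{n-j}
     \;=\; n \sum_{m=1}^{n-i} \frac{1}{m}
     \;=\; n \, H_{n-i}
```

This expression is strictly decreasing in $i$, so the curve of this static variant is largest at fitness 0 and shows no symmetry around $n/2$.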
From this figure we can also compute the weights by which the RLS starting in a search point of fitness values a potential fitness progress of . We plot in Figure 10 the gradient of the curves plotted in Figure 9. That is, for every we plot the values for RLS and RLS. We recall that RLS values a potential fitness progress of by the same value . We thus clearly see that RLS gives much more importance to large fitness gains, and hence uses the already discussed more risky strategy aiming at potentially larger fitness gains, at the cost of a larger probability of creating an offspring that will be discarded.
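The contrast between the two selection criteria for the mutation strength can be made explicit with a small backward induction over the fitness levels. The Python sketch below is an illustration under our own assumptions, not the code used for the figures: the function names are hypothetical, time is counted in iterations, ties are broken towards the smaller strength (and may be affected by floating-point rounding), and the cubic-order cost restricts it to small dimensions.

```python
from math import comb

def gain_distribution(n, i, k):
    """Return {g: probability} over the strictly positive fitness gains g
    when exactly k distinct, uniformly chosen bits of a string with i ones
    are flipped."""
    dist = {}
    total = comb(n, k)
    # z flipped zero-bits and k - z flipped one-bits change the fitness by 2z - k.
    for z in range(max(0, k - i), min(k, n - i) + 1):
        g = 2 * z - k
        if g > 0:
            dist[g] = comb(n - i, z) * comb(i, k - z) / total
    return dist

def remaining(dist, T, i):
    """Expected remaining time at level i when using gain distribution `dist`
    now and following the times T on all higher levels."""
    p_improve = sum(dist.values())
    return (1.0 + sum(q * T[i + g] for g, q in dist.items())) / p_improve

def rls_variants(n):
    """Returns (k_drift, k_opt, T_drift, T_opt): per-level mutation strengths
    and expected remaining times of the drift-maximizing and the
    time-minimizing elitist RLS."""
    k_drift, k_opt = [0] * n, [0] * n
    T_drift, T_opt = [0.0] * (n + 1), [0.0] * (n + 1)
    for i in range(n - 1, -1, -1):
        dists = {k: gain_distribution(n, i, k) for k in range(1, n + 1)}
        dists = {k: d for k, d in dists.items() if d}  # keep strengths that can improve
        # Drift maximizer: largest expected one-step gain.
        k_drift[i] = max(dists, key=lambda k: (sum(g * q for g, q in dists[k].items()), -k))
        T_drift[i] = remaining(dists[k_drift[i]], T_drift, i)
        # Time minimizer: smallest expected remaining time given optimal play above.
        k_opt[i] = min(dists, key=lambda k: (remaining(dists[k], T_opt, i), k))
        T_opt[i] = remaining(dists[k_opt[i]], T_opt, i)
    return k_drift, k_opt, T_drift, T_opt
```

Comparing `k_opt[i]` with `k_drift[i]` level by level shows where the two criteria disagree, and the differences `T_opt[i] - T_opt[i + 1]` give a discrete gradient of the kind plotted in Figure 10.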
6.4 Best Unary Unbiased Algorithms for OneMax
The plot in Figure 9 also raises the question of how much the algorithms lose in performance by being forced to be elitist. Slightly better algorithms are possible when allowing them to first decrease the function value to 0 and then to invert the bit string. For the adaptive algorithms, this would clearly bring more flexibility, and a provable positive advantage over the elitist algorithms studied in this work. Put differently, the best unary unbiased black-box algorithm for OneMax is slightly better than RLS. The almost perfectly symmetric shape of the curves in Figure 9, however, indicates that the advantage is very small. A rigorous quantification, which we consider to be of rather philosophical benefit, is left for future work.
7 Discussion
We have shown that the assumption that drift-maximization is optimal for solving the OneMax problem is not correct, neither for RLS, nor for the EA, nor for the EA$_{>0}$. A more risky strategy turns out to be optimal. However, while the differences between the drift-maximizing and the optimal mutation rates are significant (Figure 1), the difference in expected running time is negligibly small already for very small dimensions. The structural findings made here for the OneMax problem also apply in a broader sense to the optimization of non-deceptive problems. Already for linear functions like BinVal, the difference between drift-maximizing and optimal RLS and EA variants may be more substantial than for OneMax. We also note that, while we have restricted ourselves to (1+1)-type algorithms, similar effects also hold for population-based EAs.
The computation of the drift-maximizing and time-minimizing mutation strengths and rates is quite tedious and requires several days of computing time already for moderate dimensions. In order to obtain valid baseline algorithms for larger dimensions, it would be desirable to derive closed-form expressions that approximate these functions sufficiently well. Note that the formula provided by Bäck for the drift-maximizer (cf. the discussion in Section 6) seems to yield quite reliable predictions for the drift-maximizing (1+1) EA, as seen in Figure 5.
Acknowledgments.
We thank Thomas Bäck for several valuable discussions on the history of adaptive parameter settings.
Our research benefited from the support of the Paris Ile-de-France Region, a public grant as part of the Investissement d’avenir project, reference ANR-11-LABX-0056-LMH, LabEx LMH, in a joint call with Gaspard Monge Program for optimization, operations research and their interactions with data sciences, and COST Action CA15140 on ’Improving Applicability of Nature-Inspired Optimisation by Joining Theory and Practice (ImAppNIO)’ supported by COST (European Cooperation in Science and Technology).
References
- [AM16] Aldeida Aleti and Irene Moser. A systematic literature review of adaptive parameter control methods for evolutionary algorithms. ACM Computing Surveys, 49:56:1–56:35, 2016.
- [Bäc92] Thomas Bäck. The interaction of mutation rate, selection, and self-adaptation within a genetic algorithm. In Proc. of Parallel Problem Solving from Nature (PPSN’92), pages 87–96. Elsevier, 1992.
- [Bäc93] Thomas Bäck. Optimal mutation rates in genetic search. In Proc. of the 5th International Conference on Genetic Algorithms (ICGA’93), pages 2–8. Morgan Kaufmann, 1993.
- [BD20] Maxim Buzdalov and Carola Doerr. Optimal mutation rates for the (1+λ) EA on OneMax. In Proc. of Parallel Problem Solving from Nature (PPSN’20), volume 12270 of LNCS, pages 574–587. Springer, 2020.
- [BLS14] Golnaz Badkobeh, Per Kristian Lehre, and Dirk Sudholt. Unbiased black-box complexity of parallel search. In Proc. of Parallel Problem Solving from Nature (PPSN’14), volume 8672 of Lecture Notes in Computer Science, pages 892–901. Springer, 2014.
- [CD18a] Eduardo Carvalho Pinto and Carola Doerr. A simple proof for the usefulness of crossover in black-box optimization. In Proc. of Parallel Problem Solving from Nature (PPSN’18), volume 11102 of Lecture Notes in Computer Science, pages 29–41. Springer, 2018. Full version available at http://arxiv.org/abs/1812.00493.
- [CD18b] Eduardo Carvalho Pinto and Carola Doerr. Towards a more practice-aware runtime analysis of evolutionary algorithms. CoRR, abs/1812.00493, 2018.
- [CHJ+17] Dogan Corus, Jun He, Thomas Jansen, Pietro Simone Oliveto, Dirk Sudholt, and Christine Zarges. On easiest functions for mutation operators in bio-inspired optimisation. Algorithmica, 78:714–740, 2017.
