On the Robustness of Median Sampling in Noisy Evolutionary Optimization
Chao Bian, Chao Qian, Yang Yu, Ke Tang

TL;DR
This paper introduces median sampling into evolutionary algorithms to improve robustness against noise, demonstrating theoretically that it can exponentially reduce runtime under certain noise conditions, with practical guidance for its application.
Contribution
The paper presents the novel use of median sampling in EAs, providing theoretical analysis of its advantages and limitations compared to mean sampling under noisy conditions.
Findings
Median sampling reduces expected runtime exponentially under onebit noise.
Median sampling outperforms mean sampling when the noise's 2-quantile increases with true fitness.
Median sampling may fail when the noise's 2-quantile does not increase with true fitness.
Research paper, 2020
Chao BIAN
Chao QIAN
Yang YU
Ke TANG
State Key Laboratory for Novel Software Technology, Nanjing University, Nanjing 210023, China
Shenzhen Key Laboratory of Computational Intelligence, Department of Computer Science and Engineering,
Southern University of Science and Technology, Shenzhen 518055, China
Abstract
Evolutionary algorithms (EAs) are a class of nature-inspired metaheuristics with wide applications in practical optimization problems. In these problems, objective evaluations are usually inaccurate, because noise is almost inevitable in the real world, and it is crucial to weaken the negative effect of noise. Sampling is a popular strategy, which evaluates the objective several times and employs the mean of these evaluation results as an estimate of the objective value. In this work, we introduce a novel sampling method, median sampling, into EAs, and illustrate its properties and usefulness theoretically by solving OneMax, the problem of maximizing the number of 1s in a bit string. Instead of the mean, median sampling employs the median of the evaluation results as an estimate. Through rigorous theoretical analysis on OneMax under the commonly used onebit noise, we show that median sampling reduces the expected runtime exponentially. Next, through two special noise models, we show that when the 2-quantile of the noisy fitness increases with the true fitness, median sampling can be better than mean sampling; otherwise, it may fail and mean sampling can be better. The results may guide us to employ median sampling properly in practical applications.
keywords:
Evolutionary algorithms, noisy optimization, median sampling, computational complexity, runtime analysis
1 Introduction
As general-purpose optimization algorithms, evolutionary algorithms (EAs) [1] have wide applications in practical optimization problems [2, 3]. During the optimization procedure, the obtained objective (i.e., fitness) value is usually inaccurate because of noise [4]. For example, in machine learning, the estimated performance of a prediction model usually deviates from the true performance because the model is evaluated on a limited amount of data; in aerodynamic design, a computational fluid dynamics (CFD) simulation is needed to evaluate the performance of a given structure, which is usually computationally expensive and approximate, leading to noisy fitness. Noise may mislead the search direction and deteriorate the efficiency of EAs. Therefore, it is important to handle noise in fitness evaluation during evolutionary optimization.
The sampling strategy independently evaluates the fitness $m$ times, where $m$ is the sample size, and then uses the mean of these samples to estimate the exact fitness. Sampling is a very popular way to tackle noise, because it yields an $m$-fold reduction in the variance of the noisy evaluation. However, it also incurs an $m$-fold increase in the computation time, so some variants have been proposed, including adaptive sampling [5, 6] and sequential sampling [7, 8], which decide the value of $m$ dynamically in each generation. Nevertheless, a theoretical understanding of sampling has been largely lacking.
Runtime analysis, an important theoretical aspect of EAs, has made considerable progress [9, 10, 11, 12, 13, 14] recently. However, existing studies mainly consider exact environments, and results on noisy evolutionary optimization are rare. Noise increases the randomness in the optimization procedure, making the analysis more difficult. As a representative evolutionary algorithm, (1+1)-EA maintains one solution in the population, and generates a new solution in each iteration by mutating the parent solution. It was first studied on two frequently used pseudo-Boolean problems, OneMax (OM) and LeadingOnes (LO). The goal of OM is maximizing the number of 1s in a solution, while the goal of LO is maximizing the number of consecutive 1s counted from the first bit of a solution. Runtime analysis for the two problems under various noise models [15, 16, 17, 18, 19, 20] showed that (1+1)-EA can quickly find the optimum only if the noise level is low. For instance, onebit noise is a frequently used noise model in theoretical analysis. With probability $p$, it changes a uniformly selected bit in a solution before evaluation, leading to a random fitness value. For OM of size $n$ under onebit noise, the expected runtime (ERT) of (1+1)-EA is superpolynomial if $p = \omega(\log n / n)$. There are also some studies concerning the effectiveness of various strategies to tackle noise, e.g., threshold selection [19, 21, 22], populations [16, 18, 20, 23, 24] and sampling [25, 26, 27]. For instance, with a sufficiently large population, the ERT of a population-based EA optimizing OM under onebit noise is polynomial for any noise probability $p$. Several works also show the robustness of the compact genetic algorithm [28] and a simple ant colony optimization algorithm [29, 30, 31, 32] against noise.
The above-mentioned runtime analyses concerning sampling [25, 26, 27] revealed that sampling can turn the exponential runtime under high noise levels into a polynomial one, and that the sample size may be critical to the effectiveness of sampling. Moreover, Akimoto et al. [33] showed that optimization under unbiased noise can perform like exact optimization if the sample size is large enough. In these works, the sampling strategy uses the mean of the samples as an approximation of the true fitness. A natural question is then whether other information in the samples can be used to make EAs more robust against noise.
Note that the mean is a measure of central tendency, so it is natural to consider another widely known measure, the median. Compared to the mean, the median has the advantage of being insensitive to outliers. For example, the “breakdown point” [34, 35] is a commonly used indicator of insensitivity, which denotes the minimum ratio of variables that need to be contaminated to make the estimator become infinite (i.e., cause breakdown). The breakdown point of the mean is close to 0 because a single bad observation can make the mean become infinite, whereas the breakdown point of the median is 0.5 because the median becomes infinite only if more than 50% of the variables become infinite. In fact, economists frequently use the sample median when reporting statistics on certain economic measures, e.g., household income [36].
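The insensitivity of the median to outliers can be illustrated numerically. The following is a minimal sketch, not taken from the paper: the sample values are made up for illustration, and Python's standard `statistics` module is assumed as the tool for computing the two estimators.

```python
import statistics

# Five hypothetical evaluations of the same solution (made-up numbers).
clean = [10.0, 11.0, 9.0, 10.0, 10.0]
# The same sample with one observation replaced by a huge outlier.
contaminated = clean[:-1] + [1e9]

mean_clean = statistics.mean(clean)
mean_contaminated = statistics.mean(contaminated)
median_clean = statistics.median(clean)
median_contaminated = statistics.median(contaminated)

# A single contaminated observation drags the mean far away,
# while the median stays where it was.
print("mean:  ", mean_clean, "->", mean_contaminated)
print("median:", median_clean, "->", median_contaminated)
```

In this toy sample, one contaminated value out of five is enough to ruin the mean, whereas the median would only move once at least half of the values were contaminated.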
In this paper, we introduce the sampling strategy using the median (called median sampling) into EAs and theoretically examine its effectiveness. Instead of taking the mean, median sampling takes the median of the samples as an estimate of the fitness. To distinguish the two sampling strategies, we call the original sampling strategy mean sampling in the following. We consider (1+1)-EA solving noisy OM, and derive the ERT for reaching the optimum (with respect to the exact objective). Our main results are as follows:
- For OM under onebit noise with any noise probability $p$, we prove that the ERT of (1+1)-EA is polynomial when median sampling with a suitable sample size is used. Previous analysis [17] has proved that the ERT of (1+1)-EA is polynomial only if $p = O(\log n / n)$, where $n$ is the problem size. Thus, the result shows the robustness of median sampling against noise.
- For OM under segmented noise, we show that the ERT of (1+1)-EA using median sampling is polynomial, while the ERT of (1+1)-EA using mean sampling is exponential. The results show that median sampling can be a better choice if the 2-quantile of the noisy fitness increases with the true fitness. Note that the noisy fitness is a random variable, and the 2-quantile of a random variable $Z$ is the value $z$ satisfying $\mathrm{P}(Z \le z) \ge 1/2$ and $\mathrm{P}(Z \ge z) \ge 1/2$ (i.e., the median of $Z$).
- For OM under partial noise, we show that (1+1)-EA employing median sampling fails, while (1+1)-EA employing mean sampling works. The results suggest that it would be better to choose other strategies if the 2-quantile of the noisy fitness doesn’t increase with the true fitness.
Note that in parallel with our work, Doerr and Sutton [37] showed that median sampling can handle the negative impact of noise for an integer valued objective function , if satisfies the -concentrate condition, that is, and , where denotes the noisy objective value of . They also considered two specific cases to show the superiority of median sampling over mean sampling. For OM under additive Cauchy noise with parameter , they showed that the runtime of (1+1)-EA is superpolynomial w.h.p. (with high probability) if mean sampling is used, and the runtime is polynomial w.h.p. if median sampling is used. For LO under bitwise noise [27] satisfying , they showed a superpolynomial ERT for (1+1)-EA if mean sampling with is used, and the runtime is polynomial w.h.p. if median sampling with is used.
The rest of the paper is organized as follows. Section 2 presents preliminaries. Section 3 analyzes the effectiveness of median sampling. Sections 4 and 5 compare median sampling with mean sampling, and Section 6 provides some guidance for employing median sampling in practice. Finally, Section 7 concludes the paper.
2 Preliminaries
We first present the OM problem and (1+1)-EA, which are considered in this paper. Next, we present the sampling strategy. Finally, we present the analysis tool.
2.1 OneMax Problem
We consider the frequently used pseudo-Boolean function OM. Its goal is maximizing the number of 1s (namely, the bits with value 1) in a solution. Note that 11…1 (denoted as $1^n$) is the unique optimal solution. The ERT of (1+1)-EA solving OM (without noise) is $\Theta(n \log n)$ [38]. For notational convenience, $|x|_0$ will be used to represent the number of 0s (namely, the bits with value 0) of a solution $x$.
Definition 2.1**.**
The goal of the OM problem with size $n$ is to find a binary string $x \in \{0,1\}^n$ that maximizes $f(x) = \sum_{i=1}^{n} x_i$.
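As a concrete reading of this definition, the following minimal sketch represents a solution as a Python list of 0/1 integers. The function names `onemax` and `num_zeros` are illustrative choices, not notation from the paper.

```python
def onemax(x):
    """True OneMax fitness: the number of 1s in the bit list x."""
    return sum(x)


def num_zeros(x):
    """Number of 0s in x; the optimum is reached exactly when this is 0."""
    return len(x) - sum(x)


x = [1, 0, 1, 1, 0]
print(onemax(x), num_zeros(x))
```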
2.2 (1+1) Evolutionary Algorithm
(1+1)-EA reflects the general structure of EAs, and is widely analyzed to theoretically understand the behavior of EAs. In contrast to exact optimization, only a noisy fitness value $f^{\mathrm{n}}(x)$ can be obtained in noisy environments, and the value is a random variable because the noise may disturb the solution or the objective value randomly. For example, there are two widely used kinds of noise models: posterior and prior. Posterior noise comes from variation in the fitness of a solution, e.g., $f^{\mathrm{n}}(x) = f(x) + \delta$, where $\delta$ is randomly drawn from some distribution. Prior noise comes from variation in the solution itself, i.e., $f^{\mathrm{n}}(x) = f(x')$, where $x'$ is generated from $x$ by random perturbations. Therefore, line 5 in Algorithm 1 uses the noisy fitness $f^{\mathrm{n}}$ in place of the true fitness $f$. In the optimization process, the reevaluation strategy, which evaluates both the offspring and parent solutions in each generation, is used as in [17, 18, 29]. In the optimization procedure of an EA, fitness evaluations are the most time-consuming part, so we simply define the runtime as the number of objective evaluations. The termination condition is finding the optimum w.r.t. the exact objective [17, 18, 33]. In this work, we consider maximization problems.
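Since Algorithm 1 is not reproduced in this text, the following is a minimal sketch of (1+1)-EA with reevaluation. It rests on assumptions: bit-wise mutation that flips each bit independently with probability 1/n, and acceptance of the offspring when its noisy fitness is not worse than that of the parent. The callback `noisy_eval(x, rng)` and the cap `max_iters` are hypothetical names introduced only for this illustration.

```python
import random


def onemax(x):
    """True OneMax fitness: the number of 1s in the bit list x."""
    return sum(x)


def one_plus_one_ea(n, noisy_eval, max_iters, rng):
    """Run (1+1)-EA with reevaluation; return (final solution, iterations used).

    noisy_eval(x, rng) is a hypothetical callback returning one noisy fitness value.
    """
    x = [rng.randint(0, 1) for _ in range(n)]
    for t in range(max_iters):
        # Terminate once the optimum w.r.t. the exact objective is found.
        if onemax(x) == n:
            return x, t
        # Bit-wise mutation: flip each bit independently with probability 1/n.
        y = [1 - b if rng.random() < 1.0 / n else b for b in x]
        # Reevaluation: parent and offspring are both evaluated in every iteration.
        if noisy_eval(y, rng) >= noisy_eval(x, rng):
            x = y
    return x, max_iters


rng = random.Random(0)
# Noise-free run: the "noisy" evaluation simply returns the true fitness.
solution, iterations = one_plus_one_ea(20, lambda x, _rng: onemax(x), 20000, rng)
print(onemax(solution), iterations)
```

Under the runtime measure described above, each iteration of this sketch costs two fitness evaluations, because the parent is evaluated again alongside the offspring.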
2.3 Median Sampling
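Mean sampling has often been used in noisy evolutionary optimization to tackle noise [5, 7]. As described in Definition 2.2, it uses the mean of $m$ independent evaluations to approximate the true fitness $f(x)$, where $m$ is called the sample size. With mean sampling, the output is close to the mathematical expectation of the noisy fitness $f^{\mathrm{n}}(x)$. As described in Definition 2.3, median sampling takes the median of $m$ independent evaluations to approximate the true fitness $f(x)$. With median sampling, the output is close to the 2-quantile of $f^{\mathrm{n}}(x)$, namely its median.
Definition 2.2** (Mean Sampling).**
The objective value of $x$ is evaluated independently $m$ times, then
$$\hat{f}^{\mathrm{mean}}(x) = \frac{1}{m} \sum_{i=1}^{m} f^{\mathrm{n}}_i(x)$$
is output, where $f^{\mathrm{n}}_1(x), \dots, f^{\mathrm{n}}_m(x)$ denote noisy fitness values.
Definition 2.3** (Median Sampling).**
The objective value of $x$ is evaluated independently $m$ times, then
$$\hat{f}^{\mathrm{med}}(x) = \begin{cases} f^{\mathrm{n}}_{((m+1)/2)}(x) & \text{if } m \text{ is odd}, \\ \bigl(f^{\mathrm{n}}_{(m/2)}(x) + f^{\mathrm{n}}_{(m/2+1)}(x)\bigr)/2 & \text{if } m \text{ is even} \end{cases}$$
is output, where $f^{\mathrm{n}}_{(1)}(x) \le \dots \le f^{\mathrm{n}}_{(m)}(x)$ denote the ordered noisy fitness values.
When mean (or median) sampling is used, line 5 in Algorithm 1 compares the estimates $\hat{f}^{\mathrm{mean}}$ (or $\hat{f}^{\mathrm{med}}$) instead of single noisy evaluations. For both sampling strategies, $m = 1$ means that sampling is not used.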
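The two sampling strategies can be sketched in a few lines. This is an illustrative sketch under stated assumptions rather than the authors' implementation: `noisy_eval` is a hypothetical callback returning one noisy evaluation of a solution, and the estimators come from Python's standard `statistics` module.

```python
import statistics


def mean_sampling(noisy_eval, x, m):
    """Estimate the fitness of x by the mean of m independent noisy evaluations."""
    return statistics.mean([noisy_eval(x) for _ in range(m)])


def median_sampling(noisy_eval, x, m):
    """Estimate the fitness of x by the median of m independent noisy evaluations."""
    return statistics.median([noisy_eval(x) for _ in range(m)])


# Toy usage with a scripted sequence of "noisy" values (made-up numbers).
scripted = iter([5, 100, 5, 5, 4])
print(median_sampling(lambda x: next(scripted), [1, 0, 1], 5))
```

For an even sample size, `statistics.median` averages the two middle values; with a sample size of 1, both functions return the single noisy evaluation, i.e., no sampling.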
2.4 Analysis Tool
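It is straightforward to model the evolutionary optimization procedure as a Markov chain $\{\xi_t\}_{t \ge 0}$, because the subsequent procedure only depends on the current state. For (1+1)-EA optimizing OM, we can simply set the chain’s state space as $\mathcal{X}$ and the optimal state space as $\mathcal{X}^*$ (namely $\mathcal{X} = \{0,1\}^n$ and $\mathcal{X}^* = \{1^n\}$). The first hitting time (FHT) of $\{\xi_t\}_{t \ge 0}$ is $\tau = \min\{t \ge 0 \mid \xi_t \in \mathcal{X}^*\}$. If the chain’s initial state is $\xi_0 = x$, then its expected FHT (EFHT) is denoted as $\mathbb{E}[\tau \mid \xi_0 = x]$. If $\xi_0$ obeys a distribution $\pi_0$ (denoted as $\xi_0 \sim \pi_0$), then its EFHT is defined as $\mathbb{E}[\tau \mid \xi_0 \sim \pi_0] = \sum_{x \in \mathcal{X}} \pi_0(x)\, \mathbb{E}[\tau \mid \xi_0 = x]$. For (1+1)-EA, the initial solution is evaluated once, then in each iteration, the parent solution and the offspring solution both need to be evaluated. Note that the initial solution is generated uniformly at random from $\{0,1\}^n$, thus the ERT of (1+1)-EA is $1 + 2 \cdot \mathbb{E}[\tau \mid \xi_0 \sim \pi_u]$, where $\pi_u$ denotes the uniform distribution. For (1+1)-EA using sampling, the ERT becomes $m + 2m \cdot \mathbb{E}[\tau \mid \xi_0 \sim \pi_u]$, because it needs to perform $m$ independent evaluations for each solution.
As presented in Theorem 2.4, the additive drift theorem aims to derive upper bounds on the EFHT. To use the approach, we first need to design a function $V(x)$ that measures the distance between a state $x$ and the optimal state space, where $V$ should satisfy $V(x) = 0$ for any optimal $x$ and $V(x) > 0$ otherwise. Next, we need to derive a lower bound $c$ for $\mathbb{E}[V(\xi_t) - V(\xi_{t+1}) \mid \xi_t]$, i.e., the expected progress towards the optimum in each generation. Finally, we can upper bound the EFHT by dividing $V(\xi_0)$ by $c$. When the context is clear, $\mathbb{E}[V(\xi_t) - V(\xi_{t+1}) \mid \xi_t]$ will be referred to simply as the drift.
Theorem 2.4** (Additive drift [39]).**
Given a Markov chain $\{\xi_t\}_{t \ge 0}$ and a distance function $V$, if there exists a real number $c > 0$ such that for any $t \ge 0$ and any $\xi_t$ with $V(\xi_t) > 0$,
$$\mathbb{E}[V(\xi_t) - V(\xi_{t+1}) \mid \xi_t] \ge c,$$
then the EFHT satisfies $\mathbb{E}[\tau \mid \xi_0] \le V(\xi_0)/c$.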
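As a small worked illustration of how the additive drift theorem is applied (not part of the original analysis), consider noise-free OM. The derivation below assumes the standard bit-wise mutation that flips each bit independently with probability $1/n$, and that an offspring is accepted only if its fitness is not worse. Here $\xi_t$ denotes the solution after $t$ iterations, $\tau$ the first hitting time of the optimum, and the distance function is $V(x) = |x|_0$, the number of 0s in $x$.

```latex
% Noise-free OneMax with V(x) = |x|_0 = i >= 1 and mutation rate 1/n (assumed).
% Without noise V never increases, and flipping exactly one of the i zero bits
% (and no other bit) decreases V by 1.
\begin{align*}
\mathbb{E}[V(\xi_t) - V(\xi_{t+1}) \mid \xi_t = x]
  &\ge i \cdot \frac{1}{n}\Bigl(1 - \frac{1}{n}\Bigr)^{n-1}
   \ge \frac{i}{en} \ge \frac{1}{en} =: c, \\
\mathbb{E}[\tau \mid \xi_0]
  &\le \frac{V(\xi_0)}{c} \le \frac{n}{1/(en)} = e n^{2}.
\end{align*}
```

The resulting bound $en^2$ is polynomial but looser than the known $\Theta(n \log n)$ runtime, because the additive drift theorem only uses the worst-case progress $c$ over all non-optimal states.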
3 The Robustness of Median Sampling Against Onebit Noise
Onebit noise is commonly used in theoretical analyses [17, 18, 26, 27]. With probability $p$, it changes a uniformly selected bit in a solution $x$ before $x$ is evaluated. For OM under this noise model, the ERT of (1+1)-EA is superpolynomial for $p = \omega(\log n / n)$ [17]; the ERT is polynomial if mean sampling with a suitable sample size is used [27]. Theorem 3.4 shows that the ERT is polynomial if median sampling with a suitable sample size is used, which illustrates that median sampling can efficiently tackle noise.
Definition 3.1** (Onebit Noise).**
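Suppose $f^{\mathrm{n}}(x)$ and $f(x)$ denote the noisy and true objective values of a solution $x$, respectively. Then
$$f^{\mathrm{n}}(x) = \begin{cases} f(x) & \text{with probability } 1 - p, \\ f(x') & \text{with probability } p, \end{cases}$$
where $p \in [0, 1]$ and $x'$ is derived by changing a uniformly selected bit in $x$.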
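For OM, onebit noise can be simulated without building the perturbed solution explicitly. The sketch below is an illustration under assumptions: solutions are Python lists of 0/1 integers, `p` is the noise probability, and the function name and the `rng` argument (a `random.Random` instance) are choices made only for this example.

```python
import random


def onebit_noisy_onemax(x, p, rng):
    """Return one noisy OneMax evaluation of the bit list x under onebit noise."""
    true_fitness = sum(x)
    if rng.random() < p:
        # With probability p, one uniformly chosen bit is flipped before evaluation:
        # flipping a 0 adds 1 to the fitness, flipping a 1 subtracts 1.
        i = rng.randrange(len(x))
        return true_fitness + (1 if x[i] == 0 else -1)
    return true_fitness


rng = random.Random(42)
x = [1, 1, 0, 1, 0, 1, 1, 1]
print([onebit_noisy_onemax(x, 0.5, rng) for _ in range(10)])
```

The noisy value therefore always lies within one of the true fitness, which is the property that makes the three-valued case analysis in this section possible.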
To prove Theorem 3.4, we present Lemma 3.2 to analyze under onebit noise by taking a sample size of . It intuitively means is close to the 2-quantile of w.h.p.
Lemma 3.2**.**
Under onebit noise, if median sampling with is used, then
- (i)
if , then ; 2. (ii)
if , then ; 3. (iii)
if , then ; furthermore, if also holds, then .
Proof 3.3**.**
First we consider (i). Suppose and for a constant . Suppose denotes the number of noisy evaluations satisfying in independent noisy evaluations. Observe that in each evaluation, , thus . Then we get
[TABLE]
where the last inequality is derived according to Hoeffding’s inequality. Therefore,
[TABLE]
*By the definition of median sampling, , thus the claim holds. We can similarly prove (ii).
Now we consider (iii). Under onebit noise, can take at most three values (i.e., , , ), thus can only take one of the three values by the definition of median sampling and . Note that , then similar to case (i), .
Then we consider the “furthermore” clause. By , we also derive . Thus, , i.e., the claim holds.
Combining the above analysis, the Lemma holds. ∎*
Theorem 3.4**.**
For OM under onebit noise, the ERT of (1+1)-EA employing median sampling with is polynomial.
Proof 3.5**.**
The main idea is applying Theorem 2.4. We consider three cases for and in each case, we will design a distance function and we need to examine for . Suppose , . For ease of notation, let ( is mutated from ), and . For ease of analysis, the drift is divided into and . That is,
[TABLE]
where
[TABLE]
[TABLE]
*(1) . is designed to be , namely the number of 0s in .
For , we consider mutating only one zero bit in (namely ), and its probability is . Then will replace if and . Conditions of Lemma 3.2-(iii) hold because , then*
[TABLE]
Thus,
[TABLE]
*where the last inequality is by is large enough.
For , we consider the increase of 0s. For satisfying , accepting it implies or . Note that conditions of (iii) in Lemma 3.2 are satisfied, then we have*
[TABLE]
Thus,
[TABLE]
*where the last equality is by is large enough.
Subtract from , we get*
[TABLE]
where the last equation derives from large enough . Therefore, by Theorem 2.4, , because . Note that each iteration needs fitness evaluations, we can derive a polynomial ERT.
(2) . The proof procedure is similar to case (1), but the is more complicated because the effect of the noise on a solution may vary as changes. The distance function is as follows:
[TABLE]
*We consider five cases for .
(2a) .
For , we also consider mutating only one zero bit in , then . Note that , thus . By (ii) in Lemma 3.2,*
[TABLE]
*If , then ; else . Thus, and .
Now we consider . For satisfying , . Thus, by (ii) in Lemma 3.2,*
[TABLE]
*implying . Accordingly, .
(2b) . Note that .
First we consider the positive drift . By , there always exists some such that and such can be mutated from by flipping at most seven 0s. Thus,*
[TABLE]
Note that and , thus by (iii) in Lemma 3.2. Therefore, will replace with probability . Moreover,
[TABLE]
*we have .
For , we consider with . If , Eq. (12) holds and we have , then we get*
[TABLE]
*where the last equality is by is large enough.
If , then any satisfying will never be accepted under onebit noise. For satisfying , we have , thus . Then . Combining the two cases for , we get .
(2c) . First we examine . Note that and , thus by (iii) in Lemma 3.2.
For , we consider mutating only one zero bit in , namely . Note that and , we can derive Eq. (6) and .
For , we consider two cases for satisfying . If , accepting implies . Thus, . If , then and , thus . Then Eq. (8) still holds, i.e., . Combining the two cases, .
(2d) .
First we consider the positive drift . Note that*
[TABLE]
* can generate an offspring with by flipping at most seven bits, whose probability is at least . Then we examine . Because*
[TABLE]
we derive by (iii) in Lemma 3.2. Note that always holds under onebit noise, we get . If , we get
[TABLE]
else and
[TABLE]
*Thus, .
For , it is only necessary to take satisfying into account because with will be rejected. If , we have . Thus,*
[TABLE]
and
[TABLE]
*which implies that Eq. (8) holds by (iii) in Lemma 3.2. If , we have . By , we have . Combining the two cases, .
(2e) . If , then , thus we only need to consider that , namely .
For , we consider mutating only one zero bit in , i.e., . Similar to the above analysis, . Note that , we derive by (i) in Lemma 3.2. Thus, . Note that , thus .
For , it is only necessary to take with into account. Note that , thus by Lemma 3.2, , implying that . Then we have .
Combining the five cases, we have and . By subtracting from , Eq. (10) becomes*
[TABLE]
and we can also derive a polynomial ERT.
(3) . The effect of the noise changes when the level of the noise changes. Accordingly, we need to design a new distance function:
[TABLE]
*Next we consider three cases for .
(3a) . The proof procedure is the same as case (2a), except that “” changes to . We derive and .
(3b) . Note that .
First we consider the positive drift . There exists some with and such can be mutated from by flipping at most 0s. Thus,*
[TABLE]
If , then . Thus, by (i) in Lemma 3.2. If , it can be verified that will always be accepted under onebit noise. Note that
[TABLE]
*Thus, we have .
For , the proof procedure is the same as that of case (2b), except that “” changes to . Thus, we get .
(3c) . The analysis for and is the same as that of case (2e), then we have and .
Combining the three cases, we have and . Subtract from , we get*
[TABLE]
and we can also derive a polynomial ERT. ∎
From the above proof, we can give an intuitive explanation for the effectiveness of median sampling. For two solutions $x$ and $y$ satisfying $f(x) > f(y)$, when the 2-quantile of the noisy fitness of $x$ is larger than that of $y$, $x$ will be estimated as better than $y$ by median sampling w.h.p., implying a correct comparison.
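To inspect this 2-quantile intuition on concrete numbers, the following sketch computes a median of the noisy OM value under onebit noise directly from the noise definition: a solution with `zeros` 0s has noisy value $f-1$ with probability $p(n - \text{zeros})/n$, $f$ with probability $1-p$, and $f+1$ with probability $p \cdot \text{zeros}/n$. The function name is hypothetical, and returning the smallest value whose cumulative probability reaches 1/2 is a convention assumed here for cases where the median is not unique.

```python
def onebit_noisy_median(n, zeros, p):
    """Smallest v with P(noisy fitness <= v) >= 1/2 for a solution with `zeros` 0s."""
    f = n - zeros                      # true OneMax fitness
    prob_minus = p * (n - zeros) / n   # a 1-bit is flipped: noisy fitness f - 1
    prob_same = 1.0 - p                # no noise occurs: noisy fitness f
    if prob_minus >= 0.5:
        return f - 1
    if prob_minus + prob_same >= 0.5:
        return f
    return f + 1


n = 10
for p in (0.4, 1.0):
    print(p, [onebit_noisy_median(n, zeros, p) for zeros in range(n + 1)])
```

When $p < 1/2$, the true fitness itself has probability $1 - p > 1/2$, so the median equals the true fitness for every solution; for larger $p$, the printed sequence lets the reader check where the median tracks the true fitness and where it does not.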
4 Cases Where Median Sampling is Better than Mean Sampling
For OM under segmented noise (Definition 4.1), we show that (1+1)-EA equipped with median sampling can do better than (1+1)-EA using mean sampling. The segmented noise is from [25], but we make a little modification to simplify the analysis. As presented in Definition 4.1, the noisy evaluation of a solution can be divided into three segments. The objective evaluation is accurate in the first segment, but inaccurate in other segments because of noise. We show that for OM under segmented noise, the ERT of (1+1)-EA using mean sampling is exponential (i.e., Theorem 4.2); and the ERT of (1+1)-EA employing median sampling with is polynomial (i.e., Theorem 4.6). The analyses show that median sampling can be better if the 2-quantile increases with the true fitness.
Definition 4.1**.**
*, its noisy objective is defined as follows:
(1) if , ;
(2) if ,*
[TABLE]
(3) if ,
[TABLE]
where .
Theorem 4.2 shows that mean sampling fails under segmented noise and the reason is similar to that found in [25]. Consider and satisfying . In segment (2), a small sample cannot eliminate the impact of noise, and is still very large. In segment (3), the expected gap between and is positive. Therefore, a larger sample size will enlarge and performs worse; moreover, no medium sample size makes a good tradeoff. Therefore, mean sampling fails. Its rigorous proof can be derived directly from Theorem 5.2 in [25], because the change of noise doesn’t affect the proof.
Theorem 4.2**.**
For OM under segmented noise, the ERT of (1+1)-EA employing mean sampling is exponential.
To prove Theorem 4.6, Lemma 4.3 is used. This lemma can upper bound the runtime, when the true better solution has a large probability to be recognized as better. Note that denotes some solution with 0s, and denotes the estimated fitness of a solution.
Lemma 4.3** ([18]).**
The EFHT of (1+1)-EA solving noisy OM is polynomial if
[TABLE]
We also present Lemma 4.4 to analyze under segmented noise by taking a sample size of .
Lemma 4.4**.**
Under segmented noise, if median sampling with is used, then if ; if .
Proof 4.5**.**
The main procedure is analogous to Lemma 3.2. If , suppose there are noisy evaluations where in independent noisy evaluations. Then Eq. (2) also holds and . If , we similarly have . Thus, the lemma holds. ∎
Theorem 4.6**.**
For OM under segmented noise, the ERT of (1+1)-EA employing median sampling with is polynomial.
Proof 4.7**.**
*The main idea is applying Lemma 4.3. Given , let . To analyze , we consider four cases for .
(1) . Note that and is larger by considering and , respectively. Therefore, we get .
(2) . If , we have , because and . If , by Lemma 4.4, we have*
[TABLE]
(3) . The analysis is analogous to case (2). If , then . If , then
[TABLE]
Combining the three cases, we have shown for sufficiently large . Then, by Lemma 4.3, the EFHT is polynomial. In each iteration, the algorithm needs evaluations, thus the ERT is polynomial. ∎
5 Cases Where Mean Sampling is Better than Median Sampling
For OM under partial noise (Definition 5.1), we show that (1+1)-EA using median sampling is sometimes worse than (1+1)-EA using mean sampling. For partial noise presented in Definition 5.1, a false objective value is returned when . We prove that for OM under partial noise, the ERT of (1+1)-EA employing mean sampling with is polynomial (i.e., Theorem 5.2); and the ERT of (1+1)-EA employing median sampling is exponential (i.e., Theorem 5.5). The analyses suggest that median sampling may fail if the 2-quantile of the noisy fitness doesn’t increase with the true objective value, and it is better to choose other strategies.
Definition 5.1**.**
*, its noisy objective is defined as follows:
(1) if , ;
(2) if ,*
[TABLE]
Theorem 5.2**.**
For OM under partial noise, the ERT of (1+1)-EA employing mean sampling with is polynomial.
Proof 5.3**.**
*The main idea is applying Lemma 4.3. Given , let . To analyze , we classify into two cases.
(1) . We have and , thus .
(2) . First we need to derive . Note that . We classify into two cases. (a) If , then , thus*
[TABLE]
(b) If , then and . Thus, we have . Then we have
[TABLE]
*where the second inequality holds by and Hoeffding’s inequality.
Similar to the discussion at the end of Theorem 4.6, the ERT is polynomial. ∎*
From the proof, we can derive an intuitive explanation for the effectiveness of mean sampling. For two solutions $x$ and $y$ satisfying $f(x) < f(y)$ (i.e., $x$ is worse), the expectation of the noisy fitness of $y$ is larger than that of $x$. Then, the probability of accepting $x$ is sufficiently small if mean sampling is used. Thus, the search direction of (1+1)-EA will not be misled and the optimal solution can be found quickly.
To prove Theorem 5.5, we use Lemma 5.4 [18], which intuitively means that if a true worse solution (i.e., a solution with more 0s) is estimated better than a true better solution with some probability, then we can derive the lower bound for the runtime.
Lemma 5.4** ([18]).**
If there exists a real number satisfying
[TABLE]
then w.h.p., the FHT of (1+1)-EA solving noisy OM is .
Theorem 5.5**.**
For OM under partial noise, the ERT of (1+1)-EA employing median sampling is exponential.
Proof 5.6**.**
*We use Lemma 5.4 for the proof. Given , let . First we show that for . Suppose denotes the number of noisy evaluations satisfying in independent noisy evaluations. We classify into 2 cases.
(1) is even. Let*
[TABLE]
Note that sum of the three items is 1, and ,
[TABLE]
*Thus, . By definition of median sampling, we derive .
(2) is odd. We have*
[TABLE]
*Thus, . By the definition of median sampling, we can derive that .
To make , it is sufficient that since it always holds that . Thus, for . Then, the condition of Lemma 5.4 holds by setting . Thus, the EFHT is , i.e., exponential. ∎*
From the analysis, we can give an intuitive explanation for the failure of median sampling. Consider two solutions $x$ and $y$ satisfying $f(x) < f(y)$ (that is, $x$ is worse): the 2-quantile of the noisy fitness of $x$ is larger than that of $y$, so $x$ will be estimated as better than $y$ by median sampling w.h.p., implying a wrong comparison.
6 Application Illustration
In this section, we provide some guidance for employing median sampling in practice. The theoretical results have revealed that if the 2-quantile of the noisy fitness increases with the true fitness, we can use median sampling to tackle noise. Inspired by this finding, we may use the following three steps to check the effectiveness of median sampling in practice.
1. Find a sequence of solutions with increasing true objective values. Note that the solution space can be very large, and we only need to find some representative solutions. The true fitness of a solution can be obtained by conducting an accurate evaluation instead of using an approximation. For example, a prediction model in machine learning can be evaluated using a large amount of data, and a structure in aerodynamic design can be evaluated by CFD simulation. Since the number of representative solutions is very limited and the evaluation process can be easily parallelized, the computational cost is usually acceptable.
2. Find an appropriate sample size $m$ such that the 2-quantile of the noisy fitness increases along the sequence. If no such sample size exists or the sample size is too large, it would be better to choose other strategies.
3. If such a sample size is found, evaluate each solution $m$ times independently and output the median of the objective values as the estimated fitness during the optimization procedure.
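The three steps above can be sketched as follows. This is a minimal illustration under assumptions, not the procedure used for the experiments: `solutions` is a list of representative solutions already ordered by increasing true fitness, `noisy_eval` is a hypothetical callback returning one noisy evaluation, and the 2-quantile of each solution is approximated by the empirical median of a given number of evaluations.

```python
import statistics


def estimated_medians(solutions, noisy_eval, m):
    """Empirical median of m noisy evaluations for each representative solution."""
    return [statistics.median([noisy_eval(x) for _ in range(m)]) for x in solutions]


def medians_increase(solutions, noisy_eval, m):
    """Check whether the estimated medians strictly increase along the sequence."""
    est = estimated_medians(solutions, noisy_eval, m)
    return all(a < b for a, b in zip(est, est[1:]))


def pick_sample_size(solutions, noisy_eval, candidate_sizes):
    """Return the first candidate size that passes the check, or None if none does."""
    for m in candidate_sizes:
        if medians_increase(solutions, noisy_eval, m):
            return m
    return None


# Toy usage: a noise-free evaluation passes the check already for m = 1.
sequence = [[0, 0, 0], [1, 0, 0], [1, 1, 0], [1, 1, 1]]
print(pick_sample_size(sequence, sum, [1, 3, 5]))
```

Because the check itself relies on noisy evaluations, a single pass can be misleading for small sample sizes; repeating it several times before committing to a sample size is a reasonable precaution. A return value of `None` corresponds to the case in step 2 where other strategies should be considered.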
As an application illustration, we use (1+1)-EA to solve OM under onebit noise. It has been known that the ERT of (1+1)-EA solving OM under onebit noise is super-polynomial if the noise probability , thus we set . We set the problem size and use as the sequence of solutions with increasing true fitness. We select the sample size from , such that the 2-quantile of the noisy fitness increases with the sequence. Figure 2 shows that it holds when . Thus, using a sample size is probably enough to reduce the negative effect of noise.
To show the effectiveness of median sampling, we next compare the ERT of (1+1)-EA with and without median sampling for the problem size . For each , we run (1+1)-EA 100 times independently. In each run, we record the number of fitness evaluations until an optimal solution with respect to the true fitness function is found for the first time. The total number of evaluations of the 100 runs are averaged as the estimation of the ERT. The results are shown in Figure 2. It can be observed that though using median sampling needs to evaluate a solution times for estimating the fitness, the total number of evaluations required by (1+1)-EA to find the optimum is decreased drastically.
7 Conclusion
In this paper, we introduce median sampling into EAs to handle noise and theoretically analyze the effectiveness of median sampling. We first consider one classical case, i.e., OM under onebit noise, and show that median sampling can reduce the ERT of (1+1)-EA from exponential to polynomial. Next, by two illustrative examples, we show that when the 2-quantile of the noisy fitness increases with the true objective value, median sampling is better than the commonly used mean sampling; otherwise, it is worse. The results provide us with some guidance to employ median sampling in practice. In the future, it would be interesting to analyze the effect of median sampling on real-world noisy optimization problems.
Acknowledgements
The authors want to thank the editor and anonymous reviewers for their helpful comments and suggestions, and one reviewer of our work [25], whose comments motivate this work. This work was supported by the National Key Research and Development Program of China (2017YFB1003102), the NSFC (62022039, 61672478, 61876077), and the MOE University Scientific-Technological Innovation Plan Program.
References
[1] Bäck T. Evolutionary Algorithms in Theory and Practice: Evolution Strategies, Evolutionary Programming, Genetic Algorithms. Oxford University Press, Oxford, UK, 1996
[2] Xu P, Liu X, Cao H, et al. An efficient energy aware virtual network migration based on genetic algorithm. Front Comput Sci, 2019, 13(2): 440-442
[3] Yuan Q, Tang H, You W, et al. Virtual network function scheduling via multilayer encoding genetic algorithm with distributed bandwidth allocation. Sci China Inf Sci, 2018, 61(9): 092107
[4] Jin Y, Branke J. Evolutionary optimization in uncertain environments-A survey. IEEE Trans Evol Comput, 2005, 9(3): 303-317
[5] Aizawa A, Wah B. Scheduling of genetic algorithms in a noisy environment. Evol Comput, 1994, 2(2): 97-122
[6] Stagge P. Averaging efficiently in the presence of noise. In: Proceedings of the 5th International Conference on Parallel Problem Solving from Nature, Amsterdam, The Netherlands, 1998. 188-197
[7] Branke J, Schmidt C. Selection in the presence of noise. In: Proceedings of the 5th ACM Conference on Genetic and Evolutionary Computation, Chicago, IL, 2003. 766-777
[8] Branke J, Schmidt C. Sequential sampling in noisy environments. In: Proceedings of the 8th International Conference on Parallel Problem Solving from Nature, Birmingham, UK, 2004. 202-211
