On the Robustness of Median Sampling in Noisy Evolutionary Optimization

Chao Bian; Chao Qian; Yang Yu; Ke Tang

arXiv:1907.13100·cs.NE·November 29, 2022


TL;DR

This paper introduces median sampling into evolutionary algorithms to improve robustness against noise, demonstrating theoretically that it can exponentially reduce runtime under certain noise conditions, with practical guidance for its application.

Contribution

The paper presents the novel use of median sampling in EAs, providing theoretical analysis of its advantages and limitations compared to mean sampling under noisy conditions.

Findings

01 Median sampling reduces expected runtime exponentially under onebit noise.

02 Median sampling outperforms mean sampling when the noise's 2-quantile increases with true fitness.

03 Median sampling may fail when the noise's 2-quantile does not increase with true fitness.


Equations

$$\bar{f}(x)=\sum_{i=1}^{m}\frac{f^{\mathrm{n}}_{i}(x)}{m}$$

$$\hat{f}(x)=\begin{cases}f^{\mathrm{n}}_{i_{(m+1)/2}}(x)&\text{if $m$ is odd},\\ \big(f^{\mathrm{n}}_{i_{m/2}}(x)+f^{\mathrm{n}}_{i_{m/2+1}}(x)\big)/2&\text{if $m$ is even}.\end{cases}$$

$$\mathrm{E}(V_{t}-V_{t+1}\mid\xi_{t})\geq c,$$

$$\mathrm{P}(f^{\mathrm{n}}(x)=f(x))=1-p,\quad\mathrm{P}(f^{\mathrm{n}}(x)=f(y))=p,$$

$$\mathrm{P}(s\leq m/2)\leq\mathrm{P}(|s-\mathrm{E}(s)|\geq cn^{2})\leq 2e^{-2c^{2}n^{4}/m}=e^{-\Omega(n)},$$

$$\mathrm{P}(s>m/2)\geq 1-e^{-\Omega(n)}.$$

$$\mathrm{E}(V_{t}-V_{t+1}\mid\xi_{t}=x)=\mathrm{E}_{1}-\mathrm{E}_{2},$$

$$\mathrm{E}_{1}=\sum_{z:|z|_{0}<i}\mathrm{P}_{\rm mut}(x,z)\cdot\mathrm{P}_{\rm acc}(x,z)\cdot(V(x)-V(z)),$$

$$\mathrm{E}_{2}=\sum_{z:|z|_{0}>i}\mathrm{P}_{\rm mut}(x,z)\cdot\mathrm{P}_{\rm acc}(x,z)\cdot(V(z)-V(x)).$$

$$\mathrm{P}_{\rm acc}(x,z)\geq\mathrm{P}(\hat{f}(z)=n+1-i,\hat{f}(x)=n-i)\geq 1-e^{-\Omega(n)}.$$

$$\mathrm{E}_{1}\geq\frac{i}{en}\cdot(1-e^{-\Omega(n)})\geq\frac{1}{3n},$$

$$\mathrm{P}_{\rm acc}(x,z)\leq e^{-\Omega(n)}.$$

$$\mathrm{E}_{2}\leq(n-i)\cdot e^{-\Omega(n)}\leq e^{\log n}\cdot e^{-\Omega(n)}=e^{-\Omega(n)},$$

$$\mathrm{E}(V_{t}-V_{t+1}\mid\xi_{t}=x)\geq\Omega\Big(\frac{1}{n}\Big),$$

$$V(x)=\begin{cases}i&\text{if }i>\frac{n}{2p}+3\text{ or }n-\frac{n}{2p}+3<i<\frac{n}{2p}-3\text{ or }i<\max\{1,n-\frac{n}{2p}-3\},\\ \frac{n}{2p}&\text{if }\frac{n}{2p}-3\leq i\leq\frac{n}{2p}+3,\\ n-\frac{n}{2p}+2&\text{if }\max\{1,n-\frac{n}{2p}-3\}\leq i\leq n-\frac{n}{2p}+3.\end{cases}$$

$$\mathrm{P}_{\rm acc}(x,z)\geq\mathrm{P}(\hat{f}(z)=n+2-i,\hat{f}(x)=n+1-i)\geq 1-e^{-\Omega(n)}.$$

$$\mathrm{P}(\hat{f}(z)=n+1-|z|_{0},\hat{f}(x)=n+1-i)\geq 1-e^{-\Omega(n)},$$

$$\mathrm{P}_{\rm mut}(x,z)\geq\binom{i}{7}\Big(1-\frac{1}{n}\Big)^{n-7}\Big(\frac{1}{n}\Big)^{7}=\Omega(1).$$

$$V(x)-V(z)\geq n/(2p)-|z|_{0}>n/(2p)-(n/(2p)-3)=3,$$

$$n-\frac{n}{2p}+3-\max\{1,n-\frac{n}{2p}-3\}\leq n-\frac{n}{2p}+3-\Big(n-\frac{n}{2p}-3\Big)\leq 6,$$

$$\frac{pi}{n}\leq p\Big(n-\frac{n}{2p}+3\Big)\frac{1}{n}=p\,\frac{n+3}{n}-\frac{1}{2}\leq\frac{n+3}{n+7}-\frac{1}{2}=\frac{1}{2}-\frac{4}{n+7},$$

$$V(x)-V(z)\geq n-\frac{n}{2p}+2-\Big(n-\frac{n}{2p}-3\Big)\geq 5;$$

$$V(x)-V(z)\geq n-\frac{n}{2p}+2>1.$$

$$p\cdot\frac{n-|z|_{0}}{n}<p\cdot\frac{n-i}{n}\leq\frac{1}{2}-\frac{p}{n}$$

$$p\cdot\frac{i}{n}<p\cdot\frac{|z|_{0}}{n}\leq\frac{(n+5)p}{n}-\frac{1}{2}<\frac{n+5}{n+7}-\frac{1}{2}=\frac{1}{2}-\frac{2}{n+7},$$

$$\mathrm{E}(V_{t}-V_{t+1}\mid\xi_{t}=x)=\Omega\Big(\frac{1}{n^{7}}\Big),$$

$$V(x)=\begin{cases}i&\text{if }i>\frac{n}{2p}+3\text{ or }i<n-\frac{n}{2p}-3,\\ \frac{n}{2}&\text{if }n-\frac{n}{2p}-3\leq i\leq\frac{n}{2p}+3.\end{cases}$$

$$\mathrm{P}_{\rm mut}(x,z)\geq\binom{i}{14}\Big(1-\frac{1}{n}\Big)^{n-14}\Big(\frac{1}{n}\Big)^{14}=\Omega(1).$$

$$V(x)-V(z)\geq n/2-|z|_{0}\geq n/2-(n-n/(2p)-3)\geq 3.$$

Taxonomy

Topics: Metaheuristic Optimization Algorithms Research · Advanced Multi-Objective Optimization Algorithms · Evolutionary Algorithms and Applications

Full text

RESEARCH PAPER (2020)

Citation: Bian C, Qian C, Yu Y, et al. On the Robustness of Median Sampling in Noisy Evolutionary Optimization.

Chao BIAN, Chao QIAN, Yang YU, Ke TANG

State Key Laboratory for Novel Software Technology, Nanjing University, Nanjing 210023, China

Shenzhen Key Laboratory of Computational Intelligence, Department of Computer Science and Engineering, Southern University of Science and Technology, Shenzhen 518055, China

Abstract

Evolutionary algorithms (EAs) are a class of nature-inspired metaheuristics with wide applications in practical optimization problems. In these problems, objective evaluations are usually inaccurate because noise is almost inevitable in the real world, so weakening the negative effect of noise is a crucial issue. Sampling is a popular strategy, which evaluates the objective several times and uses the mean of these evaluation results as an estimate of the objective value. In this work, we introduce a novel sampling method, median sampling, into EAs, and illustrate its properties and usefulness theoretically by solving OneMax, the problem of maximizing the number of 1s in a bit string. Instead of the mean, median sampling uses the median of the evaluation results as an estimate. Through rigorous theoretical analysis on OneMax under the commonly used onebit noise, we show that median sampling reduces the expected runtime exponentially. Next, through two special noise models, we show that when the 2-quantile of the noisy fitness increases with the true fitness, median sampling can be better than mean sampling; otherwise, it may fail and mean sampling can be better. The results may guide the proper use of median sampling in practical applications.

Keywords: Evolutionary algorithms, noisy optimization, median sampling, computational complexity, runtime analysis

1 Introduction

As general-purpose optimization algorithms, evolutionary algorithms (EAs) [1] have wide applications in practical optimization problems [2, 3]. During optimization, the obtained objective (i.e., fitness) value is usually inaccurate because of noise [4]. For example, in machine learning, the estimated performance of a prediction model usually deviates from the true performance because the model is evaluated on a limited amount of data; in aerodynamic design, a computational fluid dynamics (CFD) simulation is needed to evaluate the performance of a given structure, which is usually computationally expensive and approximate, leading to noisy fitness. Noise may mislead the search direction and reduce the efficiency of EAs. Therefore, it is important to handle noise in fitness evaluation during evolutionary optimization.

The sampling strategy independently evaluates the fitness $m$ times, where $m$ is the sample size, and then uses the mean of these samples to estimate the exact fitness. Sampling is a popular way to tackle noise because it yields an $m$-fold reduction in the variance of the noisy evaluation. However, it also incurs an $m$-fold increase in computation time, so some variants have been proposed, including adaptive sampling [5, 6] and sequential sampling [7, 8], which decide the value of $m$ dynamically in each generation. Nevertheless, theoretical understanding of sampling has been largely lacking.

Runtime analysis, an important theoretical aspect of EAs, has made much progress recently [9, 10, 11, 12, 13, 14]. However, these studies mainly consider exact environments, and results on noisy evolutionary optimization are rare. Noise increases the randomness in the optimization procedure, making the analysis more difficult. As a representative evolutionary algorithm, (1+1)-EA maintains one solution in the population and generates a new solution in each iteration by mutating the parent solution. It was first studied on two frequently used pseudo-Boolean problems, OneMax (OM) and LeadingOnes (LO). The goal of OM is to maximize the number of 1s in a solution, while the goal of LO is to maximize the number of consecutive 1s counted from the first bit of a solution. Runtime analysis for the two problems under various noise models [15, 16, 17, 18, 19, 20] showed that (1+1)-EA can quickly find the optimum only if the noise level is low. For instance, onebit noise is a frequently used noise model in theoretical analysis. With probability $p$, it changes a uniformly selected bit in a solution before evaluation, leading to a random fitness value. For OM of size $n$ under onebit noise, the expected runtime (ERT) of (1+1)-EA is superpolynomial if $p=\omega(\log n/n)$. There are also some studies concerning the effectiveness of various strategies to tackle noise, e.g., threshold selection [19, 21, 22], populations [16, 18, 20, 23, 24] and sampling [25, 26, 27]. For instance, if $\mu=\Theta(\log n)$, the ERT of $(\mu+1)$-EA optimizing OM under onebit noise is polynomial for any $p\in[0,1]$ (note that $p$ denotes the noise probability). Several works also show the robustness of the compact genetic algorithm [28] and a simple ant colony optimization algorithm [29, 30, 31, 32] against noise.

The above-mentioned runtime analyses concerning sampling [25, 26, 27] revealed that sampling can reduce the exponential runtime under high noise levels to polynomial, and that the sample size may be critical to the effectiveness of sampling. Moreover, Akimoto et al. [33] showed that optimization under unbiased noise can perform like exact optimization if the sample size $m$ is large enough. In these works, the sampling strategy uses the mean of the samples as an approximation of the true fitness. A natural question is then whether other information from the samples can be used to make EAs more robust against noise.

Note that the mean is a measure of central tendency, so it is natural to consider another widely known measure, the median. Compared to the mean, the median has the advantage of being insensitive to outliers. For example, the “breakdown point” [34, 35] is a commonly used indicator of insensitivity, which denotes the minimum ratio of variables that need to be contaminated to make the estimator become infinite (i.e., cause breakdown). The breakdown point of the mean is close to 0 because a single bad observation can make the mean infinite, whereas the breakdown point of the median is 0.5 because the median becomes infinite only if more than 50% of the variables become infinite. In fact, economists frequently use the sample median when reporting statistics on certain economic measures, e.g., household income [36].

In this paper, we introduce the sampling strategy using the median (called median sampling) into EAs and theoretically examine its effectiveness. Instead of taking the mean, median sampling takes the median of the samples as an estimate of the fitness. To distinguish the two sampling strategies, we call the original sampling strategy mean sampling in the rest of the paper. We consider (1+1)-EA solving noisy OM, and derive the ERT for reaching the optimum (with respect to the exact objective). Our main results are as follows:

• For OM under onebit noise with any $p\in[0,1]$, we prove that the ERT of (1+1)-EA is polynomial when median sampling with $m=2n^{3}+1$ is used. Previous analysis [17] has proved that the ERT of (1+1)-EA without sampling is polynomial only if $p=O(\log n/n)$. Thus, the result shows the robustness of median sampling against noise.

• For OM under segmented noise, we show that the ERT of (1+1)-EA using median sampling is polynomial, while the ERT of (1+1)-EA using mean sampling is exponential. The results show that median sampling can be a better choice if the 2-quantile of the noisy fitness increases with the true fitness. Note that the noisy fitness is a random variable, and the 2-quantile of a random variable $X$ is the value $a$ satisfying $\mathrm{P}(X\leq a)\geq 0.5$ and $\mathrm{P}(X\geq a)\geq 0.5$ (i.e., the median of $X$).

• For OM under partial noise, we show that (1+1)-EA employing median sampling fails, while (1+1)-EA employing mean sampling works. The results suggest that it would be better to choose other strategies if the 2-quantile of the noisy fitness does not increase with the true fitness.

Note that in parallel with our work, Doerr and Sutton [37] showed that median sampling can handle the negative impact of noise for an integer valued objective function $f$ , if $f^{\mathrm{n}}(x)$ satisfies the $\epsilon$ -concentrate condition, that is, $\mathrm{P}(f^{\mathrm{n}}(x)-f(x)\geq 0.5)\leq 0.5-\epsilon$ and $\mathrm{P}(f^{\mathrm{n}}(x)-f(x)\leq-0.5)\leq 0.5-\epsilon$ , where $f^{\mathrm{n}}(x)$ denotes the noisy objective value of $x$ . They also considered two specific cases to show the superiority of median sampling over mean sampling. For OM under additive Cauchy noise with parameter $\gamma\geq 0.5$ , they showed that the runtime of (1+1)-EA is superpolynomial w.h.p. (with high probability) if mean sampling is used, and the runtime is polynomial w.h.p. if median sampling is used. For LO under bitwise noise $(p,q)$ [27] satisfying $p=0.5-\epsilon\wedge q=\Omega(1)$ , they showed a superpolynomial ERT for (1+1)-EA if mean sampling with $m=O(n/\log^{2}n)$ is used, and the runtime is polynomial w.h.p. if median sampling with $m=O(\log n)$ is used.

The rest of the paper is organized as follows. Section 2 presents preliminaries. Section 3 analyzes the effectiveness of median sampling. Sections 4 and 5 compare median sampling with mean sampling, and Section 6 provides some guidance for employing median sampling in practice. Finally, Section 7 concludes the paper.

2 Preliminaries

We first present the OM problem and the (1+1)-EA considered in this paper. Next, we present the sampling strategies. Finally, we present the analysis tool.

2.1 OneMax Problem

We consider the frequently-used pseudo-Boolean function OM. Its goal is maximizing the number of 1s (namely, the bits with value 1) in a solution. Note that 11…1 (denoted as $1^{n}$ ) is the unique optimal solution. The ERT of (1+1)-EA solving OM (without noise) is $\Theta(n\log n)$ [38]. For notational convenience, $|x|_{0}$ will be used to represent the number of 0s (namely the bits with value 0) of $x$ .

Definition 2.1.

The goal of the OM Problem with size $n$ is finding a binary string $x^{*}$ to maximize $f(x)=\sum^{n}_{i=1}x_{i}$ .
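As a minimal sketch of Definition 2.1, the OM objective and the zero-count $|x|_{0}$ can be written as below. The function names `onemax` and `num_zeros` are illustrative choices, not names from the paper.

```python
from typing import Sequence


def onemax(x: Sequence[int]) -> int:
    """True fitness f(x): the number of 1-bits in the bit string x."""
    return sum(x)


def num_zeros(x: Sequence[int]) -> int:
    """|x|_0: the number of 0-bits in x, so that f(x) = n - |x|_0."""
    return len(x) - sum(x)


x = [1, 0, 1, 1, 0]
print(onemax(x), num_zeros(x))  # prints "3 2"
```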

2.2 (1+1) Evolutionary Algorithm

(1+1)-EA reflects the general structure of EAs, and is widely analyzed to theoretically understand the behavior of EAs. Unlike in exact optimization, only a noisy fitness value $f^{\mathrm{n}}(x)$ can be obtained in noisy environments, and this value is a random variable because the noise may disturb the solution or the objective value randomly. For example, there are two widely used kinds of noise models: posterior and prior. Posterior noise comes from variation in the fitness of a solution, e.g., $f^{\mathrm{n}}(x)=f(x)+\delta$, where $\delta$ is randomly drawn from some distribution. Prior noise comes from variation in a solution, i.e., $f^{\mathrm{n}}(x)=f(x^{\prime})$, where $x^{\prime}$ is generated from $x$ by random perturbations. Therefore, line 5 in Algorithm 1 changes from the true fitness “$f(\cdot)$” to the noisy fitness “$f^{\mathrm{n}}(\cdot)$”. During optimization, the reevaluation strategy, which evaluates both the offspring and parent solutions in each generation, is used as in [17, 18, 29]. Because fitness evaluations are the most time-consuming part of an EA, we define the runtime as the number of objective evaluations. The termination condition is finding the optimum w.r.t. the exact objective [17, 18, 33]. In this work, we consider maximizing $f:\{0,1\}^{n}\rightarrow\mathbb{R}$.
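Algorithm 1 is not reproduced in this text, so the following is only a sketch of a (1+1)-EA with reevaluation. It assumes standard bit-wise mutation (each bit flips independently with probability $1/n$) and acceptance when the offspring's estimate is at least the parent's, which matches the mutation and acceptance probabilities used later in the proofs. The names `one_plus_one_ea`, `estimate`, and `max_iters` are hypothetical.

```python
import random
from typing import Callable, List, Tuple


def one_plus_one_ea(
    n: int,
    estimate: Callable[[List[int]], float],
    rng: random.Random,
    max_iters: int = 100_000,
) -> Tuple[List[int], int]:
    """Sketch of (1+1)-EA with reevaluation; returns (solution, iterations used)."""
    # Initial solution drawn uniformly at random from {0,1}^n.
    x = [rng.randint(0, 1) for _ in range(n)]
    for t in range(max_iters):
        # Termination is checked against the exact objective (all bits equal to 1).
        if sum(x) == n:
            return x, t
        # Bit-wise mutation: flip each bit independently with probability 1/n.
        z = [1 - b if rng.random() < 1.0 / n else b for b in x]
        # Reevaluation: parent and offspring are both estimated anew in every iteration.
        if estimate(z) >= estimate(x):
            x = z
    return x, max_iters


# Noise-free usage: the estimate is the exact OneMax value.
solution, iterations = one_plus_one_ea(10, lambda s: sum(s), random.Random(0))
```

Passing the fitness estimate as a callable keeps the loop unchanged whether `estimate` is the exact objective, a single noisy evaluation, or a sampled estimate.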

2.3 Median Sampling

Mean sampling has often been used in noisy evolutionary optimization to tackle noise [5, 7]. As described in Definition 2.2, it uses the mean of $m$ independent evaluations to approximate the true fitness $f(x)$ , where $m$ is called the sample size. By mean sampling, the output $\bar{f}(x)$ is close to the mathematical expectation of $f^{\mathrm{n}}(x)$ . As described in Definition 2.3, median sampling takes the median of $m$ independent evaluations to approximate the true fitness $f(x)$ . By median sampling, the output $\hat{f}(x)$ is close to the 2-quantile of $f^{\mathrm{n}}(x)$ , namely $\mathrm{P}(f^{\mathrm{n}}(x)\leq\hat{f}(x))\approx 1/2$ .

Definition 2.2 (Mean Sampling).

The objective value of $x$ is evaluated independently $m$ times, then

$$\bar{f}(x)=\sum_{i=1}^{m}\frac{f^{\mathrm{n}}_{i}(x)}{m}$$

is output, where $f^{\mathrm{n}}_{1}(x),f^{\mathrm{n}}_{2}(x),\ldots,f^{\mathrm{n}}_{m}(x)$ denote $m$ noisy fitness values.

Definition 2.3 (Median Sampling).

The objective value of $x$ is evaluated independently $m$ times, then

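$$\hat{f}(x)=\begin{cases}f^{\mathrm{n}}_{i_{(m+1)/2}}(x)&\text{if $m$ is odd},\\ \big(f^{\mathrm{n}}_{i_{m/2}}(x)+f^{\mathrm{n}}_{i_{m/2+1}}(x)\big)/2&\text{if $m$ is even}.\end{cases}$$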

is output, where $f^{\mathrm{n}}_{i_{1}}(x)\leq f^{\mathrm{n}}_{i_{2}}(x)\leq\ldots\leq f^{\mathrm{n}}_{i_{m}}(x)$ denote the ordered noisy fitness values.

When mean (or median) sampling is used, line 5 in Algorithm 1 becomes “ $\bar{f}(\cdot)$ ” (or “ $\hat{f}(\cdot)$ ”). For both of the sampling strategies, $m=1$ means that sampling is not used.
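The two estimators of Definitions 2.2 and 2.3 can be sketched as follows. The helper `replay` is a hypothetical stand-in for a noisy objective that returns preset values, used here only to make the effect of a single outlier visible; it is not part of the paper.

```python
import statistics
from typing import Callable, Iterable, List

NoisyFitness = Callable[[List[int]], float]


def mean_sampling(noisy_f: NoisyFitness, x: List[int], m: int) -> float:
    """Definition 2.2: the mean of m independent noisy evaluations of x."""
    return statistics.fmean([noisy_f(x) for _ in range(m)])


def median_sampling(noisy_f: NoisyFitness, x: List[int], m: int) -> float:
    """Definition 2.3: the median of m independent noisy evaluations of x.

    statistics.median averages the two middle values when m is even.
    """
    return statistics.median([noisy_f(x) for _ in range(m)])


def replay(values: Iterable[float]) -> NoisyFitness:
    """Hypothetical noisy objective that returns the given values in order."""
    it = iter(values)
    return lambda x: next(it)


x = [1, 1, 1, 1, 1, 0, 0, 0]   # true OneMax fitness is 5
evals = [5, 5, 1000, 5, 5]     # one outlier among m = 5 evaluations
print(mean_sampling(replay(evals), x, 5))    # 204.0
print(median_sampling(replay(evals), x, 5))  # 5
```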

2.4 Analysis Tool

It is straightforward to model the evolutionary optimization procedure as a Markov chain $\{\xi_{t}\}^{+\infty}_{t=0}$ , because the subsequent procedure only depends on the current state. For (1+1)-EA optimizing OM, we can simply set the chain’s state space as $\{0,1\}^{n}$ and the optimal state as $1^{n}$ (namely $\xi_{t}\in\mathcal{X}=\{0,1\}^{n}$ and $\mathcal{X}^{*}=\{1^{n}\}$ ). The first hitting time (FHT) of $\{\xi_{t}\}^{\infty}_{t=0}$ is $\tau=\min\{t\mid\xi_{t}\in\mathcal{X}^{*},t\geq 0\}$ . If the chain’s initial state is $\xi_{0}=x$ , then its expected FHT (EFHT) is denoted as $\mathrm{E}(\tau\mid\xi_{0}=x)=\sum^{\infty}_{t=0}t\cdot\mathrm{P}(\tau=t\mid\xi_{0}=x)$ . If $\xi_{0}$ obeys a distribution $\pi_{0}$ (denoted as $\xi_{0}\!\sim\!\pi_{0}$ ), then its EFHT is defined as $\mathrm{E}(\tau\mid\xi_{0}\!\sim\!\pi_{0})\!=\!\sum_{x\in\mathcal{X}}\pi_{0}(x)\mathrm{E}(\tau\mid\xi_{0}=x)$ . For (1+1)-EA, the initial solution is evaluated once, then in each iteration, the parent solution and the offspring solution both need to be evaluated. Note that the initial solution is generated randomly from $\{0,1\}^{n}$ , thus the ERT of (1+1)-EA is $1+2\cdot\mathrm{E}(\tau\mid\xi_{0}\sim\pi_{0})$ , where $\pi_{0}$ denotes the uniform distribution. For (1+1)-EA using sampling, the ERT becomes $m+2m\cdot\mathrm{E}(\tau\mid\xi_{0}\sim\pi_{0})$ , because it needs to perform $m$ independent evaluations for each solution.

As presented in Theorem 2.4, the additive drift theorem is used to derive upper bounds on the EFHT. To use this approach, we first need to design a function $V(\cdot)$ that measures the distance between a state and the optimal state; $V(\cdot)$ should satisfy $V(x)=0$ for any optimal $x$ and $V(x)>0$ otherwise. Next, we need to derive a lower bound $c$ on $\mathrm{E}(V(\xi_{t})-V(\xi_{t+1})|\xi_{t})$, i.e., the progress towards $\mathcal{X^{*}}$ in each generation. Finally, we can upper bound the EFHT by dividing $V(\xi_{0})$ by $c$. When the context is clear, $V(\xi_{t})$ and $V(\xi_{t+1})$ are abbreviated as $V_{t}$ and $V_{t+1}$, respectively.

Theorem 2.4 (Additive drift [39]).

Given $\{\xi_{t}\}_{t=0}^{+\infty}$ and $V(\cdot)$ , if $\exists c>0$ such that $\forall t\geq 0$ and $\forall\xi_{t}$ with $V_{t}>0$ ,

$$\mathrm{E}(V_{t}-V_{t+1}\mid\xi_{t})\geq c,$$

then $\mathrm{E}(\tau|\xi_{0})\leq V(\xi_{0})/c$ .
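To make the bound of Theorem 2.4 concrete, the sketch below checks it on a deliberately simplified chain, not the actual (1+1)-EA chain: from distance $v>0$ the chain moves to $v-1$ with probability $q(v)$ and otherwise stays put, so its drift at $v$ is exactly $q(v)$ and its expected hitting time is $\sum_{v}1/q(v)$. The choice $q(v)=v/(en)$ mirrors the one-bit improvement probability bound used in Section 3; the function name `expected_hitting_time` is illustrative.

```python
import math
from typing import Callable


def expected_hitting_time(success_prob: Callable[[int], float], v0: int) -> float:
    """Exact E[tau] for a chain that steps from v to v-1 with probability
    success_prob(v) and stays at v otherwise (a simplifying assumption).

    The waiting time at each level v is geometric with mean 1/success_prob(v).
    """
    return sum(1.0 / success_prob(v) for v in range(1, v0 + 1))


n = 20
q = lambda v: v / (math.e * n)            # assumed per-level improvement probability
c = min(q(v) for v in range(1, n + 1))    # uniform drift lower bound, attained at v = 1
exact = expected_hitting_time(q, n)
bound = n / c                             # Theorem 2.4: E[tau] <= V(xi_0) / c
print(exact <= bound)                     # True
```

The bound is loose here because additive drift uses the worst-case progress $c$ at every distance, whereas the exact sum benefits from faster progress far from the optimum.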

3 The Robustness of Median Sampling Against Onebit Noise

Onebit noise is commonly used in theoretical analyses [17, 18, 26, 27]. With probability $p$, it changes a uniformly selected bit in $x$ before $x$ is evaluated. For OM under this noise model, the ERT of (1+1)-EA is superpolynomial for $p=\omega(\log n/n)$ [17]; the ERT is polynomial for all $p\in[0,1]$ if mean sampling with $m=4n^{3}$ is used [27]. Theorem 3.4 shows that the ERT is also polynomial if median sampling with $m=2n^{3}+1$ is used, which illustrates that median sampling can efficiently tackle noise.

Definition 3.1 (Onebit Noise).

Let $f^{\mathrm{n}}(\cdot)$ and $f(\cdot)$ denote the noisy and true objective functions, respectively. Then

$$\mathrm{P}(f^{\mathrm{n}}(x)=f(x))=1-p,\quad\mathrm{P}(f^{\mathrm{n}}(x)=f(y))=p,$$

where $p\in[0,1]$ and $y$ is derived by changing a uniformly selected bit in $x$ .
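A single noisy evaluation of OM under Definition 3.1 might be sketched as follows. The function name `onebit_noisy_onemax` and the explicit `rng` argument are illustrative assumptions; the noisy value is always one of $n-|x|_{0}-1$, $n-|x|_{0}$, or $n-|x|_{0}+1$.

```python
import random
from typing import List


def onebit_noisy_onemax(x: List[int], p: float, rng: random.Random) -> int:
    """One noisy OneMax evaluation under onebit noise.

    With probability p, a uniformly chosen bit is flipped before evaluation;
    the solution x itself is left unchanged.
    """
    true_value = sum(x)
    if rng.random() < p:
        j = rng.randrange(len(x))
        # Flipping a 0 adds one to the count; flipping a 1 subtracts one.
        return true_value + (1 - 2 * x[j])
    return true_value
```

For a solution with $i$ zeros, the value $n-i+1$ is returned with probability $pi/n$ and the value $n-i-1$ with probability $p(n-i)/n$, which are the two quantities that appear in the conditions of Lemma 3.2.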

To prove Theorem 3.4, we present Lemma 3.2 to analyze $\hat{f}(x)$ under onebit noise by taking a sample size of $2n^{3}+1$ . It intuitively means $\hat{f}(x)$ is close to the 2-quantile of $f^{\mathrm{n}}(x)$ w.h.p.

Lemma 3.2.

Under onebit noise, if median sampling with $m=2n^{3}+1$ is used, then

(i) if $p\cdot\frac{n-|x|_{0}}{n}\geq\frac{1}{2}+\frac{\Omega(1)}{n}$, then $\mathrm{P}(\hat{f}(x)=n-|x|_{0}-1)\geq 1-e^{-\Omega(n)}$;

(ii) if $p\cdot\frac{|x|_{0}}{n}\geq\frac{1}{2}+\frac{\Omega(1)}{n}$, then $\mathrm{P}(\hat{f}(x)=n-|x|_{0}+1)\geq 1-e^{-\Omega(n)}$;

(iii) if $p\cdot\frac{|x|_{0}}{n}\leq\frac{1}{2}-\frac{\Omega(1)}{n}$, then $\mathrm{P}(\hat{f}(x)\leq n-|x|_{0})\geq 1-e^{-\Omega(n)}$; furthermore, if $p\cdot\frac{n-|x|_{0}}{n}\leq\frac{1}{2}-\frac{\Omega(1)}{n}$ also holds, then $\mathrm{P}(\hat{f}(x)=n-|x|_{0})\geq 1-e^{-\Omega(n)}$.

Proof 3.3.

First we consider (i). Suppose $|x|_{0}=i$ and $(n-i)p/n\geq 1/2+c/n$ for a constant $c$ . Suppose $s$ denotes the number of noisy evaluations satisfying $f^{\mathrm{n}}(x)=n-1-i$ in $m$ independent noisy evaluations. Observe that in each evaluation, $\mathrm{P}(f^{\mathrm{n}}(x)=n-1-i)=(n-i)p/n$ , thus $\mathrm{E}(s)=m(n-i)p/n\geq m(1/2+c/n)\geq m/2+cn^{2}$ . Then we get

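$$\mathrm{P}(s\leq m/2)\leq\mathrm{P}(|s-\mathrm{E}(s)|\geq cn^{2})\leq 2e^{-2c^{2}n^{4}/m}=e^{-\Omega(n)},$$

where the second inequality follows from Hoeffding’s inequality and the final equality holds because $m=2n^{3}+1$. Therefore,

$$\mathrm{P}(s>m/2)\geq 1-e^{-\Omega(n)}.$$

By the definition of median sampling, $\mathrm{P}(\hat{f}(x)=n-|x|_{0}-1)\geq 1-e^{-\Omega(n)}$, thus the claim holds. We can similarly prove (ii).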

Now we consider (iii). Under onebit noise, $f^{\mathrm{n}}(x)$ can take at most three values (i.e., $n-1-|x|_{0}$ , $n-|x|_{0}$ , $n+1-|x|_{0}$ ), thus $\hat{f}(x)$ can only take one of the three values by the definition of median sampling and $m=2n^{3}+1$ . Note that $\mathrm{P}(f^{\mathrm{n}}(x)\leq n-|x|_{0})=1-\mathrm{P}(f^{\mathrm{n}}(x)=n+1-|x|_{0})=1-|x|_{0}p/n\geq 1/2+\Omega(1)/n$ , then similar to case (i), $\mathrm{P}(\hat{f}(x)\leq n-|x|_{0})\geq 1-e^{-\Omega(n)}$ .

Then we consider the “furthermore” clause. By $\mathrm{P}(f^{\mathrm{n}}(x)\geq n-|x|_{0})=1-\mathrm{P}(f^{\mathrm{n}}(x)=n-1-|x|_{0})=1-(n-|x|_{0})p/n\geq 1/2+\Omega(1)/n$ , we also derive $\mathrm{P}(\hat{f}(x)\geq n-|x|_{0})\geq 1-e^{-\Omega(n)}$ . Thus, $\mathrm{P}(\hat{f}(x)=n-|x|_{0})=1-\mathrm{P}(\hat{f}(x)>n-|x|_{0})-\mathrm{P}(\hat{f}(x)<n-|x|_{0})\geq 1-e^{-\Omega(n)}$ , i.e., the claim holds.

Combining the above analysis, the lemma holds. ∎
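The Hoeffding step in the proof of Lemma 3.2 can be evaluated numerically. The sketch below computes the tail bound $2e^{-2c^{2}n^{4}/m}$ for $m=2n^{3}+1$; the function name `hoeffding_tail_bound` and the sample values of $n$ and $c$ are illustrative choices. Since $2n^{4}/(2n^{3}+1)$ is slightly below $n$, the bound behaves like $2e^{-c^{2}n}$.

```python
import math


def hoeffding_tail_bound(n: int, c: float) -> float:
    """Upper bound 2*exp(-2 c^2 n^4 / m) on P(|s - E(s)| >= c n^2), with m = 2n^3 + 1."""
    m = 2 * n**3 + 1
    return 2.0 * math.exp(-2.0 * c**2 * n**4 / m)


for n in (10, 20, 40):
    # The bound shrinks roughly like 2 * exp(-n) when c = 1.
    print(n, hoeffding_tail_bound(n, 1.0))
```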

Theorem 3.4.

For OM under onebit noise, the ERT of (1+1)-EA employing median sampling with $m=2n^{3}+1$ is polynomial.

Proof 3.5.

The main idea is to apply Theorem 2.4. We consider three cases for $p$; in each case, we design a distance function $V(x)$ and examine $\mathrm{E}(V_{t}-V_{t+1}\mid\xi_{t}=x)$ for $x\neq 1^{n}$. Suppose $|x|_{0}=i$, $1\leq i\leq n$. For ease of notation, let $\mathrm{P}_{\rm mut}(x,z)=\mathrm{P}(z\text{ is mutated from }x)$ and $\mathrm{P}_{\rm acc}(x,z)=\mathrm{P}(\hat{f}(z)\geq\hat{f}(x))$. For ease of analysis, the drift is divided into $\mathrm{E}_{1}$ and $\mathrm{E}_{2}$. That is,

$$\mathrm{E}(V_{t}-V_{t+1}\mid\xi_{t}=x)=\mathrm{E}_{1}-\mathrm{E}_{2},$$

where

$$\mathrm{E}_{1}=\sum_{z:|z|_{0}<i}\mathrm{P}_{\rm mut}(x,z)\cdot\mathrm{P}_{\rm acc}(x,z)\cdot(V(x)-V(z)),$$

$$\mathrm{E}_{2}=\sum_{z:|z|_{0}>i}\mathrm{P}_{\rm mut}(x,z)\cdot\mathrm{P}_{\rm acc}(x,z)\cdot(V(z)-V(x)).$$

(1) $p\leq n/(2(n+1))$. $V(x)$ is designed to be $|x|_{0}$, namely the number of 0s in $x$.

For $\mathrm{E}_{1}$, we consider mutating only one zero bit in $x$ (namely $|z|_{0}=i-1$), which happens with probability $i/n\cdot(1-1/n)^{n-1}\geq i/(en)$. Then $z$ will replace $x$ if $\hat{f}(z)=n+1-i$ and $\hat{f}(x)=n-i$. The conditions of Lemma 3.2-(iii) hold because $p\leq n/(2(n+1))=1/2-1/(2(n+1))$, so

$$\mathrm{P}_{\rm acc}(x,z)\geq\mathrm{P}(\hat{f}(z)=n+1-i,\hat{f}(x)=n-i)\geq 1-e^{-\Omega(n)}.\tag{6}$$

Thus,

$$\mathrm{E}_{1}\geq\frac{i}{en}\cdot(1-e^{-\Omega(n)})\geq\frac{1}{3n},$$

where the last inequality holds for large enough $n$.

For $\mathrm{E}_{2}$, we consider the increase of 0s. For $z$ satisfying $|z|_{0}>i$, accepting it implies $\hat{f}(z)\neq n-|z|_{0}$ or $\hat{f}(x)\neq n-i$. Since the conditions of (iii) in Lemma 3.2 are satisfied, we have

$$\mathrm{P}_{\rm acc}(x,z)\leq e^{-\Omega(n)}.\tag{8}$$

Thus,

$$\mathrm{E}_{2}\leq(n-i)\cdot e^{-\Omega(n)}\leq e^{\log n}\cdot e^{-\Omega(n)}=e^{-\Omega(n)},$$

where the last equality holds for large enough $n$.

Subtracting $\mathrm{E}_{2}$ from $\mathrm{E}_{1}$, we get

$$\mathrm{E}(V_{t}-V_{t+1}\mid\xi_{t}=x)\geq\Omega\Big(\frac{1}{n}\Big),\tag{10}$$

where the bound holds for large enough $n$. Therefore, by Theorem 2.4, $\mathrm{E}(\tau|\xi_{0})\leq n/\Omega(1/n)=O(n^{2})$, because $V(x)\leq n$. Since each iteration needs $2m=4n^{3}+2$ fitness evaluations, the ERT is polynomial.
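The text only states that the ERT in case (1) is polynomial. As a worked step, an explicit order can be read off by combining the ERT formula from Section 2.4 with the $O(n^{2})$ bound on the EFHT. The $O(n^{5})$ figure below is our own arithmetic under those two facts, not a bound quoted from the paper.

```latex
\mathrm{ERT}
  = m + 2m\cdot\mathrm{E}(\tau\mid\xi_{0}\sim\pi_{0})
  \leq (2n^{3}+1)\bigl(1+2\cdot O(n^{2})\bigr)
  = O(n^{5}).
```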

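(2) $n/(2(n+1))<p<n/(n+7)$. The proof is similar to case (1), but $V(x)$ is more complicated because the effect of the noise on a solution $x$ may vary as $|x|_{0}$ changes. The distance function is as follows:

$$V(x)=\begin{cases}i&\text{if }i>\frac{n}{2p}+3\text{ or }n-\frac{n}{2p}+3<i<\frac{n}{2p}-3\text{ or }i<\max\{1,n-\frac{n}{2p}-3\},\\ \frac{n}{2p}&\text{if }\frac{n}{2p}-3\leq i\leq\frac{n}{2p}+3,\\ n-\frac{n}{2p}+2&\text{if }\max\{1,n-\frac{n}{2p}-3\}\leq i\leq n-\frac{n}{2p}+3.\end{cases}$$

We consider five cases for $i$.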

(2a) $i>n/(2p)+3$ .

For $\mathrm{E}_{1}$, we again consider mutating only one zero bit in $x$, so $\mathrm{P}_{\rm mut}(x,z)\geq i/(en)$. Note that $|z|_{0}>n/(2p)+2$, thus $pi/n>p|z|_{0}/n\geq 1/2+2p/n$. By (ii) in Lemma 3.2,

$$\mathrm{P}_{\rm acc}(x,z)\geq\mathrm{P}(\hat{f}(z)=n+2-i,\hat{f}(x)=n+1-i)\geq 1-e^{-\Omega(n)}.$$

If $|z|_{0}>n/(2p)+3$, then $V(x)-V(z)=1$; else $V(x)-V(z)=i-n/(2p)>3$. Thus, $V(x)-V(z)\geq 1$ and $\mathrm{E}_{1}=\Omega(1/n)$.

Now we consider $\mathrm{E}_{2}$. For $z$ satisfying $|z|_{0}>i$, $p|z|_{0}/n>pi/n>1/2+3p/n\geq 1/2+3/(2(n+1))$. Thus, by (ii) in Lemma 3.2,

$$\mathrm{P}(\hat{f}(z)=n+1-|z|_{0},\hat{f}(x)=n+1-i)\geq 1-e^{-\Omega(n)},\tag{12}$$

implying $\mathrm{P}_{\rm acc}(x,z)\leq e^{-\Omega(n)}$. Accordingly, $\mathrm{E}_{2}\leq e^{-\Omega(n)}$.

(2b) $n/(2p)-3\leq i\leq n/(2p)+3$ . Note that $i\geq n/(2p)-3\geq(n+7)/2-3=\Omega(n)$ .

First we consider the positive drift $\mathrm{E}_{1}$. By $n/(2p)-3-(n-n/(2p)+3)=n/p-n-6>1$, there always exists some $z$ such that $n-n/(2p)+3<|z|_{0}=\lceil n/(2p)-3\rceil-1$ and such $z$ can be mutated from $x$ by flipping at most seven 0s. Thus,

$$\mathrm{P}_{\rm mut}(x,z)\geq\binom{i}{7}\Big(1-\frac{1}{n}\Big)^{n-7}\Big(\frac{1}{n}\Big)^{7}=\Omega(1).$$

Note that $p(n-|z|_{0})/n<1/2-3p/n$ and $p|z|_{0}/n<1/2-3p/n$, thus $\mathrm{P}(\hat{f}(z)=n-|z|_{0})\geq 1-e^{-\Omega(n)}$ by (iii) in Lemma 3.2. Therefore, $z$ will replace $x$ with probability $1-e^{-\Omega(n)}$. Moreover,

$$V(x)-V(z)\geq n/(2p)-|z|_{0}>n/(2p)-(n/(2p)-3)=3,$$

so $\mathrm{E}_{1}=\Omega(1)$.

For $\mathrm{E}_{2}$, we consider $z$ with $|z|_{0}>i$. If $i>n/(2p)+1$, Eq. (12) holds and we have $\mathrm{P}_{\rm acc}(x,z)\leq e^{-\Omega(n)}$, so we get

$$\mathrm{E}_{2}\leq(n-i)\cdot e^{-\Omega(n)}\leq e^{\log n}\cdot e^{-\Omega(n)}=e^{-\Omega(n)},$$

where the last equality holds for large enough $n$.

If $i\leq n/(2p)+1$, then any $z$ satisfying $|z|_{0}\geq i+3$ will never be accepted under onebit noise. For $z$ satisfying $|z|_{0}\leq i+2$, we have $|z|_{0}\leq n/(2p)+3$, thus $V(z)=n/(2p)=V(x)$. Then $\mathrm{E}_{2}=0$. Combining the two cases, we get $\mathrm{E}_{2}\leq e^{-\Omega(n)}$.

(2c) $n-n/(2p)+3<i<n/(2p)-3$ . First we examine $\hat{f}(x)$ . Note that $p(n-i)/n<1/2-3p/n$ and $pi/n<1/2-3p/n$ , thus $\mathrm{P}(\hat{f}(x)=n-i)\geq 1-e^{-\Omega(n)}$ by (iii) in Lemma 3.2.

For $\mathrm{E}_{1}$ , we consider mutating only one zero bit in $x$ , namely $|z|_{0}=i-1>n-n/(2p)+2$ . Note that $p(n-|z|_{0})/n<1/2-2p/n$ and $p|z|_{0}/n<1/2-3p/n$ , we can derive Eq. (6) and $\mathrm{E}_{1}=\Omega(1/n)$ .

For $\mathrm{E}_{2}$ , we consider two cases for $z$ satisfying $|z|_{0}>i$ . If $|z|_{0}\geq i+2$ , accepting $z$ implies $\hat{f}(x)=n-i-1$ . Thus, $\mathrm{P}_{\rm acc}(x,z)\leq e^{-\Omega(n)}$ . If $|z|_{0}=i+1$ , then $p(n-|z|_{0})/n<1/2-4p/n$ and $p|z|_{0}/n<1/2-2p/n$ , thus $\mathrm{P}(\hat{f}(z)=n-|z|_{0})\geq 1-e^{-\Omega(n)}$ . Then Eq. (8) still holds, i.e., $\mathrm{P}_{\rm acc}(x,z)\leq e^{-\Omega(n)}$ . Combining the two cases, $\mathrm{E}_{2}\leq e^{-\Omega(n)}$ .

(2d) $\max\{1,n-n/(2p)-3\}\leq i\leq n-n/(2p)+3$ .

First we consider the positive drift $\mathrm{E}_{1}$ . Note that*

[TABLE]

$x$ * can generate an offspring $z$ with $|z|_{0}<\max\{1,n-n/(2p)-3\}$ by flipping at most seven bits, whose probability is at least $\Omega(1/n^{7})$ . Then we examine $\mathrm{P}_{\rm acc}(x,z)$ . Because*

[TABLE]

we derive $\mathrm{P}(\hat{f}(x)\leq n-i)\geq 1-e^{-\Omega(n)}$ by (iii) in Lemma 3.2. Since $\hat{f}(z)\geq n-i$ always holds under onebit noise, we get $\mathrm{P}_{\rm acc}(x,z)\geq 1-e^{-\Omega(n)}$. If $1\leq n-n/(2p)-3$, we get

[TABLE]

else $|z|_{0}=0$ and

[TABLE]

Thus, $\mathrm{E}_{1}=\Omega(1/n^{7})$.

For $\mathrm{E}_{2}$, it is only necessary to take $z$ satisfying $|z|_{0}\leq i+2$ into account because $z$ with $|z|_{0}\geq i+3$ will be rejected. If $i\geq n-n/(2p)+1$, we have $n-n/(2p)+1\leq i<|z|_{0}\leq n-n/(2p)+5$. Thus,

[TABLE]

and

[TABLE]

which implies that Eq. (8) holds by (iii) in Lemma 3.2. If $i<n-n/(2p)+1$, we have $|z|_{0}<n-n/(2p)+3$. By $V(x)=V(z)=n-n/(2p)+2$, we have $\mathrm{E}_{2}=0$. Combining the two cases, $\mathrm{E}_{2}\leq e^{-\Omega(n)}$.

(2e) $i<\max\{1,n-n/(2p)-3\}$ . If $1\geq n-n/(2p)-3$ , then $i=0$ , thus we only need to consider that $1<n-n/(2p)-3$ , namely $i<n-n/(2p)-3$ .

For $\mathrm{E}_{1}$ , we consider mutating only one zero bit in $x$ , i.e., $|z|_{0}=i-1$ . Similar to the above analysis, $\mathrm{P}_{\rm mut}(x,z)\geq 1/(en)$ . Note that $p(n-i)/n>1/2+3p/n$ , we derive $\mathrm{P}(\hat{f}(x)=n-i-1)\geq 1-e^{-\Omega(n)}$ by (i) in Lemma 3.2. Thus, $\mathrm{P}_{\rm acc}(x,z)\geq 1-e^{-\Omega(n)}$ . Note that $V(x)-V(z)=i-|z|_{0}=1$ , thus $\mathrm{E}_{1}=\Omega(1/n)$ .

For $\mathrm{E}_{2}$ , it is only necessary to take $z$ with $|z|_{0}\leq i+2<n-n/(2p)-1$ into account. Note that $p(n-|z|_{0})/n>1/2+p/n$ , thus by Lemma 3.2, $\mathrm{P}(\hat{f}(z)=n-|z|_{0}-1)=1-e^{-\Omega(n)}$ , implying that $\mathrm{P}_{\rm acc}(x,z)\leq e^{-\Omega(n)}$ . Then we have $\mathrm{E}_{2}\leq e^{-\Omega(n)}$ .

Combining the five cases, we have $\mathrm{E}_{1}\geq\Omega(1/n^{7})$ and $\mathrm{E}_{2}\leq e^{-\Omega(n)}$. By subtracting $\mathrm{E}_{2}$ from $\mathrm{E}_{1}$, Eq. (10) becomes

[TABLE]

and we can also derive a polynomial ERT.

(3) $p\geq n/(n+7)$ . The effect of the noise changes when the level of the noise changes. Accordingly, we need to design a new distance function:

[TABLE]

Next we consider three cases for $i$.

(3a) $i>n/(2p)+3$ . The proof procedure is the same as case (2a), except that “ $V(x)-V(z)=i-n/(2p)>3$ ” changes to $V(x)-V(z)=i-n/2>n/(2p)+3-n/2\geq 3$ . We derive $\mathrm{E}_{1}=\Omega(1/n)$ and $\mathrm{E}_{2}\leq e^{-\Omega(n)}$ .

(3b) $n-n/(2p)-3\leq i\leq n/(2p)+3$ . Note that $i\geq n-n/(2p)-3=\Omega(n)$ .

First we consider the positive drift $\mathrm{E}_{1}$. There exists some $z$ with $|z|_{0}=\lceil n-n/(2p)-3\rceil-1$, and such a $z$ can be generated from $x$ by flipping at most $n/(2p)+3-(n-n/(2p)-3)+1=n/p-n+6+1\leq 14$ 0s. Thus,

[TABLE]

If $i\leq n-n/(2p)-1$ , then $p(n-i)/n\geq 1/2+p/n$ . Thus, $\mathrm{P}_{\rm acc}(x,z)\geq\mathrm{P}(\hat{f}(x)=n-i-1)\geq 1-e^{-\Omega(n)}$ by (i) in Lemma 3.2. If $i>n-n/(2p)-1$ , it can be verified that $z$ will always be accepted under onebit noise. Note that

[TABLE]

Thus, we have $\mathrm{E}_{1}=\Omega(1)$.

For $\mathrm{E}_{2}$ , the proof procedure is the same as that of case (2b), except that “ $V(z)=n/(2p)=V(x)$ ” changes to $V(z)=n/2=V(x)$ . Thus, we get $\mathrm{E}_{2}\leq e^{-\Omega(n)}$ .

(3c) $i<n-n/(2p)-3$ . The analysis for $\mathrm{E}_{1}$ and $\mathrm{E}_{2}$ is the same as that of case (2e), then we have $\mathrm{E}_{1}=\Omega(1/n)$ and $\mathrm{E}_{2}\leq e^{-\Omega(n)}$ .

Combining the three cases, we have $\mathrm{E}_{1}=\Omega(1/n)$ and $\mathrm{E}_{2}\leq e^{-\Omega(n)}$. Subtracting $\mathrm{E}_{2}$ from $\mathrm{E}_{1}$, we get

[TABLE]

and we can also derive a polynomial ERT. ∎

By the above proof, we can give an intuitive explanation for the effectiveness of median sampling. For $x$ and $z$ which satisfy $f(x)\!>\!f(z)$ , when the 2-quantile of $f^{\mathrm{n}}(x)$ is larger than that of $f^{\mathrm{n}}(z)$ , $x$ will be estimated better than $z$ by median sampling w.h.p., implying a correct comparison.
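To make this intuition concrete, the following Python sketch computes the exact distribution of a single noisy evaluation of OM under onebit noise, together with its 2-quantile. It assumes the standard onebit noise model (with probability $p$, one uniformly chosen bit is flipped before evaluation) and takes the 2-quantile to be the smallest value whose cumulative probability reaches $1/2$. The function names are illustrative and do not come from the paper.

```python
from fractions import Fraction


def onebit_noise_distribution(n, i, p):
    """Map each possible noisy OneMax value to its probability.

    n: string length, i: number of 0s in the solution, p: noise probability.
    Assumes onebit noise: with probability p one uniformly chosen bit is flipped.
    """
    true_fitness = n - i
    dist = {true_fitness: 1 - p}
    if i > 0:
        # Flipping one of the i zeros raises the value by 1.
        dist[true_fitness + 1] = p * Fraction(i, n)
    if i < n:
        # Flipping one of the n - i ones lowers the value by 1.
        dist[true_fitness - 1] = p * Fraction(n - i, n)
    return dist


def two_quantile(dist):
    """Smallest value whose cumulative probability is at least 1/2."""
    cumulative = 0
    for value in sorted(dist):
        cumulative += dist[value]
        if cumulative >= Fraction(1, 2):
            return value
    raise ValueError("probabilities do not sum to 1")


n = 100
p = Fraction(3, 10)
quantiles = [two_quantile(onebit_noise_distribution(n, i, p)) for i in range(n + 1)]
```

For $p=3/10$ the 2-quantile equals the true fitness $n-i$ for every $i$, so it increases strictly with the true fitness. For $p=1$, $n=100$ and $i=10$ it drops to $n-i-1$, which is consistent with the condition $p(n-i)/n>1/2$ used in case (2e).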

4 Cases Where Median Sampling is Better than Mean Sampling

For OM under segmented noise (Definition 4.1), we show that (1+1)-EA equipped with median sampling can do better than (1+1)-EA using mean sampling. The segmented noise is from [25], but we make a little modification to simplify the analysis. As presented in Definition 4.1, the noisy evaluation of a solution $x$ can be divided into three segments. The objective evaluation is accurate in the first segment, but inaccurate in other segments because of noise. We show that for OM under segmented noise, the ERT of (1+1)-EA using mean sampling is exponential (i.e., Theorem 4.2); and the ERT of (1+1)-EA employing median sampling with $m=2n^{3}+1$ is polynomial (i.e., Theorem 4.6). The analyses show that median sampling can be better if the 2-quantile increases with the true fitness.

Definition 4.1.

$\forall x\in\{0,1\}^{n}$, its noisy objective $f^{\mathrm{n}}(\cdot)$ is defined as follows:

(1) if $|x|_{0}>\frac{n}{50}$, $f^{\mathrm{n}}(x)=n-|x|_{0}$;

(2) if $\frac{n}{100}<|x|_{0}\leq\frac{n}{50}$,

[TABLE]

(3) if $|x|_{0}\leq\frac{n}{100}$ ,

[TABLE]

where $n/100\in\mathbb{N}^{+}$ .

Theorem 4.2 shows that mean sampling fails under segmented noise, and the reason is similar to that found in [25]. Consider $x$ and $z$ satisfying $|z|_{0}=|x|_{0}+1$. In segment (2), a small sample cannot eliminate the impact of noise, and $\mathrm{P}(\bar{f}(x)\leq\bar{f}(z))$ is still very large. In segment (3), the expected gap between $f^{\mathrm{n}}(z)$ and $f^{\mathrm{n}}(x)$ is positive. Therefore, a larger sample size will enlarge $\mathrm{P}(\bar{f}(x)\leq\bar{f}(z))$ and perform worse; moreover, no medium sample size makes a good tradeoff. Hence mean sampling fails. A rigorous proof can be derived directly from Theorem 5.2 in [25], because the change of noise doesn't affect the proof.

Theorem 4.2.

For OM under segmented noise, the ERT of (1+1)-EA employing mean sampling is exponential.

To prove Theorem 4.6, Lemma 4.3 is used. This lemma can upper bound the runtime, when the true better solution has a large probability to be recognized as better. Note that $x^{j}$ denotes some solution with $j$ 0s, and $F(\cdot)$ denotes the estimated fitness of a solution.

Lemma 4.3 ([18]).

The EFHT of (1+1)-EA solving noisy OM is polynomial if

[TABLE]

We also present Lemma 4.4 to analyze $\hat{f}(x)$ under segmented noise by taking a sample size of $2n^{3}+1$ .

Lemma 4.4.

Under segmented noise, if median sampling with $m=2n^{3}+1$ is used, then $\mathrm{P}(\hat{f}(x)=n-|x|_{0})=1-e^{-\Omega(n)}$ if $\frac{n}{100}<|x|_{0}\leq\frac{n}{50}$ ; $\mathrm{P}(\hat{f}(x)=4n(n-|x|_{0}))=1-e^{-\Omega(n)}$ if $|x|_{0}\leq\frac{n}{100}$ .

Proof 4.5.

The main procedure is analogous to Lemma 3.2. If $\frac{n}{100}<|x|_{0}\leq\frac{n}{50}$, suppose there are $s$ noisy evaluations where $f^{\mathrm{n}}(x)=n-|x|_{0}$ in $m$ independent noisy evaluations. Then Eq. (2) also holds and $\mathrm{P}(\hat{f}(x)=n-|x|_{0})=1-e^{-\Omega(n)}$. If $|x|_{0}\leq\frac{n}{100}$, we similarly have $\mathrm{P}(\hat{f}(x)=4n(n-|x|_{0}))=1-e^{-\Omega(n)}$. Thus, the lemma holds. ∎

Theorem 4.6.

For OM under segmented noise, the ERT of (1+1)-EA employing median sampling with $m=2n^{3}+1$ is polynomial.

Proof 4.7.

The main idea is applying Lemma 4.3. Given $0<i\leq j$, let $g=\hat{f}(x^{j})-\hat{f}(x^{i-1})$. To analyze $\mathrm{P}(g\geq 0)$, we consider three cases for $i$.

(1) $i>\frac{n}{50}$. Note that $f^{\mathrm{n}}(x^{j})=f(x^{j})$, and $f^{\mathrm{n}}(x^{i-1})$ is larger than $f^{\mathrm{n}}(x^{j})$ in both sub-cases $i-1>\frac{n}{50}$ and $i-1\leq\frac{n}{50}$. Therefore, we get $\mathrm{P}(g\geq 0)=0$.

(2) $\frac{n}{100}+1<i\leq\frac{n}{50}$. If $j>\frac{n}{50}$, we have $\mathrm{P}(g\geq 0)=0$, because $f^{\mathrm{n}}(x^{j})=n-j$ and $f^{\mathrm{n}}(x^{i-1})\geq n-i+1>n-j$. If $j\leq\frac{n}{50}$, by Lemma 4.4, we have

[TABLE]

(3) $i\leq\frac{n}{100}+1$ . The analysis is analogous to case (2). If $j>\frac{n}{100}$ , then $\mathrm{P}(g\geq 0)=0$ . If $j\leq\frac{n}{100}$ , then

[TABLE]

Combining the three cases, we have shown $\forall 0<i\leq j:\mathrm{P}(g\geq 0)\leq\log n/(15n)$ for sufficiently large $n$. Then, by Lemma 4.3, the EFHT is polynomial. In each iteration, the algorithm needs $2m=4n^{3}+2$ evaluations, thus the ERT is polynomial. ∎

5 Cases Where Mean Sampling is Better than Median Sampling

For OM under partial noise (Definition 5.1), we show that (1+1)-EA using median sampling is sometimes worse than (1+1)-EA using mean sampling. For partial noise presented in Definition 5.1, a false objective value is returned when $|x|_{0}<n/2$ . We prove that for OM under partial noise, the ERT of (1+1)-EA employing mean sampling with $m=n^{3}$ is polynomial (i.e., Theorem 5.2); and the ERT of (1+1)-EA employing median sampling is exponential (i.e., Theorem 5.5). The analyses suggest that median sampling may fail if the 2-quantile of the noisy fitness doesn’t increase with the true objective value, and it is better to choose other strategies.

Definition 5.1.

$\forall x\in\{0,1\}^{n}$, its noisy objective $f^{\mathrm{n}}(\cdot)$ is defined as follows:

(1) if $|x|_{0}\geq\frac{n}{2}$, $f^{\mathrm{n}}(x)=n-|x|_{0}$;

(2) if $|x|_{0}<\frac{n}{2}$,

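$$f^{\mathrm{n}}(x)=\begin{cases}|x|_{0}/2 & \text{with probability } 2/3,\\ 2(n-|x|_{0}) & \text{with probability } 1/3.\end{cases}$$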

Theorem 5.2.

For OM under partial noise, the ERT of (1+1)-EA employing mean sampling with $m=n^{3}$ is polynomial.

Proof 5.3.

The main idea is applying Lemma 4.3. Given $0<i\leq j$, let $g=\bar{f}(x^{i-1})-\bar{f}(x^{j})$. To analyze $\mathrm{P}(g\leq 0)$, we classify $i$ into two cases.

(1) $i\geq\frac{n}{2}+1$ . We have $\bar{f}(x^{i-1})=n-i+1$ and $\bar{f}(x^{j})=n-j$ , thus $\mathrm{P}(g\leq 0)=0$ .

(2) $i<\frac{n}{2}+1$. First we need to derive $\mu:=\mathrm{E}(g)$. Note that $\mathrm{E}(\bar{f}(x^{i-1}))=\mathrm{E}(f^{\mathrm{n}}(x^{i-1}))=(i-1)/2\cdot 2/3+2(n-i+1)\cdot 1/3=(2n-i+1)/3$. We classify $j$ into two cases. (a) If $j\geq\frac{n}{2}$, then $\mathrm{E}(\bar{f}(x^{j}))=n-j$, thus

[TABLE]

(b) If $j<\frac{n}{2}$ , then $\mathrm{E}(\bar{f}(x^{j}))=(2n-j)/3$ and $\mathrm{E}(g)=(j-i+1)/3\geq 1/3$ . Thus, we have $\mu\geq 1/3$ . Then we have

[TABLE]

where the second inequality holds by $|f^{\mathrm{n}}(x^{i-1})-f^{\mathrm{n}}(x^{j})|\leq 2n$ and Hoeffding's inequality.
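One way to spell out the Hoeffding step is sketched here, under the assumption that $g$ is the average of $m$ independent differences $f_{k}^{\mathrm{n}}(x^{i-1})-f_{k}^{\mathrm{n}}(x^{j})$, each lying in $[-2n,2n]$. The constants are illustrative and may differ from those in the authors' derivation.

$$\mathrm{P}(g\leq 0)=\mathrm{P}(g-\mu\leq-\mu)\leq\exp\!\left(-\frac{2m\mu^{2}}{(4n)^{2}}\right)=\exp\!\left(-\frac{m\mu^{2}}{8n^{2}}\right)\leq\exp\!\left(-\frac{n^{3}}{72n^{2}}\right)=e^{-n/72},$$

where the last inequality uses $m=n^{3}$ and $\mu\geq 1/3$. The resulting bound is exponentially small in $n$, hence below any inverse-polynomial threshold for sufficiently large $n$.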

As in the discussion at the end of the proof of Theorem 4.6, the ERT is polynomial. ∎

From the proof, we can derive an intuitive explanation for the effectiveness of mean sampling. For $x$ and $z$ satisfying $f(x)>f(z)$ (i.e., $z$ is worse), the expectation of $f^{\mathrm{n}}(x)$ is larger than that of $f^{\mathrm{n}}(z)$. Then, the probability of accepting $z$ is sufficiently small if mean sampling is used. Thus, the search direction of (1+1)-EA will not be misled and the optimal solution can be quickly found.

To prove Theorem 5.5, we use Lemma 5.4 [18], which intuitively means that if a true worse solution (i.e., a solution with more 0s) is estimated better than a true better solution with some probability, then we can derive the lower bound for the runtime.

Lemma 5.4 ([18]).

If there exists a real number $l\leq n/4$ satisfying

[TABLE]

then w.h.p., the FHT of (1+1)-EA solving noisy OM is $2^{\Omega(l)}$ .

Theorem 5.5.

For OM under partial noise, the ERT of (1+1)-EA employing median sampling is exponential.

Proof 5.6.

We use Lemma 5.4 for the proof. Given $0<i<\frac{n}{2}$, let $g=\hat{f}(x^{i})-\hat{f}(x^{i-1})$. First we show that $\mathrm{P}(\hat{f}(x^{i-1})=(i-1)/2)\geq 1/3$ for $i<\frac{n}{2}$. Let $s$ denote the number of noisy evaluations satisfying $f^{\mathrm{n}}(x^{i-1})=(i-1)/2$ among $m$ independent noisy evaluations. We consider two cases for $m$.

(1) $m$ is even. Let

[TABLE]

Note that the sum of the three terms is 1 and $A\geq B$,

[TABLE]

Thus, $\mathrm{P}(s\geq m/2+1)=A\geq 1/3$. By the definition of median sampling, we derive $\mathrm{P}(\hat{f}(x^{i-1})=(i-1)/2)\geq 1/3$.

(2) $m$ is odd. We have

[TABLE]

Thus, $\mathrm{P}(s\geq(m+1)/2)\geq 1/2$. By the definition of median sampling, we can derive that $\mathrm{P}(\hat{f}(x^{i-1})=(i-1)/2)\geq 1/2$.

To make $g\geq 0$, it is sufficient that $\hat{f}(x^{i-1})=(i-1)/2$, since $\hat{f}(x^{i})\geq i/2$ always holds. Thus, $\mathrm{P}(g\geq 0)\geq 1/3$ for $i<\frac{n}{2}$. Then, the condition of Lemma 5.4 holds by setting $l=n/48$. Thus, the EFHT is $2^{\Omega(l)}=2^{\Omega(n)}$, i.e., exponential. ∎

From the analysis, we can give an intuitive explanation for the failure of median sampling. Consider $x$ and $z$ satisfying $|x|_{0}=|z|_{0}-1$ (that is, $z$ is worse). The 2-quantile of $f^{\mathrm{n}}(z)$ is larger than that of $f^{\mathrm{n}}(x)$, so $z$ will be estimated better than $x$ by median sampling with at least constant probability, implying a wrong comparison.
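The contrast between the two estimators under partial noise can be checked numerically. The Python sketch below assumes, as the expectation computed in Proof 5.3 suggests, that for $|x|_{0}<n/2$ a noisy evaluation returns $|x|_{0}/2$ with probability $2/3$ and $2(n-|x|_{0})$ with probability $1/3$. All function names are hypothetical.

```python
from fractions import Fraction
from math import comb


def partial_noise_distribution(n, i):
    """Map each noisy value of a solution with i zeros to its probability.

    Assumed form for i < n/2: value i/2 w.p. 2/3, value 2(n - i) w.p. 1/3.
    """
    if 2 * i >= n:
        return {Fraction(n - i): Fraction(1)}
    return {Fraction(i, 2): Fraction(2, 3), Fraction(2 * (n - i)): Fraction(1, 3)}


def mean(dist):
    """Expectation of one noisy evaluation."""
    return sum(value * prob for value, prob in dist.items())


def two_quantile(dist):
    """Smallest value whose cumulative probability is at least 1/2."""
    cumulative = Fraction(0)
    for value in sorted(dist):
        cumulative += dist[value]
        if cumulative >= Fraction(1, 2):
            return value
    raise ValueError("probabilities do not sum to 1")


def prob_median_is_low(m):
    """P(median of m samples equals the low value), for odd m only."""
    if m % 2 == 0:
        raise ValueError("this sketch handles odd m only")
    q = Fraction(2, 3)
    return sum(comb(m, s) * q**s * (1 - q) ** (m - s)
               for s in range((m + 1) // 2, m + 1))


n = 20
means = [mean(partial_noise_distribution(n, i)) for i in range(n // 2)]
medians = [two_quantile(partial_noise_distribution(n, i)) for i in range(n // 2)]
```

In this sketch the mean $(2n-i)/3$ decreases as the number of 0s grows, so it ranks solutions in the right order, whereas the 2-quantile $i/2$ increases with the number of 0s. For odd $m$, `prob_median_is_low(m)` stays above $1/2$, in line with case (2) of the proof.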

6 Application Illustration

In this section, we provide some guidance for employing median sampling in practice. The theoretical results have revealed that if the 2-quantile of the noisy fitness increases with the true fitness, we can use median sampling to tackle noise. Inspired by this finding, we may use the following three steps to check the effectiveness of median sampling in practice.

1. Find a sequence of solutions with increasing true objective values. Note that the solution space can be very large, and we only need to find some representative solutions. The true fitness of a solution can be obtained by conducting evaluation accurately, instead of using an approximation. For example, a prediction model in machine learning can be evaluated using a large amount of data, and a structure in aerodynamic design can be evaluated by CFD simulation. Note that the number of representative solutions is very limited, and the evaluation process can be easily parallelized, thus the computational cost is usually acceptable.

2. Find an appropriate sample size $m$, such that the 2-quantile of the noisy fitness increases with the sequence. If such a sample size doesn't exist or the sample size is too large, it would be better to choose other strategies.

3. If such a sample size is found, evaluate each solution $m$ times independently and output the median of the $m$ objective values as the estimated fitness during the optimization procedure.
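The second and third steps can be sketched in Python as follows. This is a minimal illustration rather than the authors' procedure: `noisy_eval`, `median_estimate`, `choose_sample_size`, and the single-pass monotonicity check are assumptions made here, and the median is taken to be the lower median of the $m$ values.

```python
import random
from statistics import median_low


def median_estimate(noisy_eval, x, m, rng):
    """Step 3: estimate the fitness of x by the median of m noisy evaluations."""
    return median_low([noisy_eval(x, rng) for _ in range(m)])


def choose_sample_size(noisy_eval, solutions, candidates, rng):
    """Step 2: return the first m whose median estimates strictly increase.

    `solutions` must already be ordered by increasing true fitness (step 1).
    Returns None if no candidate sample size passes the check.
    """
    for m in candidates:
        estimates = [median_estimate(noisy_eval, x, m, rng) for x in solutions]
        if all(a < b for a, b in zip(estimates, estimates[1:])):
            return m
    return None


# Hypothetical usage with a noise-free OneMax evaluator.
rng = random.Random(0)
sequence = [[0, 0, 0], [1, 0, 0], [1, 1, 0], [1, 1, 1]]
chosen = choose_sample_size(lambda x, r: sum(x), sequence, [5, 10, 15], rng)
```

Because a single pass over the sequence is itself random, repeating the check several times for each candidate $m$ would give a more reliable choice. A return value of `None` corresponds to the situation in which another noise-handling strategy should be preferred.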

As an application illustration, we use (1+1)-EA to solve OM under onebit noise. It has been known that the ERT of (1+1)-EA solving OM under onebit noise is super-polynomial if the noise probability $p=\omega(\log n/n)$ , thus we set $p=\log^{2}n/n$ . We set the problem size $n=100$ and use $0^{n},10^{n-1},\ldots,1^{n}$ as the sequence of solutions with increasing true fitness. We select the sample size $m$ from $5,10,15,\ldots$ , such that the 2-quantile of the noisy fitness increases with the sequence. Figure 2 shows that it holds when $m=15$ . Thus, using a sample size $m=15$ is probably enough to reduce the negative effect of noise.

To show the effectiveness of median sampling, we next compare the ERT of (1+1)-EA with and without median sampling for the problem size $n\in\{5,10,\ldots,100\}$. For each $n$, we run (1+1)-EA 100 times independently. In each run, we record the number of fitness evaluations until an optimal solution with respect to the true fitness function is found for the first time. The numbers of evaluations of the 100 runs are averaged as the estimate of the ERT. The results are shown in Figure 2. It can be observed that although median sampling needs to evaluate a solution $m$ times to estimate its fitness, the total number of evaluations required by (1+1)-EA to find the optimum decreases drastically.
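A small simulation in the spirit of this experiment can be sketched as below. It assumes the standard onebit noise model, bit-wise mutation with rate $1/n$, re-evaluation of both parent and offspring in every iteration ($2m$ evaluations, as counted in the proof of Theorem 4.6), and acceptance when the offspring's median estimate is at least the parent's. The function names and the iteration cap are illustrative, and the run stops once the current solution is the all-ones string.

```python
import random
from statistics import median_low


def onebit_noisy_onemax(x, p, rng):
    """OneMax value of x after flipping one uniformly chosen bit w.p. p."""
    value = sum(x)
    if rng.random() < p:
        k = rng.randrange(len(x))
        # Flipping a 0 adds 1 to the value, flipping a 1 subtracts 1.
        value += 1 - 2 * x[k]
    return value


def one_plus_one_ea(n, p, m, rng, max_iters=10**6):
    """(1+1)-EA with median sampling; returns (solution, noisy evaluations)."""
    x = [rng.randint(0, 1) for _ in range(n)]
    evaluations = 0
    for _ in range(max_iters):
        if sum(x) == n:  # optimum with respect to the true fitness
            break
        # Bit-wise mutation: flip each bit independently with probability 1/n.
        z = [1 - b if rng.random() < 1.0 / n else b for b in x]
        fx = median_low([onebit_noisy_onemax(x, p, rng) for _ in range(m)])
        fz = median_low([onebit_noisy_onemax(z, p, rng) for _ in range(m)])
        evaluations += 2 * m
        if fz >= fx:
            x = z
    return x, evaluations


solution, evaluations = one_plus_one_ea(n=20, p=0.2, m=5, rng=random.Random(0))
```

Setting `m=1` recovers the algorithm without sampling, so the two evaluation counts can be compared for the same `n` and `p` by averaging over independent runs.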

7 Conclusion

In this paper, we introduce median sampling into EAs to handle noise and theoretically analyze the effectiveness of median sampling. We first consider one classical case, i.e., OM under onebit noise, and show that median sampling can reduce the ERT of (1+1)-EA from exponential to polynomial. Next, by two illustrative examples, we show that when the 2-quantile of the noisy fitness increases with the true objective value, median sampling is better than the commonly used mean sampling; otherwise, it is worse. The results provide us with some guidance to employ median sampling in practice. In the future, it would be interesting to analyze the effect of median sampling on real-world noisy optimization problems.

Acknowledgements

The authors want to thank the editor and anonymous reviewers for their helpful comments and suggestions, and one reviewer of our work [25], whose comments motivate this work. This work was supported by the National Key Research and Development Program of China (2017YFB1003102), the NSFC (62022039, 61672478, 61876077), and the MOE University Scientific-Technological Innovation Plan Program.

Bibliography

[1] Bäck T. Evolutionary Algorithms in Theory and Practice: Evolution Strategies, Evolutionary Programming, Genetic Algorithms. Oxford University Press, Oxford, UK, 1996
[2] Xu P, Liu X, Cao H, et al. An efficient energy aware virtual network migration based on genetic algorithm. Front Comput Sci, 2019, 13(2): 440-442
[3] Yuan Q, Tang H, You W, et al. Virtual network function scheduling via multilayer encoding genetic algorithm with distributed bandwidth allocation. Sci China Inf Sci, 2018, 61(9): 092107
[4] Jin Y, Branke J. Evolutionary optimization in uncertain environments—A survey. IEEE Trans Evol Comput, 2005, 9(3): 303-317
[5] Aizawa A, Wah B. Scheduling of genetic algorithms in a noisy environment. Evol Comput, 1994, 2(2): 97-122
[6] Stagge P. Averaging efficiently in the presence of noise. In: Proceedings of the 5th International Conference on Parallel Problem Solving from Nature, Amsterdam, The Netherlands, 1998. 188-197
[7] Branke J, Schmidt C. Selection in the presence of noise. In: Proceedings of the 5th ACM Conference on Genetic and Evolutionary Computation, Chicago, IL, 2003. 766-777
[8] Branke J, Schmidt C. Sequential sampling in noisy environments. In: Proceedings of the 8th International Conference on Parallel Problem Solving from Nature, Birmingham, UK, 2004. 202-211
