Analysis of the $(\mu/\mu_I,\lambda)$-CSA-ES with Repair by Projection Applied to a Conically Constrained Problem

Patrick Spettel; Hans-Georg Beyer

arXiv:1901.07871·cs.NE·August 12, 2019


TL;DR

This paper provides a theoretical analysis of a specific evolution strategy with step size adaptation applied to a conically constrained linear optimization problem, predicting its dynamics and steady states.

Contribution

It introduces a stochastic iterative model for the strategy and derives equations predicting its behavior, validated by comparison with actual algorithm runs.

Findings

01. Theoretical predictions match real algorithm dynamics.

02. Steady state values are accurately predicted.

03. The model neglects fluctuations and analyzes the mean value behavior.

Abstract

Theoretical analyses of evolution strategies are indispensable for gaining a deep understanding of their inner workings. For constrained problems, rather simple problems are of interest in the current research. This work presents a theoretical analysis of a multi-recombinative evolution strategy with cumulative step size adaptation applied to a conically constrained linear optimization problem. The state of the strategy is modeled by random variables and a stochastic iterative mapping is introduced. For the analytical treatment, fluctuations are neglected and the mean value iterative system is considered. Non-linear difference equations are derived based on one-generation progress rates. Based on that, expressions for the steady state of the mean value iterative system are derived. By comparison with real algorithm runs, it is shown that for the considered assumptions, the theoretical…


Equations

$$f(\mathbf{x}) = x_{1} \tag{1}$$

$$x_{1}^{2}-\xi\sum_{k=2}^{N}x_{k}^{2}\geq 0 \tag{2}$$

$$x_{1}\geq 0 \tag{3}$$

$$\hat{\mathbf{x}}=\arg\min_{\mathbf{x}'}||\mathbf{x}'-\mathbf{x}||^{2}\quad\text{s.t.}\quad{x_{1}'}^{2}-\xi\sum_{k=2}^{N}{x_{k}'}^{2}\geq 0,\quad x_{1}'\geq 0 \tag{4}$$

$$\hat{\mathbf{x}}=\mathrm{projectOntoCone}(\mathbf{x}) \tag{5}$$

$$\hat{\mathbf{x}}=\begin{cases}\frac{\xi}{\xi+1}\left(x_{1}+\frac{||\mathbf{r}||}{\sqrt{\xi}}\right)\left(1,\frac{x_{2}}{\sqrt{\xi}||\mathbf{r}||},\ldots,\frac{x_{N}}{\sqrt{\xi}||\mathbf{r}||}\right)^{T}&\text{if }\frac{\xi}{\xi+1}\left(x_{1}+\frac{||\mathbf{r}||}{\sqrt{\xi}}\right)>0\\ \mathbf{0}&\text{otherwise}\end{cases} \tag{6}$$

$$s_{\odot}^{(g)}:=\frac{1}{r^{(g)}}\sum_{k=2}^{N}(\mathbf{x}^{(g)})_{k}(\mathbf{s}^{(g)})_{k}. \tag{7}$$

$$\begin{pmatrix}x^{(g+1)}\\ r^{(g+1)}\\ s_{1}^{(g+1)}\\ s_{\odot}^{(g+1)}\\ ||\mathbf{s}^{(g+1)}||^{2}\\ \sigma^{(g+1)}\end{pmatrix}\leftarrow\begin{pmatrix}x^{(g)}\\ r^{(g)}\\ s_{1}^{(g)}\\ s_{\odot}^{(g)}\\ ||\mathbf{s}^{(g)}||^{2}\\ \sigma^{(g)}\end{pmatrix}. \tag{8}$$

$$\begin{pmatrix}\overline{x^{(g+1)}}\\ \overline{r^{(g+1)}}\\ \overline{s_{1}^{(g+1)}}\\ \overline{s_{\odot}^{(g+1)}}\\ \overline{||\mathbf{s}^{(g+1)}||^{2}}\\ \overline{\sigma^{(g+1)}}\end{pmatrix}\leftarrow\begin{pmatrix}\overline{x^{(g)}}\\ \overline{r^{(g)}}\\ \overline{s_{1}^{(g)}}\\ \overline{s_{\odot}^{(g)}}\\ \overline{||\mathbf{s}^{(g)}||^{2}}\\ \overline{\sigma^{(g)}}\end{pmatrix}. \tag{9}$$

$$\varphi_{x}(\overline{x^{(g)}},\overline{r^{(g)}},\overline{\sigma^{(g)}}):=\mathrm{E}[\overline{x^{(g)}}-x^{(g+1)}\mid\overline{x^{(g)}},\overline{r^{(g)}},\overline{\sigma^{(g)}}] \tag{10}$$

$$\varphi_{r}(\overline{x^{(g)}},\overline{r^{(g)}},\overline{\sigma^{(g)}}):=\mathrm{E}[\overline{r^{(g)}}-r^{(g+1)}\mid\overline{x^{(g)}},\overline{r^{(g)}},\overline{\sigma^{(g)}}]. \tag{11}$$

$$\varphi_{x}^{*}(\cdot):=\frac{N\varphi_{x}(\cdot)}{x^{(g)}}, \tag{12}$$

$$\varphi_{r}^{*}(\cdot):=\frac{N\varphi_{r}(\cdot)}{r^{(g)}}, \tag{13}$$

$$\sigma^{*}:=\frac{N\sigma}{r^{(g)}} \tag{14}$$

$$\overline{x^{(g+1)}}=\overline{x^{(g)}}-\frac{\overline{x^{(g)}}{\varphi^{(g)}_{x}}^{*}}{N}=\overline{x^{(g)}}\left(1-\frac{{\varphi^{(g)}_{x}}^{*}}{N}\right) \tag{15}$$

$$\overline{r^{(g+1)}}=\overline{r^{(g)}}-\frac{\overline{r^{(g)}}{\varphi^{(g)}_{r}}^{*}}{N}=\overline{r^{(g)}}\left(1-\frac{{\varphi^{(g)}_{r}}^{*}}{N}\right) \tag{16}$$

$$\bar{r}=\overline{r^{(g)}}\sqrt{1+\frac{{{\sigma^{(g)}}^{*}}^{2}}{N}\left(1-\frac{1}{N}\right)} \tag{17}$$

$$\sigma_{r}=\overline{r^{(g)}}\frac{{\sigma^{(g)}}^{*}}{N}\sqrt{\frac{1+\frac{{{\sigma^{(g)}}^{*}}^{2}}{2N}\left(1-\frac{1}{N}\right)}{1+\frac{{{\sigma^{(g)}}^{*}}^{2}}{N}\left(1-\frac{1}{N}\right)}} \tag{18}$$

$$\begin{aligned}{\varphi^{(g)}_{x}}^{*}\approx{}&P_{\text{feas}}(\overline{x^{(g)}},\overline{r^{(g)}},\overline{\sigma^{(g)}})\left[\frac{\overline{r^{(g)}}}{\overline{x^{(g)}}}\overline{{\sigma^{(g)}}^{*}}c_{\mu/\mu,\lambda}\right]+[1-P_{\text{feas}}(\overline{x^{(g)}},\overline{r^{(g)}},\overline{\sigma^{(g)}})]\\ &\times\underbrace{\left[\frac{N}{1+\xi}\left(1-\frac{\sqrt{\xi}\overline{r^{(g)}}}{\overline{x^{(g)}}}\sqrt{1+\frac{{\overline{{\sigma^{(g)}}^{*}}}^{2}}{N}}\right)+\frac{\sqrt{\xi}}{1+\xi}\frac{\sqrt{\xi}\overline{r^{(g)}}}{\overline{x^{(g)}}}\overline{{\sigma^{(g)}}^{*}}c_{\mu/\mu,\lambda}\sqrt{1+\frac{1}{\xi}\frac{1+\frac{{\overline{{\sigma^{(g)}}^{*}}}^{2}}{2N}}{1+\frac{{\overline{{\sigma^{(g)}}^{*}}}^{2}}{N}}}\right]}_{=:{\varphi_{x}^{*}}_{\text{infeas}}^{(g)}}\end{aligned} \tag{19}$$

$$\begin{aligned}{\varphi^{(g)}_{r}}^{*}\approx{}&P_{\text{feas}}(\overline{x^{(g)}},\overline{r^{(g)}},\overline{\sigma^{(g)}})N\left(1-\sqrt{1+\frac{{\overline{{\sigma^{(g)}}^{*}}}^{2}}{\mu N}}\right)\\ &+[1-P_{\text{feas}}(\overline{x^{(g)}},\overline{r^{(g)}},\overline{\sigma^{(g)}})]N\left(1-\frac{\overline{x^{(g)}}}{\sqrt{\xi}\overline{r^{(g)}}}\left(1-\frac{{\varphi_{x}^{*}}_{\text{infeas}}^{(g)}}{N}\right)\sqrt{\frac{1+\frac{{\overline{{\sigma^{(g)}}^{*}}}^{2}}{\mu N}}{1+\frac{{\overline{{\sigma^{(g)}}^{*}}}^{2}}{N}}}\right).\end{aligned} \tag{20}$$

$$P_{\text{feas}}(\overline{x^{(g)}},\overline{r^{(g)}},\overline{\sigma^{(g)}})\simeq\Phi\left[\frac{1}{\sigma^{(g)}}\left(\frac{x^{(g)}}{\sqrt{\xi}}-\bar{r}\right)\right] \tag{21}$$

$$c_{\mu/\mu,\lambda}:=\frac{\lambda-\mu}{2\pi}\binom{\lambda}{\mu}\int_{t=-\infty}^{t=\infty}e^{-t^{2}}[\Phi(t)]^{\lambda-\mu-1}[1-\Phi(t)]^{\mu-1}\,\mathrm{d}t. \tag{22}$$

$$\overline{s_{1}^{(g+1)}}=(1-c)\overline{s_{1}^{(g)}}+\sqrt{\mu c(2-c)}\,\mathrm{E}[(\langle\tilde{\mathbf{z}}^{(g)}\rangle)_{1}]$$

$$\varphi_{x}^{(g)}=\mathrm{E}[x^{(g)}-x^{(g+1)}]=\mathrm{E}[x^{(g)}-(x^{(g)}+\sigma^{(g)}(\langle\tilde{\mathbf{z}}^{(g)}\rangle)_{1})]=-\overline{\sigma^{(g)}}\,\mathrm{E}[(\langle\tilde{\mathbf{z}}^{(g)}\rangle)_{1}]$$

$$\mathrm{E}[(\langle\tilde{\mathbf{z}}^{(g)}\rangle)_{1}]=-\frac{\varphi_{x}^{(g)}}{\overline{\sigma^{(g)}}}$$

$$\overline{s_{1}^{(g+1)}}=(1-c)\overline{s_{1}^{(g)}}-\sqrt{\mu c(2-c)}\frac{\varphi_{x}^{(g)}}{\overline{\sigma^{(g)}}}$$

$$\begin{aligned}s_{\odot}^{(g+1)}&=\frac{1}{r^{(g+1)}}\sum_{k=2}^{N}(\mathbf{x}^{(g+1)})_{k}(\mathbf{s}^{(g+1)})_{k}\\ &=\frac{1}{r^{(g+1)}}\sum_{k=2}^{N}\left[(\mathbf{x}^{(g)})_{k}+\sigma^{(g)}({\langle\tilde{\mathbf{z}}^{(g)}\rangle})_{k}\right]\left[(1-c)(\mathbf{s}^{(g)})_{k}+\sqrt{\mu c(2-c)}({\langle\tilde{\mathbf{z}}^{(g)}\rangle})_{k}\right]\\ &=\frac{1}{r^{(g+1)}}\sum_{k=2}^{N}(1-c)\left[(\mathbf{x}^{(g)})_{k}(\mathbf{s}^{(g)})_{k}+\sigma^{(g)}({\langle\tilde{\mathbf{z}}^{(g)}\rangle})_{k}(\mathbf{s}^{(g)})_{k}\right]\\ &\quad+\frac{1}{r^{(g+1)}}\sum_{k=2}^{N}\sqrt{\mu c(2-c)}\left[(\mathbf{x}^{(g)})_{k}({\langle\tilde{\mathbf{z}}^{(g)}\rangle})_{k}+\sigma^{(g)}({\langle\tilde{\mathbf{z}}^{(g)}\rangle})_{k}({\langle\tilde{\mathbf{z}}^{(g)}\rangle})_{k}\right]\end{aligned}$$

$$\begin{aligned}s_{\odot}^{(g+1)}={}&\frac{r^{(g)}}{r^{(g+1)}}(1-c)\left[s_{\odot}^{(g)}+\frac{{\sigma^{(g)}}^{*}}{N}({\langle\tilde{\mathbf{z}}^{(g)}\rangle})_{2..N}^{T}(\mathbf{s}^{(g)})_{2..N}\right]\\ &+\frac{r^{(g)}}{r^{(g+1)}}\sqrt{\mu c(2-c)}\left[z_{\odot}^{(g)}+\frac{{\sigma^{(g)}}^{*}}{N}||({\langle\tilde{\mathbf{z}}^{(g)}\rangle})_{2..N}||^{2}\right].\end{aligned}$$

$$\begin{aligned}{r^{(g+1)}}^{2}&=\sum_{k=2}^{N}\left[(\mathbf{x}^{(g)})_{k}+\sigma^{(g)}({\langle\tilde{\mathbf{z}}^{(g)}\rangle})_{k}\right]^{2}\\ &=\sum_{k=2}^{N}\left((\mathbf{x}^{(g)})_{k}^{2}+2\sigma^{(g)}(\mathbf{x}^{(g)})_{k}({\langle\tilde{\mathbf{z}}^{(g)}\rangle})_{k}+{\sigma^{(g)}}^{2}({\langle\tilde{\mathbf{z}}^{(g)}\rangle})_{k}^{2}\right)\\ &={r^{(g)}}^{2}+2\frac{{\sigma^{(g)}}^{*}}{N}{r^{(g)}}^{2}z_{\odot}^{(g)}+\frac{{{\sigma^{(g)}}^{*}}^{2}}{N^{2}}{r^{(g)}}^{2}||({\langle\tilde{\mathbf{z}}^{(g)}\rangle})_{2..N}||^{2}\end{aligned}$$


Full text

Analysis of the $(\mu/\mu_{I},\lambda)$-CSA-ES with Repair by Projection Applied to a Conically Constrained Problem

Patrick Spettel
Research Center Process and Product Engineering, Vorarlberg University of Applied Sciences, Dornbirn, 6850, Austria

Hans-Georg Beyer
Department of Computer Science, Research Center Process and Product Engineering, Vorarlberg University of Applied Sciences, Dornbirn, 6850, Austria

Abstract

Theoretical analyses of evolution strategies are indispensable for gaining a deep understanding of their inner workings. For constrained problems, rather simple problems are of interest in the current research. This work presents a theoretical analysis of a multi-recombinative evolution strategy with cumulative step size adaptation applied to a conically constrained linear optimization problem. The state of the strategy is modeled by random variables and a stochastic iterative mapping is introduced. For the analytical treatment, fluctuations are neglected and the mean value iterative system is considered. Non-linear difference equations are derived based on one-generation progress rates. Based on that, expressions for the steady state of the mean value iterative system are derived. By comparison with real algorithm runs, it is shown that for the considered assumptions, the theoretical derivations are able to predict the dynamics and the steady state values of the real runs.

Keywords

Evolution strategy, constraint handling, repair by projection, cumulative step size adaptation, conically constrained problem.

1 Introduction

Thorough theoretical investigations of evolution strategies (ESs) are necessary for gaining a deep understanding of how they work. A lot of research has been done for analyzing ESs applied to unconstrained problems. For the constrained setting, there are still aspects for which a deep theoretical understanding is missing. As a step in that direction, this work theoretically analyzes a $(\mu/\mu_{I},\lambda)$ -ES with cumulative step size adaptation (CSA) applied to a conically constrained linear problem.

Regarding related work, a $(1,\lambda)$-ES with constraint handling by discarding infeasible offspring has been analyzed by Arnold (2011b) for a single linear constraint. Repair by projection has been considered (Arnold, 2011a) and a comparison with repair by reflection and repair by truncation has been performed by Hellwig and Arnold (2016). Based on Lagrangian constraint handling, Arnold and Porter (2015) presented a $(1+1)$-ES applied to a single linear inequality constraint with the sphere model. The one-generation behavior has been analyzed in that work.

A theoretical investigation based on Markov chains for a multi-recombinative variant with Lagrangian constraint handling has been presented by Atamna et al. (2016). The investigation of a single linear constraint in that work has been extended to multiple linear constraints (Atamna et al., 2017).

Arnold (2013) has considered a conically constrained problem. In that work, a $(1,\lambda)$-ES is applied to the problem by discarding infeasible offspring. Spettel and Beyer (2018a) have considered the same problem and have analyzed a $(1,\lambda)$-$\sigma$-Self-Adaptation ES ($\sigma$SA-ES). It has been extended to the multi-recombinative $(\mu/\mu_{I},\lambda)$ variant (Spettel and Beyer, 2018b). The contribution of this paper is the analysis considering CSA instead of $\sigma$SA for the mutation strength control mechanism.

The remainder of the paper is organized as follows. Section 2 introduces the optimization problem under consideration and describes the algorithm that is analyzed. Section 3 concerns the theoretical analysis. First, a mean value iterative system that models the dynamics of the ES is derived in Section 3.1. Second, steady state considerations are shown in Section 3.2. To show the approximation quality, the theoretical predictions are compared in plots with the results of real ES runs. Finally, Section 4 discusses the results and concludes the paper.

2 Problem and Algorithm

Minimization of

$$f(\mathbf{x}) = x_{1} \tag{1}$$

subject to constraints

$$x_{1}^{2}-\xi\sum_{k=2}^{N}x_{k}^{2}\geq 0 \tag{2}$$

$$x_{1}\geq 0 \tag{3}$$

is considered in this work ($\mathbf{x}=(x_{1},\ldots,x_{N})^{T}\in\mathbb{R}^{N}$ and $\xi>0$).

The state of an ES individual can be uniquely described in the $(x,r)^{T}$-space. It consists of $x$, the distance from [math] in $x_{1}$-direction (cone axis), and $r$, the distance from the cone axis. Because isotropic mutations are considered in the ES, the coordinate system can be rotated (w.l.o.g.) such that $(\tilde{x},\tilde{r})^{T}$ corresponds to $(\tilde{x},\tilde{r},0,\ldots,0)^{T}$ in the parameter space. Figure 1 visualizes the problem. The equation for the cone boundary is $r=\frac{x}{\sqrt{\xi}}$, which follows from Equation 2. The projection line can be derived using the cone direction vector $\left(1,\frac{1}{\sqrt{\xi}}\right)^{T}$ and its counterclockwise rotation by 90 degrees $\left(-\frac{1}{\sqrt{\xi}},1\right)^{T}$, yielding $r=-\sqrt{\xi}x+q\left(\sqrt{\xi}+\frac{1}{\sqrt{\xi}}\right)$. The values $\mathbf{x}$ and $\tilde{\mathbf{x}}$ denote a parental individual and an offspring individual, respectively. The corresponding mutation is indicated as $\tilde{\sigma}\mathbf{z}$. The values of $x$ and $r$ after projection are denoted by $q$ and $q_{r}$, respectively.

The algorithm to be analyzed is a $(\mu/\mu_{I},\lambda)$-CSA-ES with repair by projection applied to the problem introduced above. Its pseudo code is shown in Algorithm 1. In the beginning, the parameters are initialized (Lines 1 to 2). In the generational loop, $\lambda$ offspring are created (Lines 6 to 16). Each offspring’s parameter vector is sampled from a multivariate normal distribution with mean $\mathbf{x}^{(g)}$ and standard deviation $\sigma^{(g)}$ in Lines 7 and 8. If the generated offspring is infeasible, i.e., if $\mathrm{isFeasible}(\mathbf{x}):=x_{1}\geq 0\land x_{1}^{2}-\xi\sum_{k=2}^{N}x_{k}^{2}\geq 0$ does not hold, its parameter vector is projected onto the point on the boundary of the feasible region that minimizes the Euclidean distance to the offspring point. The mutation vector leading to this repaired point is then recomputed (Lines 9 to 12). Projection means solving the optimization problem

$$\hat{\mathbf{x}}=\arg\min_{\mathbf{x}'}||\mathbf{x}'-\mathbf{x}||^{2}\quad\text{s.t.}\quad{x_{1}'}^{2}-\xi\sum_{k=2}^{N}{x_{k}'}^{2}\geq 0,\quad x_{1}'\geq 0 \tag{4}$$

where $\mathbf{x}$ is the individual to be projected. The function

$$\hat{\mathbf{x}}=\mathrm{projectOntoCone}(\mathbf{x}) \tag{5}$$

is introduced, which returns $\mathbf{\hat{x}}$ of the problem (4). Appendix A in the supplementary material of Spettel and Beyer (2018b) shows a geometrical approach for deriving a closed-form solution to the projection optimization problem (4). Given an infeasible individual $\mathbf{x}$, it reads

$$\hat{\mathbf{x}}=\begin{cases}\frac{\xi}{\xi+1}\left(x_{1}+\frac{||\mathbf{r}||}{\sqrt{\xi}}\right)\left(1,\frac{x_{2}}{\sqrt{\xi}||\mathbf{r}||},\ldots,\frac{x_{N}}{\sqrt{\xi}||\mathbf{r}||}\right)^{T}&\text{if }\frac{\xi}{\xi+1}\left(x_{1}+\frac{||\mathbf{r}||}{\sqrt{\xi}}\right)>0\\ \mathbf{0}&\text{otherwise}\end{cases} \tag{6}$$

where $||\mathbf{r}||=\sqrt{\sum_{k=2}^{N}x_{k}^{2}}$. After possible repair, the offspring’s fitness is determined in Line 13. The next generation’s parental individual $\mathbf{x}^{(g+1)}$ (Line 18) and the next generation’s mutation strength $\sigma^{(g+1)}$ (Line 20) are computed next. The next generation’s parental parameter vector is set to the mean of the $\mu$ best (w.r.t. fitness) offspring parameter vectors. (Note that the order statistic notation $m;\lambda$ is used to denote the $m$-th best (w.r.t. fitness) out of $\lambda$ values. The notation $(\mathbf{x})_{k}$ is used to denote the $k$-th element of a vector $\mathbf{x}$. It is equivalent to writing $x_{k}$.) For the mutation strength update, first the cumulative $\mathbf{s}$-vector is updated. The cumulation parameter $c$ determines the fading strength. The mutation strength is then updated using this $\mathbf{s}$-vector. The parameter $D$ acts as a damping factor. If the squared length of the $\mathbf{s}$-vector is smaller than $N$, the step size is decreased. Otherwise, the step size is increased. Intuitively, this means that multiple correlated steps allow a larger step size and vice versa. The update of the generation counter ends one iteration of the generation loop. The values $x^{(g)}$, $r^{(g)}$, $q_{l}$, $\mathbf{q^{\prime}}_{l}$, $\langle q\rangle$, and $\langle q_{r}\rangle$ are only needed in the theoretical analysis and can be removed in practical applications of the ES. They are indicated in the algorithm in Lines 4, 5, 14, 15, 21, and 22, respectively.
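The repair step described above can be sketched in Python as follows. This is a minimal illustration and not the authors' implementation: the names `is_feasible` and `project_onto_cone` are chosen here, and the closed form is the one implied by the cone boundary $r=x/\sqrt{\xi}$, that is, a repaired point with first component $q=\frac{\xi}{\xi+1}\left(x_{1}+||\mathbf{r}||/\sqrt{\xi}\right)$ and distance $q/\sqrt{\xi}$ from the cone axis.

```python
import math


def is_feasible(x, xi):
    """Cone membership: x_1 >= 0 and x_1^2 - xi * sum_{k>=2} x_k^2 >= 0."""
    return x[0] >= 0.0 and x[0] ** 2 - xi * sum(v * v for v in x[1:]) >= 0.0


def project_onto_cone(x, xi):
    """Return the feasible point closest to x (x itself if it is feasible)."""
    if is_feasible(x, xi):
        return list(x)
    r_norm = math.sqrt(sum(v * v for v in x[1:]))
    # First component of the projection onto the boundary ray (1, 1/sqrt(xi)).
    q = xi / (xi + 1.0) * (x[0] + r_norm / math.sqrt(xi))
    if q <= 0.0:
        # The cone apex is the closest feasible point.
        return [0.0] * len(x)
    # q > 0 for an infeasible x implies r_norm > 0, so the division is safe.
    scale = q / (math.sqrt(xi) * r_norm)
    return [q] + [scale * v for v in x[1:]]
```

For example, with $\xi=1$ the point $(1,2,0)^{T}$ lies outside the cone $r\leq x$ and is mapped to $(1.5,1.5,0)^{T}$ on the boundary.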

Figure 2 shows an example of the $x$ - and $r$ -dynamics of Algorithm 1 (solid line) in comparison with results of the closed-form approximate iterative system (dotted line) that is derived in the sections that follow. As one can see, the real dynamics are predicted satisfactorily by the theoretical considerations for the case shown.
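To make the generational loop concrete, the following Python sketch runs a small $(\mu/\mu_{I},\lambda)$-CSA-ES with repair by projection on the conical problem. It is a hedged reconstruction from the description in this section, not Algorithm 1 itself: the function names, the choices $c=1/\sqrt{N}$ and $D=1/c$, and the exponential update $\sigma^{(g+1)}=\sigma^{(g)}\exp\left(\frac{||\mathbf{s}^{(g+1)}||^{2}-N}{2DN}\right)$ are assumptions that merely agree with the stated behavior (decrease if $||\mathbf{s}||^{2}<N$, increase otherwise).

```python
import math
import random


def project(x, xi):
    """Closest point of the cone {x_1 >= 0, x_1^2 >= xi * sum_{k>=2} x_k^2}."""
    r = math.sqrt(sum(v * v for v in x[1:]))
    if x[0] >= 0.0 and x[0] ** 2 >= xi * r * r:
        return list(x)
    q = xi / (xi + 1.0) * (x[0] + r / math.sqrt(xi))
    if q <= 0.0:
        return [0.0] * len(x)
    return [q] + [q * v / (math.sqrt(xi) * r) for v in x[1:]]


def csa_es(x0, sigma0, xi, mu, lam, generations, seed=0):
    """Run the ES sketch and return the history of (x_1, r, sigma)."""
    rng = random.Random(seed)
    n = len(x0)
    c = 1.0 / math.sqrt(n)  # assumed cumulation parameter
    damping = 1.0 / c       # assumed damping factor D
    x, sigma, s = list(x0), sigma0, [0.0] * n
    history = []
    for _ in range(generations):
        offspring = []
        for _ in range(lam):
            z = [rng.gauss(0.0, 1.0) for _ in range(n)]
            y = project([xk + sigma * zk for xk, zk in zip(x, z)], xi)
            # Recompute the mutation vector that leads to the repaired point.
            z_rep = [(yk - xk) / sigma for yk, xk in zip(y, x)]
            offspring.append((y[0], y, z_rep))  # fitness f(y) = y_1
        offspring.sort(key=lambda t: t[0])
        best = offspring[:mu]
        x = [sum(o[1][k] for o in best) / mu for k in range(n)]
        z_mean = [sum(o[2][k] for o in best) / mu for k in range(n)]
        s = [(1.0 - c) * sk + math.sqrt(mu * c * (2.0 - c)) * zk
             for sk, zk in zip(s, z_mean)]
        # Assumed CSA rule: shrink sigma if ||s||^2 < N, enlarge it otherwise.
        sigma *= math.exp((sum(v * v for v in s) - n) / (2.0 * damping * n))
        r = math.sqrt(sum(v * v for v in x[1:]))
        history.append((x[0], r, sigma))
    return history
```

A call such as `csa_es([100.0] + [1.0] * 9, 1.0, 1.0, 3, 10, 300)` returns one `(x, r, sigma)` triple per generation, which can be plotted on a logarithmic scale to inspect the $x$- and $r$-dynamics of a single run.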

3 Theoretical Analysis

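To completely describe the state of the ES, the random variables $\sigma$, $\mathbf{s}$, and the squared length $||\mathbf{s}||^{2}$ need to be modeled in addition to the variables for the position in the parameter space, $x$ and $r$. The random vector $\mathbf{s}$ is decomposed into its magnitude along the cone axis $s_{1}^{(g)}$ and its magnitude in direction of the parental individual’s $2..N$ components

$$s_{\odot}^{(g)}:=\frac{1}{r^{(g)}}\sum_{k=2}^{N}(\mathbf{x}^{(g)})_{k}(\mathbf{s}^{(g)})_{k}. \tag{7}$$

This leads to a stochastic iterative system of the form

$$\begin{pmatrix}x^{(g+1)}\\ r^{(g+1)}\\ s_{1}^{(g+1)}\\ s_{\odot}^{(g+1)}\\ ||\mathbf{s}^{(g+1)}||^{2}\\ \sigma^{(g+1)}\end{pmatrix}\leftarrow\begin{pmatrix}x^{(g)}\\ r^{(g)}\\ s_{1}^{(g)}\\ s_{\odot}^{(g)}\\ ||\mathbf{s}^{(g)}||^{2}\\ \sigma^{(g)}\end{pmatrix}. \tag{8}$$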

3.1 Derivation of a Mean Value Iterative System for Modeling the Dynamics of the ES

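Similar to the analysis in Section IV of Spettel and Beyer (2018b), fluctuation terms are neglected and deterministic evolution equations under asymptotic assumptions are derived. This allows predicting the mean value dynamics of the ES. To make the distinction between the random variable and its mean value in the iterative system clear, $\overline{z}:=\mathrm{E}[z]$ is used to denote the expected value of a random variate $z$. Thus, the mean value iterative system is represented as

$$\begin{pmatrix}\overline{x^{(g+1)}}\\ \overline{r^{(g+1)}}\\ \overline{s_{1}^{(g+1)}}\\ \overline{s_{\odot}^{(g+1)}}\\ \overline{||\mathbf{s}^{(g+1)}||^{2}}\\ \overline{\sigma^{(g+1)}}\end{pmatrix}\leftarrow\begin{pmatrix}\overline{x^{(g)}}\\ \overline{r^{(g)}}\\ \overline{s_{1}^{(g)}}\\ \overline{s_{\odot}^{(g)}}\\ \overline{||\mathbf{s}^{(g)}||^{2}}\\ \overline{\sigma^{(g)}}\end{pmatrix}. \tag{9}$$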

This section presents derivations of difference equations for the system (9). In Section 3.1.1, difference equations are presented for expressing $\overline{x^{(g+1)}}$ with $\overline{x^{(g)}}$ and $\overline{r^{(g+1)}}$ with $\overline{r^{(g)}}$ by using the respective local progress rates. Section 3.1.2, Section 3.1.3, and Section 3.1.4 deal with the derivation of difference equations for $\overline{s_{1}}$ , $\overline{s_{\odot}}$ , and $\overline{||\mathbf{s}^{(g)}||^{2}}$ , respectively. They are derived from the corresponding steps of Algorithm 1 and they also make use of the local progress rates. Finally, the difference equation for $\overline{\sigma}$ is stated in Section 3.1.5, the derived system of equations is summarized, and it is compared to real ES runs in Section 3.1.6.

3.1.1 Derivation of Mean Value Difference Equations for $x$ and $r$

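The starting points for the derivation of mean value difference equations for $x$ and $r$ are the progress rates in $x$ and $r$ direction. Their definitions read

$$\varphi_{x}(\overline{x^{(g)}},\overline{r^{(g)}},\overline{\sigma^{(g)}}):=\mathrm{E}[\overline{x^{(g)}}-x^{(g+1)}\mid\overline{x^{(g)}},\overline{r^{(g)}},\overline{\sigma^{(g)}}] \tag{10}$$

$$\varphi_{r}(\overline{x^{(g)}},\overline{r^{(g)}},\overline{\sigma^{(g)}}):=\mathrm{E}[\overline{r^{(g)}}-r^{(g+1)}\mid\overline{x^{(g)}},\overline{r^{(g)}},\overline{\sigma^{(g)}}]. \tag{11}$$

They describe the one-generation expected change in the parameter space. The normalizations

$$\varphi_{x}^{*}(\cdot):=\frac{N\varphi_{x}(\cdot)}{x^{(g)}}, \tag{12}$$

$$\varphi_{r}^{*}(\cdot):=\frac{N\varphi_{r}(\cdot)}{r^{(g)}}, \tag{13}$$

and

$$\sigma^{*}:=\frac{N\sigma}{r^{(g)}} \tag{14}$$

are introduced in order to have quantities that are independent of the position in the search space. Using Equation 10 with Equation 12 and Equation 11 with Equation 13, the equations

$$\overline{x^{(g+1)}}=\overline{x^{(g)}}-\frac{\overline{x^{(g)}}{\varphi^{(g)}_{x}}^{*}}{N}=\overline{x^{(g)}}\left(1-\frac{{\varphi^{(g)}_{x}}^{*}}{N}\right) \tag{15}$$

$$\overline{r^{(g+1)}}=\overline{r^{(g)}}-\frac{\overline{r^{(g)}}{\varphi^{(g)}_{r}}^{*}}{N}=\overline{r^{(g)}}\left(1-\frac{{\varphi^{(g)}_{r}}^{*}}{N}\right) \tag{16}$$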

follow. Approximations for ${\varphi^{(g)}_{x}}^{*}$ and ${\varphi^{(g)}_{r}}^{*}$ have already been derived by Spettel and Beyer (2018b, Equations (37) and (38)). In that work, expressions for ${\varphi^{(g)}_{x}}^{*}$ and ${\varphi^{(g)}_{r}}^{*}$ have been derived under the asymptotic assumptions of sufficiently large values of $\xi$ and $N$. In those derivations, two cases have been distinguished. If one considers the ES being far from the cone boundary, offspring are feasible with overwhelming probability. The opposite case of being in the vicinity of the cone boundary results in infeasible offspring almost surely. These observations allow simplifications for the former case because the projection can be ignored. Both cases are combined into single equations by weighting the feasible and infeasible cases with an approximation for the offspring feasibility and offspring infeasibility probability, respectively. The $r$-distribution in those derivations has been approximated by a normal distribution $\mathcal{N}(\bar{r},\sigma_{r}^{2})$ where

$$\bar{r}=\overline{r^{(g)}}\sqrt{1+\frac{{{\sigma^{(g)}}^{*}}^{2}}{N}\left(1-\frac{1}{N}\right)} \tag{17}$$

and

$$\sigma_{r}=\overline{r^{(g)}}\frac{{\sigma^{(g)}}^{*}}{N}\sqrt{\frac{1+\frac{{{\sigma^{(g)}}^{*}}^{2}}{2N}\left(1-\frac{1}{N}\right)}{1+\frac{{{\sigma^{(g)}}^{*}}^{2}}{N}\left(1-\frac{1}{N}\right)}} \tag{18}$$

(see Appendix B in the supplementary material of Spettel and Beyer (2018b) for the detailed derivation). The results that form the basis for the following CSA analysis are briefly recapped here. (In the further considerations, the symbols “$\simeq$” and “$\approx$” are used. Expressions in the form of $\mathrm{lhs}\simeq\mathrm{rhs}$ denote that $\mathrm{lhs}$ is asymptotically equal to $\mathrm{rhs}$ for given asymptotical assumptions (e.g. $N\rightarrow\infty$). The particular assumptions are stated explicitly for every use of “$\simeq$”. That is, in the limit case of the given assumptions, $\mathrm{lhs}$ is equal to $\mathrm{rhs}$. The form $\mathrm{lhs}\approx\mathrm{rhs}$ is used for cases where $\mathrm{rhs}$ is an approximation for $\mathrm{lhs}$ with given assumptions that are not of asymptotical nature. In this sense, “$\approx$” is weaker than “$\simeq$”.) The expression for ${\varphi^{(g)}_{x}}^{*}$ has been derived as

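$$\begin{aligned}{\varphi^{(g)}_{x}}^{*}\approx{}&P_{\text{feas}}(\overline{x^{(g)}},\overline{r^{(g)}},\overline{\sigma^{(g)}})\left[\frac{\overline{r^{(g)}}}{\overline{x^{(g)}}}\overline{{\sigma^{(g)}}^{*}}c_{\mu/\mu,\lambda}\right]+[1-P_{\text{feas}}(\overline{x^{(g)}},\overline{r^{(g)}},\overline{\sigma^{(g)}})]\\ &\times\underbrace{\left[\frac{N}{1+\xi}\left(1-\frac{\sqrt{\xi}\overline{r^{(g)}}}{\overline{x^{(g)}}}\sqrt{1+\frac{{\overline{{\sigma^{(g)}}^{*}}}^{2}}{N}}\right)+\frac{\sqrt{\xi}}{1+\xi}\frac{\sqrt{\xi}\overline{r^{(g)}}}{\overline{x^{(g)}}}\overline{{\sigma^{(g)}}^{*}}c_{\mu/\mu,\lambda}\sqrt{1+\frac{1}{\xi}\frac{1+\frac{{\overline{{\sigma^{(g)}}^{*}}}^{2}}{2N}}{1+\frac{{\overline{{\sigma^{(g)}}^{*}}}^{2}}{N}}}\right]}_{=:{\varphi_{x}^{*}}_{\text{infeas}}^{(g)}}\end{aligned} \tag{19}$$

and the one for ${\varphi^{(g)}_{r}}^{*}$ reads

$$\begin{aligned}{\varphi^{(g)}_{r}}^{*}\approx{}&P_{\text{feas}}(\overline{x^{(g)}},\overline{r^{(g)}},\overline{\sigma^{(g)}})N\left(1-\sqrt{1+\frac{{\overline{{\sigma^{(g)}}^{*}}}^{2}}{\mu N}}\right)\\ &+[1-P_{\text{feas}}(\overline{x^{(g)}},\overline{r^{(g)}},\overline{\sigma^{(g)}})]N\left(1-\frac{\overline{x^{(g)}}}{\sqrt{\xi}\overline{r^{(g)}}}\left(1-\frac{{\varphi_{x}^{*}}_{\text{infeas}}^{(g)}}{N}\right)\sqrt{\frac{1+\frac{{\overline{{\sigma^{(g)}}^{*}}}^{2}}{\mu N}}{1+\frac{{\overline{{\sigma^{(g)}}^{*}}}^{2}}{N}}}\right).\end{aligned} \tag{20}$$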

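The approximate offspring feasibility probability reads

$$P_{\text{feas}}(\overline{x^{(g)}},\overline{r^{(g)}},\overline{\sigma^{(g)}})\simeq\Phi\left[\frac{1}{\sigma^{(g)}}\left(\frac{x^{(g)}}{\sqrt{\xi}}-\bar{r}\right)\right] \tag{21}$$

where $\Phi(\cdot)$ denotes the cumulative distribution function of the standard normal distribution. ${\varphi_{x}^{*}}_{\text{infeas}}^{(g)}$ denotes the infeasible part of Equation 19. The constant $c_{\mu/\mu,\lambda}$ is a so-called progress coefficient. A definition is given in (Beyer, 2001, Eq. 6.102, p. 247). It reads

$$c_{\mu/\mu,\lambda}:=\frac{\lambda-\mu}{2\pi}\binom{\lambda}{\mu}\int_{t=-\infty}^{t=\infty}e^{-t^{2}}[\Phi(t)]^{\lambda-\mu-1}[1-\Phi(t)]^{\mu-1}\,\mathrm{d}t. \tag{22}$$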
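The quantities of this subsection can be evaluated numerically with a short Python sketch. It is illustrative only and rests on assumptions that should be checked against the original equations: the progress coefficient is read as $\frac{\lambda-\mu}{2\pi}\binom{\lambda}{\mu}\int e^{-t^{2}}[\Phi(t)]^{\lambda-\mu-1}[1-\Phi(t)]^{\mu-1}\,\mathrm{d}t$ and integrated with a plain trapezoidal rule, the feasibility probability is taken as $\Phi\left[\left(x/\sqrt{\xi}-\bar{r}\right)/\sigma\right]$ with $\bar{r}=r\sqrt{1+\frac{{\sigma^{*}}^{2}}{N}\left(1-\frac{1}{N}\right)}$, and all function and argument names are hypothetical.

```python
import math


def std_normal_cdf(t):
    return 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))


def progress_coefficient(mu, lam, lo=-10.0, hi=10.0, steps=20000):
    """c_{mu/mu,lambda} via trapezoidal integration of the defining integral."""
    h = (hi - lo) / steps
    total = 0.0
    for i in range(steps + 1):
        t = lo + i * h
        p = std_normal_cdf(t)
        f = math.exp(-t * t) * p ** (lam - mu - 1) * (1.0 - p) ** (mu - 1)
        total += f if 0 < i < steps else 0.5 * f
    return (lam - mu) / (2.0 * math.pi) * math.comb(lam, mu) * total * h


def normalized_progress_rates(x, r, sigma_norm, n, xi, mu, c_mu_lam):
    """Approximate (phi_x^*, phi_r^*) as a feasible/infeasible mixture."""
    sigma = sigma_norm * r / n  # from sigma^* = N sigma / r
    r_bar = r * math.sqrt(1.0 + sigma_norm ** 2 / n * (1.0 - 1.0 / n))
    p_feas = std_normal_cdf((x / math.sqrt(xi) - r_bar) / sigma)
    ratio = math.sqrt(xi) * r / x
    grow = 1.0 + sigma_norm ** 2 / n
    grow_mu = 1.0 + sigma_norm ** 2 / (mu * n)
    phi_x_feas = r / x * sigma_norm * c_mu_lam
    phi_x_infeas = (
        n / (1.0 + xi) * (1.0 - ratio * math.sqrt(grow))
        + math.sqrt(xi) / (1.0 + xi) * ratio * sigma_norm * c_mu_lam
        * math.sqrt(1.0 + (1.0 + sigma_norm ** 2 / (2.0 * n)) / (xi * grow))
    )
    phi_r_feas = n * (1.0 - math.sqrt(grow_mu))
    phi_r_infeas = n * (
        1.0 - (1.0 - phi_x_infeas / n) / ratio * math.sqrt(grow_mu / grow)
    )
    phi_x = p_feas * phi_x_feas + (1.0 - p_feas) * phi_x_infeas
    phi_r = p_feas * phi_r_feas + (1.0 - p_feas) * phi_r_infeas
    return phi_x, phi_r
```

One mean value step then follows as `x * (1 - phi_x / n)` and `r * (1 - phi_r / n)`. For $\mu=1$ the coefficient reduces to the expected maximum of $\lambda$ standard normal variates, which gives a convenient sanity check.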

3.1.2 Derivation of a Mean Value Difference Equation for $s_{1}$

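For $s_{1}$, a mean value difference equation can be derived using the update rule from Line 19 of Algorithm 1. Computation of the expected value with $\overline{s_{1}^{(g+1)}}:=\mathrm{E}[s_{1}^{(g+1)}]$ directly yields

$$\overline{s_{1}^{(g+1)}}=(1-c)\overline{s_{1}^{(g)}}+\sqrt{\mu c(2-c)}\,\mathrm{E}[(\langle\tilde{\mathbf{z}}^{(g)}\rangle)_{1}].$$

$\mathrm{E}[(\langle\tilde{\mathbf{z}}^{(g)}\rangle)_{1}]$ can be expressed with the progress rate in $x$-direction $\varphi_{x}$. From the definition of the $x$ progress rate,

$$\varphi_{x}^{(g)}=\mathrm{E}[x^{(g)}-x^{(g+1)}]=\mathrm{E}[x^{(g)}-(x^{(g)}+\sigma^{(g)}(\langle\tilde{\mathbf{z}}^{(g)}\rangle)_{1})]=-\overline{\sigma^{(g)}}\,\mathrm{E}[(\langle\tilde{\mathbf{z}}^{(g)}\rangle)_{1}]$$

follows. Therefore,

$$\mathrm{E}[(\langle\tilde{\mathbf{z}}^{(g)}\rangle)_{1}]=-\frac{\varphi_{x}^{(g)}}{\overline{\sigma^{(g)}}} \tag{27}$$

holds. Using Equation 27 and Equation 14,

$$\overline{s_{1}^{(g+1)}}=(1-c)\overline{s_{1}^{(g)}}-\sqrt{\mu c(2-c)}\frac{\varphi_{x}^{(g)}}{\overline{\sigma^{(g)}}}=(1-c)\overline{s_{1}^{(g)}}-\sqrt{\mu c(2-c)}\frac{\overline{x^{(g)}}{\varphi^{(g)}_{x}}^{*}}{\overline{r^{(g)}}\,\overline{{\sigma^{(g)}}^{*}}}$$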

follows.
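The structure of this recursion can be illustrated with a few lines of Python. The sketch assumes the update $\overline{s_{1}^{(g+1)}}=(1-c)\overline{s_{1}^{(g)}}-\sqrt{\mu c(2-c)}\,\varphi_{x}^{(g)}/\overline{\sigma^{(g)}}$ and, purely for illustration, holds the ratio $\varphi_{x}/\sigma$ fixed over the generations; in the full system this ratio changes with the state. The function names are hypothetical.

```python
import math


def s1_mean_update(s1, phi_x, sigma, mu, c):
    """One mean value step for s_1, with E[<z>_1] replaced by -phi_x / sigma."""
    return (1.0 - c) * s1 - math.sqrt(mu * c * (2.0 - c)) * phi_x / sigma


def s1_fixed_point(phi_x, sigma, mu, c):
    """Fixed point of the recursion when phi_x / sigma is held constant."""
    return -math.sqrt(mu * c * (2.0 - c)) * phi_x / (c * sigma)


def iterate_s1(s1, phi_x, sigma, mu, c, generations):
    for _ in range(generations):
        s1 = s1_mean_update(s1, phi_x, sigma, mu, c)
    return s1
```

Because the factor $(1-c)$ lies between 0 and 1, the iteration contracts geometrically toward the fixed point, and a positive progress rate in $x$-direction (decreasing $x$) corresponds to a negative $\overline{s_{1}}$.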

3.1.3 Derivation of a Mean Value Difference Equation for $s_{\odot}$

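For $s_{\odot}$, a mean value difference equation can be derived using the update rule from Line 19 of Algorithm 1 and considering Equation 7. To begin with,

$$\begin{aligned}s_{\odot}^{(g+1)}&=\frac{1}{r^{(g+1)}}\sum_{k=2}^{N}(\mathbf{x}^{(g+1)})_{k}(\mathbf{s}^{(g+1)})_{k}\\ &=\frac{1}{r^{(g+1)}}\sum_{k=2}^{N}\left[(\mathbf{x}^{(g)})_{k}+\sigma^{(g)}({\langle\tilde{\mathbf{z}}^{(g)}\rangle})_{k}\right]\left[(1-c)(\mathbf{s}^{(g)})_{k}+\sqrt{\mu c(2-c)}({\langle\tilde{\mathbf{z}}^{(g)}\rangle})_{k}\right]\\ &=\frac{1}{r^{(g+1)}}\sum_{k=2}^{N}(1-c)\left[(\mathbf{x}^{(g)})_{k}(\mathbf{s}^{(g)})_{k}+\sigma^{(g)}({\langle\tilde{\mathbf{z}}^{(g)}\rangle})_{k}(\mathbf{s}^{(g)})_{k}\right]\\ &\quad+\frac{1}{r^{(g+1)}}\sum_{k=2}^{N}\sqrt{\mu c(2-c)}\left[(\mathbf{x}^{(g)})_{k}({\langle\tilde{\mathbf{z}}^{(g)}\rangle})_{k}+\sigma^{(g)}({\langle\tilde{\mathbf{z}}^{(g)}\rangle})_{k}({\langle\tilde{\mathbf{z}}^{(g)}\rangle})_{k}\right]\end{aligned} \tag{35}$$

can be derived. Equation 35 can further be rewritten by the introduction of $z_{\odot}^{(g)}:=\frac{1}{r^{(g)}}\sum_{k=2}^{N}(\mathbf{x}^{(g)})_{k}({\langle\tilde{\mathbf{z}}^{(g)}\rangle})_{k}$ (similar to Equation 7) and use of Equation 14 resulting in

$$\begin{aligned}s_{\odot}^{(g+1)}={}&\frac{r^{(g)}}{r^{(g+1)}}(1-c)\left[s_{\odot}^{(g)}+\frac{{\sigma^{(g)}}^{*}}{N}({\langle\tilde{\mathbf{z}}^{(g)}\rangle})_{2..N}^{T}(\mathbf{s}^{(g)})_{2..N}\right]\\ &+\frac{r^{(g)}}{r^{(g+1)}}\sqrt{\mu c(2-c)}\left[z_{\odot}^{(g)}+\frac{{\sigma^{(g)}}^{*}}{N}||({\langle\tilde{\mathbf{z}}^{(g)}\rangle})_{2..N}||^{2}\right].\end{aligned} \tag{38}$$

For the fraction $r^{(g)}/r^{(g+1)}$, $r^{(g+1)}$ has to be derived. From the offspring generation and selection steps it follows that

$$\begin{aligned}{r^{(g+1)}}^{2}&=\sum_{k=2}^{N}\left[(\mathbf{x}^{(g)})_{k}+\sigma^{(g)}({\langle\tilde{\mathbf{z}}^{(g)}\rangle})_{k}\right]^{2}\\ &=\sum_{k=2}^{N}\left((\mathbf{x}^{(g)})_{k}^{2}+2\sigma^{(g)}(\mathbf{x}^{(g)})_{k}({\langle\tilde{\mathbf{z}}^{(g)}\rangle})_{k}+{\sigma^{(g)}}^{2}({\langle\tilde{\mathbf{z}}^{(g)}\rangle})_{k}^{2}\right)\\ &={r^{(g)}}^{2}+2\frac{{\sigma^{(g)}}^{*}}{N}{r^{(g)}}^{2}z_{\odot}^{(g)}+\frac{{{\sigma^{(g)}}^{*}}^{2}}{N^{2}}{r^{(g)}}^{2}||({\langle\tilde{\mathbf{z}}^{(g)}\rangle})_{2..N}||^{2}\end{aligned} \tag{41}$$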

holds. Using the result from Equation 41,

[TABLE]

can be derived. For further simplification of Equation 42, asymptotic assumptions are made for $N\rightarrow\infty$. Because the mutation vector is corrected in case of projection (Line 11 in Algorithm 1), ${\langle\tilde{\mathbf{z}}^{(g)}\rangle}$ denotes the centroid of the $\mu$ best (w.r.t. fitness) offspring mutation vectors after the projection step. Approximation of ${\langle\tilde{\mathbf{z}}^{(g)}\rangle}$ for the asymptotic case by its value before projection and selection yields a normal distribution for $({\langle\tilde{\mathbf{z}}^{(g)}\rangle})_{k}=\frac{1}{\mu}\sum_{m=1}^{\mu}(\tilde{\mathbf{z}}_{m;\lambda})_{k}\sim\mathcal{N}(0,\frac{1}{\mu})=\frac{1}{\sqrt{\mu}}\mathcal{N}(0,1)$, which follows from the properties of a sum of normally distributed random variables.
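The consequence of this normal approximation for the squared length of the components $2..N$ can be checked with a small Monte Carlo experiment. The Python sketch below is an illustration under the stated approximation only (centroid of $\mu$ unselected, unprojected standard normal mutation vectors); the function name and parameters are hypothetical. Each component then has variance $1/\mu$, so the sample mean of $||({\langle\tilde{\mathbf{z}}\rangle})_{2..N}||^{2}$ should be close to $(N-1)/\mu$.

```python
import random


def mean_sq_norm_of_centroid(n, mu, trials, seed=0):
    """Average of ||<z>_{2..N}||^2 for centroids of mu N(0, I) vectors."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        sq = 0.0
        for _ in range(n - 1):  # components 2..N
            comp = sum(rng.gauss(0.0, 1.0) for _ in range(mu)) / mu
            sq += comp * comp
        total += sq
    return total / trials
```

With, for example, $N=51$ and $\mu=4$, the estimate should lie near $50/4=12.5$, and the relative fluctuation of a single sample shrinks as $N$ grows.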

Hence, $||({\langle\tilde{\mathbf{z}}^{(g)}\rangle})_{2..N}||^{2}$ can be approximated by a $\chi^{2}$ distribution with $N-1$ degrees of freedom. As the expected value of the $\chi^{2}$ distribution corresponds to its number of degrees of freedom,

[TABLE]

follows for $N\rightarrow\infty$ by the law of large numbers. With Equation 43 and the assumptions $N\gg 2{\sigma^{(g)}}^{*}z^{(g)}_{\odot}$ and $\mu N\gg{{\sigma^{(g)}}^{*}}^{2}$ ,

[TABLE]

follows. Making use of Equation 44 and Equation 43, Equation 38 can be simplified for the asymptotic case $N\rightarrow\infty$ yielding

[TABLE]

Taking expected values of Equation 47 with $\mathrm{E}[s_{\odot}^{(g+1)}]:=\overline{s_{\odot}^{(g+1)}}$ results in

[TABLE]

To treat Equation 50 further, $\mathrm{E}[({\langle\tilde{\mathbf{z}}^{(g)}\rangle})_{2..N}^{T}(\mathbf{s}^{(g)})_{2..N}]$ and $\mathrm{E}[{z_{\odot}^{(g)}}]$ need to be derived.
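The asymptotic approximation used above, namely that the components of the centroid behave like $\mathcal{N}(0,\frac{1}{\mu})$ variates so that the expected squared norm of its $N-1$ components is $(N-1)/\mu\simeq N/\mu$, can be checked with a small Monte Carlo sketch. The sketch deliberately ignores selection and projection (this is exactly the simplification stated in the text), and the function names are hypothetical helpers introduced only for illustration.

```python
import random


def centroid_sq_norm(mu, dim, rng):
    """Squared norm of the centroid of mu i.i.d. standard normal vectors.

    Selection and projection are ignored here, mirroring the asymptotic
    simplification described in the text.
    """
    total = 0.0
    for _ in range(dim):
        # Each centroid component is the average of mu N(0, 1) samples,
        # hence distributed as N(0, 1/mu).
        comp = sum(rng.gauss(0.0, 1.0) for _ in range(mu)) / mu
        total += comp * comp
    return total


def mean_centroid_sq_norm(mu, dim, trials, seed=1):
    """Monte Carlo estimate of E[||centroid||^2] over the given trials."""
    rng = random.Random(seed)
    return sum(centroid_sq_norm(mu, dim, rng) for _ in range(trials)) / trials


if __name__ == "__main__":
    mu, n = 3, 400
    estimate = mean_centroid_sq_norm(mu, n - 1, trials=300)
    # The estimate should be close to (N - 1) / mu.
    print(estimate, (n - 1) / mu)
```

For growing $N$ the relative fluctuation of the squared norm around $(N-1)/\mu$ shrinks, which is the law-of-large-numbers argument invoked in the derivation.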

For $\mathrm{E}[({\langle\tilde{\mathbf{z}}^{(g)}\rangle})_{2..N}^{T}(\mathbf{s}^{(g)})_{2..N}]$ , $({\langle\tilde{\mathbf{z}}^{(g)}\rangle})_{2..N}$ is decomposed into a vector in direction of the parental individual’s $2..N$ components $\mathbf{e}^{(g)}_{\odot}$ and in a direction $\mathbf{e}^{(g)}_{\ominus}$ that is orthogonal to $\mathbf{e}^{(g)}_{\odot}$ , i.e., ${\mathbf{e}^{(g)}_{\odot}}^{T}\mathbf{e}^{(g)}_{\ominus}=0$ . Further, in the following the assumption is made that those direction vectors are unit vectors, i.e., $||\mathbf{e}^{(g)}_{\odot}||=1$ and $||\mathbf{e}^{(g)}_{\ominus}||=1$ . Therefore, $({\langle\tilde{\mathbf{z}}^{(g)}\rangle})_{2..N}$ can be written as

[TABLE]

where $z_{\odot}^{(g)}$ and $z_{\ominus}^{(g)}$ are the projections of the mutation vector in direction of $\mathbf{e}^{(g)}_{\odot}$ and $\mathbf{e}^{(g)}_{\ominus}$ , respectively. Using Equation 51,

[TABLE]

follows. Note that ${\mathbf{e}^{(g)}_{\odot}}^{T}(\mathbf{s}^{(g)})_{2..N}$ corresponds to the definition in Equation 7. Taking into account the statistical independence of the cumulated path vector and the mutation in the current generation, taking expectation results in

[TABLE]

Note that $\mathrm{E}[z_{\ominus}^{(g)}]$ vanishes because the mutations in direction $\mathbf{e}^{(g)}_{\ominus}$ are isotropic and selectively neutral. Hence, the second summand of Equation 53 is $0$ in expectation. To investigate the behavior of $\mathrm{E}[{\mathbf{e}^{(g)}_{\ominus}}^{T}(\mathbf{s}^{(g)})_{2..N}]$ , the dynamics of ${\mathbf{e}^{(g)}_{\ominus}}^{T}(\mathbf{s}^{(g)})_{2..N}$ have been empirically determined for different parameter configurations in real ES runs (not shown here). It turned out that ${\mathbf{e}^{(g)}_{\ominus}}^{T}(\mathbf{s}^{(g)})_{2..N}$ fluctuates around $0$ (with the empirical mean being approximately $0$), which further justifies the step from Equation 53 to Equation 54.
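The decomposition of the $2..N$ components of a vector into a part along the parental direction $\mathbf{e}_{\odot}$ and an orthogonal remainder can be illustrated with a short sketch. It uses the definition $z_{\odot}=\frac{1}{r}\sum_{k=2}^{N}x_{k}z_{k}$ given earlier in this section; the function name `decompose` and the list-based representation are assumptions made only for this example. The orthogonal remainder returned here corresponds to $z_{\ominus}\mathbf{e}_{\ominus}$ in the text.

```python
import math


def decompose(x_tail, z_tail):
    """Split z_tail into a component along x_tail and an orthogonal remainder.

    x_tail: components 2..N of the parent (must not be the zero vector).
    z_tail: components 2..N of the vector to decompose.
    Returns (z_par, e_par, remainder) with
        z_tail = z_par * e_par + remainder  and  e_par . remainder = 0.
    """
    r = math.sqrt(sum(v * v for v in x_tail))
    e_par = [v / r for v in x_tail]  # unit vector in parental direction
    # Projection onto e_par, identical to (1/r) * sum_k x_k z_k.
    z_par = sum(e * z for e, z in zip(e_par, z_tail))
    remainder = [z - z_par * e for e, z in zip(e_par, z_tail)]
    return z_par, e_par, remainder


if __name__ == "__main__":
    z_par, e_par, rest = decompose([3.0, 4.0], [1.0, 2.0])
    print(z_par)                                      # component along e_par
    print(sum(e * v for e, v in zip(e_par, rest)))    # numerically zero
```

Because the remainder is orthogonal to $\mathbf{e}_{\odot}$, an inner product with another vector splits into the two summands that appear in the derivation above.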

$\mathrm{E}[{z_{\odot}^{(g)}}]$ can be calculated from the progress rate of the quadratic distance from the cone axis. It writes

[TABLE]

where $\langle q_{r}\rangle$ denotes the distance from the cone boundary of the centroid after projection (cf. Line 22 of Algorithm 1). Expressions for $\mathrm{E}[{{\langle q_{r}\rangle}_{\text{feas}}^{2}}]$ and $\mathrm{E}[{{\langle q_{r}\rangle}_{\text{infeas}}^{2}}]$ have already been derived in Appendix D in the supplementary material of Spettel and Beyer, 2018b. The Taylor approximation used in Equation (D.157) of that work allows using the square of Equation (D.165) for the feasible case yielding

[TABLE]

Similarly, the Taylor expansion used in Equation (D.172) of that work allows using the square of Equation (D.217) as an approximation for the infeasible case. It reads

[TABLE]

Using Equation 57 and Equation 58,

[TABLE]

follows, where a closed-form approximation

[TABLE]

has been derived in Spettel and Beyer, 2018b as well (refer to the derivations leading to Equation (C.149) in Appendix C in the supplementary material of that work for the details). With Equation 59 and Equation 60, $\varphi_{r^{2}}^{(g)}$ can be computed for a given state of the system. The goal is now to express $\varphi_{r^{2}}^{(g)}$ in terms of $\mathrm{E}[{z_{\odot}^{(g)}}]$ . Subsequently solving for $\mathrm{E}[{z_{\odot}^{(g)}}]$ then allows its value to be computed. Using Equation 55 and Equation 41 with Equation 43, $\varphi_{r^{2}}^{(g)}$ can alternatively be written as

[TABLE]

Equation 62 can be solved for $\mathrm{E}[z_{\odot}^{(g)}]$ yielding

[TABLE]

Reinsertion of Equation 54 and Equation 63 into Equation 50 yields

[TABLE]

3.1.4 Derivation of a Mean Value Difference Equation for $||\mathbf{s}||^{2}$

Using the update rule from Line 19 of Algorithm 1,

[TABLE]

can be derived. For treating ${\mathbf{s}^{(g)}}^{T}{\langle\tilde{\mathbf{z}}^{(g)}\rangle}$ , the vector $\mathbf{s}^{(g)}$ can be decomposed into a sum of vectors in direction of the cone axis $\mathbf{e}^{(g)}_{1}$ , in direction of the parental individual’s $2..N$ components $\mathbf{e}^{(g)}_{\odot}$ , and in a direction $\mathbf{e}^{(g)}_{\ominus}$ that is orthogonal to $\mathbf{e}^{(g)}_{1}$ and $\mathbf{e}^{(g)}_{\odot}$ . Formally, this can be written as

[TABLE]

where $||\mathbf{e}^{(g)}_{1}||=||\mathbf{e}^{(g)}_{\odot}||=||\mathbf{e}^{(g)}_{\ominus}||=1$ and ${\mathbf{e}^{(g)}_{1}}^{T}\mathbf{e}^{(g)}_{\odot}={\mathbf{e}^{(g)}_{1}}^{T}\mathbf{e}^{(g)}_{\ominus}={\mathbf{e}^{(g)}_{\odot}}^{T}\mathbf{e}^{(g)}_{\ominus}=0$ . $s_{1}^{(g)}$ , $s_{\odot}^{(g)}$ , and $s_{\ominus}^{(g)}$ denote the corresponding projections in those directions. Consequently,

[TABLE]

and subsequently

[TABLE]

follow. Taking into account the statistical independence between the cumulation path and a particular generation’s mutations allows writing

[TABLE]

Again, $\mathrm{E}[{\mathbf{e}^{(g)}_{\ominus}}^{T}{\langle\tilde{\mathbf{z}}^{(g)}\rangle}]$ vanishes because those mutations are selectively neutral and isotropic. Taking expectation of Equation 72, considering Equations 77, 27, 63 and 14, and using $\mathrm{E}[||{\langle\tilde{\mathbf{z}}^{(g)}\rangle}||^{2}]\simeq\frac{N}{\mu}$ ,

[TABLE]

follows.

3.1.5 Derivation of a Mean Value Difference Equation for $\sigma$

From the update rule of $\sigma$ in Line 20 of Algorithm 1, ${\sigma^{(g+1)}}={\sigma^{(g)}}\exp\left(\frac{||\mathbf{s}^{(g+1)}||^{2}-N}{2DN}\right)$ follows for the update of the mutation strength. Taking expected values and knowing that ${\sigma^{(g)}}$ is constant w.r.t. $||\mathbf{s}^{(g+1)}||^{2}$ , this writes $\overline{{\sigma^{(g+1)}}}=\overline{{\sigma^{(g)}}}\mathrm{E}\left[\exp\left(\frac{||\mathbf{s}^{(g+1)}||^{2}-N}{2DN}\right)\right]$ . Assuming that the fluctuations of $||\mathbf{s}^{(g+1)}||^{2}$ around its expected value are sufficiently small, the expected value can be pulled into the exponential function yielding

[TABLE]
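The mutation strength update stated above, $\sigma^{(g+1)}=\sigma^{(g)}\exp\left(\frac{||\mathbf{s}^{(g+1)}||^{2}-N}{2DN}\right)$, can be written as a small function. The following is a minimal sketch; the function names are hypothetical, and the second helper only illustrates, for a symmetric two-point fluctuation chosen for this example, how small the error is when the expectation is pulled into the exponential.

```python
import math


def csa_sigma_update(sigma, s_norm_sq, n, d):
    """One CSA step size update: sigma * exp((||s||^2 - N) / (2 D N))."""
    return sigma * math.exp((s_norm_sq - n) / (2.0 * d * n))


def jensen_gap(delta, n, d):
    """Illustrative gap between E[exp(.)] and exp(E[.]).

    Assumes ||s||^2 takes the values N + delta and N - delta with equal
    probability (an assumption of this sketch, not of the analysis), so
    that exp(E[.]) = 1 and E[exp(.)] = cosh(delta / (2 D N)).
    """
    a = delta / (2.0 * d * n)
    return math.cosh(a) - 1.0


if __name__ == "__main__":
    n = 400
    d = math.sqrt(n)  # D = 1/c with c = 1/sqrt(N)
    print(csa_sigma_update(1.0, 400.0, n, d))  # path length as expected: unchanged
    print(csa_sigma_update(1.0, 440.0, n, d))  # longer path: sigma increases
    print(jensen_gap(40.0, n, d))              # tiny for small fluctuations
```

The step size grows when the squared path length exceeds $N$ and shrinks otherwise; the damping $D$ controls how strongly a deviation from $N$ is translated into a change of $\sigma$.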

3.1.6 Summary of the Mean Value Difference Equations

[TABLE]

The mean value dynamics of the $(3/3_{I},10)$ -CSA-ES on the conically constrained problem are shown in Figure 3 for $N=400$ , $\xi=10$ , $c=\frac{1}{\sqrt{N}}$ , and $D=\frac{1}{c}$ . The agreement of the simulations and the derived expressions is satisfactory. In particular, one observes that the lines of the iteration with one-generation experiments are very similar to the lines generated by real ES runs. Consequently, the modeling of the system with Equations 86 to 100 is appropriate and the deviations for the theoretically derived expressions are mainly due to approximations in the derivations of the local progress rates. This is supported by the additional figures provided in the supplementary material (Appendix A). They show a larger deviation for smaller values of $\xi$ and smaller values of $N$ . But notice that in those figures the iteration with one-generation experiments for the local progress measures coincides well with the results of real ES runs. This again shows the appropriateness of the modeling in Equations 86 to 100. The deviations for small $N$ stem from asymptotic assumptions using $N\rightarrow\infty$ . These assumptions simplify the expressions and thereby make the theoretical analysis tractable. The deviations for small $\xi$ are due to approximations in the derivation of the offspring cumulative distribution function after the projection step in $x_{1}$ -direction $P_{Q}(q)$ (for the details, see Section 3.1.2.1.2.3 in Spettel and Beyer, 2018c, in particular the step from Equation (3.73) to Equation (3.74)).

For the figures, results of $100$ real runs of the ES have been averaged for generating the solid lines. The lines for the iteration by approximation have been computed by iterating the mean value iterative system (Equations 86 to 100) with Equations 19, 22 and 59 for ${\varphi^{(g)}_{x}}$ (and ${\varphi^{(g)}_{x}}^{*}$ ), ${\varphi^{(g)}_{r}}$ (and ${\varphi^{(g)}_{r}}^{*}$ ), and ${\varphi^{(g)}_{r^{2}}}$ , respectively. The lines for the iteration with one-generation experiments have been generated by iterating the system (Equations 86 to 100) and simulating ${\varphi^{(g)}_{x}}$ (and ${\varphi^{(g)}_{x}}^{*}$ ), ${\varphi^{(g)}_{r}}$ (and ${\varphi^{(g)}_{r}}^{*}$ ), and ${\varphi^{(g)}_{r^{2}}}$ . It can happen that in a generation of iterating the system (Equations 86 to 100), infeasible $(x^{(g)},r^{(g)})^{T}$ are created. In such circumstances, the corresponding $(x^{(g)},r^{(g)})^{T}$ have been projected back.
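The back-projection of an infeasible state $(x^{(g)},r^{(g)})^{T}$ mentioned above can be sketched as follows. The sketch assumes that the feasible region in the $(x,r)$-space is $r\leq x/\sqrt{\xi}$ with $r\geq 0$ (consistent with the cone boundary $r=x/\sqrt{\xi}$) and that repair is the minimal Euclidean distance projection onto that region. The function names are hypothetical and this is not claimed to be the exact routine used for the figures.

```python
import math


def is_feasible(x, r, xi):
    """Feasibility in (x, r)-space, assuming the cone r <= x / sqrt(xi)."""
    return x >= 0.0 and r <= x / math.sqrt(xi)


def project_to_cone(x, r, xi):
    """Orthogonal projection of (x, r), r >= 0, onto the feasible cone.

    Feasible points are returned unchanged. Infeasible points are mapped
    onto the boundary line r = x / sqrt(xi), or onto the apex (0, 0) when
    the closest boundary point would lie at negative x.
    """
    if is_feasible(x, r, xi):
        return x, r
    # Scaled coordinate along the boundary direction (sqrt(xi), 1).
    t = (x + r / math.sqrt(xi)) / (xi + 1.0)
    if t <= 0.0:
        return 0.0, 0.0
    return xi * t, math.sqrt(xi) * t


if __name__ == "__main__":
    print(project_to_cone(4.0, 1.0, 4.0))  # feasible, returned unchanged
    print(project_to_cone(0.0, 2.0, 1.0))  # infeasible, lands on r = x / sqrt(xi)
```

After the projection the point satisfies $r=x/\sqrt{\xi}$, i.e., it lies exactly on the cone boundary.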

3.2 Behavior of the ES in the Steady State

The goal of this section is to derive approximate closed-form expressions for the steady state values of the mean value iterative system that is summarized in Section 3.1.6. A working ES should steadily decrease $x$ and $r$ (Equation 86 and Equation 88, respectively) in order to move towards the optimizer. For determining the steady state normalized mutation strength value, the fixed point of the system of non-linear equations (Equations 90 to 100) is to be computed.

3.2.1 Derivations Towards Closed-Form Steady State Expressions

This section comprises a first step towards closed-form approximations for the steady state values of the system summarized in Section 3.1.6. Expressions are derived that finally lead to a steady state equation for the normalized mutation strength. A closed form solution of this equation is not apparent. Hence, further assumptions for different cases are considered in the following sections.

To compute the fixed point of the system described by Equations 90 to 100, stationary state expressions ${\varphi_{x}}_{ss}^{*}$ , ${\varphi_{r}}_{ss}^{*}$ , $\left(-\frac{N\varphi_{x}^{(g)}}{\overline{{\sigma^{(g)}}^{*}}\,\overline{r^{(g)}}}\right)_{ss}$ , and $\left(-\frac{N\varphi_{r^{2}}^{(g)}}{2{\overline{{\sigma^{(g)}}^{*}}}\,{{\overline{r^{(g)}}}^{2}}}\right)_{ss}$ for ${\varphi_{x}^{(g)}}^{*}$ , ${\varphi_{r}^{(g)}}^{*}$ , $-\frac{N\varphi_{x}^{(g)}}{\overline{{\sigma^{(g)}}^{*}}\,\overline{r^{(g)}}}$ , and $-\frac{N\varphi_{r^{2}}^{(g)}}{2{\overline{{\sigma^{(g)}}^{*}}}\,{{\overline{r^{(g)}}}^{2}}}$ , respectively, need to be derived first because they are dependent on the position in the parameter space. The bottom left subplot of Figure 3 shows that the ES moves in the vicinity of the cone boundary in the steady state. This can be seen because the dynamics of $x$ and $r$ are plotted by converting them into each other for the cone boundary case. Notice that those lines coincide in the steady state. In this situation, $P_{\text{feas}}\simeq 0$ for $N\rightarrow\infty$ . This follows from Equation 23. By the cone boundary equation (Equation 2), a parental individual $(x^{(g)},r^{(g)})^{T}$ is on the cone boundary for $r^{(g)}=x^{(g)}/\sqrt{\xi}$ . Using this together with Equation 14 and Equation 23 yields

[TABLE]

By taking into account Equation 17,

[TABLE]

follows. If $N{\sigma^{(g)}}^{*}$ is sufficiently large, $P_{\text{feas}}\simeq 0$ .

For the distance ratio $\frac{r^{(g)}}{x^{(g)}}$ , one observes that it approaches a stationary state value $\left(\frac{r}{x}\right)_{ss}:=\lim_{g\rightarrow\infty}\frac{r^{(g)}}{x^{(g)}}$ . This can be expressed with the condition $\frac{r^{(g)}}{x^{(g)}}=\frac{r^{(g+1)}}{x^{(g+1)}}=\left(\frac{r}{x}\right)_{ss}$ for sufficiently large values of $g$ . Making use of the progress rates (Equations 10 to 13), $\left(\frac{r}{x}\right)_{ss}=\left(\frac{r}{x}\right)_{ss}\frac{\left(1-\frac{{\varphi_{r}}_{ss}^{*}}{N}\right)}{\left(1-\frac{{\varphi_{x}}_{ss}^{*}}{N}\right)}$ follows, which implies

[TABLE]

The normalized mutation strength should be constant on average in the steady state for a continuous decrease towards the optimizer. That is, the definition of the steady state normalized mutation strength reads $\sigma_{ss}^{*}:=\lim_{g\rightarrow\infty}{\sigma^{(g)}}^{*}$ . Expressed as a condition, it can be stated as ${\sigma^{(g)}}^{*}={\sigma^{(g+1)}}^{*}=\sigma_{ss}^{*}$ .

Considering the case of $P_{\text{feas}}\simeq 0$ , use of the infeasible case approximations (the infeasible part of Equation 19 and the infeasible part of Equation 22) for handling Equation 103 results in

[TABLE]

This can subsequently be rewritten to

[TABLE]

For $P_{\text{feas}}\simeq 0$ , the infeasible case approximations can be used. Insertion of Equation 107 into the infeasible part of Equation 19 assuming the expected $\sigma_{ss}^{*}$ steady state together with Equation 103 and $\frac{1}{\xi}\sqrt{\frac{1+\frac{{{\sigma_{ss}^{*}}}^{2}}{2N}}{1+\frac{{{\sigma_{ss}^{*}}}^{2}}{N}}}\simeq\frac{1}{\xi}$ yields

[TABLE]

$\sqrt{\frac{1+\frac{{{\sigma_{ss}^{*}}}^{2}}{\mu N}}{1+\frac{{{\sigma_{ss}^{*}}}^{2}}{N}}}\simeq 1$ for $N\gg{\sigma_{ss}^{*}}^{2}$ has been used from Equation 111 to Equation 113. In addition, a Taylor expansion with cut-off after the linear term has been applied to $\sqrt{1+\frac{{{\sigma_{ss}^{*}}}^{2}}{\mu N}}$ .

A steady state expression for $-\frac{N\varphi_{x}^{(g)}}{\overline{{\sigma^{(g)}}^{*}}\,\overline{r^{(g)}}}$ is derived next. With Equation 12 and Equation 115,

[TABLE]

can be derived. Use of Equation 107 for the fraction $\left(\frac{x}{r}\right)_{ss}$ results in

[TABLE]

Similarly, a steady state expression for $-\frac{N\varphi_{r^{2}}^{(g)}}{2{\overline{{\sigma^{(g)}}^{*}}}\,{{\overline{r^{(g)}}}^{2}}}$ can be derived. Considering the infeasible case (because in the steady state $P_{\text{feas}}\simeq 0$ ) of Equation 59, we have

[TABLE]

According to Lines 21 and 4 of Algorithm 1, $\mathrm{E}\left[{\langle q\rangle}\right]=\mathrm{E}\left[x^{(g+1)}\right]=\overline{x^{(g+1)}}$ . Hence, Equation 118 can be rewritten using Equation 15 for the infeasible $\langle q\rangle$ case $\mathrm{E}\left[{\langle q\rangle}_{\text{infeas}}\right]$ and Equation 107 for $\left(\frac{x^{2}}{\xi r^{2}}\right)_{ss}$ , resulting in

[TABLE]

Using Equation 117 and Equation 119, steady state expressions for Equations 90 to 100 can be derived. Requiring $\overline{s_{1}^{(g+1)}}=\overline{s_{1}^{(g)}}={s_{1}}_{ss}$ in Equation 90 using Equation 117 yields

[TABLE]

Analogously, requiring $\overline{s_{\odot}^{(g+1)}}=\overline{s_{\odot}^{(g)}}={s_{\odot}}_{ss}$ in Equation 93 using Equation 119 results in

[TABLE]

In the same way, setting $\overline{||\mathbf{s}^{(g+1)}||^{2}}=\overline{||\mathbf{s}^{(g)}||^{2}}=||\mathbf{s}||^{2}_{ss}$ in Equation 96 using Equation 117 and Equation 119 gives

[TABLE]

For the mutation strength,

[TABLE]

follows from Line 20 of Algorithm 1 with the use of Equation 14. Rewriting Equation 123 and using Equation 42 together with Equation 43 for the fraction $r^{(g)}/r^{(g+1)}$ , we have

[TABLE]

Use of the Taylor expansion $\exp(x)\simeq 1+x$ (around zero and neglecting terms of quadratic and higher order) results in

[TABLE]

Computing the expectation of Equation 126 and requiring $\overline{{\sigma^{(g+1)}}^{*}}=\overline{{\sigma^{(g)}}^{*}}=\sigma_{ss}^{*}$ , we get

[TABLE]

Usage of Equation 63 together with the steady state expression derived in Equation 119 for $\mathrm{E}[z_{\odot}]$ results in

[TABLE]

Consideration of Equations 115, 120, 121 and 122 allows numerically solving Equation 129 for $\sigma_{ss}^{*}$ .

3.2.2 Derivation of Closed-Form Approximations for the Steady State with the Assumptions $c=O\left(\frac{1}{\sqrt{N}}\right)$ and $N\rightarrow\infty$

The goal of this section is to simplify the expressions derived in Section 3.2.1 further using additional asymptotic assumptions in order to arrive at closed-form steady state approximations.

The expression derived for $\left(-\frac{N{\varphi_{r^{2}}}}{2{\sigma^{*}}{{r}^{2}}}\right)_{ss}$ as Equation 119 is simplified further yielding

[TABLE]

In Equation 131, ${\sigma^{*}}_{ss}N\gg{{\varphi_{x}^{*}}_{ss}}^{2}$ has been assumed and therefore the second summand has been neglected.

Insertion of Equation 131 into Equation 129 replacing $-\frac{N}{2\sigma_{ss}^{*}}\left[1-\left(1-\frac{{\varphi_{x}^{*}}_{ss}}{N}\right)^{2}\right]{}$ yields (after simplification)

[TABLE]

for the steady state mutation strength equation. Equation 131 can also be inserted into Equation 121 replacing $-\frac{N}{2\sigma_{ss}^{*}}\left[1-\left(1-\frac{{\varphi_{x}^{*}}_{ss}}{N}\right)^{2}\right]{}$ . This results in

[TABLE]

With the assumptions $N\rightarrow\infty$ and $c=O\left(\frac{1}{\sqrt{N}}\right)$ , the expression $\frac{{\varphi_{x}^{*}}_{ss}}{N}+\frac{{\sigma_{ss}^{*}}^{2}}{2\mu N}-\frac{c{\varphi_{x}^{*}}_{ss}}{N}-\frac{c{\sigma_{ss}^{*}}^{2}}{2\mu N}$ is an order of magnitude smaller than $c$ and can therefore be neglected w.r.t. $c$ . Hence, Equation 134 simplifies to

[TABLE]

Similarly, Equation 131 inserted into Equation 122 replacing $-\frac{N}{2\sigma_{ss}^{*}}\left[1-\left(1-\frac{{\varphi_{x}^{*}}_{ss}}{N}\right)^{2}\right]{}$ results in

[TABLE]

Insertion of Equation 135 and Equation 120 into Equation 136 yields (after straight-forward simplification)

[TABLE]

$\xi\left(\frac{1+\frac{{\sigma_{ss}^{*}}^{2}}{N}}{1+\frac{{\sigma_{ss}^{*}}^{2}}{\mu N}}\right)\simeq\xi$ for $N\rightarrow\infty$ allows writing

[TABLE]

From Equation 138 to Equation 139, ${\varphi_{x}^{*}}_{ss}$ has been substituted by Equation 115, its square has been calculated, and the resulting expression has been simplified.

With this, insertion of Equation 115 and Equation 139 into Equation 132 yields the quadratic equation

[TABLE]

for the steady state normalized mutation strength equation. By solving Equation 140 for the positive root (because $\sigma_{ss}^{*}>0$ ) with subsequent simplification of the result we get

[TABLE]

as an asymptotic ( $N\rightarrow\infty$ ) closed-form expression for the steady state normalized mutation strength. Insertion of $c=1/\sqrt{N}$ and $D=1/c=\sqrt{N}$ into Equation 141 results in the expression

[TABLE]

Assuming $N\rightarrow\infty$ and $\frac{\xi}{\sqrt{N}}\rightarrow 0$ allows a further asymptotic simplification of Equation 142 (neglecting $\frac{1}{\sqrt{N}}$ , $\frac{\xi}{N}$ , $\frac{1}{N}$ , $\frac{2\xi}{\sqrt{N}}$ , and $\frac{\xi}{\sqrt{N}}$ ) resulting in

[TABLE]

For sufficiently large $\xi$ , $\sqrt{\xi+1}\simeq\sqrt{\xi+2}$ , and Equation 143 writes $\sigma_{ss}^{*}\simeq 2\mu c_{\mu/\mu,\lambda}$ . Back-insertion of Equation 141 (or Equation 143) into Equations 107, 115, 120, 121 and 122 allows calculating the steady state distance from the cone boundary, the normalized steady state progress, ${s_{1}}_{ss}$ , ${s_{\odot}}_{ss}$ , and $||\mathbf{s}||^{2}_{ss}$ .
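To make the large-$\xi$ limit $\sigma_{ss}^{*}\simeq 2\mu c_{\mu/\mu,\lambda}$ concrete, the progress coefficient can be evaluated numerically. The sketch below assumes the standard integral definition of $c_{\mu/\mu,\lambda}$ from the ES progress rate theory, $c_{\mu/\mu,\lambda}=\frac{\lambda-\mu}{2\pi}\binom{\lambda}{\mu}\int_{-\infty}^{\infty}e^{-t^{2}}\Phi(t)^{\lambda-\mu-1}(1-\Phi(t))^{\mu-1}\,\mathrm{d}t$, which is not restated in this section; the function names, the integration bounds, and the trapezoidal rule are choices made for this example only.

```python
import math


def std_normal_cdf(t):
    """Cumulative distribution function of the standard normal distribution."""
    return 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))


def progress_coefficient(mu, lam, lo=-8.0, hi=8.0, steps=20000):
    """Numerically evaluate c_{mu/mu,lambda} with the trapezoidal rule.

    Uses the integral definition quoted in the lead-in (an assumption of
    this sketch). The integrand is negligible outside [lo, hi].
    """
    h = (hi - lo) / steps
    total = 0.0
    for i in range(steps + 1):
        t = lo + i * h
        phi = std_normal_cdf(t)
        f = math.exp(-t * t) * phi ** (lam - mu - 1) * (1.0 - phi) ** (mu - 1)
        total += f if 0 < i < steps else 0.5 * f
    return (lam - mu) / (2.0 * math.pi) * math.comb(lam, mu) * total * h


if __name__ == "__main__":
    mu, lam = 3, 10
    c = progress_coefficient(mu, lam)
    # Large-xi limit of the steady state normalized mutation strength.
    print(c, 2 * mu * c)
```

For $\mu=3$ and $\lambda=10$ this evaluates to a value of about $6.39$ for $2\mu c_{\mu/\mu,\lambda}$.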

Figure 4 shows plots of the steady state computations. Results computed by Equation 141 have been compared to real ES runs. The values for the points denoting the approximations have been determined by computing the normalized steady state mutation strength $\sigma_{ss}^{*}$ using Equation 141 for different values of $\xi$ . The results for $\varphi_{x}^{*}$ and $\varphi_{r}^{*}$ have been determined by using the computed steady state $\sigma_{ss}^{*}$ values with Equation 115. The approximations for $\left(\frac{x}{\sqrt{\xi}r}\right)_{ss}$ have been determined by evaluating Equation 107. The values for the points denoting the experiments have been determined by computing the averages of the particular values in real ES runs. The figures show that the derived expressions get better for larger values of $\xi$ and $N$ . Again, the deviations for small $\xi$ are due to approximations in the derivation of the local progress rates. The deviations for small $N$ stem from the use of asymptotic assumptions $N\rightarrow\infty$ .

3.2.3 Derivation of Closed-Form Approximations for the Steady State with the Assumptions $c=O\left(\frac{1}{N}\right)$ and $N\rightarrow\infty$

In Section 3.2.2 it has been assumed that $c=O(\frac{1}{\sqrt{N}})$ from Equation 134 to Equation 135. This section presents a derivation for the case $c=O(\frac{1}{N})$ . To this end, Equation 133 is rewritten to

[TABLE]

With $c=O(\frac{1}{N})$ , we have $\frac{1}{c}\gg 1$ . This together with the assumption $|{\varphi_{x}^{*}}_{ss}|\ll\frac{{\sigma_{ss}^{*}}^{2}}{2\mu}$ allows rewriting Equation 144 to

[TABLE]

With the additional assumption $2\mu\ll\frac{{\sigma_{ss}^{*}}^{2}}{cN}$ , Equation 145 simplifies to

[TABLE]

Insertion of Equation 146 and Equation 120 into Equation 136 yields

[TABLE]

In the step from Equation 147 to Equation 148, $\xi\left(\frac{1+\frac{{\sigma_{ss}^{*}}^{2}}{N}}{1+\frac{{\sigma_{ss}^{*}}^{2}}{\mu N}}\right)\simeq\xi$ for $N\rightarrow\infty$ has been used.

Insertion of Equation 115 and Equation 148 into Equation 132 yields

[TABLE]

for the steady state mutation strength equation. By assuming $c\ll 1$ , Equation 149 simplifies. Together with grouping the powers of $\sigma_{ss}^{*}$ it writes

[TABLE]

Introducing common denominators allows rewriting Equation 150 to

[TABLE]

Simplification of Equation 151 using $cD=1$ and $cN\simeq 1$ results in

[TABLE]

Note that multiplying Equation 152 by $\sigma_{ss}^{*}$ results in a cubic equation that can be solved. However, the expressions for the closed-form solutions are rather long. Hence, a quadratic equation is aimed for. To this end, Equation 152 is approximated quadratically. Neglecting ${\sigma_{ss}^{*}}^{-1}\left(\frac{\mu c_{\mu/\mu,\lambda}}{\sqrt{1+\xi}}\right)$ and ${\sigma_{ss}^{*}}\left(\frac{-c_{\mu/\mu,\lambda}}{\sqrt{1+\xi}(1+\xi)}\right)$ in Equation 152 (plots of further approximations are presented in the supplementary material, Appendix B) results in

[TABLE]

Solving Equation 153 for the positive root with subsequent simplification yields

[TABLE]

Figure 5 shows plots of the steady state computations. Results computed by Equation 154 have been compared to real ES runs. The values for the points denoting the approximations have been determined by computing the normalized steady state mutation strength $\sigma_{ss}^{*}$ using Equation 154 for different values of $\xi$ . The results for $\varphi_{x}^{*}$ and $\varphi_{r}^{*}$ have been determined by using the computed steady state $\sigma_{ss}^{*}$ values with Equation 115. The approximations for $\left(\frac{x}{\sqrt{\xi}r}\right)_{ss}$ have been determined by evaluating Equation 107. The values for the points denoting the experiments have been determined by computing the averages of the particular values in real ES runs.

4 Conclusions

In this work, the $(\mu/\mu_{I},\lambda)$ -CSA-ES has been theoretically analyzed. For this, a mean value iterative system has been introduced and compared to real ES runs. Based on this derived system, steady state expressions have been derived and compared to ES simulations.

The comparison of the mean value iterative system summarized in Section 3.1.6 with real ES runs shows a satisfactory agreement of the theory and simulations for large $\xi$ and large $N$ (see Figure 3). The deviations for small $N$ are due to the asymptotic assumptions $N\rightarrow\infty$ that are used in the derivations of the microscopic and macroscopic aspects of the ES. They are used to simplify the expressions and thus make a theoretical analysis tractable. The deviations for small $\xi$ stem from the derivation of the offspring cumulative distribution function after the projection step in $x_{1}$ -direction $P_{Q}(q)$ (for the details, see Section 3.1.2.1.2.3 in Spettel and Beyer, 2018c, in particular the step from Equation (3.73) to Equation (3.74)). The same observations regarding the deviations can be made for the derived steady state expressions (see Figures 4 and 5).

For the steady state derivations, it is of particular interest to compare the results obtained in this work for the CSA-ES with the results obtained for the $\sigma$ SA-ES. The $(\mu/\mu_{I},\lambda)$ - $\sigma$ SA-ES has been theoretically analyzed by Spettel and Beyer, 2018b applied to the same conically constrained problem. In that work, the microscopic and macroscopic aspects of the $(\mu/\mu_{I},\lambda)$ - $\sigma$ SA-ES have been investigated. For the microscopic aspects, expressions for the local progress for $x$ and $r$ and the self-adaptation response (SAR) function have been derived using asymptotic assumptions. Those results have then been used for the macroscopic analysis. The mean value dynamics generated by iteration using those local measures have been compared to real runs. In addition, steady state expressions have been derived and discussed. They show that the $\sigma$ SA-ES is able to achieve sufficiently high mutation strengths to keep the progress almost constant for increasing $\xi$ . Surprisingly, for the CSA-ES, the choice of the cumulation parameter $c$ has a qualitative influence on the behavior.

Considering the choice of $c=1/\sqrt{N}$ proposed in early publications on CMA-ES (Hansen and Ostermeier, 1997), the steady state mutation strengths attained flatten with increasing $\xi$ . As a consequence, the steady state progress decreases with higher values of $\xi$ . This can be seen by considering Equation 143 that leads to $\sigma_{ss}^{*}\simeq 2\mu c_{\mu/\mu,\lambda}$ for sufficiently large $\xi$ . For $\mu=3$ and $\lambda=10$ , this results in a steady state normalized mutation strength of approximately $6.39$ . Note that this value corresponds to the approximations for the larger values of $\xi$ shown in Figure 4 (right-most column, $N=10000$ ). Equation 143 can be inserted into the steady state progress rate (Equation 115) yielding

[TABLE]

From the simplified result of Equation 155 one immediately notices that $\varphi^{*}_{ss}\rightarrow 0$ for $\xi\rightarrow\infty$ (respecting $\frac{\xi}{\sqrt{N}}\rightarrow 0$ that was used in the derivations leading to Equation 143). This is exactly what one sees in Figure 4, the stationary state progress decreases with increasing $\xi$ .

In contrast, for the case $c=O\left(\frac{1}{N}\right)$ that is proposed in newer publications ( $\frac{4}{N+4}$ by Hansen and Ostermeier (2001) or $\frac{\mu+2}{N+\mu+5}$ by Hansen (2016), both of which are in $O\left(\frac{1}{N}\right)$ for $\mu\ll N$ ), the steady state mutation strength increases with increasing $\xi$ . It is therefore able to achieve a constant progress rate for increasing $\xi$ . The steady state progress is less than that of the $\sigma$ SA-ES. The increase of $\sigma_{ss}^{*}$ with increasing $\xi$ explains the increasing deviations of the approximation from the simulations. In the derivations leading to Equation 44, it has been assumed that $\mu N\gg{{\sigma^{(g)}}^{*}}^{2}$ . As the steady state $\sigma^{*}$ increases with $\xi$ , $N$ must be increased in order to have the same approximation quality for higher values of $\xi$ . This can be explained more formally. Assuming large $\xi$ , $(1+\xi)/(2+\xi)\simeq 1$ holds in Equation 154. Hence,

[TABLE]

follows, which, for large $\xi$ , is of the same order as the one of the $\sigma$ SA-ES (see Eq. (62) in Spettel and Beyer, 2018b). While it is common practice to use $c=O\left(\frac{1}{N}\right)$ since the seminal CMA-ES paper (Hansen and Ostermeier, 2001), this is the first theoretical result that shows an advantage of the $O\left(\frac{1}{N}\right)$ choice.
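The different scalings of the cumulation parameter discussed above can be compared with a few lines of code. The sketch evaluates the three choices named in the text, $1/\sqrt{N}$, $\frac{4}{N+4}$, and $\frac{\mu+2}{N+\mu+5}$, and prints $cN$ to show which of them stay in $O\left(\frac{1}{N}\right)$; the function name and the dictionary keys are hypothetical.

```python
import math


def cumulation_parameters(n, mu):
    """Return the three cumulation parameter choices named in the text."""
    return {
        "1/sqrt(N)": 1.0 / math.sqrt(n),
        "4/(N+4)": 4.0 / (n + 4.0),
        "(mu+2)/(N+mu+5)": (mu + 2.0) / (n + mu + 5.0),
    }


if __name__ == "__main__":
    mu = 3
    for n in (100, 400, 10000):
        for name, c in cumulation_parameters(n, mu).items():
            # c * N stays bounded for the O(1/N) choices and grows like
            # sqrt(N) for c = 1/sqrt(N).
            print(n, name, c, c * n)
```

For $N=400$ and $\mu=3$, the choice $1/\sqrt{N}$ gives $c=0.05$, whereas the two $O\left(\frac{1}{N}\right)$ choices give values close to $0.01$.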

A further aspect for discussion is the special case of the non-recombinative $(1,\lambda)$ -CSA-ES that is contained in the derivations for the $(\mu/\mu_{I},\lambda)$ -CSA-ES. The iterative system for the non-recombinative ( $\mu=1$ ) case can be derived analogously to the multi-recombinative ( $\mu>1$ ) case. The resulting equations differ in the expressions for $\varphi_{x}^{(g)}$ , $\varphi_{r}^{(g)}$ , and $\varphi_{r^{2}}^{(g)}$ . It has been investigated further (not shown here) and for $N^{2}(1+1/\xi)\gg{{\sigma^{(g)}}^{*}}^{2}$ the mean value iterative systems of the $(1,\lambda)$ -CSA-ES and the $(\mu/\mu_{I},\lambda)$ -CSA-ES agree. An interesting observation between the case $\mu=1$ and the case $\mu>1$ is the evolution near the cone boundary. Whereas the CSA-ES with $\mu=1$ evolves on the boundary, the CSA-ES with $\mu>1$ attains a certain steady state distance from the boundary (cf. the bottom subfigures of Figure 4 and Figure 5). Considering a parental individual on the cone boundary, the offspring are infeasible with overwhelming probability for sufficiently large $N$ . Hence, they are repaired by projection and are on the boundary after projection. In particular, the best of them is on the boundary. Therefore, for $\mu=1$ , the ES evolves on the boundary. For $\mu>1$ , the centroid computation after projection results in offspring that are inside the feasible region.

To conclude the paper, topics for future work are outlined. In addition to the $\sigma$ SA and the CSA for the $\sigma$ control mechanism, it is of interest to investigate the behavior of Meta-ESs applied to the conically constrained problem. Comparison of the repair by projection approach with other repair methods is another topic for further research. Analysis of ESs applied to other constrained problems is another research direction for the future.

Acknowledgments

This work was supported by the Austrian Science Fund FWF under grant P29651-N32.

Appendix A Additional Results Comparing the Derived Approximations with Simulations

Figures 6 to 11 show the mean value dynamics of the $(3/3_{I},10)$ -CSA-ES applied to the conically constrained problem with different parameters as indicated in the title of the subplots. The plots are organized into three rows and two columns. The first two rows show the $x$ (first row, first column), $r$ (first row, second column), $\sigma$ (second row, first column), and $\sigma^{*}$ (second row, second column) dynamics. The third row shows $x$ and $r$ converted into each other by $\sqrt{\xi}$ . The third row shows that after some initial phase, the ES transitions into a stationary state. In this steady state, the ES moves along the cone boundary. This becomes clear in the plots because the equation for the cone boundary is $r=x/\sqrt{\xi}$ or equivalently $x=r\sqrt{\xi}$ . In the first two rows, the solid line has been generated by averaging $100$ real runs of the ES. The dashed line has been determined by iterating the mean value iterative system of Section 3.1.6 with one-generation experiments for ${\varphi^{(g)}_{x}}$ , ${\varphi^{(g)}_{x}}^{*}$ , ${\varphi^{(g)}_{r}}$ , ${\varphi^{(g)}_{r}}^{*}$ , and $\varphi_{r^{2}}^{(g)}$ . The dotted lines have been computed by iterating the mean value iterative system with the derived approximations as indicated in the derivations leading to the equations in Section 3.1.6 for ${\varphi^{(g)}_{x}}$ , ${\varphi^{(g)}_{x}}^{*}$ , ${\varphi^{(g)}_{r}}$ , ${\varphi^{(g)}_{r}}^{*}$ , and $\varphi_{r^{2}}^{(g)}$ . Due to the approximations used it is possible that in a generation $g$ , the iteration of the mean value iterative system yields infeasible $(x^{(g)},r^{(g)})^{T}$ . In such cases, the particular $(x^{(g)},r^{(g)})^{T}$ have been projected back and projected values have been used in the further iterations.

Appendix B Further Investigations Considering the Derivation of Closed-Form Approximations for the Steady State with the Assumptions $c=O\left(\frac{1}{N}\right)$ and $N\rightarrow\infty$

By plotting the left-hand side of Equation 152 for different parameters, one observes that for the values of interest ( ${\sigma_{ss}^{*}}>0$ ), the function appears to be approximately quadratic.

Inspired by that, a Taylor expansion around $a>0$ is performed up to and including the quadratic term. This yields a quadratic equation that can be solved for $\sigma_{ss}^{*}>0$ . Figure 12 shows plots of the steady state computations with this approximation compared to real ES runs. The values for the points denoting the approximations have been determined by computing the normalized steady state mutation strength $\sigma_{ss}^{*}$ as the solution of this quadratic equation. The results for $\varphi_{x}^{*}$ and $\varphi_{r}^{*}$ have been determined by using the computed steady state $\sigma_{ss}^{*}$ values with Equation 115. The approximations for $\left(\frac{x}{\sqrt{\xi}r}\right)_{ss}$ have been determined by evaluating Equation 107. The values for the points denoting the experiments have been determined by computing the averages of the particular values in real ES runs.
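The general technique (second-order Taylor expansion around a point $a$ , followed by solving the resulting quadratic for a positive root) can be sketched as below. This is only an illustration under stated assumptions: `g` is a hypothetical stand-in for the left-hand side of Equation 152, which is not reproduced here, and the derivatives are estimated by central finite differences rather than derived analytically as in the paper. The function names are hypothetical as well.

```python
import math


def quadratic_taylor_coeffs(g, a, h=1e-3):
    """Coefficients (c0, c1, c2) of c0 + c1*u + c2*u**2 with u = s - a.

    The derivatives of g at a are estimated by central differences
    (an assumption for this sketch; the paper works analytically).
    """
    g0 = g(a)
    g1 = (g(a + h) - g(a - h)) / (2.0 * h)
    g2 = (g(a + h) - 2.0 * g0 + g(a - h)) / (h * h)
    return g0, g1, 0.5 * g2


def positive_roots(g, a, h=1e-3):
    """Positive roots s of the quadratic Taylor model of g around a."""
    c0, c1, c2 = quadratic_taylor_coeffs(g, a, h)
    if abs(c2) < 1e-14:
        return []  # model is not quadratic; this sketch does not handle it
    disc = c1 * c1 - 4.0 * c2 * c0
    if disc < 0.0:
        return []  # no real root of the quadratic model
    sqrt_disc = math.sqrt(disc)
    candidates = [a + (-c1 + sign * sqrt_disc) / (2.0 * c2) for sign in (1.0, -1.0)]
    return sorted(s for s in candidates if s > 0.0)


def example_g(s):
    """Toy function (exactly quadratic) with roots at s = 2 and s = -1."""
    return s * s - s - 2.0
```

Calling `positive_roots(example_g, 1.5)` returns a single value close to 2, since the toy function is exactly quadratic and its second root is negative. For a function that is only approximately quadratic, the quality of the root depends on the choice of the expansion point $a$ .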

Another approach to arrive at a simpler approximate form is to neglect terms already in Equation 152. Neglecting ${\sigma_{ss}^{*}}^{-1}\left(\frac{\mu c_{\mu/\mu,\lambda}}{\sqrt{1+\xi}}\right)$ in Equation 152 results in

[TABLE]

Solving Equation B.1 for the positive root yields

[TABLE]

Figure 13 shows plots of the steady state computations. Results computed by Equation B.2 have been compared to real ES runs. The values for the points denoting the approximations have been determined by computing the normalized steady state mutation strength $\sigma_{ss}^{*}$ using Equation B.2 for different values of $\xi$ . The results for $\varphi_{x}^{*}$ and $\varphi_{r}^{*}$ have been determined by using the computed steady state $\sigma_{ss}^{*}$ values with Equation 115. The approximations for $\left(\frac{x}{\sqrt{\xi}r}\right)_{ss}$ have been determined by evaluating Equation 107. The values for the points denoting the experiments have been determined by computing the averages of the particular values in real ES runs.

