Stochastic Multi-objective Optimization on a Budget: Application to multi-pass wire drawing with quantified uncertainties

Piyush Pandita; Ilias Bilionis; Jitesh Panchal; B.P. Gautham; Amol Joshi; Pramod Zagade

arXiv:1706.01665·math.OC·June 20, 2019

TL;DR

This paper advances Bayesian global optimization for multi-objective problems with uncertainties by reformulating the expected improvement over the dominated hypervolume, enabling efficient optimization without estimating stochastic parameters.

Contribution

It introduces a systematic reformulation of EIHV for stochastic MOO, allowing noise filtering and confidence characterization without stochastic parameter estimation.

Findings

1. Successfully applied to synthetic test problems with known solutions.
2. Demonstrated effectiveness on industrial steel wire drawing process.
3. Enhanced optimization efficiency under parametric uncertainties.

Equations

$$O_{i}(\mathbf{x})\geq O_{i}(\mathbf{x}^{\prime}),\quad\forall i=1,\dots,m.$$

$$\mathbf{O}[X]=\left\{\mathbf{y}\in\mathbb{R}^{m}:\exists\,\mathbf{x}\in X,\ \mathbf{y}=\mathbf{O}(\mathbf{x})\right\},$$

$$\mathbf{O}[X]\subset[\mathbf{r},\infty).$$

$$A[B]:=\left\{\mathbf{y}\in[\mathbf{r},\infty):\exists\,\mathbf{y}^{\prime}\in B,\ \mathbf{y}^{\prime}\geq\mathbf{y}\right\},$$

$$A_{O}:=A[\mathbf{O}[X]].$$

$$P[B]:=\left\{\mathbf{y}\in B:\left\{\mathbf{y}^{\prime}\in B:\mathbf{y}^{\prime}\geq\mathbf{y},\ \mathbf{y}^{\prime}\neq\mathbf{y}\right\}=\emptyset\right\}.$$

$$P[B]=\partial A[B]\setminus\bigcup_{i=1}^{m}\left\{\mathbf{r}+t\left(\max_{\mathbf{y}\in B}y_{i}\right)\mathbf{e}_{i}\right\},$$

$$P_{O}:=P[\mathbf{O}[X]].$$

$$\mathbf{x}_{1:n}=(\mathbf{x}_{1},\dots,\mathbf{x}_{n})\in X^{n},$$

$$\mathbf{y}_{1:n}=(\mathbf{y}_{1},\dots,\mathbf{y}_{n}).$$

$$f^{e}\mid\theta^{e}\sim\operatorname{GP}(0,k),$$

$$f^{e}_{1:n}\mid\mathbf{x}_{1:n},\theta^{e}\sim\mathcal{N}\left(0,k(\mathbf{x}_{1:n},\theta^{e})\right),$$

$$k(\mathbf{x},\mathbf{x}^{\prime},\theta^{e})=s^{2}\exp\left\{-\sqrt{3\sum_{j=1}^{d}\frac{(x_{j}-x_{j}^{\prime})^{2}}{\ell_{j}^{2}}}\right\}\left(1+\sqrt{3\sum_{j=1}^{d}\frac{(x_{j}-x_{j}^{\prime})^{2}}{\ell_{j}^{2}}}\right),$$

$$p(y_{1:n}\mid\mathbf{x}_{1:n},\theta^{e})=\mathcal{N}\left(y_{1:n}\mid 0,k(\mathbf{x}_{1:n},\theta^{e})+\nu^{2}I_{n}\right),$$

$$f^{e}\mid\mathbf{x}_{1:n},y_{1:n},\theta^{e}\sim\operatorname{GP}(\mu_{n},k_{n}),$$

$$\mu_{n}(\mathbf{x};\theta^{e})=k(\mathbf{x},\mathbf{x}_{1:n},\theta^{e})\left[k(\mathbf{x}_{1:n},\theta^{e})+\nu^{2}I_{n}\right]^{-1}y_{1:n},$$

$$k_{n}(\mathbf{x},\mathbf{x}^{\prime},\theta^{e})=k(\mathbf{x},\mathbf{x}^{\prime},\theta^{e})-k(\mathbf{x},\mathbf{x}_{1:n},\theta^{e})\left[k(\mathbf{x}_{1:n},\theta^{e})+\nu^{2}I_{n}\right]^{-1}k(\mathbf{x}_{1:n},\mathbf{x}^{\prime},\theta^{e})$$

$$f^{e}(\mathbf{x})\mid\mathbf{x}_{1:n},y_{1:n},\theta^{e}\sim\mathcal{N}\left(\mu_{n}(\mathbf{x};\theta^{e}),\sigma_{n}^{2}(\mathbf{x};\theta^{e})\right),$$

$$\mathcal{L}(\theta^{e})=-\frac{1}{2}y_{1:n}^{T}\left[k(\mathbf{x}_{1:n},\theta^{e})+\nu^{2}I_{n}\right]^{-1}y_{1:n}-\frac{1}{2}\log\det\left[k(\mathbf{x}_{1:n},\theta^{e})+\nu^{2}I_{n}\right]-\frac{n}{2}\log 2\pi.$$

$$a_{n}^{e}(\mathbf{y}):=\mathbb{P}^{e}\left[\left\{\omega^{e}\in\Omega^{e}:\mathbf{y}\in A[\mathbf{f}^{e}_{\omega^{e}}[X]]\right\}\mid\mathbf{x}_{1:n},\mathbf{y}_{1:n}\right],$$

$$Q_{n,\beta}^{e}:=\left\{\mathbf{y}\in[\mathbf{r},\infty):a_{n}^{e}(\mathbf{y})\geq\beta\right\},$$

$$\lambda(Q_{n,\beta}^{e})\leq\mathbb{E}^{e}\left[\lambda\left(A[\mathbf{f}^{e}[X]]\right)\mid\mathbf{x}_{1:n},\mathbf{y}_{1:n}\right]\leq\lambda(Q_{n,\beta^{*}}^{e}),\quad\forall\beta\in[\beta^{*},1],$$

$$Q_{n,\beta^{*}}^{e}\triangle A[\mathbf{f}^{e}[X]]:=\left(Q_{n,\beta^{*}}^{e}\cup A[\mathbf{f}^{e}[X]]\right)\setminus\left(Q_{n,\beta^{*}}^{e}\cap A[\mathbf{f}^{e}[X]]\right).$$

$$d_{n}^{e}(\mathbf{y})=\mathbb{P}^{e}\left[\mathbf{y}\in Q_{n,\beta^{*}}^{e}\triangle A[\mathbf{f}^{e}[X]]\mid\mathbf{x}_{1:n},\mathbf{y}_{1:n}\right].$$

$$\tilde{f}^{e}_{s,i,1:\tilde{n}}\mid\tilde{\mathbf{x}}_{s,1:\tilde{n}},\mathbf{x}_{1:n},y_{i,1:n}\sim\mathcal{N}\left(\mu_{i,n}(\tilde{\mathbf{x}}_{s,1:\tilde{n}}),k_{i,n}(\tilde{\mathbf{x}}_{s,1:\tilde{n}})\right),$$

$$\tilde{a}_{S,\tilde{n},n}^{e}(\mathbf{y})=\frac{1}{S}\sum_{s=1}^{S}1_{A[\tilde{F}_{s}^{e}]}(\mathbf{y}),$$

$$\tilde{d}_{S,\tilde{n},n}^{e}(\mathbf{y})=\frac{1}{S}\sum_{s=1}^{S}1_{\tilde{Q}_{S,\tilde{n},n,\beta^{*}}^{e}\triangle A[\tilde{F}_{s}^{e}]}(\mathbf{y}),$$

$$\lim_{S\to\infty}\lim_{\tilde{n}\to\infty}\tilde{a}_{S,\tilde{n},n}^{e}=a_{n}^{e},$$

$$\lim_{S\to\infty}\lim_{\tilde{n}\to\infty}\tilde{d}_{S,\tilde{n},n}^{e}=d_{n}^{e}.$$

$$\operatorname{EEIHV}(\mathbf{x})=\mathbb{E}^{e}\Big[\mathbb{E}^{e}\big[\lambda\left(A[\mathbf{f}^{e}[X]]\right)\big|\mathbf{x},\mathbf{y},\mathbf{x}_{1:n},\mathbf{y}_{1:n}\big]-\mathbb{E}^{e}\big[\lambda\left(A[\mathbf{f}^{e}[X]]\right)\big|\mathbf{x}_{1:n},\mathbf{y}_{1:n}\big]\,\Big|\,\mathbf{x},\mathbf{x}_{1:n},\mathbf{y}_{1:n}\Big],$$

Full text

Stochastic Multi-objective Optimization on a Budget: Application to multi-pass wire drawing with quantified uncertainties

Piyush Pandita¹, Ilias Bilionis¹,*, Jitesh Panchal¹, B.P. Gautham², Amol Joshi², Pramod Zagade²

1 School of Mechanical Engineering, Purdue University, West Lafayette, Indiana 47907

2 Tata Research Development and Design Centre, Tata Consultancy Services, Pune, India

[email protected]

Abstract

Design optimization of engineering systems with multiple competing objectives is a painstakingly tedious process, especially when the objective functions are expensive-to-evaluate computer codes with parametric uncertainties. The effectiveness of state-of-the-art techniques is greatly diminished because they require a large number of objective evaluations, which makes them impractical for problems of this kind. Bayesian global optimization (BGO) has managed to deal with these challenges in solving single-objective optimization problems and has recently been extended to multi-objective optimization (MOO). BGO models the objectives via probabilistic surrogates and uses the epistemic uncertainty to define an information acquisition function (IAF) that quantifies the merit of evaluating the objective at new designs. This iterative data acquisition process continues until a stopping criterion is met. The most commonly used IAF for MOO is the expected improvement over the dominated hypervolume (EIHV), which in its original form is unable to deal with parametric uncertainties or measurement noise. In this work, we provide a systematic reformulation of EIHV to deal with stochastic MOO problems. The primary contribution of this paper lies in being able to filter out the noise and reformulate the EIHV without having to observe or estimate the stochastic parameters. A by-product of the probabilistic nature of our methodology is that it enables us to characterize our confidence about the predicted Pareto front. We verify and validate the proposed methodology by applying it to synthetic test problems with known solutions. We demonstrate our approach on an industrial problem of die pass design for a steel wire drawing process.

1 Introduction

The goal of this paper is to derive a sequential information acquisition methodology that aims at efficiently discovering the Pareto set of a stochastic MOO problem. Stochastic MOOs are characterized by uncertain objective measurements, i.e., for a fixed design, repeated measurements of the objectives may vary. When the objectives are the outcomes of an experiment, this randomness may be due to manufacturing imperfections, operational uncertainties, wear and tear of the specimen, sensor malfunction, etc. When the objectives depend on a simulation model, then this randomness may be induced by uncertainty in the model parameters, e.g., boundary/initial conditions, parameters of constitutive relations, or artifact geometries. In the latter case, the designer chooses probability distributions for all uncertain parameters in an effort to accurately describe their state of knowledge about the artifact.

MOO techniques based on evolutionary algorithms [8], e.g., the strength Pareto evolutionary algorithm [32], the non-dominated sorting genetic algorithm II (NSGA-II) [9], require a significant number of objective evaluations, especially when coupled with a sample average approximation [13] to estimate the stochastic objectives. Other popular techniques like goal programming [5, 20] that involve a slight modification of the original MOO objectives face shortcomings [31] like selecting the relative importance of the objectives, or requiring the designer to have prior information about discontinuities in the objective space.

Bayesian global optimization (BGO) [22, 17] is a class of black-box optimization algorithms that can operate under a limited objective evaluation budget. BGO models the objectives using probabilistic surrogates, e.g., Gaussian process regression, and exploits the epistemic uncertainty to select which experiments/simulations to perform. The latter is typically done by maximizing an information acquisition function (IAF) which quantifies the value of evaluating the objective at a specific design. The choice of the IAF depends on the details of the underlying optimization task. One of the most popular IAFs is the expected improvement (EI) [21, 17, 14, 12, 3]. The EI balances the exploration-exploitation trade-off better than other popular IAFs such as the probability of improvement (PI) or the upper confidence bound (UCB) [16]. Keane [18] extended the original version of EI to MOO by deriving the expected improvement over the dominated hypervolume (EIHV). The EIHV evaluates the expected improvement in the volume of the attained set induced by a hypothetical observation at an untried design. The authors of [10] derived a closed-form representation which made the evaluation of EIHV computationally efficient. Research on EIHV has been gaining momentum over the past few years [30, 28, 11], but it has not yet been extended to cover the case of stochastic multi-objective optimization.

In this work, we propose an extension to the EIHV suitable for stochastic MOO, which is the main contribution of this paper. We will be referring to the proposed methodology as the extended EIHV (EEIHV). Our proposal is a generalization of the extended expected improvement (EEI) which we developed in [24] to deal with stochastic single-objective optimization. The methodology relies on building probabilistic surrogates of the objectives and uses the EEIHV IAF to quantify the merit of evaluating the expensive stochastic computer code at a new design. We leverage the work done in [2] to quantify our uncertainty about the estimated PF at each stage/iteration.

We apply the above methodology to solve a multi-pass steel wire manufacturing problem under uncertainty. The competing objectives in this problem are the ultimate tensile strength (UTS) and the strain non-uniformity factor (SNUF) of the drawn wire. A finite element (FE) solver (developed at Tata Consultancy Services (TCS), Pune, India) generates these objectives. The reduction ratios and the die angles at each pass are the design/process variables, which have associated uncertainties due to unavoidable manufacturing tolerances as well as die wear during the process.

The outline of the paper is as follows. We start Sec. 2 by providing the mathematical definition of the stochastic MOO problem that we are studying. In Sec. 2.1, we introduce Gaussian process regression (GPR), which is used to construct the probabilistic surrogates of the map between the design variables and the objectives. In Sec. 2.3, we derive our extension to EIHV suitable for stochastic multi-objective optimization. Our numerical results are presented in Sec. 3. In particular, in Sec. 3.2 and 3.3, we validate our approach using two synthetic stochastic MOO problems with known analytical expressions, and we experiment with varying levels of stochasticity (to represent noisy measurements). In Sec. 3.4, we apply our methodology to solve the wire drawing problem. We present our conclusions in Sec. 4.

2 Methodology

Let $X$ denote the set of feasible designs and $(\Omega,\mathcal{F},\mathbb{P})$ be a probability space. We assume that $X$ is a closed and bounded set of a Euclidean space. We have $m$ stochastic quantities of interest (QoIs) which we represent as Borel-measurable functions $o_{i}:X\times\Omega\rightarrow\mathbb{R},i=1,\dots,m$ . Our goal is to find designs $\mathbf{x}\in X$ that maximize the expectations of these QoIs over $\omega\in\Omega$ , i.e., we wish to maximize $O_{i}(\mathbf{x}):=\mathbb{E}[o_{i}(\mathbf{x},\omega)]:=\int o_{i}(\mathbf{x},\omega)d\mathbb{P}(\omega)$ . We say that $\mathbf{x}\in X$ dominates $\mathbf{x}^{\prime}\in X$ , and write $\mathbf{x}\succcurlyeq\mathbf{x}^{\prime}$ , if and only if

$$O_{i}(\mathbf{x})\geq O_{i}(\mathbf{x}^{\prime}),\quad\forall i=1,\dots,m.$$

We say that $\mathbf{x}$ strictly dominates $\mathbf{x}^{\prime}$ , and write $\mathbf{x}\succ\mathbf{x}^{\prime}$ if and only if $\mathbf{x}\succcurlyeq\mathbf{x}^{\prime}$ and there exists $i\in\{1,\dots,m\}$ such that $\mathbb{E}[o_{i}(\mathbf{x},\omega)]>\mathbb{E}[o_{i}(\mathbf{x}^{\prime},\omega)]$ .
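As a small illustration of the two relations just defined, the following sketch compares two vectors of expected objectives under the maximization convention used here. The function names `dominates` and `strictly_dominates` are hypothetical and chosen only for this example. In practice the expectations $O_{i}(\mathbf{x})$ are not observed directly, so the inputs would be estimates.

```python
from typing import Sequence


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """Weak dominance for maximization: a_i >= b_i for every objective i."""
    return all(ai >= bi for ai, bi in zip(a, b))


def strictly_dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """Weak dominance plus a strict improvement in at least one objective."""
    return dominates(a, b) and any(ai > bi for ai, bi in zip(a, b))


# Objective vectors O(x) and O(x') for two hypothetical designs.
O_x, O_xp = (2.0, 3.0), (1.0, 3.0)
print(dominates(O_x, O_xp), strictly_dominates(O_x, O_xp))
```

Two designs can be mutually non-dominating, for example objective vectors `(2, 1)` and `(1, 3)`, which is why the solution of a MOO problem is a set rather than a single design.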

We wish to characterize the set of optimal designs, otherwise known as the Pareto-efficient frontier, induced by the preference relation ‘ $\succcurlyeq$ ’. In words, the Pareto-efficient frontier, $P_{O}$ , is the set of achievable objectives that are not dominated. Since $P_{O}$ has Lebesgue measure zero, working with it directly is problematic. Instead, we will work with the attained set, $A_{O}$ , which is defined as the set of achievable objectives that are strictly dominated. $P_{O}$ is simply part of the boundary of $A_{O}$ .

We now proceed to the exact mathematical definition of $A_{O}$ and, subsequently, $P_{O}$ . At first glance, our definitions may seem unnecessarily complex. The benefit of such a rigorous approach is that it highlights the dependence of these quantities on the objectives $\mathbf{O}$ . Explicitly denoting this dependence will help us appreciate the nature of our approximation to the Pareto frontier when $\mathbf{O}$ is replaced by a Gaussian process surrogate.

Select a point $\mathbf{r}=(r_{1},\dots,r_{m})\in\mathbb{R}^{m}$ for which we have $\min_{\mathbf{x}\in X}O_{i}(\mathbf{x})\geq r_{i}$ . Since $X$ is compact, such a point exists if $O_{i}(\mathbf{x})$ is continuous. $\mathbf{r}$ is known as the reference point. Consider the vector valued function $\mathbf{O}:X\rightarrow\mathbb{R}^{m}$ defined by $\mathbf{O}=(O_{1},\dots,O_{m})$ . $\mathbf{O}$ just joins all the expected objectives in a vector. The image $\mathbf{O}[X]$ of $X$ under $\mathbf{O}$ , defined by

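$$\mathbf{O}[X]=\left\{\mathbf{y}\in\mathbb{R}^{m}:\exists\,\mathbf{x}\in X,\ \mathbf{y}=\mathbf{O}(\mathbf{x})\right\},$$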

is the set of all achievable objectives. We do not know exactly what $\mathbf{O}[X]$ looks like. However, exploiting the definition of the reference point, we see that $\mathbf{O}[X]$ is fully contained in the $m$-dimensional cone $[\mathbf{r},\infty):=\times_{i=1}^{m}[r_{i},\infty)$, i.e.,

$$\mathbf{O}[X]\subset[\mathbf{r},\infty).$$

Consider any subset $B$ of $[\mathbf{r},\infty)$ . We define the attained set of $B$ , denoted by $A[B]$ , to be the set of points in $[\mathbf{r},\infty)$ that are dominated by $B$ , i.e.,

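$$A[B]:=\left\{\mathbf{y}\in[\mathbf{r},\infty):\exists\,\mathbf{y}^{\prime}\in B,\ \mathbf{y}^{\prime}\geq\mathbf{y}\right\},$$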

where $\mathbf{y}^{\prime}\geq\mathbf{y}$ corresponds to element-wise comparison. The attained set of our multi-objective problem is just:

$$A_{O}:=A[\mathbf{O}[X]].$$

Finally, we define the Pareto frontier of $B$ , denoted by $P[B]$ , to be the set of points in $B$ that are not dominated by any other point in $B$ , i.e.,

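$$P[B]:=\left\{\mathbf{y}\in B:\left\{\mathbf{y}^{\prime}\in B:\mathbf{y}^{\prime}\geq\mathbf{y},\ \mathbf{y}^{\prime}\neq\mathbf{y}\right\}=\emptyset\right\}.$$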

But we can get the Pareto frontier of $B$ directly from the boundary of its attained set. Specifically, it is easy to prove that $P[B]$ is the top right boundary of $A[B]$ , i.e.,

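$$P[B]=\partial A[B]\setminus\bigcup_{i=1}^{m}\left\{\mathbf{r}+t\left(\max_{\mathbf{y}\in B}y_{i}\right)\mathbf{e}_{i}\right\},$$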

where $\mathbf{e}_{i}$ is the standard basis vector of $\mathbb{R}^{m}$ pertaining to the $i$-th dimension. The Pareto front of our multi-objective problem is just:

$$P_{O}:=P[\mathbf{O}[X]].$$
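For a finite set $B$ of objective vectors, the definitions above can be evaluated directly. The sketch below is an illustration under stated assumptions, not the authors' implementation: it filters the non-dominated points of $B$ and, for $m=2$ only, computes the Lebesgue measure (area) of the attained set $A[B]$ relative to a reference point $\mathbf{r}$ by sweeping over the front. The names `pareto_front` and `hypervolume_2d` are hypothetical, and all points are assumed to lie in $[\mathbf{r},\infty)$.

```python
def pareto_front(points):
    """Points of a finite set B not dominated by any other point of B (maximization)."""
    pts = [tuple(p) for p in points]
    front = []
    for p in pts:
        dominated = any(
            q != p and all(qi >= pi for qi, pi in zip(q, p)) for q in pts
        )
        if not dominated:
            front.append(p)
    return sorted(set(front))


def hypervolume_2d(points, r):
    """Area of the attained set A[B] for two objectives and reference point r."""
    # Sort the front by the first objective, largest first; along the front the
    # second objective then increases, so each point adds one new rectangle.
    front = sorted(pareto_front(points), key=lambda p: p[0], reverse=True)
    area, prev_y2 = 0.0, r[1]
    for y1, y2 in front:
        area += (y1 - r[0]) * (y2 - prev_y2)
        prev_y2 = y2
    return area


B = [(1, 3), (2, 2), (3, 1), (1, 1), (2, 1)]
print(pareto_front(B))
print(hypervolume_2d(B, r=(0, 0)))
```

The area of the attained set is the quantity whose expected increase drives the acquisition function discussed later, which is why it is convenient to work with $A[B]$ rather than with the measure-zero front itself.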

Assume that we can choose to measure the QoIs at any design point $\mathbf{x}\in X$ we wish, albeit only a limited number of times $n$ . Such measurements take place as follows. When we request information about $\mathbf{x}$ , a latent process samples an unobserved $\omega\in\Omega$ according to the probability measure $\mathbb{P}$ , and we observe a noisy version of the QoIs $\mathbf{y}=(o_{1}(\mathbf{x},\omega),\dots,o_{m}(\mathbf{x},\omega))$ . This setup is general enough to account for both simulation-based and experiment-based QoIs.

Assume that we have queried the information source at $n$ design points,

$$\mathbf{x}_{1:n}=(\mathbf{x}_{1},\dots,\mathbf{x}_{n})\in X^{n},$$

and that we have made the following noisy observations:

$$\mathbf{y}_{1:n}=(\mathbf{y}_{1},\dots,\mathbf{y}_{n}).$$

We address two problems:

1. What is our state of knowledge about the true Pareto-efficient frontier $P_{O}$ given the observations $(\mathbf{x}_{1:n},\mathbf{y}_{1:n})$?

2. How should we select $\mathbf{x}_{1:n}$ so that we come as close as possible to discovering the true Pareto-efficient frontier $P_{O}$?

In the language of probability theory [15], the former problem seeks to characterize the probability (a state of belief) of a design being optimal conditional on the observations. The uncertainty encoded in this probability is epistemic and it is induced by the fact that inference is based on just a small number of observations. We address this problem by leveraging the Bayesian nature of Gaussian process surrogates, see Sec. 2.1. Looking for an optimal information acquisition policy that solves the latter problem is a mathematically intractable task since the problem is equivalent to a non-linear stochastic dynamic program [26, 1]. We rely on a myopic/greedy one-step-look-ahead strategy (which is sub-optimal) by extending the definition of the standard EIHV, see Sec. 2.3, so that it can cope robustly with noise.

2.1 Gaussian process regression

Gaussian process (GP) regression [27] is the Bayesian interpretation of classical Kriging [7, 29]. It is a powerful non-linear and non-parametric regression technique that is able to quantify the epistemic uncertainty induced by limited data. We use GP regression to model our state of knowledge about the objectives, i.e., $O_{i}(\mathbf{x})=\mathbb{E}[o_{i}(\mathbf{x},\omega)],i=1,\dots,m$, as induced by a set of $n$ observations $(\mathbf{x}_{1:n},\mathbf{y}_{1:n})$. The methodology applies to each $i=1,\dots,m$ independently. For simplicity, we will write $f(\mathbf{x})$ for $O_{i}(\mathbf{x})$ and $y_{1:n}$ for $y_{i,1:n}=(y_{i1},\dots,y_{in})$.

2.1.1 Expressing prior beliefs

Let $(\Omega^{e},\mathcal{F}^{e},\mathbb{P}^{e})$ be the probability space corresponding to our epistemic uncertainty. Note that this is different from $(\Omega,\mathcal{F},\mathbb{P})$, which is associated with the problem uncertainty. A GP $f^{e}(\mathbf{x},\omega^{e})$ is a $(\Omega^{e},\mathcal{F}^{e},\mathbb{P}^{e})$-random field indexed by $\mathbf{x}\in X$ with Gaussian finite dimensional distributions. That is, for any $\mathbf{x}_{1:n}\in X^{n}$ the random vector $f^{e}_{1:n}:=(f^{e}(\mathbf{x}_{1},\omega^{e}),\dots,f^{e}(\mathbf{x}_{n},\omega^{e}))$ follows a multivariate Gaussian. The interpretation is as follows. Nature has chosen a reality $\omega^{e}\in\Omega^{e}$, i.e., $f(\cdot)\equiv f^{e}(\cdot,\omega^{e})$, that we cannot directly observe. $(\Omega^{e},\mathcal{F}^{e},\mathbb{P}^{e})$ models our prior state of knowledge about this reality, in the sense that for all $B\in\mathcal{F}^{e}$ the probability that we give to $\omega^{e}\in B$ is $\mathbb{P}^{e}[B]=\int_{B}d\mathbb{P}^{e}(\omega^{e})$.

A GP is characterized by a mean and a covariance function. Without loss of generality, we may assume that the mean function is zero, since the covariance can always be modified to include a non-zero mean trend. Mathematically, we write:

$$f^{e}\mid\theta^{e}\sim\operatorname{GP}(0,k),$$

where $k:X\times X\times\Theta^{e}\rightarrow\mathbb{R}$ is a covariance function parameterized by the epistemic random variable $\theta^{e}:\Omega^{e}\rightarrow\Theta^{e}$ . According to the definition of the GP, our a priori beliefs about the values $f^{e}_{1:n}$ are captured by:

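$$f^{e}_{1:n}\mid\mathbf{x}_{1:n},\theta^{e}\sim\mathcal{N}\left(0,k(\mathbf{x}_{1:n},\theta^{e})\right),$$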

where $\mathcal{N}(\lambda,\Sigma)$ denotes the multivariate Gaussian distribution with mean $\lambda$ and covariance matrix $\Sigma$. For all $\mathbf{x}^{\prime}_{1:n^{\prime}}\in X^{n^{\prime}}$, we define $k(\mathbf{x}_{1:n},\mathbf{x}^{\prime}_{1:n^{\prime}},\theta^{e})$ to be the $n\times n^{\prime}$ matrix with $(i,j)$ element $k(\mathbf{x}_{i},\mathbf{x}^{\prime}_{j},\theta^{e})$, and $k(\mathbf{x}_{1:n},\theta^{e}):=k(\mathbf{x}_{1:n},\mathbf{x}_{1:n},\theta^{e})$ is the covariance matrix. In our numerical examples, we use the Matérn ($\nu=\frac{3}{2}$) covariance [27]:

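$$k(\mathbf{x},\mathbf{x}^{\prime},\theta^{e})=s^{2}\exp\left\{-\sqrt{3\sum_{j=1}^{d}\frac{(x_{j}-x_{j}^{\prime})^{2}}{\ell_{j}^{2}}}\right\}\left(1+\sqrt{3\sum_{j=1}^{d}\frac{(x_{j}-x_{j}^{\prime})^{2}}{\ell_{j}^{2}}}\right),$$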

where $d$ is the dimensionality of the design space, $s>0$ and $\ell_{j}>0$ can be interpreted as the signal strength of the response and the lengthscale along input dimension $j$, respectively, and $\theta^{e}=(s,{\ell_{1}},\ldots,{\ell_{d}})\in\mathbb{R}^{d+1}_{+}$.
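To make the covariance concrete, here is a minimal NumPy sketch of the Matérn-3/2 kernel with one lengthscale per input dimension. The function name `matern32` and the example inputs are assumptions made for illustration; the source does not specify an implementation.

```python
import numpy as np


def matern32(X1, X2, s, ell):
    """Matern-3/2 covariance matrix between rows of X1 (n, d) and X2 (n', d).

    s is the signal strength and ell holds one lengthscale per input dimension.
    """
    X1 = np.atleast_2d(X1) / ell  # scale each dimension by its lengthscale
    X2 = np.atleast_2d(X2) / ell
    sq = np.sum((X1[:, None, :] - X2[None, :, :]) ** 2, axis=-1)
    r = np.sqrt(3.0 * sq)
    return s**2 * (1.0 + r) * np.exp(-r)


# Hypothetical designs in a two-dimensional design space.
X = np.array([[0.0, 0.0], [0.5, 0.2], [1.0, 1.0]])
K = matern32(X, X, s=2.0, ell=np.array([0.5, 1.0]))
print(K.shape)
```

The diagonal of `K` equals $s^{2}$, and a small $\ell_{j}$ makes the response decorrelate quickly along dimension $j$.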

2.1.2 Modeling the measurement process

In general, the noise that contaminates the measurement $y$ is heteroscedastic, i.e., input-dependent. However, we approximate this noise as Gaussian with a fixed, but unknown, variance $\nu^{2}$. Despite this approximation, we observe numerically that the GP can still estimate the optimization objectives, i.e., the expectation of $y$, when the noise-to-signal ratio is not too large. The likelihood of the model is:

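$$p(y_{1:n}\mid\mathbf{x}_{1:n},\theta^{e})=\mathcal{N}\left(y_{1:n}\mid 0,k(\mathbf{x}_{1:n},\theta^{e})+\nu^{2}I_{n}\right),$$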

where $I_{n}\in\mathbb{R}^{n\times n}$ is the identity matrix, $k(\mathbf{x}_{1:n},\theta^{e})$ is as in Eq. (10), and, for notational convenience, we have re-defined $\theta^{e}\leftarrow(\theta^{e},\nu)$ .

2.1.3 Posterior state of knowledge about the objectives

Bayes' rule combines our prior beliefs with the data and yields a posterior probability measure on the space of meta-models. Conditioned on the hyperparameters $\theta^{e}$, this measure is also a GP,

$$f^{e}\mid\mathbf{x}_{1:n},y_{1:n},\theta^{e}\sim\operatorname{GP}(\mu_{n},k_{n}),$$

where the posterior mean and covariance functions are

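$$\mu_{n}(\mathbf{x};\theta^{e})=k(\mathbf{x},\mathbf{x}_{1:n},\theta^{e})\left[k(\mathbf{x}_{1:n},\theta^{e})+\nu^{2}I_{n}\right]^{-1}y_{1:n},$$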

and

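$$k_{n}(\mathbf{x},\mathbf{x}^{\prime},\theta^{e})=k(\mathbf{x},\mathbf{x}^{\prime},\theta^{e})-k(\mathbf{x},\mathbf{x}_{1:n},\theta^{e})\left[k(\mathbf{x}_{1:n},\theta^{e})+\nu^{2}I_{n}\right]^{-1}k(\mathbf{x}_{1:n},\mathbf{x}^{\prime},\theta^{e}),$$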

respectively. Restricting our attention to a specific design point $\mathbf{x}$ , we can derive from Eq. (13) the point-predictive PDF conditioned on the hyperparameters $\theta^{e}$ :

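$$f^{e}(\mathbf{x})\mid\mathbf{x}_{1:n},y_{1:n},\theta^{e}\sim\mathcal{N}\left(\mu_{n}(\mathbf{x};\theta^{e}),\sigma_{n}^{2}(\mathbf{x};\theta^{e})\right),$$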

where the predictive variance is $\sigma_{n}^{2}(\mathbf{x};\theta^{e})=k_{n}(\mathbf{x},\mathbf{x};\theta^{e})$.
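The posterior mean and predictive variance can be computed with a Cholesky factorization of $k(\mathbf{x}_{1:n},\theta^{e})+\nu^{2}I_{n}$. The following is a minimal sketch for a single objective with fixed hyperparameters. The function names, the toy sine data, and the hyperparameter values are assumptions made for illustration.

```python
import numpy as np


def matern32(X1, X2, s, ell):
    """Matern-3/2 covariance between rows of X1 and X2."""
    d = (X1[:, None, :] - X2[None, :, :]) / ell
    r = np.sqrt(3.0 * np.sum(d**2, axis=-1))
    return s**2 * (1.0 + r) * np.exp(-r)


def gp_posterior(X, y, Xs, s, ell, nu):
    """Posterior mean and variance of f at test inputs Xs given noisy data (X, y)."""
    K = matern32(X, X, s, ell) + nu**2 * np.eye(len(X))
    Ks = matern32(Xs, X, s, ell)  # cross-covariance k(x, x_{1:n})
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))  # [K + nu^2 I]^{-1} y
    mu = Ks @ alpha
    V = np.linalg.solve(L, Ks.T)
    var = s**2 - np.sum(V**2, axis=0)  # k(x, x) = s^2 for this kernel
    return mu, var


# Toy data: noisy observations of a hypothetical one-dimensional objective.
rng = np.random.default_rng(0)
X = np.linspace(0.0, 1.0, 8)[:, None]
y = np.sin(2 * np.pi * X[:, 0]) + 0.05 * rng.standard_normal(8)
mu, var = gp_posterior(X, y, X, s=1.0, ell=np.array([0.3]), nu=0.05)
print(mu.shape, var.shape)
```

Note that `var` is the variance of the latent function $f^{e}(\mathbf{x})$, not of a new noisy measurement, so it does not include $\nu^{2}$. This is what allows the surrogate to filter the measurement noise.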

The hyper-parameters of the covariance function are estimated by maximizing the likelihood $p(y_{1:n}|\mathbf{x}_{1:n},\theta^{e})$ with respect to $\theta^{e}$ . To avoid numerical instabilities, one typically works with the logarithm of the likelihood:

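$$\mathcal{L}(\theta^{e})=-\frac{1}{2}y_{1:n}^{T}\left[k(\mathbf{x}_{1:n},\theta^{e})+\nu^{2}I_{n}\right]^{-1}y_{1:n}-\frac{1}{2}\log\det\left[k(\mathbf{x}_{1:n},\theta^{e})+\nu^{2}I_{n}\right]-\frac{n}{2}\log 2\pi.$$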

This maximization problem is solved using the BFGS algorithm [4]. To account for the positivity constraints, we simply optimize with respect to the logarithms of the hyperparameters. The solution of this optimization problem, denoted by $\hat{\theta}^{e}$, is known as the maximum likelihood estimate (MLE) of $\theta^{e}$. For notational convenience, in what follows we do not explicitly indicate the dependence of $\mu_{n}$ and $k_{n}$ on $\theta^{e}$. Instead, it is understood that $\mu_{n}(\mathbf{x})\equiv\mu_{n}(\mathbf{x},\hat{\theta}^{e})$, $k_{n}(\mathbf{x},\mathbf{x}^{\prime})\equiv k_{n}(\mathbf{x},\mathbf{x}^{\prime},\hat{\theta}^{e})$, and $\sigma_{n}(\mathbf{x})\equiv\sigma_{n}(\mathbf{x},\hat{\theta}^{e})$.
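The log-parameterization described above can be sketched with SciPy's general-purpose `minimize` routine using `method="BFGS"`. This is an illustration under stated assumptions rather than the authors' code: the toy data, the starting point `x0`, the clipping of the log-hyperparameters to `[-10, 10]`, and the penalty returned when the Cholesky factorization fails are all safeguards added for this example. Gradients are approximated by finite differences.

```python
import numpy as np
from scipy.optimize import minimize


def matern32(X1, X2, s, ell):
    """Matern-3/2 covariance between rows of X1 and X2."""
    d = (X1[:, None, :] - X2[None, :, :]) / ell
    r = np.sqrt(3.0 * np.sum(d**2, axis=-1))
    return s**2 * (1.0 + r) * np.exp(-r)


def neg_log_likelihood(log_theta, X, y):
    """Negative log-likelihood; theta = (s, ell_1, ..., ell_d, nu) is passed as logs."""
    theta = np.exp(np.clip(log_theta, -10.0, 10.0))  # positivity via exp
    s, ell, nu = theta[0], theta[1:-1], theta[-1]
    n = len(X)
    K = matern32(X, X, s, ell) + nu**2 * np.eye(n)
    try:
        L = np.linalg.cholesky(K)
    except np.linalg.LinAlgError:
        return 1e10  # added safeguard: penalize numerically singular covariances
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    log_det = 2.0 * np.sum(np.log(np.diag(L)))
    return 0.5 * y @ alpha + 0.5 * log_det + 0.5 * n * np.log(2.0 * np.pi)


# Toy data from a hypothetical noisy one-dimensional objective.
rng = np.random.default_rng(1)
X = rng.uniform(0.0, 1.0, size=(30, 1))
y = np.sin(2 * np.pi * X[:, 0]) + 0.1 * rng.standard_normal(30)

x0 = np.log(np.array([1.0, 0.5, 0.5]))  # initial guess for (s, ell_1, nu)
res = minimize(neg_log_likelihood, x0, args=(X, y), method="BFGS")
theta_hat = np.exp(res.x)  # back-transform to the positive hyperparameters
print(theta_hat)
```

Because the likelihood surface is generally multi-modal, restarting the optimizer from several initial points is a common precaution, although the source does not say whether restarts were used.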

2.2 Characterization of the Pareto-efficient frontier using limited data

What is our state of knowledge about the true Pareto-efficient frontier $P_{O}$ given $n\leq N$ observations $(\mathbf{x}_{1:n},\mathbf{y}_{1:n})$ ? Let $\mathbf{f}^{e}=(f^{e}_{1},\dots,f^{e}_{m})$ be the GPs representing our state of knowledge about each one of the $m$ objectives. Our state of knowledge about the relation ‘ $\succcurlyeq$ ’ is now captured by the random relation ‘ $\succcurlyeq^{e}$ ’, namely $\mathbf{x}\succcurlyeq^{e}\mathbf{x}^{\prime}$ if and only if $\mathbf{f}^{e}(\mathbf{x})\geq\mathbf{f}^{e}(\mathbf{x}^{\prime})$ . Our state of knowledge about the attained set $A_{O}$ of Eq. (3) is given by the random set $A[\mathbf{f}^{e}[X]]$ . Similarly, our state of knowledge about the Pareto front $P_{O}$ of Eq. (6) is represented by the random set $P[\mathbf{f}^{e}[X]]$ .

The first step is to derive summary statistics of $A[\mathbf{f}^{e}[X]]$ that can be used to visualize our epistemic uncertainty about it. Following [2, 6], we achieve this by estimating the Vorob’ev expectation and deviation of the random set $A[\mathbf{f}^{e}[X]]$ . Towards this end, we introduce the attainment function and its upper level sets. The attainment function $a^{e}_{n}:[\mathbf{r},\infty)\rightarrow[0,1]$ is defined to be the conditional probability, given $(\mathbf{x}_{1:n},\mathbf{y}_{1:n})$ , that a vector of objectives $\mathbf{y}\in[\mathbf{r},\infty)$ can be attained, i.e., we define

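$$a_{n}^{e}(\mathbf{y}):=\mathbb{P}^{e}\left[\left\{\omega^{e}\in\Omega^{e}:\mathbf{y}\in A[\mathbf{f}^{e}_{\omega^{e}}[X]]\right\}\mid\mathbf{x}_{1:n},\mathbf{y}_{1:n}\right],$$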

where $\mathbf{f}^{e}_{\omega^{e}}(\cdot)=\left(f_{1}^{e}(\cdot,\omega^{e}),\dots,f_{m}^{e}(\cdot,\omega^{e})\right)$ . For $\beta\in[0,1]$ , the upper level sets of the attainment function,

$$Q_{n,\beta}^{e}:=\left\{\mathbf{y}\in[\mathbf{r},\infty):a_{n}^{e}(\mathbf{y})\geq\beta\right\},$$

are known as the $\beta$-quantiles of $A[\mathbf{f}^{e}[X]]$. Intuitively, $Q_{n,\beta}^{e}$ can be seen as the set of objectives that are considered achievable with probability greater than or equal to $\beta$. The conditional Vorob’ev expectation [23] of $A[\mathbf{f}^{e}[X]]$ is defined to be the $\beta^{*}$-quantile $Q_{n,\beta^{*}}^{e}$ for which:

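$$\lambda(Q_{n,\beta}^{e})\leq\mathbb{E}^{e}\left[\lambda\left(A[\mathbf{f}^{e}[X]]\right)\mid\mathbf{x}_{1:n},\mathbf{y}_{1:n}\right]\leq\lambda(Q_{n,\beta^{*}}^{e}),\quad\forall\beta\in[\beta^{*},1],$$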

where $\lambda$ is the Lebesgue measure on $\mathbb{R}^{m}$ . In words, $Q_{n,\beta^{*}}^{e}$ is the $\beta$ -quantile that has the same Lebesgue measure as the conditional expectation of the Lebesgue measure of the attained set. Intuitively, $Q_{n,\beta^{*}}^{e}$ and its top right boundary are our expectations about the attained set $A_{O}$ and $P_{O}$ , respectively, after observing $(\mathbf{x}_{1:n},\mathbf{y}_{1:n})$ .

Now, we are in a position to quantify our uncertainty about $P_{O}$ . Consider the symmetric difference $Q_{n,\beta^{*}}^{e}\triangle A[\mathbf{f}^{e}[X]]$ between the set $Q_{n,\beta^{*}}^{e}$ and $A[\mathbf{f}^{e}[X]]$ defined by

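$$Q_{n,\beta^{*}}^{e}\triangle A[\mathbf{f}^{e}[X]]:=\left(Q_{n,\beta^{*}}^{e}\cup A[\mathbf{f}^{e}[X]]\right)\setminus\left(Q_{n,\beta^{*}}^{e}\cap A[\mathbf{f}^{e}[X]]\right).$$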

That is, a point $\mathbf{y}$ belongs to $Q_{n,\beta^{*}}^{e}\triangle A[\mathbf{f}^{e}[X]]$ if and only if it belongs to exactly one of these sets. Such points appear in the top right corner of $[\mathbf{r},\infty)$ and are candidate points for the Pareto front. Therefore, we quantify our uncertainty about $P_{O}$ through the symmetric deviation function $d_{n}^{e}:[\mathbf{r},\infty)\rightarrow[0,1]$ defined as the conditional probability that a vector of objectives $\mathbf{y}\in[\mathbf{r},\infty)$ belongs to the symmetric difference $Q_{n,\beta^{*}}^{e}\triangle A[\mathbf{f}^{e}[X]]$, i.e.,

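$$d_{n}^{e}(\mathbf{y})=\mathbb{P}^{e}\left[\mathbf{y}\in Q_{n,\beta^{*}}^{e}\triangle A[\mathbf{f}^{e}[X]]\mid\mathbf{x}_{1:n},\mathbf{y}_{1:n}\right].$$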

Unfortunately, it is not possible to characterize $a_{n}^{e}(\mathbf{y})$, $Q_{n,\beta^{*}}^{e}$, and $d_{n}^{e}(\mathbf{y})$ exactly. The difficulty arises from the fact that $X$ generally contains infinitely many designs, so the random set $A[\mathbf{f}^{e}[X]]$ cannot be evaluated on all of $X$. To overcome this obstacle, we use a Monte Carlo (MC) approach. Let $(\tilde{\Omega},\tilde{\mathcal{F}},\tilde{\mathbb{P}})$ be a new probability space associated with the MC approximation uncertainty. Let $\tilde{X}_{s}:\tilde{\Omega}\rightarrow X^{\tilde{n}}$, collectively denoted by $\tilde{X}_{1:S}=(\tilde{X}_{1},\dots,\tilde{X}_{S})$, be independent identically distributed (iid) random variables in $(\tilde{\Omega},\tilde{\mathcal{F}},\tilde{\mathbb{P}})$ with values in $X^{\tilde{n}}$. For each one, we have $\tilde{X}_{s}:=\tilde{\mathbf{x}}_{s,1:\tilde{n}}:=(\tilde{\mathbf{x}}_{s,1},\dots,\tilde{\mathbf{x}}_{s,\tilde{n}})$. The specific distribution of these variables is not important as long as they cover $X$. For convergence, it suffices to make all the $\tilde{\mathbf{x}}_{s,i},s=1,\dots,S,i=1,\dots,\tilde{n}$ iid with a support that covers $X$. In our numerical examples, we take all these random variables to be independently uniform. Conditional on each $\tilde{X}_{s}$, define the epistemic random variable $\tilde{F}_{s}^{e}\in\mathbb{R}^{m\tilde{n}}$ associated with the values of the objectives on $\tilde{X}_{s}$. That is, $\tilde{F}_{s}^{e}:=\tilde{\mathbf{f}}^{e}_{s,1:m,1:\tilde{n}}:=(\tilde{f}^{e}_{s,1,1:\tilde{n}},\dots,\tilde{f}^{e}_{s,m,1:\tilde{n}})$, with $\tilde{f}^{e}_{s,i,1:\tilde{n}}:=(f^{e}_{i}(\tilde{\mathbf{x}}_{s,1}),\dots,f^{e}_{i}(\tilde{\mathbf{x}}_{s,\tilde{n}}))\in\mathbb{R}^{\tilde{n}}$. Note that, since we constructed each one of the GPs representing the objectives independently, we have that $\tilde{f}^{e}_{s,i,1:\tilde{n}},s=1,\dots,S,i=1,\dots,m$ are independent. Making use of the posterior GP representing our state of knowledge about $f^{e}_{i}(\mathbf{x})$, see Eq. (13), we get that, conditional on $\tilde{\mathbf{x}}_{s,1:\tilde{n}}$ and $(\mathbf{x}_{1:n},y_{i,1:n})$, $\tilde{f}^{e}_{s,i,1:\tilde{n}}$ is normally distributed:

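$$\tilde{f}^{e}_{s,i,1:\tilde{n}}\mid\tilde{\mathbf{x}}_{s,1:\tilde{n}},\mathbf{x}_{1:n},y_{i,1:n}\sim\mathcal{N}\left(\mu_{i,n}(\tilde{\mathbf{x}}_{s,1:\tilde{n}}),k_{i,n}(\tilde{\mathbf{x}}_{s,1:\tilde{n}})\right),$$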

where $\mu_{i,n}(\mathbf{x})$ and $k_{i,n}(\mathbf{x},\mathbf{x}^{\prime})$ are the posterior mean and posterior covariance functions ( $\mu_{n}(\mathbf{x})$ and $k_{n}(\mathbf{x},\mathbf{x}^{\prime})$ ) of Sec. 2.1.3, respectively, if we make the substitution $y_{1:n}\leftarrow y_{i,1:n}$ . Using $\tilde{F}^{e}_{s}$ , and the definition in Eq. (2) we denote the sampled attained set by $A[\tilde{F}^{e}_{s}]$ and the corresponding sampled Pareto front by $P[\tilde{F}^{e}_{s}]$ . Now we can compute the empirical attainment function $\tilde{a}_{S,\tilde{n},n}^{e}:[\mathbf{r},\infty)\rightarrow[0,1]$ :

[TABLE]

where $1_{B}(\mathbf{y})$ is the characteristic function of the set $B$ . Using $\tilde{a}_{S,\tilde{n},n}^{e}(\mathbf{y})$ we can obtain estimates of the $\beta$ -quantiles, say $\tilde{Q}_{S,\tilde{n},n,\beta}^{e}$ . As in [2], these estimates of the $\beta$ -quantiles can be used within a bisection algorithm to estimate the Vorob’ev expectation $\tilde{Q}_{S,\tilde{n},n,\beta^{*}}^{e}$ . Finally, we compute the empirical symmetric deviation function:

[TABLE]

which is an estimate of $d_{n}^{e}(\mathbf{y})$ . In our numerical examples (in which $m=2$ ) we represent $\tilde{a}^{e}_{S,\tilde{n},n}(\mathbf{y})$ and $\tilde{d}_{S,\tilde{n},n}^{e}(\mathbf{y})$ on a $64\times 64$ grid defined on $\times_{i=1}^{m}[r_{i},u_{i}]$ , where $\mathbf{u}=(u_{1},\dots,u_{m})\in\mathbb{R}^{m}$ is a point of the objective space with $u_{i}\geq\max_{\mathbf{x}\in X}O_{i}(\mathbf{x}),i=1,\dots,m$ . For a larger number of objectives, $m>3$ , more sophisticated techniques must be developed in order to overcome the curse of dimensionality. From the law of large numbers, we have that

[TABLE]
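The conditional sampling step of this section can be illustrated with a minimal sketch that draws one joint sample of a single objective at a set of random designs. It assumes a squared exponential covariance with fixed, hand-picked hyperparameters and uses only the Python standard library; the function names (`rbf`, `cholesky`, `solve_chol`, `posterior_sample`) and all default values are hypothetical and are not taken from the paper, whose covariance function and hyperparameters are those of Sec. 2.1.

```python
import math
import random


def rbf(x, xp, ell=0.3, s2=1.0):
    """Squared exponential covariance (an assumed choice for this sketch)."""
    d2 = sum((a - b) ** 2 for a, b in zip(x, xp))
    return s2 * math.exp(-0.5 * d2 / ell ** 2)


def cholesky(A):
    """Lower-triangular Cholesky factor with a small floor on the pivots."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            acc = A[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(max(acc, 1e-12))
            else:
                L[i][j] = acc / L[j][j]
    return L


def solve_chol(L, b):
    """Solve (L L^T) x = b by forward and backward substitution."""
    n = len(L)
    z = [0.0] * n
    for i in range(n):
        z[i] = (b[i] - sum(L[i][k] * z[k] for k in range(i))) / L[i][i]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (z[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
    return x


def posterior_sample(X, y, Xs, noise_var, rng):
    """Posterior mean and one joint sample of the noise-free latent function at Xs."""
    n, q = len(X), len(Xs)
    # Training covariance includes the observation noise on the diagonal.
    K = [[rbf(X[i], X[j]) + (noise_var if i == j else 0.0) for j in range(n)]
         for i in range(n)]
    L = cholesky(K)
    alpha = solve_chol(L, y)
    Ks = [[rbf(xs, x) for x in X] for xs in Xs]
    mean = [sum(Ks[a][i] * alpha[i] for i in range(n)) for a in range(q)]
    V = [solve_chol(L, Ks[a]) for a in range(q)]
    cov = [[rbf(Xs[a], Xs[b]) - sum(Ks[a][i] * V[b][i] for i in range(n))
            for b in range(q)] for a in range(q)]
    for a in range(q):
        cov[a][a] += 1e-9  # jitter for numerical stability
    Lc = cholesky(cov)
    z = [rng.gauss(0.0, 1.0) for _ in range(q)]
    sample = [mean[a] + sum(Lc[a][k] * z[k] for k in range(a + 1)) for a in range(q)]
    return mean, sample
```

Repeating `posterior_sample` for each objective and for each of the $S$ random design sets would give the sampled objective values from which the sampled attained sets are built.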

We also expect that the attainment function $a^{e}_{n}$ will converge to the characteristic function of the attained set $A_{O}$ as $n\rightarrow\infty$ on a set of design points that becomes dense. The exact nature of the latter convergence is beyond the scope of the present work.
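The grid-based estimates described in this section can be sketched as follows for $m=2$. This is a minimal illustration under stated assumptions, not the authors' implementation: objectives are maximized, a point is attained by a sample if it is componentwise no larger than some sampled objective vector, the Vorob’ev threshold is found by bisection on the grid volume, and the symmetric deviation at a grid point is taken to be the fraction of samples whose attained set disagrees with the Vorob’ev expectation there. All function names are hypothetical.

```python
def attained(y, front):
    """True if y is weakly dominated by some point of front (maximization)."""
    return any(all(yi <= fi for yi, fi in zip(y, f)) for f in front)


def make_grid(r, u, k):
    """Cell centres of a k x k grid on [r1, u1] x [r2, u2]."""
    xs = [r[0] + (i + 0.5) * (u[0] - r[0]) / k for i in range(k)]
    ys = [r[1] + (j + 0.5) * (u[1] - r[1]) / k for j in range(k)]
    return [(a, b) for a in xs for b in ys]


def attainment(grid, samples):
    """Empirical attainment function: fraction of samples attaining each grid point."""
    S = len(samples)
    return [sum(attained(y, F) for F in samples) / S for y in grid]


def vorobev_threshold(a, tol=1e-6):
    """Largest beta (up to tol) whose beta-quantile volume is at least the mean volume."""
    target = sum(a) / len(a)  # expected volume, as a fraction of the grid
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        vol = sum(v >= mid for v in a) / len(a)
        if vol >= target:
            lo = mid
        else:
            hi = mid
    return lo


def symmetric_deviation(a, beta):
    """Fraction of samples that disagree with the beta-quantile at each grid point."""
    return [(1.0 - v) if v >= beta else v for v in a]
```

With `k=64` this reproduces the grid resolution quoted above; the cost grows as `k**m`, which is one way to see why the grid representation does not scale to many objectives.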

2.3 Extended expected improvement over dominated hypervolume

Given our current state of knowledge about $P_{O}$ , how should we select the next observation $\mathbf{x}$ ? We derive a myopic one-step-look-ahead strategy that attempts to sequentially maximize the expected improvement in the volume of the attained set. Specifically, we define the extended expected improvement over the dominated hypervolume (EEIHV) as the expectation of the change in the Lebesgue measure of the attained set conditional on a hypothetical observation. Mathematically, we define for $\mathbf{x}\in X$ :

[TABLE]

where the outer expectation is over our state of knowledge about the hypothetical measurement $\mathbf{y}$ induced by the GPs of Sec. 2.1:

[TABLE]

where $\mu_{i,n}(\cdot)=\mu_{i,n}(\cdot;\theta^{e}_{i})$ and $\sigma_{i,n}^{2}(\cdot)=\sigma_{i,n}^{2}(\cdot;\theta^{e}_{i})$ are the posterior predictive mean and variance of the GP $f^{e}_{i}$ pertaining to objective $i=1,\dots,m$ , see Eq. (16). Our myopic strategy is outlined in Algorithm 1.

Eq. (28) is analytically intractable and must be approximated using the sampling methods of Sec. 2.2. This is computationally inefficient because it does not allow the use of gradient-based optimization algorithms such as BFGS. To overcome this difficulty, we derive an approximation that will allow us to make use of the analytical formulas derived by [10]. We have:

[TABLE]

The first row inequality comes from $\mathbf{x}_{1:n}\subset X$ implying $\mathbf{f}^{e}[\mathbf{x}_{1:n}]\subset\mathbf{f}^{e}[X]$ which, in turn, yields $A[\mathbf{f}^{e}[\mathbf{x}_{1:n}]]\subset A[\mathbf{f}^{e}[X]]$ . For the approximation in the second row, start by noticing that $\mathbf{z}=\mathbf{f}^{e}[\mathbf{x}_{1:n}]$ conditioned on $\mathbf{x}_{1:n}$ and $\mathbf{y}_{1:n}$ follows a multivariate Gaussian, see Eq. (13). Then, take the Taylor expansion of $\lambda(A[\mathbf{z}])$ about $\mathbf{z}=\mathbf{z}_{0}=\bm{\mu}_{n}(\mathbf{x}_{1:n}):=\left(\mu_{1,n}(\mathbf{x}_{1:n}),\dots,\mu_{m,n}(\mathbf{x}_{1:n})\right)$ . The zero order term is the constant that appears above, i.e., $\lambda\left(A[\bm{\mu}_{n}[\mathbf{x}_{1:n}]]\right)$ . The expectation of the first order term vanishes because $\mathbf{z}-\mathbf{z}_{0}$ has zero mean, and we ignore second and higher order terms. Following the same reasoning, we get:

[TABLE]

where $\bm{\mu}_{n,(\mathbf{x},\mathbf{y})}$ is the posterior mean after seeing the hypothetical observation $(\mathbf{x},\mathbf{y})$ . Finally, we approximate the expectation over the hypothetical measurement as:

[TABLE]

To see why this is possible, note that $\mathbf{y}=\mathbf{f}^{e}(\mathbf{x})+\nu^{2}\bm{\epsilon}$ where $\bm{\epsilon}$ is Gaussian with zero mean and unit covariance, take the Taylor expansion of the integrand in the first line about $\bm{\epsilon}=\mathbf{0}$ , and keep only the zero order term (the expectation of the first order term vanishes). Putting everything together, we get the (approximate) inequality:

[TABLE]

The inequality is approximate because the first term on the right hand side is approximately greater than the second one. The accuracy is again second order and proving it requires taking the Taylor expansion of the integrand of the first term with respect to $\mathbf{z}\equiv\mathbf{f}^{e}(\mathbf{x})$ about $\mathbf{z}=\mathbf{z}_{0}\equiv\bm{\mu}_{n}(\mathbf{x})$ .

The important observation here is that the lower bound to EEIHV, i.e., $\overline{\operatorname{EEIHV}}$ on the right-hand side of Eq. (30), is similar to the original EIHV of [10] with a few key differences. Specifically, $\overline{\operatorname{EEIHV}}$ has the same analytical form as EIHV if in EIHV (i) we replace the observed targets with their projections to the posterior mean, i.e., if we work with the denoised measurements instead of the noisy ones; and (ii) we remove the noise variance from the predictive distribution of the GP. Therefore, the analytical formula for the calculation of EIHV found in [10] applies to $\overline{\operatorname{EEIHV}}$ subject to the aforementioned substitutions. In all our numerical examples, we use $\overline{\operatorname{EEIHV}}$ . We maximize the lower bound over $\mathbf{x}$ using BFGS with multiple random restarts.
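To make the two substitutions concrete, the sketch below estimates the expected hypervolume improvement for $m=2$ by plain Monte Carlo. It is a stand-in for illustration only: the paper uses the analytical formula of [10], whereas this code samples the candidate's noise-free predictive distribution. The inputs `mu_obs` (denoised measurements, i.e., posterior means at the observed designs), `mu_new` and `sigma_new` (noise-free predictive mean and standard deviation at the candidate design), and both function names are hypothetical.

```python
import random


def hypervolume_2d(points, ref):
    """Area dominated by points and bounded below by ref (maximization)."""
    pts = sorted((p for p in points if p[0] > ref[0] and p[1] > ref[1]),
                 key=lambda p: -p[0])
    area, best_y = 0.0, ref[1]
    for x, y in pts:
        if y > best_y:
            # Add the horizontal strip that this point newly dominates.
            area += (x - ref[0]) * (y - best_y)
            best_y = y
    return area


def eeihv_lower_bound_mc(mu_obs, mu_new, sigma_new, ref, n_samples=2000, rng=None):
    """Monte Carlo estimate of the expected hypervolume improvement.

    Substitution (i): the current front is built from denoised values mu_obs.
    Substitution (ii): the candidate is sampled without observation noise.
    """
    rng = rng or random.Random(0)
    base = hypervolume_2d(mu_obs, ref)
    total = 0.0
    for _ in range(n_samples):
        z = tuple(rng.gauss(m, s) for m, s in zip(mu_new, sigma_new))
        total += hypervolume_2d(list(mu_obs) + [z], ref) - base
    return total / n_samples
```

A Monte Carlo estimate like this is noisy in the candidate design, which is precisely why a closed-form expression is preferable when a gradient-based optimizer such as BFGS is used.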

3 Numerical Results

In Sec. 3.1 we use a synthetic example to visualize some of the concepts used throughout this section. In Sections 3.2 and 3.3, we validate our approach using two synthetic stochastic optimization problems with known optimal solutions. To assess the robustness of the methodology, we experiment with various levels of stochasticity, which determine the resulting noise in the outputs. In Sec. 3.4, we solve the steel wire drawing problem with uncertainties in the incoming wire diameters and the die angles at each pass. In all the problems the objectives are scaled by subtracting and dividing by the empirical mean and standard deviation, respectively.

3.1 Correspondence between nomenclature and visualizations

Fig. 1 uses an $m=2$ synthetic example to help us visualize and name some of the concepts used throughout this section. The dark blue staircase is an approximation of the true $P_{O}$ , generated by taking the empirical Pareto frontier of sample averaged objective measurements at a large number of designs. The figure also shows a scatter plot of the denoised measurements $\bm{\mu}_{n}(\mathbf{x}_{1:n})$ (green dots), as well as the corresponding empirical Pareto frontier $P[\bm{\mu}_{n}[\mathbf{x}_{1:n}]]$ (green line). The red dot marks the denoised measurement made at the design $\mathbf{x}_{n+1}$ that maximizes $\overline{\operatorname{EEIHV}}(\mathbf{x})$ . The red line is the top right boundary of the Vorob’ev expectation of $A[\mathbf{f}^{e}[X]]$ conditioned on the observed data $(\mathbf{x}_{1:n},\mathbf{y}_{1:n})$ , i.e., it is our expectation about $P[\mathbf{f}^{e}[X]]$ conditioned on our current state of knowledge. The gray contours show the symmetric deviation $d^{e}_{n}(\mathbf{y})$ of $A[\mathbf{f}^{e}[X]]$ , which corresponds to our uncertainty about $P[\mathbf{f}^{e}[X]]$ .

3.2 Two-dimensional synthetic example

Consider the two-dimensional synthetic multi-objective problem taken from [25] which has been slightly modified for our use here:

[TABLE]

for $\mathbf{x}=(x_{1},x_{2})\in X=[0,1]^{2}$ . The random variable $\xi$ , defined on $(\Omega,\mathcal{F},\mathbb{P})$ , is standard normal, i.e., $\xi\sim\mathcal{N}(0,1)$ . The parameter $s$ controls the standard deviation of the noise infused by $\xi$ . Notice that even though $\xi$ is normal, the measured objectives $o_{i}(\mathbf{x},\omega)$ are not normally distributed due to the non-linearities. That is, the statistics of the measurement process do not match our assumptions in Sec. 2.1. We do this on purpose. In real applications the statistics of the measurement process are not known, and we would like to investigate to what extent the normality assumption produces robust results.

To validate our methodology, we must first estimate accurately the true $P_{O}$ . We achieve this by finding the empirical Pareto frontier of a large number of designs (10000) while approximating $O_{i}(\mathbf{x})=\mathbb{E}[o_{i}(\mathbf{x},\omega)]$ with 100 Monte Carlo samples. In this example, we aim to maximize the two objectives.
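The ground-truth construction just described can be sketched as follows: average several noisy evaluations at each design, then keep the non-dominated averages. The noisy objective `toy_objective` is a hypothetical stand-in (the actual test functions are those given above), and the pairwise filter in `pareto_front` is a simple quadratic-cost choice made for readability rather than speed.

```python
import random


def pareto_front(points):
    """Non-dominated subset of points when all objectives are maximized."""
    front = []
    for i, p in enumerate(points):
        dominated = any(
            all(q[k] >= p[k] for k in range(len(p)))
            and any(q[k] > p[k] for k in range(len(p)))
            for j, q in enumerate(points) if j != i
        )
        if not dominated:
            front.append(p)
    return front


def sample_average(noisy_objective, x, n_mc, rng):
    """Monte Carlo estimate of the expected objectives at design x."""
    draws = [noisy_objective(x, rng) for _ in range(n_mc)]
    m = len(draws[0])
    return tuple(sum(d[k] for d in draws) / n_mc for k in range(m))


def toy_objective(x, rng, s=0.05):
    """Hypothetical two-objective function with noise entering non-linearly."""
    x1 = x[0] + s * rng.gauss(0.0, 1.0)
    return (x1, 1.0 - x1 ** 2)
```

For example, drawing a few hundred uniform designs in $[0,1]^{2}$, calling `sample_average(toy_objective, x, 100, rng)` at each, and passing the results to `pareto_front` gives a small-scale analogue of the procedure; with 10000 designs a faster non-dominated sort would be advisable.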

We start with $n=20$ random initial observations and we add an additional 100 measurements selected according to Algorithm 1. Fig. 2 depicts our final state of knowledge about $P[\mathbf{f}^{e}[\mathbf{x}_{1:n}]]$ for increasing noise levels $s=0.01,0.03,0.05,$ and $0.1$ . The figure also shows a line joining the large yellow dots. These points represent the Pareto frontier obtained by sample averaging the objectives at the Pareto optimal designs found by the methodology after the fixed number of iterations, i.e., an estimate of $P[\mathbf{O}[\mathbf{x}_{1:N_{\max}}]]$ which is to be contrasted to $P[\bm{\mu}_{N_{\max}}[\mathbf{x}_{1:N_{\max}}]]$ . This Pareto frontier is a representation of the quality of the solution obtained by the methodology. With low levels of stochasticity the methodology neatly approximates the noise in the outputs as Gaussian, as shown in Fig. 2 (a) and (b). With an increase in the value of the stochasticity parameter, $s$ , the final Pareto frontier obtained starts diverging from $P_{O}$ , as shown in Fig. 2 (c) and (d). In these two cases, the methodology ends up exploiting only the area near the two ends of the observed $P[\mathbf{O}[\mathbf{x}_{1:N_{\max}}]]$ , and not the whole $P_{O}$ , which is possibly a manifestation of the methodology not being able to estimate and filter out the excessive non-Gaussian noise. The contours of the symmetric deviation (which can be understood as the probability of a particular set of objective values being achievable conditional on the observations made thus far) do reinforce greater knowledge about the plausibility of the achievable values even in regions which tend to dominate the approximated Pareto frontier. This means that with more simulations the methodology should eventually discover more Pareto efficient solutions across the complete boundary of the approximated Pareto frontier. Thus, the symmetric deviation allows the decision maker to realize the potential value that lies in doing further simulations.

3.3 Six-dimensional synthetic example

Consider the following test objective functions from [19]:

[TABLE]

for $\mathbf{x}\in X=[0,1]^{6}$ , where $\xi_{i}\sim\mathcal{N}(0,1),i=1,\dots,6$ are independent. As before, the expected objectives are not analytically available. We use the same approximation technique as in the previous example to estimate the ground truth of $P_{O}$ for this test problem.

Fig. 3 depicts our final state of knowledge about $P[\mathbf{f}^{e}[\mathbf{x}_{1:n}]]$ for increasing noise levels $s=0.01,0.03,0.05$ , and $0.1$ . As before, the larger the noise the harder it is for the methodology to discover $P_{O}$ , the true Pareto frontier. In general, as can be seen in Fig. 3, the method is robust to noise as long as the noise is reasonably low for the given number of initial measurements. The strength of the methodology can be observed in Fig. 3 (a) and (b), where the final $P[\mathbf{f}^{e}[\mathbf{x}_{1:n}]]$ contains points that dominate $P[\mathbf{O}[\mathbf{x}_{1:N_{\max}}]]$ when the noise parameter has relatively low values. The method, as expected, discovers very few points on $P_{O}$ as the noise increases to $s=0.1$ , as can be seen in Fig. 3 (d).

3.4 Wire drawing problem

The wire drawing process is designed to achieve the desired final diameter and mechanical properties, such as ultimate tensile strength (UTS) and ductility, through cold reduction of a larger diameter wire. The desired wire properties depend on the application: for example, high torsional ductility is required for wires used in tires, while high strength is required for wires used in machine tools for metal cutting. A typical reduction of the cross section of the wire, based on the final properties required, is in the range of 70-90 percent, and this is achieved by reducing the wire diameter in a number of passes. Each pass involves drawing through a conical die, and the sequence of reductions and corresponding die angles at each pass plays an important role in the final properties as well as in the performance of the operation. Here we consider a wire drawing process with a fixed number of passes (8 passes). A finite element analysis (FEA) based simulator, developed for an industrial operation, was used to simulate this process. This wire drawing simulator includes wire deformation, heat generation and dissipation in the wire as well as the dies, and cooling of the wire on the cooling drum and in the atmosphere, and is based on large deformation theory. The model considers the process to be axisymmetric. The multi-pass drawing effect is modeled by considering carryover effects of the previous pass such as residual stress, plastic strain and temperature. The FEA is done using four-noded isoparametric elements. A penalty parameter approach is used for modeling the contact between the wire and the dies. The simulator takes as input the wire material properties, input wire diameter, die pass schedule (reduction and die angle at each pass), wire drawing speed, cooling conditions, friction, etc., and predicts the internal stresses and strains in the wire and the die, the load on each die and the drum, the temperature of the wire and the die, and properties indicative of the final wire mechanical properties: UTS representing strength and strain non-uniformity factor (SNUF) representing relative ductility.

The plastic deformation across the cross section of the final wire should be as uniform as possible for enhanced ductility. The UTS is primarily governed by the total reduction, but the non-uniform deformation has a significant secondary role in the final UTS. To understand this uniformity, the plastic strain distribution is modeled and is represented as SNUF. SNUF is the ratio of the difference between the peak and average strain to the average strain. Besides the properties of the drawn wire, process defects such as wire burst during the drawing process are an important aspect to consider: central burst is highly undesirable since it leads to wire breakage during drawing. This effect is modeled through the measurement of triaxiality by a factor called the hydraulic failure factor (HFF). The coefficient of friction is assumed to be constant throughout the process. Here, we have the UTS and the SNUF as the two competing objectives for the process.
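Written as a formula, the verbal definition of SNUF given above reads as follows, where the symbols $\varepsilon_{\max}$ (peak plastic strain over the cross section) and $\bar{\varepsilon}$ (average plastic strain) are notation introduced here for illustration and do not appear in the original text:

$$\mathrm{SNUF}=\frac{\varepsilon_{\max}-\bar{\varepsilon}}{\bar{\varepsilon}}.$$

For instance, a hypothetical cross section with peak strain $1.5$ and average strain $1.2$ would give $\mathrm{SNUF}=(1.5-1.2)/1.2=0.25$ , while a perfectly uniform strain distribution would give $\mathrm{SNUF}=0$ .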

The design variables for this problem are the die angles (one at each pass) and the incoming wire diameter (implicit in the reduction ratio) at each pass. The outgoing wire diameter at a pass is the same as the incoming wire diameter for the next pass. The incoming wire diameter $d_{j}$ and the reduction ratio $rr_{j}$ for a pass $j$ are related by the formula given in Eq. (38).

[TABLE]

For this problem we take the case of drawing an 8 mm wire into a 3 mm wire (see Fig. 4).
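As a rough numerical check of this case, the sketch below computes reductions under the common area-based convention, in which the reduction of a pass is one minus the squared ratio of outgoing to incoming diameter. This convention is an assumption made here and may differ from the exact form of Eq. (38); the function names and the intermediate diameters used in the usage note are likewise hypothetical.

```python
def area_reduction(d_in, d_out):
    """Fractional cross-section reduction, assuming a circular wire."""
    return 1.0 - (d_out / d_in) ** 2


def pass_reductions(diameters):
    """Per-pass area reductions for a schedule d_0, d_1, ..., d_J."""
    return [area_reduction(a, b) for a, b in zip(diameters, diameters[1:])]
```

Under this assumption, `area_reduction(8.0, 3.0)` evaluates to about 0.86, which falls within the 70-90 percent range quoted for typical total reductions. A hypothetical 8-pass schedule such as `[8.0, 7.35, 6.75, 5.95, 5.25, 4.55, 4.05, 3.45, 3.0]` can be passed to `pass_reductions` to inspect the individual passes.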

So, with the overall reduction ratio (and the incoming wire diameter for the first pass) fixed, the problem becomes that of two objectives with 15 design parameters (8 die angles and 7 incoming wire diameters). We apply our methodology to the wire drawing problem and demonstrate its ability to deal with the problem of stochasticity in the objectives induced by our inability to fully control the design parameters, to obtain a set of Pareto optimal solutions. This uncertainty can be understood as the ubiquitous effect of the continuous wear and tear on the die which would cause the process to deviate from delivering ideal (no noise) outputs. Also, in any manufacturing process the tolerances need to be accounted for as the procured dies themselves would not have exact dimensions as required. The design space has been bounded by choosing a suitable range for design variables as follows:

1. For $i=1,\dots,7$ , $x_{i}\in[0,1]$ represent the incoming wire diameters.

2. For $i=1,\dots,8$ , $x_{i+7}\in[0,1]$ represent the die angles.

Specifically, we assume that when we try to implement a process with design $\mathbf{x}$ , what we actually get is a process with design $\mathbf{x}+\mathbf{S}\bm{\xi}$ , where $\bm{\xi}\sim\mathcal{N}(\mathbf{0}_{15},\mathbf{I}_{15})$ and $\mathbf{S}=\operatorname{diag}(s_{1},\dots,s_{15})$ with $s_{i}=0.05$ for $i=1,\dots,7$ and $s_{i}=0.1$ for $i=8,\dots,15$ . The above space $X=[0,1]^{15}$ is a scaled representation of the real space for simplification purposes. The corresponding vector in the real space $[7.2,7.5]\times[6.6,6.9]\times[5.8,6.1]\times[5.1,5.4]\times[4.4,4.7]\times[3.9,4.2]\times[3.3,3.6]\times[8,14]^{8}$ can be obtained by rescaling the random vector from the scaled space using a simple linear transformation. The noisy objectives considered here are:
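The perturbation and rescaling just described can be sketched as below. The bounds and noise scales are copied from the text; the names `LOWER`, `UPPER`, `S_DIAG`, `perturb`, and `to_real` are hypothetical. Whether perturbed designs that leave $[0,1]^{15}$ are clipped is not stated in the text, so this sketch leaves them unclipped.

```python
import random

# Real-space bounds: 7 incoming wire diameters, then 8 die angles.
LOWER = [7.2, 6.6, 5.8, 5.1, 4.4, 3.9, 3.3] + [8.0] * 8
UPPER = [7.5, 6.9, 6.1, 5.4, 4.7, 4.2, 3.6] + [14.0] * 8
# Diagonal of S: 0.05 for the diameters, 0.1 for the die angles.
S_DIAG = [0.05] * 7 + [0.1] * 8


def perturb(x, rng):
    """Realized scaled design x + S xi with xi standard normal (no clipping)."""
    return [xi + s * rng.gauss(0.0, 1.0) for xi, s in zip(x, S_DIAG)]


def to_real(x):
    """Linear map from the scaled space [0, 1]^15 to the real design space."""
    return [lo + xi * (hi - lo) for xi, lo, hi in zip(x, LOWER, UPPER)]
```

A noisy evaluation at a nominal scaled design `x` would then call the simulator at `to_real(perturb(x, rng))`.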

[TABLE]

The optimization problem involves maximizing the UTS and minimizing the SNUF. To match the conventions of our code, we convert it to an equivalent maximization problem in which we maximize the UTS and the negative of the SNUF. We consider a scenario with 15 initial observations of the MOO problem and limit our computational budget to allow for 50 additional simulations to be carried out sequentially.

Fig. 5 shows the projected initial observations for the problem. We scale the measurements obtained by subtracting and dividing by the empirical mean and standard deviation just as in the case of the test function discussed above. This is done to maintain consistency with the assumption (in Sec. 2.1.1) of a zero mean (standard normal) GP for computational flexibility.
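The scaling step can be written in a few lines. This is a minimal sketch applied to one objective at a time; the text does not say whether the sample or the population standard deviation is used, so the choice of `statistics.stdev` (sample standard deviation) is an assumption, as are the function names.

```python
import statistics


def standardize(values):
    """Scale one objective to zero empirical mean and unit sample standard deviation."""
    mean = statistics.fmean(values)
    std = statistics.stdev(values)
    return [(v - mean) / std for v in values], mean, std


def unstandardize(z, mean, std):
    """Map scaled values back to the original units."""
    return [mean + std * v for v in z]
```

Keeping `mean` and `std` allows predictions made in the scaled space to be reported in the original units of the objectives.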

A key aspect of quantifying our knowledge about the state of the objectives is the Vorob’ev expectation, which is computed by sampling the design space $X$ . However, in this case with 15 dimensions, it becomes very difficult to cover the whole design space, and as a result certain designs picked by the algorithm end up outside the sampled designs. The effect of this can be seen in Fig. 6 (a), where the Vorob’ev expectation lies below the points in the top left corner picked by the methodology. To circumvent this issue, we augment the set of sampled designs with the designs at which we have made observations. This provides a clearer picture of the state, Fig. 6 (b), as it reinforces the information obtained thus far while quantifying our beliefs about the state of the Pareto-efficient frontier.

Fig. 7 depicts the state of the problem after the fiftieth iteration. Since we do not have the computational resources to obtain $P[\mathbf{O}[\mathbf{x}_{1:N_{\max}}]]$ for comparison, we sample average the objectives, using 100 samples, at the final Pareto designs as shown in Fig. 6. This averaging gives us an estimate of the true state of the Pareto-efficient frontier after the computational budget has been exhausted.

4 Conclusions

We constructed an extension to the EIHV information acquisition function which makes possible the application of BGO to stochastic multi-objective black-box optimization problems. In addition, we have shown how the epistemic uncertainty induced by the limited number of simulations can be quantified and used to represent the uncertainty around the Pareto frontier at each stage. We have validated our approach by applying it to two synthetic test functions with known Pareto frontiers, slightly modified to include stochastic parameters. Furthermore, we applied our method to the challenging steel wire drawing problem under parametric uncertainty in a scenario of simulation-based design. The method offers a viable alternative to the state-of-the-art evolutionary optimization algorithms, which rely heavily on sample averaging and are unaffordable under a limited budget. Moreover, the proposed extension to EIHV gives acceptable results for moderate levels of noise with a limited number of initial observations. There remain several open research questions. The most pressing direction is the efficient treatment of stochastic multi-objective problems with unknown and expensive constraints under constrained computational resources.

5 Acknowledgments

Ilias Bilionis acknowledges the startup support provided by the School of Mechanical Engineering at Purdue University.

The authors acknowledge the support provided by Tata Consultancy Services, Pune, India.

Bibliography

1. D. Bertsekas. Dynamic Programming and Optimal Control. Athena Scientific, 4th edition, 2007.
2. M. Binois, D. Ginsbourger, and O. Roustant. Quantifying uncertainty on Pareto fronts with Gaussian process conditional simulations. European Journal of Operational Research, 243(2):386–394, June 2015.
3. E. Brochu, V. M. Cora, and N. De Freitas. A tutorial on Bayesian optimization of expensive cost functions, with application to active user modeling and hierarchical reinforcement learning. arXiv preprint arXiv:1012.2599, 2010.
4. R. H. Byrd, P. Lu, J. Nocedal, and C. Zhu. A limited memory algorithm for bound constrained optimization. SIAM Journal on Scientific Computing, 16(5):1190–1208, 1995.
5. A. Charnes and W. W. Cooper. Goal programming and multiple objective optimizations: Part 1. European Journal of Operational Research, 1(1):39–54, 1977.
6. C. Chevalier, D. Ginsbourger, J. Bect, and I. Molchanov. Estimating and quantifying uncertainties on level sets using the Vorob’ev expectation and deviation with Gaussian process models. In mODa 10–Advances in Model-Oriented Design and Analysis, pages 35–43. Springer, 2013.
7. N. Cressie. The origins of kriging. Mathematical Geology, 22(3):239–252, 1990.
8. K. Deb. Introduction to evolutionary multiobjective optimization. In Multiobjective Optimization, pages 59–96. Springer, 2008.