Multifidelity probability estimation via fusion of estimators

Boris Kramer; Alexandre Noll Marques; Benjamin Peherstorfer; Umberto Villa; Karen Willcox

arXiv:1905.02679·stat.CO·July 30, 2019


TL;DR

This paper introduces a multifidelity approach that fuses multiple estimators to efficiently and accurately estimate failure probabilities in complex models, reducing computational cost while maintaining precision.

Contribution

It develops an unbiased fusion method that combines multiple importance sampling estimators, each derived from a different low-fidelity model, with weights chosen to minimize the variance of the fused estimator.

Findings

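01

The fused estimator has variance no larger than that of any individual estimator.

02

Compared with importance sampling that uses the high-fidelity model alone, the method reduces the required CPU time by 65% for a turbulent jet model at a similar coefficient of variation.

03

The fused estimator has minimal variance among all linear combinations of the estimators; its asymptotic behavior is demonstrated on a convection-diffusion-reaction model.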

Abstract

This paper develops a multifidelity method that enables estimation of failure probabilities for expensive-to-evaluate models via information fusion and importance sampling. The presented general fusion method combines multiple probability estimators with the goal of variance reduction. We use low-fidelity models to derive biasing densities for importance sampling and then fuse the importance sampling estimators such that the fused multifidelity estimator is unbiased and has mean-squared error lower than or equal to that of any of the importance sampling estimators alone. By fusing all available estimators, the method circumvents the challenging problem of selecting the best biasing density and using only that density for sampling. A rigorous analysis shows that the fused estimator is optimal in the sense that it has minimal variance amongst all possible combinations of the estimators.…

Tables

Table 1: Boundary conditions for the combustion model from [7].

Boundary	Temperature	Species
$Γ_{D, i}$	$T = 950$ K	$Y_{H_{2}} = 0.0282, Y_{O_{2}} = 0.2259, Y_{H_{2} O} = 0$
$Γ_{D, 0}$	$T = 300$ K	$Y_{H_{2}} = 0, Y_{O_{2}} = 0, Y_{H_{2} O} = 0$
$Γ_{N}$	$\nabla T \cdot 𝒏 = 0$	$\nabla Y_{i} \cdot 𝒏 = 0$

Table 2: Parameters for the combustion model from [7].

quantity	physical meaning	assumptions	value
$κ$	molecular diffusivity	const., equal, uniform $\forall i$	$2.0 \frac{{cm}^{2}}{s}$
$U$	velocity	const.	$50 \frac{cm}{s}$
$W_{H_{2}}$	molecular weight	const.	$2.016 \frac{g}{mol}$
$W_{O_{2}}$	molecular weight	const.	$31.9 \frac{g}{mol}$
$W_{H_{2} O}$	molecular weight	const.	$18 \frac{g}{mol}$
$ρ$	density of mixture	const.	$1.39 \times 10^{- 3} \frac{g}{{cm}^{3}}$
$R$	univ. gas constant	const.	$8.314472 \frac{J}{mol K}$
$Q$	heat of reaction	const.	$9800$ K
$ν_{H_{2}}$	stoichiometric coefficient	const.	2
$ν_{O_{2}}$	stoichiometric coefficient	const.	1
$ν_{H_{2} O}$	stoichiometric coefficient	const.	2

Table 3: CPU time to generate the biasing densities, and the number of samples in the failure domain.

	ROM1	ROM2	ROM3	HFM
# of samples drawn	$10^{5}$	$2 \times 10^{4}$	$2 \times 10^{4}$	$2 \times 10^{4}$
# of samples in failure domain $𝒢^{(i)}$	0	13	17	17
time needed	N.A.	$11.2$ [s]	$12.7$ [s]	$2.1$ [h]

Table 4: Weights of the fused estimator $\hat{P}_{\boldsymbol{\alpha}}$ with $n$ samples.

	$n = 10^{2}$	$n = 10^{3}$	$n = 10^{4}$	$n = 2 \times 10^{4}$	$n = 4 \times 10^{4}$	$n = 10^{5}$
$α_{1}$	0	0	0.005	0.001	0.002	0.005
$α_{2}$	0.587	0.471	0.331	0.294	0.415	0.742
$α_{3}$	0.413	0.529	0.664	0.705	0.583	0.253

Table 5: Results for subset simulation to compute failure probabilities for the convection-diffusion-reaction problem.

samples	samples each level	No of levels $L$	failure Prob.	estimated C.o.V.
2000	500	4	$1.06 \times 10^{- 3}$	$3.24 \times 10^{- 1}$
4000	800	5	$8.14 \times 10^{- 4}$	$2.69 \times 10^{- 1}$
4000	1000	4	$8.80 \times 10^{- 4}$	$2.29 \times 10^{- 1}$
6000	1500	4	$9.30 \times 10^{- 4}$	$1.92 \times 10^{- 1}$
10000	2000	5	$7.70 \times 10^{- 4}$	$1.68 \times 10^{- 1}$
20000	4000	5	$8.22 \times 10^{- 4}$	$1.18 \times 10^{- 1}$

Table 6: Weights of the fused estimator $\hat{P}_{\boldsymbol{\alpha}}$ with $n$ samples.

	$n = 300$	$n = 600$	$n = 900$	$n = 1200$
$α_{1}$	0.500	0.502	0.610	0.900
$α_{2}$	0.448	0.044	0.055	0.057
$α_{3}$	0.052	0.453	0.335	0.043

Equations

$$\mathcal{G} := \{\mathbf{z} \in \mathcal{D} \mid g(f(\mathbf{z})) < 0\}.$$

$$I_{\mathcal{G}}(\mathbf{z}) = \begin{cases} 1, & \mathbf{z} \in \mathcal{G}, \\ 0, & \text{otherwise}. \end{cases}$$

$$P = \mathbb{E}_{p}[I_{\mathcal{G}}(Z)] = \int_{\mathcal{D}} I_{\mathcal{G}}(\mathbf{z})\, p(\mathbf{z})\, d\mathbf{z}$$

$$P_{n} = \frac{1}{n} \sum_{i=1}^{n} I_{\mathcal{G}}(\mathbf{z}_{i}).$$

$$e^{\text{CV}}(P_{n}) = \sqrt{\frac{\mathbb{V}[P_{n}]}{(\mathbb{E}[P_{n}])^{2}}} = \sqrt{\frac{P(1-P)}{nP^{2}}} = \sqrt{\frac{1-P}{nP}}.$$

$$P = \int_{\mathcal{D}} I_{\mathcal{G}}(\mathbf{z})\, p(\mathbf{z})\, d\mathbf{z} = \int_{\mathcal{D}} I_{\mathcal{G}}(\mathbf{z})\, \frac{p(\mathbf{z})}{q(\mathbf{z})}\, q(\mathbf{z})\, d\mathbf{z}$$

$$P_{n}^{\text{IS}} = \frac{1}{n} \sum_{i=1}^{n} I_{\mathcal{G}}(\mathbf{z}^{\prime}_{i})\, \frac{p(\mathbf{z}^{\prime}_{i})}{q(\mathbf{z}^{\prime}_{i})}.$$

$$\mathbb{V}[P_{n}^{\text{IS}}] = \frac{\sigma_{q}^{2}}{n},$$

$$\sigma_{q}^{2} = \int_{\mathcal{D}} \left(\frac{I_{\mathcal{G}}(\mathbf{z}^{\prime})\, p(\mathbf{z}^{\prime})}{q(\mathbf{z}^{\prime})} - P\right)^{2} q(\mathbf{z}^{\prime})\, d\mathbf{z}^{\prime}.$$

$$\mathbb{E}_{q}[P_{n}^{\text{IS}}] = \mathbb{E}_{p}[I_{\mathcal{G}}(Z)] = P.$$

$$f^{(i)} : \mathcal{D} \mapsto \mathbb{R}^{d^{\prime}}, \quad i = 1, \ldots, k$$

$$P_{n_{i}}^{\text{IS}} = \frac{1}{n_{i}} \sum_{j=1}^{n_{i}} I_{\mathcal{G}}(\mathbf{z}_{i,j})\, \frac{p(\mathbf{z}_{i,j})}{q_{i}(\mathbf{z}_{i,j})}, \quad \mathbf{z}_{i,j} \sim q_{i}, \quad j = 1, \ldots, n_{i},$$

$$P_{\boldsymbol{\alpha}} = \sum_{i=1}^{k} \alpha_{i} P_{i},$$

$$\min_{\boldsymbol{\alpha}} \mathbb{V}[P_{\boldsymbol{\alpha}}] \quad \text{s.t.} \quad \mathbb{E}[P_{\boldsymbol{\alpha}}] = P.$$

$$\sum_{i=1}^{k} \alpha_{i} = 1 \ \Leftrightarrow\ \mathbb{E}[P_{\boldsymbol{\alpha}}] = \sum_{i=1}^{k} \alpha_{i}\, \mathbb{E}[P_{i}] = P \sum_{i=1}^{k} \alpha_{i} = P.$$

$$\rho_{i,j} = \frac{\mathbb{C}\text{ov}(P_{i}, P_{j})}{\sigma_{i} \sigma_{j}},$$

$$\boldsymbol{\Sigma} = \begin{bmatrix} \sigma_{1}^{2} & \sigma_{1}\sigma_{2}\rho_{1,2} & \dots & \sigma_{1}\sigma_{k}\rho_{1,k} \\ \sigma_{2}\sigma_{1}\rho_{2,1} & \sigma_{2}^{2} & \dots & \sigma_{2}\sigma_{k}\rho_{2,k} \\ \vdots & & \ddots & \vdots \\ \sigma_{k}\sigma_{1}\rho_{k,1} & \sigma_{k}\sigma_{2}\rho_{k,2} & \dots & \sigma_{k}^{2} \end{bmatrix}.$$

$$\mathbb{V}[P_{\boldsymbol{\alpha}}] = \mathbb{V}\left[\sum_{i=1}^{k} \alpha_{i} P_{i}\right] = \sum_{i=1}^{k} \alpha_{i}^{2} \sigma_{i}^{2} + 2 \sum_{i=1}^{k} \sum_{j>i}^{k} \alpha_{i} \alpha_{j} \sigma_{i} \sigma_{j} \rho_{i,j},$$

$$\mathbb{V}[P_{\boldsymbol{\alpha}}] = \boldsymbol{\alpha}^{T} \boldsymbol{\Sigma} \boldsymbol{\alpha}.$$

$$\boldsymbol{\alpha} = \frac{\boldsymbol{\Sigma}^{-1} \boldsymbol{1}_{k}}{\boldsymbol{1}_{k}^{T} \boldsymbol{\Sigma}^{-1} \boldsymbol{1}_{k}}.$$

$$P_{\boldsymbol{\alpha}} = \frac{\boldsymbol{1}_{k}^{T} \boldsymbol{\Sigma}^{-1} \boldsymbol{P}}{\boldsymbol{1}_{k}^{T} \boldsymbol{\Sigma}^{-1} \boldsymbol{1}_{k}}, \qquad \mathbb{V}[P_{\boldsymbol{\alpha}}] = \frac{1}{\boldsymbol{1}_{k}^{T} \boldsymbol{\Sigma}^{-1} \boldsymbol{1}_{k}}.$$

$$\min_{\boldsymbol{\alpha}} J(\boldsymbol{\alpha}) = \boldsymbol{\alpha}^{T} \boldsymbol{\Sigma} \boldsymbol{\alpha}, \quad \text{s.t.} \quad \boldsymbol{\alpha}^{T} \boldsymbol{1} = 1.$$

$$\begin{bmatrix} \boldsymbol{\Sigma} & \boldsymbol{1}_{k} \\ \boldsymbol{1}_{k}^{T} & 0 \end{bmatrix} \begin{bmatrix} \boldsymbol{\alpha} \\ \lambda \end{bmatrix} = \begin{bmatrix} \boldsymbol{0}_{k} \\ 1 \end{bmatrix}.$$

$$\boldsymbol{\alpha} = \frac{\boldsymbol{\Sigma}^{-1} \boldsymbol{1}_{k}}{\boldsymbol{1}_{k}^{T} \boldsymbol{\Sigma}^{-1} \boldsymbol{1}_{k}},$$

$$\alpha_{i} = \frac{1}{\sigma_{i}^{2}} \left[ \frac{1}{\sum_{l=1}^{k} \frac{1}{\sigma_{l}^{2}}} \left( 1 + \sum_{l=1}^{k} \frac{1}{\sigma_{l}^{2}} \sum_{j \neq l} \alpha_{j} \sigma_{l} \sigma_{j} \rho_{l,j} \right) - \sum_{j \neq i} \alpha_{j} \sigma_{i} \sigma_{j} \rho_{i,j} \right].$$

$$\alpha_{i} = \frac{1}{\sigma_{i}^{2} \sum_{l=1}^{k} \frac{1}{\sigma_{l}^{2}}}, \qquad \mathbb{V}[P_{\boldsymbol{\alpha}}] = \frac{1}{\sum_{l=1}^{k} \frac{1}{\sigma_{l}^{2}}}.$$

$$\mathbb{V}[P_{\boldsymbol{\alpha}}] = \frac{1}{\sum_{l=1}^{k} \frac{1}{\sigma_{l}^{2}}} = \sigma_{i}^{2} \alpha_{i} < \sigma_{i}^{2}, \ \forall i \quad \Rightarrow \quad \mathbb{V}[P_{\boldsymbol{\alpha}}] < \min_{i=1,\ldots,k} \sigma_{i}^{2}.$$

$$P_{\boldsymbol{\alpha}} = \sum_{i=1}^{k} \alpha_{i} P_{n_{i}}^{\text{IS}},$$

$$\sigma_{q_{j^{\prime}}}^{2} > \frac{k}{\sum_{i=1}^{k} \frac{1}{\sigma_{i}^{2}}}$$

$$\mathbb{V}[P_{\boldsymbol{\alpha}}] < \mathbb{V}[P_{j^{\prime}}^{\text{IS}}].$$


Full text

Multifidelity probability estimation via fusion of estimators

Boris Kramer, Department of Aeronautics & Astronautics, Massachusetts Institute of Technology, Cambridge, MA.

Alexandre Noll Marques, Department of Aeronautics & Astronautics, Massachusetts Institute of Technology, Cambridge, MA.

Benjamin Peherstorfer, Courant Institute of Mathematical Sciences, New York University, NY.

Umberto Villa, Department of Electrical and Systems Engineering, Washington University in St. Louis, MO.

Karen Willcox, Oden Institute for Computational Engineering & Sciences, The University of Texas at Austin, TX.

(April 22, 2019)

Abstract

This paper develops a multifidelity method that enables estimation of failure probabilities for expensive-to-evaluate models via information fusion and importance sampling. The presented general fusion method combines multiple probability estimators with the goal of variance reduction. We use low-fidelity models to derive biasing densities for importance sampling and then fuse the importance sampling estimators such that the fused multifidelity estimator is unbiased and has mean-squared error lower than or equal to that of any of the importance sampling estimators alone. By fusing all available estimators, the method circumvents the challenging problem of selecting the best biasing density and using only that density for sampling. A rigorous analysis shows that the fused estimator is optimal in the sense that it has minimal variance amongst all possible combinations of the estimators. The asymptotic behavior of the proposed method is demonstrated on a convection-diffusion-reaction partial differential equation model for which $10^{5}$ samples can be afforded. To illustrate the proposed method at scale, we consider a model of a free plane jet and quantify how uncertainties at the flow inlet propagate to a quantity of interest related to turbulent mixing. Compared to an importance sampling estimator that uses the high-fidelity model alone, our multifidelity estimator reduces the required CPU time by 65% while achieving a similar coefficient of variation.

keywords:

Multifidelity modeling, uncertainty quantification, information fusion, importance sampling, reduced-order modeling, failure probability estimation, PDEs, turbulent jet

1 Introduction

This paper considers multifidelity estimation of failure probabilities for large-scale applications with expensive-to-evaluate models. Failure probabilities are required in, e.g., reliable engineering design and risk analysis. Yet failure probability estimation with expensive-to-evaluate nonlinear models is computationally challenging due to the large number of Monte Carlo samples needed for low-variance estimates.

Efficient failure probability estimation methods aim to reduce the number of samples at which the expensive model is evaluated, e.g., by exploiting variance-reducing sampling strategies, multifidelity/multilevel estimation methods, or sequential sampling approaches. Variance reduction can be obtained through importance sampling [33], which allows for order-of-magnitude reductions in the number of samples needed to reliably estimate a small probability. However, importance sampling relies on having a good biasing distribution which in turn requires insight into the system. Surrogate models can provide such insight at much lower computational cost. Multifidelity approaches (see [38] for a review) that use surrogates for failure probability estimation via sampling have seen great interest recently [26, 8, 27, 13, 42, 35, 14], but require that the user selects a good importance sampling density. Multifidelity methods that avoid the selection of a single biasing density and instead use a suite of surrogate models to generate importance sampling densities were proposed in [36, 37, 32]. Nevertheless, this framework requires all knowledge about the small probability event to be available in the form of biasing densities, and is therefore only applicable to importance sampling estimators. Multilevel Monte Carlo [15, 2] methods use a hierarchy of approximations to the high-fidelity model in the sampling scheme. However, those model hierarchies have to satisfy certain error decay criteria, an assumption we do not make here. Subset simulation [3, 34] and line search [40, 11] can be used directly on the high-fidelity models, and therefore are of a black-box nature.

In this work, in addition to the computationally expensive model, we also have information about the system in the form of surrogate models, analytical models, expert elicitation, and reduced models. In other settings where such a variety of information is available, information fusion has been used to combine multi-source probabilistic information into a single estimator, see [9, 31, 28]. Moreover, combining information from multiple models and sources via a weighted multifidelity method can lead to efficient data assimilation strategies [30].

Here, we propose a new approach to enable small probability estimation for large-scale, computationally expensive models that draws from prior work in information fusion, importance sampling, and multifidelity modeling. We use information fusion in combination with multifidelity importance-sampling-based failure probability estimators, where in addition to the variance reduction from importance sampling, we obtain further variance reduction through information fusion. The proposed multifidelity framework uses the available surrogates to compute multiple unbiased failure probability estimators. We then combine them optimally into a new unbiased estimator that has minimal variance amongst all possible linear combinations of those estimators. The method therefore avoids the selection of the lowest variance biasing density to be used for sampling. Selecting the density that leads to the lowest variance in the failure probability estimator would require additional information, and not even error estimates on the surrogate model would suffice. Thus, we circumvent this step and optimally use all information available to us in form of probability estimators.

This paper is structured as follows: In Section 2 we illustrate the challenges in small failure probability computation and cover the necessary background material for multifidelity importance sampling used herein. Section 3 details our proposed approach of information fusion, importance sampling and multifidelity modeling. We then present in Section 4 a moderately expensive convection-diffusion-reaction test case, where we illustrate the asymptotic behavior of our approach. Section 5 discusses a turbulent jet model and demonstrates the computational efficiency of our proposed methods for this computationally expensive model. We close with conclusions in Section 6.

2 Small probability events and importance sampling estimators

We are interested in computing events with small probabilities, e.g., failure events, where the system fails to meet critical constraints. Section 2.1 describes small probability events, Section 2.2 introduces importance sampling and Section 2.3 briefly summarizes multifidelity importance sampling.

2.1 Small probability events

Let $\Omega$ be a sample space which, together with a sigma algebra and probability measure, defines a probability space. Define a $d$ -dimensional random variable $Z:\Omega\mapsto\mathcal{D}\subseteq\mathbb{R}^{d}$ with probability density $p$ , and let $\mathbf{z}$ be a realization of $Z$ . Let $f:\mathcal{D}\subseteq\mathbb{R}^{d}\mapsto\mathbb{R}^{d^{\prime}}$ be an expensive-to-evaluate model of high fidelity with corresponding $d^{\prime}$ -dimensional quantity of interest $f(\mathbf{z})\in\mathbb{R}^{d^{\prime}}$ . Let $g:\mathbb{R}^{d^{\prime}}\mapsto\mathbb{R}$ denote a limit state function that defines failure of the system. If $g(f(\mathbf{z}))<0$ , then $\mathbf{z}\in\mathcal{D}$ is a configuration where the system fails. This defines a failure set

$$\mathcal{G} := \{\mathbf{z} \in \mathcal{D} \mid g(f(\mathbf{z})) < 0\}.$$

Define the indicator function $I_{\mathcal{G}}:\mathcal{D}\mapsto\{0,1\}$ via

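$$I_{\mathcal{G}}(\mathbf{z}) = \begin{cases} 1, & \mathbf{z} \in \mathcal{G}, \\ 0, & \text{otherwise}. \end{cases}$$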

The standard Monte Carlo estimator of the failure probability

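$$P = \mathbb{E}_{p}[I_{\mathcal{G}}(Z)] = \int_{\mathcal{D}} I_{\mathcal{G}}(\mathbf{z})\, p(\mathbf{z})\, d\mathbf{z}$$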

uses $n$ realizations $\mathbf{z}_{1},\ldots,\mathbf{z}_{n}$ of the random variable $Z$ and estimates

$$P_{n} = \frac{1}{n} \sum_{i=1}^{n} I_{\mathcal{G}}(\mathbf{z}_{i}).$$

In the special case of small probabilities, standard Monte Carlo may be infeasible due to the large number of samples needed to obtain accurate estimates. Since failure probabilities are generally small, most realizations $\mathbf{z}_{i}$ will be outside the failure domain $\mathcal{G}$ , and only a small fraction of the $n$ samples lies in the failure region. The coefficient of variation (also called relative root-mean-squared error) of the estimator $P_{n}$ is given by

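$$e^{\text{CV}}(P_{n}) = \sqrt{\frac{\mathbb{V}[P_{n}]}{(\mathbb{E}[P_{n}])^{2}}} = \sqrt{\frac{P(1-P)}{nP^{2}}} = \sqrt{\frac{1-P}{nP}}.$$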

Thus, to obtain estimators with a small coefficient of variation, a large number of samples is necessary. For instance, if the small probability is $P=10^{-4}$ and if we want $e^{\text{CV}}=10^{-1}$ we would need $n=\mathcal{O}(10^{6})$ samples via standard Monte Carlo approaches. This challenge is amplified by the presence of an expensive-to-evaluate model, such as the model of a free plane jet in Section 5.
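The sample count quoted above can be checked with a few lines of Python. This is a minimal sketch that assumes the coefficient of variation of $P_{n}$ is $\sqrt{(1-P)/(nP)}$, which is the reading implied by the term "relative root-mean-squared error" and by the $\mathcal{O}(10^{6})$ figure; the function names are illustrative and do not come from the paper.

```python
import math


def mc_coefficient_of_variation(p: float, n: int) -> float:
    """Coefficient of variation of the standard Monte Carlo estimator P_n."""
    return math.sqrt((1.0 - p) / (n * p))


def mc_samples_needed(p: float, target_cv: float) -> int:
    """Smallest n (up to rounding) with coefficient of variation <= target_cv."""
    return math.ceil((1.0 - p) / (p * target_cv**2))


# Values used in the text: P = 1e-4 and a target coefficient of variation of 1e-1.
n_required = mc_samples_needed(p=1e-4, target_cv=1e-1)
print(f"samples needed: {n_required:.2e}")
```

Solving $(1-P)/(nP) = (e^{\text{CV}})^{2}$ for $n$ shows that the required sample count scales like $1/P$, which is why rare events are costly for standard Monte Carlo.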

2.2 Importance sampling

Importance sampling achieves variance reduction by using realizations of a random variable $Z^{\prime}:\Omega\mapsto\mathcal{D}$ with probability density $q$ . This random variable $Z^{\prime}$ is chosen such that its probability density function $q$ has higher mass (compared to the nominal density $p$ ) in the region of the event of interest. For a general introduction to importance sampling, see [33, Sec.9]. Define the support $\text{supp}(p)=\{\mathbf{z}\in\mathcal{D}\ |\ p(\mathbf{z})>0\}$ , and let $\text{supp}(p)\subseteq\text{supp}(q)$ . Then

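$$P = \int_{\mathcal{D}} I_{\mathcal{G}}(\mathbf{z})\, p(\mathbf{z})\, d\mathbf{z} = \int_{\mathcal{D}} I_{\mathcal{G}}(\mathbf{z})\, \frac{p(\mathbf{z})}{q(\mathbf{z})}\, q(\mathbf{z})\, d\mathbf{z} \tag{3}$$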

is well defined, where $p(\mathbf{z})/q(\mathbf{z})$ is the likelihood ratio—in the context of importance sampling also called importance weight. The importance-sampling estimate of the failure probability $P$ then draws $n$ realizations $\mathbf{z}^{\prime}_{1},\ldots,\mathbf{z}^{\prime}_{n}$ of the random variable $Z^{\prime}$ with density $q$ and evaluates

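$$P_{n}^{\text{IS}} = \frac{1}{n} \sum_{i=1}^{n} I_{\mathcal{G}}(\mathbf{z}^{\prime}_{i})\, \frac{p(\mathbf{z}^{\prime}_{i})}{q(\mathbf{z}^{\prime}_{i})}.$$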

The variance of the importance sampling estimator is

$$\mathbb{V}[P_{n}^{\text{IS}}] = \frac{\sigma_{q}^{2}}{n}, \tag{5}$$

where

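$$\sigma_{q}^{2} = \int_{\mathcal{D}} \left(\frac{I_{\mathcal{G}}(\mathbf{z}^{\prime})\, p(\mathbf{z}^{\prime})}{q(\mathbf{z}^{\prime})} - P\right)^{2} q(\mathbf{z}^{\prime})\, d\mathbf{z}^{\prime}. \tag{6}$$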

If $\text{supp}(p)\subseteq\text{supp}(q)$ , and by using (3), one can show that the importance sampling estimator $P_{n}^{\text{IS}}$ is an unbiased estimator of the failure probability, i.e.,

$$\mathbb{E}_{q}[P_{n}^{\text{IS}}] = \mathbb{E}_{p}[I_{\mathcal{G}}(Z)] = P.$$

The importance sampling estimator $P_{n}^{\text{IS}}$ has mean $P$ and variance $\sigma_{q}^{2}/n$ , and by the central limit theorem converges in distribution to the normal random variable $\mathcal{N}(P,\sigma_{q}^{2}/n)$ . Constructing a good biasing density that leads to small $\sigma_{q}^{2}$ is challenging [33]. We next introduce low-fidelity surrogate models, which are then used to construct biasing densities.
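To make the estimator concrete, the following Python sketch applies importance sampling to a toy one-dimensional problem. Everything specific here is an assumption chosen for illustration and is not taken from the paper: the nominal density $p$ is standard normal, failure means $z > 4$, and the biasing density $q$ is a unit-variance normal centered at the failure threshold.

```python
import math
import random


def normal_pdf(z: float, mean: float, std: float) -> float:
    """Density of N(mean, std^2) evaluated at z."""
    return math.exp(-0.5 * ((z - mean) / std) ** 2) / (std * math.sqrt(2.0 * math.pi))


def importance_sampling_estimate(n, threshold=4.0, shift=4.0, seed=0):
    """Return the IS estimate of P(Z > threshold) and the sample variance of its terms."""
    rng = random.Random(seed)
    terms = []
    for _ in range(n):
        z = rng.gauss(shift, 1.0)                        # z'_i drawn from q = N(shift, 1)
        indicator = 1.0 if threshold - z < 0.0 else 0.0  # I_G(z'_i) with g(f(z)) = threshold - z
        weight = normal_pdf(z, 0.0, 1.0) / normal_pdf(z, shift, 1.0)  # p(z'_i) / q(z'_i)
        terms.append(indicator * weight)
    estimate = sum(terms) / n
    sample_var = sum((t - estimate) ** 2 for t in terms) / (n - 1)  # estimates sigma_q^2
    return estimate, sample_var


p_hat, sigma_q_sq_hat = importance_sampling_estimate(n=20000)
p_exact = 0.5 * math.erfc(4.0 / math.sqrt(2.0))  # exact tail probability of N(0, 1)
print(f"IS estimate {p_hat:.3e}, exact {p_exact:.3e}")
```

In this toy setting a few thousand biased samples already resolve a probability of order $10^{-5}$, whereas sampling from $p$ directly would rarely produce a single failure at that sample size.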

2.3 Multifidelity Importance Sampling (MFIS)

Recall that by $f:\mathcal{D}\mapsto\mathbb{R}^{d^{\prime}}$ we denote an expensive-to-evaluate model of high fidelity with corresponding quantity of interest $f(\mathbf{z})\in\mathbb{R}^{d^{\prime}}$ . Let $k$ surrogates

$$f^{(i)} : \mathcal{D} \mapsto \mathbb{R}^{d^{\prime}}, \quad i = 1, \ldots, k$$

of lower fidelities be available, which are cheaper to evaluate than the high-fidelity model $f(\cdot)$ . We do not assume any information about the accuracy of the $f^{(i)}(\cdot)$ with respect to the high-fidelity model $f(\cdot)$ . Sections 4.2 and 5.4 detail the specific surrogate models used for the respective applications.

We use the MFIS method (see [35] for details) to obtain $k$ estimators of the failure probability. First, MFIS evaluates the surrogate models $f^{(i)}$ at $m_{i}$ samples to obtain a surrogate-model specific failure set $\mathcal{G}^{(i)}$ . Second, MFIS computes a biasing density $q_{i}$ by fitting a distribution in form of a Gaussian mixture model to the parameters in the failure set. If no failed samples are found by the surrogate model, i.e., if $\mathcal{G}^{(i)}=\emptyset$ , then we set the biasing density to be the nominal density. This leads to $k$ biasing densities $q_{1},\ldots,q_{k}$ from which we get importance sampling estimators

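$$P_{n_{i}}^{\text{IS}} = \frac{1}{n_{i}} \sum_{j=1}^{n_{i}} I_{\mathcal{G}}(\mathbf{z}_{i,j})\, \frac{p(\mathbf{z}_{i,j})}{q_{i}(\mathbf{z}_{i,j})}, \quad \mathbf{z}_{i,j} \sim q_{i}, \quad j = 1, \ldots, n_{i}, \tag{7}$$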

for $i=1,\ldots,k$ . The variance of the importance sampling estimator is given by (5) with $n=n_{i}$ and $\sigma_{q}=\sigma_{q_{i}}$ , with $\sigma_{q_{i}}^{2}$ being the asymptotic variance from (6) with $q=q_{i}$ .
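The two MFIS steps for building a biasing density can be sketched as follows. This is an illustration under stated simplifications, not the implementation from [35]: the input is scalar with a standard normal nominal density, a single Gaussian replaces the Gaussian mixture model, and the surrogates `rom_a` and `rom_b` as well as the limit state are hypothetical. The fallback for fewer than two failed samples is also a choice of this sketch; the text only specifies the fallback for an empty failure set.

```python
import random
import statistics


def fit_biasing_density(surrogate, limit_state, m, rng):
    """Return (mean, std) of a Gaussian biasing density built from surrogate failures."""
    samples = [rng.gauss(0.0, 1.0) for _ in range(m)]           # m draws from the nominal density
    failed = [z for z in samples if limit_state(surrogate(z)) < 0.0]  # surrogate failure set G^(i)
    if len(failed) < 2:
        # No usable failure samples: fall back to the nominal density N(0, 1).
        return 0.0, 1.0
    return statistics.mean(failed), statistics.stdev(failed)


def limit_state(y):
    """Hypothetical limit state: failure when the model output exceeds 3."""
    return 3.0 - y


def rom_a(z):
    """Hypothetical surrogate that roughly tracks the failure region."""
    return 0.9 * z + 0.1


def rom_b(z):
    """Hypothetical surrogate that is too crude to predict any failure."""
    return 0.1 * z


rng = random.Random(0)
density_a = fit_biasing_density(rom_a, limit_state, 20000, rng)
density_b = fit_biasing_density(rom_b, limit_state, 20000, rng)
print("biasing density from rom_a:", density_a)
print("biasing density from rom_b:", density_b)
```

The second surrogate mirrors the ROM1 column of Table 3, where no samples fall in the failure domain and the nominal density is used instead.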

3 Fusion of multifidelity estimators

In many practical situations, a range of probability estimators are available, for instance in form of MFIS estimators derived from different biasing densities, in form of analytical models, or estimators derived from expert elicitation [31]. If one a priori knew which was the lowest variance estimator then a good strategy would be to sample only from that estimator. However, knowing a priori which estimator has the lowest variance is a formidable task, and one has to draw samples to assess which estimator has the lowest variance. In this section, we present our new approach that combines all available estimators in an optimal fashion by solving the following problem.

Problem 1.

Given $k$ unbiased estimators, $P_{1},\ldots,P_{k}$ with expected value $P$ , i.e. $\mathbb{E}[P_{i}]=P,\ i=1,\ldots,k$ , find an estimator with expected value $P$ of the form

$$P_{\boldsymbol{\alpha}} = \sum_{i=1}^{k} \alpha_{i} P_{i}, \tag{8}$$

such that it attains minimal variance amongst all estimators of the form (8). That is, find the optimal weights $\alpha_{i}\in\mathbb{R},\ i=1,\ldots,k$ such that

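$$\min_{\boldsymbol{\alpha}} \mathbb{V}[P_{\boldsymbol{\alpha}}] \quad \text{s.t.} \quad \mathbb{E}[P_{\boldsymbol{\alpha}}] = P. \tag{9}$$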

The fused estimator approach still allows us to use information from the other (high-variance) estimators, whose samples would otherwise have gone to waste. Moreover, with the proposed method we can estimate small probabilities for expensive-to-evaluate models by exploiting a variety of surrogates. We derive expressions for the mean and variance of the fused estimator in Section 3.1. In Section 3.2, we derive the optimal weights for the fused estimator. Section 3.3 then discusses the special case of uncorrelated estimators. Our proposed algorithm is discussed in Section 3.4, followed by a brief Section 3.5 that discusses measures of convergence of the estimators.

3.1 Mean and variance of fused estimator

We start with the observation that if the weights $\alpha_{i}$ of the fused estimator $P_{\boldsymbol{\alpha}}$ sum to one, then the fused estimator is unbiased:

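$$\sum_{i=1}^{k} \alpha_{i} = 1 \ \Leftrightarrow\ \mathbb{E}[P_{\boldsymbol{\alpha}}] = \sum_{i=1}^{k} \alpha_{i}\, \mathbb{E}[P_{i}] = P \sum_{i=1}^{k} \alpha_{i} = P.$$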

Let the estimators $P_{i}$ have corresponding variances $0<\sigma_{i}^{2}<\infty,\ i=1,\ldots k$ . To compute the variance of the fused estimator $P_{\boldsymbol{\alpha}}$ we have to consider covariances between the individual estimators. Define the Pearson product-moment correlation coefficient as

$$\rho_{i,j} = \frac{\mathbb{C}\text{ov}(P_{i}, P_{j})}{\sigma_{i} \sigma_{j}},$$

where $\mathbb{C}\text{ov}(P_{i},P_{j})=\mathbb{E}[(P_{i}-\mathbb{E}[P_{i}])(P_{j}-\mathbb{E}[P_{j}])]=\mathbb{E}[P_{i}P_{j}]-P^{2}$ . We also define the symmetric, positive semi-definite covariance matrix $\boldsymbol{\Sigma}_{ij}=\mathbb{C}\text{ov}(P_{i},P_{j})$ as:

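$$\boldsymbol{\Sigma} = \begin{bmatrix} \sigma_{1}^{2} & \sigma_{1}\sigma_{2}\rho_{1,2} & \dots & \sigma_{1}\sigma_{k}\rho_{1,k} \\ \sigma_{2}\sigma_{1}\rho_{2,1} & \sigma_{2}^{2} & \dots & \sigma_{2}\sigma_{k}\rho_{2,k} \\ \vdots & & \ddots & \vdots \\ \sigma_{k}\sigma_{1}\rho_{k,1} & \sigma_{k}\sigma_{2}\rho_{k,2} & \dots & \sigma_{k}^{2} \end{bmatrix}.$$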

It is worth noticing that if the estimators $P_{1},\ldots,P_{k}$ are independent, then $\boldsymbol{\Sigma}$ is diagonal. The variance of the fused estimator from (8) is

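$$\mathbb{V}[P_{\boldsymbol{\alpha}}] = \mathbb{V}\left[\sum_{i=1}^{k} \alpha_{i} P_{i}\right] = \sum_{i=1}^{k} \alpha_{i}^{2} \sigma_{i}^{2} + 2 \sum_{i=1}^{k} \sum_{j>i}^{k} \alpha_{i} \alpha_{j} \sigma_{i} \sigma_{j} \rho_{i,j},$$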

which can be written in vector form as

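$$\mathbb{V}[P_{\boldsymbol{\alpha}}] = \boldsymbol{\alpha}^{T} \boldsymbol{\Sigma} \boldsymbol{\alpha}. \tag{12}$$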

In the following section, we provide an explicit formula for the optimal weights $\boldsymbol{\alpha}$ in the general case of possibly correlated estimators $P_{1},\dots,P_{k}$ ; in Section 3.3 we discuss the case of independent estimators, such as those constructed with the MFIS method.

3.2 Optimizing the weights for minimum-variance estimate

Problem (9) seeks the optimal $\boldsymbol{\alpha}$ such that the variance in (12) is minimized and $P_{\boldsymbol{\alpha}}$ remains unbiased. In this section, we show that such weights exist, are unique, and present a closed-form solution, provided that the covariance matrix $\boldsymbol{\Sigma}$ is invertible. This is summarized in the following result.

Proposition 1.

Let $\boldsymbol{P}=[P_{1},\ldots,P_{k}]^{T}$ be the vector of probability estimators and assume that $\boldsymbol{\Sigma}$ is not singular. Define $\boldsymbol{1}_{k}=[1,\dots,1]^{T}$ as a column-vector of length $k$ . The optimization problem (9) has the unique solution

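$$\boldsymbol{\alpha} = \frac{\boldsymbol{\Sigma}^{-1} \boldsymbol{1}_{k}}{\boldsymbol{1}_{k}^{T} \boldsymbol{\Sigma}^{-1} \boldsymbol{1}_{k}}.$$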

That is, the minimal variance unbiased estimator $P_{\boldsymbol{\alpha}}$ is such that

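$$P_{\boldsymbol{\alpha}} = \frac{\boldsymbol{1}_{k}^{T} \boldsymbol{\Sigma}^{-1} \boldsymbol{P}}{\boldsymbol{1}_{k}^{T} \boldsymbol{\Sigma}^{-1} \boldsymbol{1}_{k}}, \qquad \mathbb{V}[P_{\boldsymbol{\alpha}}] = \frac{1}{\boldsymbol{1}_{k}^{T} \boldsymbol{\Sigma}^{-1} \boldsymbol{1}_{k}}.$$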

Proof.

We have seen above that $\sum_{i=1}^{k}\alpha_{i}=1$ if and only if $\mathbb{E}[P_{\boldsymbol{\alpha}}]=P$ . Define the cost function $J(\boldsymbol{\alpha}):=\mathbb{V}[P_{\boldsymbol{\alpha}}]=\boldsymbol{\alpha}^{T}\boldsymbol{\Sigma}\boldsymbol{\alpha}$ by using equation (12). Therefore, the optimization problem (9) can be written as the quadratic program

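$$\min_{\boldsymbol{\alpha}} J(\boldsymbol{\alpha}) = \boldsymbol{\alpha}^{T} \boldsymbol{\Sigma} \boldsymbol{\alpha}, \quad \text{s.t.} \quad \boldsymbol{\alpha}^{T} \boldsymbol{1} = 1. \tag{13}$$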

Letting $\mathcal{L}(\boldsymbol{\alpha},\lambda):=\boldsymbol{\alpha}^{T}\boldsymbol{\Sigma}\boldsymbol{\alpha}+\lambda(\boldsymbol{\alpha}^{T}\boldsymbol{1}-1)$ denote the Lagrangian cost function associated to (13), the optimality conditions are $\nabla_{\boldsymbol{\alpha}}\mathcal{L}(\boldsymbol{\alpha},\lambda)=\boldsymbol{0}$ and $\frac{d\mathcal{L}}{d\lambda}(\boldsymbol{\alpha},\lambda)=0$ . This optimality system is written as

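$$\begin{bmatrix} \boldsymbol{\Sigma} & \boldsymbol{1}_{k} \\ \boldsymbol{1}_{k}^{T} & 0 \end{bmatrix} \begin{bmatrix} \boldsymbol{\alpha} \\ \lambda \end{bmatrix} = \begin{bmatrix} \boldsymbol{0}_{k} \\ 1 \end{bmatrix}.$$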

For invertible $\boldsymbol{\Sigma}$ , the unique weights to this quadratic program are then obtained by

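$$\boldsymbol{\alpha} = \frac{\boldsymbol{\Sigma}^{-1} \boldsymbol{1}_{k}}{\boldsymbol{1}_{k}^{T} \boldsymbol{\Sigma}^{-1} \boldsymbol{1}_{k}},$$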

and the expression for the variance follows by inserting these weights into (13). The estimator is obtained by inserting the weights into (8). ∎

The weights can be expressed explicitly in terms of the components of the covariance matrix as

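$$\alpha_{i} = \frac{1}{\sigma_{i}^{2}} \left[ \frac{1}{\sum_{l=1}^{k} \frac{1}{\sigma_{l}^{2}}} \left( 1 + \sum_{l=1}^{k} \frac{1}{\sigma_{l}^{2}} \sum_{j \neq l} \alpha_{j} \sigma_{l} \sigma_{j} \rho_{l,j} \right) - \sum_{j \neq i} \alpha_{j} \sigma_{i} \sigma_{j} \rho_{i,j} \right].$$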

Note that the weights are inversely proportional to the variance of the individual estimators and that the weight $\alpha_{i}$ depends on the covariances between the estimator $P_{i}$ and the other estimators $P_{j}$ . Also note that if the $P_{i}$ are correlated, some weights may be negative, while for a diagonal $\boldsymbol{\Sigma}$ all weights $\alpha_{i}$ are strictly positive. In the next section, we take a closer look at the uncorrelated case.
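The closed-form weights of Proposition 1 are straightforward to evaluate numerically. The NumPy sketch below is illustrative only: the function name and the two-estimator covariance matrix are made up for this example and are not data from the paper. It solves $\boldsymbol{\Sigma}\boldsymbol{x}=\boldsymbol{1}_{k}$ instead of forming the inverse, and the chosen correlation is strong enough to produce a negative weight.

```python
import numpy as np


def optimal_fusion_weights(cov):
    """Return alpha = Sigma^{-1} 1 / (1^T Sigma^{-1} 1) and the fused variance."""
    cov = np.asarray(cov, dtype=float)
    ones = np.ones(cov.shape[0])
    sigma_inv_ones = np.linalg.solve(cov, ones)  # Sigma^{-1} 1_k without an explicit inverse
    normalizer = ones @ sigma_inv_ones           # 1_k^T Sigma^{-1} 1_k
    return sigma_inv_ones / normalizer, 1.0 / normalizer


# Hypothetical example: two estimators with standard deviations 1 and 2, correlation 0.9.
sigma_1, sigma_2, rho = 1.0, 2.0, 0.9
cov = np.array([
    [sigma_1**2, rho * sigma_1 * sigma_2],
    [rho * sigma_1 * sigma_2, sigma_2**2],
])
alpha, fused_var = optimal_fusion_weights(cov)
print("weights:", alpha)
print("fused variance:", fused_var)
```

Because each unit vector is itself a feasible weight vector, the fused variance returned here can never exceed the smallest diagonal entry of the covariance matrix.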

3.3 The special case of uncorrelated estimators

In the situation where all estimators are uncorrelated, we recover the classical result of the inverse variance-weighted mean [29]. As a corollary from Proposition 1 we get the following result.

Corollary 1.

Consider the setting from Proposition 1, and let $\boldsymbol{\Sigma}=\textup{diag}(\sigma_{1}^{2},\ldots,\sigma_{k}^{2})$ be diagonal. Then the unique solution to the optimization problem (9) is given by

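$$\alpha_{i} = \frac{1}{\sigma_{i}^{2} \sum_{l=1}^{k} \frac{1}{\sigma_{l}^{2}}}, \qquad \mathbb{V}[P_{\boldsymbol{\alpha}}] = \frac{1}{\sum_{l=1}^{k} \frac{1}{\sigma_{l}^{2}}}. \tag{17}$$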

A few observations about this special case are in order:

1. The optimal coefficients $\alpha_{i}$ of the combined estimator $P_{\boldsymbol{\alpha}}$ are inversely proportional to the variance $\sigma_{i}^{2}$ of the corresponding estimator $P_{i}$ . To reduce the variance via a weighted combination of estimators, smaller weights are assigned to estimators with larger variance.

2. If one variance is small compared to all other ones, say $\sigma_{1}^{2}\ll\sigma_{i}^{2},\ i=2,\ldots,k$ , then $\sum_{i=1}^{k}\frac{1}{\sigma_{i}^{2}}\approx\frac{1}{\sigma_{1}^{2}}$ so that $\mathbb{V}[P_{\boldsymbol{\alpha}}]\approx\sigma_{1}^{2}$ . The estimators with large variance then contribute little further variance reduction.

3. If all estimators have equal variance, $\sigma_{1}^{2}=\ldots=\sigma_{k}^{2}$ , then $\sum_{i=1}^{k}\frac{1}{\sigma_{i}^{2}}=\frac{k}{\sigma_{1}^{2}}$ so that $\mathbb{V}[P_{\boldsymbol{\alpha}}]=\frac{\sigma_{1}^{2}}{k}$ . Hence, combining the estimators reduces the variance by a factor of $k$ .

4. Since $0<\alpha_{i}<1,\forall i$ , it follows from both equations in (17) that

$$\mathbb{V}[P_{\boldsymbol{\alpha}}] = \frac{1}{\sum_{l=1}^{k} \frac{1}{\sigma_{l}^{2}}} = \sigma_{i}^{2} \alpha_{i} < \sigma_{i}^{2}, \ \forall i \quad \Rightarrow \quad \mathbb{V}[P_{\boldsymbol{\alpha}}] < \min_{i=1,\ldots,k} \sigma_{i}^{2}.$$

Consequently, we are guaranteed to reduce the variance in $P_{\boldsymbol{\alpha}}$ by combining all estimators in the optimal way described above.
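The observations above can be reproduced with a short Python sketch of inverse-variance weighting. The variance values are arbitrary illustrative numbers, and the function name is not from the paper.

```python
def inverse_variance_weights(variances):
    """Weights and fused variance for uncorrelated estimators (diagonal covariance)."""
    precisions = [1.0 / v for v in variances]
    total = sum(precisions)
    return [p / total for p in precisions], 1.0 / total


# Equal variances: every weight is 1/k and the variance drops by a factor of k.
weights_equal, var_equal = inverse_variance_weights([4.0, 4.0, 4.0, 4.0])

# One dominant estimator: the fused variance is only slightly below the smallest variance.
weights_mixed, var_mixed = inverse_variance_weights([1.0, 100.0, 400.0])

print(weights_equal, var_equal)
print(weights_mixed, var_mixed)
```

The second case illustrates that adding much noisier estimators never hurts, but it helps little when one estimator is already far more precise than the rest.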

3.4 Fused multifidelity importance sampling: Algorithm and Analysis

We now use the general fusion framework to obtain a failure probability estimator. Thus, we solve Problem 1 in the context of importance-sampling-based failure probability estimators so that $P_{i}=P_{n_{i}}^{\text{IS}}$ . Our proposed method optimally fuses the $k$ MFIS estimators from (7), such that

$$P_{\boldsymbol{\alpha}} = \sum_{i=1}^{k} \alpha_{i} P_{n_{i}}^{\text{IS}}, \tag{19}$$

with the optimal weights chosen as in Proposition 1 and $\sum_{i=1}^{k}n_{i}=n$ . Since estimator $P_{n_{i}}^{\text{IS}}$ is computed from $n_{i}$ samples, $P_{\boldsymbol{\alpha}}$ uses $n=\sum_{i=1}^{k}n_{i}$ samples.

We now discuss how $P_{\boldsymbol{\alpha}}$ compares to a single importance sampling estimator with $n$ samples. Consider the estimator $P_{j^{\prime}}^{\text{IS}}$ that uses $n$ samples drawn from a single biasing density $q_{j^{\prime}}$ for $j^{\prime}\in\{1,\ldots,k\}$ . This estimator would require selecting the lowest-variance biasing density a priori, a formidable task. The next result compares $P_{\boldsymbol{\alpha}}$ and $P_{j^{\prime}}^{\text{IS}}$ , and gives a criterion under which the former has lower variance than the latter.

Proposition 2.

Let $k$ estimators $P_{n_{i}}^{\text{IS}}$ with $n_{1}=n_{2}=\ldots=n_{k}$ samples be given. Let $j^{\prime}\in\{1,\ldots,k\}$ , and $q_{j^{\prime}}$ be a biasing density that is used to derive an IS estimator $P_{j^{\prime}}^{\text{IS}}$ with $n=kn_{1}$ samples. If

$$\sigma_{q_{j^{\prime}}}^{2} > \frac{k}{\sum_{i=1}^{k} \frac{1}{\sigma_{i}^{2}}}$$

then the variance of the fused estimator $P_{\boldsymbol{\alpha}}$ in (19) with $n$ samples is smaller than the variance of the estimator with biasing density $q_{j^{\prime}}$ with $n$ samples, i.e.,

$$\mathbb{V}[P_{\boldsymbol{\alpha}}] < \mathbb{V}[P_{j^{\prime}}^{\text{IS}}].$$

Proof.

Set $n_{i}=n/k,\ i=1,\ldots,k$ , so that all estimators use the same number of samples. According to equation (17),

[TABLE]

as well as $\mathbb{V}[P_{j^{\prime}}^{\text{IS}}]=\frac{\sigma^{2}_{q_{j^{\prime}}}}{n}$ , so that

[TABLE]

∎
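A small numerical check can make the criterion of Proposition 2 tangible. The sketch below assumes independent estimators and reads the $\sigma_{i}^{2}$ in the criterion as the asymptotic variances $\sigma_{q_{i}}^{2}$ of the individual biasing densities; that reading is an assumption about notation. The variance values and function names are purely illustrative.

```python
def fused_variance(asymptotic_vars, n):
    """Variance of the fused estimator when each of the k estimators uses n/k samples."""
    k = len(asymptotic_vars)
    n_i = n / k
    estimator_vars = [s / n_i for s in asymptotic_vars]  # variance of each IS estimator
    return 1.0 / sum(1.0 / v for v in estimator_vars)    # inverse-variance fusion


def single_density_variance(asymptotic_var, n):
    """Variance of one IS estimator that spends all n samples on a single density."""
    return asymptotic_var / n


def criterion_holds(asymptotic_vars, j):
    """Check whether the j-th asymptotic variance exceeds the harmonic mean of all of them."""
    k = len(asymptotic_vars)
    harmonic_mean = k / sum(1.0 / s for s in asymptotic_vars)
    return asymptotic_vars[j] > harmonic_mean


asymptotic_vars = [1.0, 4.0, 16.0]  # hypothetical sigma_{q_i}^2 values
n = 300
comparison = [
    (
        criterion_holds(asymptotic_vars, j),
        fused_variance(asymptotic_vars, n) < single_density_variance(asymptotic_vars[j], n),
    )
    for j in range(len(asymptotic_vars))
]
print(comparison)
```

In this example the fused estimator beats the two weaker densities but not the best one, which matches the criterion: the right-hand side is the harmonic mean of the asymptotic variances, so only a density that is better than this mean can outperform the fusion with the same budget.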

The importance sampling estimate (7) requires evaluating the high-fidelity model at $n_{i}$ samples from the biasing density $q_{i}$ . While not required, we use $n_{i}=n/k,\ i=1,\ldots,k$ to distribute the computational load evenly. Proposition 2 extends straightforwardly to the case where each estimator $P_{j}$ uses a different number of samples $n_{j}$ .

The computational procedure is summarized in Algorithm 1. Here, we denote sampling-based estimates as $\hat{P}_{n_{i}}^{\text{IS}}$ , which are realizations of the estimator $P_{n_{i}}^{\text{IS}}$ .
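Algorithm 1 is not reproduced in this text, so the following Python sketch only illustrates the overall flow under explicit assumptions and should not be read as the authors' algorithm. The assumptions are: a one-dimensional toy problem with standard normal nominal density and failure for $z>3$, Gaussian biasing densities that are simply given (hypothetical values), independent estimators so that inverse-variance weights apply, sample variances plugged in for the unknown $\sigma_{i}^{2}$, and weight zero for any estimator whose sample variance is zero.

```python
import math
import random


def normal_pdf(z, mean, std):
    """Density of N(mean, std^2) evaluated at z."""
    return math.exp(-0.5 * ((z - mean) / std) ** 2) / (std * math.sqrt(2.0 * math.pi))


def is_terms(n_i, q_mean, q_std, threshold, rng):
    """Terms I_G(z) p(z) / q(z) for z ~ q = N(q_mean, q_std^2) and p = N(0, 1)."""
    terms = []
    for _ in range(n_i):
        z = rng.gauss(q_mean, q_std)
        if z > threshold:
            terms.append(normal_pdf(z, 0.0, 1.0) / normal_pdf(z, q_mean, q_std))
        else:
            terms.append(0.0)
    return terms


def fused_estimate(biasing_densities, n, threshold=3.0, seed=0):
    """Fuse k importance sampling estimates that each use n // k samples."""
    rng = random.Random(seed)
    n_i = n // len(biasing_densities)
    estimates, variances = [], []
    for q_mean, q_std in biasing_densities:
        terms = is_terms(n_i, q_mean, q_std, threshold, rng)
        p_i = sum(terms) / n_i
        sample_var = sum((t - p_i) ** 2 for t in terms) / (n_i - 1)
        estimates.append(p_i)
        variances.append(sample_var / n_i)  # estimated variance of the i-th estimator
    # Inverse-variance weights; an estimator with zero sample variance gets weight 0.
    precisions = [1.0 / v if v > 0.0 else 0.0 for v in variances]
    total = sum(precisions)
    if total == 0.0:
        raise ValueError("no estimator observed a failure; cannot form weights")
    weights = [p / total for p in precisions]
    p_fused = sum(w * p for w, p in zip(weights, estimates))
    return p_fused, weights, 1.0 / total


# Hypothetical biasing densities: the nominal density plus two shifted Gaussians.
densities = [(0.0, 1.0), (3.0, 1.0), (3.5, 1.0)]
p_fused, weights, fused_var = fused_estimate(densities, n=30000)
p_exact = 0.5 * math.erfc(3.0 / math.sqrt(2.0))
print(f"fused {p_fused:.3e}, exact {p_exact:.3e}, weights {weights}")
```

The estimator built from the nominal density receives a very small weight here, which is qualitatively similar to the small values of $\alpha_{1}$ reported in Table 4.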

3.5 Error measures and practical computation

The failure probability estimate $\hat{P}_{n_{i}}^{\text{IS}}$ is computed as in (20) and the sample variance $\hat{\sigma}_{q_{i}}^{2}$ as in (21). The root-mean-squared-error (RMSE) of the estimate $\hat{P}_{n_{i}}$ is

[TABLE]

and the relative mean-squared-error, or coefficient of variation, is computed as

[TABLE]
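A minimal sketch of how these error measures might be computed from samples, assuming the standard importance sampling estimator (failure indicator times the density ratio $p/q$, averaged over the samples) and its sample variance. Equations (20) and (21) are not reproduced here, so the exact normalization (for example $1/n$ versus $1/(n-1)$) is an assumption. The one-dimensional toy problem, with a standard normal nominal density and a shifted normal biasing density, is hypothetical.

```python
import math
import numpy as np


def is_error_measures(indicator, weights):
    """Return (estimate, sample variance, RMSE, coefficient of variation).

    indicator: 1.0 where the limit state function signals failure, else 0.0
    weights:   density ratio p(z) / q(z) at the samples drawn from q
    """
    terms = indicator * weights
    n = terms.size
    p_hat = terms.mean()                 # importance sampling estimate
    sigma2_hat = terms.var(ddof=1)       # sample variance (ddof=1 is an assumption)
    rmse = math.sqrt(sigma2_hat / n)     # standard error of the unbiased estimator
    cv = rmse / p_hat                    # error relative to the estimate
    return p_hat, sigma2_hat, rmse, cv


# Toy problem: failure if z > 3 under a standard normal nominal density p.
rng = np.random.default_rng(0)
shift = 3.0                                      # biasing density q = N(shift, 1)
z = rng.normal(loc=shift, scale=1.0, size=20_000)
log_ratio = -0.5 * z**2 + 0.5 * (z - shift)**2   # log p(z) - log q(z)
p_hat, sigma2_hat, rmse, cv = is_error_measures((z > 3.0).astype(float),
                                                np.exp(log_ratio))
exact = 0.5 * math.erfc(3.0 / math.sqrt(2.0))    # P(Z > 3) for a standard normal
```

Because the estimator is unbiased, its mean-squared error equals its variance, which is why the RMSE in the sketch is the square root of the sample variance divided by the number of samples.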

4 Test case: Convection-diffusion-reaction

We first consider a PDE model whose solution can be numerically evaluated with moderate computational cost. With this model, we demonstrate the asymptotic behavior of our method because we can afford to sample the high-fidelity model $n=10^{5}$ times, which would be too costly for the model in Section 5. The test problem is the convection-diffusion-reaction PDE introduced in Section 4.1. Its discretizations and reduced-order models are described in Section 4.2. Numerical results are presented in Section 4.3.

4.1 Convection-diffusion-reaction PDE model

We consider a simplified model of a premixed combustion flame at constant and uniform pressure, and follow the notation and set-up in [7, Sec.3]. The model includes a one-step reaction of the species

$$2\mathrm{H}_{2}+\mathrm{O}_{2}\rightarrow 2\mathrm{H}_{2}\mathrm{O}$$

in the presence of an additional non-reactive species, nitrogen. The physical combustor domain is $18\,\text{mm}$ in length ( $x$ -direction), and $9\,\text{mm}$ in height ( $y$ -direction), as shown in Figure 1.

The velocity field $U$ is set to be constant in the positive $x$ direction, and divergence free. The molecular diffusivity $\kappa$ is modeled as constant, equal and uniform for all species and temperature. The PDE model is given by

[TABLE]

where the state is comprised of the components $s=[T,Y_{H_{2}},Y_{O_{2}},Y_{H_{2}O}]$ , with the $Y_{i}$ being the mass fractions of the species (fuel, oxidizer, product), and $T$ denoting the temperature. Referring to Figure 1, we have that $\Gamma_{D}=\Gamma_{D,i}\cup\Gamma_{D,0}$ is the Dirichlet part of the boundary and $\Gamma_{N}$ combines the top, bottom and right boundary, where Neumann conditions are prescribed. In sum, $\partial\tilde{\Omega}=\Gamma_{D}\cup\Gamma_{N}$ ; the boundary conditions are imposed as given in Table 1. The nonlinear reaction term $\mathcal{F}(s,\mathbf{z})=[\mathcal{F}_{T},\mathcal{F}_{H_{2}},\mathcal{F}_{O_{2}},\mathcal{F}_{H_{2}O}](s,\mathbf{z})$ is of Arrhenius type [10], and modeled as

[TABLE]

The parameters of the model are defined in Table 2. The uncertain parameters are the pre-exponential factor $A$ and the activation energy $E$ of the Arrhenius model. The domain for these parameters is denoted as $\mathcal{D}$ . In particular, we have that

[TABLE]

4.2 Discretization and reduced-order models

The model is discretized using a finite difference approximation in two spatial dimensions, with 72 nodes in $x$ direction, and 36 nodes in $y$ direction, leading to $10,804$ unknowns in the model. The nonlinear system is solved with Newton’s method. Let $\mathbf{T}(\mathbf{z})$ be the vector with components corresponding to the approximations of the temperature $T(x,y;\mathbf{z})$ at the grid points. The high-fidelity model (HFM) is $f:\mathcal{D}\mapsto\mathbb{R}$ and the quantity of interest is the maximum temperature over all grid points:

$$f(\mathbf{z})=\max_{j}\,[\mathbf{T}(\mathbf{z})]_{j}.$$

Reduced-order models provide a powerful framework to obtain surrogates for expensive-to-evaluate models. In the case of nonlinear systems, reduced-order models can be obtained via reduced-basis methods [19], dynamic mode decomposition [24], proper orthogonal decomposition [5], and many others; for a survey, see [4]. Here, we compute reduced-order models $f^{(i)}$ for our multifidelity approach via Proper Orthogonal Decomposition and the Discrete Empirical Interpolation Method (DEIM) for an efficient evaluation of the nonlinear term. The training snapshots are generated from solutions to the high-fidelity model on a parameter grid of $50\times 50$ equally spaced values $\mathbf{z}\in\mathcal{D}$ . The three surrogate models are built from $2,10,15$ POD basis vectors, and accordingly $2,5,10$ DEIM interpolation points. The corresponding models are denoted as ROM1, ROM2, ROM3, respectively. We denote by $\mathbf{T}_{r}^{(i)}(\mathbf{z})$ the approximation to the temperature $T(x,y;\mathbf{z})$ via the $i$ th ROM. The surrogate models $f^{(i)}$ are the mappings $f^{(i)}:\mathcal{D}\mapsto\mathbb{R}$ with corresponding quantity of interest denoted as
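For illustration, a POD basis can be obtained from a snapshot matrix via a thin singular value decomposition. The sketch below assumes this standard construction and uses a small synthetic snapshot matrix. It omits DEIM and the projection of the governing equations, and the names `pod_basis` and `snapshots` are hypothetical, not taken from [7].

```python
import numpy as np


def pod_basis(snapshots, r):
    """Return the first r POD modes and all singular values.

    snapshots: array of shape (n_dof, n_snapshots), one solution per column
    """
    u, s, _ = np.linalg.svd(snapshots, full_matrices=False)
    return u[:, :r], s


# Synthetic snapshots: 200 unknowns, 50 parameter samples, rank 3 by construction.
rng = np.random.default_rng(1)
snapshots = rng.standard_normal((200, 3)) @ rng.standard_normal((3, 50))

basis, sing_vals = pod_basis(snapshots, r=3)
# Relative error of projecting the snapshots onto the span of the POD modes.
projected = basis @ (basis.T @ snapshots)
rel_err = np.linalg.norm(snapshots - projected) / np.linalg.norm(snapshots)
```

In practice the decay of the singular values indicates how many modes are needed. A basis that is truncated too early, as with the two modes of ROM1, cannot represent the snapshots accurately.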

$$f^{(i)}(\mathbf{z})=\max_{j}\,[\mathbf{T}_{r}^{(i)}(\mathbf{z})]_{j}.$$

We refer the reader to [7] for more details on the discretization and ROM construction for this convection-diffusion-reaction model.

4.3 Results for multifidelity fusion of failure probabilities

We define a failure of the system when the maximum temperature in the combustor exceeds $2430$ K, so that the limit state function is

$$g(f(\mathbf{z}))=2430-f(\mathbf{z}),$$

and likewise for the reduced-order models $g(f^{(i)}(\mathbf{z}))=2430-f^{(i)}(\mathbf{z})$ .

To compute the biasing densities, we draw $\hat{m}=20,000$ samples from the uniform distribution on $\mathcal{D}$ , compute surrogate-based solutions, and evaluate the limit state function for those solutions. If the limit state function indicates failure of the system for a solution obtained from the $i$ th surrogate model, the corresponding parameter is added to $\mathcal{G}^{(i)}$ , the failure set computed from the $i$ th surrogate model. We compute the biasing densities $q_{1},q_{2},q_{3}$ via MFIS (see Section 2.3) as Gaussian mixture distributions with a single component. Table 3 shows the computational cost in CPU time of computing the biasing distributions from the various ROMs and the HFM. Computing a biasing density using the high-fidelity model with $\hat{m}=20,000$ samples costs approximately $2.1$ CPU-hours. Constructing the biasing density via the low-fidelity models ROM2 and ROM3 reduces the computational time by factors of 66 and 58, respectively. Note that ROM1 is the reduced-order model that is cheapest to execute per model evaluation, but it is also the least accurate. In our case, ROM1 did not produce any samples in the failure region, even after $\hat{m}=10^{5}$ samples. It is not unexpected that ROM1 is so inaccurate, since two POD modes are not enough to resolve the important features of this problem. ROM1 is included to demonstrate that the fusion approach can be effective even in the presence of highly inaccurate surrogate models.
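The construction of a biasing density described above can be sketched as follows. The surrogate here is a hypothetical analytic stand-in for a reduced-order model, the unit-square parameter box is made up, and a single Gaussian is fitted by its sample mean and covariance, which is one simple way to obtain a one-component Gaussian mixture. The MFIS implementation referenced in Section 2.3 may differ.

```python
import numpy as np


def fit_biasing_density(surrogate, lower, upper, threshold, m_hat, rng):
    """Fit a single Gaussian to surrogate-predicted failure samples.

    Failure is indicated by g = threshold - surrogate(z) < 0.
    Returns (mean, covariance, number of failure samples).
    """
    z = rng.uniform(lower, upper, size=(m_hat, len(lower)))  # nominal: uniform on the box
    g = threshold - surrogate(z)                             # limit state function
    failure_set = z[g < 0.0]
    if failure_set.shape[0] < 2:
        # Too few failure samples to fit a Gaussian (the ROM1 situation):
        # return None so the caller can fall back to the nominal density.
        return None, None, failure_set.shape[0]
    mean = failure_set.mean(axis=0)
    cov = np.cov(failure_set, rowvar=False)
    return mean, cov, failure_set.shape[0]


def toy_surrogate(z):
    """Hypothetical smooth response standing in for a ROM output."""
    return 2000.0 + 300.0 * z[:, 0] + 200.0 * z[:, 1]


rng = np.random.default_rng(2)
lower, upper = np.array([0.0, 0.0]), np.array([1.0, 1.0])
mean, cov, n_fail = fit_biasing_density(toy_surrogate, lower, upper,
                                        threshold=2430.0, m_hat=20_000, rng=rng)
```

Only the cheap surrogate is evaluated in this step. The high-fidelity model enters later, when the importance sampling estimate is computed with samples drawn from the fitted density.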

In Figure 2 we show the quantity of interest, i.e., the maximum temperature. The plots are obtained by generating $m=10^{5}$ samples from the nominal distribution (left) and the respective biasing distributions (right), and evaluating the HFM at those samples. Figure 2, left, shows that the typical range of the quantity of interest is between approximately 1200K and 2440K. However, only the events where the quantity of interest is above 2430K are relevant for the failure probability computation. With the biasing distributions in Figure 2, right, a large portion of the outputs lead to a failure of the system. This indicates that the biasing distributions are successful in generating samples in the failure region of the high-fidelity model.

Next, we show results for the fused multifidelity estimator $P_{\boldsymbol{\alpha}}$ with $n$ samples and compare it with importance sampling estimators $\hat{P}_{n_{i}}^{\text{IS}}$ that only use a single biasing density and also $n$ samples. The fused estimator is obtained via Algorithm 1 with $n_{i}=\lfloor n/3\rfloor,i=1,2,3,$ samples by fusing the three surrogate-model-based importance sampling estimators. For reference purposes, a biasing density is constructed as described above using the HFM with $\hat{m}=20,000$ samples. Based on this density, we compute an importance sampling estimate of the failure probability with $n=10^{5}$ samples, resulting in $\hat{P}_{10^{5}}^{\text{IS}}=8.42\times 10^{-4}$ .

To assess the quality of the fused estimator $P_{\boldsymbol{\alpha}}$ , we consider the error measures introduced in Section 3.5. In Figure 3, left, we show the root mean-squared error of the importance sampling estimators $\hat{P}_{n_{i}}^{\text{IS}}$ as well as the combined estimator $\hat{P}_{\boldsymbol{\alpha}}$ . Figure 3, right, shows the coefficient of variation defined in (24) for the estimators. The fused estimator is competitive in RMSE and coefficient of variation with the estimator using the high-fidelity biasing density, but comes at a much lower computational cost.

Note that the fused estimator does not use any of the high-fidelity information. We plotted the high-fidelity estimator only for comparison; the high-fidelity density is not used in our algorithm. Heuristically, we could expect the fused estimator to perform better than the MFIS estimator with high-fidelity-derived biasing density in the following situation. Let the HFM be so expensive that the HF biasing density is built only from a few failure samples, and assume the low-fidelity models are good surrogates, hence able to cheaply explore the failure region. Then the low-fidelity biasing density could be better than the high-fidelity biasing density.

In Table 4 we show the weights for the fused estimator $\hat{P}_{\boldsymbol{\alpha}}$ . The fused estimator assigns only a small weight $\alpha_{1}$ to the estimator $\hat{P}_{n_{1}}^{\text{IS}}$ , which uses biasing density $q_{1}$ . This was expected, as the estimator has large variance: the ROM1 evaluations did not yield any samples in the failure domain (see Table 3), so the biasing density $q_{1}$ is actually the nominal density.

4.4 Comparison to subset simulation methods

To demonstrate the efficiency of our proposed multifidelity method compared to state-of-the-art existing methods in failure probability estimation, we compare our results to subset simulation [3], a widely used method for reliability analysis and failure probability estimation. The method defines intermediate failure events

[TABLE]

for a sequence of threshold levels $b_{1}>b_{2}>\ldots>b_{L}=0$ , where $L$ is the final level. This ensures that the intermediate failure events are nested as $\mathcal{G}_{1}\supset\mathcal{G}_{2}\supset\ldots\supset\mathcal{G}_{L}=\mathcal{G}$ . The failure probability can then be expressed as

[TABLE]

Thus, this method requires sampling from the conditional events $\mathcal{G}_{j}|\mathcal{G}_{j-1}$ , and the efficiency of this sampling is pivotal to the success of subset simulation. Markov Chain Monte Carlo (MCMC) methods provide efficient solutions to this problem [34]. Note that the $b_{j}$ cannot be determined in advance, but are found adaptively by specifying an intermediate failure probability $p_{0}=P(\mathcal{G}_{j}|\mathcal{G}_{j-1})$ . A typical choice is $p_{0}=0.1$ , which yields efficient subset simulation results, see [3].
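To make the role of $p_{0}$ concrete, the sketch below counts how many levels the product of conditional probabilities needs when every intermediate level has conditional probability exactly $p_{0}$. It uses the reference value $8.42\times 10^{-4}$ reported in Section 4.3 purely as an illustrative target. In an actual run the thresholds, and hence the number of levels, are determined adaptively from samples, and the function name `subset_levels` is hypothetical.

```python
import math


def subset_levels(p_target, p0=0.1):
    """Number of levels L and last-level conditional probability.

    Assumes L - 1 levels with conditional probability p0 and a final level
    that accounts for the remainder, so that p0**(L - 1) * p_last == p_target.
    """
    n_levels = math.ceil(math.log(p_target) / math.log(p0))
    p_last = p_target / p0 ** (n_levels - 1)
    return n_levels, p_last


n_levels, p_last = subset_levels(8.42e-4, p0=0.1)
product = 0.1 ** (n_levels - 1) * p_last   # recovers the target probability
```

For this target the idealized count is four levels: three levels at $p_{0}=0.1$ and a final level with conditional probability of about $0.84$.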

Here, we compare our fused importance sampling approach for failure probability estimation to a direct application of subset simulation to the full model. We follow the recent MCMC implementation for subset simulation of [34]. Table 5 lists the computational results, which include the number of levels $L$ that subset simulation needed to arrive at the failure probability estimate, the samples at each level (user defined), the failure probability estimate, and the overall number of samples needed (not known beforehand). All results were averaged over ten independent runs. We also give an approximate coefficient of variation, although we caution that this is not the same coefficient of variation defined in (24), since at the intermediate levels, subset simulation produces correlated samples. Thus, we used an approximated coefficient of variation as suggested in [45, Eq. (19)]. For a thorough discussion of the coefficient of variation estimation within subset simulation we refer to [6, Sec.5.3]. We observe from Table 5 that the coefficient of variation monotonically decreases as more samples are added.

To compare our proposed multifidelity fusion method with subset simulation, we first note that the estimate from subset simulation $\hat{P}_{f}$ is biased for finite $N$ (see [3, Sec.6.3]), whereas our fused estimator $\hat{P}_{\boldsymbol{\alpha}}$ is unbiased. Moreover, the numerical results in Table 5 show that the estimated coefficients of variation are about one order of magnitude larger than the coefficients of variation we reported in Figure 3, right. From a computational cost perspective, the estimator with 20,000 samples in subset simulation produces an approximated coefficient of variation of $1.18\times 10^{-1}$ whereas our fused estimator $\hat{P}_{\boldsymbol{\alpha}}$ produces a coefficient of variation of $1.34\times 10^{-2}$ for the same number of high-fidelity model evaluations. Thus, the fused estimator outperforms subset simulation in this particular example.

In sum, our method can successfully take advantage of cheaper low-fidelity methods to get accurate estimators, while the subset simulation method works directly with the full model and therefore does not have access to cheaper surrogate model information.

5 Failure probability estimation related to a free plane jet

We apply the proposed fusion of estimators to quantify the influence of uncertain parameters on the amount of turbulent mixing produced by a free plane jet.

This is a challenging problem, since it involves an expensive-to-evaluate model for which the naive computation of low probabilities requires thousands of hours of computation. We reduce this number significantly with our multifidelity importance sampling framework via fusion of estimators.

The remainder of this section is organized as follows. Section 5.1 introduces the free plane jet, followed by details of the model and its governing equations in Section 5.2. The uncertain parameters and quantity of interest are defined in Section 5.3. The low-fidelity surrogate models used in this investigation are discussed in Section 5.4. Finally, the results for multifidelity fusion of small probability estimators are presented in Section 5.5.

5.1 Large-scale application: Free plane jet

Free turbulent jets are prototypical flows believed to represent the dynamics in many engineering applications, such as combustion and propulsion. As such, free jet flows are the subject of several experimental [18, 17, 23] and numerical investigations [44, 39, 41, 21, 22] and constitute an important benchmark for turbulent flows.

Our expensive-to-evaluate model of a free plane jet is based on the two-dimensional incompressible Reynolds-averaged Navier-Stokes (RANS) equations, complemented by the $k-\epsilon$ turbulence model. Although a RANS model does not resolve all relevant turbulent features of the flow, it represents a challenging large-scale application for the computation of small probabilities. We use this model to investigate the influence of five uncertain parameters on the amount of turbulent mixing produced by the jet. We quantify turbulent mixing using a relatively simple metric: the integral jet width. One of the uncertain parameters is the Reynolds number at the inlet of the jet, which is assumed to vary from 5,000 to 15,000. The other four uncertain parameters correspond to coefficients of the $k-\epsilon$ turbulence model and its boundary condition, as detailed in Section 5.3. Figure 4 shows a flow field typical of the cases considered here.

5.2 Modeling and governing equations

We consider a free plane jet in conditions similar to the ones reported in [21, 22]. Namely, the flow exits a rectangular nozzle into quiescent surroundings with a prescribed top-hat velocity profile and turbulence intensity. The nozzle has width $D$ , and is infinite along the span-wise direction. The main difference between the free plane jet we consider here and the one described in [21, 22] is the Reynolds number at the exit nozzle. Here the Reynolds number varies between 5,000 and 15,000.

Our simulation model computes the flow in a rectangular domain $\Omega$ located at a distance $5D$ downstream from the exit of the jet nozzle, as illustrated in Figure 5. By doing so, modeling the conditions at the exit plane of the jet nozzle is avoided. Instead, direct numerical simulation data are used to define inlet conditions at the surface $\Gamma_{\text{in}}$ .

The dynamics are modeled with the incompressible Reynolds-averaged Navier-Stokes equations, complemented by the $k-\epsilon$ turbulence model [25]:

[TABLE]

where $\boldsymbol{v}=[v_{x},v_{y}]$ denotes the velocity vector, $p$ denotes pressure, $\rho$ is the density, $\nu$ is the kinematic viscosity, and $\bar{\bar{S}}$ is the strain rate tensor given by

[TABLE]

In the $k-\epsilon$ turbulence model, $k$ denotes the turbulent kinetic energy, $\epsilon$ denotes the turbulent dissipation, and $\nu_{t}$ denotes the turbulent kinematic viscosity, defined as

[TABLE]

The coefficients $C_{\mu}$ , $C_{1\epsilon}$ , $C_{2\epsilon}$ , $\sigma_{k}$ , $\sigma_{\epsilon}$ in (31)–(33) are either considered as uncertain parameters, or are functions of uncertain parameters, as detailed in Section 5.3. (We use $\sigma_{k}$ and $\sigma_{\epsilon}$ here as model coefficients, which is typical notation in the fluids community. These are only used in this section; elsewhere in the paper, $\sigma$ ’s are variances.)

At the inlet surface $\Gamma_{\text{in}}$ Dirichlet boundary conditions are imposed. Data obtained by the direct numerical simulation described in [22] (Reynolds number 10,000) are used to determine reference inlet profiles for velocity, $\boldsymbol{v_{\text{ref}}}$ , and for turbulent kinetic energy, $k_{\text{ref}}$ . Inlet conditions are allowed to vary by defining a velocity intensity ( $U$ ) scale, which is applied to the reference profiles. Turbulent dissipation at the inlet is estimated by assuming a mixing length model. Thus, the boundary conditions at the inlet surface are given by

[TABLE]

where $\ell_{m}$ denotes the mixing length parameter.

At the symmetry axis surface, $\Gamma_{\text{sym}}$ , no-flux boundary conditions are imposed through a combination of Dirichlet and Neumann conditions of the form

[TABLE]

Finally, at the surface $\Gamma_{\text{ff}}$ “far-field” conditions that allow the entrainment of air around the jet are imposed through weak Dirichlet conditions, as detailed in [43].

The complete model includes additional features that make it more amenable to numerical discretization. The most delicate issue in the solution of the RANS model is the possible loss of positivity of the turbulence variables. To avoid this issue, we introduce an appropriately mollified (and thus smoothly differentiable) max function to ensure positivity of $k$ and $\epsilon$ . In addition, if inflow is detected at any point on the far-field boundary, the boundary condition is switched from Neumann to Dirichlet by means of a suitably mollified indicator of the inflow region. Finally, we stabilize the discrete equations using a strongly consistent stabilization technique (Galerkin Least Squares, GLS, stabilization) to address the convection-dominated nature of the RANS equations. The complete formulation is shown in [43].

The model equations described above are solved numerically using a finite element discretization. The discretization is implemented in FEniCS [1] by specifying the weak form of the residual, including the GLS stabilization and mollified versions of the positivity constraints on $k$ and $\epsilon$ and the switching boundary condition on the outflow boundary. To solve the nonlinear system of equations that arises from the finite element discretization, we employ a damped Newton method. The bilinear form of the state Jacobian operator is computed using FEniCS’s symbolic differentiation capabilities. Finally, we use pseudo-time continuation to guarantee global convergence of the Newton method to a physically stable solution (if such a solution exists) [20]. The finite element solver is detailed in [43].

5.3 Uncertain parameters and quantity of interest

In this investigation five uncertain parameters are considered: velocity intensity at inlet ( $U$ ; since we keep other physical parameters constant, varying the velocity intensity effectively changes the Reynolds number), mixing length at inlet ( $\ell_{m}$ ), and the $k-\epsilon$ turbulence model coefficients $C_{\mu}$ , $C_{2\epsilon}$ , and $\sigma_{k}$ :

[TABLE]

The parameter domain is $\mathbf{z}\in\mathcal{D}=[0.5,1.5]\times[0.05,0.15]\times[0.01,0.15]\times[1.1,2.5]\times[0.5,2.5]$ , and the nominal distribution of parameters is uniform in $\mathcal{D}$ .

The other two coefficients of the $k-\epsilon$ turbulence model, $C_{1\epsilon}$ and $\sigma_{\epsilon}$ , are also uncertain but do not vary independently. According to Dunn et al. [12], empirical evidence suggests that $C_{1\epsilon}$ is related to $C_{2\epsilon}$ by

[TABLE]

In addition, as noted in [12, 16], the log-law implies that $\sigma_{\epsilon}$ must follow from

[TABLE]

where $\kappa=0.41$ is the von Kármán constant.

The quantity of interest is the integral jet width measured at $x=27.5D$ :

[TABLE]

where $v_{x_{0}}=v_{x}(x=27.5D,y=0;\mathbf{z})$ . Figure 6 illustrates a typical solution behavior for this turbulent jet by plotting contours of the turbulent kinetic energy for selected samples in $\mathcal{D}$ .

5.4 Simplified-physics surrogate models

We consider four computational models to represent the dynamics of the free plane jet flow: three low-fidelity surrogates and the high-fidelity model. The models are based on two distinct computational grids (fine and coarse), and on two representations of turbulence effects. The fine computational grid contains 10,000 elements and 5,151 nodes, while the coarse grid contains 2,500 elements and 1,326 nodes. Furthermore, the models are based either on the complete $k-\epsilon$ turbulence model described in the previous section, or on a prescribed turbulent viscosity field.

In the latter case, the turbulent viscosity field is estimated by a linear interpolation based on 243 conditions that span the input parameter space $\mathcal{D}$ on a uniform grid (3 points along each of the 5 dimensions). At each of these 243 conditions, the turbulent viscosity field is computed with the $k-\epsilon$ turbulence model and the fine computational grid.
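The 243 interpolation conditions correspond to a tensor grid with three points per dimension, $3^{5}=243$. A small sketch of such a grid over the parameter domain $\mathcal{D}$ given in Section 5.3, assuming that the three points in each dimension are the two endpoints and the midpoint (the exact placement is not stated in the text):

```python
import itertools
import numpy as np

# Bounds of the parameter domain D from Section 5.3, one (low, high) pair per parameter.
bounds = [(0.5, 1.5), (0.05, 0.15), (0.01, 0.15), (1.1, 2.5), (0.5, 2.5)]

# Assumption: endpoints plus midpoint in every dimension.
axes = [np.linspace(lo, hi, 3) for lo, hi in bounds]
grid = np.array(list(itertools.product(*axes)))   # one row per training condition
```

Each row of `grid` is one parameter vector at which the turbulent viscosity field would be precomputed with the fine-grid $k-\epsilon$ model.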

The following four models are increasingly complex in terms of either modeled physics or grid resolution:

• LFM1–CI: Coarse, interpolated; combines the interpolated turbulence viscosity field with the coarse computational grid (3,978 degrees of freedom); average computational time $25$ s;

• LFM2–FI: Fine, interpolated; combines the interpolated turbulence viscosity field with the fine computational grid (15,453 degrees of freedom); average computational time $72$ s;

• LFM3–CKE: Coarse $k-\epsilon$ ; combines the $k-\epsilon$ turbulence model with the coarse computational grid (6,630 degrees of freedom); average computational time $109$ s;

• HFM: High-fidelity model; combines the $k-\epsilon$ turbulence model with the fine computational grid (25,755 degrees of freedom); average computational time $590$ s.

Note that the models based on an interpolated turbulent viscosity field run four to eight times faster than the corresponding models based on the $k-\epsilon$ turbulence model.

This speedup results from eliminating (31)–(32) from the governing equations, which leads to a reduction in the total number of degrees of freedom (elimination of variables $k$ and $\epsilon$ ) and simplifications in the numerical discretization.

Let $\boldsymbol{v}_{i}$ , $i=$ HFM, LFM1, LFM2, LFM3, denote the velocity field computed with the models above. The high-fidelity model is the mapping from the inputs to the quantity of interest (jet width from (34)) for a velocity field computed with the most complex representation of the flow dynamics, $\boldsymbol{v}_{\text{HFM}}$ :

[TABLE]

The surrogate models are defined in a similar fashion as

[TABLE]

5.5 Results for multifidelity fusion of small probability estimators

We define a design failure when the jet width is below the value 0.98. Hence, the limit state function is given by

$$g(f(\mathbf{z}))=f(\mathbf{z})-0.98.$$

We compute the biasing distributions $q_{i}$ for $i=\text{LFM1, LFM2, LFM3}$ from the three low-fidelity surrogate models via MFIS (see Section 2.3). For each surrogate, we draw $\hat{m}=20,000$ parameter samples from the uniform distribution on $\mathcal{D}$ and evaluate the limit state function applied to the resulting quantity of interest. If the limit state function indicates failure of the system for a solution obtained from the $i$ th surrogate model, the corresponding parameter is added to $\mathcal{G}^{(i)}$ , the failure set computed from the $i$ th surrogate model. We then fit a multivariate Gaussian to the samples in $\mathcal{G}^{(i)}$ , resulting in the biasing densities $q_{\text{LFM1}},q_{\text{LFM2}},q_{\text{LFM3}}$ .

Evaluation of the limit state function with the threshold value of 0.98 resulted in few samples in the failure region, so we increased it to 1.12 to obtain more samples to compute the biasing density from. For the three surrogate models and the high-fidelity model, the $\hat{m}=20,000$ evaluations yield 21, 21, 30 and 76 samples, respectively, where the QoI falls below that increased threshold. This strategy yields an efficient biasing density as we see below. As reference, we repeat the same process with the high-fidelity model, resulting in the biasing distribution $q_{\text{HFM}}$ .

First, we investigate the quality of the biasing distributions. For reference, Figure 7, left, shows the result of $10^{3}$ uniform sample evaluations with the four computational models. Note that hardly any samples are below the failure threshold. In contrast, the quantity of interest computed from samples of the four biasing distributions is shown in Figure 7, right. With the biasing distributions, between 10% and 50% of the 1000 drawn samples fall in the failure domain. Note that the y-axis scaling differs between the two plots, which also shows that the biased samples result in a tighter range of QoI values than the unbiased samples. Thus, the biasing distributions are indeed biased towards the failure region, and therefore the multifidelity strategy provides a viable way of saving computational time to inform a biasing distribution.

The reference failure probability is computed via importance sampling with ${n=10^{4}}$ samples drawn from the HFM biasing distribution and is ${\hat{P}_{7,500,q_{\text{HFM}}}^{\text{IS}}=7.25\times 10^{-4}}$ . We compute the estimators $P_{n_{i}}^{\text{IS}},i=1,2,3$ with $n_{i}$ samples using the biasing densities $q_{\text{LFM1}},q_{\text{LFM2}},q_{\text{LFM3}}$ derived from the three surrogate models. We obtain the fused multifidelity estimator $P_{\boldsymbol{\alpha}}$ as described above in Algorithm 1 with $n_{i}=\lfloor n/3\rfloor,i=1,2,3,$ samples by fusing the three surrogate-model-based importance sampling estimators. The fused estimator thus uses a total of $n$ samples. We compare these estimators with an estimator that uses $n$ samples from the HFM biasing density $q_{\text{HFM}}$ . The estimators and the error measures are averaged over three independent runs.

The coefficient of variation (24) is shown in Figure 8, left. The biasing density derived from the high-fidelity model yields the best estimator among all the models, as expected. The fused estimator yields a better coefficient of variation than the estimators using LFM2 and LFM3, and shows almost identical convergence to the estimator using $q_{\text{LFM1}}$ . Table 6 shows the three weights for the fused estimator $\hat{P}_{\boldsymbol{\alpha}}$ as given in Proposition 1, according to which the estimates with the lowest variance are assigned the largest weights.

The CPU-hours to compute the biasing densities via this approach are shown in Figure 8, right. Since MC methods are embarrassingly parallel, any practical implementation can take advantage of this. Our numerical experiments were parallelized on a computing cluster with 55 nodes. Each node is a quad-core Intel Xeon E5-1620 with 3.6 GHz and 10MB cache. The nodes have either 32GB or 64GB RAM. Compared to constructing the biasing density with the high-fidelity model, using LFM1 reduces the computational cost by 96%, LFM2 by 88% and LFM3 by 81.5%. If we use a fused estimator of all three models, we still save more than 65% computational effort compared to using the HFM, see Figure 8, right. This significant time difference can have important implications for engineering practice, as it translates into faster evaluation time and savings in CPU-hours.
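The quoted savings are consistent with the average run times listed in Section 5.4, under the assumption that each biasing density is built from the same number of model evaluations, so that cost scales with the per-evaluation time. A quick check (the helper `saving_percent` is hypothetical):

```python
# Average run time per model evaluation in seconds (Section 5.4).
run_time = {"LFM1": 25.0, "LFM2": 72.0, "LFM3": 109.0, "HFM": 590.0}


def saving_percent(cost, reference):
    """Relative cost reduction with respect to the reference, in percent."""
    return 100.0 * (1.0 - cost / reference)


hfm = run_time["HFM"]
savings = {name: saving_percent(t, hfm) for name, t in run_time.items() if name != "HFM"}
# The fused estimator needs biasing densities from all three low-fidelity models.
fused_cost = run_time["LFM1"] + run_time["LFM2"] + run_time["LFM3"]
fused_saving = saving_percent(fused_cost, hfm)
```

Under this assumption, building all three low-fidelity biasing densities costs $25+72+109=206$ s per parameter sample versus $590$ s for the HFM, a saving of slightly more than 65%.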

6 Conclusions

We enabled the estimation of small probabilities for expensive-to-evaluate models via a new approach drawing from importance sampling, multifidelity modeling and information fusion. The effectiveness of our proposed approach is demonstrated on a convection-diffusion-reaction PDE, where asymptotic numerical results could be obtained. The strength of the proposed framework is then shown on the target application of the turbulent jet, a challenging problem for small-probability computation due to its high computational cost. The proposed framework was illustrated for the special case of importance-sampling based estimators, but applies to a much broader class of estimators, as long as the estimators are unbiased. An investigation of correlated estimators and the effect of correlation for variance reduction would be an interesting future direction. By fusing different estimators, we avoid the difficult biasing density selection problem. We also showed that this strategy always outperforms sampling from the worst biasing density. The numerical results suggest that the fused estimator is often comparable to an estimator that samples from the best biasing density only.

Acknowledgements

The authors thank Prof. M. Klein for sharing the DNS data in [22] with us. This work was supported by the Defense Advanced Research Projects Agency [EQUiPS program, award W911NF-15-2-0121, Program Manager F. Fahroo]; the Air Force [Center of Excellence on Multi-Fidelity Modeling of Rocket Combustor Dynamics, award FA9550-17-1-0195]; and the US Department of Energy, Office of Advanced Scientific Computing Research (ASCR) [Applied Mathematics Program, awards DE-FG02-08ER2585 and DE-SC0009297, as part of the DiaMonD Multifaceted Mathematics Integrated Capability Center].

Bibliography

[1] M. S. Alnæs, J. Blechta, J. Hake, A. Johansson, B. Kehlet, A. Logg, C. Richardson, J. Ring, M. E. Rognes, and G. N. Wells. The FEniCS project version 1.5. Archive of Numerical Software, 3(100), 2015.
[2] L. J. Aslett, T. Nagapetyan, and S. J. Vollmer. Multilevel Monte Carlo for reliability theory. Reliability Engineering & System Safety, 165:188–196, 2017.
[3] S.-K. Au and J. L. Beck. Estimation of small failure probabilities in high dimensions by subset simulation. Probabilistic Engineering Mechanics, 16(4):263–277, 2001.
[4] P. Benner, M. Ohlberger, A. Cohen, and K. Willcox. Model Reduction and Approximation: Theory and Algorithms. Society for Industrial and Applied Mathematics, Philadelphia, PA, 2017.
[5] G. Berkooz, P. Holmes, and J. L. Lumley. The proper orthogonal decomposition in the analysis of turbulent flows. Annual Review of Fluid Mechanics, 25(1):539–575, 1993.
[6] W. Betz. Bayesian inference of engineering models. PhD thesis, Technische Universität München, 2017.
[7] M. Buffoni and K. Willcox. Projection-based model reduction for reacting flows. In 40th Fluid Dynamics Conference and Exhibit, page 5008, 2010.
[8] P. Chen and A. Quarteroni. Accurate and efficient evaluation of failure probability for partial different equations with random input data. Computer Methods in Applied Mechanics and Engineering, 267:233–260, 2013.
