Transform-based particle filtering for elliptic Bayesian inverse problems

Sangeetika Ruchi; Svetlana Dubinkina; Marco Iglesias

arXiv:1901.04706·stat.CO·January 8, 2020

TL;DR

This paper introduces optimal transport based resampling into adaptive SMC for elliptic Bayesian inverse problems. The resulting method outperforms monomial based SMC for high-dimensional Gaussian random fields and outperforms EKI for non-Gaussian distributed parameters.

Contribution

It introduces optimal transport based resampling in adaptive SMC and compares its effectiveness with existing methods across different parametrizations.

Findings

01

Optimal transport based SMC outperforms monomial based SMC for high-dimensional Gaussian random fields.

02

For scalar parameters, optimal transport based SMC performs comparably to monomial based SMC.

03

Optimal transport based SMC outperforms EKI for non-Gaussian distributed parameters.

Abstract

We introduce optimal transport based resampling in adaptive SMC. We consider elliptic inverse problems of inferring hydraulic conductivity from pressure measurements. We consider two parametrizations of hydraulic conductivity: by Gaussian random field, and by a set of scalar (non-)Gaussian distributed parameters and Gaussian random fields. We show that for scalar parameters optimal transport based SMC performs comparably to monomial based SMC but for Gaussian high-dimensional random fields optimal transport based SMC outperforms monomial based SMC. When comparing to ensemble Kalman inversion with mutation (EKI), we observe that for Gaussian random fields, optimal transport based SMC gives comparable or worse performance than EKI depending on the complexity of the parametrization. For non-Gaussian distributed parameters optimal transport based SMC outperforms EKI.

Equations

-\nabla\cdot\kappa\nabla h=f\quad\textrm{in }D

f(x_{1},x_{2})=\begin{cases}0&\textrm{if }0<x_{2}\leq 4,\\ 137&\textrm{if }4<x_{2}<5,\\ 274&\textrm{if }5\leq x_{2}<6.\end{cases}

h(x_{1},0)=100,\quad\frac{\partial h}{\partial x}(6,x_{2})=0,\quad-\kappa\frac{\partial h}{\partial x}(0,x_{2})=500,\quad\frac{\partial h}{\partial y}(x_{1},6)=0.

\ell_{j}(h)=\int_{D}\frac{1}{2\pi\varepsilon^{2}}e^{-\frac{1}{2\varepsilon^{2}}|x-x_{j}|^{2}}h(x)\,dx

G(\kappa)=(\ell_{1}(h),\dots,\ell_{M}(h)).

y_{j}=\ell_{j}(h)+\eta_{j},\quad j=1,\dots,M

\kappa(x)=\exp(u_{1}(x))\chi_{D_{c}}(x)+\exp(u_{2}(x))\chi_{D\setminus D_{c}}(x)

x_{2}=d_{1}\sin(d_{2}x_{1}/6)+\tan(d_{3})x_{1}+d_{4}

u=(d_{1},\dots,d_{5},u_{1},u_{2})

U=\begin{cases}L^{\infty}(D;\mathbb{R})&\textrm{for P1},\\ \prod_{i=1}^{5}A_{i}\times L^{\infty}(D;\mathbb{R}^{2})&\textrm{for P2},\end{cases}

|u|_{U}=\begin{cases}||u||_{\infty}&\textrm{for P1},\\ \sum_{i=1}^{5}|d_{i}|+||u_{1}||_{\infty}+||u_{2}||_{\infty}&\textrm{for P2},\end{cases}

F(u)=\kappa.

y=\mathcal{G}(u)+\eta

c(x,y)=\sigma_{0}^{2}\frac{2^{1-\nu}}{\Gamma(\nu)}\Bigg(\frac{|x-y|}{l}\Bigg)^{\nu}K_{\nu}\Bigg(\frac{|x-y|}{l}\Bigg),

\mu_{0}(\mathrm{d}u)=\prod_{i=1}^{5}\pi_{0}^{A_{i}}(d_{i})\otimes N(m_{1},C_{1})\otimes N(m_{2},C_{2})

\pi_{0}^{A}(x)=\begin{cases}\frac{1}{|A|}&x\in A,\\ 0&x\notin A.\end{cases}

l(u,y)\propto\exp(-\Phi(u,y))

\Phi(u,y)=\frac{1}{2\sigma^{2}}||y-\mathcal{G}(u)||^{2}

\frac{\mathrm{d}\mu}{\mathrm{d}\mu_{0}}=\frac{1}{Z}l(u,y)

Z=\int_{U}l(u,y)\mu_{0}(\mathrm{d}u)

\frac{\mathrm{d}\mu_{n}}{\mathrm{d}\mu_{0}}(u)\propto l_{n}(u,y)\equiv l(u,y)^{\phi_{n}}

\frac{\mathrm{d}\mu_{n}}{\mathrm{d}\mu_{n-1}}(u)=\frac{1}{Z_{n}}l(u,y)^{(\phi_{n}-\phi_{n-1})}

Z_{n}\equiv\int_{X}l(u,y)^{(\phi_{n}-\phi_{n-1})}\mu_{n-1}(\mathrm{d}u)

\mu_{n-1}^{J}(u)\equiv\frac{1}{J}\sum_{j=1}^{J}\delta_{u_{n-1}^{(j)}}(u)\simeq\mu_{n-1}(u).

Z_{n}\simeq\frac{1}{J}\sum_{j=1}^{J}l(u_{n-1}^{(j)},y)^{(\phi_{n}-\phi_{n-1})}

E^{\mu_{n}}(f(u))\equiv\int_{X}f(u)\mu_{n}(\mathrm{d}u)=\frac{1}{Z_{n}}\int_{X}f(u)l(u,y)^{(\phi_{n}-\phi_{n-1})}\mu_{n-1}(\mathrm{d}u)

\simeq\Big[\sum_{j=1}^{J}l(u_{n-1}^{(j)},y)^{(\phi_{n}-\phi_{n-1})}\Big]^{-1}\sum_{j=1}^{J}l(u_{n-1}^{(j)},y)^{(\phi_{n}-\phi_{n-1})}f(u_{n-1}^{(j)}),

=\sum_{j=1}^{J}W_{n}^{(j)}f(u_{n-1}^{(j)}),

W_{n}^{(j)}=\mathcal{W}_{n-1}^{(j)}[\phi_{n}]\equiv\frac{l(u_{n-1}^{(j)},y)^{\phi_{n}-\phi_{n-1}}}{\sum_{s=1}^{J}l(u_{n-1}^{(s)},y)^{\phi_{n}-\phi_{n-1}}}.

\mu_{n}^{J}(u)\equiv\sum_{j=1}^{J}W_{n}^{(j)}\delta_{u_{n-1}^{(j)}}(u).

{\rm ESS}_{n}(\phi)\equiv\Bigg[\sum_{j=1}^{J}(\mathcal{W}_{n-1}^{(j)}[\phi])^{2}\Bigg]^{-1}

Full text

Transform-based particle filtering for elliptic Bayesian inverse problems

S Ruchi$^{1}$, S Dubinkina$^{1}$ and M A Iglesias$^{2}$

$^{1}$ Centrum Wiskunde & Informatica, P.O. Box 94079, 1090 GB Amsterdam, The Netherlands

$^{2}$ School of Mathematical Sciences, The University of Nottingham, University Park, Nottingham, NG7 2RD, UK


Abstract

We introduce optimal transport based resampling in adaptive SMC. We consider elliptic inverse problems of inferring hydraulic conductivity from pressure measurements. We consider two parametrizations of hydraulic conductivity: by Gaussian random field, and by a set of scalar (non-)Gaussian distributed parameters and Gaussian random fields. We show that for scalar parameters optimal transport based SMC performs comparably to monomial based SMC but for Gaussian high-dimensional random fields optimal transport based SMC outperforms monomial based SMC. When comparing to ensemble Kalman inversion with mutation (EKI), we observe that for Gaussian random fields, optimal transport based SMC gives comparable or worse performance than EKI depending on the complexity of the parametrization. For non-Gaussian distributed parameters optimal transport based SMC outperforms EKI.

August 2017

Keywords: parameter estimation, non-Gaussian posterior, tempering, particle approximation, Ensemble Transform Particle filter, Darcy flow

1 Introduction

We consider the inverse problem of inferring unknown parameters in models described by partial differential equations (PDEs), given incomplete noisy data/observations of the model outputs. We adopt the Bayesian approach where the unknowns are random functions with a prescribed prior measure that encompasses our prior statistical knowledge of the unknown. The solution to the Bayesian inversion problem is the posterior, i.e. the conditional distribution of the unknown parameters given the observed data. We can use the posterior to compute estimates of the unknown together with the degree of confidence in those estimates. We are interested in problems where the parameter-to-output map from the underlying PDE model is nonlinear. These are particularly challenging problems since the resulting posterior cannot be obtained analytically even when the prior and the noise distributions are assumed Gaussian. Hence, sampling methods are required to approximate (expectations under) the posterior which, in turn, is defined on a very high dimensional space after discretisation of the PDEs that define the forward problem.

Markov chain Monte Carlo (MCMC) is the method of choice to sample the Bayesian posterior [1]. In particular, there is a class of MCMC methods constructed in functional settings with mesh-invariant properties suitable for PDE-constrained identification problems [2]. However, the most standard versions of these methods often exhibit excessively long correlations (e.g. up to $10^{4}$ [3, 4]), a situation particularly exacerbated with highly-peaked (possibly multimodal) posteriors such as those arising when observational noise is small. Very long MCMC chains (e.g. over $10^{7}$ steps) are thus required to (i) ensure that MCMC fully explores the posterior measure, thus capturing possibly multiple modes, and (ii) produce sufficient independent samples to compute accurate posterior statistics. Since every step of MCMC involves at least one PDE solve, these methods become impractical for costly large-scale simulations. While more efficient MCMC methods can be used to approximate the posterior [5, 6], their proposals often require high-order derivatives of the likelihood, which are not available in many applications where the simulator is accessible only in a black-box fashion.

Sequential Monte Carlo (SMC) samplers [7] offer a different sampling approach for approximating the Bayesian posterior. In the context of large-scale Bayesian inversion, adaptive SMC methods construct particle approximations of a sequence of intermediate measures that interpolate (e.g. via tempering) between the prior and the posterior. Particles and their weights are adapted on-the-fly to enable a controlled transition between those intermediate measures, thus facilitating a gradual move from a simple prior to a possibly complex posterior. The transition between two intermediate measures involves an importance resampling (IR) step by which the particles are weighted according to the tempered likelihood and then resampled according to those weights. This step is then followed by mutation of particles induced by sampling from a kernel with the IR measure as its invariant measure; this is typically conducted by running MCMC chains with the aforementioned target measure.

Adaptive SMC samplers for solving Bayesian inverse problems have been proposed in [4] and applied to the identification of the initial condition in the Navier-Stokes equations. This work showed that SMC can produce accurate approximations of the Bayesian posterior at a computational cost an order of magnitude smaller than that of state-of-the-art MCMC. The same adaptive SMC sampler was used in [8] to infer permeability in a moving boundary problem arising in porous media flow. A theoretical framework for adaptive SMC was developed in [9] and tested numerically by inferring hydraulic conductivity in a groundwater flow model.

Despite the computational advantages of SMC samplers, their computational cost still poses severe limitations for their application to practical large-scale inverse problems. The cost of a single iteration (IR+mutation) within SMC is $J\times N_{\mu}$ PDE solves, where $J$ is the number of particles and $N_{\mu}$ is the number of mutation MCMC moves. Therefore, each iteration could involve $10^{4}$ or more PDE solves even for relatively small $J$ and $N_{\mu}$ (e.g. $J=10^{3}$ and $N_{\mu}=10$). Hence, if the posterior is complex and thus requires several intermediate measures, the cost of SMC is prohibitive unless high performance computing (HPC) resources are available to scale the cost of SMC with respect to $J$. While parallelisation is indeed one of the main advantages of SMC, the availability of HPC with $10^{4}-10^{5}$ processors for typical engineering and geophysical (practical) applications is the exception rather than the norm. It is worth mentioning that reducing the cost of SMC by using a small number of samples and/or reducing the number of mutation steps can be substantially detrimental to the accuracy of the particle approximation provided by SMC; see for example the work of [8], where SMC with a limited number of particles ($10^{2}-10^{3}$) results in very poor approximations of the Bayesian posterior. Recent work aimed at reducing the computational cost of SMC samplers includes the development of multilevel versions [10, 11].

1.1 Contribution of this work

Our aim is to investigate the feasibility of an alternative, potentially more computationally affordable, approach to approximate the Bayesian posterior within the adaptive tempering SMC setting for Bayesian PDE-constrained inverse problems [4, 9]. The proposed approach consists of replacing the resampling step in SMC with a deterministic linear transformation that maps between the systems of particles approximating two consecutive measures. At each iteration step within SMC, the transformation is obtained by solving an optimal transportation problem which, in turn, defines a deterministic coupling between two discrete random variables with realisations defined by the particles and with probabilities determined by their corresponding weights. Replacing resampling by an optimal transformation within Bayesian algorithms was proposed in [12], where it was shown that the linear transport map leads to samples that converge to the posterior measure in the large-ensemble limit. In the context of data assimilation of partially observed dynamic systems, the idea of replacing IR by optimal transport maps is at the core of the so-called ensemble Transform Particle filter (ETPF) [12, 13]. The novelty of our approach lies in transferring the application of optimal transport to compute the transition between measures in the tempering scheme within SMC.

Numerous works on data assimilation have shown that, when a relatively small number of particles is used, ETPF provides more accurate state estimates than standard IR-based particle filters, owing to the sampling errors introduced by resampling. While methods such as the ensemble Kalman filter (EnKF) can work well for small ensemble sizes compared to IR-based methods, they rely on Gaussian approximations, which is often a severe limitation when the underlying distribution is, for example, multimodal. In contrast, the optimal transport within ETPF does not rely on Gaussian approximations and has been shown to be first-order consistent for the mean, and to converge to the posterior measure in the large-ensemble size limit [12]. Here we investigate whether those well known advantages of ETPF can be exploited within the setting of adaptive SMC for Bayesian inversion. As a proof-of-concept we apply the proposed algorithm to a Bayesian elliptic inverse problem arising in groundwater flow. The goal is to infer hydraulic conductivity from pressure measurements. We consider two parameterisations of the conductivity field aimed at assessing the method under two levels of complexity. In the first one we assume that the log-conductivity is a smooth function characterised by a Gaussian random field under the prior. The second parameterisation consists of a channelised permeability that is described by a set of geometric parameters together with two random fields in the regions inside and outside the channel. While the first parameterisation yields posteriors which are relatively well approximated by Gaussians, the second parameterisation can result in multimodal distributions which are more difficult to capture with Gaussian approximations.

We compare the performance of the proposed technique against a fully resolved posterior computed by the preconditioned Crank-Nicolson (pcn)-MCMC with enough steps to ensure that the chain has converged. We then compare the proposed technique against monomial based SMC as well as an ensemble Kalman inversion (EKI) technique that arises naturally from the adaptive SMC setting. This EKI methodology has been proposed in [14] as an alternative to [15]. Here this approach is modified to incorporate a mutation with the invariant measure.

2 Forward and Inverse Problem

Bayesian inversion requires the formulation of both a forward problem and an inverse problem. The forward problem consists of finding the pressure from the hydraulic conductivity. The inverse problem consists of two parts. The first part is the parametrization of the hydraulic conductivity by a random variable. The second part is the use of Bayes' rule to obtain the posterior distribution of the random variable from a given prior and a likelihood. Evaluating the likelihood involves solving the forward problem. Thus Bayesian inversion employs the forward problem within the inverse problem.

2.1 Forward Model

We consider the identification of the hydraulic conductivity, $\kappa(x)$, of a two-dimensional confined aquifer for which the physical domain is $D=[0,6]\times[0,6]$. Assuming that the flow within the aquifer is single-phase steady-state Darcy flow, the piezometric head $h(x)$ is given by the solution of [16]

$$-\nabla\cdot\kappa\nabla h=f\quad\textrm{in }D,$$

where $f$ represents the recharge term. We use the benchmark from [17, 18, 15], where $f$ has the following form

$$f(x_{1},x_{2})=\begin{cases}0&\textrm{if }0<x_{2}\leq 4,\\ 137&\textrm{if }4<x_{2}<5,\\ 274&\textrm{if }5\leq x_{2}<6,\end{cases}$$

and where the boundary conditions are given by

$$h(x_{1},0)=100,\quad\frac{\partial h}{\partial x}(6,x_{2})=0,\quad-\kappa\frac{\partial h}{\partial x}(0,x_{2})=500,\quad\frac{\partial h}{\partial y}(x_{1},6)=0.$$

We wish to infer $\kappa\in X:=\{f\in L^{\infty}(D;\mathbb{R})|\textrm{ess}\inf_{x\in D}f(x)>0\}$ from point observations of $h$ collected at $M$ locations denoted by $\{x_{j}\}_{j=1}^{M}\subseteq D$. To this end, we consider smoothed point observations defined by

$$\ell_{j}(h)=\int_{D}\frac{1}{2\pi\varepsilon^{2}}e^{-\frac{1}{2\varepsilon^{2}}|x-x_{j}|^{2}}h(x)\,dx,$$

where $\varepsilon>0$. Let us define the forward map $G:X\rightarrow\mathbb{R}^{M}$ by

$$G(\kappa)=(\ell_{1}(h),\dots,\ell_{M}(h)),$$

which maps permeability into predictions of hydraulic head at measurement locations. Assume that we have noisy measurements of $\{\ell_{j}(h)\}_{j=1}^{M}$ of the form

$$y_{j}=\ell_{j}(h)+\eta_{j},\quad j=1,\dots,M,$$

where $\eta_{j}$ represents measurement noise. Our aim is to reconstruct $\kappa\in X$ given $y=(y_{1},\dots,y_{M})\in\mathbb{R}^{M}$.
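As an illustration of the observation model, the following minimal sketch evaluates a smoothed point observation of a head field by a Riemann sum on a uniform grid and then adds Gaussian noise. The function names, the grid-based quadrature, and the noise level `sigma` are assumptions made for the example; the head field `h` would in practice come from a PDE solver, which is not shown.

```python
import numpy as np

def smoothed_observation(h, x1, x2, loc, eps):
    """Riemann-sum approximation of the Gaussian-smoothed observation of h at loc.

    h has shape (len(x1), len(x2)); x1 and x2 are uniform 1D grids (assumed).
    """
    X1, X2 = np.meshgrid(x1, x2, indexing="ij")
    sq_dist = (X1 - loc[0]) ** 2 + (X2 - loc[1]) ** 2
    kernel = np.exp(-sq_dist / (2.0 * eps**2)) / (2.0 * np.pi * eps**2)
    cell_area = (x1[1] - x1[0]) * (x2[1] - x2[0])
    return float(np.sum(kernel * h) * cell_area)

def observe(h, x1, x2, locs, eps, sigma, rng):
    """Noisy data y_j = l_j(h) + eta_j with eta_j ~ N(0, sigma^2)."""
    clean = np.array([smoothed_observation(h, x1, x2, loc, eps) for loc in locs])
    return clean + sigma * rng.standard_normal(clean.shape)
```

Because the kernel integrates to one, a constant head field is reproduced by the functional as long as the measurement location lies several multiples of $\varepsilon$ away from the boundary.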

2.1.1 Parameterisation of permeability

We consider the following two parameterisations of the permeability function $\kappa(x)$ that we wish to identify from observations of the Darcy flow model (1)-(6).

P1:

For the first model the parameter that we consider is simply the natural logarithm of $\kappa$ , i.e. $u(x)=\log{\kappa(x)}$ .

P2:

The second model consists of a parameterisation of a piecewise continuous permeability of the form

$$\kappa(x)=\exp(u_{1}(x))\chi_{D_{c}}(x)+\exp(u_{2}(x))\chi_{D\setminus D_{c}}(x),$$

where $\kappa_{1}=\exp(u_{1}(x))$ and $\kappa_{2}=\exp(u_{2}(x))$ are continuous permeabilities inside and outside a sinusoidal channel with domain denoted by $D_{c}$. The geometry of the channel is parameterized by five parameters $\{d_{i}\}_{i=1}^{5}$ as described in Figure 1. The lower boundary of the channel is given by

$$x_{2}=d_{1}\sin(d_{2}x_{1}/6)+\tan(d_{3})x_{1}+d_{4},$$

where we use the notation $x=(x_{1},x_{2})\in D$ in terms of the horizontal and vertical components. The upper boundary of the channel is given by $x_{2}+d_{5}$. For this permeability model the parameters of interest are comprised in

$$u=(d_{1},\dots,d_{5},u_{1},u_{2}),$$

where we assume that each $d_{i}$ is restricted to an interval $A_{i}\equiv[{d_{i}^{-}},d_{i}^{+}]$.
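The channelised parameterisation P2 can be sketched on a grid as follows. This is a minimal illustration, assuming the lower boundary $x_{2}=d_{1}\sin(d_{2}x_{1}/6)+\tan(d_{3})x_{1}+d_{4}$ and a channel of vertical width $d_{5}$ above it; the function name and the grid representation of $u_{1}$ and $u_{2}$ are assumptions of the example.

```python
import numpy as np

def channel_permeability(d, u1, u2, x1, x2):
    """Evaluate the piecewise permeability of P2 on the grid x1 x x2.

    d = (d1, ..., d5) are the geometric parameters; u1 and u2 are the
    log-permeabilities inside and outside the channel (grid arrays or scalars).
    """
    d1, d2, d3, d4, d5 = d
    X1, X2 = np.meshgrid(x1, x2, indexing="ij")
    lower = d1 * np.sin(d2 * X1 / 6.0) + np.tan(d3) * X1 + d4
    inside = (X2 >= lower) & (X2 <= lower + d5)   # indicator of D_c
    return np.where(inside, np.exp(u1), np.exp(u2))
```

With $d=(0,0,0,2,1)$ the channel degenerates to the horizontal strip $2\leq x_{2}\leq 3$, which is a convenient sanity check.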

We define the following parameter space

$$U=\begin{cases}L^{\infty}(D;\mathbb{R})&\textrm{for P1},\\ \prod_{i=1}^{5}A_{i}\times L^{\infty}(D;\mathbb{R}^{2})&\textrm{for P2},\end{cases}$$

with metric

$$|u|_{U}=\begin{cases}||u||_{\infty}&\textrm{for P1},\\ \sum_{i=1}^{5}|d_{i}|+||u_{1}||_{\infty}+||u_{2}||_{\infty}&\textrm{for P2}.\end{cases}$$

The parameterizations described earlier define an abstract map $F:U\rightarrow X$ from the space of parameters to the space of admissible permeabilities, via

$$F(u)=\kappa.$$

We define the parameter-to-observations map $\mathcal{G}:U\rightarrow\mathbb{R}^{M}$ by $\mathcal{G}=G\circ F$ and reformulate the inverse problem (7) in terms of finding the parameter $u\in U$, given $y\in\mathbb{R}^{M}$, that satisfies

$$y=\mathcal{G}(u)+\eta$$

for $\eta=(\eta_{1},\dots,\eta_{M})\in\mathbb{R}^{M}$. The continuity of the parameter-to-observations map $\mathcal{G}$ for this, and more general cases, has been established in [19, 3].

2.2 The Bayesian Inverse Problem

In order to address the inverse problem formulated via (9) we adopt the Bayesian framework [19] where $\eta$ is a random vector and $u$ is a random function. We put a prior, $\mu_{0}(u)$, on the unknown $u$, and define the random variable $y|u$ under the standard assumption that $\eta\sim N(0,\sigma^{2}I)$ independent of $u$. The solution to the inverse problem in the Bayesian setting is the posterior measure on $u|y$. In the following sections we introduce the prior and likelihood which, by the infinite-dimensional framework of [19], ensure that the posterior measure exists and is continuous with respect to appropriate metrics.

2.2.1 The Prior

For P1 we consider a Gaussian prior $\mu_{0}=N(m,C)$ with mean $m$ and covariance $C$. We define $C$ via the Whittle-Matérn correlation function [20]:

$$c(x,y)=\sigma_{0}^{2}\frac{2^{1-\nu}}{\Gamma(\nu)}\Bigg(\frac{|x-y|}{l}\Bigg)^{\nu}K_{\nu}\Bigg(\frac{|x-y|}{l}\Bigg),$$

where $\Gamma$ is the gamma function, $l$ is the characteristic length scale, $\sigma_{0}^{2}$ is an amplitude scale and $K_{\nu}$ is the modified Bessel function of the second kind of order $\nu$. The parameter $\nu$ controls the regularity of the samples.
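To make the prior concrete, the sketch below assembles the Whittle-Matérn covariance matrix on a set of points and draws a sample of the Gaussian field by Cholesky factorisation. It is only an illustration: the function names, the dense-matrix approach, and the small diagonal `jitter` added for numerical stability are assumptions of the example, and the source does not state how the prior is sampled.

```python
import numpy as np
from scipy.special import gamma, kv

def matern_covariance(points, sigma0_sq, nu, length):
    """Covariance matrix c(x_i, x_j) for points of shape (n, dim)."""
    diff = points[:, None, :] - points[None, :, :]
    r = np.linalg.norm(diff, axis=-1) / length
    cov = np.full(r.shape, sigma0_sq, dtype=float)   # limit of c as |x - y| -> 0
    pos = r > 0
    cov[pos] = (sigma0_sq * 2.0 ** (1.0 - nu) / gamma(nu)
                * r[pos] ** nu * kv(nu, r[pos]))
    return cov

def sample_gaussian_field(mean, cov, rng, jitter=1e-10):
    """Draw one sample from N(mean, cov) using a Cholesky factor."""
    chol = np.linalg.cholesky(cov + jitter * np.eye(cov.shape[0]))
    return mean + chol @ rng.standard_normal(cov.shape[0])
```

For $\nu=1/2$ the formula reduces to the exponential covariance $\sigma_{0}^{2}\exp(-|x-y|/l)$, which gives a simple check of the implementation.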

For P2 we assume independence between geometric parameters and log-permeabilities and thus consider a prior of the form

$$\mu_{0}(\mathrm{d}u)=\prod_{i=1}^{5}\pi_{0}^{A_{i}}(d_{i})\otimes N(m_{1},C_{1})\otimes N(m_{2},C_{2}),$$

where $\pi_{0}^{A}(x)$ is the uniform density defined by

$$\pi_{0}^{A}(x)=\begin{cases}\frac{1}{|A|}&x\in A,\\ 0&x\notin A.\end{cases}$$

In expression (11) $N(m_{1},C_{1})$ and $N(m_{2},C_{2})$ are two Gaussians such as those described earlier in terms of the correlation function from (10).

2.2.2 The likelihood

We assume the unknown $u$ is independent of the observational noise $\eta\sim N(0,\sigma^{2}I)$. We note that $y|u\sim N(\mathcal{G}(u),\sigma^{2}I)$, hence the likelihood is given by

$$l(u,y)\propto\exp(-\Phi(u,y)),$$

where $\Phi(u,y)$ is the data misfit defined by

$$\Phi(u,y)=\frac{1}{2\sigma^{2}}||y-\mathcal{G}(u)||^{2}.$$
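In code, the likelihood is most conveniently handled through the data misfit on a log scale. The helper below is a minimal sketch; the function names are hypothetical and the argument `g_u` stands for an already computed forward prediction $\mathcal{G}(u)$.

```python
import numpy as np

def data_misfit(y, g_u, sigma):
    """Phi(u, y) = ||y - G(u)||^2 / (2 sigma^2) for a precomputed prediction g_u."""
    residual = np.asarray(y, dtype=float) - np.asarray(g_u, dtype=float)
    return 0.5 * float(residual @ residual) / sigma**2

def log_likelihood(y, g_u, sigma):
    """Unnormalised log-likelihood, log l(u, y) = -Phi(u, y) up to a constant."""
    return -data_misfit(y, g_u, sigma)
```

Working with $-\Phi$ rather than $\exp(-\Phi)$ avoids underflow when the noise level $\sigma$ is small.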

2.2.3 The Posterior

The prior measures from subsection 2.2.1 satisfy $\mu_{0}(U)=1$; i.e. samples from $\mu_{0}$ are in $U$ almost surely [19, 3]. This property, together with the continuity of the forward map defined in subsection 2.1, can be used in the Bayesian framework of [19, 3] to conclude that (i) the posterior measure $\mu(u)$ on $u|y$ exists and is absolutely continuous with respect to the prior $\mu_{0}$; and (ii) $\mu$ has a density with respect to $\mu_{0}$ given by the following Bayes' rule

$$\frac{\mathrm{d}\mu}{\mathrm{d}\mu_{0}}=\frac{1}{Z}l(u,y),$$

where

$$Z=\int_{U}l(u,y)\mu_{0}(\mathrm{d}u).$$

3 Sequential Monte Carlo for Bayesian inversion

Since we consider a highly nonlinear model, an iterative approach to Bayesian inversion is essential. In the framework of SMC this is performed by tempering (or annealing), whereby the prior measure is bridged to the posterior measure not at once but through tempered measures. It should be noted that the number of tempered measures is not predefined, which could be a potential computational burden. In order to avoid filter degeneracy both resampling and mutation (or jittering) have to be performed. In the "classical" approach we perform monomial resampling, which we propose to replace by resampling based on optimal transport.

3.1 Adaptive SMC

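The SMC approach to Bayesian inversion involves bridging the prior $\mu_{0}$ and the posterior $\mu$ via a sequence of intermediate artificial measures $\{\mu_{n}\}_{n=0}^{N}$, with $\mu_{N}=\mu$, defined by

$$\frac{\mathrm{d}\mu_{n}}{\mathrm{d}\mu_{0}}(u)\propto l_{n}(u,y)\equiv l(u,y)^{\phi_{n}},$$

where $\{\phi_{n}\}_{n=0}^{N}$ is a set of tempering parameters that satisfy $0=\phi_{0}<\phi_{1}<\cdots<\phi_{N}=1$. Expression (17) formally implies

$$\frac{\mathrm{d}\mu_{n}}{\mathrm{d}\mu_{n-1}}(u)=\frac{1}{Z_{n}}l(u,y)^{(\phi_{n}-\phi_{n-1})},$$

where

$$Z_{n}\equiv\int_{X}l(u,y)^{(\phi_{n}-\phi_{n-1})}\mu_{n-1}(\mathrm{d}u).$$

Let us then assume that at the iteration level $n-1$, the tempering parameter $\phi_{n-1}$ has been specified, and that a set of particles $\{u_{n-1}^{(j)}\}_{j=1}^{J}$ provides the following approximation (with equal weights) of the intermediate measure $\mu_{n-1}$:

$$\mu_{n-1}^{J}(u)\equiv\frac{1}{J}\sum_{j=1}^{J}\delta_{u_{n-1}^{(j)}}(u)\simeq\mu_{n-1}(u).$$

Then from (19) it follows that

$$Z_{n}\simeq\frac{1}{J}\sum_{j=1}^{J}l(u_{n-1}^{(j)},y)^{(\phi_{n}-\phi_{n-1})},$$

and thus, for any measurable $f$, we have that

$$\begin{aligned}E^{\mu_{n}}(f(u))&\equiv\int_{X}f(u)\mu_{n}(\mathrm{d}u)=\frac{1}{Z_{n}}\int_{X}f(u)l(u,y)^{(\phi_{n}-\phi_{n-1})}\mu_{n-1}(\mathrm{d}u)\\ &\simeq\Big[\sum_{j=1}^{J}l(u_{n-1}^{(j)},y)^{(\phi_{n}-\phi_{n-1})}\Big]^{-1}\sum_{j=1}^{J}l(u_{n-1}^{(j)},y)^{(\phi_{n}-\phi_{n-1})}f(u_{n-1}^{(j)})\\ &=\sum_{j=1}^{J}W_{n}^{(j)}f(u_{n-1}^{(j)}),\end{aligned}$$

where the importance weights for the approximation of $\mu_{n}$ are given by

$$W_{n}^{(j)}=\mathcal{W}_{n-1}^{(j)}[\phi_{n}]\equiv\frac{l(u_{n-1}^{(j)},y)^{\phi_{n}-\phi_{n-1}}}{\sum_{s=1}^{J}l(u_{n-1}^{(s)},y)^{\phi_{n}-\phi_{n-1}}}.$$

From (3.1) we see that the importance (normalized) weights $W_{n}^{(j)}$ assigned to each particle $u_{n-1}^{(j)}$ define the following empirical (particle) approximation of $\mu_{n}$:

$$\mu_{n}^{J}(u)\equiv\sum_{j=1}^{J}W_{n}^{(j)}\delta_{u_{n-1}^{(j)}}(u).$$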
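The normalised importance weights depend on the particles only through their data misfits, since $l(u,y)^{\phi_{n}-\phi_{n-1}}\propto\exp(-(\phi_{n}-\phi_{n-1})\Phi(u,y))$. A minimal sketch of the weight computation, with hypothetical function names and a log-sum-exp style shift added for numerical stability (an implementation choice, not something stated in the text), is:

```python
import numpy as np

def tempered_weights(misfits, phi_prev, phi_new):
    """Normalised weights W^(j) proportional to exp(-(phi_new - phi_prev) * Phi_j)."""
    log_w = -(phi_new - phi_prev) * np.asarray(misfits, dtype=float)
    log_w -= log_w.max()              # shift so the largest weight is exp(0)
    w = np.exp(log_w)
    return w / w.sum()

def weighted_mean(particles, weights):
    """Particle estimate of E[u] under mu_n; particles has shape (J, K)."""
    return weights @ particles
```

The shift by the maximum leaves the normalised weights unchanged, because any constant factor cancels in the ratio.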

3.1.1 Selection-Resampling Step

From the previous subsection it follows that adaptive SMC requires selecting the tempering parameter $\phi_{n}$ so that the two consecutive measures $\mu_{n-1}$ and $\mu_{n}$ are sufficiently close for the importance sampling (IS) approximation to be accurate. To this end, a common procedure [21] involves imposing a threshold on the effective sample size (ESS) defined by

$${\rm ESS}_{n}(\phi)\equiv\Bigg[\sum_{j=1}^{J}(\mathcal{W}_{n-1}^{(j)}[\phi])^{2}\Bigg]^{-1},$$

which, in turn, provides a measure of the quality of the population. In other words, $\phi_{n}$ is defined by the solution to

$${\rm ESS}_{n}(\phi)=J_{\rm thresh}$$

for a user-defined parameter $J_{\rm thresh}$ on the ESS. A bisection algorithm on the interval $(\phi_{n-1},1]$ can be used to solve (26) [15]. If ${\rm ESS}_{n}(1)>J_{\rm thresh}$, then we can simply set $\phi_{n}=1$, as no further tempering is required.
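The selection of the next tempering parameter can be sketched as a bisection on the ESS. The code below is an illustration under the assumption that the ESS decreases as $\phi$ increases from $\phi_{n-1}$; the function names and the tolerance `tol` are choices made for the example.

```python
import numpy as np

def ess(misfits, phi_prev, phi):
    """Effective sample size of the weights obtained by tempering to phi."""
    log_w = -(phi - phi_prev) * misfits
    log_w = log_w - log_w.max()
    w = np.exp(log_w)
    w /= w.sum()
    return 1.0 / np.sum(w**2)

def next_temperature(misfits, phi_prev, j_thresh, tol=1e-8):
    """Find phi in (phi_prev, 1] with ESS(phi) = j_thresh, or return 1."""
    misfits = np.asarray(misfits, dtype=float)
    if ess(misfits, phi_prev, 1.0) >= j_thresh:
        return 1.0                     # no further tempering needed
    lo, hi = phi_prev, 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if ess(misfits, phi_prev, mid) > j_thresh:
            lo = mid                   # step too small, ESS still above threshold
        else:
            hi = mid                   # step too large
    return 0.5 * (lo + hi)
```

A typical call passes the misfits of the current particles, for instance `next_temperature(misfits, phi_prev, j_thresh=J / 3)`; the value $J/3$ here is only a placeholder, not a threshold taken from the text.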

Once the tempering parameter $\phi_{n}$ has been computed via (26), normalised weights (23) can be computed. Since some of these can be very low, resampling with replacement according to these weights is then required to discard particles associated with those low weights. After resampling, a new set of equally-weighted particles denoted by $\hat{u}_{n}^{(j)}$ ( $j=1,\dots,J$ ) provide a particle approximation of the measure $\mu_{n}$ .
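Resampling with replacement according to the normalised weights can be sketched in a few lines. The example assumes that the monomial resampling referred to in the text is standard multinomial resampling; the function name is hypothetical.

```python
import numpy as np

def multinomial_resample(particles, weights, rng):
    """Draw J particles with replacement, particle j chosen with probability W^(j).

    particles has shape (J, K); the result is an equally weighted ensemble.
    """
    n_particles = len(weights)
    idx = rng.choice(n_particles, size=n_particles, replace=True, p=weights)
    return particles[idx]
```

Particles with very low weight are unlikely to be selected, while particles with high weight are duplicated, which is why a mutation step is needed afterwards to restore diversity.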

3.1.2 Mutation Phase

In order to add diversity to the resampled particles $\hat{u}_{n}^{(j)}$ computed in the selection-resampling step, a mutation step is included in most SMC methodologies. This mutation consists of sampling from a Markov kernel $\mathcal{K}_{n}$ with invariant distribution $\mu_{n}$ . This can be achieved by running $N_{\mu}$ steps of an MCMC algorithm that has target distribution equal to $\mu_{n}$ . An example of MCMC suitable for the parameterisation P1 of section 2.1.1 is the preconditioned Crank-Nicolson (pcn)-MCMC [2] displayed in Algorithm 1. This algorithm samples from the target $\mu_{n}$ with reference measure $\mu_{0}=N(m,C)$ ; we recall these two measures are related by (15). The resulting particles denoted by $\{u_{n}^{(j)}\}_{j=1}^{J}$ ( $u_{n}^{(j)}\sim\mathcal{K}_{n}(\hat{u}_{n}^{(j)},\cdot)$ ) provide a particle approximation of $\mu_{n}$ in the form

$$\mu_{n}^{J}(u)\equiv\frac{1}{J}\sum_{j=1}^{J}\delta_{u_{n}^{(j)}}(u).$$

Convergence of (27) to $\mu_{n}$ in the large ensemble size limit can be found in [9]. The complete adaptive SMC sampler is displayed in Algorithm 2.
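A mutation step of this kind can be sketched with the usual pCN proposal for a target $\mu_{n}$ whose density with respect to $N(m,C)$ is proportional to $\exp(-\phi_{n}\Phi(u,y))$. This is not a transcription of Algorithm 1: the function signature, the step size `beta`, and the use of a Cholesky factor `chol_cov` of $C$ are assumptions of the example.

```python
import numpy as np

def pcn_mutation(u, misfit_fn, phi, mean, chol_cov, beta, n_steps, rng):
    """Run n_steps of pCN targeting d(mu_n)/d(mu_0) proportional to exp(-phi * Phi).

    misfit_fn(u) returns the data misfit Phi(u, y); mu_0 = N(mean, C) with
    C = chol_cov @ chol_cov.T. Returns the mutated particle and acceptance rate.
    """
    u = np.array(u, dtype=float)
    phi_u = misfit_fn(u)
    accepted = 0
    for _ in range(n_steps):
        xi = chol_cov @ rng.standard_normal(u.shape[0])
        # Proposal that leaves the Gaussian reference measure invariant
        v = mean + np.sqrt(1.0 - beta**2) * (u - mean) + beta * xi
        phi_v = misfit_fn(v)
        # Accept with probability min(1, exp(phi * (Phi(u) - Phi(v))))
        if np.log(rng.uniform()) < phi * (phi_u - phi_v):
            u, phi_u = v, phi_v
            accepted += 1
    return u, accepted / n_steps
```

Each step costs one evaluation of `misfit_fn`, i.e. one forward solve, which is the origin of the $J\times N_{\mu}$ cost per SMC iteration.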

3.2 Optimal Transport within SMC

In this section we assume that $X=\mathbb{R}^{K}$. We denote by $U_{n-1}$ a discrete random variable with realisations $\{u_{n-1}^{(j)}\}_{j=1}^{J}$ and probabilities $\{W_{n}^{(j)}\}_{j=1}^{J}$. We denote by $U_{n}$ the random variable with samples $\{\hat{u}_{n}^{(j)}\}_{j=1}^{J}$ with equal weights. The aim is to replace the resampling step in the method above with resampling that maximizes the covariance between $U_{n-1}$ and $U_{n}$. Such a resampling is performed by finding a coupling between the posterior defined by the weights $\{W_{n}^{(j)}\}_{j=1}^{J}$ and the uniform probability density such that it maximizes the covariance between $U_{n-1}$ and $U_{n}$.

Let us assume that the two consecutive measures $\mu_{n-1}$ and $\mu_{n}$ are defined on a measurable space $(\Omega,\mathcal{F})$ such that $\mu_{n-1}$ is the law of $U_{n-1}:\Omega\rightarrow\mathcal{U}_{n-1}$ and $\mu_{n}$ is the law of $U_{n}:\Omega\rightarrow\mathcal{U}_{n}$. Here, the pair $(U_{n-1},U_{n})$ is called a coupling of $(\mu_{n-1},\mu_{n})$, i.e. a coupling of the posterior defined by the weights $\{W_{n}^{(j)}\}_{j=1}^{J}$ and the uniform probability density. A coupling is called deterministic if there exists a measurable function $\Psi:\mathcal{U}_{n-1}\rightarrow\mathcal{U}_{n}$ such that $U_{n}=\Psi(U_{n-1})$, and $\Psi$ is called a transport map. Unlike couplings, deterministic couplings do not always exist. On the other hand there may be infinitely many deterministic couplings. An example of a deterministic coupling is an optimal coupling. An optimal coupling is a solution of the Monge-Kantorovitch minimization problem

[TABLE]

where the minimum runs over all joint probability measures $\ell$ on $\mathcal{U}_{n-1}\times\mathcal{U}_{n}$ with marginals $\mu_{n-1}$ and $\mu_{n}$, and $c(u_{n-1},\hat{u}_{n})$ is a cost function on $\mathcal{U}_{n-1}\times\mathcal{U}_{n}$. The joint measures achieving the infimum are called optimal transference plans. The optimal coupling is unique if the measure $\mu_{n-1}$ possesses some regularity properties and the cost function $c(u_{n-1},\hat{u}_{n})$ is convex [22]. The coupling that maximizes the covariance between $U_{n-1}$ and $U_{n}$ also minimizes the expectation of $||u_{n-1}-\hat{u}_{n}||^{2}$, and is therefore defined as the solution of the Monge-Kantorovitch problem with cost function $c(u_{n-1},\hat{u}_{n})=||u_{n-1}-\hat{u}_{n}||^{2}$.

Thus the above described coupling is a $J\times J$ matrix $T^{*}$ with non-negative entries $T_{ij}^{*}$ that satisfy

[TABLE]

and minimizes

[TABLE]

for $T_{ij}^{*}$ . This is a linear transport problem of finding $J^{2}$ unknowns. Then the linear transformation gives new samples according to

[TABLE]

where $P_{ij}=JT_{ij}^{*}$ .
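The discrete transport problem and the resulting linear transformation can be sketched with a generic linear programming solver. The code assumes the usual ETPF convention of [12], namely row sums $\sum_{j}T_{ij}=W_{n}^{(i)}$, column sums $\sum_{i}T_{ij}=1/J$, squared Euclidean cost, and new particles $\hat{u}_{n}^{(j)}=\sum_{i}P_{ij}u_{n-1}^{(i)}$; this index convention and the function name are assumptions. It uses `scipy.optimize.linprog` rather than the Earth mover's distance code of [24], so it illustrates the problem being solved, not the implementation used in the experiments.

```python
import numpy as np
from scipy.optimize import linprog

def optimal_transport_resample(particles, weights):
    """Replace resampling by the linear map induced by the optimal coupling.

    particles has shape (J, K); weights has shape (J,) and sums to one.
    Returns the transformed particles (J, K) and the coupling T (J, J).
    """
    n = particles.shape[0]
    diff = particles[:, None, :] - particles[None, :, :]
    cost = np.sum(diff**2, axis=-1)            # c_ij = ||u_i - u_j||^2
    # Unknowns T_ij are flattened row-major: T_ij sits at index i * n + j
    a_eq = np.zeros((2 * n, n * n))
    for i in range(n):
        a_eq[i, i * n:(i + 1) * n] = 1.0       # sum_j T_ij = W_i
    for j in range(n):
        a_eq[n + j, j::n] = 1.0                # sum_i T_ij = 1 / J
    b_eq = np.concatenate([weights, np.full(n, 1.0 / n)])
    res = linprog(cost.ravel(), A_eq=a_eq, b_eq=b_eq,
                  bounds=(0, None), method="highs")
    if not res.success:
        raise RuntimeError(res.message)
    coupling = res.x.reshape(n, n)
    p = n * coupling                           # P_ij = J * T_ij
    return p.T @ particles, coupling           # new particle j = sum_i P_ij u_i
```

The dense constraint matrix has $2J\times J^{2}$ entries, so this formulation is only practical for small ensembles; dedicated transport solvers avoid building it.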

The deterministic optimal transformation (30) converges weakly to the solution of the underlying continuous Monge-Kantorovitch problem as $J\to\infty$ [12]. ETPF is first order consistent, since

[TABLE]

There also exists a second-order accurate ETPF [23], which however does not satisfy $T_{ij}^{*}\geq 0$. The main difference between resampling based on optimal transport and monomial resampling is that the former is optimal in the sense of the Monge-Kantorovitch problem, while the latter is not.
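The first-order consistency of the transformation can be verified in one line. The computation below assumes the index convention $\sum_{j=1}^{J}T_{ij}^{*}=W_{n}^{(i)}$ and transformed particles $\hat{u}_{n}^{(j)}=\sum_{i=1}^{J}P_{ij}u_{n-1}^{(i)}$ with $P_{ij}=JT_{ij}^{*}$, following the usual ETPF formulation of [12]:

$$\frac{1}{J}\sum_{j=1}^{J}\hat{u}_{n}^{(j)}=\frac{1}{J}\sum_{j=1}^{J}\sum_{i=1}^{J}JT_{ij}^{*}u_{n-1}^{(i)}=\sum_{i=1}^{J}\Bigg(\sum_{j=1}^{J}T_{ij}^{*}\Bigg)u_{n-1}^{(i)}=\sum_{i=1}^{J}W_{n}^{(i)}u_{n-1}^{(i)}.$$

The equally weighted mean of the transformed particles therefore coincides with the importance-weighted mean of the original particles, with no sampling error.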

The computational complexity of finding the minimizer of (30) is in general $\mathcal{O}(J^{3}\ln J)$, which has been reduced to $\mathcal{O}(J^{2}\ln J)$ in [24]. At $J=100$ the wall-clock time is 0.3 seconds for SMC with optimal resampling, compared with 0.03 seconds for both SMC with monomial resampling and EKI. The cost can be reduced further by employing fast iterative methods for finding approximate minimizers using the Sinkhorn distance [25], which was implemented in [23] for the second-order accurate ETPF. The Earth mover's distance algorithm of [24] is available as both MATLAB and Python code and is used here. The complete adaptive optimal transport based SMC sampler is displayed in Algorithm 3.
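For illustration, the entropically regularised alternative mentioned above can be sketched with plain Sinkhorn iterations. This is not the method used in the experiments reported here; the function name, the regularisation parameter `reg`, and the fixed iteration count are assumptions of the example.

```python
import numpy as np

def sinkhorn_plan(cost, row_marginal, col_marginal, reg, n_iter=1000):
    """Approximate transport plan with given marginals via Sinkhorn scaling.

    cost has shape (J, J); reg > 0 is the entropic regularisation strength.
    """
    kernel = np.exp(-cost / reg)
    a = np.ones_like(row_marginal)
    b = np.ones_like(col_marginal)
    for _ in range(n_iter):
        a = row_marginal / (kernel @ b)        # match the row sums
        b = col_marginal / (kernel.T @ a)      # match the column sums
    return a[:, None] * kernel * b[None, :]
```

Smaller values of `reg` give plans closer to the unregularised optimum but need more iterations, and `exp(-cost / reg)` can underflow, so in practice the cost is usually rescaled or the iteration is carried out in log space.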

3.3 Gaussian Approximation of SMC via ensemble Kalman inversion

A natural approximation that arises from the adaptive SMC framework described in subsection 3.1 involves ensemble Kalman inversion (EKI) [8]. More specifically, let us assume that at the $n-1$ iteration level, we approximate $\mu_{n-1}$ with a Gaussian $\hat{\mu}_{n-1}=N(m_{n-1},C_{n-1})$ where the mean $m_{n-1}$ and covariance $C_{n-1}$ are the empirical mean and covariance of the particles (assumed with equal weights) at the current iteration level. That is,

[TABLE]

If we now linearise the forward map around $m_{n-1}$ and replace Fréchet derivatives of the forward map with covariances/cross-covariances as in [15], it can be shown that the application of Bayes' rule yields an approximate posterior $\hat{\mu}_{n}=N(m_{n},C_{n})$ with mean and covariance given by

[TABLE]

where

[TABLE]

and where

[TABLE]

Since we are interested in a particle approximation of $\hat{\mu}_{n}=N(m_{n},C_{n})$, we can use the following expression

[TABLE]

where

[TABLE]

Standard Kalman filter arguments [26] can be used to show that the particle approximation provided by (37)-(38) converges to $\hat{\mu}_{n}$ as $J\to\infty$. We note in passing that, within the adaptive SMC framework used here, the regularisation/inflation parameter $\alpha_{n}$ in formulas (36) is computed based on the ESS criterion discussed in subsection 3.1.1.
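A single EKI particle update can be sketched as below. This assumes the commonly used perturbed-observation form, in which each particle is moved by a Kalman-type gain built from empirical covariances and the noise covariance is inflated by $\alpha_{n}$. The exact expressions (37)-(38) may differ in normalisation or notation. The names `eki_update`, `G_U` (forward map evaluated at each particle), `y` (data) and `Gamma` (observation noise covariance) are introduced for this sketch only.

```python
import numpy as np


def eki_update(U, G_U, y, Gamma, alpha, rng):
    """One ensemble Kalman inversion step with inflation parameter alpha.

    U     : (J, d) particles
    G_U   : (J, m) forward map evaluated at each particle
    y     : (m,) observed data
    Gamma : (m, m) observation noise covariance
    """
    J = U.shape[0]
    dU = U - U.mean(axis=0)
    dG = G_U - G_U.mean(axis=0)
    # Empirical cross-covariance and covariance (1/(J-1) scaling assumed).
    C_uG = dU.T @ dG / (J - 1)   # (d, m)
    C_GG = dG.T @ dG / (J - 1)   # (m, m)

    # Perturbed observations with inflated noise covariance alpha * Gamma.
    noise = rng.multivariate_normal(np.zeros(len(y)), alpha * Gamma, size=J)
    residual = y + noise - G_U   # (J, m)

    # gain_T = (C_GG + alpha * Gamma)^{-1} C_uG^T, the transposed Kalman gain.
    gain_T = np.linalg.solve(C_GG + alpha * Gamma, C_uG.T)  # (m, d)
    return U + residual @ gain_T
```

Only forward map evaluations are required, with no derivatives, which is what makes the update cheap compared with a full importance sampling and resampling step.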

It is important to emphasize that, in general, the approximate Gaussian measure $\hat{\mu}_{n}$ coincides with $\mu_{n}$ only when the forward map is linear and the prior $\mu_{0}$ is Gaussian. The approximation provided by EKI will deteriorate when we depart from Gaussian-linear assumptions. Therefore, we propose to apply MCMC mutations to each of the particles in (37) with the aim of improving the approximation of each posterior measure $\mu_{n}$. The complete EKI-based algorithm is displayed in Algorithm 4. We recognise that this is only an ad hoc approach for which exact sampling of the posterior (as $J\to\infty$) is not ensured. A more rigorous (i.e. fully Bayesian) approach, which we leave for future work, is to use EKI in the proposal design for the importance sampling step within SMC; this is done for data assimilation settings in [27].

4 Numerical experiments

In this section we perform numerical experiments to infer the parameters of P1 and P2. We compare optimal transport based SMC to both monomial based SMC and EKI, which we denote optimal, monomial, and Kalman, respectively. We analyze the performance of the methods with respect to a pCN-MCMC solution, which we denote as reference. The reference combines 50 independent chains, each of length $10^{6}$ with a burn-in period of $10^{5}$ and thinning of $10^{3}$.

Observations of pressure were obtained from the true permeability, with observation noise drawn from a normal distribution with zero mean and standard deviation equal to 2% of the $L^{2}$-norm of the true pressure. Both the true random variable and the initial ensemble of parameterized permeability are drawn from the same prior distribution, since the prior includes knowledge about geological properties. However, the true solution is computed on a fine grid and the initial guess on a coarse grid, which has half the resolution of the fine grid. The uncertain parameter for P1 inference has the dimension of the coarse grid, i.e. $4900=70^{2}$. The uncertain parameter for P2 inference has twice the dimension of the coarse grid, because permeability is defined both inside and outside the channel but on the whole grid, plus the dimension of the geometrical parameters, i.e. $5005=50^{2}+50^{2}+5$.

For log-permeability parameters, the prior is a normal distribution with mean 5 for P1, and for P2 with mean 15 outside the channel and 100 inside the channel. For geometrical parameters, the prior is uniform: $d_{1}\sim U[0.05\times 6,\ 0.35\times 6]$, $d_{2}\sim U[\pi/2,\ 6\pi]$, $d_{3}\sim U[-\pi/2,\ \pi/2]$, $d_{4}\sim U[0,\ 6]$, $d_{5}\sim U[0.02\times 6,\ 0.7\times 6]$. For tempering we choose the effective ensemble size threshold $J_{\rm thresh}=J/3$, and for mutations we set the length of the Markov chain to $N_{\mu}=10$ to save computational cost. For P2, we use the Metropolis-within-Gibbs methodology of [3] to separate geometrical parameters and log-permeability parameters within the mutation step, since it makes better use of the structure of the prior. The proposal design for the geometric parameters within Metropolis-within-Gibbs consists of local moves within the intervals of the prior, with a step size that we tune to achieve acceptance rates between 20% and 30%. Geometrical parameters that fall outside those intervals are projected back via a projection that preserves reversibility of the proposal with respect to the prior [3]. We perform numerical experiments with ensemble sizes of 100, 500, and 1000. For each, we perform 10 simulations with different realizations of the initial ensemble to check the robustness of the results.
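The role of the threshold $J_{\rm thresh}$ in tempering can be sketched with a bisection on the next temperature. This assumes the standard adaptive SMC construction, in which incremental weights are proportional to $\exp(-(\phi_{n}-\phi_{n-1})\Phi(u^{(j)}))$ for a data misfit $\Phi$ and temperatures $0=\phi_{0}<\dots<\phi_{N}=1$. Subsection 3.1.1 may parametrise the step differently (for instance through $\alpha_{n}$). The function names are hypothetical.

```python
import numpy as np


def ess(log_w):
    """Effective sample size of unnormalised log-weights."""
    w = np.exp(log_w - log_w.max())   # shift for numerical stability
    w /= w.sum()
    return 1.0 / np.sum(w**2)


def next_temperature(misfit, phi_prev, J_thresh, tol=1e-8):
    """Largest phi in (phi_prev, 1] keeping ESS at or above J_thresh.

    misfit : (J,) data misfit Phi(u_j) evaluated at each particle
    """
    # If jumping straight to phi = 1 keeps the ESS high enough, do so.
    if ess(-(1.0 - phi_prev) * misfit) >= J_thresh:
        return 1.0
    lo, hi = phi_prev, 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if ess(-(mid - phi_prev) * misfit) >= J_thresh:
            lo = mid
        else:
            hi = mid
    return lo
```

With $J_{\rm thresh}=J/3$, each tempering step is chosen so that roughly a third of the ensemble remains effective before resampling and mutation.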

For log-permeability, we compute the $L^{2}$ norm of the error in the mean with respect to the reference

[TABLE]

We investigate the performance of the proposed approach in approximating the marginal posterior, $p(d_{i})$, of each geometric parameter $d_{i}$ ($i=1,\dots,5$) defined in parameterisation P2. To this end, we compute the Kullback-Leibler divergence with respect to the reference/true posterior marginal (denoted by $p^{\rm ref}(d_{i})$) computed via MCMC:

[TABLE]

where $J_{\rm b}=J/10$ is the chosen number of bins and $p(d_{i}^{j})$ is approximated by the weights. The results (median, 25th and 75th percentiles) that we report below for both the error in the mean and the KL divergence are computed over 10 experiments corresponding to independent choices of the prior ensemble.
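A binned KL divergence of this kind can be sketched as follows. The sketch assumes the divergence is taken from the reference marginal to the particle approximation, i.e. $\sum_{j}p^{\rm ref}\log(p^{\rm ref}/p)$ over the bins, with both histograms built on the prior interval of the parameter. The floor `eps` for empty bins is an assumption of this sketch and is not taken from the paper.

```python
import numpy as np


def kl_divergence_hist(ref_samples, particles, weights, bounds, n_bins,
                       eps=1e-12):
    """Binned KL divergence between reference samples and weighted particles.

    bounds : (low, high) interval of the uniform prior for this parameter
    n_bins : number of bins, e.g. J // 10
    """
    edges = np.linspace(bounds[0], bounds[1], n_bins + 1)
    # Reference marginal from (unweighted) MCMC samples.
    p_ref, _ = np.histogram(ref_samples, bins=edges)
    p_ref = p_ref / p_ref.sum()
    # Particle marginal: each bin accumulates the weights of its particles.
    p, _ = np.histogram(particles, bins=edges, weights=weights)
    p = p / p.sum()
    # Bins with zero reference mass contribute nothing to the sum.
    mask = p_ref > 0
    ratio = p_ref[mask] / np.maximum(p[mask], eps)
    return float(np.sum(p_ref[mask] * np.log(ratio)))
```

For an equally weighted ensemble after resampling, `weights` is simply `np.full(J, 1.0 / J)`. The value depends on the binning, so comparisons between methods are only meaningful at a fixed $J_{\rm b}$.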

4.1 Numerical inference for P1

For P1, we perform a numerical experiment using 36 uniformly distributed observations. In Figure 2, we plot the error in the mean log-permeability with respect to the reference. We observe that while optimal transport based SMC outperforms monomial based SMC for all ensemble sizes, EKI outperforms both SMC methods. This is due to the nature of the P1 parametrization and to the fact that EKI has only two degrees of freedom (mean and variance).

In Figure 3, we plot the mean log-permeability for the simulation with the smallest error at ensemble size 100, together with the reference mean log-permeability. We see that monomial based SMC gives a less smooth estimate than optimal transport based SMC, EKI, and the reference, which leads to a larger error.

For the ensemble sizes considered here, the average number of tempering steps is 15 for optimal transport based SMC, and 17 for both monomial based SMC and EKI. Thus, in terms of computational cost, optimal transport based SMC is equivalent to monomial based SMC, since the computational complexity of the forward model is higher than $O(J\ln J)$.

4.2 Numerical inference for P2

For P2, we perform a numerical experiment using 9 uniformly distributed observations. For the ensemble sizes considered here, the average number of tempering steps is 8 for EKI, and 7 for both optimal transport based SMC and monomial based SMC. In Figure 4, we plot the error in the mean log-permeability with respect to the reference, for permeability outside the channel on the left and inside the channel on the right. We observe that while optimal transport based SMC still outperforms monomial based SMC for all ensemble sizes, it is now comparable to EKI. This is due to the small number of observations.

In Figures 5–6, we plot the mean log-permeability for the simulation with the smallest error at ensemble size 100, together with the reference mean log-permeability, for permeability outside the channel and inside the channel, respectively. We see that monomial based SMC gives a less smooth estimate than optimal transport based SMC, EKI, and the reference, which leads to a larger error.

In Figure 7, we show posterior estimates of the geometrical parameters. We see that all the parameters except amplitude and width exhibit strongly non-Gaussian behaviour. In Figure 8, we show a trace plot of frequency from one chain of the reference to check whether both modes are sampled within each chain. We observe that the chain is properly mixed.

In Figure 9, we plot the KL divergence for the geometrical parameters. We observe that EKI performs better than optimal transport based SMC for amplitude and width, but worse for the other parameters. Note that the two modes of frequency shown in Figure 7 correspond to two significantly different channel configurations, so it is important to estimate the pdf correctly. Monomial based SMC performs comparably to optimal transport based SMC, though not consistently better or worse. Recall, however, that optimal transport based SMC outperforms monomial based SMC for log-permeability both inside and outside the channel.

In Figure 10, we show the mean permeability field over the channelized domain for the simulation with the lowest error at ensemble size 1000.

5 Conclusions

Accurate estimation of the posterior distribution of uncertain model parameters in strongly nonlinear problems remains challenging. Parameters are high dimensional, they are not observed, and they do not have a dynamical equation. Moreover, due to the nonlinearity of the models, even a Gaussian prior on the parameters might result in a non-Gaussian posterior. Since MCMC is computationally infeasible for high-dimensional problems, adaptive SMC is an alternative for estimating posterior distributions in the Bayesian framework. However, adaptive SMC still requires large ensembles.

In order to reduce computational cost, we proposed introducing the optimal transport based resampling of [12] into adaptive SMC. Optimal transport based resampling creates new samples by maximizing the covariance between the prior and posterior samples. It has already been shown, for state estimation and for low-dimensional parameter estimation, that a particle filter with optimal transport based resampling outperforms a particle filter with monomial based resampling. Since that work aimed to estimate time-evolving model states of chaotic systems, simple inflation was sufficient to mutate particles.

Here we have adapted optimal transport resampling to elliptic Bayesian inverse problems. We have shown that optimal transport based SMC has high potential for Bayesian inversion of high-dimensional parameters. The parameterisation of the channelised permeability was particularly useful since it involves geometric parameters with marginal posteriors that display non-Gaussian features (e.g. bimodality in the frequency parameter; see Figure 7), which are often difficult to characterise via EKI. Indeed, for this case the proposed approach provides more accurate approximations to the marginal posteriors (quantified via KL divergence) than EKI. Compared to the standard monomial based SMC, we did not observe substantial differences in the level of approximation of the aforementioned marginals. However, the proposed transport based SMC outperforms the monomial based version in approximating the high-dimensional (marginal) posteriors of the two spatially variable log-permeability fields that we infer in the present setting (measured in terms of the error in the mean and variance).

Moreover, optimal transport based SMC still underestimates variance (not shown), which could be improved by considering second-order consistent optimal transport resampling instead of first-order. However, second-order consistent optimal transport resampling does not necessarily provide non-negative transformations. Finally, optimal transport resampling does not need to be restricted to finite dimensions, at least theoretically [28], although finding such a minimizer computationally remains a challenge.

This work is part of the research programme Shell-NWO/FOM Computational Sciences for Energy Research (CSER) with project number 14CSER007 which is partly financed by the Netherlands Organization for Scientific Research (NWO).

Bibliography

[1] Kaipio J and Somersalo E 2005 Statistical and computational inverse problems (Springer Science+Business Media, Inc.)
[2] Cotter S, Roberts G, Stuart A and White D 2013 Statistical Science 28 424–446
[3] Iglesias M, Lin K and Stuart A 2014 Inverse Problems 30 114001
[4] Kantas N, Beskos A and Jasra A 2014 SIAM/ASA Journal on Uncertainty Quantification 2 464–489
[5] Bui-Thanh T and Girolami M 2014 Inverse Problems 30 114014
[6] Martin J, Wilcox L, Burstedde C and Ghattas O 2012 SIAM Journal on Scientific Computing 34 A1460–A1487
[7] Del Moral P, Doucet A and Jasra A 2006 Journal of the Royal Statistical Society: Series B (Statistical Methodology) 68 411–436
[8] Iglesias M, Park M and Tretyakov M 2018 Inverse Problems 34 105002