Convergence Analysis of Ensemble Kalman Inversion: The Linear, Noisy Case

Claudia Schillings; Andrew Stuart

arXiv:1702.07894·math.NA·August 9, 2017

TL;DR

This paper analyzes the convergence of ensemble Kalman inversion in linear, noisy settings, providing theoretical and numerical insights into its behavior and robustness.

Contribution

It extends existing analysis to noisy data, establishing well-posedness and convergence results for ensemble Kalman inversion in linear inverse problems.

Findings

01 Convergence is established for fixed ensemble size.

02 Noise impacts the convergence behavior.

03 Numerical experiments confirm theoretical predictions.

Abstract

We present an analysis of ensemble Kalman inversion, based on the continuous time limit of the algorithm. The analysis of the dynamical behaviour of the ensemble allows us to establish well-posedness and convergence results for a fixed ensemble size. We will build on the results presented in [26] and generalise them to the case of noisy observational data, in particular the influence of the noise on the convergence will be investigated, both theoretically and numerically. We focus on linear inverse problems where a very complete theoretical analysis is possible.

Equations

$$y=\mathcal{G}(u)+\eta$$

$$y=Au+\eta.$$

$$\Phi(u;y^{\dagger})=\frac{1}{2}\|y^{\dagger}-Au\|_{\Gamma}^{2},$$

$$u_{n+1}^{(j)}=u_{n}^{(j)}+C(u_{n})A^{*}\Bigl(AC(u_{n})A^{*}+\frac{1}{h}\Gamma\Bigr)^{-1}\bigl(y^{\dagger}-Au_{n}^{(j)}\bigr).$$

$$\bar{u}_{n}=\frac{1}{J}\sum_{j=1}^{J}u_{n}^{(j)},\quad C(u_{n})=\frac{1}{J}\sum_{j=1}^{J}\bigl(u^{(j)}_{n}-\bar{u}_{n}\bigr)\otimes\bigl(u^{(j)}_{n}-\bar{u}_{n}\bigr).$$

$$\frac{\mathrm{d}u^{(j)}}{\mathrm{d}t}=\frac{1}{J}\sum_{k=1}^{J}\bigl\langle A(u^{(k)}-\bar{u}),y^{\dagger}-Au^{(j)}\bigr\rangle_{\Gamma}\bigl(u^{(k)}-\bar{u}\bigr),\quad j=1,\dots,J.$$

$$\frac{\mathrm{d}u^{(j)}}{\mathrm{d}t}=-C(u)D_{u}\Phi(u^{(j)};y^{\dagger})$$

$$y^{\dagger}=Au^{\dagger}+\eta^{\dagger},$$

$$e^{(j)}=u^{(j)}-\bar{u},\quad r^{(j)}=u^{(j)}-u^{\dagger},\quad j=1,\dots,J$$

$$E_{lj}=\langle Ae^{(l)},Ae^{(j)}\rangle_{\Gamma},\quad R_{lj}=\langle Ar^{(l)},Ar^{(j)}\rangle_{\Gamma},\quad F_{lj}=\langle Ar^{(l)},Ae^{(j)}\rangle_{\Gamma},\quad l,j=1,\dots,J,$$

$$\vartheta^{(j)}=Ar^{(j)}-\eta^{\dagger},\quad j=1,\dots,J;$$

$$D_{lj}=\langle\vartheta^{(l)},Ae^{(j)}\rangle_{\Gamma},\quad l,j=1,\dots,J.$$

$$\frac{\mathrm{d}u^{(j)}}{\mathrm{d}t}$$

$$\frac{\mathrm{d}e^{(j)}}{\mathrm{d}t}=-\frac{1}{J}\sum_{k=1}^{J}E_{jk}e^{(k)}=-\frac{1}{J}\sum_{k=1}^{J}E_{jk}r^{(k)}.$$

$$\frac{\mathrm{d}}{\mathrm{d}t}E=-\frac{2}{J}E^{2}.$$

$$E(t)=X\Lambda(t)X^{\top}$$

$$\lambda^{(j)}(t)=\Bigl(\frac{2}{J}t+\frac{1}{\lambda_{0}^{(j)}}\Bigr)^{-1},$$

$$\frac{\mathrm{d}\vartheta^{(j)}}{\mathrm{d}t}=-\frac{1}{J}\sum_{k=1}^{J}D_{jk}Ae^{(k)}$$

$$\frac{\mathrm{d}}{\mathrm{d}t}D=-\frac{2}{J}DE.$$

$$\frac{1}{2}\frac{\mathrm{d}\|\vartheta^{(j)}\|_{\Gamma}^{2}}{\mathrm{d}t}=-\frac{1}{J}\sum_{k=1}^{J}D_{jk}D_{jk}.$$

$$D_{ij}^{2}=\langle\vartheta^{(i)},Ae^{(j)}\rangle_{\Gamma}^{2}\leq\|\vartheta^{(i)}\|_{\Gamma}^{2}\cdot\|Ae^{(j)}\|_{\Gamma}^{2}\leq C\|Ae^{(j)}\|_{\Gamma}^{2}$$

$$\frac{1}{2}\frac{\mathrm{d}}{\mathrm{d}t}\|Ar^{(j)}\|_{\Gamma}^{2}=-\frac{1}{J}\sum_{k=1}^{J}F_{jk}^{2}+\frac{1}{J}\sum_{k=1}^{J}\langle Ar^{(k)},Ae^{(k)}\rangle_{\Gamma}\langle\eta^{\dagger},Ae^{(k)}\rangle_{\Gamma}.$$

$$Ar^{(j)}(t)=\sum_{k=1}^{J}\alpha_{k}Ae^{(k)}(t)+Ar_{\perp}^{(1)}$$

$$\eta^{\dagger}=\sum_{k=1}^{J}\eta_{k}Ae^{(k)}(t)+A\eta_{\perp}^{(1)},$$

$$\frac{1}{2}\frac{\mathrm{d}}{\mathrm{d}t}\|Ar^{(j)}\|_{\Gamma}^{2}$$

$$\|A\bar{u}(t)-y^{\dagger}\|_{2}\leq\tau\,\mathrm{trace}(\Gamma)$$

$$\|A^{*}\Gamma^{-1}A\bar{u}(t)-A^{*}\Gamma^{-1/2}y^{\dagger}\|\leq\tau\,\mathrm{trace}(A^{*}\Gamma^{-1}A).$$

$$\|(\lambda I+A^{*}\Gamma^{-1}A)^{-1/2}(A\bar{u}(t)-y^{\dagger})\|\leq\tau\,\mathrm{trace}\bigl((\lambda I+A^{*}\Gamma^{-1}A)^{-1}A^{*}\Gamma^{-1}A\bigr),$$

$$-\frac{\mathrm{d}^{2}p}{\mathrm{d}x^{2}}+p=u\ \mbox{in}\ D:=(0,\pi),\quad p=0\ \mbox{in}\ \partial D.$$

$$y^{\dagger}$$

Taxonomy

Topics: Statistical and numerical algorithms · Image and Signal Denoising Methods · Numerical methods in inverse problems

Full text

Convergence Analysis of Ensemble Kalman Inversion:

The Linear, Noisy Case

C. Schillings (a, ∗) and A.M. Stuart (b)

(a) Institute for Mathematics, University of Mannheim, A5, 6, 68131 Mannheim, Germany; (b) Department of Computing and Mathematical Sciences, California Institute of Technology, CA 91125, USA

∗Corresponding author. Email: [email protected]

Classification codes: 65N21, 62F15, 65N75

Keywords: Bayesian Inverse Problems, Ensemble Kalman Filter, Parameter Identification

1 Introduction

The Kalman filter has been enormously successful since its introduction in the 1960s as a state estimation tool for linear Gaussian systems in both discrete and continuous time; see [21, sections 4.1 and 8.1] and the references therein. A natural generalisation to nonlinear state estimation is the extended Kalman filter [21, sections 4.2.2 and 8.2.2] and this was proposed as a method for numerical weather prediction in [11]. The ensemble Kalman filter [9] was introduced in state estimation problems as a way of circumventing the need to compute enormous covariance matrices when applying the extended Kalman filter to large problems such as those arising in atmosphere or ocean dynamics [8, 10, 13]. The inherent parallelisability of the method, together with its effectiveness in state estimation, has made it very popular and its use has spread outside the atmosphere-ocean sciences community. In particular it has been widely adopted by the oil industry for subsurface inversion [23]. Building on this applied work in subsurface inversion, in [15] a generic ensemble Kalman inversion tool for inverse problems in the form

$$y=\mathcal{G}(u)+\eta$$

was formulated; here the objective is to recover $u$ from $y$ , a noisy observation of ${\mathcal{G}}(u)$ and $\eta$ denotes the noise. Despite documented success as a solver for such inverse problems, there is very little analysis of the algorithm. Essentially two facts are known about the finite ensemble size regime in which it is used: that the basic form of the iteration preserves the linear span of the initial ensemble [22, 15]; and that for the linear noise free problem the method is a discretisation of a set of interacting gradient flows for the output least squares objective function associated with the linear inverse problem [26]. The combination of these two facts allows an almost complete analysis of the algorithm in the setting of the linear noise free inverse problem. The purpose of this paper is to extend those results to include the effect of noise.

It is of interest to give some insight into where the gradient flow structure comes from in this problem. Inspection of the Kalman-Bucy filter [21, section 8.1] reveals that when the drift of the signal is zero and the observed data is constant, then the equation for the mean is a gradient flow for the output least squares function related to the observation operator, preconditioned by the covariance. In [1] this observation was used to create algorithms for the analysis step in state estimation problems employing the Kalman filter, essentially by replacing the Kalman variance by the empirical covariance; a resulting gradient structure was noted and exploited. The Kalman-Bucy filter with no drift in the signal, and the analysis phase of the general filter with linear observations, are closely related to the solution of a linear inverse problem. As a consequence it is not unnatural that in [26] it was demonstrated that the continuous time limit of the ensemble Kalman inversion algorithm is an interacting set of gradient flows.

There are two ways of viewing algorithms for ensemble Kalman inversion. The first is simply as derivative free optimisers, in which the ensemble is used as a proxy for derivative information; this is the view put forward in [15]. The second is as a method to solve a Bayesian inverse problem. We adopt the first viewpoint throughout the paper, essentially because, as the literature survey in the next paragraph explains, there is little hope of rigorous uncertainty quantification via ensemble methods, except for linear problems. And, although our analysis is limited to the linear problem, our goal is to obtain insight into ensemble inversion methods in general.

The Bayesian approach to distributed parameter inversion allows incorporation of both model and data uncertainties and leads to a complete characterisation of the uncertainty via the posterior distribution; see [27, 4]. However, for computationally intensive applications, the computation or approximation of the posterior is, even with today’s supercomputers, often intractable. Thus ensemble inversion provides an attractive alternative which, through the ensemble, may include some information about uncertainties. The low computational costs, the straightforward implementation and its non-intrusive nature make the method appealing. In the state estimation context [25], well-posedness results for the EnKF can be found in [18, 29, 28, 19] and a large-time convergence analysis in the case of a fully observed system is presented in [5]; other interesting methods and analyses may be found in [1, 2, 24]. The analysis of the large ensemble size limit can be found in [20, 12]. For inverse problems, the large ensemble size limit is studied in [7] and, importantly for the optimisation perspective we take in this paper, demonstrated to differ from the true posterior distribution except in the linear case. In ensemble inversion, the connection to deterministic regularisation techniques and step-size strategies for nonlinear forward problems is developed in [15, 14, 16].

The linear inverse problem which we study in this paper is defined as follows: let $\mathcal{X}$ denote a separable Hilbert space. Furthermore, we denote by $A\in\mathcal{L}(\mathcal{X},\mathbb{R}^{K})$ the forward response operator mapping from the parameter space $\mathcal{X}$ to the data space $\mathbb{R}^{K}$. The observations are assumed to be finite-dimensional, i.e. the forward response operator maps to $\mathbb{R}^{K}$, where $K\in\mathbb{N}$ denotes the number of observations. The goal of computation is to recover the unknown parameters $u$ from noisy observations $y$, where

$$y=Au+\eta.$$

The noise $\eta$ in the observations is assumed to be normally distributed with $\eta\sim\mathcal{N}(0,\Gamma)$ , $\Gamma\in\mathbb{R}^{K\times K}$ symmetric positive definite. In the Bayesian setting, the unknown parameter $u$ is interpreted as a random variable or random field, distributed according to prior $\mu_{0}$ . The Bayesian solution to the inverse problem is the conditional distribution of $u$ given $y$ , and to define this it is necessary to make an assumption on the a priori dependence structure between $u$ and $\eta;$ it is often assumed that the noise $\eta$ is independent of $u$ . As mentioned above, in this paper we present an analysis of ensemble inversion viewed as a minimisation method applied to the least-squares functional

$$\Phi(u;y^{\dagger})=\frac{1}{2}\|y^{\dagger}-Au\|_{\Gamma}^{2},$$

where the norm $\|\cdot\|_{\Gamma}=\|\Gamma^{-1/2}\cdot\|_{2}$ corresponds to the Euclidean norm weighted by the square-root of the inverse noise covariance matrix. Accordingly, we define by $\langle\cdot,\cdot\rangle_{\Gamma}=\langle\Gamma^{-1/2}\cdot,\Gamma^{-1/2}\cdot\rangle$ the corresponding inner product. The realisation of the random variable $y$ , i.e. the observed data, is denoted by $y^{\dagger}$ . The prior $\mu_{0}$ plays a role in the optimisation perspective as the initial ensemble is typically drawn from $\mu_{0}$ .

In order to facilitate analysis we work with the continuous time limit of the ensemble inversion algorithm [26]. The classic implementation of ensemble Kalman inversion, in which the observed data $y^{\dagger}$ is perturbed by the addition of independent draws from the distribution of $\eta$, leads to a stochastic differential equation (SDE) limit; the simplification in which the observed data $y^{\dagger}$ is unperturbed leads to an ordinary differential equation (ODE) in the limit. We work with the ODE limit in this paper. What distinguishes our analysis from that appearing in [26] is that we study the case where the observed data $y^{\dagger}$ appearing in the ODE is assumed to contain noise, i.e. it is not simply the image of a truth $u^{\dagger}$ under $A$; we refer to this as the noisy, linear setting.

The paper is structured as follows. In Section 2, we introduce ensemble Kalman inversion and derive the continuous time limit of the algorithm. We study the properties of the method by analysing the dynamical behaviour of the ensemble and derive convergence results by considering the long-time behaviour. We present, in Section 3, well-posedness results, quantification of the ensemble collapse and convergence results for the noisy, linear setting. Numerical experiments illustrating the findings are presented in Section 4.

2 The Ensemble Kalman Inversion and its Continuous Time Limit

The ensemble inversion method that we study is given in [15]. By introducing an artificial time $h=1/N$ for a given integer $N$ , the method propagates an ensemble $\{u_{n}^{(j)}\}_{n=0}^{N}$ of $J$ particles, $J\in\mathbb{N}$ , at discrete time $nh$ into an ensemble at time $(n+1)h$ according to the formula

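$$u_{n+1}^{(j)}=u_{n}^{(j)}+C(u_{n})A^{*}\Bigl(AC(u_{n})A^{*}+\frac{1}{h}\Gamma\Bigr)^{-1}\bigl(y^{\dagger}-Au_{n}^{(j)}\bigr).$$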

Here

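$$\bar{u}_{n}=\frac{1}{J}\sum_{j=1}^{J}u_{n}^{(j)},\quad C(u_{n})=\frac{1}{J}\sum_{j=1}^{J}\bigl(u^{(j)}_{n}-\bar{u}_{n}\bigr)\otimes\bigl(u^{(j)}_{n}-\bar{u}_{n}\bigr).$$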

The analysis we present here relies on the continuous time limit of ensemble Kalman inversion. We therefore interpret the iterates $u_{n}^{(j)}$ as a discretisation of a continuous function $u^{(j)}(nh)$ . In this context the argument for the appearance of scaling $h^{-1}$ multiplying $\Gamma$ in the update formula is given in [15]. If we let $h\to 0$ and interpret the iterations as a timestepping scheme, then the continuous time limit is given by

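$$\frac{\mathrm{d}u^{(j)}}{\mathrm{d}t}=\frac{1}{J}\sum_{k=1}^{J}\bigl\langle A(u^{(k)}-\bar{u}),y^{\dagger}-Au^{(j)}\bigr\rangle_{\Gamma}\bigl(u^{(k)}-\bar{u}\bigr),\quad j=1,\dots,J,$$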

or equivalently

$$\frac{\mathrm{d}u^{(j)}}{\mathrm{d}t}=-C(u)D_{u}\Phi(u^{(j)};y^{\dagger})$$

with potential $\Phi(u;y^{\dagger})$ given by (2). Equation (4) reveals the well-known subspace property of ensemble Kalman inversion [15], since the vector field is in the linear span of the ensemble itself. We re-emphasize that the derivation is based on the simplified version of the classic ensemble Kalman inversion scheme in which perturbations of the observed data $y^{\dagger}$ are set to zero.
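As an illustration of the discrete update formula and of the subspace property noted above, the following is a minimal NumPy sketch of one iteration, assuming a finite-dimensional parameter space so that $A$ is a matrix and $A^{*}$ its transpose. The function name `eki_step`, the problem sizes and the random test data are hypothetical and are not taken from the paper.

```python
import numpy as np

def eki_step(U, A, Gamma, y, h):
    """One iteration for all particles; U has shape (d, J), one particle per column."""
    J = U.shape[1]
    dev = U - U.mean(axis=1, keepdims=True)   # u^(j) minus the ensemble mean
    C = dev @ dev.T / J                        # empirical covariance C(u_n)
    S = A @ C @ A.T + Gamma / h                # A C A^* + h^{-1} Gamma
    innovation = y[:, None] - A @ U            # y - A u^(j), one column per particle
    return U + C @ A.T @ np.linalg.solve(S, innovation)

# Hypothetical small test problem (sizes and data chosen only for illustration).
rng = np.random.default_rng(0)
d, K, J = 6, 4, 3
A = rng.standard_normal((K, d))
Gamma = 0.01 * np.eye(K)
y = A @ rng.standard_normal(d) + 0.1 * rng.standard_normal(K)
U0 = rng.standard_normal((d, J))
U1 = eki_step(U0, A, Gamma, y, h=0.1)

# Subspace property: every updated particle is a linear combination of the old ones,
# so the least-squares fit of U1 by the columns of U0 leaves (numerically) no residual.
coef, *_ = np.linalg.lstsq(U0, U1, rcond=None)
span_residual = np.linalg.norm(U0 @ coef - U1)
print(f"distance of updated ensemble from span of initial ensemble: {span_residual:.2e}")
```

The linear system is solved with `np.linalg.solve` rather than by forming the inverse explicitly; this is a design choice of the sketch, not something prescribed by the text.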

3 Convergence Analysis

This section is devoted to a generalisation of the results from [26] to allow for noise in the observational data; specifically we consider the case that the observational data $y^{\dagger}$ is polluted by additive noise $\eta^{\dagger}\in\mathbb{R}^{K}$ in the following way:

$$y^{\dagger}=Au^{\dagger}+\eta^{\dagger},$$

where $u^{\dagger}$ denotes the truth and $\eta^{\dagger}$ a realisation of noise. In subsection 3.1 we will demonstrate the undesirable effect of noise on the inversion methodology, and in subsection 3.2 we will suggest a stopping criterion to ameliorate the effect.

3.1 Analysis of Ensemble Kalman Inversion With Noisy Data

Following the notation introduced in [26], we introduce the quantities

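$$e^{(j)}=u^{(j)}-\bar{u},\qquad r^{(j)}=u^{(j)}-u^{\dagger},\qquad j=1,\dots,J,$$

$$E_{lj}=\langle Ae^{(l)},Ae^{(j)}\rangle_{\Gamma},\qquad R_{lj}=\langle Ar^{(l)},Ar^{(j)}\rangle_{\Gamma},\qquad F_{lj}=\langle Ar^{(l)},Ae^{(j)}\rangle_{\Gamma},\qquad l,j=1,\dots,J,$$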

and the misfit $\vartheta^{(j)}=Au^{(j)}-y^{\dagger}=A(u^{(j)}-u^{\dagger})-\eta^{\dagger},\ j=1,\ldots,J$. The quantity $e^{(j)}$ measures, for each particle $j$, the difference to the empirical mean (computed from the ensemble) and the quantity $r^{(j)}$ measures the difference from particle $j$ to the truth, i.e. the residuals. The matrix-valued quantities describe the interaction of these quantities mapped to the observation space. Note that the mapped residuals $Ar^{(j)}=A(u^{(j)}-u^{\dagger})$ are related to the misfit by

$$\vartheta^{(j)}=Ar^{(j)}-\eta^{\dagger},\qquad j=1,\dots,J;$$

from this it is apparent that the misfit is a finite dimensional quantity in $\mathbb{R}^{K}$. Furthermore, we define the matrix-valued quantity $D$ by

$$D_{lj}=\langle\vartheta^{(l)},Ae^{(j)}\rangle_{\Gamma},\qquad l,j=1,\dots,J.$$
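To make the definitions of $e^{(j)}$, $r^{(j)}$, $\vartheta^{(j)}$ and the matrices $E$, $R$, $F$, $D$ concrete, here is a small NumPy sketch for a finite-dimensional problem. It uses $\langle a,b\rangle_{\Gamma}=a^{\top}\Gamma^{-1}b$, which follows from the definition of the weighted inner product; the sizes, the random data and all variable names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
d, K, J = 5, 4, 3                               # hypothetical sizes
A = rng.standard_normal((K, d))
Gamma = np.diag(rng.uniform(0.5, 2.0, K))       # symmetric positive definite
Gi = np.linalg.inv(Gamma)                       # <a, b>_Gamma = a^T Gamma^{-1} b
u_true = rng.standard_normal(d)
eta = 0.1 * rng.standard_normal(K)              # one noise realisation
y = A @ u_true + eta                            # noisy data
U = rng.standard_normal((d, J))                 # ensemble, one particle per column

e = U - U.mean(axis=1, keepdims=True)           # deviations from the ensemble mean
r = U - u_true[:, None]                         # residuals with respect to the truth
theta = A @ U - y[:, None]                      # misfits

Ae, Ar = A @ e, A @ r
E = Ae.T @ Gi @ Ae                              # E[l, j] = <A e^(l), A e^(j)>_Gamma
R = Ar.T @ Gi @ Ar                              # R[l, j] = <A r^(l), A r^(j)>_Gamma
F = Ar.T @ Gi @ Ae                              # F[l, j] = <A r^(l), A e^(j)>_Gamma
D = theta.T @ Gi @ Ae                           # D[l, j] = <theta^(l), A e^(j)>_Gamma

# Row sums of D vanish because the deviations e^(j) sum to zero.
print("row sums of D:", D.sum(axis=1))
```

Since $\vartheta^{(l)}=Ar^{(l)}-\eta^{\dagger}$, the matrices are related by $D_{lj}=F_{lj}-\langle\eta^{\dagger},Ae^{(j)}\rangle_{\Gamma}$, which the sketch can be used to confirm numerically.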

Theorem 3.1.

Let $y^{\dagger}$ denote the perturbed image of a truth $u^{\dagger}\in\mathcal{X}:y^{\dagger}=Au^{\dagger}+\eta^{\dagger}$ for some $\eta^{\dagger}\in\mathbb{R}^{K}$. Furthermore, an initial ensemble $u^{(j)}(0)\in\mathcal{X}$ for $j=1,\dots,J$ is given, and we denote by $\mathcal{X}_{0}$ the linear span of the $\{u^{(j)}(0)\}_{j=1}^{J}.$ Then, equation (4) has a unique solution $u^{(j)}(\cdot)\in C([0,T);\mathcal{X}_{0})$ for $j=1,\dots,J.$

Proof.

The preservation of $\mathcal{X}_{0}$ by the ensemble Kalman iteration, and its continuous time limit, is not affected by the presence of noise in the data $y^{\dagger}$. Each particle $u^{(j)}$ satisfies

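$$\frac{\mathrm{d}u^{(j)}}{\mathrm{d}t}=-\frac{1}{J}\sum_{k=1}^{J}D_{jk}e^{(k)}=-\frac{1}{J}\sum_{k=1}^{J}D_{jk}u^{(k)}.$$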

We have used the fact that $\sum_{k=1}^{J}D_{jk}=0$, thus $\sum_{k=1}^{J}D_{jk}\bar{u}=0$. The preservation of $\mathcal{X}_{0}$ and the local Lipschitz continuity of the right-hand side of (10) ensure the local existence of a solution in $C([0,T);\mathcal{X}_{0})$ for $T>0$. To establish global existence of solutions, we now show the boundedness of the right-hand side of (10).

The following differential equation holds for the quantity $e^{(j)}$ :

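$$\frac{\mathrm{d}e^{(j)}}{\mathrm{d}t}=-\frac{1}{J}\sum_{k=1}^{J}E_{jk}e^{(k)}=-\frac{1}{J}\sum_{k=1}^{J}E_{jk}r^{(k)}.$$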

For the matrix-valued quantity $E$ , we obtain

$$\frac{\mathrm{d}}{\mathrm{d}t}E=-\frac{2}{J}E^{2}.$$

Thus, the dynamical behaviour of the quantities $e^{(j)}$ and $Ae^{(j)}$ is not influenced by the noise in the data. Therefore, the results presented in [26] for the noise free case still hold: for the orthogonal matrix $X$ defined through the eigendecomposition of $E(0)$ it follows that

$$E(t)=X\Lambda(t)X^{\top}$$

with $\Lambda(t)=\mbox{diag}\{\lambda^{(1)}(t),\ldots,\lambda^{(J)}(t)\}$ , $\Lambda(0)=\mbox{diag}\{\lambda_{0}^{(1)},\ldots,\lambda_{0}^{(J)}\}$ and

$$\lambda^{(j)}(t)=\Bigl(\frac{2}{J}t+\frac{1}{\lambda_{0}^{(j)}}\Bigr)^{-1},$$

if $\lambda_{0}^{(j)}\neq 0$ , otherwise $\lambda^{(j)}(t)=0$ . This proves that the matrix $E$ , and hence all its elements, are globally bounded in time.

The misfit $\vartheta^{(j)}$ satisfies

$$\frac{\mathrm{d}\vartheta^{(j)}}{\mathrm{d}t}=-\frac{1}{J}\sum_{k=1}^{J}D_{jk}Ae^{(k)}$$

and the dynamical behaviour of the corresponding matrix-valued quantity $D$ is given by

$$\frac{\mathrm{d}}{\mathrm{d}t}D=-\frac{2}{J}DE.$$

The boundedness of $D(t)$ follows from the boundedness of the misfit $\vartheta^{(j)}$ , which can be derived from

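$$\frac{1}{2}\frac{\mathrm{d}\|\vartheta^{(j)}\|_{\Gamma}^{2}}{\mathrm{d}t}=-\frac{1}{J}\sum_{k=1}^{J}D_{jk}^{2}.$$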

Hence, the misfit $\vartheta^{(j)}$ is bounded uniformly in time. By the Cauchy-Schwarz inequality, the bound on $D$ follows with

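$$D_{ij}^{2}=\langle\vartheta^{(i)},Ae^{(j)}\rangle_{\Gamma}^{2}\leq\|\vartheta^{(i)}\|_{\Gamma}^{2}\cdot\|Ae^{(j)}\|_{\Gamma}^{2}\leq C\|Ae^{(j)}\|_{\Gamma}^{2}$$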

for a constant $C>0$ independent of $T$ . This establishes that $D_{ij}\rightarrow 0$ at least as fast as $\frac{1}{\sqrt{t}}$ as $t\rightarrow\infty$ , in particular, $D$ is uniformly bounded in time. Note that the convergence rate follows from the convergence rate $1$ of the quantity $\|Ae^{(j)}\|_{\Gamma}^{2}$ established in (13). Global existence for $u^{(j)}$ (and $e^{(j)}$ , $r^{(j)}$ ) follows. ∎

The proof of Theorem 3.1 reveals that the behaviour of the quantity $e^{(j)}$ , which is an indicator of the ensemble collapse, is not affected by the noise. Hence, [26, Theorem 3] can be directly generalised to the perturbed case.

Corollary 3.2.

Let $y^{\dagger}$ denote the perturbed image of a truth $u^{\dagger}\in\mathcal{X}:y^{\dagger}=Au^{\dagger}+\eta^{\dagger}$ for some $\eta^{\dagger}\in\mathbb{R}^{K}$. Furthermore, assume that an initial ensemble $u^{(j)}(0)\in\mathcal{X}$ for $j=1,\dots,J$ is given. Then, the matrix valued quantity $E(t)$ converges to $0$ for $t\to\infty$ with an algebraic rate of convergence: $\|E(t)\|={\mathcal{O}}(Jt^{-1}).$
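The collapse result can be checked numerically. The sketch below integrates the continuous time limit with a classical fixed-step Runge-Kutta scheme (a choice made here for illustration, not the paper's numerical method), once with noise-free and once with noisy data, and compares $E(T)$ with the closed form $X\Lambda(T)X^{\top}$. It assumes $\Gamma=I$ and a small random finite-dimensional problem; all names and sizes are hypothetical.

```python
import numpy as np

def rhs(U, A, Gi, y):
    """Right-hand side of the continuous time limit for all particles (columns of U)."""
    J = U.shape[1]
    e = U - U.mean(axis=1, keepdims=True)
    theta = A @ U - y[:, None]
    D = theta.T @ Gi @ (A @ e)           # D[j, k] = <theta^(j), A e^(k)>_Gamma
    return -(e @ D.T) / J                # column j: -(1/J) sum_k D[j, k] e^(k)

def integrate(U, A, Gi, y, T, dt):
    """Classical fourth-order Runge-Kutta with fixed step dt."""
    for _ in range(int(round(T / dt))):
        k1 = rhs(U, A, Gi, y)
        k2 = rhs(U + 0.5 * dt * k1, A, Gi, y)
        k3 = rhs(U + 0.5 * dt * k2, A, Gi, y)
        k4 = rhs(U + dt * k3, A, Gi, y)
        U = U + dt * (k1 + 2 * k2 + 2 * k3 + k4) / 6
    return U

def E_matrix(U, A, Gi):
    Ae = A @ (U - U.mean(axis=1, keepdims=True))
    return Ae.T @ Gi @ Ae

rng = np.random.default_rng(2)
d, K, J = 6, 5, 4                        # hypothetical sizes
A = rng.standard_normal((K, d))
Gi = np.eye(K)                           # assumes Gamma = I
u_true = rng.standard_normal(d)
eta = 0.5 * rng.standard_normal(K)
U0 = rng.standard_normal((d, J))
T, dt = 2.0, 1e-3

# Closed-form solution of dE/dt = -(2/J) E^2 from the eigendecomposition of E(0).
lam0, X = np.linalg.eigh(E_matrix(U0, A, Gi))
lam_T = np.zeros_like(lam0)
pos = lam0 > 1e-10                       # zero eigenvalues stay zero
lam_T[pos] = 1.0 / (2.0 * T / J + 1.0 / lam0[pos])
E_closed = X @ np.diag(lam_T) @ X.T

# E(T) should agree for noise-free and noisy data, and match the closed form.
E_clean = E_matrix(integrate(U0, A, Gi, A @ u_true, T, dt), A, Gi)
E_noisy = E_matrix(integrate(U0, A, Gi, A @ u_true + eta, T, dt), A, Gi)
print("max |E_noisy - E_closed|:", np.abs(E_noisy - E_closed).max())
print("max |E_noisy - E_clean| :", np.abs(E_noisy - E_clean).max())
```

Because each eigenvalue satisfies $\lambda^{(j)}(t)\leq J/(2t)$, the spectral norm of $E(T)$ in this sketch is bounded by $J/(2T)$, consistent with the stated rate $\mathcal{O}(Jt^{-1})$.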

The ensemble collapse is a further form of regularisation as the solution not only remains in the linear span of the initial ensemble, but actually asymptotically lives in the span of a single element, provided that the forward response operator $A$ is one-to-one. The preceding result shows that the ensemble collapse, namely the fact that all particles converge to their common mean, does not depend on the realisation of the noise. We now discuss the convergence properties of ensemble Kalman inversion in the noisy case. The analysis presented in [26, Theorem 4] indicates that we can transfer the convergence result straightforwardly to the misfit $\vartheta^{(j)}$. However, the convergence of the residuals $r^{(j)}$ depends on the realisation of the noise.

Theorem 3.3.

Let $y^{\dagger}$ denote the noisy image of a truth $u^{\dagger}\in\mathcal{X}:y^{\dagger}=Au^{\dagger}+\eta^{\dagger}$ for some $\eta^{\dagger}\in\mathbb{R}^{K}$ . Assume further that the forward operator $A$ is one-to-one. Let $\mathcal{Y}^{\|}$ denote the linear span of the $\{{A}e^{(j)}(0)\}_{j=1}^{J}$ and let $\mathcal{Y}^{\perp}$ denote the orthogonal complement of $\mathcal{Y}^{\|}$ in $\mathbb{R}^{K}$ and assume that the initial ensemble members are chosen so that $\mathcal{Y}^{\|}$ has the maximal dimension $\min\{J-1,\dim(Y)\}.$ Then $\vartheta^{(j)}(t)$ may be decomposed uniquely as $\vartheta^{(j)}_{\|}(t)+\vartheta^{(j)}_{\perp}(t)$ with $\vartheta^{(j)}_{\|}\in\mathcal{Y}^{\|}$ and $\vartheta^{(j)}_{\perp}\in\mathcal{Y}^{\perp}$ , where ${\vartheta^{(j)}_{\|}}(t)\to 0$ as $t\to\infty$ and $\vartheta^{(j)}_{\perp}(t)=\vartheta^{(j)}_{\perp}(0)={\vartheta^{(1)}_{\perp}}.$

Furthermore, if $\langle\eta^{\dagger},Ae^{(k)}\rangle_{\Gamma}\leq\langle Ar^{(k)},Ae^{(k)}\rangle_{\Gamma}$, the mapped residual is monotonically decreasing. The rate of convergence of the component of the residual mapped forward to the observational space, which belongs to $\mathcal{Y}^{\|}$, can be arbitrarily slow, i.e. depending on the realisation of the noise, the rate of convergence can be arbitrarily close to $0$.

Proof.

The first part of the theorem follows with the same arguments as used for the proof of [26, Theorem 4]. For the second part we observe that the norm of the mapped residuals satisfies the following differential equation:

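$$\frac{1}{2}\frac{\mathrm{d}}{\mathrm{d}t}\|Ar^{(j)}\|_{\Gamma}^{2}=-\frac{1}{J}\sum_{k=1}^{J}F_{jk}^{2}+\frac{1}{J}\sum_{k=1}^{J}\langle Ar^{(k)},Ae^{(k)}\rangle_{\Gamma}\langle\eta^{\dagger},Ae^{(k)}\rangle_{\Gamma}.$$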

Provided that $\langle\eta^{\dagger},Ae^{(k)}\rangle_{\Gamma}\leq\langle Ar^{(k)},Ae^{(k)}\rangle_{\Gamma}$ for $k=1,\ldots,J$, i.e. $\|\eta^{\dagger}\|_{\Gamma}\cos(\theta_{1})\leq\|Ar^{(k)}\|_{\Gamma}\cos(\theta_{2})$ with $\theta_{1}$ and $\theta_{2}$ denoting the angle between $\eta^{\dagger}$ and $Ae^{(k)}$, and between $Ar^{(k)}$ and $Ae^{(k)}$, respectively, the residuals mapped to the image space of the forward operator are monotonically decreasing. Expanding the quantities $Ar^{(k)}$ and $\eta^{\dagger}$ in $\mathcal{Y}^{\|}$ and the orthogonal complement $\mathcal{Y}^{\perp}$

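$$Ar^{(j)}(t)=\sum_{k=1}^{J}\alpha_{k}Ae^{(k)}(t)+Ar_{\perp}^{(1)},\qquad\eta^{\dagger}=\sum_{k=1}^{J}\eta_{k}Ae^{(k)}(t)+A\eta_{\perp}^{(1)},$$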

cp. [26, Lemma 8] yields

[TABLE]

If the coefficients of the noise are of the size of $\alpha_{k}$ , the right hand side becomes zero and the claim follows. ∎
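The decomposition of the misfit in Theorem 3.3 can also be observed numerically. The following sketch assumes $\Gamma=I$ (so that the weighted inner product is the Euclidean one), uses a forward Euler discretisation of the continuous time limit chosen here only for brevity, and works with a small random problem in which $A$ has full column rank; sizes and names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(3)
d, K, J = 4, 6, 3                        # hypothetical sizes, d <= K so A is one-to-one
A = rng.standard_normal((K, d))
y = A @ rng.standard_normal(d) + 0.1 * rng.standard_normal(K)   # noisy data
U = rng.standard_normal((d, J))

def deviations(U):
    return U - U.mean(axis=1, keepdims=True)

# Orthogonal projector onto Y_par = span{A e^(j)(0)}, which has dimension J - 1 here.
left, sing, _ = np.linalg.svd(A @ deviations(U), full_matrices=False)
Q = left[:, sing > 1e-10 * sing[0]]
P_par = Q @ Q.T

theta0 = A @ U - y[:, None]
par0, perp0 = P_par @ theta0, theta0 - P_par @ theta0

dt, n_steps = 1e-3, 5000                 # forward Euler up to t = 5
for _ in range(n_steps):
    e = deviations(U)
    D = (A @ U - y[:, None]).T @ (A @ e)  # D[j, k] = <theta^(j), A e^(k)>
    U = U - dt * (e @ D.T) / J

thetaT = A @ U - y[:, None]
parT, perpT = P_par @ thetaT, thetaT - P_par @ thetaT
print("change in orthogonal part:", np.abs(perpT - perp0).max())
print("parallel norms at t=0:", np.linalg.norm(par0, axis=0))
print("parallel norms at t=5:", np.linalg.norm(parT, axis=0))
```

In this sketch the orthogonal component is the same for every particle and stays fixed in time, while the component in $\mathcal{Y}^{\|}$ shrinks; how fast it shrinks depends on the random data drawn.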

3.2 Stopping Criteria for Ensemble Kalman Inversion

The Bayesian derivation of the ensemble Kalman inversion algorithm given in [26] suggests an integration of the limiting equation (4) up to time $T=1$ . This can be interpreted as an a priori regularisation strategy motivated by the probabilistic viewpoint. However, this stopping rule does not take into account the actual realisation of the noise nor the additional regularisation effect due to the ensemble collapse. Indeed ensemble collapse is caused by removing random noisy perturbations within the algorithm, causing an underestimation of the variance for linear Gaussian problems, suggesting that stopping at time $T=1$ may no longer be the right choice as the Bayesian connection can no longer be justified. Our numerical experiments will indeed show that the Bayesian stopping strategy often leads to a stopping criterion for the unperturbed algorithm which is too early.

The papers [14, 16] suggest an approach to regularising discrete-time ensemble Kalman inversion methods, based on an analogy with deterministic iterative methods such as Levenberg-Marquardt. Unfortunately this methodology does not transfer directly to our continuous time setting as it corresponds to an adaptive time step, rather than the fixed time-step $h$ used in the derivation above. The proof of Theorem 3.3 suggests an a posteriori stopping criterion for the method. In the deterministic setting, Morozov’s discrepancy principle is a widely used and well understood stopping rule, see [6] and the references therein. The idea of this stopping rule is that, due to noisy data, the information in the observations cannot be distinguished from the noise for a mapped residual which is on the order of the noise level $\delta$ . This suggests that asking for a mapped residual with discrepancy smaller than $\delta$ may lead to fitting of the unknown parameters to the noise. We will numerically investigate the discrepancy principle as a suitable criterion in the presented setting. Furthermore, we note that if the noise is orthogonal to the space spanned by the linear ensemble, then Theorem 3.3 shows the convergence of the mapped residuals in the image space.

Motivated by the deterministic regularisation methods, the discrepancy principle is generalised to statistical noise; see [17] for example. The iterations of the iterative ensemble method will be stopped when

$$\|A\bar{u}(t)-y^{\dagger}\|_{2}\leq\tau\,\mathrm{trace}(\Gamma)$$

where $\tau>1$ is a given parameter and $\bar{u}(t)$ denotes the empirical mean of the ensemble at artificial time $t$. Here, the average noise level $\mathbb{E}(\|\eta\|_{2}^{2})={\rm trace}{(\Gamma)}$ is taken into account. (Since the noise in the observations is assumed to be normally distributed, realisations of the noise cannot be bounded from above and below.)

The discrepancy principle for statistical noise (18) does not generalise to the infinite or high-dimensional setting, as the residual is no longer a well-defined quantity. In [3], symmetrisation is suggested to overcome this problem leading to the stopping criterion

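$$\|A^{*}\Gamma^{-1}A\bar{u}(t)-A^{*}\Gamma^{-1/2}y^{\dagger}\|\leq\tau\,\mathrm{trace}(A^{*}\Gamma^{-1}A).$$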

In order to obtain optimal rates, the authors in [3] suggest modifying this discrepancy principle to

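$$\|(\lambda I+A^{*}\Gamma^{-1}A)^{-1/2}(A\bar{u}(t)-y^{\dagger})\|\leq\tau\,\mathrm{trace}\bigl((\lambda I+A^{*}\Gamma^{-1}A)^{-1}A^{*}\Gamma^{-1}A\bigr),$$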

where $\lambda>0$ is a given fixed parameter. The analysis presented in [3] proving optimality of the strategy is not directly applicable to the ensemble Kalman inversion methodology that we study here, due to the nonlinear nature of the ensemble algorithms. However, we will observe in the numerical experiments that the modified version of the discrepancy principle leads to satisfactory results.

We also remark that stopping strategies taking into account the Bayesian viewpoint on the inverse problem lead to appealing alternatives. Assuming a Gaussian prior distribution for example, the parametrised variance can be modelled as a hyperparameter, which can then be estimated from the data. Due to the regularisation effect of the ensemble, this can be viewed as an alternative stopping / regularisation strategy. Closely related is the idea of variance inflation, which can be interpreted in a similar way. The work presented here, however, is restricted to the deterministic setting, not taking into account the Bayesian viewpoint. The analysis of the stopping rules requires therefore a different setting, which is beyond the scope of the paper.

4 Numerical Experiments

The forward model is described by the one dimensional elliptic equation

$$-\frac{\mathrm{d}^{2}p}{\mathrm{d}x^{2}}+p=u\ \mbox{in}\ D:=(0,\pi),\qquad p=0\ \mbox{in}\ \partial D.$$

The solution operator of the model is a mapping $G:L^{2}(D)\to H^{2}(D)\cap H^{1}_{0}(D)$ taking $u$ into $p.$ The solution is observed at $K=2^{4}-1$ equispaced observation points at $x_{k}=\frac{k}{2^{4}},k=1,\ldots,2^{4}-1$ , which defines the observation operator ${\mathcal{O}}:H^{2}(D)\cap H^{1}_{0}\to\mathbb{R}^{K}$ , i.e. the operator $A$ is a mapping from $L^{2}(D)$ to $\mathbb{R}^{K}$ defined by the composition of the solution operator and the (pointwise) observation operator. We use a finite element method with continuous, piecewise linear ansatz functions on a uniform mesh with meshwidth $h=2^{-8}$ to solve the forward problem (the spatial discretisation leads to a discretisation of $u$ , i.e. $u\in\mathbb{R}^{2^{8}-1}$ ).
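For orientation, a discrete forward operator of this type can be sketched in a few lines. Note that the sketch swaps in a second-order finite difference discretisation in place of the finite element method used in the paper, places $2^{8}-1$ interior nodes on $(0,\pi)$, and observes at every sixteenth node so that there are $K=2^{4}-1$ equispaced observation points; the mesh width, the location of the observation points and all names are assumptions made for this illustration.

```python
import numpy as np

n = 2**8 - 1                              # number of interior grid nodes
mesh_h = np.pi / 2**8                     # assumed mesh width on (0, pi)
x = mesh_h * np.arange(1, n + 1)          # interior nodes

# Finite difference matrix for -p'' + p with homogeneous Dirichlet boundary values.
L = (np.diag(np.full(n, 2.0))
     - np.diag(np.ones(n - 1), 1)
     - np.diag(np.ones(n - 1), -1)) / mesh_h**2 + np.eye(n)
G = np.linalg.inv(L)                      # discrete solution operator u -> p

# Pointwise observation at K = 15 equispaced interior nodes (every 16th node).
obs_idx = 16 * np.arange(1, 2**4) - 1
A = G[obs_idx, :]                         # forward operator: observe the solution

# Sanity check: u = 2 sin(x) has the exact solution p = sin(x).
p_obs = A @ (2.0 * np.sin(x))
print("max observation error:", np.abs(p_obs - np.sin(x[obs_idx])).max())
```

Forming the dense inverse is acceptable at this size; for finer meshes one would solve the tridiagonal system instead.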

Then, the inverse problem consists of recovering the unknown data $u$ from noisy observations

[TABLE]

The measurement noise is chosen to be normally distributed, $\eta\sim\mathcal{N}(0,\gamma I)$ , $\gamma=0.01^{2}\in\mathbb{R},\ I\in\mathbb{R}^{K\times K}$ . Furthermore, the prior is $\mu_{0}=N(0,C_{0})$ with covariance operator $C_{0}=10(-\Delta)^{-1}$ . Here, we consider the Laplacian $\Delta$ with domain $H^{2}(D)\cap H^{1}_{0}(D)$ . The initial ensemble is based on the eigendecomposition of the covariance operator $C_{0}$ , i.e. $u^{(j)}(0)=\sqrt{\lambda_{j}}\zeta_{j}z_{j}$ with $\zeta_{j}\sim\mathcal{N}(0,1)$ for $j=1,\ldots,J$ and $\{\lambda_{j},z_{j}\}_{j\in\mathbb{N}}$ denoting (the explicitly known) eigenvalues and eigenfunctions of $C_{0}$ .
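The construction of the initial ensemble can be sketched as follows. The paper only states that the eigenpairs of $C_{0}$ are explicitly known; the sketch assumes the standard eigenpairs of the Dirichlet Laplacian on $(0,\pi)$, namely $\lambda_{j}=10/j^{2}$ and $z_{j}(x)=\sqrt{2/\pi}\,\sin(jx)$, evaluated on a uniform grid. The function name `kl_initial_ensemble` and the grid are hypothetical.

```python
import numpy as np

def kl_initial_ensemble(x, J, rng):
    """Columns are u^(j)(0) = sqrt(lambda_j) * zeta_j * z_j evaluated at the points x."""
    j = np.arange(1, J + 1)
    lam = 10.0 / j**2                                   # assumed eigenvalues of C_0
    Z = np.sqrt(2.0 / np.pi) * np.sin(np.outer(x, j))   # assumed eigenfunctions z_j(x)
    zeta = rng.standard_normal(J)                       # zeta_j ~ N(0, 1)
    return Z * (np.sqrt(lam) * zeta)                    # scale column j by sqrt(lam_j) zeta_j

n = 2**8 - 1
x = np.pi * np.arange(1, n + 1) / (n + 1)               # interior grid nodes on (0, pi)
rng = np.random.default_rng(0)
U0 = kl_initial_ensemble(x, J=5, rng=rng)
print(U0.shape)
```

Each particle is a single scaled eigenfunction, so the initial ensemble members are mutually orthogonal on this grid.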

To illustrate and numerically verify the results presented in this paper, we investigate the dynamical behaviour of the quantities $e,r$ and the misfit $\vartheta$ . The theoretical results presented hold true for any ensemble size; we consider in the following a rather small ensemble of size $J=5$ . For the sake of presentation, the empirical mean (and minimum and maximum deviations) of the ensemble is shown.

To investigate the convergence results further, we compare the performance of three ensembles, all of size $J=5$. The first one (shown in red) is based on the first five terms in the Karhunen-Loève (KL) expansion of the covariance operator $C_{0}$. The second one (shown in blue) is chosen such that the contribution of $Ar_{\perp}(t)$ in Theorem 3.3 is minimised, i.e. $Ar^{(1)}=\sum_{k=1}^{J}\alpha_{k}Ae^{(k)}$ for some coefficients $\alpha_{k}\in\mathbb{R}$; given $u^{(2)},\ldots,u^{(J)}$ and coefficients $\alpha_{1},\ldots,\alpha_{J}$, we define $u^{(1)}=(1-\alpha_{1}+\sum_{k=1}^{J}\alpha_{k}/J)^{-1}(u^{\dagger}-\alpha_{1}/J\sum_{j=2}^{J}\ u^{(j)}+\sum_{k=2}^{J}\alpha_{k}u^{(k)}-\alpha_{k}/J\sum_{j=2}^{J}u^{(j)})$. The third ensemble (shown in grey) is chosen such that the contribution of $\vartheta(t)_{\perp}$ in Theorem 3.3 is minimised, i.e. $\vartheta^{(1)}=\sum_{k=1}^{J}\alpha_{k}Ae^{(k)}$ for some coefficients $\alpha_{k}\in\mathbb{R}$; given $u^{(2)},\ldots,u^{(J)}$ and coefficients $\alpha_{1},\ldots,\alpha_{J}$, we define $u^{(1)}=(1-\alpha_{1}+\sum_{k=1}^{J}\alpha_{k}/J)^{-1}(\tilde{u}-\alpha_{1}/J\sum_{j=2}^{J}\ u^{(j)}+\sum_{k=2}^{J}\alpha_{k}u^{(k)}-\alpha_{k}/J\sum_{j=2}^{J}u^{(j)})$, where $\tilde{u}$ is the minimiser of the underdetermined least-squares problem.

In practice, the second strategy is not implementable, since the truth is used to construct the ensemble. However, the performance of the second strategy gives useful insight into the convergence behaviour of ensemble Kalman inversion.

The ensemble collapse is not affected by the choice of the initial ensemble. We observe the predicted algebraic rate of convergence to the empirical mean, cp. Figure 1.

The convergence behaviour of the mapped residuals and the misfit, each projected onto the subspace spanned by the initial ensemble and onto its complement, is shown in Figures 2 and 3.

The algebraic rate of the misfit is clearly confirmed. Furthermore, the convergence behaviour of the mapped residuals for the KL based ensemble (shown in red in Figure 3) illustrates the arbitrarily slow convergence predicted by the theory, i.e. we observe a convergence rate deteriorating to [math]. For the other two ensembles, we even observe an increase in the mapped residual, since the angle conditions are not satisfied. The comparison of the resulting estimates with the truth reveals the strong overfitting effect of the third ensemble, cp. Figure 4. This behaviour is expected due to the construction of the ensemble, which implies an amplification of the noise in the data.

To illustrate the effect of the angle condition and the resulting degradation of the convergence order of the mapped residuals, we repeat the experiments with noise in the data that is orthogonal to the subspace spanned by the initial ensemble. The theoretical results suggest an algebraic rate of convergence, which is confirmed by the results presented in Figure 5.
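One way to generate such noise is to project a random draw onto an orthogonal complement. The sketch below is only one possible reading: it assumes that orthogonality is understood in data space, with respect to the Euclidean inner product, against the image under $A$ of the linear span of the initial ensemble. Since $\Gamma=\gamma I$, the $\Gamma$-weighted inner product gives the same notion of orthogonality. The function name is hypothetical.

```python
import numpy as np

def noise_orthogonal_to_ensemble(eta, U0, A):
    """Remove from eta its component in span{A u^(j)(0), j = 1..J}.

    eta: (K,) noise draw; U0: (J, d) initial ensemble; A: (K, d) forward matrix.
    Assumes K > J, otherwise the result may be (numerically) zero.
    """
    # Columns of Q form an orthonormal basis containing the span of A u^(j)(0).
    Q, _ = np.linalg.qr((U0 @ A.T).T)   # Q has shape (K, J)
    return eta - Q @ (Q.T @ eta)
```

The projected noise has a smaller norm than the original draw, so it may need rescaling if a prescribed noise level is required.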

The result on the ensemble collapse, Corollary 3.2, indicates that the regularisation effect of the method strongly depends on the number of particles in the ensemble. The Bayesian stopping rule, which can be interpreted as an a priori stopping rule, does not reflect this behaviour. The results of the Bayesian stopping rule are summarised in Figures 6-9.

We will show in the following that the discrepancy principle leads to a suitable stopping strategy; in particular, it has the potential to substantially improve the accuracy of the ensemble Kalman inversion estimate. To do so, we repeat the experiments with $10$ randomly chosen ensembles (based on the KL expansion of the prior covariance operator) of size $J=5$ and $J=50$. The noise in the data is randomly chosen from $\mathcal{N}(0,\gamma I)$ with $\gamma=0.01^{2}\in\mathbb{R}$. Motivated by the previous discussion on the discrepancy principle, we implement a stopping rule of the form $\|A\bar{u}(t)-y^{\dagger}\|_{2}\leq 1.2\sqrt{K}\gamma$, where $K$ denotes the number of observations. Figures 10-13 show the comparison of the estimates based on the discrepancy principle with the truth.
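A stopping rule of this type can be sketched on top of the discrete iteration $u_{n+1}^{(j)}=u_{n}^{(j)}+C(u_{n})A^{*}(AC(u_{n})A^{*}+\frac{1}{h}\Gamma)^{-1}(y^{\dagger}-Au_{n}^{(j)})$. The code below is a minimal illustration, not the implementation used for the experiments: the function names, the step size `h`, the cap `max_steps` and the use of the discrete iteration in place of the continuous time limit are all assumptions. The tolerance is passed in by the caller.

```python
import numpy as np

def eki_step(U, A, y, Gamma, h):
    """One discrete EKI update for an ensemble U of shape (J, d)."""
    E = U - U.mean(axis=0)
    C = E.T @ E / U.shape[0]             # empirical covariance, 1/J scaling
    S = A @ C @ A.T + Gamma / h          # A C A^* + Gamma / h, shape (K, K)
    innovations = y - U @ A.T            # y - A u^(j), one row per member
    # Row j of the increment is (C A^T S^{-1} (y - A u^(j)))^T.
    return U + np.linalg.solve(S, innovations.T).T @ A @ C

def run_with_discrepancy_stop(U0, A, y, Gamma, h, tol, max_steps=10_000):
    """Iterate until ||A u_bar - y||_2 <= tol or max_steps is reached.

    Returns (ensemble, number of steps taken, whether the rule fired).
    """
    U = np.array(U0, dtype=float)
    for n in range(max_steps):
        if np.linalg.norm(A @ U.mean(axis=0) - y) <= tol:
            return U, n, True
        U = eki_step(U, A, y, Gamma, h)
    stopped = np.linalg.norm(A @ U.mean(axis=0) - y) <= tol
    return U, max_steps, bool(stopped)
```

For the setting described above, one would pass `Gamma = gamma * np.eye(K)` and `tol = 1.2 * np.sqrt(K) * gamma`. The number of steps at which the rule fires then plays the role of the stopping time, up to the factor `h`.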

As the deterministic discrepancy principle is not well-defined in the high- and infinite-dimensional setting, the experiments are repeated with the modified, symmetrised discrepancy principle. Results are presented in Figures 14-17.

We observe that the overfitting effect is much more pronounced for the larger ensemble of size $50$, cp. the empirical residuals in Figure 11 and Figure 13. The first $50$ terms of the KL expansion include more fine-scale (oscillatory) details, which can be fitted to the noise in the observational data and therefore cause the overfitting effect. The smaller ensemble based on the first $5$ terms of the KL expansion avoids the overfitting effect for two reasons: the smaller ensemble size leads to a faster ensemble collapse, and the first KL terms are smooth, a smoothness that the subspace property preserves. Furthermore, we note that in all experiments the discrepancy principle leads to a stopping time larger than $1$ (the Bayesian stopping rule), and in all experiments this leads to a further improvement in the estimate. Due to the delayed ensemble collapse, the stopping times for the larger ensemble are on average greater than the ones for the smaller ensemble. The experiments suggest that an a posteriori stopping rule can significantly improve the performance of the ensemble Kalman inversion. This observation is consistent with previous works on stopping rules for ensemble Kalman inversion, cp. [14].

5 Conclusions

The presented analysis of the ensemble Kalman filter for inverse problems shows that the well-posedness results and the quantification of the ensemble collapse derived in [26] can be straightforwardly generalised to the noisy case. However, the convergence behaviour of the ensemble is strongly affected by the noise in the observational data and no convergence rate of the mapped residuals can be proven: the convergence rate can be arbitrarily slow. The numerical experiments confirm the theory. In addition, the numerical experiments demonstrate the importance of an appropriate stopping rule in the presence of noise in order to avoid the well-known overfitting effect. It is also shown that the ensemble itself has a regularisation effect, caused by the ensemble collapse as well as by the chosen initialisation of the ensemble in terms of the KL expansion. Variants of the methods such as variance inflation or localisation may delay or prevent the ensemble collapse, thus they strongly influence the regularisation of the method. Stopping rules need to take this into account in order to avoid overfitting; however, the optimal strategy to use may strongly depend on the variant of the algorithm which is used. Even though the presented results are confined to the linear case, they provide useful insights into the performance of the filter in the presence of noise and can also enhance our understanding of the nonlinear case.

Acknowledgments Both authors are grateful for the careful reading and suggestions of two referees. They are also grateful to the EPSRC Programme Grant EQUIP for funding of this research. AMS also thanks DARPA (W911NF-15-2-0121) and ONR (N00014-17-1-2079) for funding parts of this research.

Bibliography

[1] K. Bergemann and S. Reich, A localization technique for ensemble Kalman filters, Quarterly Journal of the Royal Meteorological Society, 136 (2010), pp. 701–707.
[2] K. Bergemann and S. Reich, A mollified ensemble Kalman filter, Quarterly Journal of the Royal Meteorological Society, 136 (2010), pp. 1636–1643.
[3] G. Blanchard and P. Mathé, Discrepancy principle for statistical inverse problems with application to conjugate gradient iteration, Inverse Problems, 28 (2012), p. 115011.
[4] M. Dashti and A. M. Stuart, The Bayesian approach to inverse problems, in Handbook of Uncertainty Quantification, R. Ghanem, D. Higdon, and H. Owhadi, eds., Springer, 2015.
[5] J. de Wiljes, S. Reich, and W. Stannat, Long-time stability and accuracy of the ensemble Kalman-Bucy filter for fully observed processes and small measurement noise, ArXiv e-prints, (2016).
[6] H. Engl, M. Hanke, and A. Neubauer, Regularization of Inverse Problems, vol. 375, Springer Science & Business Media, 1996.
[7] O. Ernst, B. Sprungk, and H. Starkloff, Analysis of the ensemble and polynomial chaos Kalman filters in Bayesian inverse problems, arXiv preprint arXiv:1504.03529, (2015).
[8] G. Evensen, The ensemble Kalman filter: Theoretical formulation and practical implementation, Ocean Dynamics, 53 (2003), pp. 343–367.
