Diffusion map-based algorithm for Gain function approximation in the Feedback Particle Filter

Amirhossein Taghvaei; Prashant G. Mehta; Sean P. Meyn

arXiv:1902.07263·math.OC·October 1, 2019

TL;DR

This paper presents a rigorous error analysis of a diffusion map-based algorithm for approximating the gain function in the Feedback Particle Filter, addressing bias and variance components with numerical validation.

Contribution

The paper provides the first rigorous error bounds for the diffusion map-based gain function approximation in FPF, including bias and variance analysis.

Findings

01 Bias and variance bounds derived for the algorithm

02 Numerical experiments illustrate effects of dimension and sample size

03 Algorithm applied successfully to filtering examples and compared with SIR filter

Abstract

Feedback particle filter (FPF) is a numerical algorithm to approximate the solution of the nonlinear filtering problem in continuous-time settings. In any numerical implementation of the FPF algorithm, the main challenge is to numerically approximate the so-called gain function. A numerical algorithm for gain function approximation is the subject of this paper. The exact gain function is the solution of a Poisson equation involving a probability-weighted Laplacian $Δ_{ρ}$. The numerical problem is to approximate this solution using only finitely many particles sampled from the probability distribution $ρ$. A diffusion map-based algorithm was proposed by the authors in a prior work to solve this problem. The algorithm is named as such because it involves, as an intermediate step, a diffusion map approximation of the exact semigroup $e^{Δ_{ρ}}$. The original…

Equations

State process: $\mathrm{d}X_{t}$

Observation process: $\mathrm{d}Z_{t}$

$$\underbrace{{\sf E}[f(X_{t})\mid\mathcal{Z}_{t}]\overset{\text{Step 1}}{=}{\sf E}[f(\bar{X}_{t})\mid\mathcal{Z}_{t}]}_{\text{exactness condition}}\overset{\text{Step 2}}{\approx}\frac{1}{N}\sum_{i=1}^{N}f(X_{t}^{i}).$$

$$\mathrm{d}\bar{X}_{t}=\underbrace{a(\bar{X}_{t})\,\mathrm{d}t+\mathrm{d}\bar{B}_{t}}_{\text{propagation}}+\underbrace{{\sf K}_{t}(\bar{X}_{t})\circ\Big(\mathrm{d}Z_{t}-\frac{h(\bar{X}_{t})+\hat{h}_{t}}{2}\,\mathrm{d}t\Big)}_{\text{feedback control law}},\quad\bar{X}_{0}\sim p_{0},$$

$$\text{Poisson equation:}\quad\frac{1}{p_{t}(x)}\nabla\cdot(p_{t}(x)\nabla\phi_{t}(x))=-(h(x)-\hat{h}_{t}),\quad\forall\,x\in\mathbb{R}^{d},$$

$$\mathrm{d}X_{t}^{i}=a(X_{t}^{i})\,\mathrm{d}t+\mathrm{d}B_{t}^{i}+{\sf K}_{t}^{(N)}(X_{t}^{i})\circ\Big(\mathrm{d}Z_{t}-\frac{h(X_{t}^{i})+\hat{h}_{t}^{(N)}}{2}\,\mathrm{d}t\Big),\quad X_{0}^{i}\overset{\text{i.i.d.}}{\sim}p_{0},$$

$$\text{Gain function approximation:}\quad{\sf K}_{t}^{(N)}:=\text{Algorithm}(\{X_{t}^{i}\}_{i=1}^{N};h).$$

$${\sf K}_{t}(x)\equiv\bar{\Sigma}_{t}H^{\top},\quad\forall\,x\in\mathbb{R}^{d}.$$

$$\mathrm{d}\bar{X}_{t}=A\bar{X}_{t}\,\mathrm{d}t+\mathrm{d}\bar{B}_{t}+\bar{\Sigma}_{t}H^{\top}\Big(\mathrm{d}Z_{t}-\frac{H\bar{X}_{t}+H\bar{m}_{t}}{2}\,\mathrm{d}t\Big),\quad\bar{X}_{0}\sim p_{0}.$$

$$\mathrm{d}X_{t}^{i}=AX_{t}^{i}\,\mathrm{d}t+\mathrm{d}B_{t}^{i}+{\sf K}_{t}^{(N)}\Big(\mathrm{d}Z_{t}-\frac{HX_{t}^{i}+Hm_{t}^{(N)}}{2}\,\mathrm{d}t\Big),\quad X_{0}^{i}\overset{\text{i.i.d.}}{\sim}p_{0},$$

$m_{t}^{(N)}$

$$\text{Const. gain approx:}\quad{\sf E}[{\sf K}_{t}(X_{t})\mid\mathcal{Z}_{t}]\approx\frac{1}{N}\sum_{i=1}^{N}(h(X_{t}^{i})-\hat{h}_{t}^{(N)})X_{t}^{i}.$$

$$\nabla\cdot(p_{t}(x){\sf K}(x))=\text{(rhs)},$$

$$-\Delta_{\rho}\phi=h-\hat{h}_{\rho},$$

$$-\Delta_{\rho}f=\sum_{m=1}^{\infty}\lambda_{m}\langle e_{m},f\rangle e_{m}.$$

$$\int_{\mathbb{R}^{d}}(f-\hat{f}_{\rho})^{2}\rho\,\mathrm{d}x\leq\frac{1}{\lambda_{1}}\int_{\mathbb{R}^{d}}|\nabla f|^{2}\rho\,\mathrm{d}x,\quad\forall\,f\in H^{1}(\rho),$$

$$\int\nabla\phi(x)\cdot\nabla\psi(x)\rho(x)\,\mathrm{d}x=\int(h(x)-\hat{h}_{\rho})\psi(x)\rho(x)\,\mathrm{d}x\quad\forall\,\psi\in H^{1}(\rho).$$

$$\int|\nabla\phi(x)|^{2}\rho(x)\,\mathrm{d}x\leq\frac{1}{\lambda_{1}}\int(h(x)-\hat{h}_{\rho})^{2}\rho(x)\,\mathrm{d}x.$$

$$\int\frac{\partial\phi}{\partial x_{m}}(x)\rho(x)\,\mathrm{d}x=\int(h(x)-\hat{h}_{\rho})\,x_{m}\,\rho(x)\,\mathrm{d}x,\quad\text{for }m=1,\ldots,d,$$

$$\mathrm{d}S_{t}=-\nabla V(S_{t})\,\mathrm{d}t+\sqrt{2}\,\mathrm{d}B_{t},$$

$$P_{t}f(x)={\sf E}[f(S_{t})\mid S_{0}=x].$$

$$P_{t}f(x)=\sum_{m=1}^{\infty}e^{-t\lambda_{m}}\langle e_{m},f\rangle e_{m}(x)=\int_{\mathbb{R}^{d}}\bar{k}_{t}(x,y)f(y)\rho(y)\,\mathrm{d}y,$$

$$\frac{\partial u}{\partial t}=\Delta_{\rho}u+(h-\hat{h}_{\rho}),\quad u(0,x)=f(x).$$

$$u(t,x)=P_{t}f(x)+\int_{0}^{t}P_{t-s}(h-\hat{h}_{\rho})(x)\,\mathrm{d}s.$$

$$\text{(exact fixed-point equation)}\quad\phi=P_{\epsilon}\phi+\int_{0}^{\epsilon}P_{s}(h-\hat{h}_{\rho})\,\mathrm{d}s.$$

$$T_{\epsilon}f(x):=\frac{1}{n_{\epsilon}(x)}\int_{\mathbb{R}^{d}}k_{\epsilon}(x,y)f(y)\rho(y)\,\mathrm{d}y,$$

$$k_{\epsilon}(x,y):=\frac{g_{\epsilon}(x,y)}{\sqrt{\int g_{\epsilon}(x,z)\rho(z)\,\mathrm{d}z}\,\sqrt{\int g_{\epsilon}(y,z)\rho(z)\,\mathrm{d}z}},$$

$$T_{\epsilon}^{(N)}f(x):=\frac{1}{n_{\epsilon}^{(N)}(x)}\sum_{j=1}^{N}k_{\epsilon}^{(N)}(x,X^{j})f(X^{j}),$$

$$k_{\epsilon}^{(N)}(x,y):=\frac{g_{\epsilon}(x,y)}{\sqrt{\sum_{j=1}^{N}g_{\epsilon}(x,X^{j})}\,\sqrt{\sum_{j=1}^{N}g_{\epsilon}(y,X^{j})}}.$$

$$T_{ij}=\frac{1}{n_{\epsilon}^{(N)}(X^{i})}k_{\epsilon}^{(N)}(X^{i},X^{j}).$$

Full text

Diffusion map-based algorithm for Gain function approximation in the Feedback Particle Filter

Funding: Financial support from the NSF CMMI grants 1334987 and 1462773 is gratefully acknowledged.

Amirhossein Taghvaei and Prashant G. Mehta: Department of Mechanical Science and Engineering, University of Illinois at Urbana-Champaign, Urbana, IL.

Sean P. Meyn: Department of Electrical and Computer Engineering, University of Florida, Gainesville, FL.

Abstract

Feedback particle filter (FPF) is a numerical algorithm to approximate the solution of the nonlinear filtering problem in continuous-time settings. In any numerical implementation of the FPF algorithm, the main challenge is to numerically approximate the so-called gain function. A numerical algorithm for gain function approximation is the subject of this paper. The exact gain function is the solution of a Poisson equation involving a probability-weighted Laplacian $\Delta_{\rho}$ . The numerical problem is to approximate this solution using only finitely many particles sampled from the probability distribution $\rho$ . A diffusion map-based algorithm was proposed by the authors in a prior work [60, 62] to solve this problem. The algorithm is named as such because it involves, as an intermediate step, a diffusion map approximation of the exact semigroup $e^{\Delta_{\rho}}$ . The original contribution of this paper is to carry out a rigorous error analysis of the diffusion map-based algorithm. The error is shown to include two components: bias and variance. The bias results from the diffusion map approximation of the exact semigroup. The variance arises because of finite sample size. Scalings and upper bounds are derived for bias and variance. These bounds are then illustrated with numerical experiments that serve to emphasize the effects of problem dimension and sample size. The proposed algorithm is applied to two filtering examples and comparisons provided with the sequential importance resampling (SIR) particle filter.

Keywords: Stochastic Processes, Nonlinear filtering, Poisson equation

AMS subject classifications: 93E11, 65N75, 65N15

1 Introduction

This paper is concerned with the numerical solution of a certain linear partial differential equation (PDE) that arises in the nonlinear filtering problem in continuous-time settings.

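Nonlinear filtering problem: The standard model of the nonlinear filtering problem is given by the following stochastic differential equations (SDEs) [67]:

$$\text{State process:}\quad\mathrm{d}X_{t}=a(X_{t})\,\mathrm{d}t+\mathrm{d}B_{t},\quad X_{0}\sim p_{0},\tag{1a}$$

$$\text{Observation process:}\quad\mathrm{d}Z_{t}=h(X_{t})\,\mathrm{d}t+\mathrm{d}W_{t},\tag{1b}$$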

where $X_{t}\in\mathbb{R}^{d}$ is the (hidden) state at time $t$ , $Z_{t}\in\mathbb{R}$ is the observation, and $B_{t}$ , $W_{t}$ are two mutually independent standard Wiener processes taking values in $\mathbb{R}^{d}$ and $\mathbb{R}$ , respectively. The mappings $a(\cdot):\mathbb{R}^{d}\rightarrow\mathbb{R}^{d}$ and $h(\cdot):\mathbb{R}^{d}\rightarrow\mathbb{R}$ are known $C^{1}$ functions, and $p_{0}$ is the density of the prior probability distribution.

The objective of the filtering problem is to compute the posterior distribution of the state $X_{t}$ given the time history of observations (filtration) ${\cal Z}_{t}:=\sigma(Z_{s}:0\leq s\leq t)$ .
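To make the model concrete, the following is a minimal sketch of how sample paths of the state and observation processes could be generated. It assumes the model has the form $\mathrm{d}X_{t}=a(X_{t})\,\mathrm{d}t+\mathrm{d}B_{t}$, $\mathrm{d}Z_{t}=h(X_{t})\,\mathrm{d}t+\mathrm{d}W_{t}$ and uses an Euler-Maruyama discretization. The function name `simulate_model`, the step size `dt`, and the example drift and observation functions are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def simulate_model(a, h, x0, T=1.0, dt=1e-3, rng=None):
    """Euler-Maruyama sketch of dX = a(X) dt + dB, dZ = h(X) dt + dW."""
    rng = np.random.default_rng() if rng is None else rng
    n_steps = int(round(T / dt))
    d = x0.shape[0]
    X = np.empty((n_steps + 1, d))   # hidden state path
    Z = np.empty(n_steps + 1)        # scalar observation path
    X[0], Z[0] = x0, 0.0
    for k in range(n_steps):
        dB = np.sqrt(dt) * rng.standard_normal(d)   # process noise increment
        dW = np.sqrt(dt) * rng.standard_normal()    # observation noise increment
        Z[k + 1] = Z[k] + h(X[k]) * dt + dW
        X[k + 1] = X[k] + a(X[k]) * dt + dB
    return X, Z

# Hypothetical example: linear drift, observation of the first coordinate.
a = lambda x: -x
h = lambda x: float(x[0])
X, Z = simulate_model(a, h, x0=np.array([1.0, 0.0]), T=1.0, dt=1e-2,
                      rng=np.random.default_rng(0))
```

The filter only has access to the path `Z`; the path `X` plays the role of the hidden state whose posterior distribution is to be approximated.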

The problem is linear Gaussian if $a(\cdot)$ and $h(\cdot)$ are linear functions and $p_{0}$ is a Gaussian density. We use $A$ and $H$ to denote the matrices that define these linear functions, i.e., $a(x)=Ax$ and $h(x)=Hx$. The background on the linear Gaussian problem, along with its solution given by the Kalman-Bucy filter [35], appears in [40].

Feedback particle filter (FPF) is a numerical algorithm to approximate the posterior distribution in nonlinear non-Gaussian settings [69, 68]. The FPF algorithm is an alternative to the sequential importance resampling (SIR) particle filters [30, 25, 3, 22]. The distinguishing feature of the FPF is that the importance sampling step is replaced with feedback control. Steps such as resampling, reproduction, death or birth of particles are altogether avoided. The particles in FPF have uniform importance weights by construction. Therefore, the FPF does not suffer from the particle degeneracy issue that is commonly observed in implementations of the SIR particle filters [25]. In independent numerical evaluations and comparisons, it has been observed that FPF exhibits smaller simulation variance and better scaling properties with the problem dimension [9, 55, 57].

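The construction of FPF is based on the following two steps:

(i) Construct a stochastic process, denoted by $\bar{X}_{t}\in\mathbb{R}^{d}$, whose conditional distribution (given ${\cal Z}_{t}$) is equal to the conditional distribution of $X_{t}$;

(ii) Simulate $N$ stochastic processes, denoted by $\{X^{i}_{t}\}_{i=1}^{N}$, to empirically approximate the distribution of $\bar{X}_{t}$.

$$\underbrace{{\sf E}[f(X_{t})\mid\mathcal{Z}_{t}]\overset{\text{Step 1}}{=}{\sf E}[f(\bar{X}_{t})\mid\mathcal{Z}_{t}]}_{\text{exactness condition}}\overset{\text{Step 2}}{\approx}\frac{1}{N}\sum_{i=1}^{N}f(X_{t}^{i}).$$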

The process $\bar{X}_{t}$ is referred to as the mean-field process and the $N$ processes $\{X^{i}_{t}\}_{i=1}^{N}$ are referred to as particles. The construction ensures that the filter is exact in the mean-field ($N=\infty$) limit.

The details of the two steps are as follows:

Mean-field process: In the FPF, the mean-field process $\bar{X}_{t}$ evolves according to the SDE given by

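$$\mathrm{d}\bar{X}_{t}=\underbrace{a(\bar{X}_{t})\,\mathrm{d}t+\mathrm{d}\bar{B}_{t}}_{\text{propagation}}+\underbrace{{\sf K}_{t}(\bar{X}_{t})\circ\Big(\mathrm{d}Z_{t}-\frac{h(\bar{X}_{t})+\hat{h}_{t}}{2}\,\mathrm{d}t\Big)}_{\text{feedback control law}},\quad\bar{X}_{0}\sim p_{0},\tag{2}$$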

where $\bar{B}_{t}$ is a standard Wiener process independent of $\bar{X}_{0}$ and $\hat{h}_{t}:={\sf E}[h(\bar{X}_{t})|\mathcal{Z}_{t}]$. The $\circ$ indicates that the SDE is expressed in its Stratonovich form. The gain function is ${\sf K}_{t}(x):=\nabla\phi_{t}(x)$ where $\phi_{t}$ is the solution of the Poisson equation:

$$\text{Poisson equation:}\quad\frac{1}{p_{t}(x)}\nabla\cdot(p_{t}(x)\nabla\phi_{t}(x))=-(h(x)-\hat{h}_{t}),\quad\forall\,x\in\mathbb{R}^{d},\tag{3}$$

where $\nabla$ and $\nabla\cdot$ denote the gradient and the divergence operators, respectively, and $p_{t}$ denotes the conditional density of $\bar{X}_{t}$ given $\mathcal{Z}_{t}$ . The operator on the left-hand side of the Poisson equation (3) is referred to as the probability-weighted Laplacian. It is denoted as $\Delta_{\rho}$ where the probability density $\rho$ is the conditional density $p_{t}$ .

Particles: The particles $\{X^{i}_{t}\}_{i=1}^{N}$ evolve according to:

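$$\mathrm{d}X_{t}^{i}=a(X_{t}^{i})\,\mathrm{d}t+\mathrm{d}B_{t}^{i}+{\sf K}_{t}^{(N)}(X_{t}^{i})\circ\Big(\mathrm{d}Z_{t}-\frac{h(X_{t}^{i})+\hat{h}_{t}^{(N)}}{2}\,\mathrm{d}t\Big),\quad X_{0}^{i}\overset{\text{i.i.d.}}{\sim}p_{0},\tag{4}$$

for $i=1,\ldots,N$, where $\{B^{i}_{t}\}_{i=1}^{N}$ are mutually independent Wiener processes, $\hat{h}^{(N)}_{t}:=\frac{1}{N}\sum_{i=1}^{N}h(X^{i}_{t})$, and ${\sf K}^{(N)}_{t}$ is the output of an algorithm that approximates the solution to the Poisson equation Eq. 3:

$$\text{Gain function approximation:}\quad{\sf K}_{t}^{(N)}:=\text{Algorithm}(\{X_{t}^{i}\}_{i=1}^{N};h).\tag{5}$$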

The notation is suggestive of the fact that the algorithm is adapted to the ensemble $\{X^{i}_{t}\}_{i=1}^{N}$ and the function $h$; the density $p_{t}(x)$ is not known in an explicit manner.

Development and error analysis of one such gain function approximation algorithm is the subject of the present paper. Before describing the general case, it is useful to review the filter for the linear Gaussian case where the solution of the Poisson equation is explicitly known.

FPF for Linear Gaussian setting: Suppose $h(x)=Hx$ and $p_{t}$ is a Gaussian density with mean $\bar{m}_{t}$ and variance $\bar{\Sigma}_{t}$ . Then the solution of the Poisson equation is known in an explicit form [68, Sec. D]. The resulting gain function is constant and equal to the Kalman gain:

$${\sf K}_{t}(x)\equiv\bar{\Sigma}_{t}H^{\top},\quad\forall\,x\in\mathbb{R}^{d}.\tag{6}$$

Therefore, the mean-field process Eq. 2 for the linear Gaussian problem is given by:

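$$\mathrm{d}\bar{X}_{t}=A\bar{X}_{t}\,\mathrm{d}t+\mathrm{d}\bar{B}_{t}+\bar{\Sigma}_{t}H^{\top}\Big(\mathrm{d}Z_{t}-\frac{H\bar{X}_{t}+H\bar{m}_{t}}{2}\,\mathrm{d}t\Big),\quad\bar{X}_{0}\sim p_{0}.$$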

Given the explicit form of the gain function Eq. 6, the empirical approximation of the gain is simply ${\sf K}_{t}^{(N)}=\Sigma_{t}^{(N)}H^{\top}$ where $\Sigma_{t}^{(N)}$ is the empirical covariance of the particles. Therefore, the evolution of the particles is:

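$$\mathrm{d}X_{t}^{i}=AX_{t}^{i}\,\mathrm{d}t+\mathrm{d}B_{t}^{i}+{\sf K}_{t}^{(N)}\Big(\mathrm{d}Z_{t}-\frac{HX_{t}^{i}+Hm_{t}^{(N)}}{2}\,\mathrm{d}t\Big),\quad X_{0}^{i}\overset{\text{i.i.d.}}{\sim}p_{0},\tag{7}$$

for $i=1,\ldots,N$, where $m_{t}^{(N)}$ is the empirical mean of the particles. The empirical quantities are computed as: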

[TABLE]

The linear Gaussian FPF Eq. 7 is identical to the square-root form of the ensemble Kalman filter (EnKF) [8, Eq. 3.3].
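The particle update in the linear Gaussian case can be sketched in a few lines. This is a hedged illustration only: it assumes an Euler discretization with step `dt`, a scalar observation with $H$ stored as a length-$d$ vector, and a $1/(N-1)$ normalization for the empirical covariance. The function name `linear_fpf_step` is hypothetical. Because the gain does not depend on the particle position, no Stratonovich correction term is included in this sketch.

```python
import numpy as np

def linear_fpf_step(X, dZ, A, H, dt, rng):
    """One Euler step of the linear Gaussian FPF; rows of X are particles."""
    N, d = X.shape
    m = X.mean(axis=0)                       # empirical mean m^(N)
    Xc = X - m
    Sigma = Xc.T @ Xc / (N - 1)              # empirical covariance (normalization assumed)
    K = Sigma @ H                            # empirical gain Sigma^(N) H^T
    dB = np.sqrt(dt) * rng.standard_normal((N, d))
    # Innovation term dZ - (H X^i + H m^(N)) / 2 dt, one value per particle.
    innovation = dZ - 0.5 * (X @ H + m @ H) * dt
    return X + (X @ A.T) * dt + dB + innovation[:, None] * K[None, :]

# Hypothetical usage with a two-dimensional state.
rng = np.random.default_rng(0)
X0 = rng.standard_normal((50, 2))
A = np.array([[0.0, 1.0], [-1.0, 0.0]])
H = np.array([1.0, 0.0])
X1 = linear_fpf_step(X0, dZ=0.1, A=A, H=H, dt=0.01, rng=rng)
```

Note that all particles share the same gain and carry equal weight, so no resampling step appears anywhere in the update.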

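One extension of the Kalman gain is the so-called constant gain approximation formula whereby the gain ${\sf K}_{t}$ is approximated by its expected value (which represents the best least-squares approximation of the gain by a constant). Remarkably, the expected value admits a closed-form expression which is then readily approximated empirically using the particles (see Remark 2.3 for derivation):

$$\text{Const. gain approx:}\quad{\sf E}[{\sf K}_{t}(X_{t})\mid\mathcal{Z}_{t}]=\int_{\mathbb{R}^{d}}(h(x)-\hat{h}_{t})\,x\,p_{t}(x)\,\mathrm{d}x\approx\frac{1}{N}\sum_{i=1}^{N}(h(X_{t}^{i})-\hat{h}_{t}^{(N)})X_{t}^{i}.\tag{8}$$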

The constant gain approximation formula has been used in nonlinear extensions of the EnKF algorithm [21]. The connection to the Poisson equation provides a justification for this formula. The formula is attractive because it provides a consistent (as the number of particles $N\rightarrow\infty$ ) approximation of the Kalman gain in the linear Gaussian setting.
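The empirical constant gain formula is simple enough to state as code. The sketch below is illustrative: the function name `constant_gain` and the test data are assumptions. For a linear observation function $h(x)=Hx$ the result coincides with $\Sigma^{(N)}H^{\top}$ when $\Sigma^{(N)}$ is the $1/N$-normalized empirical covariance, which is the consistency property described above.

```python
import numpy as np

def constant_gain(X, h_vals):
    """Constant gain approximation: (1/N) sum_i (h(X^i) - h_hat) X^i."""
    h_hat = h_vals.mean()
    return ((h_vals - h_hat)[:, None] * X).mean(axis=0)

# Hypothetical check with a linear observation function h(x) = H x.
rng = np.random.default_rng(0)
X = rng.standard_normal((1000, 3)) @ np.diag([1.0, 2.0, 0.5])
H = np.array([1.0, 0.0, -1.0])
K = constant_gain(X, X @ H)
Sigma_N = np.cov(X.T, bias=True)   # empirical covariance with 1/N normalization
```

The cost is $O(Nd)$ per evaluation, and the same gain vector is applied to every particle.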

Design and analysis of the gain function approximation algorithm Eq. 5 in the general case is a challenging problem for two reasons: (i) Apart from the Gaussian case, there are no known closed-form solutions of Eq. 9; (ii) The density $p_{t}(x)$ is not explicitly known. At each time-step, one only has samples $\{X^{i}_{t}\}_{i=1}^{N}$. For the purpose of this paper, these samples are assumed to be drawn i.i.d. from $p_{t}$. The assumption is justified because in the limit of large $N$, the particles are approximately i.i.d. (by the propagation of chaos); cf. [58].

1.1 Contributions of this paper

The paper presents a diffusion map-based algorithm for the gain function approximation problem. The algorithm is named as such because it involves, as an intermediate step, a diffusion map approximation of the exact semigroup $e^{\Delta_{\rho}}$ . The following is a summary of specific original contributions made in this paper:

(i) Error estimates that relate the exact semigroup to its diffusion map approximation. The error estimates are derived by employing a Feynman-Kac representation of the semigroup (Proposition 3.3);

(ii) A uniform spectral gap for the diffusion map based on the use of the Foster-Lyapunov function method from the theory of stochastic stability of Markov processes (Proposition 4.2); and

(iii) Error estimates for the empirical approximation of the diffusion map (Proposition 3.4).

The results from (i) and (ii) are used to derive estimates for the bias and to show that the bias converges to zero in a certain limit (Theorem 4.3). Results from (iii) are used to prove the convergence of the variance error term to zero in the infinite- $N$ limit (Theorem 4.4). The paper contains numerical experiments that serve to illustrate the effects of problem dimension and sample size. The algorithm is applied to two filtering examples and comparisons provided with the sequential importance resampling (SIR) particle filter.

1.2 Relationship to prior work

The gain function algorithm first appeared in the conference version of this paper [60]. Its preliminary error analysis was reported in the conference paper [62]. The important distinction is that the results in these conference papers were preliminary in nature. The proofs were either altogether omitted or based on formal arguments. The main techniques employed in this paper, namely, (i) the use of the Feynman-Kac representation to quantify the error due to the diffusion map approximation of the exact semigroup, and (ii) the use of stochastic stability theory to derive a uniform spectral gap for the diffusion map, are original and do not appear in the conference papers. These techniques are important to be able to obtain precise estimates as enumerated above in the list of contributions. Since the main technical tools are new, all the proofs, based on these techniques, are new and original contributions of this paper. The diffusion map was introduced in [15], in the context of spectral clustering [6, 65]. Results on its convergence analysis appear in [32, 53, 15, 28, 31, 66, 7]. The use of diffusion map approximations for filtering problems is originally due to the authors.

1.3 Literature survey

Apart from its direct relevance to numerical approximation of the FPF, there are three topics of current research interest that are relevant to the subject of this paper: (i) ensemble Kalman filter; (ii) particle flow algorithms for nonlinear filtering; and (iii) optimal transport. Specifically, the algorithms for gain function approximation described in this paper are also directly applicable to these other topics. These relationships are briefly discussed next:

Ensemble Kalman filter: The EnKF algorithm was first developed in the discrete-time setting [27]. In the continuous-time setting, two formulations of the EnKF have been developed: stochastic EnKF, and the more recent deterministic EnKF [8, 51]. As has already been noted, the deterministic EnKF is in fact identical to the FPF algorithm Eq. 7 in the linear Gaussian setting [8, 59].

The EnKF algorithm provides a consistent approximation in the linear Gaussian setting. Compared to the Kalman filter, the main utility of EnKF is that it does not require propagation of the covariance matrix. This reduces the computational complexity from $O(d^{2})$ for the Kalman filter to $O(Nd)$. This is clearly advantageous in high-dimensional problems when $N\ll d$. This property has made EnKF popular in applications such as weather prediction in high-dimensional settings [36, 47]. The disadvantage of the EnKF algorithm, of course, is that it does not provide a consistent approximation for nonlinear problems.

FPF represents a generalization of the EnKF to the nonlinear non-Gaussian setting [59]: With the constant gain approximation, the algorithms are identical. Given this parallel, the problem of improving the EnKF algorithm in more general nonlinear non-Gaussian settings is directly related to the problem of better approximating the gain function in the FPF. In application software based on EnKF, it is a relatively simple matter to replace the constant gain formula for the gain by the more sophisticated approximations described in this paper. Certain empirical evaluations of the performance of FPF in high-dimensional settings are reported in [57, 55, 54, 9].

Error analysis and stability of EnKF is an active area of research; see [43, 41, 24] for linear models and [21, 23, 37] for nonlinear models. The error analysis for the gain function approximation reported in this paper is a step towards error analysis of the FPF along these lines.

Particle flow algorithms: The following first-order (and hence underdetermined) form of the Poisson equation appears in most types of particle flow algorithms:

$$\nabla\cdot(p_{t}(x){\sf K}(x))=\text{(rhs)},$$

where the right-hand side (rhs) is given and ${\sf K}(x)$ defines a vector field that must be obtained to implement the particle flow. The PDE appears in the first interacting particle representation of the continuous-time filtering in [17, 18] and the discrete-time filtering in [19]. Stochastic extensions of these have also recently appeared in [20], where approximate solutions based on a Gaussian assumption on the density are also described. The algorithm described here represents an approximation of a particular gradient-form solution of the first-order PDE.

Optimal transport: The mean-field SDE Eq. 2 represents a transport that maps the prior distribution at time $t=0$ to the posterior distribution at an (arbitrary) future time $t>0$. Synthesis of optimal transport maps for implementing the Bayes formula appears in [50, 14, 26, 61, 33, 13]. The relationship with the Poisson equation is through the ensemble transform filter, which relies on a linear programming construction to approximate the optimal transport map [14]. As discussed in [59, Sec. 5.5], the solution of the Poisson equation yields an infinitesimal optimal transport map from the “prior” $p_{t}(x)$ to the “posterior” $\frac{1}{\gamma}p_{t}(x)e^{-th(x)}$. Another closely related approach is transportation through Gibbs flow [33].

Directly related to the FPF, the Galerkin method for the numerical solution of the Poisson equation appeared in the original papers [68, 69]. The Galerkin algorithm represents the “direct” PDE approach to construct a numerical approximation. The constant gain approximation is a particular example of a Galerkin solution. In general, the main problem with the Galerkin approximation is that it requires a selection of basis functions. This becomes intractable in high dimensions. To mitigate this issue, a proper orthogonal decomposition (POD)-based procedure to select basis functions is introduced in [11]. Other existing approaches are a continuation scheme for approximation [44], a probabilistic approach based on dynamic programming [48], and a procedure based on expressing the gain function in a reproducing kernel Hilbert space [49]. A comparison of different gain function approximation methods appears in [10].

1.4 Paper outline

The outline of the remainder of this paper is as follows: The mathematical problem of the gain function approximation together with a summary of known results on this topic appears in Section 2. The diffusion-map based algorithm is described in a self-contained fashion in Section 3. The main theoretical results of this paper including the bias and variance estimates appear in Section 4. Some numerical experiments for the same appear in Section 5. All the proofs appear in the Appendix.

1.5 Notation

For vectors $x,y\in\mathbb{R}^{d}$, the dot product is denoted as $x\cdot y$ and $|x|:=\sqrt{x\cdot x}$. The space of positive definite $d\times d$ matrices is denoted as $S^{d}_{++}$. The Borel $\sigma$-algebra on $\mathbb{R}^{d}$ is denoted by $\mathcal{B}(\mathbb{R}^{d})$. The indicator function, for a measurable set $A\in\mathcal{B}(\mathbb{R}^{d})$, is denoted as $\mathds{1}_{A}(\cdot)$. The space of measurable functions $f:\mathbb{R}^{d}\to\mathbb{R}$ such that $\|f\|_{L^{p}(\rho)}:=\left(\int|f(x)|^{p}\rho(x)\,\mathrm{d}x\right)^{1/p}<\infty$ is denoted as $L^{p}(\rho)$. The inner product on $L^{2}(\rho)$ is defined by $\langle f,g\rangle:=\int f(x)g(x)\rho(x)\,\mathrm{d}x$. The space $H^{1}(\rho)$ is the space of functions $f\in L^{2}(\rho)$ whose derivative (defined in the weak sense) is in $L^{2}(\rho)$. For a (weakly) differentiable function $f$, $\|\nabla f\|_{L^{p}(\rho)}:=\left(\int|\nabla f(x)|^{p}\rho(x)\,\mathrm{d}x\right)^{1/p}$. For an integrable function $f$, $\hat{f}_{\rho}:=\int f(x)\rho(x)\,\mathrm{d}x$ denotes the mean. $L^{2}_{0}(\rho):=\{f\in L^{2}(\rho)\mid\hat{f}_{\rho}=0\}$ and $H_{0}^{1}(\rho):=\{f\in H^{1}(\rho)\mid\hat{f}_{\rho}=0\}$ denote the co-dimension $1$ subspaces of functions whose mean is zero. $L^{\infty}(\Omega)$ denotes the space of bounded functions on $\Omega\subset\mathbb{R}^{d}$ with the sup-norm denoted as $\|\cdot\|_{L^{\infty}(\Omega)}$. The space of continuous and bounded functions on $\Omega\subset\mathbb{R}^{d}$ and the space of continuous and smooth functions on $\Omega$ are denoted as $C_{b}(\Omega)$ and $C^{\infty}_{b}(\Omega)$, respectively. For a linear operator $T$ on a Banach space $\mathcal{X}$ with norm $\|\cdot\|_{\mathcal{X}}$, the operator norm is denoted as $\|T\|_{\mathcal{X}}$. The Gaussian distribution with mean $m$ and covariance $\Sigma$ is denoted as $\mathcal{N}(m,\Sigma)$. The variance of the random variable $X$ is denoted as $\text{Var}(X)$.

2 Gain function approximation

2.1 Problem formulation

The mathematical problem is to numerically approximate the solution of the Poisson equation Eq. 3 introduced in Section 1 and also repeated below:

$$-\Delta_{\rho}\phi=h-\hat{h}_{\rho},\tag{9}$$

where the weighted Laplacian $\Delta_{\rho}\phi(x):=\frac{1}{\rho(x)}\nabla\cdot(\rho(x)\nabla\phi(x))$; $\rho(x)$ is an everywhere positive probability density on $\mathbb{R}^{d}$; $h(x)$ is a real-valued function defined on $\mathbb{R}^{d}$ and $\hat{h}_{\rho}:=\int h(x)\rho(x)\,\mathrm{d}x$. The function $\phi$ is referred to as the solution. Its gradient is referred to as the gain function and denoted as ${\sf K}(x):=\nabla\phi(x)$. The PDE Eq. 9 is referred to as the Poisson equation.

The numerical approximation problem is as follows:

Problem statement: Given $N$ samples $\{X^{1},\ldots,X^{i},\ldots,X^{N}\}$ , drawn i.i.d. from $\rho$ , approximate the gains $\{{\sf K}^{1},\ldots,{\sf K}^{i},\ldots,{\sf K}^{N}\}$ , where ${\sf K}^{i}:={\sf K}(X^{i})=\nabla\phi(X^{i})$ . The density $\rho$ is not known in an explicit form.
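As a sanity check on the formulation, the scalar Gaussian case can be worked out by hand. The choices $\rho=\mathcal{N}(m,\sigma^{2})$ and $h(x)=x$ below are illustrative assumptions; the result agrees with the Kalman gain $\bar{\Sigma}_{t}H^{\top}$ for $H=1$.

```latex
% Scalar Gaussian density and linear observation function (illustrative choice)
\rho(x) \propto e^{-(x-m)^2/(2\sigma^2)}, \qquad h(x) = x, \qquad \hat{h}_\rho = m.

% Weighted Laplacian in one dimension
\Delta_\rho \phi = \frac{1}{\rho}(\rho\,\phi')' = \phi'' - \frac{x-m}{\sigma^2}\,\phi'.

% Candidate solution with zero mean under \rho
\phi(x) = \sigma^2 (x - m)
\;\Longrightarrow\;
-\Delta_\rho \phi = \frac{x-m}{\sigma^2}\,\sigma^2 = x - m = h - \hat{h}_\rho.

% Resulting gain function is constant
{\sf K}(x) = \phi'(x) = \sigma^2.
```

In this case every particle receives the same gain $\sigma^{2}$, so an exact algorithm would return ${\sf K}^{i}=\sigma^{2}$ for all $i$.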

2.2 Mathematical preliminaries

Assumptions: The following assumptions are made throughout the paper:

Assumption A1: The probability density $\rho$ is of the form $\rho(x)=e^{-V(x)}$ where the function $V(x)=\frac{1}{2}(x-m)^{\top}\Sigma^{-1}(x-m)+w(x)$ for some $m\in\mathbb{R}^{d}$ , $\Sigma\in S^{d}_{++}$ , and $w\in C^{\infty}_{b}(\mathbb{R}^{d})$ ;

Assumption A2: The function $h:\mathbb{R}^{d}\to\mathbb{R}$ is (weakly) differentiable with $\|h\|_{L^{4}(\rho)},\|\nabla h\|_{L^{4}(\rho)}<\infty$ .

Remark 2.1.

Assumption A1 is used to prove the approximation result (Proposition 3.3) and to derive the spectral gap (Proposition 4.2) for the diffusion map approximation first introduced in Section 3. In prior literature, a similar assumption has been used for studying functional inequalities to obtain a Poincaré inequality with a constant that does not depend on the dimension [64, Ch. 8]. Assumption A1 is restrictive, e.g., a mixture of Gaussians does not satisfy the assumption. Based on numerical experiments, it is conjectured that Assumption A1 can be relaxed. A weaker assumption would be to assume $\rho=\rho_{g}*w$, the convolution of a Gaussian density $\rho_{g}$ with a density $w$ that has compact support. Proving the theoretical results under this weaker assumption is the subject of future work.

2.2.1 Spectral representation

Under Assumption (A1), the weighted Laplacian $\Delta_{\rho}$ has a discrete spectrum with an ordered sequence of eigenvalues $0=\lambda_{0}<\lambda_{1}\leq\lambda_{2}\leq\ldots$ and associated eigenfunctions $\{e_{n}\}$ that form a complete orthonormal basis of $L^{2}(\rho)$ [5, Cor. 4.10.9]. The trivial eigenfunction $e_{0}(x)=1$ , and for $f\in L^{2}_{0}(\rho)$ , the spectral representation yields:

$$-\Delta_{\rho}f=\sum_{m=1}^{\infty}\lambda_{m}\langle e_{m},f\rangle e_{m}.$$

The positivity of the smallest non-trivial eigenvalue ( $\lambda_{1}>0$ ) is referred to as the Poincaré inequality (or the spectral gap condition) [4]. The inequality is equivalently expressed as

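$$\int_{\mathbb{R}^{d}}(f-\hat{f}_{\rho})^{2}\rho\,\mathrm{d}x\leq\frac{1}{\lambda_{1}}\int_{\mathbb{R}^{d}}|\nabla f|^{2}\rho\,\mathrm{d}x,\quad\forall\,f\in H^{1}(\rho),$$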

where $\hat{f}_{\rho}=\int f\rho\,\mathrm{d}x$ .
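A short worked example may help fix ideas. It assumes the scalar standard Gaussian density $\rho=\mathcal{N}(0,1)$, for which the first non-trivial eigenfunction is $e_{1}(x)=x$ with $\lambda_{1}=1$; the test functions are illustrative choices.

```latex
% Standard Gaussian: \Delta_\rho f = f'' - x f', so -\Delta_\rho x = x and \lambda_1 = 1.

% Test function f(x) = x^2, with mean \hat{f}_\rho = 1:
\int (f - \hat{f}_\rho)^2 \rho\,\mathrm{d}x
  = {\sf E}[x^4] - 2\,{\sf E}[x^2] + 1 = 3 - 2 + 1 = 2,
\qquad
\frac{1}{\lambda_1}\int |f'|^2 \rho\,\mathrm{d}x = 4\,{\sf E}[x^2] = 4.

% The inequality 2 \le 4 holds. For f(x) = x both sides equal 1,
% so the bound is attained by the first eigenfunction.
```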

The Poincaré inequality is important to show that the Poisson equation is well-posed and a unique solution exists. The solution to the Poisson equation is defined using the weak formulation.

2.2.2 Weak formulation

A function $\phi\in H_{0}^{1}(\rho)$ is said to be a weak solution of Eq. 9 if

$$\int\nabla\phi(x)\cdot\nabla\psi(x)\rho(x)\,\mathrm{d}x=\int(h(x)-\hat{h}_{\rho})\psi(x)\rho(x)\,\mathrm{d}x\quad\forall\,\psi\in H^{1}(\rho).\tag{11}$$

Eq. 11 is referred to as the weak form of the Poisson equation. The weak form is expressed succinctly as $\langle\nabla\phi,\nabla\psi\rangle=\langle h-\hat{h}_{\rho},\psi\rangle$ where $\langle\cdot,\cdot\rangle$ is the inner product in $L^{2}(\rho)$. The existence and uniqueness of the solution to the weak form of the Poisson equation is stated in the following Proposition.

Proposition 2.2 ([42, Thm. 2.2]).

Suppose $\rho$ satisfies Assumption (A1) and $h$ satisfies Assumption (A2). Then there exists a unique function $\phi\in H_{0}^{1}(\rho)$ that satisfies the weak form of the Poisson equation Eq. 11. The solution satisfies the bound:

$$\int|\nabla\phi(x)|^{2}\rho(x)\,\mathrm{d}x\leq\frac{1}{\lambda_{1}}\int(h(x)-\hat{h}_{\rho})^{2}\rho(x)\,\mathrm{d}x.$$

Remark 2.3 (Constant gain approximation).

The weak formulation Eq. 11 has led to the Galerkin algorithm presented in the original FPF papers [68]. A special case of the Galerkin solution is the constant gain approximation formula Eq. 8. The formula is obtained upon choosing the test functions in Eq. 11 to be the coordinate functions: $\psi_{m}(x)=x_{m}$ for $m=1,2,\ldots,d$ . Then,

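$$\int\frac{\partial\phi}{\partial x_{m}}(x)\rho(x)\,\mathrm{d}x=\int(h(x)-\hat{h}_{\rho})\,x_{m}\,\rho(x)\,\mathrm{d}x,\quad\text{for }m=1,\ldots,d,$$

which yields the formula Eq. 8.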
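The Galerkin idea behind this remark can be sketched as follows: restrict the weak form to the span of a finite set of basis functions, replace the integrals by particle averages, and solve the resulting linear system. This is a minimal sketch under stated assumptions, not the implementation from the original FPF papers; the function name `galerkin_gain`, the array layout, and the use of a least-squares solve are illustrative choices.

```python
import numpy as np

def galerkin_gain(X, h_vals, psi, grad_psi):
    """Empirical Galerkin approximation of the gain from the weak form.

    X: (N, d) particles. psi(X) returns (N, M) basis values and
    grad_psi(X) returns (N, M, d) basis gradients.
    Returns an (N, d) array with the approximate gain at each particle.
    """
    N = X.shape[0]
    Psi = psi(X)
    G = grad_psi(X)
    # A_kl approximates <grad psi_k, grad psi_l> in L^2(rho).
    A = np.einsum('nkd,nld->kl', G, G) / N
    # b_k approximates <h - h_hat, psi_k> in L^2(rho).
    b = ((h_vals - h_vals.mean())[:, None] * Psi).mean(axis=0)
    c = np.linalg.lstsq(A, b, rcond=None)[0]
    # Gain is the gradient of sum_k c_k psi_k.
    return np.einsum('k,nkd->nd', c, G)

# Hypothetical usage: coordinate basis psi_m(x) = x_m recovers the constant gain.
rng = np.random.default_rng(1)
X = rng.standard_normal((500, 2))
h_vals = X[:, 0] + 0.5 * X[:, 1] ** 2
coord = lambda X: X
grad_coord = lambda X: np.broadcast_to(np.eye(X.shape[1]),
                                       (X.shape[0], X.shape[1], X.shape[1]))
K = galerkin_gain(X, h_vals, coord, grad_coord)
```

With the coordinate basis, the matrix `A` is the identity and every row of `K` equals the empirical constant gain. Richer bases give a position-dependent gain, at the price of having to choose the basis.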

The diffusion map-based algorithm presented in this paper is based on the semigroup formulation of the Poisson equation.

2.2.3 Semigroup

Let $\{P_{t}\}_{t\geq 0}$ be the semigroup associated with the weighted Laplacian $\Delta_{\rho}$ . The semigroup allows for a probabilistic interpretation which is described next. Consider the following reversible Markov process $\{S_{t}\}_{t\geq 0}$ evolving in $\mathbb{R}^{d}$ :

[TABLE]

where $V(x):=-\log(\rho(x))$ and $\{B_{t}\}_{t\geq 0}$ is a standard Wiener process in $\mathbb{R}^{d}$ . Then

[TABLE]

It is straightforward to verify that $P_{t}:L^{2}(\rho)\to L^{2}(\rho)$ is symmetric, i.e., $\langle P_{t}f,g\rangle=\langle f,P_{t}g\rangle$ for all $f,g\in L^{2}(\rho)$ and $\rho(x)=e^{-V(x)}$ is its invariant density. The semigroup also admits a kernel representation:

[TABLE]

where $\bar{k}_{t}(x,y):=\sum_{m=0}^{\infty}e^{-t\lambda_{m}}e_{m}(x)e_{m}(y)$ .

The spectral gap implies that $\|P_{t}\|_{L^{2}_{0}(\rho)}=e^{-t\lambda_{1}}<1$ . Hence, $P_{t}$ is a strict contraction on $L^{2}_{0}(\rho)$ . For the special case of Gaussian density, the eigenfunctions are given by the Hermite polynomials. This leads to an explicit formula for the kernel $\bar{k}_{t}(x,y)$ in the Gaussian case, as described in Appendix A.

Consider the heat equation

[TABLE]

Its solution is given in terms of the semigroup as follows:

[TABLE]

Letting $f(x)=\phi(x)$ where $\phi$ solves the Poisson equation Eq. 9 yields the following fixed-point equation for $t=\epsilon$ :

[TABLE]

Eq. 12 is referred to as the semigroup form of the Poisson equation Eq. 9.
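The passage from the Poisson equation to a fixed-point equation for the semigroup can be spelled out in one line. The following is a sketch that assumes the sign convention $-\Delta_{\rho}\phi=h-\hat{h}_{\rho}$ for Eq. 9 (the convention consistent with the weak form) and that $\phi$ is regular enough for $s\mapsto P_{s}\phi$ to be differentiated:

$$\frac{\mathrm{d}}{\mathrm{d}s}P_{s}\phi=P_{s}\Delta_{\rho}\phi=-P_{s}(h-\hat{h}_{\rho})\quad\Longrightarrow\quad P_{\epsilon}\phi-\phi=-\int_{0}^{\epsilon}P_{s}(h-\hat{h}_{\rho})\,\mathrm{d}s.$$

Rearranging gives $\phi=P_{\epsilon}\phi+\int_{0}^{\epsilon}P_{s}(h-\hat{h}_{\rho})\,\mathrm{d}s$ , a fixed-point equation in which only the semigroup, and no derivative of $\phi$ , appears.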

The following proposition shows that the weak form Eq. 11 and the semigroup form Eq. 12 are equivalent. The proof appears in Appendix B.

Proposition 2.4.

Suppose $\rho$ satisfies Assumption (A1) and $h$ satisfies Assumption (A2). Then the unique solution $\phi\in H_{0}^{1}(\rho)$ to the weak form Eq. 11 is also the unique solution to the fixed-point equation Eq. 12.

The semigroup formulation has led to the diffusion-map based algorithm which is the main focus of the remainder of this paper.

3 Diffusion map-based Algorithm

The diffusion map-based algorithm is based on a numerical approximation of the fixed-point equation Eq. 12. The main technique is to approximate the semigroup $P_{\epsilon}$ in the following three steps:

1. Diffusion map approximation: A family of Markov operators $\{{T}_{\epsilon}\}_{\epsilon>0}$ is defined as follows:

[TABLE]

where $n_{\epsilon}(x):=\int{k}_{\epsilon}(x,y)\rho(y)\,\mathrm{d}y$ is the normalization factor,

[TABLE]

and $g_{\epsilon}(x,y):=e^{-\frac{|x-y|^{2}}{4\epsilon}}$ is the Gaussian kernel in $\mathbb{R}^{d}$ . For small positive values of $\epsilon$ , the Markov operator $T_{\epsilon}$ is referred to as the diffusion map approximation of the exact semigroup $P_{\epsilon}$ [15, 32]. The precise statement of this approximation is contained in Proposition 3.3. For the special case of Gaussian density, an explicit formula for the diffusion map appears in Appendix A.

2. Empirical approximation: The operator ${T}_{\epsilon}$ is approximated empirically by $\{{{T}^{(N)}_{\epsilon}}\}_{\epsilon>0,N\in\mathbb{N}}$ defined as follows:

[TABLE]

where $n_{\epsilon}^{(N)}(x):=\sum_{i=1}^{N}{k}_{\epsilon}(x,X^{i})$ is the normalization factor and

[TABLE]

Recall that $X^{i}\overset{\text{i.i.d}}{\sim}\rho$ for $i=1,\ldots,N$ . So, by the law of large numbers (LLN), ${{T}^{(N)}_{\epsilon}}f$ represents an empirical approximation of the diffusion map ${T}_{\epsilon}f$ . The precise statement of the empirical approximation is contained in Proposition 3.4.

3. Approximation as Markov matrix: An $N\times N$ Markov matrix ${\sf T}$ is defined with $(i,j)$ -th element given by

[TABLE]

Finite-dimensional fixed-point equation: Using the three steps above, the original infinite-dimensional fixed-point equation Eq. 12 is approximated as a finite dimensional fixed-point equation

[TABLE]

where ${{\sf h}}:=(h(X^{1}),\ldots,h(X^{N}))$ is an $N\times 1$ column vector, and $\pi(h)=\sum_{i=1}^{N}\pi_{i}h(X^{i})$ where the probability vector $\pi$ with entries $\pi_{i}=\frac{{n}_{\epsilon}^{(N)}(X^{i})}{\sum_{j=1}^{N}{n}_{\epsilon}^{(N)}(X^{j})}$ is the unique stationary distribution of the Markov matrix ${\sf T}$ . The solution $\sf\Phi$ is used to define an approximation to the solution of the Poisson equation as follows:

[TABLE]

The approximation for the gain function is as follows:

[TABLE]

Upon evaluating the gradient in closed-form, the following linear formula results for the gain function evaluated at particle locations:

[TABLE]

where

[TABLE]

The details of the calculation leading to the linear formula appear in the Appendix C.
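The steps above can be sketched in NumPy as follows. This is a minimal sketch, not the authors' reference implementation, and it rests on stated assumptions about the formulas: the empirical kernel is taken as $g_{\epsilon}(x,y)$ divided by the square roots of the kernel sums at $x$ and $y$ , the finite-dimensional fixed-point equation is taken as ${\sf\Phi}={\sf T}{\sf\Phi}+\epsilon({\sf h}-\pi(h))$ , and the linear gain formula is taken as ${\sf K}_{i}=\sum_{j}s_{ij}X^{j}$ with $s_{ij}=\frac{1}{2\epsilon}{\sf T}_{ij}(r_{j}-\sum_{k}{\sf T}_{ik}r_{k})$ and $r_{j}={\sf\Phi}_{j}+\epsilon h(X^{j})$ . The function name `diffusion_map_gain` and its arguments are hypothetical.

```python
import numpy as np

def diffusion_map_gain(X, h_vals, eps, n_iter=1000, Phi0=None):
    """Diffusion map-based gain at the particle locations (sketch).

    X      : (N, d) particle locations
    h_vals : (N,) values h(X^i)
    eps    : kernel bandwidth
    Phi0   : optional initial guess for Phi (e.g. previous filter step)
    """
    X = np.asarray(X, dtype=float)
    h_vals = np.asarray(h_vals, dtype=float)
    N = X.shape[0]
    # Gaussian kernel g_eps(x, y) = exp(-|x - y|^2 / (4 eps)) on all pairs.
    sq_dist = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    g = np.exp(-sq_dist / (4.0 * eps))
    # Symmetric normalization (assumed form of the empirical kernel);
    # constant factors such as 1/N cancel in the row normalization below.
    root = np.sqrt(g.sum(axis=1))
    k = g / np.outer(root, root)
    n = k.sum(axis=1)              # normalization factor at each particle
    T = k / n[:, None]             # N x N Markov matrix
    pi = n / n.sum()               # stationary distribution of T
    h_centered = h_vals - pi @ h_vals
    # Fixed-point iteration Phi <- T Phi + eps * (h - pi(h)).
    Phi = np.zeros(N) if Phi0 is None else np.array(Phi0, dtype=float)
    for _ in range(n_iter):
        Phi = T @ Phi + eps * h_centered
    # Linear gain formula (assumed form, see lead-in).
    r = Phi + eps * h_vals
    s = T * (r[None, :] - (T @ r)[:, None]) / (2.0 * eps)
    K = s @ X                      # row i is the gain at particle X^i
    return K, Phi, T, pi

# Illustration (assumed data): Gaussian particles in d = 2 and h(x) = x_1.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
h_vals = X[:, 0]
K, Phi, T, pi = diffusion_map_gain(X, h_vals, eps=0.1)
```

The pairwise-distance array costs $O(N^{2}d)$ memory in this form, which is acceptable for a few hundred particles; a larger ensemble would call for a blocked or sparse construction.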

Remark 3.1 (Numerical procedure).

The fixed-point problem (16) is solved in an iterative manner. The vector ${\sf\Phi}$ is initialized to ${\sf\Phi}_{0}=(0,\ldots,0)\in\mathbb{R}^{N}$ and updated according to

[TABLE]

for $n=1,\ldots,L$ , where $L$ is a finite number of iterations. The procedure is guaranteed to converge, with a geometric convergence rate, because ${\sf T}$ is a strict contraction on $L^{2}_{0}(\pi)$ (Proposition 4.1-(ii)). The overall algorithm is presented in Algorithm 1.

The proposed iterative procedure (21) is preferred to other numerical procedures because (i) it is straightforward to implement and does not require matrix inversion; (ii) it may be numerically more efficient than solving a system of $N$ linear equations; and (iii) it allows one to use the solution obtained from the previous filter step as initialization for the iterative procedure (21), resulting in quick convergence, typically in a few iterations. The reason for quick convergence is that the change in the solution of the fixed point equation (16) is (typically) small from one filtering step to the next. This is because the change in particle locations is (typically) small for a small choice of time increment.

Remark 3.2.

The computational complexity of the diffusion-map based algorithm is $O(N^{2})$ because of the need to assemble the $N\times N$ matrix ${\sf T}$ . The computational complexity may be reduced using the sparsity structure of the matrix ${\sf T}$ and sub-sampling techniques. Compared to the Galerkin algorithm with computational complexity of $O(Nd^{3})$ , the diffusion-map algorithm is advantageous in high-dimensional problems where $d\gg N$ .

3.1 Approximation results

The notation $G_{\epsilon}(f)(x):=\int{g}_{\epsilon}(x,y)f(y)\,\mathrm{d}y$ is used to denote the heat semigroup with a Gaussian kernel ${g}_{\epsilon}(x,y)$ , and

[TABLE]

The proof of the following proposition appears in Appendix E.

Proposition 3.3.

Consider the family of Markov operators $\{{T}_{\epsilon}\}_{\epsilon>0}$ defined according to Eq. 13. Let $n\in\mathbb{N}$ , $t\in(0,t_{0})$ with $t_{0}<\infty$ , and $\epsilon=\frac{t}{n}$ . Then:

(i) The semigroup $P_{t}$ and the operator ${T}_{\epsilon}^{n}$ admit the following representations:

[TABLE]

for all $x\in\mathbb{R}^{d}$ where $B^{x}_{t}$ is the Brownian motion with initial condition $B_{0}^{x}=x$ .

(ii) In the asymptotic limit as $\epsilon\to 0$ :

[TABLE]

where $|r^{(1)}_{\epsilon}(x)|,|r^{(2)}_{\epsilon}(x)|=O(|x|^{2})$ and $|\nabla r^{(1)}_{\epsilon}(x)|=O(|x|)$ as $|x|\to\infty$ .

(iii) For all functions $f$ such that $f,\nabla f\in L^{4}(\rho)$ :

[TABLE]

where the constant $C$ depends only on $t_{0}$ and $\rho$ .

The proof of the following proposition appears in Appendix H.

Proposition 3.4.

Consider the diffusion map kernel $\{{T}_{\epsilon}\}_{\epsilon>0}$ , and its empirical approximation $\{{{T}^{(N)}_{\epsilon}}\}_{\epsilon>0,N\in\mathbb{N}}$ . Then for any bounded continuous function $f\in C_{b}(\mathbb{R}^{d})$ :

(i) (Almost sure convergence) For all $x\in\mathbb{R}^{d}$

[TABLE]

(ii) (Convergence rate) For any $\delta\in(0,1)$ , in the asymptotic limit as $N\to\infty$ ,

[TABLE]

with probability higher than $1-\delta$ .

Remark 3.5 (Related work).

The key idea in the proof of Proposition 3.3 is the Feynman-Kac representation of the semigroup Eq. 23. To the best of our knowledge, this representation has not been used before in the analysis of the diffusion map approximation. Most of the existing results concerning the convergence of the diffusion map are based on a Taylor series expansion that would lead to a convergence of the form $\lim_{\epsilon\to 0}\frac{f(x)-{T}_{\epsilon}f(x)}{\epsilon}=\Delta_{\rho}f(x)$ for each $x\in\mathbb{R}^{d}$ [32, 15, 28]. Convergence results of the form $\lim_{n\to\infty}\|T_{\frac{t}{n}}^{n}f-P_{t}f\|_{L^{2}(\rho)}=0$ appear in [15, 63], based on functional analytic arguments. The Taylor series type arguments typically require the distribution to be supported on a compact manifold, which is not assumed here.

4 Convergence and error analysis

The analysis of the diffusion-map algorithm involves the consideration of the following four fixed point problems:

[TABLE]

where $\hat{h}_{\rho_{\epsilon}}:=\int h(x)\rho_{\epsilon}(x)\,\mathrm{d}x$ and $\rho_{\epsilon}(x):=\frac{n_{\epsilon}(x)\rho(x)}{\int n_{\epsilon}(x)\rho(x)\,\mathrm{d}x}$ is the density of the invariant probability distribution associated with the Markov operator ${T}_{\epsilon}$ .

In practice, the finite-dimensional problem Eq. 31 is solved. The existence and uniqueness of the solution for this problem is the subject of the following proposition whose proof appears in Appendix D.

Proposition 4.1.

Consider the finite-dimensional fixed point equation Eq. 31. Then, almost surely:

(i) ${\sf T}$ is a reversible Markov matrix with a unique stationary distribution

[TABLE]

for $i=1,\ldots,N$ .

(ii) ${\sf T}$ is a strict contraction on $L^{2}_{0}(\pi)=\{v\in\mathbb{R}^{N};\sum\pi_{i}v_{i}=0\}$ . Hence the fixed point equation Eq. 31 has a unique solution ${\sf\Phi}\in L^{2}_{0}(\pi)$ .

(iii) The (empirical approx.) fixed point equation Eq. 30 has a unique solution given by (see Eq. 17)

[TABLE]

Based on the results in Proposition 2.4 and Proposition 4.1, the exact solution $\phi$ and the numerical solution ${\phi}^{(N)}_{\epsilon}$ are both well-defined. The remaining task is to show the convergence of ${\phi}^{(N)}_{\epsilon}\to\phi$ as $N\to\infty$ and $\epsilon\to 0$ . We break the convergence analysis into two parts, bias and variance:

[TABLE]

Before describing the general result, it is useful to first introduce an example that helps illustrate the bias-variance trade-off in this problem.

4.1 Example - the scalar case

In the scalar case (where $d=1$ ), the Poisson equation is:

[TABLE]

Integrating twice yields the solution explicitly

[TABLE]

For the choice of $\rho$ as the equal-weight mixture of two Gaussians ${\cal N}(-1,\sigma^{2})$ and ${\cal N}(+1,\sigma^{2})$ with $\sigma^{2}=0.2$ and $h(x)=x$ , the solution obtained using Eq. 33 is depicted in Fig. 1 (a). Also depicted is the approximate solution obtained using the diffusion-map algorithm with $N=200$ , for different values of $\epsilon$ . The constant gain approximation is evaluated according to the explicit integral formula (8). As $\epsilon\to\infty$ the approximate gain converges to the constant gain approximation. As $\epsilon$ becomes smaller, the approximation becomes more accurate. However, for very small values of $\epsilon$ the approximation is poor due to the variance error.
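The exact scalar gain used as the reference in such a comparison can be evaluated by numerical quadrature. The sketch below assumes the scalar Poisson equation has the form $-\frac{1}{\rho}(\rho\phi^{\prime})^{\prime}=h-\hat{h}_{\rho}$ , so that one integration gives ${\sf K}(x)=\phi^{\prime}(x)=-\frac{1}{\rho(x)}\int_{-\infty}^{x}(h(z)-\hat{h}_{\rho})\rho(z)\,\mathrm{d}z$ . The grid limits, the trapezoid rule, and the function names are illustrative choices.

```python
import numpy as np

def bimodal_density(x, sigma2=0.2):
    """Equal-weight mixture of N(-1, sigma2) and N(+1, sigma2)."""
    c = 1.0 / np.sqrt(2.0 * np.pi * sigma2)
    return 0.5 * c * (np.exp(-(x + 1.0) ** 2 / (2.0 * sigma2))
                      + np.exp(-(x - 1.0) ** 2 / (2.0 * sigma2)))

def exact_gain_scalar(x_grid, rho, h):
    """K(x) = -(1/rho(x)) * int_{-inf}^{x} (h - h_hat) rho dz on a grid.

    The lower limit is truncated at x_grid[0], so values near the ends
    of the grid (where rho is tiny) are not reliable.
    """
    r = rho(x_grid)
    hv = h(x_grid)
    dx = np.diff(x_grid)

    def cumtrapz(f):
        # Cumulative trapezoid rule, starting from zero at x_grid[0].
        return np.concatenate(([0.0], np.cumsum(0.5 * (f[1:] + f[:-1]) * dx)))

    h_hat = cumtrapz(hv * r)[-1] / cumtrapz(r)[-1]
    return -cumtrapz((hv - h_hat) * r) / r

x = np.linspace(-4.0, 4.0, 4001)
K = exact_gain_scalar(x, bimodal_density, lambda z: z)
```

Between the two modes the density is small while the integral is not, so the exact gain is much larger there than the constant gain; this is the feature that a constant approximation cannot reproduce.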

The bias-variance trade-off while varying the parameter $\epsilon$ is depicted in Fig. 1 (b). The $L^{2}$ error is computed as a Monte-Carlo average:

[TABLE]

Fig. 1 (b) depicts the error obtained from averaging over $M=1000$ simulations as a function of the parameter $\epsilon$ . It is observed that for a fixed number of particles $N$ , there is an optimal value of $\epsilon$ that minimizes the error.

The vector counterpart of this example appears in Section 5.1.

4.2 Bias

The analysis of bias has two parts:

1. To show that the (diffusion-map) fixed-point equation Eq. 29 admits a unique solution ${\phi}_{\epsilon}$ for all positive choices of $\epsilon$ ;

2. To show that ${\phi}_{\epsilon}\rightarrow\phi$ as $\epsilon\downarrow 0$ .

For $n\in\mathbb{N}$ , iterate the fixed-point equation Eq. 29 $n$ times to obtain:

[TABLE]

We let $\epsilon=\frac{t}{n}$ for some $t>0$ and study the solution of this fixed-point equation as $n\to\infty$ . Note that the solution to the iterated fixed-point equation (35) is identical to the solution to the fixed-point equation Eq. 29.

The fixed-point equation Eq. 35 is the (discrete) Poisson equation that appears in the theory of Markov chain simulation [29, 46] and stochastic control [45, Ch. 9]. Theory presented in these references illustrates how bounds on the solution are obtained under a Foster-Lyapunov drift condition. A similar strategy is adopted here.

In the following proposition, an existence-uniqueness result is described for the fixed-point equation Eq. 35. The technical step in the proof involves a Foster-Lyapunov condition known as DV(3) [39]. The proof appears in Appendix F.

Proposition 4.2.

Consider the family of Markov operators $\{{T}_{\epsilon}\}_{\epsilon>0}$ defined in Eq. 13. Let $n\in\mathbb{N}$ , $t\in(0,t_{0})$ , and $\epsilon=\frac{t}{n}$ , with $t_{0}<\infty$ . Then there exist positive constants $a$ , $b$ , $R$ , $\delta$ , a probability measure $\nu$ , and a number $n_{0}\in\mathbb{N}$ such that for all $n>n_{0}$ :

[TABLE]

Consequently:

(i) The chain with transition kernel $T_{\epsilon}^{n}$ is geometrically ergodic with invariant density

[TABLE]

(ii) $T_{\epsilon}^{n}$ is reversible with respect to the density $\rho_{\epsilon}$ . It admits a spectral gap as a linear operator $T_{\epsilon}^{n}:L^{2}_{0}(\rho_{\epsilon})\to L^{2}_{0}(\rho_{\epsilon})$ that is uniform with respect to $\epsilon$ . The spectral gap is denoted as $\lambda$ .

(iii) There exists a solution to Eq. 35 with the bound

[TABLE]

The proof of the following main result appears in Appendix G.

Theorem 4.3.

Suppose the assumptions (A1)-(A2) hold for the density $\rho$ and the function $h$ , and $\phi$ denotes the exact solution of Eq. 28. Consider the approximation of this problem defined by the (diffusion-map) fixed-point equation Eq. 29. For the approximate problem:

(i) Existence-Uniqueness: For each fixed $\epsilon>0$ , there exists a unique solution ${\phi}_{\epsilon}$ .

(ii) Convergence: In the asymptotic limit as $\epsilon\to 0$

[TABLE]

4.3 Variance

The analysis of the variance concerns the (empirical) fixed-point equation Eq. 30 whose solution is denoted as ${\phi}^{(N)}_{\epsilon}$ . The parameter $\epsilon$ is assumed to be positive and fixed and $N$ is assumed to be finite but large.

The existence-uniqueness of ${\phi}^{(N)}_{\epsilon}$ has already been shown as part of Prop. 4.1. The convergence is shown below only for the case where the density has compact support.

Assumption A3: The distribution $\rho$ has compact support given by $\Omega\subset\mathbb{R}^{d}$ .

Theorem 4.4.

Suppose the assumptions (A2)-(A3) hold for the density $\rho$ and the function $h$ , and ${\phi}_{\epsilon}$ denotes the solution of the (kernel) fixed-point equation Eq. 29 for a fixed positive parameter $\epsilon$ . Consider the approximation of this problem defined by the (empirical) fixed-point equation Eq. 30. For the approximate problem:

(i) Existence-Uniqueness: For each finite $N$ , there exists (almost surely) a unique solution ${\phi}^{(N)}_{\epsilon}$ .

(ii) Convergence: The approximate solution ${\phi}^{(N)}_{\epsilon}$ converges to the kernel solution ${\phi}_{\epsilon}$

[TABLE]

The proof of the convergence ${\phi}^{(N)}_{\epsilon}\to{\phi}_{\epsilon}$ is based on classical results in the numerical analysis of integral equations on a grid [1, 2]. It relies on the verification of the following three conditions:

(i) The family of operators $\{{{T}^{(N)}_{\epsilon}}\}_{N=1}^{\infty}$ is collectively compact as linear operators on $C_{b}(\Omega)$ .

(ii) For any function $f\in C_{b}(\Omega)$ ,

[TABLE]

(iii) The inverse $(I-{T}_{\epsilon})^{-1}$ exists and is bounded on $C_{0}(\Omega):=\{f\in C_{b}(\Omega);\hat{f}_{\rho}=0\}$ .

Once these three conditions have been verified, the convergence result Eq. 39 follows from a standard result in the approximation theory of the numerical solutions of integral equations [34, Thm. 7.6.6]. The proof appears in Appendix I.

Remark 4.5 (Convergence rate).

The result in Theorem 4.4 establishes asymptotic convergence of the variance error to zero. However, it does not provide an explicit form for the convergence rate. It is possible to obtain an explicit form based upon a convergence rate estimate for the uniform convergence (40). The latter is difficult because the existing result in [28] holds only under rather strong regularity conditions on $f$ and assumes that the distribution $\rho$ is uniform.

Based upon the approximation result Proposition 3.4, suppose a convergence rate holds for (40) with order $O(\frac{1}{N^{1/2}\epsilon^{d/2}})$ . In this case, it is straightforward to derive the following explicit form of the convergence rate for the variance:

[TABLE]

The validity and tightness of this bound are studied using numerical experiments in Section 5.

Remark 4.6.

(Unbounded domain) Analysis of the variance error for the case where the support of $\rho$ is unbounded has proved to be difficult. In the unbounded case, it is more appropriate to consider ${T}_{\epsilon}$ and ${{T}^{(N)}_{\epsilon}}$ as linear operators on $L^{2}(\rho)$ . Following the same approach as used in the proof of Theorem 4.4, one would need to verify the three conditions noted above. However, for the unbounded case, we could not verify the condition (i) that $\{{{T}^{(N)}_{\epsilon}}\}_{N=1}^{\infty}$ is collectively compact on $L^{2}(\rho)$ . An alternative approach is to follow the spectral method as outlined in [38]. In this approach, one examines the convergence of the empirical matrix $[k(X^{i},X^{j})]_{i,j=1}^{N}$ where $k(\cdot,\cdot)$ is a given symmetric kernel. However, this approach does not directly apply to the analysis of the empirical operator ${{T}^{(N)}_{\epsilon}}$ . This is because the form of the kernel ${k}_{\epsilon}^{(N)}(\cdot,\cdot)$ , as it is used in the definition of ${{T}^{(N)}_{\epsilon}}$ , is not explicitly given. It too must be empirically approximated as a ratio, whose convergence analysis has proved to be rather challenging.

4.4 Relationship to the constant gain approximation

Although the convergence and error analysis pertains to the $\epsilon\downarrow 0$ limit, an important property of the diffusion-map approximation is that the numerical procedure yields a unique solution for arbitrary values of $\epsilon$ (see Proposition 4.1). In fact, more can be said: one recovers the constant gain approximation formula in the $\epsilon\rightarrow\infty$ limit.

Before stating the result, it is useful to recall the three formulae for the gain:

(i) Exact formula: ${\sf K}=\nabla\phi$ is defined using the exact solution $\phi$ ;

(ii) Kernel formula: ${\sf K}_{\epsilon}$ is defined using the solution ${\phi}_{\epsilon}$ to the (diffusion-map) approximation fixed-point equation:

[TABLE]

(iii) Empirical formula: ${\sf K}_{\epsilon}^{(N)}$ is the empirical version of the kernel formula. It was defined in Eq. 18 using the solution ${\sf\Phi}$ of the finite-dimensional fixed-point problem.

The proof of the following Proposition appears in the Appendix J.

Proposition 4.7.

Consider the fixed-point problems Eq. 29 and Eq. 30 in the limit as $\epsilon\rightarrow\infty$ .

(i) The kernel formula of the gain is given by

[TABLE]

(ii) For any finite $N$ , the empirical formula of the gain is given by

[TABLE]

This result serves to highlight the connection between the FPF and the EnKF: With the diffusion map approximation of the gain, the FPF approaches EnKF in the limit of large $\epsilon$ . The parameter $\epsilon$ can then be regarded as the tuning parameter to “improve” the gain. Of course, for any finite value of $N$ , this can only be done up to a point – where variance becomes dominant (see Fig. 1).
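A formal leading-order calculation indicates why the empirical gain reduces to the constant gain for large $\epsilon$ . It is a sketch under two assumptions, not a substitute for the proof in Appendix J: the linear gain formula is taken in the form ${\sf K}_{i}=\sum_{j}s_{ij}X^{j}$ with $s_{ij}=\frac{1}{2\epsilon}{\sf T}_{ij}(r_{j}-\sum_{k}{\sf T}_{ik}r_{k})$ and $r_{j}={\sf\Phi}_{j}+\epsilon h(X^{j})$ , and the fixed-point equation in the form ${\sf\Phi}={\sf T}{\sf\Phi}+\epsilon({\sf h}-\pi(h))$ . As $\epsilon\to\infty$ with $N$ fixed, $g_{\epsilon}(X^{i},X^{j})\to 1$ , so that

$${\sf T}_{ij}\to\frac{1}{N},\qquad\pi_{i}\to\frac{1}{N},\qquad{\sf\Phi}_{i}=\epsilon\,(h(X^{i})-\hat{h}^{(N)})+O(1),$$

where $\hat{h}^{(N)}:=\frac{1}{N}\sum_{i=1}^{N}h(X^{i})$ and the expression for ${\sf\Phi}$ uses ${\sf T}{\sf\Phi}\to\pi({\sf\Phi})=0$ . Consequently,

$$r_{j}-\sum_{k=1}^{N}{\sf T}_{ik}r_{k}=2\epsilon\,(h(X^{j})-\hat{h}^{(N)})+O(1),\qquad{\sf K}_{i}=\sum_{j=1}^{N}s_{ij}X^{j}\to\frac{1}{N}\sum_{j=1}^{N}(h(X^{j})-\hat{h}^{(N)})\,X^{j},$$

which no longer depends on $i$ and coincides with the empirical constant gain.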

5 Numerics

5.1 Example - the vector case

A vector generalization of the scalar example in Section 4.1 is obtained by considering the following form of the probability density function in $d$ -dimensions:

[TABLE]

where $\rho_{b}$ is the bimodal distribution $\frac{1}{2}\mathcal{N}(-1,\sigma^{2})+\frac{1}{2}\mathcal{N}(+1,\sigma^{2})$ introduced in Section 4.1, and $\rho_{g}$ is the Gaussian distribution $\mathcal{N}(0,\sigma^{2})$ . Also suppose the function $h(x)=x_{1}$ . This simple example is illustrative of realistic application scenarios where the density has non-Gaussian features along a certain (not necessarily a priori known) low-dimensional subspace. The directions orthogonal to this subspace are modelled here as Gaussian noise.

For this problem, the exact gain function is easily obtained as

[TABLE]

where the function ${\sf K}_{\text{exact}}(x_{1})$ is given by the formula Eq. 33 in Section 4.1. The exact solution is used to compute error properties as dimension increases.

The diffusion-map algorithm (Algorithm 1) is simulated to approximate the gain function for this problem. The number of iterations in Algorithm 1 is set to $L=10^{3}$ . For each particle $X^{i}=(X^{i}_{1},\ldots,X^{i}_{d})$ , the first coordinate $X^{i}_{1}\overset{\text{i.i.d}}{\sim}\frac{1}{2}\mathcal{N}(-1,\sigma^{2})+\frac{1}{2}\mathcal{N}(+1,\sigma^{2})$ and the other coordinates $X^{i}_{n}\overset{\text{i.i.d}}{\sim}\mathcal{N}(0,\sigma^{2})$ for $n=2,\ldots,d$ . The constant gain approximation is evaluated according to the explicit integral formula (8).

Fig. 2 depicts the m.s.e Eq. 34 computed from running $M=100$ simulations. A summary of these results is as follows:

1. Fig. 2-(a) depicts the error as a function of the parameters $\epsilon$ and $d$ for a fixed number of particles $N=1000$ . Also depicted is the error with the constant gain approximation. The constant gain error serves here as a baseline.

For large values of $\epsilon$ , the bias error is dominant, and as $\epsilon\to\infty$ the error asymptotes to the error for the constant-gain approximation. This is because (see Proposition 4.7) the diffusion map gain approaches the constant gain as $\epsilon\rightarrow\infty$ . For small values of $\epsilon$ , the variance error dominates. According to Remark 4.5, the upper-bound for m.s.e is expected to be of the order $O(\frac{1}{N\epsilon^{d+2}})$ . However, the numerical error in Fig. 2-(a) is observed to be $O(\frac{1}{N\epsilon^{0.16d+0.3}})$ . Therefore, the upper-bound in Remark 4.5 is not tight for this specific problem.

2. Fig. 2-(b) depicts the bias-variance trade-off as a function of the number of particles $N$ for the fixed $d=1$ . It is not a surprise that the error decreases, for all choices of $\epsilon$ , as the number of particles increases. However, the optimal value of $\epsilon$ , at which the error is the smallest, is relatively insensitive to changes in $N$ .

3. Fig. 2-(c) depicts the error as a function of $N$ for different values of $\epsilon$ . The dimension $d=1$ is fixed. The error goes down as $O(\frac{1}{N})$ and asymptotes to the $O(\epsilon)$ bias. The $O(\frac{1}{N})$ decay is due to the variance error obtained in Proposition 3.4 and the $O(\epsilon)$ bias error is consistent with the conclusion of Theorem 4.3.

4. Fig. 2-(d) depicts the run time comparison between the diffusion-map algorithm and the constant gain algorithm. The scaling for the diffusion-map algorithm is $O(N^{2})$ , which is significantly more expensive than the $O(N)$ scaling of the constant gain approximation.

Remark 5.1 (Selection of $\epsilon$ ).

The numerical results in Fig. 2 suggest that there is an optimal value of $\epsilon$ such that the error is smallest. Given the fact that the constant gain approximation results in the limit as $\epsilon\to\infty$ , an optimal choice of $\epsilon$ may be possible more generally. At the optimal value, one optimally trades off the errors due to variance and bias. The difficulty, of course, is that the formula for this optimal choice is not known and may not even be possible in general settings. Instead, in the literature involving kernel methods, a popular heuristic is to set $\epsilon=\frac{4\text{(med)}^{2}}{\log(N)}$ where (med) is the median value of all pairwise distances $\{|X^{i}-X^{j}|\}_{i\neq j}$ [12]. The justification is that, with such a choice, the matrix $[{g}_{\epsilon}(X^{i},X^{j})]_{i,j=1}^{N}$ is not close to the identity matrix (which represents the degenerate case).
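The median heuristic is simple to compute from the particles. A minimal sketch follows; the function name `median_heuristic_eps` is hypothetical, and the dense pairwise-distance array is an implementation choice suited to moderate $N$ .

```python
import numpy as np

def median_heuristic_eps(X):
    """Bandwidth eps = 4 * med^2 / log(N), where med is the median of the
    pairwise distances |X^i - X^j| over i != j.

    X : (N, d) array of particle locations, with N >= 2.
    """
    X = np.asarray(X, dtype=float)
    N = X.shape[0]
    diff = X[:, None, :] - X[None, :, :]
    dist = np.sqrt(np.sum(diff ** 2, axis=-1))
    # Each unordered pair is counted once; the median over i != j is the same.
    iu = np.triu_indices(N, k=1)
    med = np.median(dist[iu])
    return 4.0 * med ** 2 / np.log(N)

# Illustration (assumed data): three particles on the real line.
X_demo = np.array([[0.0], [1.0], [3.0]])
eps_demo = median_heuristic_eps(X_demo)
```

Because the rule scales with the square of the median distance, rescaling the particles by a factor $c$ rescales the bandwidth by $c^{2}$ , which keeps the kernel matrix unchanged.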

Remark 5.2.

It is worthwhile to also examine the limit as $\epsilon\to 0$ while $N$ is fixed at a finite value. In this limit, the Markov matrix ${\sf T}$ converges to the identity matrix. As a result, the solution ${\sf\Phi}$ to the fixed-point problem (31) is unbounded. However, in practice, the value of ${\sf\Phi}$ is large but finite, because the equation (31) is solved in an iterative manner with a finite number of iterations. With a finite value of ${\sf\Phi}$ and ${\sf T}$ equal to identity, the gain function given by the formula (19) is zero. Consequently, the feedback correction for each particle is zero.

5.2 Filtering example

Consider the following filtering problem:

[TABLE]

where $X_{t}\in\mathbb{R}$ , $Z_{t}\in\mathbb{R}$ , $\sigma_{W}>0$ , and $\{W_{t}\}$ is standard Brownian motion, independent of $X_{t}$ . The prior distribution $p_{0}$ is Gaussian $\mathcal{N}(0,1)$ and the observation function $h(x)=|x|$ . For the static filtering problem, the posterior distribution is explicitly given by:

[TABLE]

Three filtering algorithms are implemented for this problem: (i) the FPF algorithm with the diffusion-map gain approximation; (ii) the FPF algorithm with the constant gain approximation (similar to EnKF); (iii) a sequential importance resampling (SIR) particle filter [25]. The simulation parameters are as follows: The measurement noise $\sigma_{W}=0.1$ . The simulation is carried out for $T=500$ time-steps with step-size $\Delta t=0.001$ . Both the algorithms use $N=200$ particles with identical initialization. For the diffusion-map approximation, the kernel bandwidth was set to $\epsilon=0.1$ , and the number of iterations in Algorithm 1 is set to $L=100$ .

The numerical results are depicted in Figure 3. The distribution of the particles along with the exact posterior distribution are depicted in Figure 3-(a). It is observed that the FPF algorithm with the diffusion map approximation provides a more accurate approximation of the posterior distribution. In contrast, the constant-gain approximation fails to reproduce the bimodal nature of the posterior distribution.

A quantitative estimate of the performance is provided in terms of a mean squared error (m.s.e.) in estimating the conditional expectation of the function $\psi(x)=x\mathds{1}_{x\leq 0}$ . A Monte Carlo estimate of the m.s.e. is depicted in Figure 3-(b) with $M=100$ runs. At time $t$ , it is calculated according to

[TABLE]

At time $t=0$ , the empirical distribution of the particles is an accurate approximation of the prior distribution, because the particles are sampled i.i.d. from the prior distribution. Therefore, the m.s.e. at $t=0$ is small. As time progresses, the difference between the empirical distribution and the exact posterior becomes larger because the filter update is not exact. For FPF, as the time-step $\Delta t$ is small, the main source of the m.s.e. is the error in the gain function approximation. Therefore, the diffusion map FPF with its more accurate approximation of the gain yields better m.s.e., compared to the EnKF using the constant gain approximation. The particle filter, like FPF with diffusion map approximation, is able to capture the bi-modal distribution. However, due to the stochastic noise introduced by the resampling step, it admits larger error.

5.3 Benes filter

Consider the following filtering problem:

[TABLE]

where $\{X_{t}\},\{Z_{t}\}\in\mathbb{R}$ are one-dimensional stochastic processes, $\{B_{t}\}$ and $\{W_{t}\}$ are one-dimensional, independent, Brownian motions, $x_{0}$ is a known initial condition, and the constants $\mu,\sigma_{B},h_{1},h_{2}\in\mathbb{R}$ . This filtering problem has a finite-dimensional analytical solution given by a mixture of two Gaussians [3]:

[TABLE]

where

[TABLE]

The three filtering algorithms, as in the previous example, are also implemented and evaluated for this problem. The simulation parameters are chosen according to the values used in [16]: $\mu=0.5$ , $h_{1}=0.4$ , $h_{2}=0$ , $\sigma_{B}=0.8$ , $x_{0}=1.0$ . The simulations are carried out over the time horizon $T=10$ . The stochastic integrals are approximated with a first-order Euler scheme using the discretization step-size $\Delta t=0.01$ . For FPF with DM gain approximation, the kernel bandwidth $\epsilon$ is selected according to the rule described in Remark 5.1 and number of iterations in Algorithm 1 is $L=100$ .

The numerical results are depicted in Fig. 4. It is observed that the FPF with DM and constant gain approximations admit almost the same accuracy. The reason is that the exact bimodal posterior distribution quickly converges to an almost uni-modal distribution. This is because the weight of one of the mixture modes converges to zero. The accuracy of the SIR particle filter is poor because of the stochastic noise introduced by the resampling step.

6 Conclusions and Directions for Future Work

In this paper, the diffusion map (DM) algorithm was presented for the problem of gain function approximation in the FPF. It was shown that the approximation error converges to zero in the limit as the number of particles $N\to\infty$ and the kernel bandwidth parameter $\epsilon\to 0$ (Theorems 4.3 and 4.4). In the limit as $\epsilon\to\infty$ , the gain obtained using the DM algorithm was shown to converge to the constant gain approximation (Proposition 4.7). Consequently, in this limit, the FPF using the DM algorithm reduces to an EnKF. This is an important property because it suggests a path to improve the performance of an EnKF algorithm by choosing an appropriate (finite) value of the parameter $\epsilon$ . The bounds, scalings and the numerical experiments described in this paper provide guidance on how to choose the parameter $\epsilon$ for large but finite $N$ . Some directions for future work are as follows:

1. Relaxing the assumptions: The analysis is based on Assumption A1 which is restrictive because it does not include the mixture of Gaussians. Relaxing this assumption, possibly as suggested in Remark 2.1, is one possible avenue of future work.

2. Error analysis for the FPF: The error analysis in this paper concerns primarily the convergence of the function ${\phi}^{(N)}_{\epsilon}$ to the exact solution $\phi$ . Extending these results to include the convergence analysis of the gain ${\sf K}_{\epsilon}^{(N)}=\nabla{\phi}^{(N)}_{\epsilon}$ to the exact gain ${\sf K}=\nabla\phi$ is important for the complete error analysis of the FPF with finitely many particles.

Appendix A Exact semigroup and its diffusion map approximation for the Gaussian case

In this section, we provide explicit formulae for the exact semigroup $P_{t}$ and its diffusion map approximation $T_{\epsilon}$ , for the special case when the density $\rho$ is a Gaussian $\mathcal{N}(m,\Sigma)$ . For the Gaussian case, the semigroup is the Ornstein-Uhlenbeck semigroup [5, Sec. 2.7.1] and its spectral representation is obtained in terms of the Hermite polynomials. For notational ease, after an appropriate change of coordinates, we assume $m=0$ and $\Sigma=\text{diag}(\sigma_{1}^{2},\ldots,\sigma_{d}^{2})$ where $\sigma_{1}^{2}\geq\sigma_{2}^{2}\geq\ldots\geq\sigma_{d}^{2}>0$ are the ordered eigenvalues of $\Sigma$ .

Definition A.1.

The Hermite polynomials are recursively defined as

[TABLE]

where the prime ′ denotes the derivative.

Proposition A.2.

Suppose the density $\rho$ is Gaussian ${\cal N}(0,\Sigma)$ with the variance $\Sigma=\text{diag}(\sigma_{1}^{2},\ldots,\sigma_{d}^{2})$ and $\sigma_{1}^{2}\geq\sigma_{2}^{2}\geq\ldots\geq\sigma_{d}^{2}>0$ . Then:

(i) The exact semigroup $P_{t}$ and the diffusion map $T_{\epsilon}$ admit the following integral representations:

[TABLE]

where $\delta_{j}:=\epsilon\frac{\sigma_{j}^{2}+4\epsilon}{\sigma_{j}^{4}+3\sigma_{j}^{2}\epsilon+4\epsilon^{2}}$ for $j=1,\ldots,d$ .

The operators $P_{t}$ and $T_{\epsilon}$ each have a unique invariant Gaussian density given by $\mathcal{N}(0,\Sigma)$ and $\mathcal{N}(0,\Sigma_{\epsilon})$ , respectively, where $\Sigma_{\epsilon}=\text{diag}(\sigma_{\epsilon,1}^{2},\ldots,\sigma_{\epsilon,d}^{2})$ with $\sigma_{\epsilon,j}^{2}=\frac{2\epsilon(1-\delta_{j})}{\delta_{j}(2-\delta_{j})}$ for $j=1,\ldots,d$ .

The eigenvalues and the associated eigenfunctions are as follows:

[TABLE]

for $n=(n_{1},\ldots,n_{d})\in\mathbb{Z}_{+}^{d}$ .

The operator norm $\|P_{t}\|_{L^{2}(\rho)}=e^{-\frac{t}{\sigma_{1}^{2}}}$ and $\|{T}_{\epsilon}\|_{L^{2}(\rho_{\epsilon})}=1-\delta_{1}$ .

Proof A.3.

Omitted. See [62, Prop. 1].
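The closed-form expressions in Proposition A.2 are easy to probe numerically. The sketch below evaluates $\delta_{j}$ and $\sigma_{\epsilon,j}^{2}$ for a scalar Gaussian and compares the two operator norms at $t=\epsilon$. The values $\sigma^{2}=1$ and $\epsilon=10^{-4}$ are illustrative choices, not taken from the paper.

```python
import math


def delta(sigma2, eps):
    """delta_j from Proposition A.2 for variance sigma2 and kernel width eps."""
    return eps * (sigma2 + 4 * eps) / (sigma2 ** 2 + 3 * sigma2 * eps + 4 * eps ** 2)


def sigma2_eps(sigma2, eps):
    """Variance of the invariant Gaussian density of T_eps."""
    d = delta(sigma2, eps)
    return 2 * eps * (1 - d) / (d * (2 - d))


sigma2, eps = 1.0, 1e-4          # illustrative values
d1 = delta(sigma2, eps)
norm_T = 1 - d1                  # stated operator norm of T_eps
norm_P = math.exp(-eps / sigma2) # stated operator norm of P_t at t = eps
bias_in_variance = sigma2_eps(sigma2, eps) - sigma2
```

For small $\epsilon$ the invariant variance of $T_{\epsilon}$ approaches $\sigma^{2}$ and $1-\delta_{1}$ approaches $e^{-\epsilon/\sigma_{1}^{2}}$, which is the sense in which the diffusion map approximates the exact semigroup in the Gaussian case.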

Appendix B Proof of Proposition 2.4

Based on the use of the spectral representation Eq. 10, the weak solution of the Poisson equation is readily seen to be

[TABLE]

This solution Eq. 44 also satisfies the fixed-point equation Eq. 12 because

[TABLE]

The uniqueness of the solution to the fixed-point equation Eq. 12 follows from the contraction mapping principle because $\|P_{t}\|_{L^{2}_{0}(\rho)}=e^{-t\lambda_{1}}<1$ .
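The contraction argument can be illustrated on a single eigenmode. The sketch below assumes that the fixed-point equation Eq. 12 has the form $\phi=P_{t}\phi+\int_{0}^{t}P_{s}(h-\hat{h})\,\mathrm{d}s$, so that the coefficient along an eigenfunction with eigenvalue $\lambda$ satisfies $\phi_{m}=e^{-t\lambda}\phi_{m}+\frac{1-e^{-t\lambda}}{\lambda}h_{m}$. That form, and the numbers used, are assumptions made for illustration.

```python
import math


def fixed_point_mode(lam, h_m, t, n_iter=200):
    """Iterate the assumed scalar fixed-point map for one eigenmode."""
    contraction = math.exp(-t * lam)            # action of P_t on the mode
    forcing = (1.0 - contraction) / lam * h_m   # integral of P_s h_m over [0, t]
    phi = 0.0
    for _ in range(n_iter):
        phi = contraction * phi + forcing
    return phi


# Hypothetical mode: eigenvalue 2, coefficient of h equal to 3, step t = 0.5.
phi_m = fixed_point_mode(lam=2.0, h_m=3.0, t=0.5)
# The iteration converges to h_m / lam, the coefficient of the Poisson solution.
```

The iteration converges geometrically at rate $e^{-t\lambda}$, and the slowest mode is the one with $\lambda=\lambda_{1}$, which is the rate appearing in the uniqueness argument.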

Appendix C Derivation of the linear form of the gain Eq. 19

By a direct calculation,

[TABLE]

which evaluated at $x=X^{i}$ yields

[TABLE]

Using the definitions Eq. 18 for ${\sf K}_{\epsilon}^{(N)}$ , and Eq. 20 for $r$ and $s$ ,

[TABLE]

Appendix D Proof of Proposition 4.1


${\sf T}$ is a Markov matrix because ${\sf T}_{ij}=\frac{1}{{n}_{\epsilon}^{(N)}(X^{i})}{k}_{\epsilon}^{(N)}(X^{i},X^{j})>0$ a.s. and

[TABLE]

The stationary distribution is $\pi$ because

[TABLE]

All entries of the Markov matrix are positive. Hence the Markov chain is irreducible and aperiodic. Therefore, the stationary distribution is unique. It is reversible because

[TABLE]

Denote $\delta:=\min_{ij}{\sf T}_{ij}$ . Then $\delta>0$ a.s. Therefore, $\|{\sf T}\|_{L^{2}_{0}(\pi)}\leq 1-\frac{N\delta}{2}<1$ , and ${\sf T}$ is thus a contraction on $L^{2}_{0}(\pi)$ (see [56, Ch. 5]). It follows, from the contraction mapping principle, that the fixed-point equation Eq. 16 has a unique solution.

Evaluating the definition Eq. 17 at $x=X^{i}$ concludes ${\phi}^{(N)}_{\epsilon}(X^{i})={\sf\Phi}_{i}$ because,

[TABLE]

Therefore ${\phi}^{(N)}_{\epsilon}$ solves the fixed-point equation Eq. 30, because

[TABLE]
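The properties established in this appendix can be checked on a small example. The sketch below assumes the usual diffusion-map construction: a Gaussian kernel $g_{\epsilon}$, the symmetric kernel $k_{ij}=g_{ij}/\sqrt{\sum_{l}g_{il}\sum_{l}g_{jl}}$, the row-normalized matrix ${\sf T}$, and weights $\pi_{i}$ proportional to the row sums of $k$. It also assumes that Eq. 16 reads ${\sf\Phi}={\sf T}{\sf\Phi}+\epsilon(h-\pi(h))$. The particle locations and the choice $h(x)=x$ are hypothetical.

```python
import math


def markov_matrix(particles, eps):
    """Return the Markov matrix T and weights pi for scalar particles."""
    n = len(particles)
    # Unnormalized Gaussian kernel; its constant prefactor cancels in T.
    g = [[math.exp(-(x - y) ** 2 / (4 * eps)) for y in particles] for x in particles]
    g_sum = [sum(row) for row in g]
    # Symmetric diffusion-map kernel k_ij.
    k = [[g[i][j] / math.sqrt(g_sum[i] * g_sum[j]) for j in range(n)] for i in range(n)]
    n_eps = [sum(row) for row in k]                 # row normalizers
    T = [[k[i][j] / n_eps[i] for j in range(n)] for i in range(n)]
    total = sum(n_eps)
    pi = [v / total for v in n_eps]                 # candidate stationary distribution
    return T, pi


def matvec(T, v):
    return [sum(t * x for t, x in zip(row, v)) for row in T]


particles = [-1.5, -0.7, -0.2, 0.1, 0.6, 1.3]       # hypothetical particles
eps = 0.5
h_vals = list(particles)                            # hypothetical h(x) = x
T, pi = markov_matrix(particles, eps)
n = len(particles)

row_sum_err = max(abs(sum(row) - 1.0) for row in T)
stationary_err = max(abs(sum(pi[i] * T[i][j] for i in range(n)) - pi[j]) for j in range(n))
balance_err = max(abs(pi[i] * T[i][j] - pi[j] * T[j][i]) for i in range(n) for j in range(n))

# Fixed-point iteration Phi <- T Phi + eps (h - pi(h)), started from zero.
h_hat = sum(p * h for p, h in zip(pi, h_vals))
phi = [0.0] * n
for _ in range(2000):
    phi = [tp + eps * (h - h_hat) for tp, h in zip(matvec(T, phi), h_vals)]
residual = max(abs(p - tp - eps * (h - h_hat))
               for p, tp, h in zip(phi, matvec(T, phi), h_vals))
```

Detailed balance holds here because $\pi_{i}{\sf T}_{ij}$ is proportional to the symmetric kernel $k_{ij}$. Starting the iteration from zero keeps every iterate in $L^{2}_{0}(\pi)$, since the forcing term has zero $\pi$-mean and $\pi$ is invariant under ${\sf T}$.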

Appendix E Proof of the Proposition 3.3

Proof E.1.


Let $U=-\frac{1}{2}\log(\rho)$ and $W=|\nabla U|^{2}-\Delta U$ as defined in Eqs. 22a and 22b. To obtain the representation Eq. 23 for the semigroup $P_{t}$ , consider the unitary transformation [5, Sec. 1.15.7]:

[TABLE]

Therefore, for any function $f\in C_{b}(\mathbb{R}^{d})$ ,

[TABLE]

where the stochastic representation (second equality) follows from the Feynman-Kac formula; $B_{t}^{x}$ is a Brownian motion initialized at $x$ . Setting $f(x)=e^{-U(x)}g(x)$ ,

[TABLE]

which is the representation Eq. 23.

Next, the representation Eq. 24 is obtained. Using the definitions Eq. 13 of ${T}_{\epsilon}$ and Eqs. 22a and 22b of $U_{\epsilon}$ and $W_{\epsilon}$ ,

[TABLE]

where the final equality follows from using the stochastic representation of the heat semigroup $G_{\epsilon}$ . The representation Eq. 24 is obtained by iterating this formula $n$ times.

Without loss of generality, upon a change of coordinates, assume $m=0$ and $\Sigma=\text{diag}(\sigma_{1}^{2},\ldots,\sigma_{d}^{2})$ in Assumption A1. Using the definitions

[TABLE]

Now, $\log(\rho(x))=\log(\rho_{g}(x;\Sigma))+w(x)$ . So, the main calculation is to approximate $\log(G_{\epsilon}(\rho))$ . Using the definition

[TABLE]

where $\delta_{n}=\frac{2\epsilon}{\sigma_{n}^{2}+2\epsilon}$ , $\delta=\text{diag}(\delta_{1},\ldots,\delta_{d})$ and $G^{(\delta)}_{\epsilon}$ is the semigroup associated with the PDE $\frac{\partial}{\partial t}G^{(\delta)}_{t}f=G^{(\delta)}_{t}(\text{tr}((I-\delta)\nabla^{2}f))$ .

The Taylor expansion of $G^{(\delta)}_{\epsilon}(e^{-w})$ , about $\epsilon=0$ , is expressed as

[TABLE]

where $\partial^{2}_{n}:=\frac{\partial^{2}}{\partial x_{n}^{2}}$ , and $\nabla^{2}e^{-w}$ denotes the Hessian matrix of $e^{-w}$ .

Using the property that $G^{(\delta)}_{s}\partial_{n}f=\partial_{n}G^{(\delta)}_{s}f$ , $\|G^{(\delta)}_{t}(f)\|_{L^{\infty}}\leq\|f\|_{L^{\infty}}$ and the assumption (A1) that $w\in C^{\infty}_{b}(\mathbb{R}^{d})$ , we conclude that $r_{\epsilon}\in C^{\infty}_{b}(\mathbb{R}^{d})$ . Therefore,

[TABLE]

The asymptotic expansion of $w_{\epsilon}^{(1)}$ , as $\epsilon\to 0$ , is obtained as

[TABLE]

where the remainder term has at most linear growth as $|x|\to\infty$ .

Substituting the asymptotic expression for $\log(G_{\epsilon}\rho(x))$ in Eq. 46,

[TABLE]

where the remainder $O(\epsilon^{2})$ error term has at most quadratic growth as $|x|\to\infty$ . This concludes the proof of approximation Eq. 25a.

Based on the above calculation, the following estimate for an upper bound of the function $U$ is obtained (it is used in the proof of Proposition 4.2):

[TABLE]

where recall $\sigma_{1}^{2}=\lambda_{\text{min}}(\Sigma)$ .

Next, the approximation Eq. 25b is derived. Using the definition

[TABLE]

By repeating the steps just used to approximate $\log(G_{\epsilon}(\rho))$ , it is shown that

[TABLE]

where

[TABLE]

Therefore,

[TABLE]

where the error term has at most quadratic growth as $|x|\to\infty$ . This concludes the proof of the approximation Eq. 25b.

Based on the above calculation, the following estimate for a lower bound of the function $W_{\epsilon}$ is obtained (it is used in the proof of Proposition 4.2):

[TABLE]

where $\alpha=\frac{1}{16\sigma_{d}^{4}}$ , $\beta=\frac{1}{2}(\|\nabla w\|_{L^{\infty}}^{2}+\text{Tr}(\Sigma^{-1})+\|\Delta w\|_{L^{\infty}}+\frac{C_{2}}{8\sigma_{1}^{2}})$ and $\epsilon\leq\frac{1}{16C_{1}\sigma_{d}^{4}}$ (where recall $\sigma_{d}^{2}=\lambda_{\text{max}}(\Sigma)$ ).

Let $\tilde{P}_{t}$ denote the semigroup for the weighted Laplacian $\Delta_{q}$ with the density $q(x)=e^{-2U_{\epsilon}(x)}$ . We break the error into two parts:

[TABLE]

The bounds for the two terms on the right-hand side are derived in the following two steps:

Step 1. Using the stochastic representation Eq. 23-Eq. 24,

[TABLE]

where $\zeta_{t}:=e^{-\epsilon\sum_{k=0}^{n-1}W_{\epsilon}(B^{x}_{2k\epsilon})}-e^{-\int_{0}^{t}W(B^{x}_{2s})\,\mathrm{d}s}$ . By the Cauchy-Schwarz inequality,

[TABLE]

Next we obtain a bound for $\zeta_{t}$ . Upon using the inequality $|e^{-x}-e^{-y}|\leq e^{-\min(x,y)}|x-y|$ ,

[TABLE]

where $C=t\min(\min_{x\in\mathbb{R}^{d}}W(x),\min_{x\in\mathbb{R}^{d}}W_{\epsilon}(x))$ . Now, $C$ is finite because, as $|x|\to\infty$ , $W(x)\rightarrow\infty$ (Assumption A1) and $W_{\epsilon}(x)\rightarrow\infty$ (by Eq. 25b). As a result, by the triangle inequality, ${\sf E}[|\zeta_{t}|^{2}]^{\frac{1}{2}}\leq e^{-C}{\sf E}[|\text{(first term)}|^{2}]^{\frac{1}{2}}+e^{-C}{\sf E}[|\text{(second term)}|^{2}]^{\frac{1}{2}}$ .
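The elementary inequality $|e^{-x}-e^{-y}|\leq e^{-\min(x,y)}|x-y|$ used in this step follows from the mean value theorem: $e^{-x}-e^{-y}=-e^{-\xi}(x-y)$ for some $\xi$ between $x$ and $y$, and $e^{-\xi}\leq e^{-\min(x,y)}$. A quick numerical sanity check on random pairs, with an arbitrary sampling range, is sketched below.

```python
import math
import random

random.seed(0)


def slack(x, y):
    """Right-hand side minus left-hand side of the inequality (should be >= 0)."""
    rhs = math.exp(-min(x, y)) * abs(x - y)
    lhs = abs(math.exp(-x) - math.exp(-y))
    return rhs - lhs


# Sample pairs from an arbitrary range that includes negative exponents.
pairs = [(random.uniform(-5, 5), random.uniform(-5, 5)) for _ in range(10000)]
worst_slack = min(slack(x, y) for x, y in pairs)
```

The check is not a proof, but it confirms that the bound also holds when the exponents are negative, which matters because $W$ and $W_{\epsilon}$ are only bounded from below.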

The expectation of the first term is bounded as follows:

[TABLE]

where the second inequality follows from the bound $|W_{\epsilon}(x)-W(x)|=\epsilon|r^{(2)}_{\epsilon}(x)|\leq\epsilon(C_{1}|x|^{2}+C_{2})$ for some constants $C_{1},C_{2}$ (see Eq. 25b).

The expectation of the second term in Eq. 49 is bounded as follows:

[TABLE]

where the Taylor expansion of $W(x)$ is used to obtain the first inequality, and for the second inequality, Assumption (A1) is used to bound $|\nabla W(x)|\leq\|\Sigma^{-1}\||x|+\|\nabla w\|_{L^{\infty}}=C_{3}|x|+C_{4}$ and $\|\Delta W\|_{L^{\infty}}\leq\|\Sigma^{-1}\|+\|\Delta w\|_{L^{\infty}}=C_{5}$ .

Putting together the two expectation bounds,

[TABLE]

where $C$ is a constant that only depends on $t_{0}$ . Upon taking the $L^{2}(\rho)$ norm

[TABLE]

Step 2. Because $P_{t}$ and $\tilde{P}_{t}$ are semigroups with generators $\Delta_{\rho}$ and $\Delta_{q}$ , respectively, we have the identity: $P_{t}f-\tilde{P}_{t}f=\int_{0}^{t}P_{t-s}(\Delta_{\rho}-\Delta_{q})\tilde{P}_{s}f\,\mathrm{d}s$ . Upon taking the $L^{2}(\rho)$ norm of both sides and using the triangle inequality together with the fact that $P_{t}$ is a contraction on $L^{2}(\rho)$ ,

[TABLE]

Now,

[TABLE]

where the identity $\Delta_{\rho}f-\Delta_{q}f=2\nabla U\cdot\nabla f-2\nabla U_{\epsilon}\cdot\nabla f$ is used in the first step, the Cauchy-Schwarz inequality in the second step, and the bounds $|\nabla U_{\epsilon}(x)-\nabla U(x)|\leq\epsilon(C_{1}|x|+C_{2})$ and $\|\nabla\tilde{P}_{s}f\|_{L^{4}(q)}\leq\|\nabla f\|_{L^{4}(q)}$ in the third step.

Combining the two sets of bounds in steps 1 and 2, one obtains Eq. 26.

Appendix F Proof of the Proposition 4.2

Proof F.1.

(i) The Lyapunov condition Eq. 36a, known as DV(3) of [39], is the necessary and sufficient condition for geometric ergodicity (and in fact the stronger $U_{\epsilon}$ -uniform ergodicity) [46, Thm. 15.0.1]. The distribution $\rho_{\epsilon}$ is invariant because $\forall f\in C_{b}(\mathbb{R}^{d})$ ,

[TABLE]

(ii) The invariant density $\rho_{\epsilon}$ is reversible because $\forall f,g\in C_{b}(\mathbb{R}^{d})$

[TABLE]

The spectral gap follows from Lyapunov condition Eq. 36a and the fact that the chain is reversible [52, Thm 2.1]. The spectral gap is denoted as $\lambda$ .

(iii) The solution ${\phi}_{\epsilon}$ satisfies the bound:

[TABLE]

It remains to verify the Lyapunov condition Eq. 36a: Using Eq. 24

[TABLE]

where the second inequality follows from using the lower bound $W_{\epsilon}(x)\geq\alpha|x|^{2}-\beta$ derived in Eq. 48.

We now claim that

[TABLE]

for $m=1,\ldots,n$ where $\{\alpha_{m}\}_{m=1}^{n}$ and $\{\beta_{m}\}_{m=1}^{n}$ are defined using the recursions:

[TABLE]

Assuming for now that the claim is true

[TABLE]

An upper-bound for $\beta_{n}$ and a lower-bound for $\alpha_{n}$ are obtained as follows:

For the sequence $\{\beta_{m}\}_{m=1}^{n}$ ,

[TABLE]

For the sequence $\{\alpha_{m}\}_{m=1}^{n}$ ,

[TABLE]

Therefore,

[TABLE]

It then follows

[TABLE]

Upon using the two bounds

[TABLE]

where the second inequality follows from using the upper bound $U_{\epsilon}(x)\leq\frac{1}{8\sigma_{1}^{2}}|x|^{2}+\frac{\sigma_{1}^{2}}{8}C$ derived in Eq. 47. The following estimates are obtained for constants

[TABLE]

It remains to prove the claim Eq. 50. The constants $\alpha_{1}$ and $\beta_{1}$ for $m=1$ are easily verified by direct evaluation and for $m>1$ ,

[TABLE]

The minorization inequality Eq. 36b is obtained next. For $|x|\leq R$ :

[TABLE]

where

[TABLE]

because $\text{Prob}(\underset{s\in[0,2t]}{\sup}B_{s}\geq 10)\leq e^{-\frac{100}{2t}}\leq e^{-\frac{50}{t_{0}}}$ .

Appendix G Proof of the Theorem 4.3

Proof G.1.


The existence of the solution is proved in Proposition 4.2.

We break the error into two parts:

[TABLE]

where $\tilde{\phi}$ is the solution to the fixed point equation $\tilde{\phi}=P_{\epsilon}\tilde{\phi}+\epsilon(h-\hat{h}_{\rho})$ with the exact semigroup $P_{\epsilon}$ . The bounds for the two terms on the right-hand side are derived in the following two steps:

Step 1. Iterating the formula $\tilde{\phi}=P_{\epsilon}\tilde{\phi}+\epsilon(h-\hat{h}_{\rho})$ for $n=\lfloor\frac{1}{\epsilon}\rfloor$ times yields,

[TABLE]

and subtracting this from Eq. 35 gives

[TABLE]

This forms a (discrete) Poisson equation whose solution exists and is bounded according to Proposition 4.2:

[TABLE]

where we used $\|\cdot\|_{L^{2}(\rho_{\epsilon})}\leq C\|\cdot\|_{L^{2}(\rho)}$ in the second step. This is true because $\rho_{\epsilon}(x)=e^{-U_{\epsilon}(x)}G_{\epsilon}(e^{-U_{\epsilon}})(x)=\rho(x)e^{-3\epsilon W(x)-\epsilon\Delta V(x)+O(\epsilon^{2})}\leq C\rho(x)$ using the formula Eq. 25a.

It remains to bound the three terms inside the bracket in Eq. 51:

[TABLE]

by using the error estimates of Proposition 3.3-(iii). Therefore,

[TABLE]

Step 2. Both $\phi$ and $\tilde{\phi}$ are solutions with the exact semigroup $P_{\epsilon}$ . Using the spectral representation Eq. 10,

[TABLE]

Therefore,

[TABLE]

and thus $\|\tilde{\phi}-\phi\|_{L^{2}(\rho_{\epsilon})}\leq C\|\tilde{\phi}-\phi\|_{L^{2}(\rho)}\leq\epsilon^{2}C\|h\|_{L^{2}(\rho)}^{2}$ .

Combining the estimates from steps 1 and 2,

[TABLE]

Appendix H Proof of the Proposition 3.4

Proof H.1.

Denote $\eta_{j}=(\sqrt{\frac{({g}_{\epsilon}*\rho)(X^{j})}{\frac{1}{N}\sum_{l=1}^{N}{g}_{\epsilon}(X^{j},X^{l})}}-1)$ and express:

[TABLE]

where

[TABLE]

To prove part (i) of Proposition 3.4, the strategy is to show that as $N\to\infty$ the stochastic terms $\xi_{1}^{(N)},\xi^{(N)}_{2},\zeta_{1}^{(N)},\zeta_{2}^{(N)}$ converge to zero almost surely. We do this in two steps below: $\xi_{1}^{(N)},\xi^{(N)}_{2}$ in step 1, and $\zeta_{1}^{(N)},\zeta_{2}^{(N)}$ in step 2.

Step 1: Convergence of $\xi_{1}^{(N)}$ and $\xi_{2}^{(N)}$ follows from a direct application of the strong law of large numbers (SLLN). The SLLN applies because the summands for $\xi_{1}^{(N)}$ and $\xi_{2}^{(N)}$ are independent and identically distributed (i.i.d.) and moreover have finite variance:

[TABLE]

where we used ${g}_{\epsilon}^{2}(x,y)\leq C\epsilon^{-d/2}g_{\epsilon/2}(x,y)$ .
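For the Gaussian kernel the bound on $g_{\epsilon}^{2}$ is in fact an identity. Assuming the normalization $g_{\epsilon}(x,y)=(4\pi\epsilon)^{-d/2}e^{-|x-y|^{2}/(4\epsilon)}$ (consistent with the bound $(4\pi\epsilon)^{-d/2}$ quoted later in this proof), one finds $g_{\epsilon}^{2}(x,y)=(8\pi\epsilon)^{-d/2}g_{\epsilon/2}(x,y)$, that is, $C=(8\pi)^{-d/2}$. The sketch below checks this at one hypothetical pair of points in $d=2$.

```python
import math


def gauss_kernel(x, y, eps):
    """Assumed kernel: (4 pi eps)^(-d/2) * exp(-|x - y|^2 / (4 eps))."""
    d = len(x)
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return (4 * math.pi * eps) ** (-d / 2) * math.exp(-sq_dist / (4 * eps))


# Hypothetical evaluation points and kernel width.
x, y, eps = (0.3, -1.2), (1.0, 0.4), 0.25
d = len(x)

lhs = gauss_kernel(x, y, eps) ** 2
rhs = (8 * math.pi) ** (-d / 2) * eps ** (-d / 2) * gauss_kernel(x, y, eps / 2)
```

The factor $\epsilon^{-d/2}$ is what produces the $\epsilon^{-d/2}$ dependence in the variance bounds that follow.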

Step 2: In order to show the almost sure convergence of $\zeta_{1}^{(N)}$ and $\zeta_{2}^{(N)}$ to zero, we first show that in the limit as $N\to\infty$ ,

[TABLE]

with probability larger than $1-\delta$ for any arbitrary choice of $\delta\in(0,1)$ . Assuming for now that the claim is true, it then follows

[TABLE]

with probability larger than $1-\delta$ . The term inside the bracket converges almost surely to its limit ${\sf E}[{k}_{\epsilon}(x,X)\frac{|f(X)|}{\sqrt{{g}_{\epsilon}*\rho(X)}}]$ , by SLLN, because

[TABLE]

The proof that $\zeta_{1}^{(N)}\overset{\text{a.s.}}{\longrightarrow}0$ is completed by an application of the Borel-Cantelli lemma. Indeed, choose a sequence $\{\delta_{N}\}_{N=1}^{\infty}$ given by $\delta_{N}=\frac{1}{N^{2}}$ . Then $\sum_{N=1}^{\infty}\text{Prob}(\zeta^{(N)}_{1}>\epsilon_{N})\leq\sum_{N=1}^{\infty}\delta_{N}<\infty$ where $\epsilon_{N}=\sqrt{\frac{C\log(N^{3})}{N\epsilon^{d/2}}}$ . Because $\epsilon_{N}\to 0$ , it follows that $\zeta_{1}^{(N)}\overset{\text{a.s.}}{\to}0$ . The proof of $\zeta_{2}^{(N)}\overset{\text{a.s.}}{\to}0$ is identical.
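The two ingredients of the Borel-Cantelli argument, a summable sequence of failure probabilities and thresholds that shrink to zero, can be tabulated directly. In the sketch below the constant $C=1$, the kernel width $\epsilon=0.1$, and the dimension $d=2$ are hypothetical values chosen only for illustration.

```python
import math

C, eps, d = 1.0, 0.1, 2       # hypothetical constant, kernel width, dimension


def threshold(N):
    """eps_N = sqrt(C log(N^3) / (N eps^(d/2))), the threshold when delta_N = 1/N^2."""
    return math.sqrt(C * math.log(N ** 3) / (N * eps ** (d / 2)))


# Partial sum of the failure probabilities delta_N = 1/N^2.
partial_sum = sum(1.0 / N ** 2 for N in range(1, 100001))

# Thresholds along a growing sequence of sample sizes.
thresholds = [threshold(N) for N in (10, 10 ** 3, 10 ** 5, 10 ** 7)]
```

The partial sums stay below $\pi^{2}/6$, while the thresholds decay like $\sqrt{\log N/N}$. For a fixed $N$ the threshold grows as $\epsilon^{-d/4}$ when $\epsilon$ shrinks, which reflects the dimension dependence of the variance.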

It remains to prove the claim Eq. 54, which can be established using the Bernstein inequality as follows. We have for any $a>0$ :

[TABLE]

The random variables ${g}_{\epsilon}(X^{i},X^{j})$ are i.i.d, bounded by ${(4\pi\epsilon)^{-\frac{d}{2}}}$ , and the variance

[TABLE]

Therefore by Bernstein inequality,

[TABLE]

with probability higher than $1-\delta$ . The result is obtained by union bound for $i=1,\ldots,N$ and $\|\frac{g_{\epsilon/2}*\rho}{g_{\epsilon}*\rho}\|_{\infty}<\infty$ .

Collecting the estimates Eqs. 52, 53, and 55 and applying the Bernstein inequality yields:

[TABLE]

with probability larger than $1-4\delta$ . Therefore one obtains the bound:

[TABLE]

with probability larger than $1-4\delta$ . Squaring and integrating both sides with respect to $\rho(x)$ proves the rate:

[TABLE]
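The concentration of the empirical kernel sum $\frac{1}{N}\sum_{j}g_{\epsilon}(x,X^{j})$ around $(g_{\epsilon}*\rho)(x)$ can be observed in a scalar Gaussian example, where the convolution is available in closed form: if $\rho=\mathcal{N}(0,\sigma^{2})$ and $g_{\epsilon}$ is the heat kernel with variance $2\epsilon$, then $g_{\epsilon}*\rho$ is the $\mathcal{N}(0,\sigma^{2}+2\epsilon)$ density. The parameter values, the sample sizes, and the random seed below are illustrative assumptions.

```python
import math
import random


def g(x, y, eps):
    """Scalar heat kernel with variance 2 * eps."""
    return math.exp(-(x - y) ** 2 / (4 * eps)) / math.sqrt(4 * math.pi * eps)


def smoothed_density(x, sigma2, eps):
    """(g_eps * rho)(x) for rho = N(0, sigma2): the N(0, sigma2 + 2 eps) density."""
    var = sigma2 + 2 * eps
    return math.exp(-x * x / (2 * var)) / math.sqrt(2 * math.pi * var)


random.seed(1)
sigma2, eps, x = 1.0, 0.1, 0.0   # illustrative values
errors = {}
for N in (200, 20000):
    samples = [random.gauss(0.0, math.sqrt(sigma2)) for _ in range(N)]
    estimate = sum(g(x, s, eps) for s in samples) / N
    errors[N] = abs(estimate - smoothed_density(x, sigma2, eps))
```

For these values the standard deviation of a single summand is roughly $0.33$, so the error at $N=20000$ is expected to be of order $0.33/\sqrt{N}\approx 2\times 10^{-3}$, in line with the $N^{-1/2}$ rate.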

Appendix I Proof of the Theorem 4.4

In the proof of Theorem 4.4, the function space of interest is $C_{b}(\Omega)$ , the Banach space of continuous bounded functions on (a compact set) $\Omega\subset\mathbb{R}^{d}$ equipped with the $\|\cdot\|_{L^{\infty}(\Omega)}$ norm. Also, define the space $C_{0}(\Omega):=\{f\in C(\Omega)\mid\int f\rho_{\epsilon}=0\}$ , the subspace of functions in $C_{b}(\Omega)$ with zero mean. Consider ${T}_{\epsilon}$ and ${{T}^{(N)}_{\epsilon}}$ as linear operators from $C_{b}(\Omega)$ to $C_{b}(\Omega)$ .

Part (i) has already been proved as part of Proposition 4.1. The proof of part (ii) relies on the verification of the following three conditions:

The family of operators $\{{{T}^{(N)}_{\epsilon}}\}_{N=1}^{\infty}$ is collectively compact, as linear operators on $C_{b}(\Omega)$ .

For any function $f\in C_{b}(\Omega)$ ,

[TABLE]

The operator $(I-{T}_{\epsilon})^{-1}$ is a bounded operator on $C_{0}(\Omega)$ .

Once these three conditions have been verified, the convergence result Eq. 39 follows from a standard result in the approximation theory of the numerical solutions of integral equations [34, Thm. 7.6.6].

The proof of the three conditions is as follows:

Condition (i) holds if the set $S=\{{{T}^{(N)}_{\epsilon}}f;~{}\forall f\in C_{b}(\Omega),\|f\|_{\infty}\leq 1,N\in\mathbb{N}\}$ is relatively compact. Relative compactness follows from an application of the Arzelà-Ascoli theorem. In order to apply the Arzelà-Ascoli theorem, we need to show that $S$ is uniformly bounded and equicontinuous. The two conditions hold because

[TABLE]

for all $x,x^{\prime}\in\Omega$ and $f$ such that $\|f\|_{L^{\infty}}\leq 1$ . The detailed calculation to obtain the second inequality appears at the end of the proof.

Fix a function $f\in C_{b}(\Omega)$ . From Proposition 3.4-(i), we know that ${{T}^{(N)}_{\epsilon}}f(x)$ converges to ${T}_{\epsilon}f(x)$ almost surely pointwise for all $x\in\Omega$ . Because $\Omega$ is compact and $\{{{T}^{(N)}_{\epsilon}}f\}$ is equicontinuous, pointwise convergence implies uniform convergence Eq. 56.

From parts (i) and (ii) above, it can be concluded that ${T}_{\epsilon}$ is a compact operator. Therefore, using the Fredholm alternative theorem, in order to show $(I-{T}_{\epsilon})^{-1}$ is bounded, it is enough to show that $I-{T}_{\epsilon}$ is injective. The injectivity property is shown by contradiction. Suppose there exists a function $f\in C_{0}(\Omega)$ such that $f-{T}_{\epsilon}f=0$ . Let $x_{0}\in\Omega$ be a point that achieves the maximum of the function $f$ . Such a point exists because $f$ is continuous and $\Omega$ is compact. Evaluating $f-{T}_{\epsilon}f=0$ at $x=x_{0}$ yields

[TABLE]

Because ${k}_{\epsilon}(x_{0},y)>0$ and $f(y)\leq f(x_{0})$ , this implies $f(y)=f(x_{0})$ for all $y\in\Omega$ . Therefore, the function $f$ is a constant. But the only constant function in $C_{0}(\Omega)$ is zero. Hence $I-{T}_{\epsilon}$ is injective and its inverse $(I-{T}_{\epsilon})^{-1}$ is bounded.

It remains to prove the equicontinuity inequality Eq. 57 which is done next:

[TABLE]

where the last inequality is obtained as follows

[TABLE]

where $R=\max_{x,y\in\Omega}|x-y|$ is the diameter of $\Omega$ .

Appendix J Proof of Proposition 4.7

Consider first the finite- $N$ case. In the asymptotic limit as $\epsilon\rightarrow\infty$ , we have $(4\pi\epsilon)^{d/2}{g}_{\epsilon}(x,y)=1+O(\frac{1}{\epsilon})$ . Therefore,

[TABLE]

and

[TABLE]

It is also easy to see, e.g., by using a Neumann series solution, that in the asymptotic limit as $\epsilon\rightarrow\infty$ , the solution of the fixed-point equation Eq. 31 is given by

[TABLE]

Therefore,

[TABLE]

and using the gain approximation formula Eq. 19,

[TABLE]

The calculations for the kernel formula are entirely analogous. In the asymptotic limit as $\epsilon\rightarrow\infty$ ,

[TABLE]

and, using $\theta(x)=x$ to denote the coordinate function and $\cdot$ to denote function multiplication, the gain approximation formula Eq. 41 evaluates to

[TABLE]
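The limit in Proposition 4.7 can be observed numerically by running the finite- $N$ algorithm with a very large $\epsilon$ and comparing against the constant gain $\frac{1}{N}\sum_{i}(h(X^{i})-\hat{h}^{(N)})X^{i}$. The sketch below assumes that Eqs. 19 and 20 take the form $r={\sf\Phi}+\epsilon h$, $s_{ij}=\frac{1}{2\epsilon}{\sf T}_{ij}(r_{j}-\sum_{k}{\sf T}_{ik}r_{k})$, and ${\sf K}_{i}=\sum_{j}s_{ij}X^{j}$. Those forms, the particle locations, and the function $h$ are assumptions made for illustration.

```python
import math


def diffusion_map_gain(particles, h_vals, eps, n_iter=500):
    """Gain at each scalar particle under the assumed forms of Eqs. 16, 19, 20."""
    n = len(particles)
    g = [[math.exp(-(x - y) ** 2 / (4 * eps)) for y in particles] for x in particles]
    g_sum = [sum(row) for row in g]
    k = [[g[i][j] / math.sqrt(g_sum[i] * g_sum[j]) for j in range(n)] for i in range(n)]
    rows = [sum(r) for r in k]
    T = [[k[i][j] / rows[i] for j in range(n)] for i in range(n)]
    pi = [r / sum(rows) for r in rows]
    h_hat = sum(p * h for p, h in zip(pi, h_vals))
    # Fixed-point iteration Phi <- T Phi + eps (h - pi(h)).
    phi = [0.0] * n
    for _ in range(n_iter):
        phi = [sum(T[i][j] * phi[j] for j in range(n)) + eps * (h_vals[i] - h_hat)
               for i in range(n)]
    r = [phi[i] + eps * h_vals[i] for i in range(n)]
    gain = []
    for i in range(n):
        Tr_i = sum(T[i][j] * r[j] for j in range(n))
        # K_i = sum_j s_ij X^j with s_ij = T_ij (r_j - (T r)_i) / (2 eps)
        gain.append(sum(T[i][j] * (r[j] - Tr_i) * particles[j] for j in range(n)) / (2 * eps))
    return gain


def constant_gain(particles, h_vals):
    """Constant gain: empirical covariance of h(X) and X."""
    n = len(particles)
    h_bar = sum(h_vals) / n
    return sum((h - h_bar) * x for h, x in zip(h_vals, particles)) / n


particles = [-1.5, -0.7, -0.2, 0.1, 0.6, 1.3]     # hypothetical particles
h_vals = [x + 0.5 * x * x for x in particles]     # hypothetical h(x) = x + x^2 / 2
gains = diffusion_map_gain(particles, h_vals, eps=1e6)
k_const = constant_gain(particles, h_vals)
max_gap = max(abs(kv - k_const) for kv in gains)
```

With $\epsilon=10^{6}$ every entry of ${\sf T}$ is within $O(1/\epsilon)$ of $1/N$, so the gain is nearly identical at all particles and agrees with the constant gain up to an $O(1/\epsilon)$ correction.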

Bibliography

[1] P. M. Anselone, Collectively compact operator approximation theory and applications to integral equations, Prentice Hall, 1971.
[2] K. Atkinson, A survey of numerical methods for the solution of Fredholm integral equations of the second kind, Soc. for Industrial and Applied Mathematics, Philadelphia, PA, 1976, https://cds.cern.ch/record/107092.
[3] A. Bain and D. Crisan, Fundamentals of stochastic filtering, vol. 3, Springer, 2009, https://doi.org/10.1007/978-0-387-76896-0.
[4] D. Bakry, F. Barthe, P. Cattiaux, and A. Guillin, A simple proof of the Poincaré inequality for a large class of probability measures including the log-concave case, Electron. Commun. Probab., 13 (2008), pp. 60–66, https://doi.org/10.1214/ECP.v13-1352.
[5] D. Bakry, I. Gentil, and M. Ledoux, Analysis and geometry of Markov diffusion operators, vol. 348, Springer Science & Business Media, 2013, https://doi.org/10.1007/978-3-319-00227-9_3.
[6] M. Belkin, Problems of learning on manifolds, PhD thesis, The University of Chicago, 2003. AAI3097083.
[7] M. Belkin and P. Niyogi, Convergence of Laplacian eigenmaps, in Advances in Neural Information Processing Systems, 2007, pp. 129–136, https://doi.org/10.7551/mitpress/7503.003.0021.
[8] K. Bergemann and S. Reich, An ensemble Kalman-Bucy filter for continuous data assimilation, Meteorologische Zeitschrift, 21 (2012), pp. 213–219, https://doi.org/10.1127/0941-2948/2012/0307.


Financial support from the NSF CMMI grants 1334987 and 1462773 is gratefully acknowledged.
