Asymptotically exact unweighted particle filter for manifold-valued hidden states and point process observations

Simone Carlo Surace; Anna Kutschireiter; Jean-Pascal Pfister

arXiv:1907.10143·math.OC·November 1, 2019·IEEE Control. Syst. Lett.


TL;DR

This paper introduces an asymptotically exact particle filter for manifold-valued hidden states with point process observations, utilizing intrinsic dynamics and PDE-based control terms to improve filtering accuracy.

Contribution

It develops a novel filter (ppFPF) that extends feedback particle filtering to manifolds with point process data, using PDE solutions for control, ensuring intrinsic and accurate state estimation.

Findings

01

The filter accurately updates particles on manifolds during observations.

02

It leverages PDE solutions similar to weighted Poisson equations for control.

03

The method is compatible with existing PDE approximation algorithms.

Abstract

The filtering of a Markov diffusion process on a manifold from counting process observations leads to ‘large’ changes in the conditional distribution upon an observed event, corresponding to a multiplication of the density by the intensity function of the observation process. If that distribution is represented by unweighted samples or particles, they need to be jointly transformed such that they sample from the modified distribution. In previous work, this transformation has been approximated by a translation of all the particles by a common vector. However, such an operation is ill-defined on a manifold, and on a vector space, a constant gain can lead to a wrong estimate of the uncertainty over the hidden state. Here, taking inspiration from the feedback particle filter (FPF), we derive an asymptotically exact filter (called ppFPF) for point process observations, whose particles…


Equations

$$dX_{t}^{i}=V_{0}^{i}\,dt+V_{j}^{i}\circ dB_{t}^{j} \tag{1}$$

$$\mathscr{A}=V_{0}+\frac{1}{2}\sum_{j=1}^{r}V_{j}^{2} \tag{2}$$

$$dY_{t}=h(X_{t})\,dt+dW_{t} \tag{3}$$

$$d\mu_{t}=(\mathscr{A}^{\dagger}\mu_{t})\,dt+(h-\hat{h}_{t})\mu_{t}\,(dY_{t}-\hat{h}_{t}\,dt) \tag{4}$$

$$\int_{M}\varphi\,\mathscr{A}^{\dagger}\mu=\int_{M}(\mathscr{A}\varphi)\,\mu \tag{5}$$

$$dS_{t}=V_{0}\,dt+V_{j}\circ dZ_{t}^{j}+K_{t}\circ dY_{t}+\Omega_{t}\,dt \tag{6}$$

$$\begin{aligned}d\int_{M}\varphi\,\bar{\mu}_{t}&=\int_{M}\big(\mathscr{A}\varphi\,dt+K_{t}\varphi\circ dY_{t}+\Omega_{t}\varphi\,dt\big)\bar{\mu}_{t}\\&=\int_{M}\varphi\,\big(\mathscr{A}^{\dagger}\bar{\mu}_{t}\,dt-\mathscr{L}_{K_{t}}\bar{\mu}_{t}\circ dY_{t}-\mathscr{L}_{\Omega_{t}}\bar{\mu}_{t}\,dt\big)+\text{boundary terms}\end{aligned} \tag{7}$$

$$d\bar{\mu}_{t}=\big(\mathscr{A}^{\dagger}\bar{\mu}_{t}-\mathscr{L}_{\Omega_{t}}\bar{\mu}_{t}+\tfrac{1}{2}\mathscr{L}_{K_{t}}^{2}\bar{\mu}_{t}\big)\,dt-(\mathscr{L}_{K_{t}}\bar{\mu}_{t})\,dY_{t} \tag{8}$$

$$\mathscr{L}_{K_{t}}\mu_{t}=-(h-\hat{h}_{t})\mu_{t} \tag{9}$$

$$\mathscr{L}_{\Omega_{t}}\mu_{t}=\tfrac{1}{2}\big(h^{2}-\hat{h}_{t}^{2}-K_{t}h\big)\mu_{t} \tag{10}$$

$$\Omega_{t}=-\tfrac{1}{2}(h+\hat{h}_{t})K_{t} \tag{11}$$

$$\text{div}_{\text{vol}_{g}}(p_{t}\operatorname{\text{grad}}\phi_{t})=-(h-\hat{h}_{t})p_{t} \tag{12}$$

$$K^{\text{CG}}=\int_{\mathds{R}^{n}}(\operatorname{\text{grad}}\phi)\,\mu=\int_{\mathds{R}^{n}}x\,(h(x)-\hat{h})\,\mu(dx) \tag{13}$$

$$\mathcal{E}(\mu,\varphi):\quad\mathscr{L}_{V}\mu=-\Big(\varphi-\int_{M}\varphi\,\mu\Big)\mu \tag{14}$$

$$d\mu_{t}=(\mathscr{A}^{\dagger}\mu_{t})\,dt+\Big(\frac{h}{\hat{h}_{t^{-}}}-1\Big)\mu_{t^{-}}\,(dN_{t}-\hat{h}_{t}\,dt) \tag{15}$$

$$dS_{t}=V_{0}\,dt+V_{j}\circ dZ_{t}^{j}+\Omega_{t}\,dt \tag{16}$$

$$d\bar{\mu}_{t}=\big(\mathscr{A}^{\dagger}\bar{\mu}_{t}-\mathscr{L}_{\Omega_{t}}\bar{\mu}_{t}\big)\,dt \tag{17}$$

$$\mathscr{L}_{\Omega_{t}}\mu_{t}=(h-\hat{h}_{t})\mu_{t} \tag{18}$$

$$\mu_{t^{-}}\mapsto\mu_{t}=\frac{h}{\hat{h}_{t^{-}}}\mu_{t^{-}} \tag{19}$$

$$(T_{t^{-}})_{*}\mu_{t^{-}}=\frac{h}{\hat{h}_{t^{-}}}\mu_{t^{-}} \tag{20}$$

$$\tilde{\mu}_{t,s}=\frac{h^{s}\mu_{t^{-}}}{\int_{M}h^{s}\mu_{t^{-}}},\quad 0\leq s\leq 1 \tag{21}$$

$$\mathscr{L}_{V_{t,s}}\tilde{\mu}_{t,s}=-\frac{d}{ds}\tilde{\mu}_{t,s}=-\Big(\log h-\int_{M}(\log h)\,\tilde{\mu}_{t,s}\Big)\tilde{\mu}_{t,s} \tag{22}$$

$$dS_{t}=V_{0}\,dt+V_{j}\circ dZ_{t}^{j}+\Omega_{t}\,dt \tag{23}$$

$$\text{event at time }t:\quad S_{t}=T_{t^{-}}(S_{t^{-}}) \tag{24}$$


Full text

Simone Carlo Surace†, Anna Kutschireiter†,∗, Jean-Pascal Pfister†,∘

†Department of Physiology, University of Bern, Switzerland. ∗Department of Neurobiology, Harvard Medical School, Boston MA, USA. ∘Institute of Neuroinformatics, University and ETH Zurich, Switzerland. This work was supported by the Swiss National Science Foundation, grant PP00P3_179060. Corresponding author: [email protected]

Abstract

The filtering of a Markov diffusion process on a manifold from counting process observations leads to ‘large’ changes in the conditional distribution upon an observed event, corresponding to a multiplication of the density by the intensity function of the observation process. If that distribution is represented by unweighted samples or particles, they need to be jointly transformed such that they sample from the modified distribution. In previous work, this transformation has been approximated by a translation of all the particles by a common vector. However, such an operation is ill-defined on a manifold, and on a vector space, a constant gain can lead to a wrong estimate of the uncertainty over the hidden state. Here, taking inspiration from the feedback particle filter (FPF), we derive an asymptotically exact filter (called ppFPF) for point process observations, whose particles evolve according to intrinsic (i.e. parametrization-invariant) dynamics that are composed of the dynamics of the hidden state plus additional control terms. While not sharing the gain-times-error structure of the FPF, the optimal control terms are expressed as solutions to partial differential equations analogous to the weighted Poisson equation for the gain of the FPF. The proposed filter can therefore make use of existing approximation algorithms for solutions of weighted Poisson equations.

Index Terms:

Filtering, Estimation, Stochastic systems, Mean field games, Stochastic optimal control

I Introduction

A large number of natural and engineered systems and datasets have states that are naturally described as elements of smooth manifolds. Classical cases are the motion of a body subject to equality constraints, motion on the surface of the earth, or the attitude of a rigid body. Increasingly, the systems are very high-dimensional, whereas data points often lie on relatively low-dimensional manifolds, whose structure can be exploited for filtering and estimation problems.

In filtering, the state of the system (called the hidden state) needs to be estimated from the history of observations. In practice, observations often arrive sparsely, randomly and in digital form. One example is when observations are simple event counts. Such counting or point process observations arise in a variety of applications of time series models, e.g. neuroscience, geosciences, or finance.

The exact solution of the filtering problem is intractable in most cases and requires numerical approximation. One approach has been the class of interacting particle algorithms, in which an unweighted ensemble of $N$ particles is propagated based on the known dynamics of the hidden state and the incoming observations. The feedback particle filter (FPF) [1], [2] is such an algorithm that is based on mean-field optimal control, with a gain $\times$ error structure that is reminiscent of the Kalman filter. The gain is given by the solution of a partial differential equation (PDE), which makes the FPF exact in the limit of large $N$ even for nonlinear problems. Although in practice the gain has to be estimated from the particles, unweighted approaches hold the promise of scaling to high-dimensional problems, in contrast to particle algorithms with importance weights [3].

In this paper, we consider the problem of finding an FPF-like algorithm for systems whose hidden states evolve continuously in time on a known smooth manifold and whose observations are given by a conditional Poisson process. The FPF for manifold-valued hidden states and diffusion observations has been introduced in [4]. A filter for a hidden state in $\mathds{R}^{n}$ and point process observations, called EKSPF, was introduced in [5]. While it is reminiscent of the FPF, having a gain $\times$ error structure, it uses a constant gain. As a result, the filter is exact only to first order and does not properly reflect higher-order statistics. For example, when particles are initially spread out and an incoming event confers evidence that the hidden state is in some narrow region of the state space, we should find the updated particles concentrated in that region. However, upon an event the EKSPF translates all particles by the same vector (see Figure 1).

The reliance on this uniform translation also leads to difficulties in extending the EKSPF to hidden states evolving on a manifold. In fact, when the EKSPF is applied naïvely on some arbitrary chart of the manifold, filtering performance can be poor (see Section IV for an example). This is because the meaning of a ‘translation’ is fundamentally ill-defined on a manifold. Since a translation in coordinate chart $A$ does not necessarily correspond to a translation in coordinate chart $B$ , the performance of the EKSPF depends on the choice of coordinates. However, the filtering problem on a manifold is intrinsic, i.e. independent of the choice of coordinates. It would therefore be desirable for a particle filter, and the transformation of particles in particular, to be defined in a coordinate-independent way. This would be advantageous even if the state space carries additional structure, such as the vector space structure on $\mathds{R}^{n}$ . A large class of estimation problems in $\mathds{R}^{n}$ , such as e.g. satellite tracking, are naturally described in curvilinear coordinates.

For infinitesimal motion of particles, the notion of constancy of a vector field (as we will explain in the next section, the control terms in the FPF can be viewed as vector fields), and thus of a constant gain approximation, depends on additional structure on the manifold, namely a connection: a mathematical structure that prescribes how to parallel transport a vector between different points. This can be visualised for the example of the unit circle $S^{1}$ that (regarded as a smooth manifold) can be embedded in different ways in, say, $\mathds{R}^{2}$ (see Figure 1). If the constancy of a tangent vector field is made to depend on the embedding, then we obtain different vector fields for different embeddings. On many manifolds, there are no nontrivial parallel vector fields, which precludes the choice of a nontrivial constant gain. While this problem also affects a constant gain approximation of the FPF gain, the problem can be circumvented by seeking a non-constant gain estimate. Meanwhile, the constant gain assumption is ‘baked’ into the EKSPF.

In this paper, we derive an exact FPF-like filter on a manifold for point process observations, called ppFPF, from first principles, addressing the limitations of a constant gain in the EKSPF. The result is a filter whose control terms are given by solutions of PDEs analogous to the Poisson equation for the gain of the FPF. However, the gain $\times$ error structure of the FPF is not strictly preserved. Instead, for the conceptual reasons stated above, the control term associated to an event is fundamentally distinct and treated separately from the term in-between events.

The remainder of the paper is structured as follows: in Section II, we introduce the mathematical notation, review the filtering problem for the Gaussian white noise observation case, and re-derive the FPF in the manifold setting, making some observations regarding the symmetry of the problem. In Section III, we present our main contribution: we derive the ppFPF, which is an adaptation of the FPF to point-process observations. In Section IV, we present numerical examples that illustrate the differences in performance and uncertainty quantification (UQ) between the ppFPF and other filters.


II Preliminaries and background

II-A Notations and conventions

Tangent vectors at a point $p\in M$ are written in a local chart as $a^{i}\partial_{i}|_{p}$ , where Einstein’s summation convention is used. A vector field $X\in\text{Vect}(M)$ is a smooth section of the tangent bundle $TM$ and is written locally as a first-order differential operator $X^{i}\partial_{i}$ . The Lie derivative with respect to the vector field $V$ is denoted by $\mathscr{L}_{V}$ and acts on sections of tensor product bundles of $TM$ . If $\varphi\in C^{\infty}(M)$ , then its differential $d\varphi$ is a one-form or smooth section of the cotangent bundle $T^{*}M$ . More generally, a differential form of degree $k$ is a smooth section of $\Omega^{k}(M):=\bigwedge^{k}T^{*}M$ , where the wedge denotes the exterior product. Top degree forms are elements of $\Omega^{n}(M)$ , where $n$ is the dimension of $M$ . A nowhere-vanishing element of $\Omega^{n}(M)$ is an orientation; if such an element exists then $M$ is called orientable, and we can then distinguish positive top degree forms, which we call volume forms. Normalized volume forms will be used to describe smooth nowhere-vanishing distributions on $M$ . The letter $d$ is used for exterior derivatives on differential forms $\omega\in\Omega^{k}(M)$ as $d\omega$ , and for stochastic differentials on stochastic processes $X_{t}$ as $dX_{t}$ . The interior derivative on $\omega\in\Omega^{k}(M)$ wrt. $X\in\text{Vect}(M)$ is written as $i_{X}\omega$ . The notation $\mathscr{F}^{Y}_{t}$ is used for the filtration generated by the process $(Y_{t})_{t\geq 0}$ .

II-B Filtering problem and filtering equations

We consider a filtering problem in which the hidden state $X_{t}$ evolves as a Markov diffusion process on an $n$ -dimensional manifold $M$ (to avoid further complications, we assume $M$ to be connected and orientable), described by a Stratonovich stochastic differential equation (SDE) of the form

$$dX_{t}^{i}=V_{0}^{i}\,dt+V_{j}^{i}\circ dB_{t}^{j} \tag{1}$$

in local coordinates, where $B^{1},...,B^{r}$ are mutually independent standard Brownian motions (we use Einstein’s summation convention). We will use the index-free notation $dX_{t}=V_{0}dt+V_{j}\circ dB^{j}_{t}$ for such an SDE on $M$ . This SDE corresponds to an infinitesimal generator

$$\mathscr{A}=V_{0}+\frac{1}{2}\sum_{j=1}^{r}V_{j}^{2}, \tag{2}$$

where $V_{0},V_{1},...,V_{r}$ are vector fields on $M$ . This is a second-order differential operator, which can be expressed in local coordinates as $\mathscr{A}=V_{0}^{i}\partial_{i}+\tfrac{1}{2}\sum_{k=1}^{r}V_{k}^{i}\partial_{i}V_{k}^{j}\partial_{j}$ .
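A Stratonovich SDE of the form in Eq. (1) can be simulated within a single chart using the stochastic Heun (predictor-corrector) scheme, which converges to the Stratonovich rather than the Itô solution. The following is a minimal sketch in one dimension with a single noise term, not the authors' implementation. The function name `heun_stratonovich` and the test problem ($V_{0}=0$, $V_{1}(x)=x$, whose Stratonovich solution is $X_{t}=X_{0}e^{B_{t}}$) are assumptions chosen for illustration.

```python
import numpy as np

def heun_stratonovich(x0, V0, V1, dt, dW):
    """Integrate dX = V0(X) dt + V1(X) o dB (Stratonovich) in one chart.

    x0 : initial coordinate, dt : time step,
    dW : array of Brownian increments, each with variance dt.
    """
    x = x0
    for dw in dW:
        # Predictor: plain Euler step using the current coefficients.
        x_pred = x + V0(x) * dt + V1(x) * dw
        # Corrector: average the coefficients at both ends of the step.
        x = x + 0.5 * (V0(x) + V0(x_pred)) * dt + 0.5 * (V1(x) + V1(x_pred)) * dw
    return x

# Hypothetical test problem: V0 = 0, V1(x) = x, exact solution X_T = X_0 exp(B_T).
rng = np.random.default_rng(0)
dt = 1e-3
dW = np.sqrt(dt) * rng.standard_normal(1000)
x_T = heun_stratonovich(1.0, lambda x: 0.0, lambda x: x, dt, dW)
x_exact = np.exp(dW.sum())
```

An Euler-Maruyama step applied to the same coefficients would instead approximate the Itô SDE, which for $V_{1}(x)=x$ differs from the Stratonovich solution by the drift $\tfrac{1}{2}x\,dt$.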

The classical observation model in nonlinear filtering is a diffusion process with additive noise, also referred to as observations in Gaussian white noise, i.e.

$$dY_{t}=h(X_{t})\,dt+dW_{t}, \tag{3}$$

where $W_{t}$ is a Brownian motion independent of $X_{t}$ . Although the present paper is concerned with point process observations, in order to explain the background of this paper this section will focus exclusively on the model in Eq. (3). Later, in Section III, we shall consider point process observations, adapting an approach that has been used in the case of Gaussian white noise.

Probability distributions over the manifold $M$ will be described by positive top-degree forms $\mu$ (volume forms) that integrate to one, i.e. $\int\displaylimits_{M}\mu=1$ . This convention avoids the superfluous appearance of a reference measure on $M$ , and therefore emphasizes the metric-independent nature of the filtering problem. Of course, for concreteness, it is always possible to pick a reference volume form $\lambda$ (for example, take the riemannian volume measure with respect to some riemannian metric on $M$ , e.g. the Lebesgue measure for $M=\mathds{R}^{n}$ ), and then to express $\mu$ in terms of a density $p$ as $\mu=p\lambda$ .

If the distribution of $X_{0}$ is described in terms of a volume form $\mu_{0}$ , the conditional distribution $\mu_{t}$ of $X_{t}$ , given observations $\mathscr{F}^{Y}_{t}$ , evolves according to the equation

$$d\mu_{t}=(\mathscr{A}^{\dagger}\mu_{t})\,dt+(h-\hat{h}_{t})\mu_{t}\,(dY_{t}-\hat{h}_{t}\,dt), \tag{4}$$

where $\hat{h}_{t}=\int\displaylimits_{M}h\mu_{t}$ and $\mathscr{A}^{{\dagger}}$ is the adjoint of $\mathscr{A}$ with respect to the dual pairing $\langle\mu,\varphi\rangle$ of volume forms and smooth functions, i.e. for all bounded $\varphi\in C^{\infty}(M)$ and all volume forms $\mu$ we have

$$\int_{M}\varphi\,\mathscr{A}^{\dagger}\mu=\int_{M}(\mathscr{A}\varphi)\,\mu. \tag{5}$$

Eq. (4) is known as the Kushner-Stratonovich equation, see e.g. [6].

II-C Unweighted particle filters

In unweighted particle filtering, the goal is to find a Monte-Carlo approximation of $\mu_{t}$ , i.e. for any $N=1,2,...$ , the objective is to find processes $S^{(i)}_{t}$ , $i=1,...,N$ , called particles such that $\mu_{t}\approx\frac{1}{N}\sum_{i=1}^{N}\delta_{S^{(i)}_{t}}$ . The processes $S^{(i)}_{t}$ should be adapted to $\mathscr{F}^{N,Z}_{t}$ , where $Z$ is a vector-valued process independent of $X$ and $N$ that can capture additional noise in the particle dynamics. Usually, one is interested in ‘symmetric’ particle representations in which all $S^{(i)}_{t}$ have identical distributions. The problem thus is to specify dynamics for a representative process $S_{t}$ that depend on the particle ensemble.

II-D Feedback particle filter

For Gaussian white noise observations, a recipe for building such a particle filter is known. Let us briefly review the derivation of the feedback particle filter (FPF) [2] (see [4] for the manifold setting). The FPF uses particle dynamics given by the prior dynamics plus a feedback control term $dU_{t}$ that is chosen such that the Fokker-Planck equation for a single particle gives the same change in distribution as the filtering equation. An ansatz of $dU_{t}=K_{t}\circ dY_{t}+\Omega_{t}dt$ gives

$$dS_{t}=V_{0}\,dt+V_{j}\circ dZ_{t}^{j}+K_{t}\circ dY_{t}+\Omega_{t}\,dt, \tag{6}$$

where $Z_{t}^{j}$ is an independent copy of $B_{t}^{j}$ . A corresponding equation for the conditional distribution of $S_{t}$ given $\mathscr{F}^{Y}_{t}$ , denoted by $\bar{\mu}_{t}$ , can be derived by an integration-by-parts argument using Lie derivatives:

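$$\begin{aligned}d\int_{M}\varphi\,\bar{\mu}_{t}&=\int_{M}\big(\mathscr{A}\varphi\,dt+K_{t}\varphi\circ dY_{t}+\Omega_{t}\varphi\,dt\big)\bar{\mu}_{t}\\&=\int_{M}\varphi\,\big(\mathscr{A}^{\dagger}\bar{\mu}_{t}\,dt-\mathscr{L}_{K_{t}}\bar{\mu}_{t}\circ dY_{t}-\mathscr{L}_{\Omega_{t}}\bar{\mu}_{t}\,dt\big)+\text{boundary terms}.\end{aligned} \tag{7}$$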

In the first line, the Stratonovich chain rule is used. In the second line, directional derivatives are replaced by Lie derivatives (on smooth functions, the Lie derivative agrees with the directional derivative, i.e. $\mathscr{L}_{X}\varphi=X\varphi=d\varphi(X)$ for all $\varphi\in C^{\infty}(M),X\in\text{Vect}(M)$ ), and we performed integration by parts, reducing exact top-degree forms to boundary terms using Stokes’ theorem. It is customary to demand that $K,\Omega$ be tangent to the boundary of $M$ (if $\partial M$ is nonempty), or even completely vanish on $\partial M$ . This assumption implies $i_{K}\mu=0$ on $\partial M$ , such that the boundary terms can be discarded. After switching back to Itô calculus, one obtains

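$$d\bar{\mu}_{t}=\big(\mathscr{A}^{\dagger}\bar{\mu}_{t}-\mathscr{L}_{\Omega_{t}}\bar{\mu}_{t}+\tfrac{1}{2}\mathscr{L}_{K_{t}}^{2}\bar{\mu}_{t}\big)\,dt-(\mathscr{L}_{K_{t}}\bar{\mu}_{t})\,dY_{t}. \tag{8}$$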

Matching the terms of Eq. (8) with Eq. (4) (conditioned on $\bar{\mu}_{t}=\mu_{t}$ ) leads to the system of equations

$$\mathscr{L}_{K_{t}}\mu_{t}=-(h-\hat{h}_{t})\mu_{t}, \tag{9}$$

$$\mathscr{L}_{\Omega_{t}}\mu_{t}=\tfrac{1}{2}\big(h^{2}-\hat{h}_{t}^{2}-K_{t}h\big)\mu_{t}, \tag{10}$$

where $K_{t}h=dh(K_{t})=i_{K_{t}}dh$ denotes the directional derivative of $h$ in the direction of the vector field $K_{t}$ , whereas $hK_{t}$ is the vector field $K_{t}$ scaled point-wise by the function $h$ .

Given a vector field $K_{t}$ solving Eq. (9), called a gain for the FPF, setting

$$\Omega_{t}=-\tfrac{1}{2}(h+\hat{h}_{t})K_{t} \tag{11}$$

gives an associated solution to Eq. (10). (This can be shown by using Cartan’s magic formula and the graded product rule for the interior derivative, or simply by observing that $\mathscr{L}_{\varphi X}\mu=\varphi\mathscr{L}_{X}\mu+(X\varphi)\mu$ for all $\varphi\in C^{\infty}(M),X\in\text{Vect}(M)$ , and $\mu\in\Omega^{n}(M)$ .)
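As a consistency check, the computation below verifies that $\Omega_{t}=-\tfrac{1}{2}(h+\hat{h}_{t})K_{t}$ reproduces the $dt$-terms of Eq. (4) when inserted into Eq. (8). It assumes that the gain satisfies $\mathscr{L}_{K_{t}}\mu_{t}=-(h-\hat{h}_{t})\mu_{t}$ (the form quoted in Section II-E for the gradient gain) and uses only the product rule $\mathscr{L}_{\varphi X}\mu=\varphi\mathscr{L}_{X}\mu+(X\varphi)\mu$ together with $K_{t}\hat{h}_{t}=0$. The grouping of terms is ours and is meant as a sketch of the argument.

```latex
\begin{align*}
\mathscr{L}_{\Omega_t}\mu_t
 &= -\tfrac12 (h+\hat h_t)\,\mathscr{L}_{K_t}\mu_t - \tfrac12 (K_t h)\,\mu_t
  = \tfrac12\big(h^2-\hat h_t^2 - K_t h\big)\mu_t, \\
\tfrac12\mathscr{L}_{K_t}^2\mu_t
 &= -\tfrac12\mathscr{L}_{K_t}\big((h-\hat h_t)\mu_t\big)
  = \tfrac12\big((h-\hat h_t)^2 - K_t h\big)\mu_t, \\
-\mathscr{L}_{\Omega_t}\mu_t + \tfrac12\mathscr{L}_{K_t}^2\mu_t
 &= \tfrac12\big((h-\hat h_t)^2 - (h^2 - \hat h_t^2)\big)\mu_t
  = -\hat h_t\,(h-\hat h_t)\,\mu_t .
\end{align*}
```

The last line is exactly the coefficient of $dt$ in Eq. (4) besides $\mathscr{A}^{\dagger}\mu_{t}$, namely $(h-\hat{h}_{t})\mu_{t}\cdot(-\hat{h}_{t})$.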

II-E Uniqueness, approximation, and estimation of the gain

The solutions of Eqs. (9) and (10) are not unique, as any pair $(K_{t},\Omega_{t})$ of solutions can be modified by adding an arbitrary divergence-free vector field $V$ , i.e. such that $\mathscr{L}_{V}\mu_{t}=0$ . (The divergence of a vector field $V$ with respect to a volume form $\mu$ is the function $\text{div}_{\mu}V$ defined implicitly by $\mathscr{L}_{V}\mu=(\text{div}_{\mu}V)\mu$ . Using Cartan’s magic formula and the fact that $d\mu=0$ , the divergence can also be written as $\text{div}_{\mu}V=\frac{di_{V}\mu}{\mu}$ . It follows that for $f>0$ we have $f\text{div}_{f\mu}V=\text{div}_{\mu}(fV)=df(V)+f\text{div}_{\mu}V$ .) Uniqueness can be obtained by fixing a riemannian metric $g$ , and then demanding that the gain take the form $K_{t}=\operatorname{\text{grad}}\phi_{t}$ . This leads to the equation $\mathscr{L}_{\operatorname{\text{grad}}\phi_{t}}\mu_{t}=-(h-\hat{h}_{t})\mu_{t}$ . Moreover, if $\text{vol}_{g}$ denotes the riemannian volume form and $\mu_{t}$ is expressed in terms of the density $p_{t}$ as $\mu_{t}=p_{t}\text{vol}_{g}$ , Eq. (9) reduces to a (weighted) Poisson equation

$$\text{div}_{\text{vol}_{g}}(p_{t}\operatorname{\text{grad}}\phi_{t})=-(h-\hat{h}_{t})p_{t}. \tag{12}$$

Existence and uniqueness of a solution is guaranteed under mild assumptions on $p_{t}$ and $h$ (see [7], Theorem 2.2), and $K_{t}=\operatorname{\text{grad}}\phi_{t}$ minimizes the functional $K\mapsto\int\displaylimits_{M}g(K,K)\mu$ among all solutions of Eq. (9) (see Lemma 8.4.2 in [8]). In the case $M=\mathds{R}^{n}$ , Euclidean $g$ , Gaussian $p_{t}$ , and linear $h$ , this gain reduces to the Kalman gain.
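The reduction to the Kalman gain can be checked numerically in one dimension, where the weighted Poisson equation becomes $(p\,K)'=-(h-\hat{h})p$ with $K=\phi'$ and can be integrated once from the left. The sketch below is an illustration under assumptions of our own: a uniform grid, the Euclidean metric on $\mathds{R}$, a density that is negligible at the left grid edge, and the hypothetical helper name `fpf_gain_1d`. For Gaussian $p$ with variance $\sigma^{2}$ and $h(x)=cx$, the result should be close to the constant $c\sigma^{2}$.

```python
import numpy as np

def fpf_gain_1d(x, p, h):
    """Solve (p K)' = -(h - h_hat) p on a uniform grid, with p K = 0 at the left edge.

    x : uniform grid, p : density values on x, h : observation function values on x.
    Returns the gain K on the grid and the estimate h_hat.
    """
    dx = x[1] - x[0]
    h_hat = np.sum(h * p) * dx
    rhs = -(h - h_hat) * p
    # Cumulative trapezoidal integral of the right-hand side from the left edge.
    flux = np.concatenate(([0.0], np.cumsum(0.5 * (rhs[1:] + rhs[:-1]) * dx)))
    return flux / p, h_hat

# Assumed test case: Gaussian density N(m, sigma^2) and linear h(x) = c x.
m, sigma, c = 1.0, 0.5, 2.0
x = np.linspace(m - 8 * sigma, m + 8 * sigma, 4001)
p = np.exp(-0.5 * ((x - m) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))
K, h_hat = fpf_gain_1d(x, p, c * x)
core = np.abs(x - m) < 2 * sigma   # region where p is not tiny
```

Far in the tails the quotient `flux / p` divides two very small numbers and becomes unreliable, which is why the comparison is restricted to the `core` region.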

Sometimes it is desirable to approximate the vector field $K_{t}=\operatorname{\text{grad}}\phi_{t}$ , where $\phi_{t}$ solves Eq. (12), by a constant. As mentioned in the introduction, in order to define the notion of constancy on a manifold, an additional structure $\nabla$ , called a connection, has to be defined. One may choose the Levi-Civita connection corresponding to some (already given) $g$ , but other choices are possible. A constant gain $K^{\text{CG}}$ can then be defined as the minimum of $\left\|K-\operatorname{\text{grad}}\phi\right\|^{2}$ over all parallel $K$ (i.e. $\nabla K=0$ ). For example, when $M=\mathds{R}^{n}$ , $g$ is the Euclidean metric, and $\nabla$ its Levi-Civita connection,

$$K^{\text{CG}}=\int_{\mathds{R}^{n}}(\operatorname{\text{grad}}\phi)\,\mu=\int_{\mathds{R}^{n}}x\,(h(x)-\hat{h})\,\mu(dx). \tag{13}$$

The right-hand representation is obtained by multiplying Eq. (12) by $x$ , integrating by parts, and using $\operatorname{\text{grad}}x_{i}=\partial_{x_{i}}$ . Eq. (13) is convenient because its right-hand side can be estimated from a sample, but on some manifolds, topological obstructions make this approach infeasible. On $S^{1}$ with the standard metric and connection, a nonzero constant vector field cannot be the gradient of a smooth function. Insisting and performing the calculation on a chart leads to $K^{\text{CG}}=\int_{0}^{2\pi}\theta(h(\theta)-\hat{h})p(\theta)d\theta+2\pi K(0)p(0)$ . It is unclear how to estimate the additional term, which depends on the exact gain. In other cases the situation is still worse: many manifolds with connection do not have any nontrivial parallel vector fields (a common example is $S^{2}$ with its standard connection).
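The sample estimate of the constant gain on $\mathds{R}^{n}$ mentioned above amounts to an empirical covariance between the particle positions and the observation function. The following sketch assumes particles stored as an array of shape `(N, n)` and a vectorized observation function; the name `constant_gain_estimate` is hypothetical.

```python
import numpy as np

def constant_gain_estimate(samples, h):
    """Monte-Carlo estimate of K^CG = E[x (h(x) - h_hat)] on R^n.

    samples : array of shape (N, n), assumed i.i.d. draws from mu.
    h       : callable mapping an (N, n) array to N observation-function values.
    """
    h_vals = h(samples)
    centered = h_vals - h_vals.mean()          # h(x_i) - h_hat
    return np.mean(samples * centered[:, None], axis=0)

# Assumed test case: x ~ N(1, 0.5^2) in R^1 and h(x) = 2 x, for which the
# constant gain equals c * sigma^2 = 0.5.
rng = np.random.default_rng(0)
x = 1.0 + 0.5 * rng.standard_normal((200_000, 1))
K_cg = constant_gain_estimate(x, lambda s: 2.0 * s[:, 0])
```

No analogous chart-free formula is available on $S^{1}$ or $S^{2}$, for the reasons given in the paragraph above.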

In practice, the gain $K_{t}=\operatorname{\text{grad}}\phi_{t}$ has to be estimated from a finite number of particles $S_{t}^{(i)}\in M$ , $i=1,...,N$ , thought to be i.i.d. samples from $\mu_{t}$ . If only the gain at the particle locations is needed, we denote the mapping particles $\to$ gains by $\mathbf{K}_{t}=\mathfrak{G}(\mathbf{S}_{t},h)$ , where $\mathbf{K}_{t}=((K_{t})_{S_{t}^{(i)}})_{i=1}^{N}$ and $\mathbf{S}_{t}=(S_{t}^{(i)})_{i=1}^{N}$ . This is called the gain estimation problem. For the purposes of this article, the question of how to optimally estimate the gain shall be left aside and we refer to e.g. [9, 10, 11] and the references therein. The aim is to show that the construction of an FPF-like algorithm for point processes can be fully reduced to the same types of equations as for the FPF gain, i.e. to equations of the following form:

Definition II.1

For every positive volume form $\mu$ with $\int\displaylimits_{M}\mu=1$ and every smooth function $\varphi$ we denote by $\mathcal{E}(\mu,\varphi)$ the equation

$$\mathcal{E}(\mu,\varphi):\quad\mathscr{L}_{V}\mu=-\Big(\varphi-\int_{M}\varphi\,\mu\Big)\mu, \tag{14}$$

whose unknown quantity is the vector field $V$ .

III FPF for point process observations

Now, we consider the case where the hidden state $X_{t}$ is a diffusion on a manifold as in Section II, but the observation process is now a counting process $N_{t}$ (by convention, $N_{t}$ is right-continuous with left limits, i.e. càdlàg), counting the number of events since time $t=0$ , with intensity function $h(X_{t})$ , where $h:M\to(0,\infty)$ is called the observation function. Here, the observations are corrupted by Poisson noise.

An equation for the optimal filter is known also in this setting. If the distribution of $X_{0}$ is described in terms of a volume form $\mu_{0}$ , the conditional distribution $\mu_{t}$ of $X_{t}$ given observations $\mathscr{F}^{N}_{t}$ evolves according to the equation

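$$d\mu_{t}=(\mathscr{A}^{\dagger}\mu_{t})\,dt+\Big(\frac{h}{\hat{h}_{t^{-}}}-1\Big)\mu_{t^{-}}\,(dN_{t}-\hat{h}_{t}\,dt), \tag{15}$$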

where $t^{-}$ denotes left limits. Eq. (15) will be referred to as the filtering equation for point process observations. It is sometimes called Kushner-Stratonovich-Poisson equation (see [5] for further references).
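For low-dimensional problems, the filtering equation for point process observations can be integrated directly on a grid, which gives a reference solution against which particle filters can be compared. The sketch below is an illustration under assumptions that are not taken from the paper: the hidden state is a Brownian motion on $S^{1}$ with $dX_{t}=\sigma\,dB_{t}$, so that $\mathscr{A}^{\dagger}=\tfrac{1}{2}\sigma^{2}\partial_{\theta}^{2}$ in the angular chart, the density is stored on a uniform periodic grid, and time stepping is explicit Euler (stable only for $dt<d\theta^{2}/\sigma^{2}$). The function names are hypothetical.

```python
import numpy as np

def between_events_step(p, h, dtheta, dt, sigma):
    """Explicit Euler step of dp = (A^dagger p - (h - h_hat) p) dt on a periodic grid."""
    lap = (np.roll(p, -1) - 2.0 * p + np.roll(p, 1)) / dtheta**2
    h_hat = np.sum(h * p) * dtheta
    p = p + dt * (0.5 * sigma**2 * lap - (h - h_hat) * p)
    return p / (np.sum(p) * dtheta)      # guard against accumulated drift in the mass

def event_update(p, h, dtheta):
    """Jump at an event: multiply the density by h and renormalize."""
    p = h * p
    return p / (np.sum(p) * dtheta)

# Assumed example: uniform prior and a bump-shaped intensity peaked at theta = pi.
theta = np.linspace(0.0, 2.0 * np.pi, 200, endpoint=False)
dtheta = theta[1] - theta[0]
h = 1.0 + 5.0 * np.exp(2.0 * (np.cos(theta - np.pi) - 1.0))
p0 = np.full_like(theta, 1.0 / (2.0 * np.pi))
p_no_event = between_events_step(p0, h, dtheta, dt=1e-4, sigma=1.0)
p_event = event_update(p0, h, dtheta)
```

In-between events, probability mass is removed from regions of high intensity (the absence of an event is itself informative), whereas an event concentrates mass where $h$ is large.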

The goal of the present section is to carry out the derivation of an FPF for point process observations. We will call the resulting filter feedback particle filter for point process observations, or ppFPF for short.

In the following two subsections, we will separately derive the drift and the jump terms of the particle dynamics. The separation of these two aspects is necessary because the drift term is infinitesimal, i.e. a vector field, whereas the event term is an instantaneous transformation of the particles from the prior to the posterior. Since a vector field (infinitesimal) and a finite transformation cannot be easily mixed, the ppFPF lacks the gain $\times$ error structure of the FPF, with a common prefactor. This will be shown below.

III-A Derivation of the drift term

We first consider the terms proportional to $dt$ in Eq. (15), describing the evolution of the conditional distribution in-between events, and make the following ansatz for the particle dynamics:

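$$dS_{t}=V_{0}\,dt+V_{j}\circ dZ_{t}^{j}+\Omega_{t}\,dt. \tag{16}$$

Since the modification is deterministic, the corresponding equation for the conditional distribution of $S_{t}$ given $\mathscr{F}^{N}_{t}$ simply reads

$$d\bar{\mu}_{t}=\big(\mathscr{A}^{\dagger}\bar{\mu}_{t}-\mathscr{L}_{\Omega_{t}}\bar{\mu}_{t}\big)\,dt. \tag{17}$$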

Matching this to Eq. (15) (again, setting $\bar{\mu}_{t}=\mu_{t}$ ) yields the relation

$$\mathscr{L}_{\Omega_{t}}\mu_{t}=(h-\hat{h}_{t})\mu_{t}, \tag{18}$$

which is $\mathcal{E}(\mu_{t},-h)$ , up to a sign the same as Eq. (9) for the gain of the FPF. Thus, up to divergence-free terms, the drift of the ppFPF is identical to the negative gain of the corresponding FPF (i.e. with the same $h$ ).
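The sign relation between the ppFPF drift and the FPF gain can be spelled out in two lines. The computation assumes that the FPF gain $K_{t}$ solves $\mathcal{E}(\mu_{t},h)$, i.e. $\mathscr{L}_{K_{t}}\mu_{t}=-(h-\hat{h}_{t})\mu_{t}$, and uses only the linearity of the Lie derivative in the vector field.

```latex
\begin{align*}
\mathcal{E}(\mu_t,-h):\quad
\mathscr{L}_{V}\mu_t &= -\Big(-h-\int_M(-h)\,\mu_t\Big)\mu_t = (h-\hat h_t)\,\mu_t,\\
\mathscr{L}_{-K_t}\mu_t &= -\mathscr{L}_{K_t}\mu_t = (h-\hat h_t)\,\mu_t .
\end{align*}
```

Hence $\Omega_{t}=-K_{t}$ is one solution, and every other solution differs from it by a vector field that is divergence-free with respect to $\mu_{t}$.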

III-B Derivation of the jump term

Upon an event, Eq. (15) prescribes a change of the conditional distribution as follows:

$$\mu_{t^{-}}\mapsto\mu_{t}=\frac{h}{\hat{h}_{t^{-}}}\mu_{t^{-}}, \tag{19}$$

i.e. the distribution is multiplied by the observation function and subsequently renormalized. This requires a corresponding instantaneous change of the particle positions, i.e. $S_{t^{-}}\mapsto S_{t}=T_{t^{-}}(S_{t^{-}})$ , where $T_{t}:M\to M$ satisfies the constraint

$$(T_{t^{-}})_{*}\mu_{t^{-}}=\frac{h}{\hat{h}_{t^{-}}}\mu_{t^{-}}, \tag{20}$$

where the subscript $*$ denotes the pushforward. In rare cases, such as for Gaussian $p$ and exponential $h$ , this functional equation has exact closed-form solutions. In the absence of an exact solution, a solution $T_{t^{-}}$ to Eq. (20) can be approximated by an iterative procedure, also used in [12, 13], by an adaptation of Moser’s classical result [14]. The idea is to define an interpolation $\tilde{\mu}_{t,s}$ between $\tilde{\mu}_{t,0}=\mu_{t^{-}}$ and $\tilde{\mu}_{t,1}=\tfrac{h}{\hat{h}_{t^{-}}}\mu_{t^{-}}$ (the chosen interpolation is sometimes called log-homotopy and has the virtue of producing a PDE analogous to the one for the drift term; other smooth interpolations can be used as needed):

$$\tilde{\mu}_{t,s}=\frac{h^{s}\mu_{t^{-}}}{\int_{M}h^{s}\mu_{t^{-}}},\quad 0\leq s\leq 1. \tag{21}$$

We then match this flow of probability distributions with a flow of particles, i.e. the flow of an $s$ -dependent vector field $V_{t,s}$ satisfying

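$$\mathscr{L}_{V_{t,s}}\tilde{\mu}_{t,s}=-\frac{d}{ds}\tilde{\mu}_{t,s}=-\Big(\log h-\int_{M}(\log h)\,\tilde{\mu}_{t,s}\Big)\tilde{\mu}_{t,s}, \tag{22}$$

which is equation $\mathcal{E}(\tilde{\mu}_{t,s},\log h)$ in Definition II.1. This procedure results in Algorithm 1.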
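The jump construction can be illustrated in one dimension, where the equation for $V_{t,s}$ reduces to $(\tilde{p}_{s}V)'=-(\log h-\widehat{\log h})\,\tilde{p}_{s}$ and can be integrated once. The sketch below is not the paper's Algorithm 1: it assumes that the pre-event density is known on a uniform grid (rather than estimated from the particles), uses explicit Euler steps in the homotopy parameter $s$, and the names `solve_E_1d` and `jump_by_log_homotopy` are hypothetical. The test case is the closed-form situation mentioned above: for a standard Gaussian prior and $h(x)=e^{ax}$, every particle should be translated by $a$.

```python
import numpy as np

def solve_E_1d(x, p, phi):
    """1D solution V of (p V)' = -(phi - phi_hat) p with p V = 0 at the left grid edge."""
    dx = x[1] - x[0]
    phi_hat = np.sum(phi * p) * dx
    rhs = -(phi - phi_hat) * p
    flux = np.concatenate(([0.0], np.cumsum(0.5 * (rhs[1:] + rhs[:-1]) * dx)))
    return flux / p

def jump_by_log_homotopy(particles, x, p_prior, h, n_steps=50):
    """Move particles along the flow of V_s for s in [0, 1] (log-homotopy interpolation)."""
    dx = x[1] - x[0]
    ds = 1.0 / n_steps
    log_h = np.log(h)
    moved = np.array(particles, dtype=float)
    for k in range(n_steps):
        s = (k + 0.5) * ds                       # midpoint of the pseudo-time step
        p_s = h**s * p_prior                     # unnormalized interpolated density
        p_s = p_s / (np.sum(p_s) * dx)
        V = solve_E_1d(x, p_s, log_h)
        moved = moved + ds * np.interp(moved, x, V)
    return moved

# Assumed test case: prior N(0, 1), h(x) = exp(a x), so the posterior is N(a, 1).
a = 1.5
x = np.linspace(-10.0, 12.0, 4401)
p_prior = np.exp(-0.5 * x**2) / np.sqrt(2.0 * np.pi)
particles = np.linspace(-2.0, 2.0, 41)
moved = jump_by_log_homotopy(particles, x, p_prior, np.exp(a * x))
```

For a non-exponential $h$ the same code produces a position-dependent displacement, which is exactly the feature that a uniform translation of all particles cannot reproduce.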

III-C Exactness of the particle filter

Thus, the ppFPF is defined in terms of the following dynamics, yielding a càdlàg process:

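$$dS_{t}=V_{0}\,dt+V_{j}\circ dZ_{t}^{j}+\Omega_{t}\,dt, \tag{23}$$

$$\text{event at time }t:\quad S_{t}=T_{t^{-}}(S_{t^{-}}), \tag{24}$$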

where $\Omega_{t}$ is a vector field that solves Eq. (18) and $T_{t^{-}}$ is the diffeomorphism constructed in Section III-B. The PDEs to be solved for both steps are of the forms $\mathcal{E}(\mu,-h)$ and $\mathcal{E}(\mu,\log h)$ , and are therefore analogous to the PDE for the gain of the FPF. As a result, all considerations in Section II-E apply to the ppFPF. By construction, the ppFPF has the following property of being exact:

Theorem III.1

Let $\mu_{t}$ denote the conditional distribution of $X_{t}$ given $\mathscr{F}^{N}_{t}$ . Under assumption A, if the distribution of $S_{0}$ coincides with $\mu_{0}$ , and if the process $(S_{t})_{t\geq 0}$ is defined according to Eqs. (23)-(24), then the conditional distribution of $S_{t}$ given $\mathscr{F}^{N}_{t}$ coincides with $\mu_{t}$ for all $t\geq 0$ .

The full algorithm (Algorithm 2) additionally requires the choice of a specific gain estimation algorithm.
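To make the structure of the resulting filter concrete, the following sketch implements one time step of a ppFPF-like particle update on $M=\mathds{R}$ with a pluggable gain estimator. It is an illustration under assumptions of our own, not the paper's Algorithm 2: the diffusion coefficient is constant (so the Stratonovich and Itô forms coincide), time stepping is explicit Euler, and the default estimator `constant_gain` is only a placeholder that approximates a solution of $\mathcal{E}(\mu,\varphi)$ by a constant. With this placeholder the jump degenerates to a uniform translation, so a non-constant estimator has to be substituted to obtain the behaviour described in the text. All function and parameter names are hypothetical.

```python
import numpy as np

def constant_gain(S, phi_vals):
    """Placeholder estimator for E(mu, phi): constant gain E[(x - mean)(phi - phi_hat)]."""
    value = np.mean((S - S.mean()) * (phi_vals - phi_vals.mean()))
    return np.full_like(S, value)

def ppfpf_step(S, event, dt, rng, drift, sigma, h, gain=constant_gain, n_homotopy=20):
    """One step of the particle dynamics on R.

    S      : particle positions, shape (N,)
    event  : True if an event was observed in this time step
    drift  : callable V_0(x); sigma : constant diffusion coefficient
    h      : callable positive intensity function
    gain   : callable (S, phi_vals) -> vector field values at the particles
    """
    # In-between events: prior dynamics plus the control Omega solving E(mu_t, -h).
    omega = gain(S, -h(S))
    noise = sigma * np.sqrt(dt) * rng.standard_normal(S.shape)
    S = S + (drift(S) + omega) * dt + noise
    if event:
        # Jump: follow the flow solving E(mu_tilde_s, log h) for s from 0 to 1.
        ds = 1.0 / n_homotopy
        for _ in range(n_homotopy):
            S = S + ds * gain(S, np.log(h(S)))
    return S

# Assumed check of the jump alone (dt = 0): prior N(0, 1) and h(x) = exp(x).
rng = np.random.default_rng(0)
S0 = rng.standard_normal(20_000)
S1 = ppfpf_step(S0, True, 0.0, rng, drift=lambda x: -x, sigma=1.0, h=np.exp)
shift = S1.mean() - S0.mean()
```

In a simulation, `event` would be set from the increments of the observed counting process, and `gain` would be replaced by one of the gain estimation methods discussed in Section II-E.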


IV Numerical results

Simulations were conducted in order to study the performance (in terms of mean-squared error) and UQ (in terms of posterior variance) of the ppFPF in comparison to other well-known approximate filters for a filtering problem on $M=\mathds{R}$ (see Fig. 2) as well as $M=S^{1}$ (Fig. 3). The ppFPF was implemented with the differential loss reproducing kernel Hilbert space method from [16] (see figure captions for parameters). The bootstrap particle filter (BPF) was resampled when $N_{\text{eff}}/N$ dropped below 1/2, where $N_{\text{eff}}=1/\sum_{i=1}^{N}(w^{(i)})^{2}$ . For $M=S^{1}$ , the EKSPF was naïvely applied to the chart on the interval $[0,2\pi)$ . (We emphasize that the EKSPF was not intended or designed to be used in this way. This example only serves to illustrate that a naïve application can lead to poor performance, which is to be expected due to the conceptual reasons outlined in the introduction.)
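The resampling criterion used for the BPF can be written in a few lines. The sketch below computes $N_{\text{eff}}$ from normalized weights and resamples when $N_{\text{eff}}/N$ falls below the threshold. Multinomial resampling is an assumption made here for illustration (the text does not state which resampling scheme was used), and the function names are hypothetical.

```python
import numpy as np

def effective_sample_size(w):
    """N_eff = 1 / sum_i (w_i)^2 for weights normalized to sum to one."""
    w = np.asarray(w, dtype=float)
    w = w / w.sum()
    return 1.0 / np.sum(w**2)

def resample_if_needed(particles, w, rng, threshold=0.5):
    """Multinomial resampling, triggered when N_eff / N drops below the threshold."""
    w = np.asarray(w, dtype=float)
    w = w / w.sum()
    n = len(w)
    if effective_sample_size(w) / n < threshold:
        idx = rng.choice(n, size=n, p=w)         # draw indices with probability w
        return particles[idx], np.full(n, 1.0 / n)
    return particles, w

rng = np.random.default_rng(0)
pts = np.array([0.0, 1.0, 2.0, 3.0])
ess_uniform = effective_sample_size([0.25, 0.25, 0.25, 0.25])
ess_degenerate = effective_sample_size([1.0, 0.0, 0.0, 0.0])
new_pts, new_w = resample_if_needed(pts, [0.0, 0.0, 1.0, 0.0], rng)
```

The unweighted filters compared in this section need no such step, since all particles carry equal weight by construction.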

V Conclusions

In this brief article, we reviewed the problem of designing unweighted particle filters for a manifold-valued hidden process observed in Poisson noise. We provided conceptual arguments as well as numerical illustrations that the existing approach from [5] (EKSPF) is limited by an intrinsic constant gain approximation, which compromises higher-order statistics as well as the ability to be extended to manifolds. We then derived an asymptotically exact unweighted particle filter, called ppFPF, by matching the particle forward equation with the equation for the optimal filter. This approach starts from first principles and is analogous to the derivation of the FPF. The resulting filter does not have the gain $\times$ error structure of the FPF, but can otherwise be reduced to partial differential equations that are completely analogous to the ones in the FPF. This makes it possible to leverage existing and future approaches to gain estimation in the FPF. As an unweighted filter, the ppFPF is expected to scale to high-dimensional problems [3].

Bibliography

[1] T. Yang, P. G. Mehta, and S. P. Meyn, “A mean-field control-oriented approach to particle filtering,” in Proceedings of the 2011 American Control Conference. IEEE, 2011, pp. 2037–2043.
[2] T. Yang, P. G. Mehta, and S. P. Meyn, “Feedback Particle Filter,” IEEE Transactions on Automatic Control, vol. 58, no. 10, pp. 2465–2480, 2013.
[3] S. C. Surace, A. Kutschireiter, and J.-P. Pfister, “How to Avoid the Curse of Dimensionality: Scalability of Particle Filters with and without Importance Weights,” SIAM Review, vol. 61, no. 1, pp. 79–91, 2019.
[4] C. Zhang, A. Taghvaei, and P. G. Mehta, “Feedback particle filter on riemannian manifolds and matrix lie groups,” IEEE Transactions on Automatic Control, vol. 63, no. 8, pp. 2465–2480, 2018.
[5] M. Venugopal, R. M. Vasu, and D. Roy, “An Ensemble Kushner-Stratonovich-Poisson Filter for Recursive Estimation in Nonlinear Dynamical Systems,” IEEE Transactions on Automatic Control, vol. 61, no. 3, pp. 823–828, 2016.
[6] A. Bain and D. Crisan, Fundamentals of Stochastic Filtering, ser. Stochastic Modelling and Applied Probability. New York, NY: Springer New York, 2009, vol. 60.
[7] R. S. Laugesen, P. G. Mehta, S. P. Meyn, and M. Raginsky, “Poisson’s Equation in Nonlinear Filtering,” SIAM Journal on Control and Optimization, vol. 53, no. 1, pp. 501–525, 2015.
[8] L. Ambrosio, N. Gigli, and G. Savaré, Gradient Flows, 2nd ed. Basel: Birkhäuser Basel, 2008.