Online Sequential Monte Carlo smoother for partially observed stochastic differential equations

Pierre Gloaguen (MIA-Paris); Marie-Pierre Etienne (MIA-Paris); Sylvain Le Corff

arXiv:1703.01776·stat.ME·March 14, 2018


Open Access

TL;DR

This paper presents an online Monte Carlo smoothing algorithm for partially observed stochastic differential equations, using unbiased estimators to handle unknown transition densities, enabling real-time data processing with linear complexity.

Contribution

Introduces a novel online smoothing algorithm for SDEs that employs unbiased estimators to manage unknown transition densities, extending previous methods.

Findings

1. Algorithm is consistent and effective.

2. Performance demonstrated on two models.

3. Computational complexity grows linearly with samples.

Abstract

This paper introduces a new algorithm to approximate smoothed additive functionals for partially observed stochastic differential equations. The method relies on a recent procedure which computes such approximations online, i.e. as the observations are received, with a computational complexity growing linearly with the number of Monte Carlo samples. This online smoother cannot be used directly in the case of partially observed stochastic differential equations since the transition density of the latent data is usually unknown. We prove that a similar algorithm may still be defined for partially observed continuous processes by replacing this unknown quantity with an unbiased estimator obtained, for instance, using general Poisson estimators. We prove that this estimator is consistent, and its performance is illustrated using data from two models.


Equations

$$X_{0}=x_{0}\quad\mbox{and}\quad\mathrm{d}X_{t}=\alpha(X_{t})\mathrm{d}t+\mathrm{d}W_{t},$$

$$\phi_{k:k^{\prime}|n}[h]=\mathbb{E}\left[h(X_{k},\dots,X_{k^{\prime}})\,|\,Y_{0:n}\right].$$

$$\phi_{0:n|n}[H_{n}]=\mathbb{E}\left[H_{n}(X_{0:n})\,|\,Y_{0:n}\right]\quad\mbox{where}\quad H_{n}=\sum_{k=0}^{n-1}h_{k}(X_{k},X_{k+1}),$$

$$\phi_{0:n|n}[h]=\phi_{n}[T_{n}[h]],\quad\mbox{where}\quad T_{n}[h](X_{n})=\mathbb{E}\left[h(X_{0:n})\,|\,X_{n},Y_{0:n}\right].$$

$$\phi_{0}^{N}[h]=\frac{1}{\Omega_{0}^{N}}\sum_{\ell=1}^{N}\omega_{0}^{\ell}h(\xi_{0}^{\ell}),\qquad\Omega_{0}^{N}:=\sum_{\ell=1}^{N}\omega_{0}^{\ell}.$$

$$\omega_{k}^{\ell}:=\frac{q_{k}(\xi_{k-1}^{I_{k}^{\ell}},\xi_{k}^{\ell})\,g_{k}(\xi_{k}^{\ell})}{\vartheta_{k}(\xi_{k-1}^{I_{k}^{\ell}})\,p_{k}(\xi_{k-1}^{I_{k}^{\ell}},\xi_{k}^{\ell})}.$$

$$\phi_{k}^{N}[h]:=\frac{1}{\Omega_{k}^{N}}\sum_{\ell=1}^{N}\omega_{k}^{\ell}h(\xi_{k}^{\ell}),\qquad\Omega_{k}^{N}:=\sum_{\ell=1}^{N}\omega_{k}^{\ell}.$$

$$T_{k}[H_{k}](X_{k})=\frac{\int\phi_{k-1}(\mathrm{d}x_{k-1})\,q_{k-1}(x_{k-1},X_{k})\left\{T_{k-1}[H_{k-1}](x_{k-1})+h_{k-1}(x_{k-1},X_{k})\right\}}{\int\phi_{k-1}(\mathrm{d}x_{k-1})\,q_{k-1}(x_{k-1},X_{k})}.$$

$$T_{k}^{N}[H_{k}](\xi_{k}^{i})=\sum_{j=1}^{N}\Lambda_{k-1}^{N}(i,j)\left\{T_{k-1}[H_{k-1}](\xi_{k-1}^{j})+h_{k-1}(\xi_{k-1}^{j},\xi_{k}^{i})\right\},$$

$$\Lambda_{k}^{N}(i,\ell)=\frac{\omega_{k}^{\ell}q_{k}(\xi_{k}^{\ell},\xi_{k+1}^{i})}{\sum_{\ell=1}^{N}\omega_{k}^{\ell}q_{k}(\xi_{k}^{\ell},\xi_{k+1}^{i})},\qquad 1\leq\ell\leq N.$$

$$\tau_{k+1}^{i}:=\frac{1}{\widetilde{N}}\sum_{\ell=1}^{\widetilde{N}}\left\{\tau_{k}^{J_{k}^{i,\ell}}+h_{k}(\xi_{k}^{J_{k}^{i,\ell}},\xi_{k+1}^{i})\right\}.$$

$$\phi_{0:n|n}^{N}[\tau_{n}]=\frac{1}{\Omega_{n}^{N}}\sum_{i=1}^{N}\omega_{n}^{i}\tau_{n}^{i}.$$

$$\widehat{q}_{k}(\xi_{k}^{\ell},\xi_{k+1}^{i};\zeta_{k})>0\ \mbox{a.s.}\quad\mbox{and}\quad\mathbb{E}\left[\widehat{q}_{k}(\xi_{k}^{\ell},\xi_{k+1}^{i};\zeta_{k})\,|\,\mathcal{G}_{k+1}^{N}\right]=q_{k}(\xi_{k}^{\ell},\xi_{k+1}^{i}),$$

$$\mathcal{F}_{k}^{N}$$

$$\mathcal{G}_{k+1}^{N}$$

$$\widehat{\omega}_{k}^{\ell}:=\frac{\widehat{q}_{k}(\xi_{k-1}^{I_{k}^{\ell}},\xi_{k}^{\ell};\zeta_{k})\,g_{k}(\xi_{k}^{\ell})}{\vartheta_{k}(\xi_{k-1}^{I_{k}^{\ell}})\,p_{k}(\xi_{k-1}^{I_{k}^{\ell}},\xi_{k}^{\ell})}.$$

$$\sup_{x,y,\zeta}\widehat{q}_{k}(x,y;\zeta)\leq\hat{\sigma}_{+}^{k}.$$

$$\sup_{j,y,\zeta}\widehat{q}_{k}(\xi_{k}^{j},y;\zeta)\leq\hat{\sigma}_{+}^{k}.$$

$$\sup_{i,j,\zeta}\widehat{q}_{k}(\xi_{k}^{j},\xi_{k+1}^{i};\zeta)\leq\hat{\sigma}_{+}^{k}.$$

$$q_{k}(x,y)=\varphi_{\Delta_{k}}(x,y)\exp\left\{A(y)-A(x)\right\}\mathbb{E}_{\mathbb{W}^{x,y,\Delta_{k}}}\left[\exp\left\{-\int_{0}^{\Delta_{k}}\phi(\mathsf{w}_{s})\mathrm{d}s\right\}\right],$$

$$\phi(x)=\left(\|\alpha(x)\|^{2}+\triangle A(x)\right)/2,$$

$$\widehat{q}_{k}(x,y;\zeta_{k})=\varphi_{\Delta_{k}}(x,y)\exp\left\{A(y)-A(x)\right\}\times\exp\left\{-\mathsf{U}_{\mathsf{w}}\Delta_{k}\right\}\frac{\Delta_{k}^{\kappa}}{\mu(\kappa)\kappa!}\prod_{j=1}^{\kappa}\left(\mathsf{U}_{\mathsf{w}}-\phi(\mathsf{w}_{U_{j}})\right).$$

$$\widehat{q}_{k}(x,y;\zeta_{k})=\varphi_{\Delta_{k}}(x,y)\exp\left\{A(y)-A(x)-\mathsf{L}_{\mathsf{w}}\Delta_{k}\right\}\prod_{j=1}^{\kappa}\frac{\mathsf{U}_{\mathsf{w}}-\phi(\mathsf{w}_{U_{j}})}{\mathsf{U}_{\mathsf{w}}-\mathsf{L}_{\mathsf{w}}}.$$

$$\rho_{\Delta_{k}}:\ \mathbb{R}^{d}\times\mathbb{R}^{d},\qquad(x,y)$$

$$\widehat{\omega}_{0}(x)=\frac{\chi(x)g_{0}(x)}{\eta_{0}(x)}\quad\mbox{and for }k\geq 1\quad\widehat{\omega}_{k}(x,x^{\prime};z)=\frac{\widehat{q}_{k}(x,x^{\prime};z)\,g_{k+1}(x^{\prime})}{\vartheta_{k+1}(x)\,p_{k}(x,x^{\prime})}.$$

$$\mathbb{E}\left[\widehat{\omega}_{k+1}^{1}\tau_{k+1}^{1}\,|\,\mathcal{F}_{k}^{N}\right]=\left(\phi_{k}^{N}[\vartheta_{k+1}]\right)^{-1}\phi_{k}^{N}\left[\int q_{k}(\cdot,x)g_{k+1}(x)\left\{\tau_{k}(\cdot)+h_{k+1}(\cdot,x)\right\}\mathrm{d}x\right].$$

$$\mathbb{P}\left(\left|\phi_{k}^{N}[\tau_{k}]-\phi_{k}[T_{k}h_{k}]\right|\geq\varepsilon\right)\leq b_{k}\exp\left(-c_{k}N\varepsilon^{2}\right).$$

$$p_{k}(x_{k-1},x_{k})\propto\tilde{q}_{k}(x_{k-1},x_{k})\,g_{k}(x_{k}),$$

$$\mathrm{d}X_{t}=\sin(X_{t}-\theta)\mathrm{d}t+\mathrm{d}W_{t},\qquad X_{0}=x_{0}.$$

$$Y_{k}=X_{k}+\varepsilon_{k},$$


Taxonomy

Topics: Stochastic processes and financial applications · Target Tracking and Data Fusion in Sensor Networks · Statistical Methods and Inference

Full text

Online Sequential Monte Carlo smoother for partially observed stochastic differential equations

Pierre Gloaguen (AgroParisTech, UMR MIA 518, F-75231 Paris, France)

Marie-Pierre Etienne (AgroParisTech, UMR MIA 518, F-75231 Paris, France)

Sylvain Le Corff (Laboratoire de Mathématiques d’Orsay, Univ. Paris-Sud, CNRS, Université Paris-Saclay)

Abstract

This paper introduces a new algorithm to approximate smoothed additive functionals for partially observed stochastic differential equations. The method relies on the recent procedure introduced in [24], which computes such approximations online, i.e. as the observations are received, with a computational complexity growing linearly with the number of Monte Carlo samples. The algorithm of [24] cannot be used in the case of partially observed stochastic differential equations since the transition density of the latent data is usually unknown. We prove that a similar algorithm may still be defined for partially observed continuous processes by replacing this unknown quantity with an unbiased estimator obtained, for instance, using general Poisson estimators. We prove that this estimator is consistent, and its performance is illustrated using data from two models.

Keywords: Stochastic differential equations, Smoothing, Sequential Monte Carlo Methods.

1 Introduction

This paper introduces a new algorithm to solve the smoothing problem for partially observed continuous time stochastic processes. In this setting, the hidden state process $(X_{t})_{t\geq 0}$ is assumed to be a solution to a stochastic differential equation (SDE) and the only information available is given by noisy observations $(Y_{k})_{0\leq k\leq n}$ of the states $(X_{k})_{0\leq k\leq n}$ at some discrete time points $(t_{k})_{0\leq k\leq n}$ . The bivariate stochastic process $\{(X_{k},Y_{k})\}_{0\leq k\leq n}$ is a state space model such that conditional on the state sequence $(X_{k})_{0\leq k\leq n}$ the observations $(Y_{k})_{0\leq k\leq n}$ are independent and for all $0\leq\ell\leq n$ the conditional distribution of $Y_{\ell}$ given $\{X_{k}\}_{0\leq k\leq n}$ depends on $X_{\ell}$ only.

Statistical inference for partially observed state sequences often requires solving Bayesian filtering and smoothing problems, i.e. computing the posterior distributions of sequences of hidden states given observations. The filtering problem refers to the estimation, for each $0\leq k\leq n$ , of the distributions of the hidden state $X_{k}$ given the observations $(Y_{0},\ldots,Y_{k})$ . Smoothing stands for the estimation of the distributions of the sequence of states $(X_{k},\ldots,X_{p})$ given observations $(Y_{0},\ldots,Y_{\ell})$ with $0\leq k\leq p\leq\ell\leq n$ . These posterior distributions are crucial to compute maximum likelihood estimators of unknown parameters using the observations $(Y_{0},\ldots,Y_{n})$ only. For instance, the E-step of the EM algorithm introduced in [7] boils down to the computation of a conditional expectation of an additive functional of the hidden states given all the observations up to time $n$ . Similarly, by Fisher’s identity, recursive maximum likelihood estimates may be computed using the gradient of the log-likelihood, which can be written as the conditional expectation of an additive functional of the hidden states. See [5, Chapters 10 and 11], [15, 19, 20, 27] for further references on the use of these smoothed expectations of additive functionals applied to maximum likelihood parameter inference in latent data models.

The exact computation of these expectations is usually not possible in the case of partially observed diffusions. In this paper, we propose to use Sequential Monte Carlo (SMC) methods to approximate smoothing distributions with random particles associated with importance weights. [13, 18] introduced the first particle filters and smoothers for state space models by combining importance sampling steps to propagate particles with resampling steps to duplicate or discard particles according to their importance weights. Unfortunately, these methods cannot be applied directly to partially observed stochastic differential equations since some elementary quantities, such as transition densities of the hidden states, are not available explicitly. Discretization procedures may be used to approximate transition densities, for instance the Euler-Maruyama method, the Ozaki discretization which proposes a linear approximation of the drift coefficient between two observations [25, 28], or Gaussian based approximations using Taylor expansions of the posterior mean and variance of an observation given the observation at the previous time step, [16, 17, 29]. Other approaches based on Hermite polynomials expansion were also introduced by [1, 2, 3] and extended in several directions recently, see [21] and all the references on the approximation of transition densities therein. However, even the most recent discretization based approximations of the transition densities induce a systematic bias of particle based approximations of posterior distributions, see for instance [6]. To overcome this difficulty, [11] proposed to solve the filtering problem by combining SMC methods with an unbiased estimate of the transition densities based on the generalized Poisson estimator (GPE). In this case, only the Monte Carlo error has to be controlled as there is no Taylor expansion to approximate unknown transition densities.

The only solution to solve the smoothing problem for partially observed SDE using SMC methods has been proposed in [23] and extends the fixed-lag smoother of [22]. Using forgetting properties of the hidden chain, the algorithm improves the performance of [11] to approximate smoothing distributions but at the cost of a bias that does not vanish as the number of particles grows to infinity. In the case of discrete time state space models, approximations of the smoothing distributions may also be obtained using the Forward Filtering Backward Smoothing algorithm (FFBS) and the Forward Filtering Backward Simulation algorithm (FFBSi) developed respectively in [18, 14, 9] and [12]. Both algorithms require first a forward pass which produces a set of particles and weights approximating the sequence of filtering distributions up to time $n$ . Then, a backward pass is performed to compute new weights (FFBS) or sample trajectories (FFBSi) in order to approximate the smoothing distributions. Recently, [24] proposed a new SMC algorithm, the particle-based rapid incremental smoother (PaRIS), to approximate on-the-fly (i.e. using the observations as they are received) smoothed expectations of additive functionals. Unlike the FFBS algorithm, the complexity of this algorithm grows only linearly with the number of particles $N$ and contrary to the FFBSi algorithm, no backward pass is required.

In this paper, we extend the PaRIS algorithm to partially observed SDEs. The proposed algorithm approximates smoothed expectations of additive functionals online, with a complexity growing only linearly with the number of particles. The key, and simple, result (Lemma 1) is that the accept-reject mechanism introduced in [8], which ensures the linear complexity of the procedure, remains correct when the transition densities are replaced by unbiased estimates. The usual FFBS and FFBSi algorithms cannot be extended as easily: both require the computation of weights defined as ratios involving the transition densities, so replacing these unknown quantities by unbiased estimates does not lead to unbiased estimators of the weights. The proposed Generalized Random version of the PaRIS algorithm, hereafter named the GRand PaRIS algorithm, may be applied to general hidden Markov models whose Markovian dynamics are governed by a stochastic differential equation (one of the first two domains defined in [4]) but also to any general state space model where the transition density of the hidden chain may be estimated unbiasedly.

Section 2 describes the proposed algorithm to approximate smoothed additive functionals using unbiased estimates of the transition density of the hidden states and details the application of this algorithm when the transition density may be approximated using a GPE. In Section 3, classical convergence results for SMC smoothers are extended to the setting of this paper and illustrated with numerical experiments in Section 4. All proofs are postponed to Appendix A.

2 The Generalized Random PaRIS algorithm

$(X_{t})_{t\geq 0}$ is defined as a weak solution to the following SDE in $\mathbb{R}^{d}$ :

$$X_{0}=x_{0}\quad\mbox{and}\quad\mathrm{d}X_{t}=\alpha(X_{t})\mathrm{d}t+\mathrm{d}W_{t},\tag{1}$$

where $(W_{t})_{t\geq 0}$ is a standard Brownian motion. It is assumed that $\alpha$ is of the form $\alpha(x)=\nabla_{x}A(x)$ where $A:\mathbb{R}^{d}\to\mathbb{R}$ is a twice continuously differentiable function. The solution to (1) is supposed to be partially observed at times $t_{0}=0,\dots,t_{n}$ through an observation process $(Y_{k})_{0\leq k\leq n}$ in $(\mathbb{R}^{m})^{n+1}$ . For all $0\leq k\leq n$ , the distribution of $Y_{k}$ given $X_{k}:=X_{t_{k}}$ has a density with respect to a reference measure $\lambda$ on $\mathbb{R}^{m}$ given by $g(X_{k},\cdot)=g_{k}(X_{k})$ . The distribution of $X_{0}$ has a density with respect to a reference measure $\mu$ on $\mathbb{R}^{d}$ given by $\chi$ . For all $0\leq k\leq n-1$ , the conditional distribution of $X_{k+1}$ given $X_{k}$ has a density $q_{k}(X_{k},\cdot)$ with respect to $\mu$ .

Let $0\leq k\leq k^{\prime}\leq n$ . The joint smoothing distributions of the hidden states are defined, for every measurable function $h$ on $(\mathbb{R}^{d})^{k^{\prime}-k+1}$ , by:

$$\phi_{k:k^{\prime}|n}[h]=\mathbb{E}\left[h(X_{k},\dots,X_{k^{\prime}})\,|\,Y_{0:n}\right].$$

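For all $0\leq k\leq n$ , $\phi_{k}=\phi_{k:k|k}$ denotes the filtering distribution. The aim of this section is to detail the extension of the PaRIS algorithm to approximate expectations of the form

$$\phi_{0:n|n}[H_{n}]=\mathbb{E}\left[H_{n}(X_{0:n})\,|\,Y_{0:n}\right]\quad\mbox{where}\quad H_{n}=\sum_{k=0}^{n-1}h_{k}(X_{k},X_{k+1}),\tag{2}$$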

when the transition density of the hidden states is not available explicitly and where $\{h_{k}\}_{k=0}^{n-1}$ are given functions on $\mathbb{R}^{d}\times\mathbb{R}^{d}$ . The algorithm is based on the following link between the filtering and smoothing distributions for additive functionals, see [24]:

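$$\phi_{0:n|n}[h]=\phi_{n}[T_{n}[h]],\quad\mbox{where}\quad T_{n}[h](X_{n})=\mathbb{E}\left[h(X_{0:n})\,|\,X_{n},Y_{0:n}\right].\tag{3}$$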

The approximation of (3) requires first to approximate the sequence of filtering distributions. Sequential Monte Carlo methods provide an efficient and simple solution to obtain these approximations using sets of particles $\{\xi^{\ell}_{k}\}_{\ell=1}^{N}$ associated with weights $\{\omega^{\ell}_{k}\}_{\ell=1}^{N}$ , $0\leq k\leq n$ .

At time $k=0$ , $N$ particles $\{\xi^{\ell}_{0}\}_{\ell=1}^{N}$ are sampled independently according to $\xi^{\ell}_{0}\sim\eta_{0}$ , where $\eta_{0}$ is a probability density with respect to $\mu$ . Then, $\xi^{\ell}_{0}$ is associated with the importance weights $\omega_{0}^{\ell}=\chi(\xi^{\ell}_{0})g_{0}(\xi^{\ell}_{0})/\eta_{0}(\xi^{\ell}_{0})$ . For any bounded and measurable function $h$ defined on $\mathbb{R}^{d}$ , the expectation $\phi_{0}[h]$ is approximated by

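$$\phi_{0}^{N}[h]=\frac{1}{\Omega_{0}^{N}}\sum_{\ell=1}^{N}\omega_{0}^{\ell}h(\xi_{0}^{\ell}),\qquad\Omega_{0}^{N}:=\sum_{\ell=1}^{N}\omega_{0}^{\ell}.$$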

Then, for $1\leq k\leq n$ , using $\{(\xi^{\ell}_{k-1},\omega^{\ell}_{k-1})\}_{\ell=1}^{N}$ , the auxiliary particle filter of [26] samples pairs $\{(I^{\ell}_{k},\xi^{\ell}_{k})\}_{\ell=1}^{N}$ of indices and particles using an instrumental transition density $p_{k}$ on $\mathbb{R}^{d}\times\mathbb{R}^{d}$ and an adjustment multiplier function $\vartheta_{k}$ on $\mathbb{R}^{d}$ . Each new particle $\xi^{\ell}_{k}$ and weight $\omega^{\ell}_{k}$ at time $k$ are computed as follows:

1. choose a particle index $I^{\ell}_{k}$ at time $k-1$ in $\{1,\ldots,N\}$ with probabilities proportional to $\omega_{k-1}^{j}\vartheta_{k}(\xi^{j}_{k-1})$ , for $j$ in $\{1,\ldots,N\}$ ;

2. sample $\xi^{\ell}_{k}$ using this chosen particle according to $\xi^{\ell}_{k}\sim p_{k}(\xi^{I^{\ell}_{k}}_{k-1},\cdot)$ ;

3. associate the particle $\xi^{\ell}_{k}$ with the importance weight:

$$\omega_{k}^{\ell}:=\frac{q_{k}(\xi_{k-1}^{I_{k}^{\ell}},\xi_{k}^{\ell})\,g_{k}(\xi_{k}^{\ell})}{\vartheta_{k}(\xi_{k-1}^{I_{k}^{\ell}})\,p_{k}(\xi_{k-1}^{I_{k}^{\ell}},\xi_{k}^{\ell})}.\tag{4}$$

The expectation $\phi_{k}[h]$ is approximated by

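$$\phi_{k}^{N}[h]:=\frac{1}{\Omega_{k}^{N}}\sum_{\ell=1}^{N}\omega_{k}^{\ell}h(\xi_{k}^{\ell}),\qquad\Omega_{k}^{N}:=\sum_{\ell=1}^{N}\omega_{k}^{\ell}.$$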
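The three steps of the auxiliary particle filter can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the names `apf_step`, `adjust`, `propose` and `weight_fn` are hypothetical, the state is one-dimensional, and the toy usage takes $p_{k}=q_{k}$ and $\vartheta_{k}\equiv 1$ (a bootstrap filter for a Gaussian random walk), so that the weight reduces to $g_{k}$ up to a constant.

```python
import math
import random


def apf_step(particles, weights, adjust, propose, weight_fn, rng):
    """One auxiliary particle filter step (steps 1-3 of the text).

    adjust(x)           -- adjustment multiplier, written vartheta_k in the text
    propose(x, rng)     -- draws a new particle from p_k(x, .)
    weight_fn(x, x_new) -- returns q_k * g_k / (vartheta_k * p_k)
    All three callables are hypothetical placeholders.
    """
    n = len(particles)
    # Step 1: ancestor indices, probabilities proportional to w * vartheta(x).
    first_stage = [w * adjust(x) for x, w in zip(particles, weights)]
    ancestors = rng.choices(range(n), weights=first_stage, k=n)
    # Step 2: propagate each selected ancestor through the proposal kernel.
    new_particles = [propose(particles[a], rng) for a in ancestors]
    # Step 3: attach the importance weight to each new particle.
    new_weights = [weight_fn(particles[a], x)
                   for a, x in zip(ancestors, new_particles)]
    return ancestors, new_particles, new_weights


# Toy usage (assumption): Gaussian random walk, p_k = q_k, vartheta_k = 1,
# so the weight is the observation density g_k up to a constant.
rng = random.Random(0)
y_obs = 0.3
start = [rng.gauss(0.0, 1.0) for _ in range(100)]
ancestors, particles, weights = apf_step(
    start, [1.0] * 100,
    adjust=lambda x: 1.0,
    propose=lambda x, r: x + r.gauss(0.0, 1.0),
    weight_fn=lambda x, x_new: math.exp(-0.5 * (y_obs - x_new) ** 2),
    rng=rng,
)
# Self-normalised estimate of the filtering mean.
filter_mean = sum(w * x for w, x in zip(weights, particles)) / sum(weights)
```

The weights are left unnormalised on purpose: the self-normalised estimate divides by their sum, as in the definition of $\phi_{k}^{N}[h]$.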

PaRIS algorithm uses the same decomposition as the FFBS algorithm introduced in [10] and the FFBSi algorithm proposed by [12] to approximate smoothing distributions. It combines both the forward only version of the FFBS algorithm with the sampling mechanism of the FFBSi algorithm. It does not produce an approximation of the smoothing distributions but of the smoothed expectation of a fixed additive functional and thus may be used to approximate (2). Its crucial property is that it does not require a backward pass, the smoothed expectation is computed on-the-fly with the particle filter and no storage of the particles or weights is needed.

PaRIS algorithm relies on the following fundamental property of $T_{k}[H_{k}]$ when $H_{k}$ is as in (2):

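$$T_{k}[H_{k}](X_{k})=\frac{\int\phi_{k-1}(\mathrm{d}x_{k-1})\,q_{k-1}(x_{k-1},X_{k})\left\{T_{k-1}[H_{k-1}](x_{k-1})+h_{k-1}(x_{k-1},X_{k})\right\}}{\int\phi_{k-1}(\mathrm{d}x_{k-1})\,q_{k-1}(x_{k-1},X_{k})}.$$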

Therefore, [24] introduces sufficient statistics $\tau^{i}_{k}$ (starting with $\tau^{i}_{0}=0$ , $1\leq i\leq N$ ), approximating $T_{k}[H_{k}](\xi^{i}_{k})$ , for $1\leq i\leq N$ and $0\leq k\leq n$ . First, replacing $\phi_{k-1}$ by $\phi^{N}_{k-1}$ in the last equation leads to the following approximation of $T_{k}[H_{k}](\xi^{i}_{k})$ :

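$$T_{k}^{N}[H_{k}](\xi_{k}^{i})=\sum_{j=1}^{N}\Lambda_{k-1}^{N}(i,j)\left\{T_{k-1}[H_{k-1}](\xi_{k-1}^{j})+h_{k-1}(\xi_{k-1}^{j},\xi_{k}^{i})\right\},\tag{5}$$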

where

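$$\Lambda_{k}^{N}(i,\ell)=\frac{\omega_{k}^{\ell}q_{k}(\xi_{k}^{\ell},\xi_{k+1}^{i})}{\sum_{\ell=1}^{N}\omega_{k}^{\ell}q_{k}(\xi_{k}^{\ell},\xi_{k+1}^{i})},\qquad 1\leq\ell\leq N.\tag{6}$$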

Computing exactly these approximations would lead to a complexity growing quadratically with $N$ because of the normalizing constant in (6). Therefore, PaRIS algorithm samples particles in the set $\{\xi^{j}_{k-1}\}_{j=1}^{N}$ with probabilities $\Lambda_{k}^{N}(i,\cdot)$ to approximate the expectation (5) and produce $\tau^{i}_{k}$ . Choosing $\tilde{N}\geq 1$ , at each time step $0\leq k\leq{n-1}$ these statistics are updated according to the following steps.

(i) Run one step of a particle filter to produce $\{(\xi^{\ell}_{k},\omega^{\ell}_{k})\}$ for $1\leq\ell\leq N$ .

(ii) For all $1\leq i\leq N$ , sample independently $J_{k}^{i,\ell}$ in $\{1,\ldots,N\}$ for $1\leq\ell\leq\widetilde{N}$ with probabilities $\Lambda_{k}^{N}(i,\cdot)$ , given by (6).

(iii) Set

$$\tau_{k+1}^{i}:=\frac{1}{\widetilde{N}}\sum_{\ell=1}^{\widetilde{N}}\left\{\tau_{k}^{J_{k}^{i,\ell}}+h_{k}(\xi_{k}^{J_{k}^{i,\ell}},\xi_{k+1}^{i})\right\}.$$

Then, (2) is approximated by

$$\phi_{0:n|n}^{N}[\tau_{n}]=\frac{1}{\Omega_{n}^{N}}\sum_{i=1}^{N}\omega_{n}^{i}\tau_{n}^{i}.$$
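The statistic update and the final weighted average can be sketched as follows. This is an illustrative sketch only, assuming a one-dimensional state and a known transition density `q`; the function names `paris_update` and `smoothed_estimate` are hypothetical. For readability the backward indices are drawn by direct sampling from the normalised probabilities, which costs $\mathcal{O}(N^{2})$ per time step, so this sketch does not have the linear complexity obtained with the accept-reject mechanism.

```python
import math
import random


def paris_update(tau_prev, parts_prev, w_prev, parts_new, q, h, n_tilde, rng):
    """Return the statistics tau_{k+1}^i for every new particle.

    q(x, x_new) is the transition density and h(x, x_new) the additive term;
    both are supplied by the caller (hypothetical placeholders here).
    """
    indices = range(len(parts_prev))
    tau_new = []
    for x_new in parts_new:
        # Unnormalised backward probabilities: omega_k^j * q_k(xi_k^j, xi_{k+1}^i).
        backward = [w * q(x, x_new) for x, w in zip(parts_prev, w_prev)]
        draws = rng.choices(indices, weights=backward, k=n_tilde)
        # Average over the n_tilde sampled ancestors.
        total = sum(tau_prev[j] + h(parts_prev[j], x_new) for j in draws)
        tau_new.append(total / n_tilde)
    return tau_new


def smoothed_estimate(tau, weights):
    """Self-normalised weighted average of the statistics."""
    return sum(w * t for w, t in zip(weights, tau)) / sum(weights)


# Toy usage (assumption): h = 1 everywhere, so after one step every tau is 1.
rng = random.Random(0)
n = 50
parts_prev = [rng.gauss(0.0, 1.0) for _ in range(n)]
parts_new = [x + rng.gauss(0.0, 1.0) for x in parts_prev]
tau_one = paris_update(
    [0.0] * n, parts_prev, [1.0] * n, parts_new,
    q=lambda x, x_new: math.exp(-0.5 * (x_new - x) ** 2),
    h=lambda x, x_new: 1.0,
    n_tilde=2, rng=rng,
)
estimate = smoothed_estimate(tau_one, [1.0] * n)
```

With $h_{k}\equiv 1$ the additive functional simply counts time steps, which gives an easy sanity check on any implementation.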

As proved in [24], the algorithm is asymptotically consistent (as $N$ goes to infinity) for any precision parameter $\tilde{N}$ . However, there is a significant qualitative difference between the cases $\tilde{N}=1$ and $\tilde{N}\geq 2$ . As for the FFBSi algorithm, when there exists $\sigma_{+}$ such that $0<q_{k}<\sigma_{+}$ , PaRIS algorithm may be implemented with $\mathcal{O}(N)$ complexity using the accept-reject mechanism of [8].

In general situations, PaRIS algorithm cannot be used for stochastic differential equations as $q_{k}$ is unknown. Therefore, the computation of the importance weights $\omega_{k}^{\ell}$ and of the acceptance ratio of [8] is not tractable. Following [11, 23], filtering weights can be approximated by replacing $q_{k}(\xi^{\ell}_{k},\xi_{k+1}^{i})$ by an unbiased estimator $\widehat{q}_{k}(\xi^{\ell}_{k},\xi_{k+1}^{i};\zeta_{k})$ , where $\zeta_{k}$ is a random variable in $\mathbb{R}^{q}$ such that:

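$$\widehat{q}_{k}(\xi_{k}^{\ell},\xi_{k+1}^{i};\zeta_{k})>0\ \mbox{a.s.}\quad\mbox{and}\quad\mathbb{E}\left[\widehat{q}_{k}(\xi_{k}^{\ell},\xi_{k+1}^{i};\zeta_{k})\,|\,\mathcal{G}_{k+1}^{N}\right]=q_{k}(\xi_{k}^{\ell},\xi_{k+1}^{i}),$$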

where, for all $0\leq k\leq n$ ,

[TABLE]

Practical choices for $\zeta_{k}$ are discussed below, see for instance (9) which presents the choice made for the implementation of such estimators in our context. In the case where $q_{k}$ is unknown, the filtering weights in (4) then become:

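$$\widehat{\omega}_{k}^{\ell}:=\frac{\widehat{q}_{k}(\xi_{k-1}^{I_{k}^{\ell}},\xi_{k}^{\ell};\zeta_{k})\,g_{k}(\xi_{k}^{\ell})}{\vartheta_{k}(\xi_{k-1}^{I_{k}^{\ell}})\,p_{k}(\xi_{k-1}^{I_{k}^{\ell}},\xi_{k}^{\ell})}.$$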

Therefore, to obtain a generalized random version of PaRIS algorithm, we only need to be able to sample from the discrete probability distribution $\Lambda_{k}^{N}(i,\cdot)$ in the case of SDE based HMM. Consider the following assumption: for all $0\leq k\leq n$ , there exists a random variable $\hat{\sigma}^{k}_{+}$ measurable with respect to $\mathcal{G}_{k+1}^{N}$ such that,

$$\sup_{x,y,\zeta}\widehat{q}_{k}(x,y;\zeta)\leq\hat{\sigma}_{+}^{k}.\tag{A1}$$

Lemma 1.

Assume that (A1) holds for some $0\leq k\leq n-1$ . For all $1\leq i\leq N$ , define the random variable $J_{k}^{i}$ as follows:

* repeat: sample independently $\zeta$ , $U\sim\mathcal{U}[0,1]$ and $J\in\{1,\ldots,N\}$ with probabilities proportional to $\{\widehat{\omega}_{k}^{1},\dots,\widehat{\omega}_{k}^{N}\}$ ,

* until $U\leq\widehat{q}_{k}(\xi_{k}^{J},\xi_{k+1}^{i};\zeta)/\hat{\sigma}^{k}_{+}$ ;

* set $J_{k}^{i}=J$ .

Then, the conditional probability distribution given $\mathcal{G}_{k+1}^{N}$ of $J_{k}^{i}$ is $\Lambda_{k}^{N}(i,\cdot)$ .

Proof.

See Appendix A. ∎
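The accept-reject loop of Lemma 1 can be sketched in Python as below. This is a minimal sketch under stated assumptions: the function name `sample_backward_index` is hypothetical, and the toy estimator `q_hat` (a Gaussian kernel multiplied by an independent positive factor with mean one) only stands in for a genuine unbiased estimator such as a GPE. It is positive, unbiased for the Gaussian kernel and bounded by 1.5, which plays the role of $\hat{\sigma}^{k}_{+}$.

```python
import math
import random


def sample_backward_index(x_new, parts_prev, w_prev, q_hat, sigma_plus, rng,
                          max_tries=100000):
    """Draw J with probability proportional to w_prev[J] * q(parts_prev[J], x_new).

    q_hat(x, x_new, rng) must return a positive unbiased estimate of q bounded
    by sigma_plus; fresh randomness (zeta in the text) is drawn at every call.
    """
    indices = range(len(parts_prev))
    for _ in range(max_tries):
        j = rng.choices(indices, weights=w_prev, k=1)[0]  # proposal from weights
        u = rng.random()                                  # U ~ Uniform[0, 1)
        if u <= q_hat(parts_prev[j], x_new, rng) / sigma_plus:
            return j
    raise RuntimeError("no candidate accepted; check the bound sigma_plus")


def q_hat(x, x_new, rng):
    # Toy unbiased estimator (assumption): true kernel times a mean-one factor.
    return math.exp(-0.5 * (x_new - x) ** 2) * (0.5 + rng.random())


# Two previous particles with equal weights; the target probability of index 0
# is 1 / (1 + exp(-0.5)), roughly 0.6225.
rng = random.Random(0)
draws = [sample_backward_index(0.0, [0.0, 1.0], [1.0, 1.0], q_hat, 1.5, rng)
         for _ in range(20000)]
freq_zero = draws.count(0) / len(draws)
```

The acceptance probability of a candidate $J$ is the conditional mean of the estimate divided by the bound, which is why unbiasedness is enough for the accepted index to follow the backward probabilities.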

Note that Lemma 1 still holds if assumption (A1) is relaxed and replaced by one of the two following assumptions:

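$$\sup_{j,y,\zeta}\widehat{q}_{k}(\xi_{k}^{j},y;\zeta)\leq\hat{\sigma}_{+}^{k}.\tag{A2}$$

$$\sup_{i,j,\zeta}\widehat{q}_{k}(\xi_{k}^{j},\xi_{k+1}^{i};\zeta)\leq\hat{\sigma}_{+}^{k}.\tag{A3}$$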

It is worth noting that under assumptions (A1) or (A2), the linear complexity property of PaRIS algorithm still holds, whereas if only assumption (A3) holds, the algorithm has a quadratic complexity.

Bounded estimator of $q_{k}$

For $x,y\in\mathbb{R}^{d}$ , by Girsanov and Ito’s formulas, the transition density $q_{k}(x,y)$ of (1) satisfies, with $\Delta_{k}=t_{k+1}-t_{k}$ ,

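$$q_{k}(x,y)=\varphi_{\Delta_{k}}(x,y)\exp\left\{A(y)-A(x)\right\}\mathbb{E}_{\mathbb{W}^{x,y,\Delta_{k}}}\left[\exp\left\{-\int_{0}^{\Delta_{k}}\phi(\mathsf{w}_{s})\mathrm{d}s\right\}\right],$$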

where $\mathbb{W}^{x,y,\Delta_{k}}$ is the law of Brownian bridge starting at $x$ at 0 and hitting $y$ at $\Delta_{k}$ , $(\mathsf{w}_{t})_{0\leq t\leq\Delta_{k}}$ is such a Brownian bridge, $\varphi_{\Delta_{k}}(x,y)$ is the p.d.f. of a normal distribution with mean $x$ and variance $\Delta_{k}$ , evaluated at $y$ and $\phi:\mathbb{R}^{d}\to\mathbb{R}$ is defined as:

$$\phi(x)=\left(\|\alpha(x)\|^{2}+\triangle A(x)\right)/2,$$

with $\triangle$ the Laplace operator. Assume that there exist random variables $\mathsf{L}_{\mathsf{w}}$ and $\mathsf{U}_{\mathsf{w}}$ such that for all $0\leq s\leq\Delta_{k}$ , $\mathsf{L}_{\mathsf{w}}\leq\phi(\mathsf{w}_{s})\leq\mathsf{U}_{\mathsf{w}}$ . Let $\kappa$ be a random variable taking values in $\mathbb{N}$ with distribution $\mu$ and $(U_{j})_{1\leq j\leq\kappa}$ be independent uniform random variables on $[0,\Delta_{k}]$ , and $\zeta_{k}=\left\{\kappa,\mathsf{w},U_{1},\ldots,U_{\kappa}\right\}\;$ . As shown in [11], a positive unbiased estimator is given by

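$$\widehat{q}_{k}(x,y;\zeta_{k})=\varphi_{\Delta_{k}}(x,y)\exp\left\{A(y)-A(x)\right\}\times\exp\left\{-\mathsf{U}_{\mathsf{w}}\Delta_{k}\right\}\frac{\Delta_{k}^{\kappa}}{\mu(\kappa)\kappa!}\prod_{j=1}^{\kappa}\left(\mathsf{U}_{\mathsf{w}}-\phi(\mathsf{w}_{U_{j}})\right).\tag{8}$$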

Interesting choices of $\mu$ are discussed in [11] and we focus here on the so called GPE-1, where $\mu$ is a Poisson distribution with intensity $(\mathsf{U}_{\mathsf{w}}-\mathsf{L}_{\mathsf{w}})\Delta_{k}$ . In that case, the estimator (8) becomes:

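$$\widehat{q}_{k}(x,y;\zeta_{k})=\varphi_{\Delta_{k}}(x,y)\exp\left\{A(y)-A(x)-\mathsf{L}_{\mathsf{w}}\Delta_{k}\right\}\prod_{j=1}^{\kappa}\frac{\mathsf{U}_{\mathsf{w}}-\phi(\mathsf{w}_{U_{j}})}{\mathsf{U}_{\mathsf{w}}-\mathsf{L}_{\mathsf{w}}}.\tag{9}$$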
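As a check on the passage from (8) to (9), the Poisson weights can be substituted directly. Writing $\lambda=(\mathsf{U}_{\mathsf{w}}-\mathsf{L}_{\mathsf{w}})\Delta_{k}$ (a shorthand introduced here only for this computation) and $\mu(\kappa)=\mathrm{e}^{-\lambda}\lambda^{\kappa}/\kappa!$ , the factor multiplying the product in (8) becomes

$$\mathrm{e}^{-\mathsf{U}_{\mathsf{w}}\Delta_{k}}\frac{\Delta_{k}^{\kappa}}{\mu(\kappa)\kappa!}=\mathrm{e}^{-\mathsf{U}_{\mathsf{w}}\Delta_{k}}\,\mathrm{e}^{\lambda}\,\frac{\Delta_{k}^{\kappa}}{\lambda^{\kappa}}=\frac{\mathrm{e}^{-\mathsf{L}_{\mathsf{w}}\Delta_{k}}}{(\mathsf{U}_{\mathsf{w}}-\mathsf{L}_{\mathsf{w}})^{\kappa}},$$

so that each of the $\kappa$ terms $\mathsf{U}_{\mathsf{w}}-\phi(\mathsf{w}_{U_{j}})$ is divided by $\mathsf{U}_{\mathsf{w}}-\mathsf{L}_{\mathsf{w}}$ and the factor $\mathrm{e}^{-\mathsf{L}_{\mathsf{w}}\Delta_{k}}$ joins the exponential prefactor.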

On the r.h.s. of (9), the product over $\kappa$ elements is bounded by 1. Therefore, a sufficient condition to satisfy one of the assumptions (A1)-(A3) is that the function:

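$$\rho_{\Delta_{k}}:\ \mathbb{R}^{d}\times\mathbb{R}^{d}\to\mathbb{R},\qquad(x,y)\mapsto\varphi_{\Delta_{k}}(x,y)\exp\left\{A(y)-A(x)-\mathsf{L}_{\mathsf{w}}\Delta_{k}\right\}\tag{10}$$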

is upper bounded almost surely by $\hat{\sigma}^{k}_{+}$ . In particular, if $\mathsf{L}_{\mathsf{w}}$ is bounded almost surely, (10) always satisfies assumption (A3) and Algorithm 1 can be used. This condition is always satisfied for models in the domains $\mathcal{D}_{1}$ and $\mathcal{D}_{2}$ defined in [4], i.e. domains for which the exact algorithms EA1 and EA2 can be used.
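A GPE-1 draw can be sketched in Python for a one-dimensional example. This sketch makes several assumptions that are not taken from the text: it uses the sine drift $\alpha(x)=\sin(x-\theta)$ with $\theta=0$ , for which $A(x)=-\cos(x-\theta)$ and $\phi(x)=(\sin^{2}(x-\theta)+\cos(x-\theta))/2$ takes values in $[-1/2,5/8]$ , so that deterministic global bounds can replace $\mathsf{L}_{\mathsf{w}}$ and $\mathsf{U}_{\mathsf{w}}$ . All function names (`gpe1`, `rho_bound`, `brownian_bridge`, `poisson`) are hypothetical.

```python
import math
import random

THETA = 0.0
L_BOUND, U_BOUND = -0.5, 0.625  # global bounds on phi for the sine drift


def big_a(x):
    return -math.cos(x - THETA)  # A, with A'(x) = alpha(x) = sin(x - theta)


def phi(x):
    return 0.5 * (math.sin(x - THETA) ** 2 + math.cos(x - THETA))


def poisson(lam, rng):
    # Knuth's multiplication method, adequate for small intensities.
    limit, count, prod = math.exp(-lam), 0, rng.random()
    while prod > limit:
        count += 1
        prod *= rng.random()
    return count


def brownian_bridge(x, y, delta, times, rng):
    """Bridge from x (time 0) to y (time delta), sampled at sorted times."""
    values, t_prev, w_prev = [], 0.0, x
    for t in times:
        remaining = delta - t_prev
        mean = w_prev + (t - t_prev) / remaining * (y - w_prev)
        var = (t - t_prev) * (delta - t) / remaining
        w_prev = rng.gauss(mean, math.sqrt(var))
        values.append(w_prev)
        t_prev = t
    return values


def rho_bound(x, y, delta):
    """Prefactor of the estimator; it bounds every GPE-1 draw from above."""
    gauss = math.exp(-(y - x) ** 2 / (2 * delta)) / math.sqrt(2 * math.pi * delta)
    return gauss * math.exp(big_a(y) - big_a(x) - L_BOUND * delta)


def gpe1(x, y, delta, rng):
    kappa = poisson((U_BOUND - L_BOUND) * delta, rng)
    times = sorted(rng.uniform(0.0, delta) for _ in range(kappa))
    estimate = rho_bound(x, y, delta)
    for w in brownian_bridge(x, y, delta, times, rng):
        estimate *= (U_BOUND - phi(w)) / (U_BOUND - L_BOUND)
    return estimate


rng = random.Random(1)
x0, y0, step = 0.2, 0.7, 0.5
draws = [gpe1(x0, y0, step, rng) for _ in range(2000)]
q_estimate = sum(draws) / len(draws)  # Monte Carlo average of unbiased draws
```

Each factor of the product lies in $[0,1]$ , so every draw is bounded by `rho_bound(x0, y0, step)`, which is the quantity that the accept-reject bound has to dominate.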

When (A1) or (A2) holds, it can be nonetheless of practical interest to choose the bound $\hat{\sigma}^{k}_{+}$ corresponding to (A3). Indeed, this might increase significantly the acceptance rate of the algorithm, and therefore reduce the number of drawings of the random variable $\zeta$ , which has a much higher cost than the computation of $\rho$ , as it requires simulations of Brownian Bridges. Moreover, this latter option can also avoid numerical optimization if no analytical expression of $\hat{\sigma}_{+}^{k}$ is available. In practice, we found this option more efficient in terms of computation time when $N$ has moderate values.

3 Convergence results

Consider the following assumptions.

H1

(i) For all $k\geq 0$ and all $x\in\mathbb{R}^{d}$ , $g_{k}(x)>0$ .

(ii) $\underset{k\geq 0}{\sup}|g_{k}|_{\infty}<\infty$ .

H2

$\underset{k\geq 1}{\sup}|\vartheta_{k}|_{\infty}<\infty$ , $\underset{k\geq 1}{\sup}|p_{k}|_{\infty}<\infty$ and $\underset{k\geq 1}{\sup}|\widehat{\omega}_{k}|_{\infty}<\infty$ , where

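$$\widehat{\omega}_{0}(x)=\frac{\chi(x)g_{0}(x)}{\eta_{0}(x)}\quad\mbox{and for }k\geq 1\quad\widehat{\omega}_{k}(x,x^{\prime};z)=\frac{\widehat{q}_{k}(x,x^{\prime};z)\,g_{k+1}(x^{\prime})}{\vartheta_{k+1}(x)\,p_{k}(x,x^{\prime})}.$$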

Lemma 2.

For all $0\leq k\leq n-1$ , the random variables $\{\widehat{\omega}_{k+1}^{i}\tau_{k+1}^{i}\}_{i=1}^{N}$ are independent conditionally on $\mathcal{F}_{k}^{N}$ and

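$$\mathbb{E}\left[\widehat{\omega}_{k+1}^{1}\tau_{k+1}^{1}\,|\,\mathcal{F}_{k}^{N}\right]=\left(\phi_{k}^{N}[\vartheta_{k+1}]\right)^{-1}\phi_{k}^{N}\left[\int q_{k}(\cdot,x)g_{k+1}(x)\left\{\tau_{k}(\cdot)+h_{k+1}(\cdot,x)\right\}\mathrm{d}x\right].$$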

Proof.

See appendix A ∎

Proposition 1.

Assume that H1 and H2 hold and that for all $1\leq k\leq n$ , $\mathrm{osc}(h_{k})<+\infty$ . For all $0\leq k\leq n$ and all $\widetilde{N}\geq 1$ , there exist $b_{k},c_{k}>0$ such that for all $N\geq 1$ and all $\varepsilon\in\mathbb{R}_{+}^{\star}$ ,

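$$\mathbb{P}\left(\left|\phi_{k}^{N}[\tau_{k}]-\phi_{k}[T_{k}h_{k}]\right|\geq\varepsilon\right)\leq b_{k}\exp\left(-c_{k}N\varepsilon^{2}\right).$$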

Proof.

See appendix A ∎

4 Numerical experiments

This section investigates the performance of the proposed algorithm with the sine and log-growth models. In both cases, the proposal distribution $p_{k}$ is chosen as the following approximation of the optimal filter (or the fully adapted particle filter in the terminology of [26]):

$$p_{k}(x_{k-1},x_{k})\propto\tilde{q}_{k}(x_{k-1},x_{k})\,g_{k}(x_{k}),$$

where $\tilde{q}_{k}(x_{k-1},x_{k})$ is the p.d.f. of a Gaussian distribution with mean $x_{k-1}+\alpha(x_{k-1})\Delta_{k}$ and variance $\Delta_{k}I_{d}$ , i.e. the Euler approximation of equation (1). As the observation model is linear and Gaussian, the proposal distribution is therefore Gaussian with explicit mean and variance.
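The explicit mean and variance of this proposal follow from the usual product of two Gaussian densities. The sketch below is a one-dimensional illustration under stated assumptions: the observation is taken as $Y_{k}=X_{k}+\varepsilon_{k}$ with noise standard deviation `sigma_obs`, the Euler mean is taken as $x_{k-1}+\alpha(x_{k-1})\Delta_{k}$ , and the function name `gaussian_proposal` is hypothetical.

```python
import math


def gaussian_proposal(x_prev, y, delta, sigma_obs, drift):
    """Mean and variance of the density proportional to q_tilde(x_prev, .) * g(.).

    q_tilde is the Euler (Gaussian) transition density and g the Gaussian
    observation density of y given the state.
    """
    prior_mean = x_prev + drift(x_prev) * delta  # Euler step for the mean
    prior_var = delta                            # unit diffusion coefficient
    # Precisions add; the mean is the precision-weighted average.
    post_var = 1.0 / (1.0 / prior_var + 1.0 / sigma_obs ** 2)
    post_mean = post_var * (prior_mean / prior_var + y / sigma_obs ** 2)
    return post_mean, post_var


# Example with the sine drift (theta = 0): from x_prev = 0 the Euler mean is 0,
# and an observation y = 3 pulls the proposal mean towards the data.
mean, var = gaussian_proposal(0.0, 3.0, 0.5, 1.0, math.sin)
```

Because the proposal already accounts for the observation, the resulting importance weights tend to be less variable than with a proposal that ignores $Y_{k}$ .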

The performance of the proposed algorithm is evaluated as follows. We compare the estimation of the EM intermediate quantity with the one obtained by the fixed lag method of [23], for different values of the lag (namely, 1, 2, 5, 10, 50). The particle approximation of $\mathcal{Q}(\theta,\theta)$ for each model is computed using each algorithm, see Figure 1 for the SINE model and Figure 3 for the log-growth model. This estimation is performed 200 times to obtain the estimates $\widehat{Q}_{1},\dots,\widehat{Q}_{200}$ , using $\tilde{N}=2$ for the PaRIS algorithm and $M=30$ replications for the Monte Carlo approximation $\widehat{q}_{k}$ of each $q_{k}$ . Moreover, the E step requires the computation of a quantity such as (2) with $h_{k}=\log g_{k}+\log q_{k}$ . Since $\log q_{k}$ is not available explicitly, it is approximated using the unbiased estimator proposed in [23, Appendix B] based on 30 independent Monte Carlo simulations. The intermediate quantity of the EM algorithm is also estimated with our algorithm 30 times using $N=5000$ particles; the reference value is then computed as the arithmetic mean of these 30 estimations, and denoted by $\widehat{Q}_{\star}$ . Figure 1 (resp. Figure 3) shows this estimate for an example on one simulated data set. The GRand PaRIS algorithm is run with $N=400$ particles in both cases and the fixed lag technique with $N=1600$ , so that both estimations require similar computational times.

The SINE model

The performance of the GRand PaRIS algorithm is first illustrated using the SINE model, where $(X_{t})_{t\geq 0}$ is supposed to be the solution to:

$$\mathrm{d}X_{t}=\sin(X_{t}-\theta)\mathrm{d}t+\mathrm{d}W_{t},\qquad X_{0}=x_{0}.\tag{11}$$

This simple model has no explicit transition density, however GPE estimators may be computed by simulating Brownian bridges. The process solution to (11) is observed regularly at times $t_{0}=0,\ldots,t_{100}=50$ through the observation process $(Y_{k})_{0\leq k\leq 100}$ :

$$Y_{k}=X_{k}+\varepsilon_{k},$$

where the $(\varepsilon_{k})_{0\leq k\leq 100}$ are i.i.d. $\mathcal{N}(0,1)$ . In the example displayed in Figure 1, we set $\theta=0$ . In that case, the function $\rho_{\Delta_{k}}$ defined in (10) can be upper bounded either on $(x,y)$ or only on $y$ , so the GRand PaRIS algorithm has a linear complexity.

This same experiment was reproduced on 100 different simulated data sets. For each simulation $s$ , the empirical absolute relative bias $\mathsf{arb}_{s}$ and the empirical absolute coefficient of variation $\mathsf{acv}_{s}$ are computed as

[TABLE]

where $m(\widehat{Q}^{s})$ and $\sigma(\widehat{Q}^{s})$ are the empirical mean and standard deviation of the sample $\widehat{Q}_{1}^{s},\dots,\widehat{Q}_{200}^{s}$ . For each estimation method, the resulting distributions of $\mathsf{arb}_{1},\dots,\mathsf{arb}_{100}$ and $\mathsf{acv}_{1},\dots,\mathsf{acv}_{100}$ are shown in Figure 2.

The GRand PaRIS algorithm outperforms the fixed lag method for every value of the lag: its bias is the lowest (it is already negligible for $N=400$), and its variance is lower than that of the fixed lag estimates with negligible bias (i.e., in this case, lags larger than 10). Small lags lead to strongly biased estimates for the fixed lag method, and nearly unbiased estimates come at the cost of a large variance. Note that the lag for which the bias becomes small is model dependent.

Log-growth model

Following [4] and [24], the performance of the proposed algorithm is also illustrated with the log-growth model defined by:

[TABLE]

In order to use the exact algorithms of [4] and the GPE of [11], we consider (15) after the Lamperti transform, i.e., the process defined by $X_{t}=\eta(Z_{t})$ , with $\eta(z):=-\log(z)/\sigma$ , which satisfies the following SDE:

[TABLE]
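The effect of this transform can be checked with Itô's formula. The following computation is a sketch under the assumption that the original process has multiplicative noise, $\mathrm{d}Z_{t}=b(Z_{t})\,\mathrm{d}t+\sigma Z_{t}\,\mathrm{d}W_{t}$, where $b$ is a generic drift introduced here only for illustration. Since $\eta^{\prime}(z)=-1/(\sigma z)$ and $\eta^{\prime\prime}(z)=1/(\sigma z^{2})$,

$$
\begin{aligned}
\mathrm{d}X_{t} &= \left[\eta^{\prime}(Z_{t})\,b(Z_{t})+\tfrac{1}{2}\sigma^{2}Z_{t}^{2}\,\eta^{\prime\prime}(Z_{t})\right]\mathrm{d}t+\eta^{\prime}(Z_{t})\,\sigma Z_{t}\,\mathrm{d}W_{t}\\
&= \left[\frac{\sigma}{2}-\frac{b(Z_{t})}{\sigma Z_{t}}\right]\mathrm{d}t-\mathrm{d}W_{t},\qquad Z_{t}=\mathrm{e}^{-\sigma X_{t}}.
\end{aligned}
$$

As $-W$ is also a standard Brownian motion, the transformed process has unit diffusion coefficient and drift $\alpha(x)=\sigma/2-b(\mathrm{e}^{-\sigma x})/(\sigma\mathrm{e}^{-\sigma x})$, which is the form required by the exact algorithms and the GPE.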

In this case, the conditions of the Exact Algorithm 2 defined in [4] are satisfied, as for any $m\in\mathbb{R}$ there exists $\mathsf{U}_{m}$ such that for all $x\geq m$, $\psi(x):=\alpha^{2}(x)+\alpha^{\prime}(x)\leq\mathsf{U}_{m}$. Moreover, $\psi$ is lower bounded uniformly by $\mathsf{L}$. GPE estimators may then be computed by simulating the minimum of a Brownian bridge and then simulating Bessel bridges conditionally on this minimum, as proposed by [4].

The solution to (16) is observed at the regularly spaced times $t_{0}=0,\dots,t_{50}=100$ through the observation process $(Y_{k})_{0\leq k\leq 50}$ defined as:

[TABLE]

where the $(\varepsilon_{k})_{0\leq k\leq 50}$ are i.i.d. $\mathcal{N}(0,\sigma^{2}_{obs})$ . The parameters are given by

[TABLE]

In that case, the function $\rho_{\Delta_{k}}$ defined in (10) can be upper bounded as a function of $y$ when $x\in\{\xi_{k}^{1},\dots,\xi_{k}^{N}\}$; the GRand PaRIS algorithm therefore has a linear complexity. The intermediate quantity of the EM algorithm is evaluated as for the SINE model, see Figures 3 and 4.

The results for the fixed lag technique are similar to those presented in [23, Figure 1] for the same model. For small lags, the variance of the estimates is small but the estimation is highly biased. The bias decreases rapidly as the lag increases, while the variance increases greatly. Again, the GRand PaRIS algorithm outperforms the fixed lag smoother: its bias is as small (vanishing) as that of the fixed lag smoother with the largest lag, and its variance is smaller than that of the fixed lag estimates with negligible bias.

5 Conclusions

This paper presents a new online SMC smoother for partially observed stochastic differential equations. The algorithm relies on an accept-reject procedure inspired by the recent PaRIS algorithm. The main result of the article for practical applications is that the mechanism of this procedure remains valid when the transition density is replaced by an unbiased positive estimator. The proposed procedure outperforms the existing fixed lag smoother for SDEs of [23], as it does not introduce an intrinsic and nonvanishing bias. In addition, numerical simulations on data from two different models show a lower variance. It can be implemented for the classes of models $\mathcal{D}_{1}$ and $\mathcal{D}_{2}$ defined in [4] with a complexity linear in $N$.

Appendix A Proofs

Proof of Lemma 1.

Let $\tau$ be the first time draws are accepted in the accept-reject mechanism. For all $\ell\geq 1$ , write

[TABLE]

Let $h$ be a function defined on $\{1,\ldots,N\}$ ,

[TABLE]

which concludes the proof. ∎
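The role of unbiasedness in this accept-reject mechanism can be checked numerically on a toy example. The sketch below is not the paper's implementation: the three particles, their weights, the values `q_true`, and the two-point multiplicative noise used as an "unbiased positive bounded estimator" are all hypothetical. An index is proposed according to the weights and accepted with probability $\widehat{q}/\bar{q}$, where $\bar{q}$ (`q_bound`) bounds the estimator. The accepted index should then be distributed proportionally to the weight times the expected value of the estimator, i.e. to $\omega^{i}q^{i}$.

```python
import numpy as np


def backward_index(weights, q_true, q_bound, rng):
    """Accept-reject draw of an index using a noisy estimate of q.

    Toy setting: the estimator is q_true[j] times 0.5 or 1.5 with equal
    probability, so it is positive, unbiased and bounded by q_bound.
    """
    probs = weights / weights.sum()
    while True:
        j = rng.choice(len(weights), p=probs)          # propose from the weights
        q_hat = q_true[j] * rng.choice([0.5, 1.5])     # unbiased estimate of q_true[j]
        if rng.uniform() * q_bound <= q_hat:           # accept w.p. q_hat / q_bound
            return j


rng = np.random.default_rng(1)
weights = np.array([0.2, 0.5, 0.3])
q_true = np.array([0.1, 0.4, 0.6])
q_bound = 1.5 * q_true.max()                           # upper bound on the estimator

draws = np.array([backward_index(weights, q_true, q_bound, rng) for _ in range(10000)])
freq = np.bincount(draws, minlength=3) / draws.size
target = weights * q_true / np.sum(weights * q_true)   # proportional to w_i * q_i
```

Up to Monte Carlo error, `freq` should be close to `target`, even though the exact values `q_true` are never used in the acceptance test.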

Proof of Lemma 2.

The independence is ensured by the mechanism of SMC methods. By (7),

[TABLE]

Note that by Lemma 1,

[TABLE]

Since $\tau^{i}_{k+1}$ and $\zeta_{k}$ are independent conditionally on $\mathcal{G}_{k+1}^{N}$:

[TABLE]

Moreover, conditionally on $\mathcal{F}_{k}^{N}$, the probability density function of $(\xi_{k+1}^{i},I_{k+1}^{i})$ is given by

[TABLE]

Therefore, this yields:

[TABLE]

which concludes the proof. ∎

Proof of Proposition 1.

The result is proved by induction. At time $k=0$, the result holds since, for all $1\leq i\leq N$, $\rho_{0}^{i}=0$ and by the convention $T_{0}[h_{0}]=0$. In addition, $\phi_{0}^{N}$ is a standard importance sampling estimator of $\phi_{0}$ with $\widehat{\omega}_{0}^{i}\leq|\widehat{\omega}_{0}|_{\infty}$, so that for any bounded function $h$ on $\mathsf{X}$,

[TABLE]

Assume the result holds for $k\geq 1$ and that $\vartheta_{k+1}=1$ for simplicity. Write

[TABLE]

where $a_{N}=N^{-1}\sum_{i=1}^{N}\widehat{\omega}_{k+1}^{i}\left(\tau_{k+1}^{i}-\phi_{k+1}\left[T_{k+1}[h_{k+1}]\right]\right)$ and $b_{N}=N^{-1}\sum_{i=1}^{N}\widehat{\omega}_{k+1}^{i}$. By Lemma 2, the random variables $\{\widehat{\omega}_{k+1}^{i}\tau_{k+1}^{i}\}_{i=1}^{N}$ are independent conditionally on $\mathcal{F}_{k}^{N}$ and, by H2,

[TABLE]

Therefore, by Hoeffding's inequality,

[TABLE]

On the other hand,

[TABLE]

where

[TABLE]

By [24, Lemma 11], $\phi_{k}\left[\Upsilon_{k}\right]=0$ which implies by the induction assumption that

[TABLE]

Then,

[TABLE]

Similarly, as $b_{N}\leq|\widehat{\omega}_{k}|_{\infty}$, by Hoeffding's inequality,

[TABLE]

Note that

[TABLE]

By the induction assumption,

[TABLE]

The proof is completed using Lemma 3. ∎

Lemma 3.

Assume that $a_{N}$ , $b_{N}$ , and $b$ are random variables defined on the same probability space such that there exist positive constants $\beta$ , $B$ , $C$ , and $M$ satisfying

(i) $|a_{N}/b_{N}|\leq M$, $\mathbb{P}$-a.s. and $b\geq\beta$, $\mathbb{P}$-a.s.,

(ii) for all $\epsilon>0$ and all $N\geq 1$, $\mathbb{P}\left[|b_{N}-b|>\epsilon\right]\leq B\exp\left(-CN\epsilon^{2}\right)$,

(iii) for all $\epsilon>0$ and all $N\geq 1$, $\mathbb{P}\left[|a_{N}|>\epsilon\right]\leq B\exp\left(-CN\left(\epsilon/M\right)^{2}\right)$.

Then,

[TABLE]

Proof.

See [8]. ∎

Bibliography


[1] Y. Ait-Sahalia. Transition densities for interest rate and other nonlinear diffusions. Journal of Finance, 54:1361–1395, 1999.
[2] Y. Ait-Sahalia. Maximum likelihood estimation of discretely sampled diffusions: a closed-form approximation approach. Econometrica, 70:223–262, 2002.
[3] Y. Ait-Sahalia. Closed-form likelihood expansions for multivariate diffusions. The Annals of Statistics, 36:906–937, 2008.
[4] A. Beskos, O. Papaspiliopoulos, G. Roberts, and P. Fearnhead. Exact and computationally efficient likelihood-based estimation for discretely observed diffusion processes (with discussion). J. Roy. Statist. Soc. Ser. B, 68(3):333–382, 2006.
[5] O. Cappé, E. Moulines, and T. Rydén. Inference in Hidden Markov Models. Springer, 2005.
[6] P. Del Moral, J. Jacod, and P. Protter. The Monte Carlo method for filtering with discrete-time observations. Probability Theory and Related Fields, 120:346–368, 2001.
[7] A. P. Dempster, N. M. Laird, and D. B. Rubin. Maximum likelihood from incomplete data via the EM algorithm (with discussion). J. Roy. Statist. Soc. Ser. B, 39(1):1–38, 1977.
[8] R. Douc, A. Garivier, E. Moulines, and J. Olsson. Sequential Monte Carlo smoothing for general state space hidden Markov models. Ann. Appl. Probab., 21(6):2109–2145, 2011.
