Error Analysis for the Particle Filter: Methods and Theoretical Support

Ziyu Liu; Shihong Wei; James C. Spall

arXiv:1903.12078·stat.CO·March 29, 2019


TL;DR

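This paper analyzes the error behavior of particle filters in nonlinear and/or non-Gaussian settings, proving asymptotic normality of the estimator's deviation from the conditional mean when resampling is done at every step, and checking the result on two numerical examples.

Contribution

It decomposes the particle filter error into two terms and proves that the first term (estimator minus conditional mean) is asymptotically normal in the multivariate case when resampling is performed at every step, supported by two numerical examples.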

Findings

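01 The error decomposes into two terms; the first term (estimator minus conditional mean) is asymptotically normal.

02 With resampling at every step, the scaled difference between the estimator and the conditional mean converges in distribution to a normal law as the number of particles grows.

03 Two numerical examples are consistent with the theoretical prediction.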

Abstract

The particle filter is a popular Bayesian filtering algorithm for use in cases where the state-space model is nonlinear and/or the random terms (initial state or noises) are non-Gaussian distributed. We study the behavior of the error in the particle filter algorithm as the number of particles gets large. After a decomposition of the error into two terms, we show that the difference between the estimator and the conditional mean is asymptotically normal when the resampling is done at every step in the filtering process. Two nonlinear/non-Gaussian examples are tested to verify this conclusion.

Equations

\hat{x}_{k} - x_{k} = (\hat{x}_{k} - E[x_{k} \mid z_{1:k}]) + (E[x_{k} \mid z_{1:k}] - x_{k}).

\left\{\begin{array}{l}\bm{x}_{k+1}=\bm{f}_{k}(\bm{x}_{k},\bm{w}_{k}),\\ \bm{z}_{k}=\bm{h}_{k}(\bm{x}_{k},\bm{v}_{k}).\end{array}\right.

p(x_{k} \mid z_{1:k-1}) = \int p(x_{k} \mid x_{k-1})\, p(x_{k-1} \mid z_{1:k-1})\, dx_{k-1},

p(x_{k} \mid z_{1:k}) = \frac{p(z_{k} \mid x_{k})\, p(x_{k} \mid z_{1:k-1})}{p(z_{k} \mid z_{1:k-1})}.

\upalpha_{k}(x_{1:k}^{i}) = \frac{p(z_{1:k}, x_{k}^{i})}{q(x_{k}^{i} \mid z_{1:k})} = \frac{p(x_{k}^{i} \mid z_{1:k})\, p(z_{k})}{q(x_{k}^{i} \mid z_{1:k})}.

\upalpha_{k}(x_{1:k}^{i}) = \upalpha_{k-1}(x_{1:k-1}^{i})\, \frac{p(z_{k} \mid x_{k}^{i})\, p(x_{k}^{i} \mid x_{k-1}^{i})}{q(x_{k}^{i} \mid x_{0:k-1}^{i}, z_{1:k})}.

p_{T}(x_{1:T} \mid z_{1:T}) \propto \prod_{k=1}^{T} [p_{k}(x_{k} \mid x_{k-1})\, p_{k}(z_{k} \mid x_{k})].

L_{T}(x_{1:T}) = \frac{p_{T}(x_{1:T} \mid z_{1:T})}{\prod_{k=1}^{T} q_{k}(x_{k} \mid x_{1:k-1})}.

\bar{\upalpha}_{k} = \frac{1}{m} \sum_{j=1}^{m} \upalpha_{k}(\tilde{x}_{1:k}^{j}),

H_{k}^{i}

\tilde{H}_{k}^{i}

\left\{\begin{array}{l}\mathcal{F}_{2k-1}=\{\tilde{\bm{x}}^{i}_{1}:1\leq i\leq m\}\cup\{(\bm{x}^{i}_{l},\tilde{\bm{x}}^{i}_{l+1},A^{i}_{l}):1\leq l<k,\ 1\leq i\leq m\},\\ \mathcal{F}_{2k}=\mathcal{F}_{2k-1}\cup\{(\bm{x}^{i}_{k},A^{i}_{k}):1\leq i\leq m\}.\end{array}\right.

\left\{\begin{array}{l}\bm{u}_{0}=E[\bm{x}_{T}|\bm{z}_{1:T}],\\ \bm{u}_{k}(\bm{x}_{1:k})=E[\bm{x}_{T}L_{T}(\bm{x}_{1:k})|\bm{x}_{1:k}],\quad 1\leq k\leq T\end{array}\right.

g_{k}^{*}(x_{1:k}) = \frac{E[\prod_{l=1}^{k} \upalpha_{l}(x_{1:l})]}{\prod_{l=1}^{k} \upalpha_{l}(x_{1:l})},

\left\{\begin{array}{l}\bm{\Sigma}_{2k-1}=E\{(\bm{u}_{k}(\bm{x}_{1:k})\bm{u}_{k}(\bm{x}_{1:k})^{T}-\bm{u}_{k-1}(\bm{x}_{1:k-1})\bm{u}_{k-1}(\bm{x}_{1:k-1})^{T})g_{k-1}^{*}\},\\ \bm{\Sigma}_{2k}=E\{[\bm{u}_{k}(\bm{x}_{1:k})g_{k}^{*}-\bm{u}_{0}][\bm{u}_{k}(\bm{x}_{1:k})g_{k}^{*}-\bm{u}_{0}]^{T}/g_{k}^{*}\}.\end{array}\right.

\hat{x}_{T} = \sum_{i=1}^{m} \tilde{x}_{T}^{i}\, w_{T}(\tilde{x}_{1:T}^{i}) = (m \bar{\upalpha}_{T})^{-1} \sum_{i=1}^{m} \tilde{x}_{T}^{i}\, \upalpha_{T}(\tilde{x}_{1:T}^{i}).

\hat{x}_{T}^{*} = m^{-1} \sum_{i=1}^{m} L_{T}(\tilde{x}_{1:T}^{i})\, \tilde{x}_{T}^{i}\, H_{T-1}^{i}.

E[\prod_{t=1}^{T} \upalpha_{t}(x_{1:t})] = \int \prod_{t=1}^{T} p_{t}(x_{t} \mid x_{t-1})\, p_{t}(z_{t} \mid x_{t})\, d\nu(x_{1:T}),

p_{T}(x_{1:T} \mid z_{1:T}) = \frac{\prod_{k=1}^{T} [p_{k}(x_{k} \mid x_{k-1})\, p_{k}(z_{k} \mid x_{k})]}{\int \prod_{k=1}^{T} p_{k}(x_{k} \mid x_{k-1})\, p_{k}(z_{k} \mid x_{k})\, d\nu(x_{1:T})}.

p_{T}(x_{1:T} \mid z_{1:T}) = \frac{\prod_{k=1}^{T} [p_{k}(x_{k} \mid x_{k-1})\, p_{k}(z_{k} \mid x_{k})]}{E[\prod_{k=1}^{T} \upalpha_{k}(x_{1:k})]}.

L_{T}(x_{1:T})

L_{T}(\tilde{x}_{1:T}^{i})\, H_{T-1}^{i} = \frac{\bar{\upalpha}_{1} \cdots \bar{\upalpha}_{T}}{E[\prod_{k=1}^{T} \upalpha_{k}(x_{1:k})]}\, \frac{\upalpha_{T}(\tilde{x}_{1:T}^{i})}{\bar{\upalpha}_{T}}.

\upalpha_{T}(\tilde{x}_{1:T}^{i}) = \frac{\bar{\upalpha}_{T}\, E[\prod_{k=1}^{T} \upalpha_{k}(x_{1:k})]}{\bar{\upalpha}_{1} \cdots \bar{\upalpha}_{T}}\, L_{T}(\tilde{x}_{1:T}^{i})\, H_{T-1}^{i},

\begin{aligned}\hat{x}_{T} &= (m \bar{\upalpha}_{T})^{-1} \sum_{i=1}^{m} \tilde{x}_{T}^{i}\, \upalpha_{T}(\tilde{x}_{1:T}^{i}) \\ &= (m \bar{\upalpha}_{T})^{-1} \sum_{i=1}^{m} \tilde{x}_{T}^{i}\, \frac{\bar{\upalpha}_{T}\, E[\prod_{k=1}^{T} \upalpha_{k}(x_{1:k})]}{\bar{\upalpha}_{1} \cdots \bar{\upalpha}_{T}}\, L_{T}(\tilde{x}_{1:T}^{i})\, H_{T-1}^{i} \\ &= m^{-1}\, \frac{E[\prod_{k=1}^{T} \upalpha_{k}(x_{1:k})]}{\bar{\upalpha}_{1} \cdots \bar{\upalpha}_{T}} \sum_{i=1}^{m} \tilde{x}_{T}^{i}\, L_{T}(\tilde{x}_{1:T}^{i})\, H_{T-1}^{i} \\ &= \frac{E[\prod_{k=1}^{T} \upalpha_{k}(x_{1:k})]}{\bar{\upalpha}_{1} \cdots \bar{\upalpha}_{T}}\, \hat{x}_{T}^{*}.\end{aligned}

m^{-1} \sum_{i=1}^{m} G(\tilde{x}_{1:k}^{i}) \stackrel{p}{\longrightarrow} E[G(x_{1:k}) / g_{k-1}^{*}(x_{1:k-1})];

m^{-1} \sum_{i=1}^{m} G(x_{1:k}^{i}) \stackrel{p}{\longrightarrow} E[G(x_{1:k}) / g_{k}^{*}(x_{1:k})].

\frac{H_{k}^{i}}{g_{k}^{*}(x_{1:k}^{i})} \stackrel{p}{\longrightarrow} 1 \quad \text{as } m \to \infty.

m^{-1} \sum_{i=1}^{m} |G(\tilde{x}_{1:k}^{i})|\, 1_{\{|G(\tilde{x}_{1:k}^{i})| > \upepsilon m\}} \stackrel{p}{\longrightarrow} 0 \quad \text{as } m \to \infty.

Taxonomy

Topics: Target Tracking and Data Fusion in Sensor Networks · Bayesian Modeling and Causal Inference · Fault Detection and Control Systems

Full text

Error Analysis for the Particle Filter: Methods and Theoretical Support

Ziyu Liu¹, Shihong Wei², and James C. Spall³

A compressed version of this paper appears in the Proceedings of the American Control Conference, Philadelphia, PA, 10-12 July 2019.

¹ Ziyu Liu is a graduate student in the Johns Hopkins University Applied Math and Statistics Department, Whitehead Hall, 3400 North Charles Street, Baltimore, MD 21218. [email protected]

² Shihong Wei is a graduate student in the Johns Hopkins University Applied Math and Statistics Department, Whitehead Hall, 3400 North Charles Street, Baltimore, MD 21218. [email protected]

³ James C. Spall is a member of the Principal Professional Staff at the JHU Applied Physics Laboratory and Research Professor in the Johns Hopkins University Applied Math and Statistics Department, Whitehead Hall, 3400 North Charles Street, Baltimore, MD 21218. [email protected]


I INTRODUCTION

This paper is aimed at error analysis for the particle filter (PF) in the nonlinear and/or non-Gaussian discrete state-space model. We establish the asymptotic normality for the difference between the PF estimate and the conditional mean in multivariate cases.

The PF, proposed in [1], is a popular Bayesian filtering algorithm because of its ease of implementation and wide range of application. The PF circumvents the intractable integrals that arise when updating the posterior density by approximating the posterior distribution directly with a large number of particles. Recall that for a linear model with Gaussian noise, the Kalman filter is the standard filtering choice, and its estimate is exactly the conditional mean. Furthermore, the error distribution of the Kalman filter can be characterized by the covariance matrix. However, the same convenience does not naturally hold for the PF. If the error distribution of the PF estimate is known, the PF algorithm can be evaluated more precisely in specific applications. Such knowledge also helps to improve the performance of the PF, for example when deciding the number of particles and tuning crucial parameters in PF variants. Thus, the error analysis for the PF is a topic of both theoretical and practical interest.

Many previous studies have been conducted on modifying the generic PF to improve its performance under certain circumstances. In addition, some research has focused on the statistical properties of the PF estimate and among them the characterization of the error term draws some attention. Unlike the error behavior of Kalman filter or extended Kalman filter (see e.g. [3]), which can be characterized by the error covariance matrix in the presence of Gaussian noise, there is usually no closed-form expression for the error in PF, especially in the nonlinear and/or non-Gaussian system cases.

However, it is possible to calculate an estimation error bound for some cases of the Kalman filter with non-Gaussian noise [4][5]. Some studies address how the PF estimate, as an approximation, converges to the true conditional mean. Ref. [6] discussed convergence of the PF and the fluctuation of its path space and showed that, under certain assumptions, the distribution of the PF converges to the distribution of the conditional mean as the number of particles increases. Ref. [7] studied the distance between the PF as a numerical approximation and its underlying continuous system and then established the convergence of the PF to the continuous optimal filter. Other researchers focus directly on characterizing the error distribution. In the discrete state-space model, the general framework is to decompose the error term into two parts:

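$$\hat{\bm{x}}_{k}-\bm{x}_{k}=(\hat{\bm{x}}_{k}-E[\bm{x}_{k}|\bm{z}_{1:k}])+(E[\bm{x}_{k}|\bm{z}_{1:k}]-\bm{x}_{k}),\tag{1}$$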

where $\bm{z}_{1:k}$ represents $(\bm{z}_{1},\cdots,\bm{z}_{k})$, the first part is the difference between the PF estimate $\hat{\bm{x}}_{k}$ and the conditional mean $E[\bm{x}_{k}|\bm{z}_{1:k}]$, and the second part is the difference between the conditional mean $E[\bm{x}_{k}|\bm{z}_{1:k}]$ and the underlying true state $\bm{x}_{k}$.

For the first part of the error decomposition, [8] uses the result of [9] to show that the distribution of the difference between the generic PF estimate and the conditional mean is asymptotically normal in the scalar case as the number of particles gets large. Recently, [10] conducted error analysis specifically on the linear feedback PF, which, as a special variant of the PF, includes a feedback control for particles. However, whether a similar result holds for the multivariate case remains unclear, and the theoretical foundation for the second part of the error term also remains to be explored.

In this paper, we provide an error analysis for a generic type of PF for which re-sampling is performed at every step, with a focus on the first part of the error decomposition (1). In fact, we will extend the work of [8] to allow for the analysis of the difference between estimator and conditional mean under the multivariate case. After a rigorous derivation, we show that the first error term will converge asymptotically to the multivariate normal distribution as the number of particles gets large. Then, we verify the above result on two nonlinear and/or non-Gaussian discrete state-space cases.

The remainder of this paper is arranged as follows. The second section states the problem setting and clarifies the notation. The third section gives the mathematical analysis showing asymptotic normality of the first error term. The fourth section presents the numerical verification on two examples, and the last section gives the conclusion and a discussion of future work.

II Problem Statement and Particle Filter

II-A Discrete-Time State-Space Model

Consider the discrete time state-space model (DSSM) with the state equation and the measurement equation as follows:

$$\left\{\begin{array}{l}\bm{x}_{k+1}=\bm{f}_{k}(\bm{x}_{k},\bm{w}_{k}),\\ \bm{z}_{k}=\bm{h}_{k}(\bm{x}_{k},\bm{v}_{k}),\end{array}\right.\tag{2}$$

where $\bm{x}_{k}$ and $\bm{z}_{k}$ are the true states and the measurements, respectively, $\bm{w}_{k}$ and $\bm{v}_{k}$ are the noise terms in the state equation and the measurement equation, $\bm{f}_{k}$ is a possibly nonlinear function of the state $\bm{x}_{k}$, and $\bm{h}_{k}$ is a possibly nonlinear function of $\bm{x}_{k}$.

Consider $\{\bm{x}_{k}\}$ as a hidden Markov process (HMP), $\bm{x}_{k}\sim p_{k}(\cdot|\bm{x}_{k-1})$ , $\bm{z}_{k}\sim p_{k}(\cdot|\bm{x}_{k})$ . Denote the historical records of true states and measurements by $\bm{x}_{1:k}=(\bm{x}_{1},\cdots,\bm{x}_{k})$ and $\bm{z}_{1:k}=(\bm{z}_{1},\cdots,\bm{z}_{k})$ . Then, for this filtering problem, the goal is to calculate $E\left(\bm{x}_{k}|\bm{z}_{1:k}\right)$ . In the nonlinear and/or non-Gaussian cases, Bayesian filter updates its estimators using the following recursive form:

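Prediction: using the information in $\bm{z}_{1:k-1}$ to predict $\bm{x}_{k}$

$$p(\bm{x}_{k}|\bm{z}_{1:k-1})=\int p(\bm{x}_{k}|\bm{x}_{k-1})\,p(\bm{x}_{k-1}|\bm{z}_{1:k-1})\,d\bm{x}_{k-1},$$

Update: using the information in $\bm{z}_{k}$ to adjust $\bm{x}_{k}$

$$p(\bm{x}_{k}|\bm{z}_{1:k})=\frac{p(\bm{z}_{k}|\bm{x}_{k})\,p(\bm{x}_{k}|\bm{z}_{1:k-1})}{p(\bm{z}_{k}|\bm{z}_{1:k-1})}.$$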
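The prediction and update steps can be illustrated numerically on a grid, where the integral becomes a finite sum. The sketch below is only an illustration: it assumes a hypothetical scalar linear-Gaussian model ($x_{k}=0.5\,x_{k-1}+w_{k}$, $z_{k}=x_{k}+v_{k}$, unit-variance noises, standard normal initial state) that is not one of the paper's examples, and the names `normal_pdf` and `bayes_step` are made up here.

```python
import math


def normal_pdf(y, mean, std):
    """Density of N(mean, std^2) evaluated at y."""
    return math.exp(-0.5 * ((y - mean) / std) ** 2) / (std * math.sqrt(2.0 * math.pi))


def bayes_step(grid, dx, posterior_prev, z, phi=0.5, q_std=1.0, r_std=1.0):
    """One prediction + update step of the Bayes recursion on a uniform grid.

    Assumed (hypothetical) model: x_k = phi * x_{k-1} + w_k, z_k = x_k + v_k.
    """
    # Prediction: Riemann-sum version of the integral over x_{k-1}.
    predicted = [
        sum(normal_pdf(x, phi * xp, q_std) * pp for xp, pp in zip(grid, posterior_prev)) * dx
        for x in grid
    ]
    # Update: multiply by the likelihood p(z_k | x_k), then normalize.
    unnormalized = [normal_pdf(z, x, r_std) * p for x, p in zip(grid, predicted)]
    evidence = sum(unnormalized) * dx  # approximates p(z_k | z_{1:k-1})
    return [u / evidence for u in unnormalized]


dx = 0.05
grid = [-8.0 + i * dx for i in range(321)]
prior = [normal_pdf(x, 0.0, 1.0) for x in grid]        # density of x_0
posterior = bayes_step(grid, dx, prior, z=1.0)          # density of x_1 given z_1 = 1
posterior_mean = sum(x * p for x, p in zip(grid, posterior)) * dx
```

For this particular linear-Gaussian choice the conditional mean is also available in closed form from the Kalman filter, which gives a convenient check on the grid result. In the nonlinear or non-Gaussian case no such closed form exists in general, and grid methods scale poorly with the state dimension, which is where the PF comes in.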

II-B Particle Filter

For the nonlinear and non-Gaussian DSSM, the integration of the posterior density, as required in computing $E(\bm{x}_{k}|\bm{z}_{1:k})$ , is often intractable. Hence there is usually no closed-form solution to $E(\bm{x}_{k}|\bm{z}_{1:k})$ . However, the PF can be used to represent the posterior density by a set of randomly (re)sampled weighted particles generated by the Monte Carlo method, and the particles can be averaged to form an estimator of the expectation of interest. Let the number of particles be $m$ and the $i^{th}$ particle at time $k$ be $\bm{x}^{i}_{k}$ . The realization of PF relies heavily on the principle of importance sampling.

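Suppose $p(\bm{x}|\bm{z})$ is our target probability density function (p.d.f.), and $q(\bm{x}|\bm{z})$ is the proposal p.d.f. Then the unnormalized weights are:

$$\upalpha_{k}(\bm{x}^{i}_{1:k})=\frac{p(\bm{z}_{1:k},\bm{x}^{i}_{k})}{q(\bm{x}^{i}_{k}|\bm{z}_{1:k})}=\frac{p(\bm{x}^{i}_{k}|\bm{z}_{1:k})\,p(\bm{z}_{k})}{q(\bm{x}^{i}_{k}|\bm{z}_{1:k})}.$$

From this definition, the unnormalized weight $\upalpha_{k}(\bm{x}^{i}_{1:k})$ can be updated recursively as:

$$\upalpha_{k}(\bm{x}^{i}_{1:k})=\upalpha_{k-1}(\bm{x}^{i}_{1:k-1})\,\frac{p(\bm{z}_{k}|\bm{x}^{i}_{k})\,p(\bm{x}^{i}_{k}|\bm{x}^{i}_{k-1})}{q(\bm{x}^{i}_{k}|\bm{x}^{i}_{0:k-1},\bm{z}_{1:k})}.$$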

Finally, let $w_{k}(\bm{x}^{i}_{1:k})=\upalpha_{k}(\bm{x}^{i}_{1:k})/\sum_{j=1}^{m}\upalpha_{k}(\bm{x}^{j}_{1:k})$ be the normalized weights, which we use to construct the PF estimator.

To deal with the problem of degeneracy, the situation where all but a few particles have near-zero importance weight, we can resample the particles. Ref. [2] discussed several resampling schemes in the PF, and in this study we consider the most commonly used one, the multinomial resampling scheme. That is, at time $k$, we first propagate the $m$ particles from the last step by $\tilde{\bm{x}}^{i}_{k}\sim p_{k}(\cdot|\bm{x}^{i}_{k-1})$, where, to be consistent with the notation of [8], we use $\tilde{\bm{x}}^{i}_{k}$ to denote the particles before resampling. Then, we draw $m$ paths from $\{\tilde{\bm{x}}^{i}_{1:k},1\leq i\leq m\}$ with probabilities $w^{i}_{k}$, where we denote the records before resampling as $\tilde{\bm{x}}^{i}_{1:k}=(\bm{x}^{i}_{1:k-1},\tilde{\bm{x}}^{i}_{k})$ and the records after resampling as $\bm{x}^{i}_{1:k}$.

In this study, we focus on the particular situation where the above re-sampling process is done at every step and we assign new weights to the re-sampled particles. Moreover, the total number of particles remains unchanged as $m$ .
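The propagate, weight, estimate, and resample cycle described above can be sketched as follows. This is a minimal illustration rather than the authors' implementation: it assumes a hypothetical scalar model ($x_{k}=\phi x_{k-1}+w_{k}$, $z_{k}=x_{k}+v_{k}$ with Gaussian noises), uses the transition density as the proposal so that the unnormalized weight reduces to the likelihood, and the function names are invented for this example.

```python
import math
import random


def gaussian_logpdf(y, mean, std):
    """Log-density of N(mean, std^2) at y."""
    return -0.5 * math.log(2.0 * math.pi * std * std) - 0.5 * ((y - mean) / std) ** 2


def normalized_weights(log_alpha):
    """Map unnormalized log-weights to weights w^i that sum to one."""
    top = max(log_alpha)  # subtract the maximum to avoid underflow
    alpha = [math.exp(a - top) for a in log_alpha]
    total = sum(alpha)
    return [a / total for a in alpha]


def bootstrap_pf(zs, m, rng, phi=0.5, q_std=1.0, r_std=1.0):
    """Scalar bootstrap PF with multinomial resampling at every step.

    Assumed (hypothetical) model: x_k = phi * x_{k-1} + w_k, z_k = x_k + v_k,
    with x_0 ~ N(0, 1). Returns sum_i w_k^i * x_tilde_k^i for each k.
    """
    particles = [rng.gauss(0.0, 1.0) for _ in range(m)]
    estimates = []
    for z in zs:
        # Propagate: x_tilde_k^i ~ p_k(. | x_{k-1}^i).
        proposed = [phi * x + rng.gauss(0.0, q_std) for x in particles]
        # Weight: with the transition as proposal, alpha_k^i is p(z_k | x_tilde_k^i).
        w = normalized_weights([gaussian_logpdf(z, x, r_std) for x in proposed])
        # Estimate: weighted average of the particles before resampling.
        estimates.append(sum(wi * xi for wi, xi in zip(w, proposed)))
        # Multinomial resampling: draw m particles with probabilities w^i.
        particles = rng.choices(proposed, weights=w, k=m)
    return estimates
```

Calling `bootstrap_pf([0.3, -0.1, 0.8], 5000, random.Random(1))` returns one estimate per measurement. Because resampling happens at every step, each step starts from equally weighted particles, so no weights need to be carried over.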

II-C Notation

For terminal time point $T$ , the conditional density function in the hidden Markov model implies:

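$$p_{T}(\bm{x}_{1:T}|\bm{z}_{1:T})\propto\prod_{k=1}^{T}[p_{k}(\bm{x}_{k}|\bm{x}_{k-1})\,p_{k}(\bm{z}_{k}|\bm{x}_{k})].\tag{4}$$

The likelihood ratio in the importance sampling process is:

$$L_{T}(\bm{x}_{1:T})=\frac{p_{T}(\bm{x}_{1:T}|\bm{z}_{1:T})}{\prod_{k=1}^{T}q_{k}(\bm{x}_{k}|\bm{x}_{1:k-1})}.$$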

For computational convenience, we also define the following quantities:

$$\bar{\upalpha}_{k}=\frac{1}{m}\sum_{j=1}^{m}\upalpha_{k}(\tilde{\bm{x}}^{j}_{1:k}),$$

where $\upalpha_{k}(\tilde{\bm{x}}^{j}_{1:k})$ is the unnormalized weight of the $j^{th}$ particle path before resampling, and

[TABLE]

where $\upalpha_{l}(\bm{x}^{i}_{1:l})$ is the unnormalized weight of the $i^{th}$ particle path after resampling.

Following the notation of [8], we denote the "ancestry origin" of a particle by $A^{i}_{k}$, defined as follows: $A^{i}_{0}=i$ for all $1\leq i\leq m$. If $\bm{x}^{i}_{1:k}$ and $\bm{x}^{j}_{1:l}$, $l>k$, share the same first state vector in time (i.e., $\bm{x}^{i}_{1:k}=(\bm{x}^{i_{1}}_{1},\cdots,\bm{x}^{i_{k}}_{k})$, $\bm{x}^{j}_{1:l}=(\bm{x}^{j_{1}}_{1},\cdots,\bm{x}^{j_{l}}_{l})$, and $i_{1}=j_{1}$), they have the same ancestral particle, which implies $A^{j}_{l}=A^{i}_{k}$.
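One way to picture the ancestry labels is to carry them through each multinomial resampling step alongside the particles. The sketch below is illustrative only: the function name `resample_with_ancestry` is hypothetical, labels are 0-based (the text uses $1,\dots,m$), and uniform placeholder weights stand in for the normalized weights $w^{i}_{k}$.

```python
import random


def resample_with_ancestry(weights, ancestors, rng):
    """Multinomial resampling that also propagates ancestor labels.

    weights[i]   : normalized weight of path i before resampling
    ancestors[i] : label of the initial particle that path i descends from
    Returns (picked_indices, new_ancestors).
    """
    m = len(weights)
    picked = rng.choices(range(m), weights=weights, k=m)
    # Each resampled path inherits the ancestor of the path it copies.
    return picked, [ancestors[j] for j in picked]


rng = random.Random(0)
m = 5
ancestors = list(range(m))  # initial labels: particle i is its own ancestor
for _ in range(3):
    weights = [1.0 / m] * m  # placeholder for the normalized weights w_k^i
    picked, ancestors = resample_with_ancestry(weights, ancestors, rng)
```

After several steps the list `ancestors` typically contains repeated labels, reflecting that many surviving paths share the same first state vector.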

Finally, let

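$$\left\{\begin{array}{l}\mathcal{F}_{2k-1}=\{\tilde{\bm{x}}^{i}_{1}:1\leq i\leq m\}\cup\{(\bm{x}^{i}_{l},\tilde{\bm{x}}^{i}_{l+1},A^{i}_{l}):1\leq l<k,\ 1\leq i\leq m\},\\ \mathcal{F}_{2k}=\mathcal{F}_{2k-1}\cup\{(\bm{x}^{i}_{k},A^{i}_{k}):1\leq i\leq m\}\end{array}\right.$$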

denote the history information generated by the $m$ particles at the $k^{th}$ step. Our definition of such history is in line with the decomposition of variance in the following analysis, which is aimed at constructing a nice martingale structure when proving the asymptotic normality of $\hat{\bm{x}}_{T}-E(\bm{x}_{T}|\bm{z}_{1:T})$ .

III Mathematical Analysis

This section includes the main formal results that justify our approach. Due to space limitation here, complete proofs are given in the appendix.

III-A Theorem Statement

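We state our main theorem below and then give a proof in Sec. III-C. Let the functions $\bm{u}_{0}$ and $\bm{u}_{k}(\bm{x}_{1:k})$ be as follows:

$$\left\{\begin{array}{l}\bm{u}_{0}=E[\bm{x}_{T}|\bm{z}_{1:T}],\\ \bm{u}_{k}(\bm{x}_{1:k})=E[\bm{x}_{T}L_{T}(\bm{x}_{1:k})|\bm{x}_{1:k}],\quad 1\leq k\leq T,\end{array}\right.$$

and, as in our previous work [14], we define $\bm{\Sigma}=\sum_{k=1}^{2T-1}\bm{\Sigma}_{k}$, where

$$\left\{\begin{array}{l}\bm{\Sigma}_{2k-1}=E\{(\bm{u}_{k}(\bm{x}_{1:k})\bm{u}_{k}(\bm{x}_{1:k})^{T}-\bm{u}_{k-1}(\bm{x}_{1:k-1})\bm{u}_{k-1}(\bm{x}_{1:k-1})^{T})g_{k-1}^{*}\},\\ \bm{\Sigma}_{2k}=E\{[\bm{u}_{k}(\bm{x}_{1:k})g_{k}^{*}-\bm{u}_{0}][\bm{u}_{k}(\bm{x}_{1:k})g_{k}^{*}-\bm{u}_{0}]^{T}/g_{k}^{*}\}.\end{array}\right.$$

Theorem: Assume the HMP in (2), let $\hat{\bm{x}}_{T}$ be the PF estimator obtained by resampling at each step, and suppose $\text{det}(\bm{\Sigma}_{k})<\infty$ for all $k$. Then, $\sqrt{m}(\hat{\bm{x}}_{T}-E[\bm{x}_{T}|\bm{z}_{1:T}])\stackrel{\text{dist}}{\longrightarrow}N(\bm{0},\bm{\Sigma})$ as $m\rightarrow\infty$.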
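The $\sqrt{m}$ scaling in the theorem can be illustrated in the simplest special case $T=1$, where the PF estimator reduces to a self-normalized importance sampling average. The sketch below is an illustration under assumptions not taken from the paper: a hypothetical scalar model with $x\sim N(0,1)$ and $z=x+v$, $v\sim N(0,1)$, for which the exact conditional mean is $z/2$, and made-up function names. If the scaled error has a stable limiting distribution, its sample variance should be roughly the same for different $m$.

```python
import math
import random


def snis_posterior_mean(m, z, rng):
    """Self-normalized importance sampling estimate of E[x | z] with m particles."""
    xs = [rng.gauss(0.0, 1.0) for _ in range(m)]            # draws from the prior
    alphas = [math.exp(-0.5 * (z - x) ** 2) for x in xs]     # unnormalized weights
    return sum(a * x for a, x in zip(alphas, xs)) / sum(alphas)


def scaled_error_variance(m, reps, z, rng):
    """Sample variance of sqrt(m) * (estimate - exact conditional mean)."""
    exact = z / 2.0  # valid only for the assumed model x ~ N(0,1), z = x + N(0,1)
    errs = [math.sqrt(m) * (snis_posterior_mean(m, z, rng) - exact) for _ in range(reps)]
    mean = sum(errs) / reps
    return sum((e - mean) ** 2 for e in errs) / (reps - 1)


rng = random.Random(0)
v_small = scaled_error_variance(100, 400, 1.0, rng)   # m = 100 particles
v_large = scaled_error_variance(400, 400, 1.0, rng)   # m = 400 particles
```

Comparing `v_small` and `v_large` gives a quick sanity check of the scaling; it says nothing about the shape of the distribution, which is what the normality tests in Sec. IV address.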

III-B Estimator

In the PF algorithm, the estimator of $E[\bm{x}_{T}|\bm{z}_{1:T}]$ that is actually computed is:

$$\hat{\bm{x}}_{T}=\sum_{i=1}^{m}\tilde{\bm{x}}^{i}_{T}\,w_{T}(\tilde{\bm{x}}^{i}_{1:T})=(m\bar{\upalpha}_{T})^{-1}\sum_{i=1}^{m}\tilde{\bm{x}}^{i}_{T}\,\upalpha_{T}(\tilde{\bm{x}}^{i}_{1:T}).\tag{12}$$

To show asymptotic normality of $\hat{\bm{x}}_{T}-E[\bm{x}_{T}|\bm{z}_{1:T}]$, we need an estimator with a martingale structure that lets us find its asymptotic variance. Hence, we re-express (12) through (13) below, and then show that (13) converges to (12) as $m$ gets large and has the required martingale structure. To facilitate the derivation, we define $\hat{\bm{x}}^{*}_{T}$ as

$$\hat{\bm{x}}^{*}_{T}=m^{-1}\sum_{i=1}^{m}L_{T}(\tilde{\bm{x}}^{i}_{1:T})\,\tilde{\bm{x}}^{i}_{T}\,H^{i}_{T-1}.\tag{13}$$

Furthermore, it is easier to derive the asymptotic variance of $\hat{\bm{x}}^{*}_{T}-E[\bm{x}_{T}|\bm{z}_{1:T}]$.

Note that $\hat{\bm{x}}_{T}$ and $\hat{\bm{x}}^{*}_{T}$ have the same (normalized) limiting distribution. However, (13) cannot be used in practice because it contains the terms $L_{T}(\tilde{\bm{x}}^{i}_{1:T})$, which involve a normalizing constant that is often unknown.

Next, we provide the reasoning behind (13) as follows:

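$$E\Big[\prod_{t=1}^{T}\upalpha_{t}(\bm{x}_{1:t})\Big]=\int\prod_{t=1}^{T}p_{t}(\bm{x}_{t}|\bm{x}_{t-1})\,p_{t}(\bm{z}_{t}|\bm{x}_{t})\,d\nu(\bm{x}_{1:T}),\tag{14}$$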

where $\nu(\bm{x}_{1:T})$ is the measure defined on space of all records corresponding to probability density function $p_{k}(\cdot|\bm{x}_{k-1})$ (note that $p(\cdot|\bm{x}_{k-1})=dP(\cdot|\bm{x}_{k-1})/d\nu(\bm{x}_{1:T})$ ).

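Normalizing the right-hand side of (4), we know that:

$$p_{T}(\bm{x}_{1:T}|\bm{z}_{1:T})=\frac{\prod_{k=1}^{T}[p_{k}(\bm{x}_{k}|\bm{x}_{k-1})\,p_{k}(\bm{z}_{k}|\bm{x}_{k})]}{\int\prod_{k=1}^{T}p_{k}(\bm{x}_{k}|\bm{x}_{k-1})\,p_{k}(\bm{z}_{k}|\bm{x}_{k})\,d\nu(\bm{x}_{1:T})}.$$

Combining with (14), we have:

$$p_{T}(\bm{x}_{1:T}|\bm{z}_{1:T})=\frac{\prod_{k=1}^{T}[p_{k}(\bm{x}_{k}|\bm{x}_{k-1})\,p_{k}(\bm{z}_{k}|\bm{x}_{k})]}{E[\prod_{k=1}^{T}\upalpha_{k}(\bm{x}_{1:k})]}.$$

Then,

$$L_{T}(\tilde{\bm{x}}^{i}_{1:T})\,H^{i}_{T-1}=\frac{\bar{\upalpha}_{1}\cdots\bar{\upalpha}_{T}}{E[\prod_{k=1}^{T}\upalpha_{k}(\bm{x}_{1:k})]}\,\frac{\upalpha_{T}(\tilde{\bm{x}}^{i}_{1:T})}{\bar{\upalpha}_{T}},$$

$$\upalpha_{T}(\tilde{\bm{x}}^{i}_{1:T})=\frac{\bar{\upalpha}_{T}\,E[\prod_{k=1}^{T}\upalpha_{k}(\bm{x}_{1:k})]}{\bar{\upalpha}_{1}\cdots\bar{\upalpha}_{T}}\,L_{T}(\tilde{\bm{x}}^{i}_{1:T})\,H^{i}_{T-1}.\tag{15}$$

From (13) and (15),

$$\begin{aligned}\hat{\bm{x}}_{T}&=(m\bar{\upalpha}_{T})^{-1}\sum_{i=1}^{m}\tilde{\bm{x}}^{i}_{T}\,\upalpha_{T}(\tilde{\bm{x}}^{i}_{1:T})\\&=(m\bar{\upalpha}_{T})^{-1}\sum_{i=1}^{m}\tilde{\bm{x}}^{i}_{T}\,\frac{\bar{\upalpha}_{T}\,E[\prod_{k=1}^{T}\upalpha_{k}(\bm{x}_{1:k})]}{\bar{\upalpha}_{1}\cdots\bar{\upalpha}_{T}}\,L_{T}(\tilde{\bm{x}}^{i}_{1:T})\,H^{i}_{T-1}\\&=m^{-1}\,\frac{E[\prod_{k=1}^{T}\upalpha_{k}(\bm{x}_{1:k})]}{\bar{\upalpha}_{1}\cdots\bar{\upalpha}_{T}}\sum_{i=1}^{m}\tilde{\bm{x}}^{i}_{T}\,L_{T}(\tilde{\bm{x}}^{i}_{1:T})\,H^{i}_{T-1}\\&=\frac{E[\prod_{k=1}^{T}\upalpha_{k}(\bm{x}_{1:k})]}{\bar{\upalpha}_{1}\cdots\bar{\upalpha}_{T}}\,\hat{\bm{x}}^{*}_{T}.\end{aligned}$$

By Lemma 2, which we will state and prove later, $\bar{\upalpha}_{1}\cdots\bar{\upalpha}_{T}\stackrel{p}{\rightarrow}E[\prod^{T}_{k=1}\upalpha_{k}(\bm{x}_{1:k})]$ as $m\rightarrow\infty$, so the ratio multiplying $\hat{\bm{x}}^{*}_{T}$ converges to 1 in probability, and $\hat{\bm{x}}^{*}_{T}$ converges to the true PF estimator $\hat{\bm{x}}_{T}$ in probability as $m\rightarrow\infty$.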

III-C Proof for Theorem

Here, we provide a rigorous proof for the theorem of Sec. III-A. To begin with, we state the following two lemmas.

Lemma 1: Let $\bm{G}$ be a measurable vector function from history of state-space $\mathbb{R}^{t\times n}$ ( $t$ is time and $n$ is dimension of state) to $\mathbb{R}^{s}$ , where $s<\infty$ . For any $1\leq k\leq T$ , we define $g^{*}_{k}(\bm{x}_{1:k})$ as (10). Then,

(i) if $\left\lVert E[\bm{G}(\bm{x}_{1:k})/g_{k-1}^{*}(\bm{x}_{1:k-1})]\right\rVert_{\infty}<\infty$ , where $\left\lVert\bm{y}\right\rVert_{\infty}=\max_{i}|\bm{y}_{i}|$ for any vector $\bm{y}$ , as $m\rightarrow\infty$ ,

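$$m^{-1}\sum_{i=1}^{m}\bm{G}(\tilde{\bm{x}}^{i}_{1:k})\stackrel{p}{\longrightarrow}E[\bm{G}(\bm{x}_{1:k})/g^{*}_{k-1}(\bm{x}_{1:k-1})];$$

(ii) if $\left\lVert E[\bm{G}(\bm{x}_{1:k})/g_{k}^{*}(\bm{x}_{1:k})]\right\rVert_{\infty}<\infty$ , as $m\rightarrow\infty$ ,

$$m^{-1}\sum_{i=1}^{m}\bm{G}(\bm{x}^{i}_{1:k})\stackrel{p}{\longrightarrow}E[\bm{G}(\bm{x}_{1:k})/g^{*}_{k}(\bm{x}_{1:k})].$$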

Proof: Here we only give the lemma statement. All proof details are in the appendix.

Lemma 2: If the same conditions in Lemma 1 are satisfied, then

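$$\frac{H^{i}_{k}}{g^{*}_{k}(\bm{x}^{i}_{1:k})}\stackrel{p}{\longrightarrow}1\quad\text{as }m\rightarrow\infty.$$

Furthermore, if $\bm{G}$ is a vector function from $\mathbb{R}^{t\times n}$ ( $t$ is time and $n$ is dimension of state) to $\mathbb{R}$ , and $E[|\bm{G}(\bm{x}_{1:k})|/g^{*}_{k-1}(\bm{x}_{1:k-1})]<\infty$ , then

$$m^{-1}\sum_{i=1}^{m}|\bm{G}(\tilde{\bm{x}}^{i}_{1:k})|\,1_{\{|\bm{G}(\tilde{\bm{x}}^{i}_{1:k})|>\upepsilon m\}}\stackrel{p}{\longrightarrow}0\quad\text{as }m\rightarrow\infty.$$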

Proof: We only give the lemma statement here. All proof details are in the appendix.

III-C1 Conditional Distribution

First, let us consider the distribution of $\sqrt{m}(\hat{\bm{x}}^{*}_{T}-E(\bm{x}_{T}|\bm{z}_{1:T}))$ conditional on $\mathcal{F}_{k-1}$ ; we will show that it is asymptotically normal as the number of particles $m$ gets large.

According to (7), we have

[TABLE]

where $\#^{i}_{t}$ is the number of copies generated in the resampling process for particle path $\tilde{\bm{x}}^{i}_{1:t}$ .

Combining (18) and (19), we have:

[TABLE]

Plugging (13) into (20),

[TABLE]

Then,

[TABLE]

where

[TABLE]

Next, we prove that the above (22) is a martingale difference sequence. First, by importance sampling, conditional on $\mathcal{F}_{2t-2}$ the paths $\tilde{\bm{x}}^{i}_{1:t}$, $1\leq i\leq m$, are independent, with $\tilde{\bm{x}}^{i}_{1:t}$ having the density function $q_{t}(\cdot|\bm{x}^{i}_{1:t-1})$. Second, in the resampling process, conditional on $\mathcal{F}_{2t-1}$ the paths $\bm{x}^{i}_{1:t}$, $1\leq i\leq m$, are i.i.d. and take the value $\tilde{\bm{x}}^{j}_{1:t}$ with probability $w^{j}_{t}$.

From (22), noticing that $\mathcal{F}_{1}$ contains information of $\tilde{\bm{x}}_{1}$ , we have

[TABLE]

Also, we have

[TABLE]

Continuing this process, we conclude that $\{M^{j}_{k},\mathcal{F}_{k},1\leq k\leq 2T-1\}$ is a martingale difference sequence. Additionally, $M_{k}^{1},M_{k}^{2},\dots,M_{k}^{m}$ are conditionally independent given $\mathcal{F}_{k-1}$.

Rearranging (21), we have

[TABLE]

Then, given any vector $\uptheta$ , we have

[TABLE]

By (23) and (24), for any $\upepsilon>0$ , applying Lemmas 1 and 2, we have

[TABLE]

as $m\rightarrow\infty$

Therefore, by multivariate Lindeberg’s central limit theorem [15], the conditional distribution converges to normal distribution:

[TABLE]

where $\stackrel{\mathcal{F}-\mathcal{D}}{\longrightarrow}$ means that $\sqrt{m}(\hat{\bm{x}}^{*}_{T}-E(\bm{x}_{T}|\bm{z}_{1:T}))$, conditional on $\mathcal{F}$, converges in distribution to $N(0,\bm{\Sigma})$. Although the conditional distribution $P(\hat{\bm{x}}^{*}_{T}-E(\bm{x}_{T}|\bm{z}_{1:T})|\mathcal{F}_{T-1})$ is a function of the history records, as the number of particles increases, the distribution of $\hat{\bm{x}}^{*}_{T}-E(\bm{x}_{T}|\bm{z}_{1:T})$ conditional on $\mathcal{F}_{T-1}$ becomes stable. Therefore, the variance of the limiting distribution is a constant.

III-C2 Unconditional Distribution

In this part, we show that $\sqrt{m}(\hat{\bm{x}}^{*}_{T}-E(\bm{x}_{T}|\bm{z}_{1:T}))$ is asymptotically normal. In particular, our goal is to show that the characteristic function of the unconditional distribution of $\sqrt{m}(\hat{\bm{x}}^{*}_{T}-E(\bm{x}_{T}|\bm{z}_{1:T}))$ converges to that of a normal distribution.

[TABLE]

Specifically, we want to prove that:

[TABLE]

The proof details are given in the appendix, where we show that (48) holds. Therefore, the unconditional distribution of $\sqrt{m}(\hat{\bm{x}}^{*}_{T}-E(\bm{x}_{T}|\bm{z}_{1:T}))$ is asymptotically normal with mean zero and covariance matrix $\bm{\Sigma}$. Since we have shown that $\hat{\bm{x}}^{*}_{T}\stackrel{p}{\rightarrow}\hat{\bm{x}}_{T}$, we can conclude that $\sqrt{m}(\hat{\bm{x}}_{T}-E[\bm{x}_{T}|\bm{z}_{1:T}])\stackrel{\text{dist}}{\longrightarrow}N(\bm{0},\bm{\Sigma})$, as stated in the theorem.

IV Numerical Study

In this part, we test the asymptotic normality of $\sqrt{m}(\hat{\bm{x}}_{T}-E(\bm{x}_{T}|\bm{z}_{1:T}))$ on the following two examples. One example has linear state and measurement equations but non-Gaussian noise terms. The other is the classic multivariate stochastic volatility model in finance, a nonlinear system with Gaussian noise. These two examples were selected for their simplicity of interpretation and the widespread use of similar systems.

IV-A Linear and Non-Gaussian

Consider the following linear DSSM with non-Gaussian measurement error,

[TABLE]

where $\bm{w}_{k}$ and $\bm{v}_{k}$ are independent noise terms with each component following a $U_{[-1,1]}$ distribution. We approximate the conditional mean by PF using $10^{6}$ particles. Meanwhile, we generate the PF estimators by the algorithm using $10^{3}$ particles.

We run the simulation 500 times with terminal time $T=25$. The histograms of each component of $\hat{\bm{x}}_{25}-E(\bm{x}_{25}|\bm{z}_{1:25})$ are as follows:

Using the Jarque-Bera test of normality [12], the $p$-values for the three components of the error term are 0.2795, 0.2438 and 0.2138, respectively. Hence, we do not reject the null hypothesis of normality for any of the three components at the significance level of 0.05 (no adjustment for multiple comparisons here).

This result is in line with our theorem that $\hat{\bm{x}}_{k}-E(\bm{x}_{k}|\bm{z}_{1:k})$ is asymptotically normal.
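For readers who want to reproduce this kind of check, the Jarque-Bera statistic is easy to compute directly from the sample skewness and kurtosis. The sketch below is a plain standard-library version written for illustration, not the implementation used for the numbers above; it uses the usual asymptotic chi-square approximation with two degrees of freedom for the $p$-value, and the function name `jarque_bera` is chosen here.

```python
import math
import random


def jarque_bera(sample):
    """Jarque-Bera statistic and asymptotic p-value for a list of floats.

    JB = n/6 * (S^2 + (K - 3)^2 / 4), with S the sample skewness and K the
    sample kurtosis. Under normality JB is approximately chi-square with 2
    degrees of freedom, whose survival function is exp(-x / 2).
    """
    n = len(sample)
    mean = sum(sample) / n
    m2 = sum((x - mean) ** 2 for x in sample) / n
    m3 = sum((x - mean) ** 3 for x in sample) / n
    m4 = sum((x - mean) ** 4 for x in sample) / n
    skewness = m3 / m2 ** 1.5
    kurtosis = m4 / m2 ** 2
    jb = n / 6.0 * (skewness ** 2 + (kurtosis - 3.0) ** 2 / 4.0)
    return jb, math.exp(-jb / 2.0)


# Hypothetical usage: 500 replications of one error component would go here;
# standard normal draws are used as a stand-in.
rng = random.Random(0)
errors = [rng.gauss(0.0, 1.0) for _ in range(500)]
jb_stat, p_value = jarque_bera(errors)
```

A large $p$-value means the test finds no evidence against normality; it does not prove normality, and the chi-square approximation is itself only asymptotic in the number of replications.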

IV-B Nonlinear and Gaussian

Next, let us consider a slightly more complex model: Multivariate Stochastic Volatility model. As stated in [13], it is a classical approach to model the underlying volatility of financial derivatives using observable variables. Let $\bm{x}_{t}$ denote the volatility vector, and $\bm{z}_{t}$ be the observation vector. The system can be expressed as:

[TABLE]

where $\upmu$ denotes the mean of the state vector and $\Phi$ denotes a matrix with constant elements. $\bm{w}_{t}$ and $\bm{v}_{t}$ denote the multivariate normal noise terms.

In this example, consider the case where $\upmu=[0,0,0]^{T}$, $\Phi=0.5$, $p=3$, and $\bm{w}_{k}$ and $\bm{v}_{k}$ are standard normal. As before, the conditional mean is approximated by the PF using $10^{6}$ particles, and we generate the PF estimators by the algorithm using $500$ particles.

We run the simulation 500 times, and the histograms of each component of $\hat{\bm{x}}_{25}-E(\bm{x}_{25}|\bm{z}_{1:25})$ are as follows:

This time, the $p$-values for the three components of the error term are 0.0584, 0.1799 and 0.8063, respectively. Thus, we do not reject the null hypothesis of normality for any of the three components at the significance level of 0.05 (again, no adjustment for multiple comparisons here).

V CONCLUSIONS AND DISCUSSION

From the above analysis and numerical results, we come to the conclusion that $\hat{\bm{x}}_{T}-E[\bm{x}_{T}|\bm{z}_{1:T}]$ is asymptotically normal as the number of particles $m$ gets sufficiently large. For further work, we will consider a computable approximation for the covariance matrix in the asymptotic distribution, which we discuss in another work [14]. Moreover, as is stated in the framework (1), since the second part of the error decomposition is not necessarily normal, we will focus more on providing a reasonable bound for it.

Appendix

V-A Lemma 1:

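Let $\bm{G}$ be a measurable vector function from history of state-space $\mathbb{R}^{t\times n}$ ( $t$ is time and $n$ is dimension of state) to $\mathbb{R}^{s}$ , where $s<\infty$ . For any $1\leq k\leq T$ , we define $g^{*}_{k}(\bm{x}_{1:k})$ as follows:

$$g^{*}_{k}(\bm{x}_{1:k})=\frac{E[\prod_{l=1}^{k}\upalpha_{l}(\bm{x}_{1:l})]}{\prod_{l=1}^{k}\upalpha_{l}(\bm{x}_{1:l})},$$

where $\upalpha_{l}(\bm{x}_{1:l})$ is the unnormalized weight defined in Sec. II-B. Then,

(i) if $\left\lVert E[\bm{G}(\bm{x}_{1:k})/g_{k-1}^{*}(\bm{x}_{1:k-1})]\right\rVert_{\infty}<\infty$ , where $\left\lVert\bm{y}\right\rVert_{\infty}=\max_{i}|\bm{y}_{i}|$ for any vector $\bm{y}$ , as $m\rightarrow\infty$ ,

$$m^{-1}\sum_{i=1}^{m}\bm{G}(\tilde{\bm{x}}^{i}_{1:k})\stackrel{p}{\longrightarrow}E[\bm{G}(\bm{x}_{1:k})/g^{*}_{k-1}(\bm{x}_{1:k-1})];$$

(ii) if $\left\lVert E[\bm{G}(\bm{x}_{1:k})/g_{k}^{*}(\bm{x}_{1:k})]\right\rVert_{\infty}<\infty$ , as $m\rightarrow\infty$ ,

$$m^{-1}\sum_{i=1}^{m}\bm{G}(\bm{x}^{i}_{1:k})\stackrel{p}{\longrightarrow}E[\bm{G}(\bm{x}_{1:k})/g^{*}_{k}(\bm{x}_{1:k})].$$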

Proof: This lemma can be proved by induction: first, we show that if (ii) holds for $k-1$ , then (i) holds for $k$ . Then, we prove that if (i) holds for $k$ , (ii) holds for $k$ using the same method.

We first introduce some notation for convenience: for any two real-valued functions $f(x)$ and $g(x)$ , $f^{+}(x)=\max(f(x),0)$ for all $x$ , $f^{-}(x)=-\min(f(x),0)$ for all $x$ , and $f(x)\wedge g(x)=\min(f(x),g(x))$ for all $x$ . For any two functions $\upphi$ and $\upgamma$ from $\mathbb{R}^{n}$ to $\mathbb{R}^{s}$ , define:

[TABLE]

Write $\bm{G}(\bm{r})=(\upgamma_{1}(\bm{r}),\dots,\upgamma_{s}(\bm{r}))$, where for all $1\leq i\leq s$, $\upgamma_{i}(\bm{r})$ is a real-valued function defined on $\mathbb{R}^{n}$. Define $\bm{G}^{+}(\bm{r})=(\upgamma_{1}^{+}(\bm{r}),\dots,\upgamma_{s}^{+}(\bm{r}))$ and $\bm{G}^{-}(\bm{r})=(\upgamma_{1}^{-}(\bm{r}),\dots,\upgamma_{s}^{-}(\bm{r}))$; then $\bm{G}(\bm{r})=\bm{G}^{+}(\bm{r})-\bm{G}^{-}(\bm{r})$. So, without loss of generality, we can assume that $\bm{G}_{i}(\bm{r})\geq 0$ for all $i$.
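As a quick check of this notation, the following identities follow by considering the cases $f(x)\geq 0$ versus $f(x)<0$ and $\upphi(x)\leq\upgamma(x)$ versus $\upphi(x)>\upgamma(x)$. They are written for scalar functions; applying them componentwise to vector functions is an assumption about how the definitions above are used.

```latex
\begin{aligned}
f^{+}(x)-f^{-}(x) &= \max(f(x),0)+\min(f(x),0) = f(x),\\
f^{+}(x)+f^{-}(x) &= |f(x)|,\\
(\upphi\wedge\upgamma)(x)+(\upphi-\upgamma)^{+}(x)
  &= \min(\upphi(x),\upgamma(x))+\max(\upphi(x)-\upgamma(x),0) = \upphi(x).
\end{aligned}
```

Because expectations and sample averages are linear, a convergence statement established separately for $\bm{G}^{+}$ and $\bm{G}^{-}$ carries over to $\bm{G}$ by subtraction, which is what justifies the nonnegativity assumption.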

First, check that (i) holds for $k=1$ . For this, notice that when $k=1$ , $g^{*}_{k-1}\equiv 1$ . Then, (i) can be expressed as:

[TABLE]

Notice that $\upgamma_{j}(\bm{x}_{1})$ has finite mean since $\|E[\bm{G}(\bm{x}_{1})]\|_{\infty}<\infty$ (by the hypothesis of (i)) and $\bm{G}$ is assumed nonnegative. By the weak law of large numbers, which requires only a finite mean, $m^{-1}\sum_{i=1}^{m}\upgamma_{j}(\tilde{\bm{x}}^{i}_{1})\stackrel{{\scriptstyle p}}{{\rightarrow}}E[\upgamma_{j}(\bm{x}_{1})]$ for all $1\leq j\leq s$.

Thus,

[TABLE]

Equivalently, we have

[TABLE]

Next, we want to show that if (ii) holds for $k-1$, then (i) holds for $k$. We argue by contradiction.

Denote $\upmu_{k}=E[\bm{G}(\bm{x}_{1:k})/g^{*}_{k-1}(\bm{x}_{1:k-1})]$. Suppose, contrary to (i), that there exist $m_{1}<m_{2}<\cdots$ with $m_{l}\rightarrow\infty$, and $\upepsilon>0$, $\updelta>0$, such that for all $m\in\{m_{1},\cdots,m_{l},\cdots\}$,

[TABLE]

In fact, we can find $\upepsilon>0$ such that for all $\ m\in\{m_{1},\cdots,m_{l},\cdots\}$ ,

[TABLE]

where $\zeta_{k}=E[\|\bm{G}(\tilde{\bm{x}}_{1:k})\|_{2}/g^{*}_{k-1}(\bm{x}_{1:k-1})]$. The reason is that, as $\upepsilon\rightarrow 0^{+}$, the left-hand side of (32) increases to 1 and the right-hand side of (32) decreases to 0. Thus, we can always find some $\upepsilon>0$ that satisfies (32).

Now, let us decompose $\bm{G}(\tilde{\bm{x}}^{i}_{1:k})$ into the following three parts, which are easier to compute and bound.

By (30) and (31), $\upphi=(\upphi\wedge\upgamma)+(\upphi-\upgamma)^{+}$. Thus, we can write $\bm{G}(\tilde{\bm{x}}^{i}_{1:k})$ as:

[TABLE]

where

[TABLE]

Note the following facts:

[TABLE]

Then,

[TABLE]

Since $\bm{U}^{i}_{k}$ is the projection of $\bm{G}(\tilde{\bm{x}}^{i}_{1:k})\wedge\bm{K}$ onto the orthogonal complement of $\mathcal{F}_{2k-2}$ (that is, $E[\bm{U}^{i}_{k}|\mathcal{F}_{2k-2}]=0$), we have:

[TABLE]

By (35), (36), and (V-A),

[TABLE]

Applying (ii) to $\bm{G}^{*}(\bm{x}_{1:k-1})=E[\bm{G}(\tilde{\bm{x}}_{1:k})|\mathcal{F}_{2k-2}]$ , we have

[TABLE]

and

[TABLE]

From (37) and (38), it follows that for $m$ sufficiently large,

[TABLE]

Using the same trick of applying (ii) to $\bm{G}^{*}(\bm{x}_{1:k-1})=E[\bm{G}(\tilde{\bm{x}}_{1:k})|\mathcal{F}_{2k-2}]$, we can prove:

[TABLE]

Applying (ii) to

[TABLE]

Then, as $n_{l}\rightarrow\infty$,

[TABLE]

In particular, we can choose $n_{l}$ such that

[TABLE]

Then,

[TABLE]

To ensure this result, we require $\sqrt{s}n_{l}<m\upepsilon^{3}$, which can be achieved by choosing $m_{l}$ large enough. Thus, with probability at least $1-s/l$,

[TABLE]

Combining (39), (40) and (41), we can conclude that:

[TABLE]

This is a contradiction. Therefore, if (ii) holds for $k-1$, then (i) holds for $k$. Analogously, we can prove that if (i) holds for $k$, then (ii) holds for $k$.

V-B Lemma 2:

If the conditions of Lemma 1 are satisfied, then

[TABLE]

Here $H^{i}_{k}$ is defined by (7). Furthermore, if $\bm{G}$ is a real-valued function from $\mathbb{R}^{t\times n}$ ($t$ is time and $n$ is the dimension of the state) to $\mathbb{R}$, and $E[|\bm{G}(\bm{x}_{1:k})|/g^{*}_{k-1}(\bm{x}_{1:k-1})]<\infty$, then

[TABLE]

Proof: Consider the special case $\bm{G}=\upalpha_{k}$. By Lemma 1 (i), we have

[TABLE]

Then,

[TABLE]

Therefore,

[TABLE]

Similarly, we have

[TABLE]

Applying Lemma 1 (i) to $|\bm{G}(\cdot)|\mathbb{1}_{\{|\bm{G}(\cdot)|>M\}}$ for $M>0$ ,

[TABLE]

For arbitrary $M$, when $m$ is large enough,

[TABLE]

As $M\rightarrow\infty$ , $E[|\bm{G}(\bm{x}_{1:k})|\mathbb{1}_{\{|\bm{G}(\bm{x}_{1:k})|>M\}}]\rightarrow 0$ . Therefore,

[TABLE]
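To illustrate why the truncated tail term vanishes, consider a hypothetical scalar example (not taken from the paper) in which $|\bm{G}|$ is replaced by a unit-rate exponential random variable $X$. Integration by parts gives the tail expectation in closed form:

```latex
E\left[X\,\mathbb{1}_{\{X>M\}}\right]
  = \int_{M}^{\infty} x e^{-x}\,dx
  = \left[-x e^{-x}\right]_{M}^{\infty} + \int_{M}^{\infty} e^{-x}\,dx
  = (M+1)e^{-M} \longrightarrow 0 \quad \text{as } M\rightarrow\infty .
```

For a general integrable random variable, the same limit follows from the dominated convergence theorem; the exponential case only makes the rate explicit.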

V-C Unconditional Distribution

In this part, we show that $\sqrt{m}(\hat{\bm{x}}^{*}_{T}-E(\bm{x}_{T}|\bm{z}_{1:T}))$ is asymptotically normal by induction.

By equation (27), we have that

[TABLE]

Then, our goal is to show that the characteristic function of the unconditional distribution of $\sqrt{m}(\hat{\bm{x}}^{*}_{T}-E(\bm{x}_{T}|\bm{z}_{1:T}))$ converges to that of a normal distribution.

[TABLE]
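The idea of establishing asymptotic normality through characteristic functions can be illustrated numerically. The sketch below is not the particle filter itself: it uses plain sample means of Uniform(0, 1) draws as a stand-in, with `m` playing the role of the particle count, and compares the empirical characteristic function of the scaled error with that of the limiting normal distribution. The function names and all parameter values are assumptions made for illustration.

```python
import cmath
import math
import random


def empirical_cf(values, t):
    """Empirical characteristic function: average of exp(i * t * v)."""
    return sum(cmath.exp(1j * t * v) for v in values) / len(values)


def gaussian_cf(t, variance):
    """Characteristic function of N(0, variance) evaluated at t."""
    return math.exp(-0.5 * variance * t * t)


rng = random.Random(0)
m = 50             # sample size per replicate (stand-in for particle count)
replicates = 2000  # number of independent replicates
mu, variance = 0.5, 1.0 / 12.0  # mean and variance of Uniform(0, 1)

# Scaled errors sqrt(m) * (sample mean - mu), one per replicate.
scaled_errors = []
for _ in range(replicates):
    sample_mean = sum(rng.random() for _ in range(m)) / m
    scaled_errors.append(math.sqrt(m) * (sample_mean - mu))

for t in (0.5, 1.0, 2.0):
    gap = abs(empirical_cf(scaled_errors, t) - gaussian_cf(t, variance))
    print(f"t = {t}: |empirical cf - Gaussian cf| = {gap:.4f}")
```

Small gaps at each `t` are consistent with convergence in distribution, since pointwise convergence of characteristic functions to a limit that is itself a characteristic function implies convergence in distribution.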

Specifically, we want to prove that:

[TABLE]

First, let us check that when $T=1$ , (V-C) is automatically satisfied since

[TABLE]

Next, assume that the claim holds for $T=K$:

[TABLE]

Then, at $T=K+1$ , it follows that:

[TABLE]

From the above, we have shown that (48) holds. Therefore, the unconditional distribution of $\sqrt{m}(\hat{\bm{x}}^{*}_{T}-E(\bm{x}_{T}|\bm{z}_{1:T}))$ is asymptotically normal with mean zero and covariance matrix $\bm{\Sigma}$. Since we have shown that $\hat{\bm{x}}^{*}_{T}\stackrel{{\scriptstyle p}}{{\rightarrow}}\hat{\bm{x}}_{T}$, we can conclude that $\sqrt{m}(\hat{\bm{x}}_{T}-E(\bm{x}_{T}|\bm{z}_{1:T}))\stackrel{{\scriptstyle\text{dist}}}{{\longrightarrow}}N(\bm{0},\bm{\Sigma})$.
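The $\sqrt{m}$ scaling can be checked empirically in a setting where the conditional mean is available exactly. The sketch below is an illustration under stated assumptions, not one of the paper's examples: it uses a scalar linear-Gaussian model so that the Kalman filter gives $E(\bm{x}_{T}|\bm{z}_{1:T})$ in closed form, a bootstrap filter with multinomial resampling at every step, and the weighted particle mean (before resampling) as the estimator. All function names and parameter values are hypothetical.

```python
import math
import random
import statistics


def kalman_mean(zs, a, q, r, p0):
    """Exact E[x_T | z_{1:T}] for x_k = a x_{k-1} + w_k, z_k = x_k + v_k."""
    mean, var = 0.0, p0
    for z in zs:
        mean, var = a * mean, a * a * var + q  # predict
        gain = var / (var + r)
        mean, var = mean + gain * (z - mean), (1.0 - gain) * var  # update
    return mean


def bootstrap_filter_mean(zs, a, q, r, p0, m, rng):
    """Bootstrap particle filter estimate, resampling at every step."""
    particles = [rng.gauss(0.0, math.sqrt(p0)) for _ in range(m)]
    estimate = 0.0
    for z in zs:
        # Propagate through the state equation (prior used as proposal).
        particles = [a * x + rng.gauss(0.0, math.sqrt(q)) for x in particles]
        # Unnormalized Gaussian likelihood weights.
        weights = [math.exp(-0.5 * (z - x) ** 2 / r) for x in particles]
        total = sum(weights)
        estimate = sum(w * x for w, x in zip(weights, particles)) / total
        # Multinomial resampling.
        particles = rng.choices(particles, weights=weights, k=m)
    return estimate


def replicate_scaled_errors(m, replicates, zs, params, seed):
    """Return sqrt(m) * (estimate - exact mean) over independent runs."""
    rng = random.Random(seed)
    exact = kalman_mean(zs, *params)
    return [
        math.sqrt(m) * (bootstrap_filter_mean(zs, *params, m, rng) - exact)
        for _ in range(replicates)
    ]


params = (0.9, 0.5, 1.0, 1.0)  # a, q, r, p0 (illustrative values)
a, q, r, p0 = params
data_rng = random.Random(1)
# Simulate one fixed observation record z_{1:T} with T = 5.
x, zs = data_rng.gauss(0.0, math.sqrt(p0)), []
for _ in range(5):
    x = a * x + data_rng.gauss(0.0, math.sqrt(q))
    zs.append(x + data_rng.gauss(0.0, math.sqrt(r)))

for m in (50, 200):
    errs = replicate_scaled_errors(m, 200, zs, params, seed=2)
    print(f"m = {m}: mean = {statistics.mean(errs):+.3f}, "
          f"std = {statistics.stdev(errs):.3f}")
```

If the scaled errors are asymptotically $N(\bm{0},\bm{\Sigma})$, their sample standard deviation should be roughly stable across values of `m` and their sample mean should be close to zero.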

References

[1] N. Gordon, D. Salmond, and A. Smith, "Novel approach to nonlinear/non-Gaussian Bayesian state estimation," IEE Proceedings F Radar and Signal Processing, vol. 140, no. 2, pp. 107, 1993.
[2] R. Douc and O. Cappe, "Comparison of resampling schemes for particle filtering," ISPA 2005. Proceedings of the 4th International Symposium on Image and Signal Processing and Analysis, 2005.
[3] K. Reif, S. Gunther, E. Yaz, and R. Unbehauen, "Stochastic stability of the discrete-time extended Kalman filter," IEEE Transactions on Automatic Control, vol. 44, no. 4, pp. 714–728, 1999.
[4] J. C. Spall, "The Kantorovich inequality for error analysis of the Kalman filter with unknown noise distributions," Automatica, vol. 31, pp. 1513–1517, 1995.
[5] J. L. Maryak, J. C. Spall, and B. D. Heydon, "Use of the Kalman filter for inference in state-space models with unknown noise distributions," IEEE Transactions on Automatic Control, vol. 49, pp. 87–90, 2004.
[6] P. Del Moral and A. Guionnet, "Central limit theorem for nonlinear filtering and interacting particle systems," Annals of Applied Probability, vol. 9, no. 2, pp. 275–297, 1999.
[7] X. Han, J. Li, and D. Xiu, "Error analysis for numerical formulation of particle filter," Discrete and Continuous Dynamical Systems - Series B, vol. 20, no. 5, pp. 1337–1354, 2015.
[8] H. P. Chan and T. L. Lai, "A general theory of particle filters in hidden Markov models and some applications," Annals of Statistics, vol. 41, no. 6, pp. 2877–2904, 2013.
[9] H. P. Chan and T. L. Lai, "A sequential Monte Carlo approach to computing tail probabilities in stochastic models," The Annals of Applied Probability, vol. 21, no. 6, pp. 2315–2342, 2011.
[10] A. Taghvaei and P. G. Mehta, "Error Analysis for the Linear Feedback Particle Filter," Proc. American Control Conference (ACC), 2018.
[11] M. Asai, M. Mcaleer, and J. Yu, "Multivariate Stochastic Volatility: A Review," Econometric Reviews, vol. 25, no. 2–3, pp. 145–175, 2006.
[12] T. Thadewald and H. Büning, "Jarque-Bera Test and its Competitors for Testing Normality - A Power Comparison," Journal of Applied Statistics, vol. 34, no. 1, pp. 87–105, 2007.
[13] M. Asai, M. Mcaleer, and J. Yu, "Multivariate Stochastic Volatility: A Review," Econometric Reviews, vol. 25, no. 2–3, pp. 145–175, 2006.
[14] Z. Liu and J. C. Spall, "Error Estimation for the Particle Filter," Proceedings of the 53rd Annual Conference on Information Sciences and Systems, Baltimore, MD, 20–22 March 2019.
[15] T. Ferguson, A Course in Large Sample Theory, Chapman & Hall, 1996.
