Error Analysis for the Particle Filter: Methods and Theoretical Support
Ziyu Liu, Shihong Wei, James C. Spall

TL;DR
This paper analyzes the error behavior of particle filters in nonlinear, non-Gaussian settings, providing theoretical insights and empirical validation of asymptotic normality with frequent resampling.
Contribution
It offers a decomposition of particle filter error and proves asymptotic normality under continuous resampling, supported by practical examples.
Findings
Error decomposes into two terms with asymptotic normality.
Frequent resampling ensures the estimator's distribution converges to normal.
Empirical examples confirm theoretical predictions.
Abstract
The particle filter is a popular Bayesian filtering algorithm for use in cases where the state-space model is nonlinear and/or the random terms (initial state or noises) are non-Gaussian distributed. We study the behavior of the error in the particle filter algorithm as the number of particles gets large. After a decomposition of the error into two terms, we show that the difference between the estimator and the conditional mean is asymptotically normal when the resampling is done at every step in the filtering process. Two nonlinear/non-Gaussian examples are tested to verify this conclusion.
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsTarget Tracking and Data Fusion in Sensor Networks · Bayesian Modeling and Causal Inference · Fault Detection and Control Systems
\bbl@ifshorthand
"
Error Analysis for the Particle Filter:
Methods and Theoretical Support
Ziyu Liu1, Shihong Wei2, and James C. Spall3 A compressed version of this paper appears in the Proceedings of the American Control Conference, Philadelphia, PA, 10-12 July 2019.1Ziyu Liu is a graduate student of Johns Hopkins University Applied Math and Statistics Department. Whitehead Hall, 3400 North Charles Street, Baltimore, MD 21218 [email protected]2Shihong Wei is a graduate student of Johns Hopkins University Applied Math and Statistics Department. Whitehead Hall, 3400 North Charles Street, Baltimore, MD 21218 [email protected]3James C. Spall is a member of the Principal Professional Staff at the JHU Applied Physics Laboratory and Research Professor of Johns Hopkins University Applied Math and Statistics Department. Whitehead Hall, 3400 North Charles Street, Baltimore, MD 21218 [email protected]
Zusammenfassung
The particle filter is a popular Bayesian filtering algorithm for use in cases where the state-space model is nonlinear and/or the random terms (initial state or noises) are non-Gaussian distributed. We study the behavior of the error in the particle filter algorithm as the number of particles gets large. After a decomposition of the error into two terms, we show that the difference between the estimator and the conditional mean is asymptotically normal when the resampling is done at every step in the filtering process. Two nonlinear/non-Gaussian examples are tested to verify this conclusion.
I INTRODUCTION
This paper is aimed at error analysis for the particle filter (PF) in the nonlinear and/or non-Gaussian discrete state-space model. We establish the asymptotic normality for the difference between the PF estimate and the conditional mean in multivariate cases.
The PF, proposed in [1], is a popular Bayesian filtering algorithm for its ease of implementation and wide range of application. The PF circumvents the intractability of the required integral operations when updating the posterior density by directly approximating the posterior distributions by a large number of particles instead. Recall that under the linear model and Gaussian noise cases, the Kalman filter is the standard filtering choice, which is exactly the conditional mean. Furthermore, the error distribution of the Kalman filter can be characterized by the covariance matrix. However, the same convenience does not naturally hold for PF. If we have a knowledge of the error distribution of PF estimate, the evaluation of PF algorithm can be more precisely made in specific applications. Moreover, it also helps to improve the performance of PF in terms of deciding the optimal number of particles and tuning crucial parameters in PF variants. Thus, the error analysis for PF is a topic of both theoretical and practical interest.
Many previous studies have been conducted on modifying the generic PF to improve its performance under certain circumstances. In addition, some research has focused on the statistical properties of the PF estimate and among them the characterization of the error term draws some attention. Unlike the error behavior of Kalman filter or extended Kalman filter (see e.g. [3]), which can be characterized by the error covariance matrix in the presence of Gaussian noise, there is usually no closed-form expression for the error in PF, especially in the nonlinear and/or non-Gaussian system cases.
However, it is possible to calculate an estimation error bound for some cases of the Kalman filter with non-Gaussian noise [4][5]. Some studies are made on how PF estimate converges to the true conditional mean as an approximation. Ref. [6] discussed convergence of PF and the fluctuation of its path space and showed that the distribution of PF converges to the distribution of conditional mean as number of particles increase under certain assumptions. Ref. [7] studied the distance between the PF as a numerical approximation and its underlying continuous system and then established the convergence of PF to the continuous optimal filter. Other researchers directly focus on characterizing the error distribution. In discrete state-space model, the general framework is to decompose the error term into two parts:
[TABLE]
where represent , the first part being the difference between the PF estimate and the conditional mean , and second part being the difference between the conditional mean and the underlying true state .
For the first part of the error decomposition, [8] uses the result of [9] to show that the distribution of the difference between generic PF estimate and the conditional mean is asymptotically normal in scalar cases as the number of particles gets large. Recently, [10] conducted error analysis specifically on the linear feedback PF, which, as a special variant of PF, includes a feedback control for particles. However, whether a similar result holds for the multivariate case remains unclear and the theoretical foundation for the second part of the error term also remain to be explored.
In this paper, we provide an error analysis for a generic type of PF for which re-sampling is performed at every step, with a focus on the first part of the error decomposition (1). In fact, we will extend the work of [8] to allow for the analysis of the difference between estimator and conditional mean under the multivariate case. After a rigorous derivation, we show that the first error term will converge asymptotically to the multivariate normal distribution as the number of particles gets large. Then, we verify the above result on two nonlinear and/or non-Gaussian discrete state-space cases.
The reminder of this paper is arranged as follows: The second section will be the statement of problem setting and clarification of notation. The third section will be the mathematical analysis showing the asymptotic normality for the partial error term. The fourth section will be the numerical verification on two examples and the last section will the conclusion and discussion for future work.
II Problem Statement and Particle Filter
II-A Discrete Time-State-Space Model
Consider the discrete time state-space model (DSSM) with the state equation and the measurement equation as follows:
[TABLE]
where and are the true states and the measurements, respectively, with and being the noise terms in the state equation and measurement equation, is a possibly nonlinear function of state , and is a possibly nonlinear function of .
Consider as a hidden Markov process (HMP), , . Denote the historical records of true states and measurements by and . Then, for this filtering problem, the goal is to calculate . In the nonlinear and/or non-Gaussian cases, Bayesian filter updates its estimators using the following recursive form:
: using information of to predict
[TABLE]
: using information of to adjust
[TABLE]
II-B Particle Filter
For the nonlinear and non-Gaussian DSSM, the integration of the posterior density, as required in computing , is often intractable. Hence there is usually no closed-form solution to . However, the PF can be used to represent the posterior density by a set of randomly (re)sampled weighted particles generated by the Monte Carlo method, and the particles can be averaged to form an estimator of the expectation of interest. Let the number of particles be and the particle at time be . The realization of PF relies heavily on the principle of importance sampling.
Suppose is our target possiblility density function (p.d.f), and is the proposal p.d.f. Then the unnormalized weights are:
[TABLE]
From (II-B), the unnormalized weight can be updated recursively as:
[TABLE]
Finally, let be the normalized weights, which we use to construct the PF estimator.
To deal with the problem of degeneration, the situation where all but a few particles have zero importance weight, we can re-sample the particles. Ref. [2] discussed several re-sampling schemes in the PF and in this study we consider the most commonly used one, multinomial re-sampling scheme. That is, at time , we first update particles from last step by , where, to be consistent with the notation of [8], we use to denote the particles before resampling. Then, we draw paths from } with probability , where we denote the records before resampling as: , and records after resampling as: .
In this study, we focus on the particular situation where the above re-sampling process is done at every step and we assign new weights to the re-sampled particles. Moreover, the total number of particles remains unchanged as .
II-C Notation
For terminal time point , the conditional density function in the hidden Markov model implies:
[TABLE]
The likelihood ratio in the importance sampling process is:
[TABLE]
For computational convenience, we also define the following quantities:
[TABLE]
where is unnormalized weight of the particle path before resampling, and
[TABLE]
where is unnormalized weight of the particle path after resampling.
Following the notation of [8], we denote the ”ancestry origin” of a particle by to keep track of it and it is defined as follows: for all by definition. If and , , share the same first state vector in time, (i.e. ), ), and ) they have the same ancestral particle, which implies .
Finally, let
[TABLE]
denote the history information generated by the particles at the step. Our definition of such history is in line with the decomposition of variance in the following analysis, which is aimed at constructing a nice martingale structure when proving the asymptotic normality of .
III Mathematical Analysis
This section includes the main formal results that justify our approach. Due to space limitation here, complete proofs are given in the appendix.
III-A Theorem Statement
We state our main theorem below and then give a proof in Sec.III.C. Let the function and be as follows:
[TABLE]
[TABLE]
and same as our previous work [14], we define , where
[TABLE]
Theorem: Assume the HMP as (2), for PF estimator obtained by resampling at each step, and for all . Then, , as .
III-B Estimator
In the PF algorithm, the true estimator of is:
[TABLE]
To show asymptotically normality of , we need an estimator that has nice martingale properties to find its asymptotic variance. Hence, we re-express (12) by (13) below based on the following rationale, and then prove (13) converges to (12) as gets large and has nice martingale structure. To facilitate that derivation, we first represent as
[TABLE]
Furthermore, it is easier to derive the asymptotic variance of .
Note that and have the same (normalized) limiting distribution. However, (13) can not be used in practice because it contains normalizing constants which is often unknown.
Next, we provide the reasoning behind (13) as follows:
[TABLE]
where is the measure defined on space of all records corresponding to probability density function (note that ).
Normalizing the right hand side of (4), we know that:
[TABLE]
Combining with (14), we have:
[TABLE]
Then,
[TABLE]
[TABLE]
By lemma 2, which we will state and prove later, as , and we have that converges to the true PF estimator in probability as .
III-C Proof for Theorem
Here, we provide a rigorous proof for the theorem of Sec.III.A. To begin with, we propose the following two lemmas.
Lemma 1: Let be a measurable vector function from history of state-space ( is time and is dimension of state) to , where . For any , we define as (10). Then,
(i) if , where for any vector , as ,
[TABLE]
(ii) if , as ,
[TABLE]
Proof: Here we only give the lemma statement. All proof details are in the appendix.
Lemma 2: If the same conditions in Lemma 1 are satisfied, then
[TABLE]
Furthermore, if is a vector function from ( is time and is dimension of state) to , and , then
[TABLE]
Proof: We only give the lemma statement here. All proof details are in the appendix.
III-C1 Conditional Distribution
First, let us consider the distribution of conditional on , and we will show that it is asymptotically normal as the number of particles gets large.
According to (7), we have
[TABLE]
[TABLE]
where is the number of copies generated in the resampling process for particle path .
Combining (18) and (19), we have:
[TABLE]
[TABLE]
Then,
[TABLE]
where
[TABLE]
Next, we prove that the above (22) is a martingale difference sequence. Firstly, according to the importance sampling, the conditional distribution of for given are independent with having the density function . Secondly, in the resampling process, the conditional distribution of for given are i.i.d that can take on the values with probability .
From (22), noticing that contains information of , we have
[TABLE]
Also, we have
[TABLE]
Thus, carry out this process and finally we have that is a martingale difference sequence. Additionally, we have are independent conditioning on .
Rearranging (21), we have
[TABLE]
Then, given any vector , we have
[TABLE]
[TABLE]
By (23) and (24), for any , applying lemma 1 and 2, we have
[TABLE]
[TABLE]
as
Therefore, by multivariate Lindeberg’s central limit theorem [15], the conditional distribution converges to normal distribution:
[TABLE]
Where means that conditional on converge in distribution to . Although of is a function of history records, as the number of particles increase, the distribution of conditional on becomes stable. Therefore, the variance of this distribution is a constant.
III-C2 Unconditional Distribution
In this part, we want to show that is asymptotically normal. In particular, our goal is to show unconditional asymptotic distribution of converges to the same characteristic function as that of normal distribution.
[TABLE]
Specifically, we want to prove that:
[TABLE]
The proof details are shown in the appendix. In supporing matrial, we showed that (48) holds. Therefore, the unconditional distribution of is asymptotically normal with mean zero and covariance matrix . Since we have shown that, , we can conclude that .
IV Numerical Study
In this part, let us test the asymptotic normality of by studying the following two examples. One example has linear state and measurement equations but non-Gaussian noise terms. The other example is the classic multivariate stochastic volatility model in finance. It is a nonlinear and Gaussian system. The selection of these two example considers the simplicity of interpretation and wide-spread use of similiar system.
IV-A Linear and Non-Gaussian
Consider the following linear DSSM with non-Gaussian measurement error,
[TABLE]
[TABLE]
where and are independent noise terms with each component following a distribution. We approximate the conditional mean by PF using particles. Meanwhile, we generate the PF estimators by the algorithm using particles.
We run the simulation for 500 times with terminal . The histograms of each component of are follows:
Using the Jarque-Bera test of normality [12], the -values for each component in the error term are respectively 0.2795, 0.2438 and 0.2138. Then, we do not reject the null hypothesis of normality for all three components at the significance level of 0.05 (no adjustment for multiple comparisons here).
This result is in line with our theorem that follows an asymptotic normal distribution.
IV-B Nonlinear and Gaussian
Next, let us consider a slightly more complex model: Multivariate Stochastic Volatility model. As stated in [13], it is a classical approach to model the underlying volatility of financial derivatives using observable variables. Let denote the volatility vector, and be the observation vector. The system can be expressed as:
[TABLE]
[TABLE]
where denote the mean of state vector and denote a matrix with each element being constant. and denote the multivariate normal noise terms.
This this example, consider the case when being standard normal. Similarly, conditional mean is approximated by PF using particles and we generate the PF estimators by the algorithm using particles.
Run the simulation for 500 times and the histograms of each component of is shown as follows:
This time, the -value for each component in the error term are respectively 0.0584, 0.1799 and 0.8063. Thus, we do not reject the null hypothesis of normality for all three components at the significant level of 0.05 (again, no adjustment for multiple comparisons here).
V CONCLUSIONS AND DISCUSSION
From the above analysis and numerical results, we come to the conclusion that is asymptotically normal as the number of particles gets sufficiently large. For further work, we will consider a computable approximation for the covariance matrix in the asymptotic distribution, which we discuss in another work [14]. Moreover, as is stated in the framework (1), since the second part of the error decomposition is not necessarily normal, we will focus more on providing a reasonable bound for it.
Appendix
V-A Lemma 1:
Let be a measurable vector function from history of state-space ( is time and is dimension of state) to , where . For any , we define as following:
[TABLE]
where is unnormalized weight defined in equation (II-B). Then,
(i) if , where for any vector , as ,
[TABLE]
(ii) if , as ,
[TABLE]
Proof: This lemma can be proved by induction: first, we show that if (ii) holds for , then (i) holds for . Then, we prove that if (i) holds for , (ii) holds for using the same method.
We declare some notation first for computational convenience: for any two real value function and , for all , for all , and for all . For any two function and from to , define:
[TABLE]
where for all , is a real-valued function defined on . Define , , then we have . So, without loss of generality, we can assume that for all .
First, check that (i) holds for . For this, notice that when , . Then, (i) can be expressed as:
[TABLE]
Notice that has finite variance since (by (i)). By the weak law of large numbers, for all .
Thus,
[TABLE]
Equivalently, we have
[TABLE]
Next, we want to show that if (ii) holds for , then (i) holds for . This can be proved by a contradictory argument.
Denote . And, in contrast to (i), there exists , , such that for all ,
[TABLE]
In fact, we can find such that for all ,
[TABLE]
where . The reason behind above equation is: as , the left hand side of (32) increases to 1 and the right hand side of (32) decreases to 0. Thus, we can always find some to satisfy (32).
Now, let us decompose into following three parts, which are easier to be computed and bounded:
By (30) and (31), . Thus, we can write as:
[TABLE]
where
[TABLE]
Note the following facts:
[TABLE]
Then,
[TABLE]
Since is a projection of to the orthogonal subspace of (it means: ), we have:
[TABLE]
[TABLE]
Applying (ii) to , we have
[TABLE]
and
[TABLE]
From (37) and (38), it follows that for m sufficiently large,
[TABLE]
Using the same trick by applying (ii) to ,we can prove:
[TABLE]
Applying (ii) to
[TABLE]
Then,
[TABLE]
In particular, we can choose such that
[TABLE]
Then,
[TABLE]
To assure this result, we require , but this can be achieved by choosing large enough. Thus, with probability at least ,
[TABLE]
Combining (39), (40) and (41), we can conclude that:
[TABLE]
Contradiction! Therefore, If (ii) for , (i) for . Analogously, we can prove that if (i) holds for , (ii) holds for .
V-B Lemma 2:
If the same conditions in Lemma 1 are satisfied, then
[TABLE]
in above equation is defined by (7). Furthermore, if is a vector function from ( is time and is dimension of state) to , and , then
[TABLE]
Proof: Considering the special case when , by Lemma 1 (i), we have
[TABLE]
Then,
[TABLE]
Therefore,
[TABLE]
Similarly, we have
[TABLE]
Applying Lemma 1 (i) to for ,
[TABLE]
For arbitrary M, when m large enough.
[TABLE]
As , . Therefore,
[TABLE]
V-C Unconditional Distribution
In this part, we show that is asymptotically normal by induction.
By equation (27), we have that
[TABLE]
Then, our goal is to show unconditional asymptotic distribution of converges to the same characteristic function as that of normal distribution.
[TABLE]
Specifically, we want to prove that:
[TABLE]
First, let us check that when , (V-C) is automatically satisfied since
[TABLE]
Next, assuming that when , it is satisfied:
[TABLE]
Then, at , it follows that:
[TABLE]
From above, we have shown that (48) holds. Therefore, the unconditional distribution of is asymptotically normal with mean zero and covariance matrix . Since we have shown that, , we can conclude that .
Literatur
- [1] N. Gordon, D. Salmond, and A. Smith, ”Novel approach to nonlinear/non-Gaussian Bayesian state estimation”, IEE Proceedings F Radar and Signal Processing, vol. 140, no. 2 , pp. 107, 1993.
- [2] R. Douc and O. Cappe, ”Comparison of resampling schemes for particle filtering,” ISPA 2005. Proceedings of the 4th International Symposium on Image and Signal Processing and Analysis, 2005.
- [3] K. Reif, S. Gunther, E. Yaz, and R. Unbehauen, ”Stochastic stability of the discrete-time extended Kalman filter”, IEEE Transactions on Automatic Control, vol. 44, no. 4, pp. 714–728, 1999.
- [4] J. C. Spall, ”The Kantorovich inequality for error analysis of the Kalman filter with unknown noise distributions,” Automatica, vol. 31, pp. 1513–1517, 1995.
- [5] John L. Maryak, James C. Spall, and Bryan D. Heydon, ”Use of the Kalman filter for inference in state-space models with unknown noise distributions”, IEEE Transactions on Automatic Control vol. 49 pp. 87–90, 2004
- [6] P. D. Moral and A. Guionnet, ”Central limit theorem for nonlinear filtering and interacting particle systems”, Annals of Applied Probability, vol. 9, no. 2, pp. 275–297, 1999.
- [7] X. Han, J. Li, and D. Xiu, ”Error analysis for numerical formulation of particle filter”, Discrete and Continuous Dynamical Systems - Series B, vol. 20, no. 5, pp. 1337–1354, 2015.
- [8] H. P. Chan and T. L. Lai, ”A general theory of particle filters in hidden Markov models and some applications”, Annals of Statistics, vol. 41, no. 6, pp. 2877–2904, 2013.
- [9] H. P. Chan and T. L. Lai, ”A sequential Monte Carlo approach to computing tail probabilities in stochastic models”, The Annals of Applied Probability, vol. 21, no. 6, pp. 2315–2342, 2011.
- [10] A. Taghvaei and P. G. Mehta, ”Error Analysis for the Linear Feedback Particle Filter”, Proc. American Control Conference (ACC), 2018.
- [11] M. Asai, M. Mcaleer, and J. Yu, ”Multivariate Stochastic Volatility: A Review,” Econometric Reviews, vol. 25, no. 2–3, pp. 145–175, 2006.
- [12] T. Thadewald and H. Büning, ”Jarqu-Bera Test and its Competitors for Testing Normality-A Power Comparison”, Journal of Applied Statistics, vol. 34, no. 1, pp. 87–105, 2007.
- [13] M. Asai, M. Mcaleer, and J. Yu, ”Multivariate Stochastic Volatility: A Review”, Econometric Reviews, vol. 25, no. 2–3, pp. 145–175, 2006.
- [14] Z. Liu and J. C. Spall, ”Error Estimation for the Particle Filter,” Proceedings of the 53rd Annual Conference on Information Sciences and Systems, Baltimore, MD, 20-22 March 2019.
- [15] T. Ferguson, A Course in Large Sample Theory, Chapman & Hall, 1996.
The reference list from the paper itself. Each links out to its DOI / PubMed record.
- 1[1] N. Gordon, D. Salmond, and A. Smith, ”Novel approach to nonlinear/non-Gaussian Bayesian state estimation”, IEE Proceedings F Radar and Signal Processing, vol. 140, no. 2 , pp. 107, 1993.
- 2[2] R. Douc and O. Cappe, ”Comparison of resampling schemes for particle filtering,” ISPA 2005. Proceedings of the 4th International Symposium on Image and Signal Processing and Analysis, 2005.
- 3[3] K. Reif, S. Gunther, E. Yaz, and R. Unbehauen, ”Stochastic stability of the discrete-time extended Kalman filter”, IEEE Transactions on Automatic Control, vol. 44, no. 4, pp. 714–728, 1999.
- 4[4] J. C. Spall, ”The Kantorovich inequality for error analysis of the Kalman filter with unknown noise distributions,” Automatica, vol. 31, pp. 1513–1517, 1995.
- 5[5] John L. Maryak, James C. Spall, and Bryan D. Heydon, ”Use of the Kalman filter for inference in state-space models with unknown noise distributions”, IEEE Transactions on Automatic Control vol. 49 pp. 87–90, 2004
- 6[6] P. D. Moral and A. Guionnet, ”Central limit theorem for nonlinear filtering and interacting particle systems”, Annals of Applied Probability, vol. 9, no. 2, pp. 275–297, 1999.
- 7[7] X. Han, J. Li, and D. Xiu, ”Error analysis for numerical formulation of particle filter”, Discrete and Continuous Dynamical Systems - Series B, vol. 20, no. 5, pp. 1337–1354, 2015.
- 8[8] H. P. Chan and T. L. Lai, ”A general theory of particle filters in hidden Markov models and some applications”, Annals of Statistics, vol. 41, no. 6, pp. 2877–2904, 2013.
