Analysis of a nonlinear importance sampling scheme for Bayesian parameter estimation in state-space models
Joaquin Miguez, Ines P. Mari\~no, Manuel A. Vazquez

TL;DR
This paper provides a rigorous convergence analysis of a nonlinear importance sampling scheme for Bayesian parameter estimation in state-space models, demonstrating optimal convergence rates even with approximate importance weights.
Contribution
It offers the first theoretical proof of convergence for the nonlinear population Monte Carlo method, including the optimal rate and the property of exact approximation.
Findings
Convergence of the NPMC scheme is almost sure with rate M^{-1/2}.
The scheme achieves optimal Monte Carlo convergence despite constant mean error in importance weights.
Simulation confirms theoretical convergence properties in a target tracking model.
Abstract
The Bayesian estimation of the unknown parameters of state-space (dynamical) systems has received considerable attention over the past decade, with a handful of powerful algorithms being introduced. In this paper we tackle the theoretical analysis of the recently proposed {\it nonlinear} population Monte Carlo (NPMC). This is an iterative importance sampling scheme whose key features, compared to conventional importance samplers, are (i) the approximate computation of the importance weights (IWs) assigned to the Monte Carlo samples and (ii) the nonlinear transformation of these IWs in order to prevent the degeneracy problem that flaws the performance of conventional importance samplers. The contribution of the present paper is a rigorous proof of convergence of the nonlinear IS (NIS) scheme as the number of Monte Carlo samples, , increases. Our analysis reveals that the NIS…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsTarget Tracking and Data Fusion in Sensor Networks · Distributed Sensor Networks and Detection Algorithms · Fault Detection and Control Systems
Analysis of a nonlinear importance sampling scheme for Bayesian parameter estimation in state-space models
Joaquín Míguez†
Inés P. Mariño⋆
Manuel A. Vázquez†
†Department of Signal Theory & Communications, Universidad Carlos III de Madrid. Avenida de la Universidad 30, 28911 Leganés, Madrid, Spain.
⋆Department of Biology and Geology, Physics and Inorganic Chemistry, Universidad Rey Juan Carlos. C/ Tulipán s/n, 28933 Móstoles, Madrid, Spain.
Abstract
The Bayesian estimation of the unknown parameters of state-space (dynamical) systems has received considerable attention over the past decade, with a handful of powerful algorithms being introduced. In this paper we tackle the theoretical analysis of the recently proposed nonlinear population Monte Carlo (NPMC). This is an iterative importance sampling scheme whose key features, compared to conventional importance samplers, are (i) the approximate computation of the importance weights (IWs) assigned to the Monte Carlo samples and (ii) the nonlinear transformation of these IWs in order to prevent the degeneracy problem that flaws the performance of conventional importance samplers. The contribution of the present paper is a rigorous proof of convergence of the nonlinear IS (NIS) scheme as the number of Monte Carlo samples, , increases. Our analysis reveals that the NIS approximation errors converge to 0 almost surely and with the optimal Monte Carlo rate of . Moreover, we prove that this is achieved even when the mean estimation error of the IWs remains constant, a property that has been termed exact approximation in the Markov chain Monte Carlo literature. We illustrate these theoretical results by means of a computer simulation example involving the estimation of the parameters of a state-space model typically used for target tracking.
keywords:
Importance sampling; population Monte Carlo; state space models; Bayesian inference; adaptive importance sampling; parameter estimation.
\usetkzobj
all
1 Introduction
The estimation of the static unknown parameters of state-space dynamic models is a classical problem in statistical signal processing [1, 2, 3, 4, 5, 6] which has also received considerable attention, very recently, from the computational statistics community [7, 8, 9] (see also [10] for a recent survey) partly because of the ubiquity of the problem in science and engineering and partly because of the availability of more powerful computational resources to address it.
The particle Markov chain Monte Carlo (pMCMC) method originally proposed in [7] has been rapidly adopted by researchers in signal processing [11, 12, 6, 13, 14]. This is a Markov chain Monte Carlo (MCMC) algorithm [15] where the target probability density function (pdf) is the posterior density of the unknown parameters conditional on the available observations. This pdf is analytically intractable and, hence, it is approximated (for each element of the chain) via particle filtering [16, 17, 18, 19, 20]. The most popular MCMC schemes (including Metropolis and Metropolis-Hastings algorithms) admit a pMCMC implementation. A key feature of these methods is that they have the so-called exact approximation property. This means that, even if the acceptance test of the MCMC algorithm is only approximate (since the true target pdf is intractable), the stationary distribution of the Markov chain is still actual posterior density of the parameters. While popular, pMCMC procedures suffer from the same limitations as regular MCMC schemes [15, 21]:
Convergence of the chain is purely asymptotic (no convergence rates are known) and potentially very slow (a problem made worse by the particle approximation).
- 2.
The Monte Carlo samples in the chain are correlated, which reduces the accuracy of estimators compared to methods that produce independent samples.
- 3.
If the target pdf is multimodal, MCMC algorithms may get trapped in local maxima of the function.
An alternative to pMCMC methods is to employ schemes based on importance sampling (IS) [21]. This class of techniques includes population Monte Carlo (PMC) [22], the sequential Monte Carlo square (SMC2) of [23] or the nested particle filter of [9]. PMC is an iterative IS scheme in which the proposed functions used to generate Monte Carlo samples (and, hence, to approximate the posterior probability distribution of the unknown parameters) are improved across the iterations of the algorithm. See [24, 25, 26, 27] for recent applications, and new developments, of this methodology in statistical signal processing. SMC2 is a generalisation of the iterative batch importance sampling (IBIS) algorithm of [28]. It mimics the standard particle filter, but the Monte Carlo samples are drawn from the space of the (static) parameters and they are sequentially updated using a pMCMC kernel. All these methods, including SMC2, are batch, meaning that the whole record of observations is typically processed many times. A purely recursive version of the SMC2 algorithm has been proposed in [9]. The reduction in computational complexity, however, is obtained at the expense of a reduction in the convergence rate of the algorithm. It is worth mentioning that all these techniques (including pMCMC) can be fit within the theoretical framework of sequential Monte Carlo samplers introduced in [29].
The key feature of IS-based methods is that the Monte Carlo samples (used to approximate the target distribution) are generated from almost-arbitrary proposal functions and then assigned importance weights (IWs). While this is a very flexible approach, it suffers from the well-known problem of degeneracy of IWs [30, 18, 21, 8]: when the target pdf is concentrated in a very small region of the space of the unknowns, the largest IW tends to be orders of magnitude greater than all other IWs. As a result the IS-based scheme practically yields a degenerate one-sample approximation.
In this paper we address the analysis of the nonlinear population Monte Carlo (NPMC) algorithm proposed in [8]. In the latter scheme, the IWs undergo a nonlinear transformation to control their variance and, in this way, mitigate the degeneracy problem. In [8] it was proved that the approximation of the target distribution produced at each iteration of the NPMC method converges asymptotically, with the number of Monte Carlo samples , and almost surely (a.s.). Therefore, the weight transformation preserves asymptotic convergence, while it has been shown through numerical examples that performance for finite is consistently improved compared to conventional PMC procedures. The analysis in [8], however
relies on the exact computation of the IWs, which is not feasible for general state-space models,
- 2.
and does not provide explicit convergence rates111Error rates are found in [8] for convergence in probability (not for almost sure convergence) when the IWs are computed exactly.
In this paper we analyse the performance of NPMC methods for the Bayesian estimation the unknown parameters of state space models. Based on some unbiasedness properties of particle filters, we prove that IS with nonlinearly-transformed IWs also yields asymptotic convergence when the weights are approximate, i.e., computed via a particle filter with a fixed computational budget that introduces non-vanishing errors. In other words, we prove that the nonlinear importance sampler enjoys the same exact approximation property as pMCMC and SMC2 algorithms. Moreover, the analysis of this paper also extends considerably the results of [8] by obtaining an explicit (and almost sure) estimation error rate of order , where is an arbitrarily small constant. This result holds for approximate weights and under mild assumptions typical of classic IS analyses. It is worth mentioning that the analytical approach developed in this paper can be applied, in a rather natural way, to the study of recently proposed PMC-like algorithms [25, 31] when the target distribution is the posterior density of the parameters of a state space model.
The rest of the paper is organised as follows. The necessary background material, including notation, state-space models and particle filters, is presented in Section 2. The nonlinear IS scheme and its iterative implementation (the NPMC algorithm) are detailed in Section 3 for the case in which the target probability distribution is the posterior distribution of the unknown parameters of a state-space model. In Section 4 we introduce the new analytical results on the convergence of nonlinear importance samplers, which is the main contribution of the paper. We illustrate the exact approximation property, and numerically compare the NPMC algorithm with a pMCMC scheme through computer simulations for a target tracking model in Section 5. Finally, some brief concluding remarks are made in Section 6.
2 Background and problem statement
2.1 State-space model
A Markov state-space model consists of two sequences of random variables (r.v.’s), and . The first sequence, , is termed the system state. We assume it takes values on some space , hence is a random vector. The state dynamics are described by a prior probability measure and a sequence of Markov kernels that depend on a parameter vector . In this paper, is assumed unknown and modelled as a random vector, with prior pdf with respect to (w.r.t.) the Lebesgue measure. The support set of the parameter vector, , is assumed to be compact.
The state cannot be observed directly. Instead, some noisy observations , , are collected. We note that is a vector, with in general.
We assume that the observations are conditionally independent given the system states and the parameter vector , with a conditional pdf w.r.t. the Lebesgue measure, denoted , which depends on the parameter vector as well.
2.2 The optimal filter and its Monte Carlo approximation
Let denote the sequence of observations collected up the time . The posterior probability measure of the state conditional on the observations and the parameter vector is denoted , i.e., for any Borel set ,
[TABLE]
is the posterior probability of the event “”, given and .
Similarly, denotes the posterior probability measure of conditional on and (i.e., not including ). This is often referred to as the one-step-ahead predictive measure ([32], Chapter 10). For a Borel set ,
[TABLE]
is the posterior probability of the event “”, given and .
We refer to as the optimal filter conditional on the parameter vector . It is not possible, in general, to obtain either or in closed-form (with the notable exception of linear-Gaussian state space models, for which and are computed recursively and exactly using the Kalman flter [33]) and, therefore, numerical approximation algorithms are needed. One of the most popular schemes is the standard particle filter, also known as bootstrap filter (BF) [16, 34, 18].
The BF with particles (i.e., Monte Carlo samples on the state space ) conditional on a given parameter vector can be briefly outlined as follows.
Initialisation. Draw samples from the prior distribution . The particle approximation of is
[TABLE]
where denotes the Dirac delta measure centred at . 2. 2.
Recursive step. Given the approximation , take the following steps:
- (a)
Randomly propagate each particle using the Markov kernel in the model, i.e., draw from , . 2. (b)
Compute IWs, , for , and 3. (c)
normalise them as
[TABLE] 4. (d)
Resample: draw times independently from the discrete distribution and denote the resulting samples as . Construct the unweighted approximation .
The resampling step (d) above can be implemented in a number of different ways (see, e.g., [35, 32] or [20] for a brief survey of methods). Here, for simplicity, we have adopted a scheme which is often referred to as multinomial resampling [18, 35] but most asymptotic convergence results hold true for several other schemes as well [36, 32]. The measure-valued r.v. is an approximation of the optimal filter (conditional on ). Let us use the shorthand
[TABLE]
for the integral of a real function w.r.t. a measure . Under very mild assumptions it can be shown that
[TABLE]
almost surely (a.s.) for any bounded function [36, 32]. Moreover, if we denote , indicates the expected value of a r.v. and is its norm (), then it can be proved [37] that
[TABLE]
where is a constant independent of and
[TABLE]
The algorithm also produces a Monte Carlo approximation of the predictive measure , namely
[TABLE]
If we write for the complete sequence of observations up to time , it turns out that the conditional pdf of given the parameter vector , denoted , can be written in terms of integrals w.r.t. to the predictive measures , . To be specific,
[TABLE]
where
[TABLE]
The conditional pdf is the likelihood of the parameter vector given the available data and the BF yields the straightforward estimator
[TABLE]
which can be shown to be unbiased (i.e., ) under very mild assumptions ([36], Theorem 7.4.2).
2.3 Problem statement
Let be the available data set, with . Our goal is to approximate the probability measure associated to the posterior pdf of the parameter vector, , given the data, . We denote this pdf as and it is straightforward to show, using Bayes’ theorem, that
[TABLE]
where, we recall, is the prior pdf of .
In the next section, we describe an iterative importance sampling algorithm, originally introduced in [8], for the approximation of .
3 Algorithm
The NPMC algorithm of [8] is an iterative importance sampling (IS) scheme that seeks to approximate a target probability distribution, in our case given by the posterior pdf , using weighted Monte Carlo samples. It generates a sequence of proposal pdf’s , , from which samples can be drawn and importance weights (IWs) can be computed. This sequence of proposals is expected to yield increasingly better approximations of the target as the algorithm converges. The key feature of the NPMC method, which departs from the classical PMC technique of [22], is to compute a set of transformed importance weights (TIWs) by applying a nonlinear function to the standard IWs. The aim of this transformation is to mitigate the well-known problem of the degeneracy of the IWs (common to many IS methods, see [18, 8]) by controlling the weight variability.
For the case of general state space models, an additional difficulty encountered when trying to estimate the unknown model parameters (denoted in our setup) is that the likelihood is intractable. In the last few years, though, it has become a common approach to approximate this likelihood via particle filtering (PF) (see, e.g., [8, 7, 38, 23]). To be specific, we let stand for the approximation of computed using a standard bootstrap filter (BF) [16, 39] with particles (see equation (12) in Section 2.2). One key feature of this approach, that we exploit for our analysis in Section 4, is that can be proved to be an unbiased estimator of [36, 40].
The NPMC algorithm applied to a state space model, with iterations, Monte Carlo samples per iteration, plain Gaussian proposals , and approximate likelihoods is outlined below.
Initialisation. Draw i.i.d. samples from the prior pdf . Then,
compute non-normalised IWs , , 2. 2.
compute TIWs as , where is a nonlinear transformation, and 3. 3.
normalise the TIWs, , .
Iteration. For , take the following steps:
Let be a multivariate Gaussian pdf with mean vector and covariance matrix obtained, respectively, as
[TABLE]
Note that the random variates , , are vectors. The superscript ⊤ denotes transposition. 2. 2.
Draw i.i.d. samples , , from . 3. 3.
Compute IWs, , . 4. 4.
Compute TIWs, , , using the same nonlinear map as for . 5. 5.
Normalise the TIWs, , .
Following [8], the nonlinear map of choice is a “clipping” transformation. In particular, let be a permutation of the indices such that the IWs become ordered, namely . The clipping transformation , with parameter , flattens the largest IWs and makes them equal to the -th non-normalised IW, . Specifically, for each , we obtain
[TABLE]
Other choices of are possible (e.g., tempering schemes) but clipping has been found particularly effective in practice [8]. The choice of Gaussian proposals (in step 1 of the Iteration) is made merely for simplicity. Other (more efficient) possibilities exist, but we stick to this formulation as it is sufficient for the purpose of this paper.
Given , being the support set of the parameter vector described in Section 2, let denote the posterior probability measure (conditional on the observed data ) associated to the parameter vector . This measure yields the full probabilistic description of given the available observations. If is available, then we can compute various types of estimators and assess the associated errors. For example, the posterior-mean estimator is
[TABLE]
and it minimises the mean square error (MSE). For an arbitrary estimator , the MSE can also be written as an integral w.r.t. , namely,
[TABLE]
The proposed NPMC algorithm yields a sequence of importance sampling (i.e., weighted Monte Carlo) approximations of . To be specific, at each iteration we obtain the random probability measure
[TABLE]
where denotes the Dirac delta measure centred at . Using we can approximate any parameter estimator. For instance, is the approximation of the posterior mean . The corresponding minimum MSE can also be approximately computed as
[TABLE]
In the next section we analyse the convergence of the approximate measure as in a single iteration (i.e., for a given ) when the number of particles used to approximate the likelihood via the BF (i.e., the estimate of ) is kept constant and finite.
4 Analysis
Consider a single iteration in the NPMC algorithm, with a fixed importance density . We refer to the random measure computed via the TIWs , , as a nonlinear importance sampling (NIS) approximation of . Our aim in this section is to assess whether converges towards the true measure or not as . To do this, there are two issues that need to be handled and make the analysis more difficult compared to a conventional IS method (that relies on the standard IWs, rather than the TIWs). These issues are:
- (i)
the distortion in the Monte Carlo approximation due to the clipping of the weights, which introduces additional bias (compared to the use of standard IWs); and
- (ii)
the impossibility to compute the IWs, and hence the TIWs, exactly, since the likelihood is intractable and we work with the particle approximation instead.
In [8] it was proved that, when the IWs can be computed exactly, the NIS approximation converges almost surely (a.s.) towards the target probability measure as , which accounts for (i) above222The analysis of [8] does not provide an error rate, though. Such rate is explicitly derived in this paper. The problem of the approximate computation of the weights was partially addressed in [41], for a relatively simple case where the errors in the IWs where assumed deterministic and bounded. However, the estimation problem studied in [41] (parameter estimation for -stable distributions using iid data) did not involve any dynamics and the convergence analysis only showed an upper bound for the approximation errors that included a deterministic constant, namely a non-vanishing term proportional to the approximation error of the IWs.
Here, we show stronger analytical results that ensure the almost sure convergence of the NIS approximation when and the likelihood function can only be estimated as , i.e., using a BF with a finite and fixed number of particles . Under assumptions which are standard in the classical IS theory, we prove that integrals of the form converge towards a.s. as and provide explicit error rates.
4.1 Notation
Since we focus our attention in the NIS scheme alone, i.e., a single iteration of the proposed algorithm, in the remaining of this section we drop the iteration index . Hence, we assume a fixed importance density , from where independent Monte Carlo samples, , are drawn. Since the observations are assumed arbitrary but fixed, we drop them from the likelihood notation and write
[TABLE]
Similarly, we simplify the notation for the posterior pdf and write and . Then, the non-normalised IWs are approximated as
[TABLE]
where we have introduced the weight function as a shorthand. This weight function is a random approximation of the deterministic function . The support of is the same as the support of , and , denoted . We assume that for every as well (a standard assumption in classical IS). It is also apparent that , where is the posterior pdf, and the proportionality constant is independent of .
The non-normalised TIWs computed via the clipping function (15) are denoted
[TABLE]
where represents function composition and we omit the index argument of (15) for conciseness (its value is clear from the notation in any case). The normalised TIWs are , and they are used to compute the approximate measure .
4.2 Assumptions and a preliminary result
Let the state sequence take values on . We make the following classical assumptions on the conditional pdf of the observations , , the prior density of the parameters, , and the importance function .
Assumption 1
The observation sequence is arbitrary but fixed. The functions , , are uniformly bounded, i.e., there exists a finite and positive constant such that
[TABLE]
Assumption 2
The ratio of pdf’s is bounded on , i.e., there exists a positive and finite constant such that
[TABLE]
Remark 1
If the parameter support set is compact, then A.1 and A. 2 hold naturally for most models of practical interest.
The following lemma plays a key role in the asymptotic convergence analysis of the approximation . It states that is an unbiased estimator of the likelihood and enables us to show that the NIS scheme converges when , even if the number of particles in the approximation remains finite and constant.
Lemma 1
If Assumption 1 holds then
[TABLE]
independently of .
Proof. From the definition of in Eq. (10) and its estimator in Eq. (12), it is clear that both and when is the number of available observations. The fact that is unbiased is a consequence of [36, Theorem 7.4.2] (see also [40, Lemma 2] for an alternative proof that does not rely on the Feynmann-Kac framework). \qed
4.3 Asymptotic convergence, error rates and exact approximation
In the sequel we look into the approximation of integrals of the form
[TABLE]
where is a bounded real function on the parameter space . We use to denote the supremum norm of a bounded function, while the set of bounded functions on is denoted . The approximations of interest are
[TABLE]
for any .
The following theorem yields an explicit upper bound for the (random) approximation error . The bound is proportional to (for an arbitrarily small ) and, therefore, it vanishes as , independently of the number of particles used in the approximate likelihoods .
Theorem 1
Assume that A.1 and A.2 hold, and . Then, for every (arbitrarily small) and every there exists a positive and a.s. finite r.v. , independent of and , such that
[TABLE]
In particular, a.s.
Proof. Recall the intractable weight function and its random estimator . The integral of any w.r.t. the posterior measure can be written as
[TABLE]
by simply noting that . Similarly, for the random measure we can write
[TABLE]
where is the Monte Carlo approximation of the proposal distribution (with pdf ) and denotes composition of functions, hence is the transformed weight associated to .
Given equations (29) and (30) it is straightforward to show that
[TABLE]
Since and , where by assumption, Eq. (31) readily yields
[TABLE]
and, therefore, the problem of calculating bounds for reduces to the problem of computing bounds for errors of the form
[TABLE]
for .
Choose any . A simple triangle inequality yields
[TABLE]
It is straightforward to obtain an upper bound for the first term on the right hand side of the inequality (34). Indeed, by construction of (see Eq. (15)) we readily obtain
[TABLE]
where the inequality follows from the bound , which is a straightforward consequence of assumptions A.1 and A.2 and the definition of the estimate produced by the BF (see Eq. (12)).
Finding a suitable bound for the second term on the right hand side of the inequality (34) takes some more effort. Choose, again, any . A simple triangle inequality yields
[TABLE]
Since , for the second term on the right hand side of (36) we can write
[TABLE]
where the r.v.’s
[TABLE]
are independent, with zero mean (recall the ’s are i.i.d. draws from ) and bounded, because is bounded and A.1 and A.2 imply that . Therefore, it is an exercise in combinatorics to show that
[TABLE]
where is a constant independent of and . Combining (38) with (37) readily yields
[TABLE]
The inequality (39) implies that there exists an a.s. finite r.v. such that
[TABLE]
where is an arbitrarily small constant independent of (see [42, Lemma 4.1]).
If we expand the first term on the right hand side of (36) we arrive at
[TABLE]
where the r.v.’s , , are independent (because the samples are independent) and zero mean, as a result of Lemma 1333Note that , because is an unbiased estimator of , hence .. Since they are also bounded, namely as a consequence of A.1 and A.2, it is again an exercise to show that (41) implies
[TABLE]
in the same manner as we obtained the inequality (38). Resorting again to [42, Lemma 4.1], from (42) we deduce that there exists an a.s. finite r.v. , independent of , such that
[TABLE]
where is an arbitrarily small constant independent of .
Taking together (36), (40) and (43) we arrive at
[TABLE]
where is an a.s. finite r.v. independent of , and can be chosen to be arbitrarily small.
Substituting the inequalities (35) and (44) back into the relation (34) we arrive at the bound
[TABLE]
where the second inequality follows from the assumption and choosing . Since the r.v. is a a.s. finite, a.s. as well.
To conclude the proof, we substitute the inequality (45) twice into the relation (32). To be precise, we choose first and use (45) to obtain a bound for the first term on the right hand side of (32). Then, we choose and apply (45) again to find a bound for the second term on the right hand side of (32). As a result, we arrive at
[TABLE]
Since by assumption of Theorem 1, taking
[TABLE]
leads to the desired result and concludes the proof. \qed
Theorem 1 is a general result regarding nonlinear importance sampling. It holds true for any problem involving the approximation of the posterior probability distribution of the unknown parameters of a state space model as long as Assumptions 1 and 2 hold. These assumptions, in turn, are very mild and amount to the classical assumptions in the analysis of standard IS algorithms.
Remark 2
We draw attention to the fact that the error vanishes a.s. when even if the number of particles in the BF remains fixed and, hence, does not converge to . This property has been coined “exact approximation” in the MCMC literature (see [7]).
5 Computer simulations
5.1 State-space models
In order to illustrate the performance of the NPMC algorithm and the exact approximation property granted by Theorem 1 we have carried out computer simulations for the estimation of the unknown parameters in a problem consisting of the tracking of a target moving over a region monitored by a network of sensors.
5.1.1 Target dynamics
The target moves over a closed rectangular region . When it hits the border of , the target bounces back in according to the law of reflection [43]. The state of the system at time is {\bf x}_{n}=\left[\begin{array}[]{c}{\bf r}_{n}\\ {\bf v}_{n}\\ \end{array}\right]\in\mathbb{R}^{4}, where is the target position and its velocity. At time , we assume a uniform prior on for the position and a zero-mean Gaussian distribution for the velocity. To be specific, the prior probability measure is defined as
[TABLE]
where is the identity matrix, is the uniform distribution on and denotes the Gaussian distribution with mean and covariance matrix .
At time , the state vector evolves according to a linear-Gaussian equation if the target position remains within the bounded region but it “reflects” back in when the target reaches a border of . Specifically, let
[TABLE]
where is a Gaussian noise term with [math]-mean and covariance matrix
[TABLE]
is a time-discretisation step (we assume in our simulations), is a velocity variance parameter, and is a position variance parameter. The latter are assumed known and identical, . If generated in this way is inside , , then , otherwise , where is the reflection function detailed in A. Note that we do not provide an expression for the kernel but have just described how to draw samples from it instead. This is enough for the implementation of the bootstrap filter and the NPMC algorithm.
For illustration, Fig. 1 depicts the region and a sample trajectory (i.e., a sequence of positions ) which hits the borders of and is reflected back in at four different times. In the figure, the starting target position is represented by a red diamond, the direction of motion is indicated by arrows and the blue squares represent the position of the sensors used to monitor the target motion.
5.1.2 Observations
There are sensors deployed in and, at time , each sensor collects a measurement of the power of the radio signal transmitted by the target. To be specific, the observation recorded by sensor at time has the form
[TABLE]
where is the power of the transmitted radio signal, is the location of the th sensor, is the distance at time between the target and the sensor, is the path loss exponent, is the sensitivity of the sensor, i.e., the minimum power it can measure (note that when ) and is a Gaussian term accounting for observational errors. We assume is a known parameter.
At each time instant , a vector of observations is collected. The target is observed over time instants, and hence the available dataset is . We set for our computer simulations.
5.1.3 Problem statement
Given the state space model described in Sections 5.1.1 and 5.1.2 above, we aim at estimating the unknown parameters , and . All other parameters (namely the discretisation period and the relevant variances) are assumed known. For all computer simulations we have set ground truth values , and for the parameters to be estimated.
Since and , we apply the NPMC algorithm (together with competing algorithms to be described below) to approximate the posterior probability measure of the vector of unknowns . We assume prior distributions of the form , and . Note that, in natural units, the prior mean and variance of are and , respectively, while for the prior mean and variance are and .
The likelihood for the model does not have a closed form and, therefore, it is estimated using a BF, for the state space model described in Sections 5.1.1 and 5.1.2, to yield the approximation detailed in Section 2.2.
5.2 Competing methods
We have applied to this problem the NPMC method described in Section 3, a standard PMC procedure and a particle Metropolis-Hastings (pMH) algorithm. The PMC scheme we have used is identical to the NPMC algorithm of Section 3 except that TIWs are not computed, hence all approximations rely on the conventional IWs.
The pMH is a representative of the class of particle MCMC methods [7] that have become popular in the past two years. It generates a Markov chain on the space of the unknown parameter vector according to the following procedure:
Draw from the prior distribution of the parameters 2. 2.
At the -th iteration, and given the previous element :
- (a)
Draw a tentative new element , where both and the scale factor have been empirically chosen to optimise the performance of the algorithm. 2. (b)
Compute the (approximate) likelihood and prior density . The acceptance probability for is
[TABLE] 3. (c)
Draw . If then , else .
When we generate a chain of length using the procedure above we set a burn-in period of , hence estimates are computed from the samples in the chain.
To compare the pMH and PMC-like algorithms on a fair basis, we let , where is the number of iterations of the NPMC and PMC algorithms and is the number of samples generated per iteration.
All three methods (PMC, NPMC, pMH) rely on a BF with particles for the computation of . The value of is fixed for all algorithms as unless explicitly stated otherwise.
5.3 Results
Figure 2 shows the evolution of the MSE of the estimators of produced by the PMC, NPMC and pMH algorithms as the number of samples is increased.
The error for the NPMC algorithm is at least one order of magnitude below the errors of the conventional PMC and the pMH algorithms for every tested value of . For samples, for example, the MSE attained by the NPMC is , while for the standard PMC and pMH algorithms the errors are and , respectively.
Next, we aim at finding out the length of the chain, , required for the pMH algorithm to attain the same performance, in terms of MSE, as the NPMC algorithm. Figure 3 shows the MSE of the pMH method for different chain lengths (equivalently, number of generated samples).
For comparison, the performance of the NPMC algorithm for samples and iterations ( Monte Carlo samples overall) is also indicated in the plot. It can be seen that, in the pMH algorithm, chains that are around samples long are required to attain the same MSE as the NPMC algorithm (a 100-fold increase of the computational cost). While the parameters of the pMH scheme may be further tuned to improve this performance, the gap between the algorithms is large enough to conclude that the NMPC method is more efficient in this example.
Finally, we examine the exact approximation property of the NPMC scheme stated by Theorem 1. Figure 4 shows the MSE of the NPMC algorithm versus the number of Monte Carlo samples, , for different values of (the number of particles used by the BF to approximate the IWs). While Theorem 1 guarantees that the approximation errors vanish as , even if is fixed, it is reasonable to expect that for a fixed , greater values of lead to better performance. This is shown, indeed, by Fig. 4. Note, however, that the difference in performance is very small. For , the gap between the MSE of the NPMC scheme with and the NPMC scheme with is .
6 Conclusion
We have rigorously proved, under mild assumptions, that nonlinear importance samplers with clipped IWs converge a.s. with optimal Monte Carlo error rates even when the weights can only be estimated (and have a positive, non-vanishing variance) as long as these estimates are unbiased. Therefore, nonlinear importance samplers can perform exact approximation in the same manner as, e.g., particle MCMC schemes. Besides the theoretical contribution, we have numerically shown that the proposed algorithm can be more efficient than a particle Metropolis-Hastings algorithm of the same complexity for inference on a target tracking model.
Acknowledgments
This research has been partially supported by the Spanish Ministry of Economy and Competitiveness (projects TEC2015- 69868-C2-1-R ADVENTURE and FIS2013-40653-P), the Spanish Ministry of Education, Culture and Sport (mobility award PRX15/00378) and the Office of Naval Research (ONR) Global (Grant Award no. N62909-15-1-2011).
Appendix A Definition of function
Let us denote the upper right, upper left, lower left and lower right vertices of the monitored region by, respectively, , , and . The sides of the rectangle, obtained by joining adjacent vertices, are denoted (top), (left), (bottom) and (right). With this notation, Algorithm 1 can be used at time to generate a sample from . It accounts for the scenario in which the target hits one of the walls and deals with it by means of the law of reflection [43].
We are implicitly assuming that in step 5 above. If this is not the case, i.e., , then steps 3–5 can be run again to implement a second reflection.
References
- [1]
M. Jansson, B. Wahlberg, A linear regression approach to state-space subspace system identification, Signal Processing 52 (2) (1996) 103–129.
- [2]
G. Storvik, Particle filters for state-space models with the presence of unknown static parameters, IEEE Transactions Signal Processing 50 (2) (2002) 281–289.
- [3]
C. Andrieu, A. Doucet, Online expectation-maximization type algorithms for parameter estimation in general state space models, in: 2003 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), Vol. 6, IEEE, 2003, pp. VI–69.
- [4]
J. Ding, Y. Shi, H. Wang, F. Ding, A modified stochastic gradient based parameter estimation algorithm for dual-rate sampled-data systems, Digital Signal Processing 20 (4) (2010) 1238–1247.
- [5]
F. Ding, Y. Gu, Performance analysis of the auxiliary model-based stochastic gradient parameter estimation algorithm for state-space systems with one-step state delay, Circuits, Systems, and Signal Processing 32 (2) (2013) 585–599.
- [6]
J. Kokkala, S. Särkkä, Combining particle MCMC with Rao-Blackwellized Monte Carlo data association for parameter estimation in multiple target tracking, Digital Signal Processing 47 (2015) 84–95.
- [7]
C. Andrieu, A. Doucet, R. Holenstein, Particle Markov chain Monte Carlo methods, Journal of the Royal Statistical Society B 72 (2010) 269–342.
- [8]
E. Koblents, J. Míguez, A population monte carlo scheme with transformed weights and its application to stochastic kinetic models, Statistics and Computing 25 (2) (2015) 407–425.
- [9]
D. Crisan, J. Miguez, Nested particle filters for online parameter estimation in discrete-time state-space markov models, arXiv 1308.1883v3 [stat.CO].
- [10]
N. Kantas, A. Doucet, S. S. Singh, J. M. Maciejowski, N. Chopin, On particle methods for parameter estimation in state-space models, Statistical Science 30 (2015) 328–351.
- [11]
J. Olsson, T. Ryden, Rao-Blackwellization of particle Markov chain Monte Carlo methods using forward filtering backward sampling, IEEE Transactions on Signal Processing 59 (10) (2011) 4606–4619.
- [12]
T. Vu, B.-N. Vo, R. Evans, A particle marginal Metropolis-Hastings multi-target tracker, IEEE Transactions on Signal Processing 62 (15) (2014) 3953–3964.
- [13]
J. Kwon, R. Dragon, L. Van Gool, Joint tracking and ground plane estimation, IEEE Signal Processing Letters 23 (11) (2016) 1514–1517.
- [14]
J. Ala-Luhtala, N. Whiteley, K. Heine, R. Piché, An introduction to twisted particle filters and parameter estimation in non-linear state-space models, IEEE Transactions on Signal Processing 64 (18) (2016) 4875–4890.
- [15]
W. J. Fitzgerald, Markov chain Monte Carlo methods with applications to signal processing, Signal Processing 81 (1) (2001) 3–18.
- [16]
N. Gordon, D. Salmond, A. F. M. Smith, Novel approach to nonlinear and non-Gaussian Bayesian state estimation, IEE Proceedings-F 140 (2) (1993) 107–113.
- [17]
A. Doucet, N. de Freitas, N. Gordon (Eds.), Sequential Monte Carlo Methods in Practice, Springer, New York (USA), 2001.
- [18]
A. Doucet, S. Godsill, C. Andrieu, On sequential Monte Carlo Sampling methods for Bayesian filtering, Statistics and Computing 10 (3) (2000) 197–208.
- [19]
P. M. Djurić, J. H. Kotecha, J. Zhang, Y. Huang, T. Ghirmai, M. F. Bugallo, J. Míguez, Particle filtering, IEEE Signal Processing Magazine 20 (5) (2003) 19–38.
- [20]
O. Cappé, S. J. Godsill, E. Moulines, An overview of existing methods and recent advances in sequential Monte Carlo, Proceedings of the IEEE 95 (5) (2007) 899–924.
- [21]
C. P. Robert, G. Casella, Monte Carlo Statistical Methods, Springer, 2004.
- [22]
O. Cappé, A. Gullin, J. M. Marin, C. P. Robert, Population monte carlo, Journal of Computational and Graphical Statistics 13 (4) (2004) 907–929.
- [23]
N. Chopin, P. E. Jacob, O. Papaspiliopoulos, SMC2: an efficient algorithm for sequential analysis of state space models, Journal of the Royal Statistical Society: Series B (Statistical Methodology).
- [24]
M. Hong, M. F. Bugallo, P. M. Djuric, Joint model selection and parameter estimation by population Monte Carlo simulation, IEEE Journal of Selected Topics in Signal Processing 4 (3) (2010) 526–539.
- [25]
L. Martino, V. Elvira, D. Luengo, J. Corander, An adaptive population importance sampler: Learning from uncertainty, IEEE Transactions on Signal Processing 63 (16) (2015) 4422–4437.
- [26]
M. F. Bugallo, L. Martino, J. Corander, Adaptive importance sampling in signal processing, Digital Signal Processing 47 (2015) 36–49.
- [27]
V. Elvira, L. Martino, D. Luengo, M. F. Bugallo, Improving population monte carlo: Alternative weighting and resampling schemes, Signal Processing 131 (2017) 77–91.
- [28]
N. Chopin, A sequential particle filter method for static models, Biometrika 89 (3) (2002) 539–552.
- [29]
P. Del Moral, A. Doucet, A. Jasra, Sequential Monte Carlo samplers, Journal of the Royal Statistical Society: Series B (Statistical Methodology) 68 (3) (2006) 411–436.
- [30]
A. Kong, J. S. Liu, W. H. Wong, Sequential imputations and Bayesian missing data problems, Journal of the American Statistical Association 9 (1994) 278–288.
- [31]
V. Elvira, L. Martino, D. Luengo, M. F. Bugallo, Efficient multiple importance sampling estimators, IEEE Signal Processing Letters 22 (10) (2015) 1757–1761.
- [32]
A. Bain, D. Crisan, Fundamentals of Stochastic Filtering, Springer, 2008.
- [33]
B. D. O. Anderson, J. B. Moore, Optimal Filtering, Englewood Cliffs, 1979.
- [34]
G. Kitagawa, Monte Carlo filter and smoother for non-Gaussian nonlinear state-space models, J. Comput. Graph. Statist. 1 (1996) 1–25.
- [35]
R. Douc, O. Cappé, E. Moulines, Comparison of resampling schemes for particle filtering, in: Proceedings of the 4th International Symposium on Image and Signal Processing and Analysis, 2005, pp. 64–69.
- [36]
P. Del Moral, Feynman-Kac Formulae: Genealogical and Interacting Particle Systems with Applications, Springer, 2004.
- [37]
J. Míguez, D. Crisan, P. M. Djurić, On the convergence of two sequential Monte Carlo methods for maximum a posteriori sequence estimation and stochastic global optimization, Statistics and Computing 23 (1) (2013) 91–107.
- [38]
C. Andrieu, G. Roberts, The pseudo-marginal approach for efficient Monte Carlo computations, Annals of Statistics 37 (2009) 697–725.
- [39]
A. Doucet, N. de Freitas, N. Gordon, An introduction to sequential Monte Carlo methods, in: A. Doucet, N. de Freitas, N. Gordon (Eds.), Sequential Monte Carlo Methods in Practice, Springer, 2001, Ch. 1, pp. 4–14.
- [40]
D. Crisan, J. Miguez, G. Ríos, A simple scheme for the parallelisation of particle filters and its application to the tracking of complex stochastic systems, arXiv arXiv:1407.8071v2 [stat.CO].
- [41]
E. Koblents, J. Miguez, M. A. Rodriguez, A. M. Schmidt, A nonlinear population Monte Carlo scheme for the Bayesian estimation of parameters of -stable distributions, Computational Statistics and Data Analysis 95 (2016) 57–74.
- [42]
D. Crisan, J. Miguez, Particle-kernel estimation of the filter density in state-space models, Bernoulli 20 (4) (2014) 1879–1929.
- [43]
G. Farin, D. Hansford, Practical linear algebra: A geometry toolbox, CRC Press, 2013.
The reference list from the paper itself. Each links out to its DOI / PubMed record.
- 1[1] M. Jansson, B. Wahlberg, A linear regression approach to state-space subspace system identification, Signal Processing 52 (2) (1996) 103–129.
- 2[2] G. Storvik, Particle filters for state-space models with the presence of unknown static parameters, IEEE Transactions Signal Processing 50 (2) (2002) 281–289.
- 3[3] C. Andrieu, A. Doucet, Online expectation-maximization type algorithms for parameter estimation in general state space models, in: 2003 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), Vol. 6, IEEE, 2003, pp. VI–69.
- 4[4] J. Ding, Y. Shi, H. Wang, F. Ding, A modified stochastic gradient based parameter estimation algorithm for dual-rate sampled-data systems, Digital Signal Processing 20 (4) (2010) 1238–1247.
- 5[5] F. Ding, Y. Gu, Performance analysis of the auxiliary model-based stochastic gradient parameter estimation algorithm for state-space systems with one-step state delay, Circuits, Systems, and Signal Processing 32 (2) (2013) 585–599.
- 6[6] J. Kokkala, S. Särkkä, Combining particle MCMC with Rao-Blackwellized Monte Carlo data association for parameter estimation in multiple target tracking, Digital Signal Processing 47 (2015) 84–95.
- 7[7] C. Andrieu, A. Doucet, R. Holenstein, Particle Markov chain Monte Carlo methods, Journal of the Royal Statistical Society B 72 (2010) 269–342.
- 8[8] E. Koblents, J. Míguez, A population monte carlo scheme with transformed weights and its application to stochastic kinetic models, Statistics and Computing 25 (2) (2015) 407–425.
