Bayesian Static Parameter Estimation for Partially Observed Diffusions via Multilevel Monte Carlo
Ajay Jasra, Kengo Kamatani, Kody J. H. Law, and Yan Zhou

TL;DR
This paper introduces a multilevel Monte Carlo approach combined with particle MCMC for efficient Bayesian parameter estimation in discretized partially observed diffusions, reducing computational cost while maintaining accuracy.
Contribution
It develops a novel MLMC method using coupling and importance sampling for Bayesian inference in discretized diffusions, improving efficiency over traditional methods.
Findings
Variance of weights is independent of data length
Method reduces computational cost for a given mean square error
Successfully applied to Ornstein-Uhlenbeck and Langevin processes
Abstract
In this article we consider static Bayesian parameter estimation for partially observed diffusions that are discretely observed. We work under the assumption that one must resort to discretizing the underlying diffusion process, for instance using the Euler-Maruyama method. Given this assumption, we show how one can use Markov chain Monte Carlo (MCMC) and particularly particle MCMC [Andrieu, C., Doucet, A. and Holenstein, R. (2010). Particle Markov chain Monte Carlo methods (with discussion). J. R. Statist. Soc. Ser. B, 72, 269--342] to implement a new approximation of the multilevel (ML) Monte Carlo (MC) collapsing sum identity. Our approach comprises constructing an approximate coupling of the posterior density of the joint distribution over parameter and hidden variables at two different discretization levels and then correcting by an importance sampling method. The variance of the…
| Model | Parameter | ML-PMCMC | PMCMC |
|---|---|---|---|
| Ornstein-Uhlenbech process | |||
| Langevin SDE | |||
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsMarkov Chains and Monte Carlo Methods · Statistical Methods and Bayesian Inference · Statistical Methods and Inference
Bayesian Static Parameter Estimation for Partially
Observed Diffusions via Multilevel Monte Carlo
Ajay Jasra [email protected]
Kengo Kamatani [email protected] Department of Engineering Science, Osaka University, JP
Kody J. H. Law [email protected] Computer Science and Mathematics Division, Oak Ridge National Laboratory, USA
Yan Zhou [email protected]
Abstract
In this article we consider static Bayesian parameter estimation for partially observed diffusions that are discretely observed. We work under the assumption that one must resort to discretizing the underlying diffusion process, for instance using the Euler Maruyama method. Given this assumption, we show how one can use Markov chain Monte Carlo (MCMC) and particularly particle MCMC [Andrieu, C., Doucet, A. & Holenstein, R. (2010). Particle Markov chain Monte Carlo methods (with discussion). J. R. Statist. Soc. Ser. B, 72, 269–342] to implement a new approximation of the multilevel (ML) Monte Carlo (MC) collapsing sum identity. Our approach comprises constructing an approximate coupling of the posterior density of the joint distribution over parameter and hidden variables at two different discretization levels and then correcting by an importance sampling method. The variance of the weights are independent of the length of the observed data set. The utility of such a method is that, for a prescribed level of mean square error, the cost of this MLMC method is provably less than i.i.d. sampling from the posterior associated to the most precise discretization. However the method here comprises using only known and efficient simulation methodologies. The theoretical results are illustrated by inference of the parameters of two prototypical processes given noisy partial observations of the process: the first is an Ornstein Uhlenbeck process and the second is a more general Langevin equation.
Key words: Multilevel Monte Carlo, Markov chain Monte Carlo, Diffusion Processes
1 Introduction
The Hidden Markov Model (HMM) is widely used in many disciplines, including applied mathematic, statistics, economics and finance; see [2] for an overview. In this article, we are interested in HMMs given by diffusions which are partially observed, discretely in time. In particular, we assume that in order to fit the model to the data, one must resort to a discretization of the diffusion, for instance, using Euler-Maruyama. In addition, we assume that associated to the model is a static (non-time-varying) finite dimensional parameter, which one is interested to infer given a fixed length data record. In simple terms, the discretization, of level say, where as one obtains the exact diffusion, induces a posterior say on the static parameter and hidden states at the observation times, say . We seek to approximate for appropriately defined real-valued functions. Ultimately, one might seek to remove the dependence upon and get the exact expectation with no discretization bias. We remark that the model will be formally introduced in the next section. This framework is relevant to a broad range of applications in science and engineering; see [2, 17]
The task of computing the expectation for any fixed is a non-trivial task, which often requires quite advanced Monte Carlo methods. As has been remarked in many articles in the literature, ofen the joint correlation between and means even standard MCMC methods may produce very inaccurate of inefficient approximations of the expectation of interest, despite their theoretical validity. An important algorithm that has, to an extent, helped to alleviate these difficulties is the particle MCMC (PMCMC) methods of [1] and their subsequent developments (e.g. [4]). Intrinsically, this method uses a sequential Monte Carlo (SMC) (e.g. [7]) method to help move the samples around the state-space, for instance, inside a Metropolis-Hastings acceptance/rejection scheme, although Gibbs versions also exist. PMCMC delivers a Markov chain which provides consistent estimates of expectations of the form , for any fixed SMC methods are well-known as being efficient techniques for filtering, when the state-variable at time , , is of moderate to low dimension and all the static parameters are fixed.
In the context of this article, there is an additional degree of freedom, which can be utilized to further enhance the PMCMC method. This is associated to the discretization level . We consider using the multilevel Monte Carlo (MLMC) framework [8, 9, 11]. This allows one to leverage in an optimal way the nested problems arising in this context, hence minimizing the necessary cost to obtain a given level of mean square error. Set as the posterior on with no discretization bias and as the time-discretized posterior on with time discretization , one has for an intergrable, real-valued function and (the levels)
[TABLE]
where is the expectation operator and . The idea of MLMC is then to approximate each summand by independently simulating samples from a dependent coupling of . In such scenarios, one can show that the overall mean square error (MSE) associated to the approximation of is:
[TABLE]
where
[TABLE]
and are a collection of constants. It is remarked that it is the coupled samples which induce to be a function of which is often critical as we explain below. Assuming the cost of per level, per sample, the cost of the algorithm is then . Fixing and given an appropriate parameterization of (e.g. ), one then chooses to ensure that and then given characterised as a function of optimizes to minimize the cost so that the term ; [8] gives the solution to this constrained optimization problem. In many scenarios of practical interest the associated MLMC algorithm can achieve a MSE of at a cost which is less than i.i.d. sampling from ; note that this has not yet been established in the problem under study here. The main issue is that sampling independently from the couples is not possible in our context.
In this paper we show how to implement a new approximation of the multilevel collapsing sum identity. Our approach comprises constructing an approximate coupling of the posterior density of the joint on the parameter and hidden space at two different discretization levels and then correcting by an importance sampling method, whose variance of the weights are independent of the length of the observed data set. The utility of such a method is that it comprises using known and efficient simulation methodologies, instead of coupling algorithms as explored in [13, 14, 15, 19]. In particular, our approach facilitates a mathematical analysis which allows us to establish that our approach can be better than sampling (e.g. by PMCMC) from the posterior associated to the most precise discretization. The algorithm presented here is distinct from either of the previously introduced multilevel MCMC (MLMCMC) algorithms [12, 16], and may be generalized.
This article is structured as follows. In Section 2 the model is described. In Section 3 we describe our approach and give a mathematical result associated to the MSE of the method. In Section 4 we give practical simulations to establish the theory. The appendix contains some of the proofs for the result of Section 3.
2 Model
We consider the following partially-observed diffusion process:
[TABLE]
with , , has initial probability density and a Brownian motion of appropriate dimension. is a static parameter of interest. The following assumptions will be made on the diffusion process.
Assumption 2.1**.**
, satisfy
- (i)
global Lipschitz property*: there is a such that for all and all ;*
- (ii)
bounded moments*: for all *
Notice that (i) and (ii) together imply that for all .
It will be assumed that the data are regularly spaced (i.e. in discrete time) observations , . It is assumed that conditional on , for discretization , is independent of all other random variables with density . For simplicity of notation let (which can always be done by rescaling time), so . It is noted that we assume that one does not have access to a non-negative and unbiased estimate of the transition density of the diffusion and we are forced to work with a discretized process.
The above formulation can then summarized as follows, on discretizing the diffusion process with discretization level . We have a pair of discrete-time stochastic processes, and , where (with associated algebra ) is an unobserved process and (with associated algebra ) is observed. Let be a parameter . The hidden process is a Markov chain with initial density at time [math] and transition density , i.e. for each
[TABLE]
where denotes probability, and is a dominating -finite measure. In addition, the observations conditioned upon are statistically independent and have marginal density , i.e.
[TABLE]
with and the dominating -finite measure. The HMM is given by equations (5)-(6) and is often referred to in the literature as a state-space model. In our context is a parameter of interest with prior .
Given the joint density on
[TABLE]
for , where are the bounded and real-valued measurable functions on and are the Lipschitz, measurable functions on , and for we would like to compute
[TABLE]
where . We will use the MLMC approach.
Consider only a single pair , . It is well known that if one can sample from a dependent coupling of , such as the maximal coupling, then Monte Carlo estimation of such a difference can be performed at a lower cost than i.i.d sampling from the independent coupling of [8, 9]. The main issue is that such couplings are typically not available up-to a non-negative and unbiased estimator. We consider the scenario where one samples from a sensible, approximate, coupling and corrects via importance sampling.
3 Method and Analysis
3.1 Method
We are to approximate the identity (7). Our procedure, when considering the summands from will be to run independent pairs of the idea to be described below. The case is simply using (e.g.) PMCMC to approximate ; we refer the reader to [1] for details on PMCMC - a simple decsription is below. We only consider a pair , . The methodology and analysis in this context of one pair will suffice to justify our approach as we will explain below.
Let and be any coupling (other than the independent one) of . For instance, in the context of an Euler discretization a description can be found in [15] (see also appendix B). Let (note that alternative choices of are possible). We propose to sample from the probability density on (write the associated algebra as )
[TABLE]
Then for :
[TABLE]
[TABLE]
where
[TABLE]
We note that our choice of ensures that and are uniformly upper-bounded by 1 and hence that the variance w.r.t. any probability is independent of .
3.1.1 Particle MCMC
Let be a measurable space such that . Let be any ergodic Markov kernel of invariant measure such that one can consistently estimate expectations w.r.t. . For instance, if for every
[TABLE]
Our construction allows a particle MCMC approach to be adopted, which is not quite as the displayed equation, but nonetheless allows one to infer . We focus on one particle MCMC method for completeness, but, we reiterate that one can use the analysis here for more advanced versions of the algorithm, or indeed, any MCMC of the form above.
We will now describe the particle marginal Metropolis-Hastings (PMMH) algorithm. Let and be fixed, and introduce random variables , which will denote the indices of the selected particles upon resampling at the given steps. One can run a particle filter [5] to approximate
[TABLE]
by sampling from the following joint, on the space
[TABLE]
where . Note that better algorithms can be constructed, but we just present the most simple approach. We remark that
[TABLE]
is an unbiased estimator of ; see [5].
The PMMH algorithm works as follows. The superscripts for are the iteration (time) counter of the MCMC.
Initialize: Sample from the prior and then sample from as in (9), and store as in (10). Select a path , constructed by drawing with probability proportional to , and setting ; set as the index of the selected path. Set . 2. 2.
Iterate: Sample according to a proposal with conditional density then from as in (9). Select a path with probability proportional to and constructed as described above; set as the index of the selected path. Set , with probability:
[TABLE]
otherwise , . Set and return to the start of 2.
We denote by the PMMH kernel and denote by the measurable space for which it is defined upon. The invariant measure is denoted . For the analysis, we assume the MCMC algorithm is started in stationarity.
Then one estimates (8) by
[TABLE]
This estimate is consistent in the limit as grows; see [1]. To simplify the notation we replace in the superscripts by from here on.
3.2 Multilevel Considerations
As described for MLMC in the introduction, we will approximate the expectation using the telescopic sum identity given in (1). We will establish error estimates for
[TABLE]
where
[TABLE]
is a consistent estimator of . Therefore (11) is a consistent estimator of and the the MSE (2) can be bounded, up to a constant, by the sum of the squared error of (11) and Bias, as given by (3), which is for example using Euler Maruyama.
Using to denote the expectation w.r.t. the law associated to our algorithm, assuming the Markov chain is started in stationarity, our objective is therefore to investigate
[TABLE]
so as to optimally allocate as described in the introduction. Thus we must investigate terms such as for a given .
3.3 Analysis
Below are the collection of probability measures on .
- (A1)
For every there exist such that for every , ,
[TABLE]
For every , is globally Lipschitz on .
- (A2)
For any , there exists a such that for any there exists a
[TABLE]
- (A3)
Suppose that for any there exist a and such that for each , , :
[TABLE]
is -reversible, that is, for any .
We note that (A(A1)) can be verified for some state-space models (especially if and are compact) and (A(A3)) can be verified for a PMCMC kernel, if are compact - indeed, the constants would all be independent of under appropriate settings of the algorithm.
Theorem 3.1**.**
Assume (A(A1)-(A3)). Then for any , there exists a such that for any there exists a such that
[TABLE]
[TABLE]
Proof.
The result follows by using Lemma C.3. of [14], the inequality, the boundedness of certain quantities and Proposition A.1.The proof is omitted as it is similar to the calculations in [14]. ∎
3.4 A Return to Multilevel Considerations
Returning to Section 3.2, we assume that and introduce the further assumption
Assumption 3.1**.**
The cost to simulate in (12) is controlled by , and the bias is controlled by
[TABLE]
for .
Following assumption (A(A2)), satisfies the above, but it may be larger, e.g. for Euler-Maruyama in which .
Given , in order to ensure the MSE is , the term (3) must be . Following from Assumption (A(A2)), it suffices to let so that .
Following from Theorem 3.1,
[TABLE]
and note that the constant may depend upon the time parameter , which has been suppressed from the notation; we return to this point below.
Suppose we minimize COST subject to as a function of . This is exactly considered in [8] for and later in [3] for , and yields that
[TABLE]
where (see also [14, 6]). This gives a cost of per time step. Hence the following corollary is immediate.
Corollary 3.1** (ML Cost).**
Given (A(A1)-(A3)) and Assumption 3.1, for any and any , can be chosen such that the estimator with given in (12), satisfies
[TABLE]
for some , for a total cost controlled by
[TABLE]
In contrast, for the same scenario, the computational cost of PMCMC is per time step, which is asymptotically greater than the method developed here.
It is remarked that all of our constants depend upon the time parameter (number of data points) and this element has been ignored. This is due to the technical complexity of the approach. We expect that the constants can be made time-uniform, and hence we conjecture that the results hold true uniformly in time. Then can be chosen as above, and for Euler Maruyama ( [10]) the cost for a given will be , with similar results for , according to (15). This results because one needs to take for the particle filter in PMMH [1] and the cost to obtain a single sample particle filter trajectory is . A verification of this is left for future work.
4 Numerical Simulations
4.1 Ornstein-Uhlenbeck process
First, we consider the following Ornstein-Uhlenbeck process,
[TABLE]
where denotes the Normal distribution with mean and variance . Further, the parameters are unknown and are given the following priors,
[TABLE]
where denotes the Gamma distribution with shape and scale . The remaining parameters are defined as constants, , , , and . A data set with 100 observations is simulated with and .
4.2 Langevin SDE
Consider the following Langevin SDE,
[TABLE]
where denote the probability density function of a Student’s -distribution with degrees of freedom. The parameters of interest are , and these are given prior,
[TABLE]
The constants are and . A data set with 1,000 observations is simulated with , , and .
4.3 Simulation settings
The simulations proceed as the following. Let be the accuracy parameter. At each level , we set the number of particles in the PMCMC kernel be fixed, and set the number of PMCMC samples for estimation according to the multilevel analysis. Let denote the number of samples at level within a simulation that targets -level error, . The value of is determined empirically with variance estimated from 100 samplers. For comparison, a single-level PMCMC sampler is also considered for each . Its number of samples is determined empirically by running 100 simulations simultaneously. And these chains are run until the estimated error of the 100 estimates matches that of the multilevel sampler. In all situations, a fixed burn-in period of 10,000 iterations is used. This is reasonable given the fast decorrelation of the chains, as illustrated by the estimated autocorrelation of the single level PMCMC sampler for in Figure 1. The autocorrelation functions look similar for all for the multilevel sampler.
4.4 Results
We consider the choice of . The main results of the cost vs. error are shown in Figure 2. The estimated cost rates are listed in table 1. It is shown in the appendix that for Euler discretization the method satisfies the assumptions (A(A1)-(A3)) with in (A(A2)), since the diffusion term is constant in [10]. Furthermore, Assumption 3.1 holds with . Therefore, the theoretical results of Theorem 3.1 and Corollary 3.1 predict the rate . Standard PMMH will incur a cost of . The numerical results confirm this.
Acknowledgements
AJ & YZ were supported by an AcRF tier 2 grant: R-155-000-161-112. AJ is affiliated with the Risk Management Institute, the Center for Quantitative Finance and the OR & Analytics cluster at NUS. KK & AJ acknowledge CREST, JST for additionally supporting the research. KJHL was supported by ORNL LDRD Strategic Hire grant 32112580.
Appendix A Technical Results
A Markov kernel can be viewed as a linear operator for on a Hilbert space
[TABLE]
with an inner product and norm . Let be the operator norm.
By Döblin condition (A(A3)), we have the total variation distance bound for some . Since is an Metropolis-Hastings kernel, it has -reversibility. Therefore, the total variation bound implies -spectral gap
[TABLE]
by Theorem 2.1 of [18].
For a finite measure on a measurable space and
[TABLE]
Defining as the relevant variables of from the MCMC kernel, and defining
[TABLE]
we are interested in estimates of the form:
[TABLE]
Proposition A.1**.**
Assume (A(A1)-(A3)). Suppose that is a Markov chain with the Markov kernel , and . Then for any , there exists a such that for any there exists a such that
[TABLE]
where is the relevant variables of .
Proof.
Denote the map by . Then
[TABLE]
for . By simple algebra,
[TABLE]
On the other hand, by Lemma A.1,
[TABLE]
Thus, the claim follows. ∎
Lemma A.1**.**
Assume (A(A1)-(A2)). Then for any , there exists a such that for any there exists a
[TABLE]
Proof.
We prove the result for , the case being almost the same. The result is proved by induction on . Set , then
[TABLE]
[TABLE]
[TABLE]
As is uniformly (in ) bounded below, the denominator on the R.H.S. is uniformly lower bounded by a constant that is independent of . The numerator on the R.H.S. is
[TABLE]
[TABLE]
Application of (A(A2)) hence yields
[TABLE]
Assuming the result for , , by the above argument we only have to consider
[TABLE]
[TABLE]
The R.H.S. can be upper-bounded by
[TABLE]
[TABLE]
The first term can be treated by the induction hypothesis and the second term via (A(A2)) which completes the proof. ∎
Appendix B Coupling Euler Approximations
Consider , the current position of the discretized diffusions. Now we have the discretization levels, with and for simplicity set . Associated to the discretization level (resp. ), one must sample (resp. ) points to obtain the sampled position of the diffusion at the next observation time. Set then one can sample the fine discretization, for as
[TABLE]
where ( is the identity matrix). For the course discretization, using the same simulated we set for
[TABLE]
Now, we want to check conditions (A(A2)) and (A(A3)) under Assumption 2.1 (i,ii), (A(A1)) and the following assumption.
Assumption B.1**.**
* is a compact set of , and and are continous and strictly positive.*
By assumption, is the density of given . Then, under Assumption 2.1 (i, ii), the condition (A(A2)) is satisfied with for any , since this is the bound of the Euler-Maruyama scheme (in fact for constant diffusion coefficient it coincides with the Milstein method and ) [10].
Next, we want to check the condition (A(A3)). The proposal density on of PMMH is
[TABLE]
where , , and is defined in (9). The transition kernel is
[TABLE]
where the acceptance probability is
[TABLE]
and the rejection probability is
[TABLE]
By (A(A1)) together with Assumption B.1, , and
[TABLE]
for a constant with a probability density . Thus, we have
[TABLE]
In particular, the condition (A3) is satisfied.
The reference list from the paper itself. Each links out to its DOI / PubMed record.
- 1[1] Andrieu , C., Doucet , A. & Holenstein , R. (2010). Particle Markov chain Monte Carlo methods (with discussion). J. R. Statist. Soc. Ser. B , 72 , 269–342.
- 2[2] Cappé , O., Ryden , T, & Moulines , É. (2005). Inference in Hidden Markov Models . Springer: New York.
- 3[3] Cliffe, K.A., Giles, M.B., Scheichl, R. and Teckentrup, A.L. (2011). Multilevel Monte Carlo methods and applications to elliptic PD Es with random coefficients. Comp. Visual. Sci. , 14 , 3–15.
- 4[4] Deligiannidis , G., Doucet , A. & Pitt , M. K. (2015). The correlated pseudo-marginal method. ar Xiv preprint ar Xiv:1511.0492 .
- 5[5] Del Moral , P. (2004). Feynman-Kac Formulae: Genealogical and Interacting Particle Systems with Applications . Springer: New York.
- 6[6] Del Moral, P., Jasra, A., Law, K. and Zhou, Y. (2016). Multilevel Sequential Monte Carlo Samplers for Normalizing Constants. ar Xiv preprint ar Xiv:1603.01136 .
- 7[7] Doucet , A. & Johansen , A. (2011). A tutorial on particle filtering and smoothing: Fifteen years later. In Handbook of Nonlinear Filtering (eds. D. Crisan & B. Rozovsky), Oxford University Press: Oxford.
- 8[8] Giles , M. B. (2008). Multilevel Monte Carlo path simulation. Op. Res. , 56 , 607-617.
