Exact Simulation for Multivariate It\^o Diffusions
Jose Blanchet, Fan Zhang

TL;DR
This paper introduces the first generic exact simulation algorithm for multivariate Itô diffusions, overcoming the limitations of Lamperti transformations which are only applicable in one dimension.
Contribution
It develops a novel exact sampling method for multivariate diffusions by combining rough path theory and multilevel Monte Carlo techniques.
Findings
First generic exact simulation algorithm for multivariate diffusions
Successfully overcomes Lamperti transformation limitations in higher dimensions
Demonstrates practical applicability of combined rough path and Monte Carlo methods
Abstract
We provide the first generic exact simulation algorithm for multivariate diffusions. Current exact sampling algorithms for diffusions require the existence of a transformation which can be used to reduce the sampling problem to the case of a constant diffusion matrix and a drift which is the gradient of some function. Such transformation, called Lamperti transformation, can be applied in general only in one dimension. So, completely different ideas are required for exact sampling of generic multivariate diffusions. The development of these ideas is the main contribution of this paper. Our strategy combines techniques borrowed from the theory of rough paths, on one hand, and multilevel Monte Carlo on the other.
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Exact Simulation for Multivariate Itô Diffusions
Abstract
We provide the first generic exact simulation algorithm for multivariate diffusions. Current exact sampling algorithms for diffusions require the existence of a transformation which can be used to reduce the sampling problem to the case of a constant diffusion matrix and a drift which is the gradient of some function. Such transformation, called Lamperti transformation, can be applied in general only in one dimension. So, completely different ideas are required for the exact sampling of generic multivariate diffusions. The development of these ideas is the main contribution of this paper. Our strategy combines techniques borrowed from the theory of rough paths, on the one hand, and multilevel Monte Carlo on the other.
keywords:
Exact Simulation, Stochastic Differential Equation, Brownian Motion, Monte Carlo Method.
\authornames
J. BLANCHET AND F. ZHANG
\authorone
[Stanford University]Jose Blanchet \authortwo[Stanford University]Fan Zhang \addressoneHuang Engineering Center, 475 Via Ortega, Stanford, CA 94305, United States. \[email protected] \addresstwoHuang Engineering Center, 475 Via Ortega, Stanford, CA 94305, United States. \[email protected]
\ams
34K50,65C05,82B8097K60
1 Introduction
Consider a probability space and an Itô Stochastic Differential Equation (SDE)
[TABLE]
where is a -dimensional Brownian motion under , and and satisfy suitable regularity conditions. For instance, in order for (1) to have a strong solution, it is sufficient to assume that both and are uniformly Lipschitz.
Under additional regularity conditions on and , this paper provides the first Monte Carlo simulation algorithm which allows sampling any discrete skeleton exactly, without any bias.
The precise regularity conditions that we impose on and are stated in Section 3. In particular, it is sufficient for the validity of our Monte Carlo method to assume and to be three times continuously differentiable, both with Lipschitz continuous derivatives of order three. In addition, we must assume that is uniformly elliptic.
Exact simulation of SDEs has generated a substantial amount of interest in the applied probability and Monte Carlo simulation communities. The landmark paper of [5], introduced what has become the standard procedure for the design of generic exact simulation algorithms for diffusions. The authors in [5] propose a clever acceptance-rejection sampler which uses Brownian motion as a proposal distribution. The authors in [7] apply a localization technique which eliminates certain boundedness assumptions which are originally present in [5]; see also [4] for the use of retrospective simulation ideas to dispense with boundedness assumptions.
The fundamental assumption underlying the work of [5] and its extensions is that the underlying (target) process has a constant diffusion coefficient, i.e., for every . Beskos and Roberts [5] note that in the case , owing to Lamperti’s transformation, the constant diffusion coefficient assumption comes basically at no cost in generality.
Unfortunately, however, Lamperti’s transformation is only generally applicable in one dimension. In fact, [1] characterizes the multidimensional diffusions for which Lamperti’s transformation can be successfully applied and these models are very restrictive.
Moreover, even if Lamperti’s transformation is applicable in a multidimensional setting, another implicit assumption in the application of the Beskos and Roberts acceptance-rejection procedure is that the drift coefficient is the gradient of some function (i.e. for some ). This assumption, once again, comes at virtually no cost in generality in the one-dimensional setting, but it may be very restrictive in the multidimensional case.
Because of these limitations, a generic algorithm for exact simulation of multidimensional diffusions, even under the regularity conditions that we impose here, requires a completely different set of ideas.
The contribution in this paper is therefore not only the production of such a generic exact simulation algorithm, but also the development of the ideas that are behind its construction. In Section 2 we introduce an algorithm which assumes a constant diffusion coefficient, but removes the assumption of the drift coefficient being the gradient of some function. In Section 3, we eventually remove the requirement on a constant diffusion matrix and propose an algorithm applicable to general diffusions. The algorithms in Section 2 and Section 3 are different in nature. However, they share some common elements, such as the use of so-called Tolerance-Enforced Simulation techniques based on rough path estimates. Even though the algorithm in Section 3 is more general, we believe that there is significant value in developing the algorithm in Section 2 because of two reasons. The first one is pedagogical, the algorithm in Section 2 is easier to understand while building on a key idea, which involves localizing essential quantities within specific compact domains with probability one. The second reason is that the algorithm in Section 2, being simpler, may be subject to potential improvement methodologies to be pursued in future research.
Potential improvements are particularly interesting directions specially given that, unfortunately, the algorithms that we present have infinite expected termination time. We recognize that this issue should be resolved for the algorithms to be widely used in practice, and we discuss the elements which lead to infinite expected running time in Section 4. There are basically two types of elements that affect the running time of the algorithm in Section 3, one of them has to do with the types of issues that arise when trying to fully remove the bias in Tolerance-Enforced Simulation and related approximations, and the other issue has to do with the use of a density approximation coupled with Bernoulli factories. In contrast, the algorithm in Section 2 is only affected by removing the bias in Tolerance-Enforced Simulation type approximations. We must stress, however, that the present paper shows for the first time that it is possible to perform exact sampling of multidimensional diffusions in substantial generality and, in doing so, it provides a conceptual framework different from the prevailing use of Lamperti transformation, which is the only available generic approach for producing exact sampling of diffusions.
Now, despite the algorithm’s practical limitations, it is vital to recognize the advantages that unbiased samplers have over biased samplers in the context of a massive parallel computing environment, because it is straightforward to implement a parallel procedure to reduce the estimation error of an unbiased sampler.
Recently, there have been several unbiased estimation procedures which have been proposed for expectations of the form , assuming . For example, the work of [21] shows that if is twice continuously differentiable (with Lipschitz derivatives) and if there exists a discretization scheme which can be implemented with a strong convergence error of order 1, then it is possible to construct an unbiased estimator for with finite variance and finite expected termination time. The work of [12] shows that such a discretization scheme can be developed if and are sufficiently smooth under certain boundedness assumptions. The paper [13] also develops an unbiased estimator for using a regime-switching technique. Our work here is somewhat related to this line of research, but an important difference is that we actually generate exactly, while all of the existing algorithms which apply in multidimensional diffusion settings generate such that . So, for example, if is positive, one cannot guarantee that is positive using the type of samplers suggested in [21]. However, by sampling directly, one maintains the positivity of the estimator.
Another instance in which direct exact samplers are useful arises in the context of stochastic optimization. For instance, consider the case in which one is interested in optimizing a convex function of the form , where is differentiable. In this case, one can naturally construct an estimator such that using the results in [21] and optimize the mapping , which unfortunately will typically not be convex. So, having access to a direct procedure to sample in this setting is particularly convenient as convexity is preserved.
The rest of the paper is organized as follows. In Section 2, we consider the case of multidimensional diffusions with a constant diffusion coefficient and a Lipschitz continuous (suitably smooth) drift. The general case is discussed in Section 3, our development uses localization ideas which are introduced in Section 2, but also some basic estimates of the transition density of the underlying diffusion (e.g. Lipschitz continuity), these estimates are developed in Appendix A. As mentioned before we discuss the bottlenecks in the expected running time of the algorithm in Section 4.
2 Exact Simulation of SDEs with Identity Diffusion Coefficient
In case that the Lamperti’s transformation is applicable, the SDE of interest is reducible to another SDE whose diffusion matrix is the identity. As a result, it suffices to consider simulating the following SDE
[TABLE]
where is a -dimensional Brownian motion. In this section we concentrate on the identity diffusion case (2), but the development can be immediately extended to the case of a constant diffusion matrix. However, throughout this section we must impose the following assumptions.
{assumption}
The SDE (2) has a strong solution.
{assumption}
The drift coefficient is three times continuously differentiable.
Assumption 2 is the requirement of TES, the theoretical foundation of our algorithm, which we shall introduce later.
Let us introduce some notations first. For any set and , we use to denote the distance between and ; denotes the interior of ; denotes the boundary of ; denotes complementary of .
Consider a probability space endowed with a filtration , and supporting a -dimensional Brownian motion
[TABLE]
Let to be a -local martingale defined as
[TABLE]
where denotes the transpose of the column vector . Under Assumption 2, is a -martingale, see Corollary 3.5.16 of [15].
In this case we can define a probability measure through
[TABLE]
where denotes the indicator function of the set and is the expectation operator under .
Let
[TABLE]
be a -dimensional process defined by
[TABLE]
The following theorem provides the distribution of .
Theorem 2.1** (Girsanov Theorem).**
If Assumption 2 is satisfied, then the process is a -dimensional Brownian motion on probability space
Proof 2.2**.**
See, for instance, Theorem 3.5.1 of [15].
It is readily apparent from (4) that is a weak solution to SDE (2) on the probability space . The exact simulation problem becomes sampling under measure . Since follows a normal distribution under measure , we can attempt to use acceptance-rejection to sample . A direct application of acceptance-rejection may proceed by using the distribution of (which is simply normal distribution) as the proposal, which, if acceptance-rejection is applicable, would then be accepted with probability proportional to . However, there are two obstacles when trying to apply such a direct acceptance-rejection approach. First, the presence of the general stochastic integral appearing in the definition of makes the likelihood ratio difficult to directly compute. Second, a direct application of acceptance-rejection requires the likelihood ratio, , to be bounded, which is unfortunately violated.
In order to deal with the first obstacle, we note that it is really not necessary to accurately evaluate the likelihood ratio. In the standard procedure of acceptance-rejection, the likelihood ratio is only used for comparison with an independent uniform random variable. Thus, to address the first obstacle, we can approximate the likelihood ratio with a deterministic error bound, and keep refining until we can decide to either accept or reject the proposal. It turns out, as we shall see in Corollary 2.7, that the same approximation technique can actually be used to localize and also resolve the second obstacle. Then, we will sample the distribution of conditional on the localization of using acceptance-rejection, where the rejection scheme is suggested by Lemma 2.1.
The theoretical foundation for such approximation and refinement strategy is given by Tolerance-Enforced Simulation, which is presented in Theorem 2.3.
Theorem 2.3** (Tolerance-Enforced Simulation).**
Consider a probability space and the following SDE:
[TABLE]
where , and is a -dimensional Brownian motion. Suppose that is continuously differentiable and that is three times continuously differentiable. Then, given any deterministic , there is an explicit Monte Carlo procedure that allows us to simulate a piecewise constant process , such that
[TABLE]
with probability one. Furthermore, for any and , we can simulate conditional on .
Proof 2.4**.**
See Theorem 2.1, Theorem 2.2 and Section 2.1 of [6], where a detailed procedure of Tolerance-Enforced Simulation is also provided.
Remark 2.5**.**
Tolerance-Enforced Simulation is based on the Lévy-Ciesielski Construction of the driving Brownian motion up to a random level. Consequently, is obtained for free when we run TES in which a skeleton of the driving Brownian motion is simulated. In particular, for any and , we can simulate conditional on and .
As a straightforward consequence of Theorem 2.3, we develop a localization procedure of SDE in Corollary 2.7. Before moving forward to state the result, we define some notations that will be therein used.
Definition 2.6**.**
A family of (Borel measurable) sets is said to be a countable continuous partition for a -dimensional random vector , if and only if
The sets in are mutually disjoint, i.e. for ; 2. 2.
* is concentrated on , that is ;* 3. 3.
.
In addition, a function is defined such that if and only if .
Corollary 2.7**.**
Under the setting of Theorem 2.3, let be a countable continuous partition for , then there is an algorithm for simulating that terminates in finite time with probability one. In particular, for any set such that , there is an algorithm for simulating .
Proof 2.8**.**
Notice that , so holds almost surely. Recalling from Theorem 2.3 that a.s., which suggests that
[TABLE]
Thus, we pick and apply TES to simulate the approximation process . If
[TABLE]
then , which terminates the algorithm. Otherwise we keep refining the approximation of TES, by setting , until d\Big{(}Y_{\varepsilon}(1),G_{\Xi_{\mathcal{G}}(Y_{\varepsilon}(1))}^{c}\Big{)}>\varepsilon. The algorithm will ultimately terminate since
[TABLE]
The procedure for simulation of is just a particular case, by setting . The details of the algorithm are given in Algorithm 1.
The algorithm for simulating is performed in a two-stage fashion. At first stage, the likelihood ratio is localized with the help of Corollary 2.7. (The efficiency of the algorithm may be slightly improved if we localize and simultaneously at the first step, then applying acceptance-rejection based on localization. However, this does not solve the problem of the infinite expected running time.) Then, at second stage, is sampled conditional on the result of localization.
We now illustrate how to localize in detail. In order to write the dynamics of in standard form as in (5), we consider the SDE of under measure as follows,
[TABLE]
Let in the rest of this section. As (3) guarantees that is non-negative, it follows immediately that is a countable continuous partition for . Therefore, Algorithm 1 is directly applicable to sample using SDE (6). Without loss of generality, we assume in the rest of this section. It remains to sample conditional on under probability measure .
The following lemma provides an alternative expression of the conditional distribution of , which facilitates the simulation of conditional on localization.
Lemma 2.9**.**
Let independent of everything else under probability measure , then we have
[TABLE]
Proof 2.10**.**
Due to the definition of conditional probability,
[TABLE]
Recall that , we have
[TABLE]
Since on ,
[TABLE]
we can rewrite the expectation into a probability by introducing , namely,
[TABLE]
By substitution, it follows easily that
[TABLE]
It remains to prove that
[TABLE]
By a similar argument we can deduce that
[TABLE]
which ends the proof.
As a direct implication of Lemma 2.9, in order to obtain an example sample for under , given \Xi_{\mathcal{G}}\big{(}(L(1),X(1)\big{)}=i, we can simply simulate conditional on under probability measure . In order to do this sampling under , we can sample first and denote the value by . Then, observing that is the driving Brownian motion under the probability measure , Algorithm 1 is applied to the SDE
[TABLE]
to simulate the indicator function . In addition, according to Remark 2.5, when TES is employed in Algorithm 1, a sample of is also produced simultaneously. Thereafter, the value of is accepted if and only if ; otherwise we repeat the procedure in this paragraph, but we fix the parameter , because has already been sampled under the correct distribution . The output of the algorithm, once the value is finally accepted, follows the distribution of under without any bias.
We summarize the discussion in this section in the following theorem.
Theorem 2.11**.**
If Assumption 2 and 2 are satisfied, then there is an exact simulation algorithm for that terminates with probability one, see Algorithm 2.
3 Exact Simulation for General SDEs
In this section, we will develop an exact simulation algorithm for the SDE (1). We shall fix and the dependence of in some objects (such as the transition density of will be omitted).
We are still going to construct an exact simulation algorithm based on acceptance-rejection in this section. However, for SDEs with non-constant diffusion matrix, applying Girsanov’s theorem no longer provides a Brownian type proposal distribution for acceptance-rejection, so we will construct an acceptance-rejection algorithm based on the density of .
Throughout the rest of this section, we shall assume the following assumptions and conditions.
{assumption}
The drift coefficient is continuously differentiable, and the diffusion coefficient is three times continuously differentiable. Moreover, a strong solution to SDE (1) exists.
{condition}
The probability distribution of is absolutely continuous with respect to Lebesgue measure. In other words, has a density function denoted by with respect to the Lebesgue measure.
{condition}
For any relatively compact set , the density is Lipschitz continuous with Lipschitz constant , i.e.
[TABLE]
{condition}
For any relatively compact set , there exist such that
[TABLE]
As we have seen in the previous section, Assumption 3 is the necessary condition for the applicability of the TES result introduced in Theorem 2.3, which enables us to strongly approximate . Condition 3 will eventually be used to apply the acceptance-rejection technique using an absolutely continuous (with respect to the Lebesgue measure) proposal distribution. Conditions 3 and 3, as we shall see, will allow us to control the bound of the likelihood ratio when applying acceptance-rejection.
It is important to ensure that the constants and are explicitly computable in terms of and only, but we should also emphasize that we are not assuming that the density is known.
There are many ways in which the computability of and can be enforced. For instance, in Appendix A we discuss a set of assumptions involving classical estimators of the fundamental solutions of parabolic equations, which we review in order to compute and explicitly.
The standard use of the acceptance-rejection algorithm requires knowing the density , which seems hopeless for the general SDE problem that we study. An alternative approach is constructing a non-negative, bounded, and unbiased estimator of . While the density is unknown, an unbiased estimator of , denoted by in Section 3.1, can be constructed by means of a local approximation of the density. However, the unbiased estimator may be negative, so it cannot be directly used in acceptance-rejection. To remedy this problem, in Lemma 3.5 we construct a non-negative and unbiased estimator of using a random walk and a suitable Bernoulli factory. However, the estimator is unbounded, so we propose to sample enough information about the SDEs (the ancillary variable ), such that the estimator conditional on the sampled information is locally bounded. Consequently, conditional on the localization of and the additional information , the estimator is bounded and non-negative.
We now state the outline of our exact simulation algorithm. First of all, we apply a localization technique on the countable continuous partition defined as
[TABLE]
Since has countable components, we can enumerate and rewrite it in terms of , where is a unit hypercube. Obviously, Algorithm 1 is applicable to with respect to the countable continuous partition .
Then, we introduce an ancillary random variable coupled with , and simulate . As we shall see, the random variable will play an important role after we introduce a suitable family of random variables whose expectations converge to the density of at a given point. In the end, we will be able to sample conditional on and , using the estimator , which is bounded and non-negative for and .
The following theorem provides the main contribution of this paper.
Theorem 3.1**.**
If Assumption 3 and Condition 3-3 are satisfied, then there is an algorithm for exactly simulating which terminates in finite time with probability one; see Algorithm 3.
The rest of this section is organized as follows. Section 3.1 applies a technique borrowed from Multilevel Monte Carlo to construct the unbiased density estimator and the ancillary random variable . Section 3.2 explains how to sample using acceptance-rejection and a suitable Bernoulli factory [19, 17, 14] conditional on localization. Section 3.3 demonstrates how to sample conditional on , once again using a suitable localization.
3.1 A Multilevel Representation of the Density
In this section, we borrow an idea from Multilevel Monte Carlo [11] to construct an unbiased estimator for , and we also introduce the ancillary random variable .
In order to illustrate our idea, first we need to introduce some notations. For any in , we define as a sequence of open balls centered at , whose radii , form a decreasing sequence and as .
Let denote the volume of a -dimensional ball with radius (i.e. the volume of ). We define to be the average density over the ball , namely,
[TABLE]
Let denote a nonnegative unbiased estimator for , i.e.
[TABLE]
for , where is defined using the same realization for all and . We define and for notational simplicity. It follows immediately that for
The density is first decomposed into an infinite telescoping sums,
[TABLE]
Then, we introduce a randomization technique inspired by Randomized Multilevel Monte Carlo (see [18, 22]). The density can be decomposed as expectation of an infinite sum of estimators, which is truncated to a finite but random level so that the expectation is invariant. The idea is to introduce an integer-valued random variable , which is independent of everything else. Then can be expressed as
[TABLE]
where the third equality follows from Fubini’s theorem, which can be justified if
[TABLE]
We will show in the sequel. Moreover, by the tower property we have
[TABLE]
Therefore, if we define
[TABLE]
it follows easily that
[TABLE]
We now are interested in obtaining bounds for and its expectation for . To this end we first define as -neighborhood of set , which consists of all points that at a distance less than from , i.e.
[TABLE]
It is not hard to observe that is a relatively compact set, to which Conditions 3-3 are applicable. In the following lemma, we will demonstrate that under such conditions, one can judiciously pick the distribution of and the radii in order to establish explicit bounds for and , respectively.
Lemma 3.2**.**
Suppose that and Condition 3-3 are satisfied. If we pick
[TABLE]
for , then we have
[TABLE]
and
[TABLE]
Proof 3.3**.**
Let us construct the lower bound of first. By triangle inequality,
[TABLE]
On the one hand, from the definition of and Condition 3, we can conclude that
[TABLE]
On the other hand, using the triangle inequality
[TABLE]
Then, from Condition 3 we have
[TABLE]
Recall that is the average density over the ball and for . It then follows form the (11) that
[TABLE]
Consequently,
[TABLE]
Combining the above inequality with yields
[TABLE]
Similarly, observing that , for the upper bound we have
[TABLE]
We can also derive an upper bound of :
[TABLE]
which ends the proof.
In the rest of this paper, we will adopt the value of and the distribution of in Lemma 3.2.
Even though we have constructed an unbiased estimator for , acceptance-rejection is not applicable because may be negative and unbounded. In order to apply acceptance-rejection, we need a nonnegative unbiased estimator for , which will be constructed in Lemma 3.5. The idea of such construction borrows an idea from [8]. Let be i.i.d. copies of . We then define
[TABLE]
and
[TABLE]
By Wald’s first equation,
[TABLE]
Note that , but now we have an additional contribution , which can be interpreted as a probability. In order to sample a Bernoulli random variable with such probability, we will need the following result which we refer to as the Bernoulli Factory.
Theorem 3.4** (Bernoulli factory [19, 14]).**
Assume that and are two known constants and that we have an oracle that outputs i.i.d. Bernoullies with parameter . Then, there is an algorithm which takes the output of the oracle and produces a Bernoulli random variable with parameter . Moreover, if is the number of Bernoulli random variables required to output Bernoulli then .
We now can explain how to construct such that .
Lemma 3.5**.**
There exists a family of random variables , such that the following properties hold:
. 2. 2.
. 3. 3.
Given and , there is an algorithm for simulating .
Proof 3.6**.**
Let be a Bernoulli random variable with parameter , and independent of everything else. It follows that
[TABLE]
We write . Property 1 follows from the facts that , and that . Property 2 is justified directly by equation (12). To show that can be simulated, we just need to provide an algorithm for simulating .
Recall that , we have
[TABLE]
Consider Wald’s second equation
[TABLE]
which implies
[TABLE]
Now we shall proceed to simulate the random variable . Consider a random variable with distribution
[TABLE]
Since is the desired Bernoulli random variable with parameter , it then suffices to simulate . Towards this end, we apply acceptance-rejection again using another random variable as proposal, whose distribution is
[TABLE]
Since is nonnegative, Markov’s inequality asserts that
[TABLE]
where the last inequality follows from (13). Also, from the definition of we know that . Consequently, the likelihood ratio between and is given by
[TABLE]
From the above inequality we see that the likelihood ratio is bounded, so the acceptance-rejection procedure is applicable.
Conditional on the proposal , we then introduce a new Bernoulli random variable to decide whether or not the proposal is accepted as . The distribution of is defined as
[TABLE]
Hence it follows from (14) that
[TABLE]
Observe that the indicator is simulable, but its distribution is not explicitly accessible, so it is natural to sample via Bernoulli factory introduced in Theorem 3.4. Due to (15), the function involved in Bernoulli factory is a linear function as following
[TABLE]
We summarize the procedure of simulating in Algorithm 4.
We now introduce an ancillary random variable coupled with in the following way,
[TABLE]
Assuming can be simulated, can be easily simulated by acceptance-rejection as well due to the convenient density representation given by (18). The algorithm for sampling will be explained in the next section.
3.2 Conditional Sampling of
In this section we will focus on the procedure for simulating conditional on .
First we derive from equation (16) the probability mass function of , namely
[TABLE]
Due to the upper bound of given by Lemma 3.2, we have the following inequality
[TABLE]
Simulation of can be achieved by acceptance-rejection. Consider a Bernoulli random variable defined as
[TABLE]
Then the outline of the acceptance-rejection algorithm for simulating would be:
Step 1
Sample from random variable .
Step 2
Sample from uniform distribution .
Step 3
Simulate . If , go to Step
- Otherwise accept as a sample of .
The only difficult step in the above procedure is Step 3, namely, sample , which we will discuss now.
Lemma 3.2 implies that . Let , which is independent of everything else, then is a Bernoulli random variable with parameter . Due to (17), Bernoulli factory given in Theorem 3.4 is applicable to simulate , using as input and using the function
[TABLE]
To conclude, we synthesize the complete steps for simulating in the following algorithm.
3.3 Sampling of
In this section, we will focus on sampling .
Without loss of generality, let us assume throughout the the rest of this section. According to equation (16) and Lemma 3.5, the conditional distribution of can be written as
[TABLE]
where is a constant independent of . Once again, as we shall see can be be simulated by acceptance-rejection. We use the uniform distribution as the proposal distribution, and accept the proposal with probability , so we can accept if and only if , where is independent of everything else. The output of the acceptance-rejection follows the desired distribution. The explicit procedure for simulating is given in the following algorithm.
4 Conclusion
The main contribution of this paper is the construction of the first generic exact simulation algorithm for multidimensional diffusions. The algorithm extensively uses several localization ideas and the application of TES techniques. But it also combines ideas from multilevel Monte Carlo in order to construct a sequence of random variables which ultimately provides an unbiased estimator for the underlying transition density.
Although the overall construction can be implemented with a finite termination time almost surely, the expected running time is infinite. Thus, the contribution of the paper is of theoretical nature, showing that it is possible to perform exact sampling of multivariate diffusions without applying Lamperti’s transformation. However, more research is needed in order to investigate if the algorithm can be modified to be implemented in finite expected time, perhaps under additional assumptions. Alternatively, perhaps some controlled bias can be introduced while preserving features such as positivity and convexity in the applications discussed at the end of the Introduction. To this end, we conclude with a discussion of the elements in the algorithm which are behind the infinite expected termination time.
There are three basic problems that cause the algorithm to have infinite expected termination time. Two of them can be appreciated already from the constant diffusion discussion and involves the use of the localization techniques that we have introduced. The third problem has to do with the application of the Bernoulli factory.
Problem 1: The first problem arises when trying to sample a Bernoulli of the form . Given , sampling such that takes computational cost for any . So, if is a unit hypercube in dimensions, using the density estimates for , we obtain
[TABLE]
for some . Therefore, if is the computational cost required to sample we have that for some
[TABLE]
Therefore,
[TABLE]
which yields that .
Problem 2: The second problem arises in the acceptance-rejection step applied in Lemma 2.9, which requires sampling under conditional on . Directly sampling from this conditional law might be inefficient if is large. However, this problem can be mitigated using rare-event simulation techniques, which might be available in the presence of additional structure on the drift because under , follows a Brownian motion.
Problem 3: Arises because, as indicated in Theorem 3.4, the computational complexity of the Bernoulli factory of a linear function of the form scales in order . In our case, we are able to select and we invoke the Bernoulli factory in Algorithms 4 and 5. In Algorithm 4, , given and . In Algorithm 5, , given . Although the bound which is used to define in Lemma 3.5 is far from optimal, in its current form, , we have that .
\acks
We gratefully acknowledge support from the following NSF grants 1915967, 1820942, 1838676 as well as AFOSR.
Appendix A Transition Density Estimates
In the appendix, we will discuss some assumptions which are sufficient for the applicability of Conditions 3, 3, and 3. In addition, we also give explicit procedures for computing the constants which appears in such Conditions.
Consider a matrix valued function defined as
[TABLE]
{assumption}
Every component of and are three times continuously differentiable. Moreover, there exist a constant such that and for .
{assumption}
There exist constants , such that for all and , we have
[TABLE]
Under Assumption A and A, it is proved in [10] that the SDE (1) possesses a transition density denoted by , which satisfies
[TABLE]
for . Therefore, Condition 3 is proved given assumptions A and A.
In the following Proposition, we will establish Condition 3 via Kolmogorov forward equation.
Proposition A.1**.**
Suppose Assumptions A and A are satisfied, then for any relatively compact set , the density is Lipschitz continuous with Lipschitz constant , i.e.
[TABLE]
Furthermore, can be computed by Algorithm 7.
Proof A.2**.**
Our methodology is closely related to parametrix method introduced in [10]. Following the same scheme, we focus on explicit computation of the constants.
Under Assumptions A and A, is a solution of Kolmogorov forward equation, namely,
[TABLE]
We shall rewrite equation (19) into its non-divergence form as
[TABLE]
where
[TABLE]
and is a uniform parabolic operator. By Assumption A, it follows that
[TABLE]
We denote by the inverse matrix of , and define
[TABLE]
Assumption A implies that for all
[TABLE]
Following the idea of the parametrix method, we also define a partial differential equations with constant coefficients, namely,
[TABLE]
The fundamental solution of function is given by
[TABLE]
where
[TABLE]
[TABLE]
According to Theorem 1.3 and Theorem 1.10 in [10], can be represented by the parametrix method as
[TABLE]
where
[TABLE]
for . Furthermore, the partial derivatives of the fundamental solution admit the following expression,
[TABLE]
Let us pick , then we can derive a bound for as
[TABLE]
For the bound of , note that
[TABLE]
and that
[TABLE]
Combining the definition of and the above two equations imply that
[TABLE]
Applying the inequalities
[TABLE]
and
[TABLE]
we obtain
[TABLE]
by setting
[TABLE]
Similarly, we can derive a bound for and . For ,
[TABLE]
where
[TABLE]
By definition of we can observe that
[TABLE]
Suppose in the sequel. By considering the upper bounds of partial derivatives of , as well as (20) and Assumption A, we obtain
[TABLE]
where
[TABLE]
Now, in order to find a bound for , we need to introduce a technical lemma.
Lemma A.3** (Lemma 1.3 of [10]).**
If and are two constants in , then
[TABLE]
where Beta is Beta function.
Due to (22) and Lemma A.3, we can derive
[TABLE]
where
[TABLE]
By induction we can show that, for any positive integer ,
[TABLE]
It turns out that
[TABLE]
where
[TABLE]
Recalling equation (21), we can apply Lemma A.3 again and conclude that
[TABLE]
where
[TABLE]
Therefore, we obtain an upper bound for , by considering
[TABLE]
Therefore, for all we have
[TABLE]
where
[TABLE]
Next, we will propose a computational procedure for lower bounds of transition density. There is a substantial amount of literature that studies lower bounds for the transition density of diffusions, through analytical approaches or probabilistic approaches. For instance, Aronson [2] develops estimates of lower bounds of fundamental solutions of second order parabolic PDEs in divergence form. Using Malliavin calculus, Kusuoka and Stroock [16] derived a lower bound for the transition density of uniformly elliptic diffusions. Bally [3] generalized the idea of [16] to locally elliptic diffusions. We follow the approach suggested by Sheu [23] and review it in order to find explicit expressions to obtain a computable lower bound.
In order to keep our paper self-contained, first we need to introduce some notations used later.
Let be the generator of Komolgorov backward equation:
[TABLE]
The transition density as a function of coincides with the fundamental solution of Komolgorov backward equation:
[TABLE]
Throughout the rest of this section, we suppose that Assumptions A and A are in force.
Recall that is the inverse matrix of , and define
[TABLE]
For a fixed , we define
[TABLE]
and
[TABLE]
The continuity of the density implies
[TABLE]
For simplicity, we also define the logarithmic transform of and as
[TABLE]
To prepare the analysis, which is based on stochastic control, we introduce the space of control functions by . The class is defined as a family of measurable functions such that the SDE
[TABLE]
has a weak solution that satisfies
[TABLE]
Now we state a lemma that is crucial for proving the main result of this section.
Lemma A.4**.**
Recall the definition of and from previous paragraph, then we have
[TABLE]
Together with (23), we see that
[TABLE]
Proof A.5**.**
See [9].
Theorem A.6**.**
Suppose Assumptions A and A are satisfied. Then, for any relatively compact set , the density has a uniform lower bound in , i.e.
[TABLE]
Furthermore, can be computed by Algorithm 8.
Proof A.7**.**
Finding a lower bound of density is equivalent to finding an upper bound for . Towards this end, it suffices to find an upper bound for that is uniform in . We shall define as a linear function, such that . Write
[TABLE]
It is not hard to see that . Therefore,
[TABLE]
according to lemma A.4. Notice that
[TABLE]
It follows that
[TABLE]
and
[TABLE]
We now apply a Taylor expansion of around , where denotes the derivative of . For notational simplicity, we define
[TABLE]
We also define
[TABLE]
*and similarly for , , and . The Taylor expansion with remainders of third order is given as following *
[TABLE]
where
[TABLE]
Now we integrate all the above terms from [math] to with respect to variable , then take expectations, and analyze the upper bounds of the result term by term.
** Zeroth Order Term: Notice that is in quadratic form, with matrix , so**
[TABLE]
** First Order Terms: We treat first order term first. Noting that is a martingale due to (26), the first order terms**
[TABLE]
** Second Order Terms: We then treat the second order terms. As ,**
[TABLE]
Writing
[TABLE]
and noticing that is symmetric, we see that
[TABLE]
Assumption A implies the Lipschitz continuity of , which gives,
[TABLE]
Due to (27) and Jensen’s inequality,
[TABLE]
Observe that , so we have
[TABLE]
By the Lipschitz continuity of , it follows that
[TABLE]
Therefore,
[TABLE]
Combining (28), (29) and (30) yields
[TABLE]
By the chain rule and Assumption A, we obtain
[TABLE]
where are defined as
[TABLE]
Taking (27) into consideration, we obtain,
[TABLE]
** Third Order Terms: We proceed to analyze the third order remainder terms. Let us consider first. Notice that**
[TABLE]
thus,
[TABLE]
Then, by Burkholder-Davis-Gundy inequality,
[TABLE]
where is the explicit constant in the Burkholder-Davis-Gundy inequality. We can pick (See Proposition 4.4.3 of [20].). To summarize, we obtain
[TABLE]
Next, we consider the other two remainders . Observe that
[TABLE]
and
[TABLE]
Thus, by a similar argument, we can also derive
[TABLE]
and
[TABLE]
where and are defined as
[TABLE]
[TABLE]
Finally, let us consider . Since
[TABLE]
It follows that
[TABLE]
To conclude, let us summarize all the intermediate results, and substitute them into (25), we have
[TABLE]
where is defined as
[TABLE]
Therefore, if we pick , it follows that
[TABLE]
which ends the proof.
The reference list from the paper itself. Each links out to its DOI / PubMed record.
- 1[1] Ait-Sahalia, Y. (2008). Closed-form likelihood expansions for multivariate diffusions. The Annals of Statistics 36, 906–937.
- 2[2] Aronson, D. (1967). Bounds for the fundamental solution of a parabolic equation. Bulletin of the American Mathematical Society 73, 890–896.
- 3[3] Bally, V. (2006). Lower bounds for the density of locally elliptic itô processes. The Annals of Probability 34, 2406–2440.
- 4[4] Beskos, A., Papaspiliopoulos, O. and Roberts, G. O. (2006). Retrospective exact simulation of diffusion sample paths with applications. Bernoulli 1077–1098.
- 5[5] Beskos, A. and Roberts, G. O. (2005). Exact simulation of diffusions. The Annals of Applied Probability 15, 2422–2444.
- 6[6] Blanchet, J., Chen, X., Dong, J. et al. (2017). ϵ italic-ϵ \epsilon - strong simulation for multidimensional stochastic differential equations via rough path analysis. The Annals of Applied Probability 27, 275–336.
- 7[7] Chen, N. and Huang, Z. (2013). Localization and exact simulation of brownian motion-driven stochastic differential equations. Mathematics of Operations Research 38, 591–616.
- 8[8] Fearnhead, P., Papaspiliopoulos, O., Roberts, G. O. and Stuart, A. (2010). Random-weight particle filtering of continuous time processes. Journal of the Royal Statistical Society: Series B (Statistical Methodology) 72, 497–512.
