Efficient simulation of high dimensional Gaussian vectors
Nabil Kahale

TL;DR
This paper introduces a linear-storage Markov chain Monte Carlo method for efficiently simulating high-dimensional Gaussian vectors, significantly reducing computational costs compared to traditional Cholesky-based methods.
Contribution
The authors develop a novel MCMC algorithm that approximates Gaussian vectors with linear storage and provide theoretical bounds on its accuracy and efficiency.
Findings
Linear storage cost in dimension d
Faster than standard methods by nearly a factor of d
Provides bounds on Wasserstein distance
Abstract
We describe a Markov chain Monte Carlo method to approximately simulate a centered d-dimensional Gaussian vector X with given covariance matrix. The standard Monte Carlo method is based on the Cholesky decomposition, which takes cubic time and has quadratic storage cost in d. In contrast, the storage cost of our algorithm is linear in d. We give a bound on the quadractic Wasserstein distance between the distribution of our sample and the target distribution. Our method can be used to estimate the expectation of h(X), where h is a real-valued function of d variables. Under certain conditions, we show that the mean square error of our method is inversely proportional to its running time. We also prove that, under suitable conditions, our method is faster than the standard Monte Carlo method by a factor nearly proportional to d. A numerical example is given.
| Cholesky | Simulations | |
| decomposition | ||
| 10 | 0.000 | 0.007 |
| 0.001 | 0.19 | |
| 2.3 | 17.3 | |
| 2306 | 1702 |
| MCMC | MCMC | MCMC | |||
|---|---|---|---|---|---|
| Average | RMSE | comp. time | |||
| 2.38 | 0.119 | 2.7 | 4 | ||
| 3.02 | 0.069 | 2.7 | 26 | ||
| 3.57 | 0.049 | 2.7 | 173 |
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Efficient simulation of high dimensional Gaussian vectors
Nabil Kahalé ESCP Europe, Labex Refi and Big Data Research Center, 75011 Paris, France; e-mail: [email protected].
Abstract
We describe a Markov chain Monte Carlo method to approximately simulate a centered -dimensional Gaussian vector with given covariance matrix. The standard Monte Carlo method is based on the Cholesky decomposition, which takes cubic time and has quadratic storage cost in . In contrast, the storage cost of our algorithm is linear in . We give a bound on the quadractic Wasserstein distance between the distribution of our sample and the target distribution. Our method can be used to estimate the expectation of , where is a real-valued function of variables. Under certain conditions, we show that the mean square error of our method is inversely proportional to its running time. We also prove that, under suitable conditions, our method is faster than the standard Monte Carlo method by a factor nearly proportional to . A numerical example is given.
Keywords: Cholesky factorisation, Gaussian vectors, Markov chains, Monte Carlo simulation
1 Introduction
Monte Carlo simulation of Gaussian vectors is commonly used in a variety of fields such as weather prediction [gel2004], finance [Hull12, Chap. 13], and machine learning [russo2014learning, russoBVR2016]. This paper considers the problem of efficiently sampling a -dimensional Gaussian vector with a given mean and a given covariance matrix . Since any Gaussian random variable is an affine function of a standard Gaussian random variable, we assume throughout the paper that the components of are standard Gaussian random variables, and so the diagonal elements of are . Then can be simulated [glasserman2004Monte, Subsection 2.3.3] as follows. Let be a -dimensional vector of independent standard Gaussian random variables, and let be a matrix such that
[TABLE]
Then , i.e. is a -dimensional Gaussian vector with covariance matrix .
Such a matrix can be computed in time and space using Cholesky factorization or one of its variants [golub2013matrix, Subsections 4.2.5 and 4.2.8]. Once is calculated, can be computed in time. But, in several applications (see e.g. [gel2004]), is in the tens of thousands or more, and so the calculation of a Cholesky factorization on a standard computer may not be possible in practice, due to the high running time and/or storage cost. Alternative methods for generating Gaussian vectors have been developed in special cases. For instance, exact and efficient simulation of Gaussian processes on a regular grid in , , can be performed [WoodChan94, dietrich1997fast] using Fast Fourier transforms if the covariance matrix is stationary with respect to translations. Similar Fast Fourier Transform methods can be used for exact simulation of fractional Brownian surfaces on a regular mesh [stein2002fast]. Sparse Cholesky decomposition [rue2001fast] and iterative methods [aune2013iterative] have been proposed to generate efficiently Gaussian vectors when the precision matrix is sparse.
This paper develops a new Markov Chain Monte Carlo method for approximate generation of a Gaussian vector with correlation matrix . Our method is straightforward to implement and can be applied to any correlation matrix whose elements are known or easy to compute. It has a total storage cost of . At iteration , it produces a -dimensional vector whose distribution converges (according to the quadratic Wasserstein distance) to as goes to infinity. Assuming each element of can be computed in time, the running time of each iteration is . Our method can for instance be used to approximately simulate spatial Gaussian processes of various types such as Matérn, powered exponential, and spherical on any subset of size of with storage cost (background on spatial statistics can be found in [diggle2003introduction]). While FFT methods can simulate such processes on regular grids, certain applications (e.g. [gel2004]) require the simulation of spatial Gaussian processes on non-regular subsets of .
We now describe our method in more detail. Let ), , be a deterministic or a random sequence in , and let ), , be a sequence of independent standard Gaussian random variables, independent of , . Define the Markov chain of -dimensional column vectors , , as follows. Let and, for , let
[TABLE]
where is the -dimensional column vector whose -th coordinate is , and remaining coordinates are [math] (if and is a vector, is the scalar product of and ). Since is the -th column of , can be calculated from in time, with storage cost . The motivation behind (1.2) is explained in Section 2, where we also show that (1.2) is a variant of the hit-and-run algorithm. A general description of the hit-and-run algorithm can be found in [smith1984].
Section 3 shows that, if are independent random variables uniformly distributed over , then the quadratic Wasserstein distance between the distribution of and is at most . The quadratic Wasserstein distance between two probability distributions and over is defined as
[TABLE]
To put this result into perspective, denote by the distribution of , for . Then , by [dowson1982frechet, Eq. 16]. Thus, after steps, which can be performed in total time, the quadratic Wasserstein distance between the distribution of and is at most . Section 4 shows that, if is a real-valued function on satisfying certain conditions, and are independent random variables uniformly distributed over , then , where , is well approximated by . More precisely, Theorem 4.1 give explicit bounds on the mean square error
[TABLE]
of this estimator. For instance, if is -Lipschitz, Theorem 4.1 implies that . We give an example with where this bound is tight, up to a constant. To our knowledge, for general , no previous methods achieve a similar tradeoff between the running time and the Wasserstein distance, or between the running time and the mean square error, when . Section 5 assumes that is positive definite and shows that, under suitable conditions, as goes to infinity, where is a constant. It also gives an explicit geometric bound on the Wasserstein distance between the distribution of and , and an explicit bound on the mean square error of a related estimator of . Section 6 gives examples and a numerical simulation, and shows that, under certain conditions, the total time needed by our method to achieve a given standarized mean square error is . Concluding remarks are given in a closing section.
An introduction to MCMC methods can be found in [dellaportas2003introduction]. Our proof-techniques are based on coupling arguments. Conductance techniques can also be used to analyse mixing properties of Markov chains (see e.g. [sinclair1992improved, kahale1997semidefinite, diaconis2009markov]). Chernoff bounds for reversible discrete Markov chains in terms of the spectral gap have been established in [kahale1997large, gillman1998chernoff]. Previous theoretical results on the performance of hit-and-run algorithms have focused on their mixing properties (see [VempalaCousins2016, BRS1993hit] and references therein). For instance, after appropriate preprocessing, the hit-and-run algorithm for sampling from a convex body [lovasz1999hit] produces an approximately uniformly distributed sample point after steps. For general log-concave functions, after appropriate preprocessing [lovaszVempala2006], the hit-and-run algorithm mixes in steps. Note that, while the algorithms in [lovasz1999hit, lovaszVempala2006] require a pre-processing phase to make the target distribution ”well-rounded”, our method does not. When is positive definite, the Metropolis and Gibbs algorithms, and an algorithm for sampling from general log-concave functions using a Langevin stochastic differential equation [Moulines2016sampling], could be used to approximately sample from , but these algorithms require the calculation of . Standard algorithms for inverting a matrix take time and space, however, and so the pre-processing cost of these algorithms is as high as the Cholesky decomposition cost. Omitted proofs are in the appendix.
2 Motivation, notation and general properties
We motivate (1.2) by assuming that is positive definite, which implies the existence of a lower-triangular matrix satisfying (1.1) and such that and have the same first column [glasserman2004Monte, Subsection 2.3.3]. Let , where is the identity matrix, and a standard Gaussian random variable independent of . Set , and let and ). Note that is obtained from by replacing its first component with , and so . Hence . But, since (resp. ) is the first column of (resp. ), . Furthermore, since the first line of is , and so . Thus
[TABLE]
and so the RHS of (2.1) is a centered Gaussian vector with covariance matrix . (1.2) is obtained from (2.1) by replacing , , and with , , and , respectively.
We now describe a generic standard hit-and-run algorithm to approximately sample from a real-valued density function on . First, choose from a certain distribution. If we are currently at point , we first choose a random vector according to a certain distribution, and then set
[TABLE]
where is a random variable whose density at is proportional to .
If is positive definite, the density of at is , up to a multiplicative constant. In the standard hit and run algorithm, is a uniformly distributed unit vector. However, if we set and , it follows after some calculations that . Thus, we can choose , which implies by induction that .
If is a -dimensional vector, denote by the -norm of . For any matrix , the matrix is positive semi-definite. Let
[TABLE]
be the Frobenius norm of the matrix . If and are symmetric matrices, we say that if is positive semi-definite, and we denote by the largest eigenvalue of . If is a centered -dimensional random vector such that is finite, let denote the covariance matrix of .
For , let and . Note that . Thus is a unit vector and is a projection matrix, i.e. . Define the random sequence of -dimensional vectors , , as follows: and
[TABLE]
By rewriting (1.2) as
[TABLE]
it can be shown by induction that .
For , let , with , and let . Let be a -dimensional vector of independent standard Gaussian random variables which is independent of the sequence , . For , let
[TABLE]
Since for a positive semi-definite matrix , the following lemma implies that, if the sequence , , is deterministic, then is centered Gaussian and
[TABLE]
As a consequence, any entry of is upper-bounded, in absolute value, by .
Lemma 2.1**.**
If the sequence , , is deterministic, then, for , and are centered Gaussian vectors, , and
[TABLE]
Furthermore,
[TABLE]
, and
[TABLE]
Lemma 2.1 forms the basis for the proofs of our main results. Indeed, if goes to [math] as goes to infinity then, by (2.4), converges to as goes to infinity. Furthermore, if both and are sufficiently large then, by (2.3), and are nearly independent and, by (2.2), (resp. ) is close to (resp. ). Thus, and are nearly independent, as well, and so are and . These arguments are informal since we have not defined the terms ”nearly independent” and ”close”, but give intuition behind the proofs of Theorems 3.1 and 4.1.
Lemma 2.2 below generalizes some results of Lemma 2.1 when the sequence , , is random.
Lemma 2.2**.**
If the sequence , , is deterministic or random, the quadratic Wasserstein distance between the distribution of and is at most . Furthermore, , is centered, , and
[TABLE]
Proof.
By Lemma 2.1, conditioning on , . Thus, the unconditional distribution of is , and . On the other hand, by (2.5),
[TABLE]
and so, by the tower law,
[TABLE]
By (1.3), it follows that the quadratic Wasserstein distance between the distribution of and is at most . On the other hand, it follows from Lemma 2.1 that . By the tower law, we infer that is centered. Similarly, by Lemma 2.1,
[TABLE]
Hence, by the tower law, , and so . Once again, (2.6) follows from (2.5) by the tower law. ∎
3 Upper bound on the Wasserstein distance
We first show the following lemma.
Lemma 3.1**.**
If is a projection matrix and is a matrix, then .
Proof.
Let . Since , , and so . Since is positive semi-definite, so is , and so . Equivalently, . This concludes the proof. ∎
Under the conditions stated in Theorem 3.1 below, by an argument similar to that surrounding Lemma 2.1, it follows from (3.2) that each entry of the matrix is at most in absolute value.
Theorem 3.1**.**
Assume that , , are independent random variables uniformly distributed over . For , the quadratic Wasserstein distance between the distribution of and is at most ,
[TABLE]
and the sequence is decreasing. Furthermore, for , is centered, , and
[TABLE]
Proof.
For any non-negative integer ,
[TABLE]
Thus,
[TABLE]
and so
[TABLE]
Let be a unit -dimensional vector. For , set . Since is a projection and for , it follows that
[TABLE]
Hence
[TABLE]
As , we conclude that
[TABLE]
But
[TABLE]
and so, for any unit vector ,
[TABLE]
Thus, any diagonal entry of the matrix is at most . Hence,
[TABLE]
which implies (3.1). On the other hand, since , Lemma 3.1 shows that . Hence the sequence is decreasing. Since and have the same distribution, . Thus, the sequence is decreasing as well and so, by (3.1), . We conclude the proof using Lemma 2.2. ∎
4 Bounding the mean square error
We now define the class of -Lipschitz functions, with and .
Definition 4.1**.**
Let be a positive semi-definite matrix. We say that a real-valued Borel function of variables is -Lipschitz if
[TABLE]
for any centered Gaussian column vector with and , where and are -dimensional.
We say that a function is -Lipschitz if it is -Lipschitz. For instance, if a real-valued -Lipschitz function on , i.e. for , in , then is -Lipschitz for any positive semi-definite matrix . The following lemma gives an example of a -Lipschitz function in which is not -Lipschitz for any .
Lemma 4.1**.**
Let . Then is -Lipschitz for .
Let be a -Lipschitz function. Set and , where , and denote by the real-valued function on defined by . Note that if , since . In particular, by Lemma 2.2, for . In other words, , and so the average of , , where is a burn-in period, is an unbiased estimator of . The variance of this estimator equals , which we bound using Lemma 4.2 below. Choices for the parameters and will be given in the sequel.
Lemma 4.2**.**
Let , and be integers, with . Then
[TABLE]
For , let
[TABLE]
The second moments of and of can be bounded as follows.
Lemma 4.3**.**
Let and be integers, with . Then
[TABLE]
and
[TABLE]
Proof.
Assume first that the sequence , is deterministic. For ,
[TABLE]
The second equation follows from the relations and , and the last equation follows from (2.5). Thus, for any random sequence ,
[TABLE]
The first inequality in the lemma then follows by taking expectations and using the tower law. The second inequality follows from the first one and the Cauchy-Schwartz inequality. ∎
Combining Lemmas 4.2 and 4.3 yields the following.
Lemma 4.4**.**
Let , and be integers, with . If are independent random variables uniformly distributed over , then
[TABLE]
Proof.
Since , for any fixed ,
[TABLE]
and
[TABLE]
Hence, by Lemma 4.2,
[TABLE]
As
[TABLE]
it follows by Lemma 4.3 that
[TABLE]
Since , this concludes the proof. ∎
By applying Lemma 4.4 with and using (3.1), we get the following upper bound on .
Theorem 4.1**.**
Let be a -Lipschitz function on , with , where . If are independent random variables uniformly distributed over , then
[TABLE]
4.1 Tightness of MSE bound
We now give an example where the bound on the mean square error in Theorem 4.1 is optimal, up to a multiplicative constant. Let and let for . Thus is a -Lipschitz function on . By [forbes2011statistical, Sec. 11.3],
[TABLE]
which implies by induction that . Furthermore, it follows from (1.2) and by induction on that has at most non-zero components, and that the non-zero components of are independent standard Gaussian random variables. Thus , and so for . Thus,
[TABLE]
Hence, for ,
[TABLE]
and so
[TABLE]
Thus, the LHS of (4.3) is within an absolute constant from its RHS.
5 The positive semi-definite case
Let be a -Lipschitz function. Define , and as in Section 4. This section assumes that is positive definite and that, , , are independent random variables uniformly distributed over . Denote by the smallest eigenvalue of , and set
[TABLE]
The following lemma, combined with Lemma 2.2, implies a geometric bound on the Wasserstein distance between the distribution of and if is positive semi-definite.
Lemma 5.1**.**
For , .
Proof.
We use the same notation as in the proof of Theorem 3.1. Since the largest eigenvalue of is at most , it follows from (3.4) that
[TABLE]
On the other hand, (3.3) implies that
[TABLE]
and so . Hence, for any unit-vector ,
[TABLE]
Thus each diagonal element of is at most , and so . This concludes the proof. ∎
As noted before, is an unbiased estimator of . The following lemma implies that, if , the variance of this estimator is as goes to infinity.
Lemma 5.2**.**
As goes to infinity, converges to , where
[TABLE]
Theorem 5.1 below implies that, if , as goes to infinity.
Theorem 5.1**.**
As goes to infinity, converges to .
Proof.
Define and via 4.2, with . Let . By Lemma 4.3,
[TABLE]
The second inequality follows from Jensen’s inequality, and the last one from Lemma 5.1. Hence, by the Cauchy-Schwartz inequality,
[TABLE]
The last inequality follows from the relation (which is a consequence of Taylor’s formula with Lagrange remainder). On the other hand,
[TABLE]
But, by the Cauchy-Schwartz inequality and Lemma 5.2, for sufficiently large ,
[TABLE]
Using Lemma 5.2 once again, it follows that converges to as goes to infinity. This concludes the proof. ∎
We now show that the estimator of has an exponentially decreasing bias. We also give a bound on the mean square error of this estimator which, in certain cases, is smaller than the RHS of (4.3).
Theorem 5.2**.**
Set . For and even ,
[TABLE]
Furthermore, if ,
[TABLE]
Proof.
Set and define via (4.2). Using calculations similar to the proof of Theorem 5.1, it follows that
[TABLE]
Since for , it follows that
[TABLE]
This implies (5.1) since the ’s are centered.
We now prove (5.2). We first note that, by applying (4.1) with , , and independent, if follows after some calculations that
[TABLE]
and so
[TABLE]
On the other hand, by Lemma (5.1) and Jensen’s inequality,
[TABLE]
The second equation follows from the inequality . Thus,
[TABLE]
Thus, by applying Lemma 4.4, it follows that
[TABLE]
By (5.4), this implies (5.2). ∎
6 Examples
Let be a real-valued Borel function of variables that can be calculated at any point in time. Assume that is positive definite, and that both and exist and are finite, where . Denote by MCMC the algorithm that generates via (1.2), where the ’s are independent and identically distributed over , and estimates via
[TABLE]
where is a burn-in period. The standard Monte Carlo algorithm, referred to later as MC, first calculates a lower-triangular matrix satisfying (1.1) in time via the procedure described in [glasserman2004Monte, Subsection 2.3.3], then generates independent -dimensional vectors of independent standard Gaussian random variables , and estimates by taking the average of , . The variance of this estimator is .
6.1 Comparison of the MC and MCMC methods
The mean square error of the estimator of is defined as
[TABLE]
Given , samples of the MC algorithm are needed to ensure that (ignoring rounding issues). Calculating the Cholesky decomposition and , , takes
[TABLE]
time. On the other hand, for , if is -Lipschitz and , denote by the running time of the MCMC algorithm needed to ensure that , using burn-in period . If , by Theorem 4.1, after steps of the MCMC algorithm, . Thus,
[TABLE]
Thus, if there is a constant independent of such that (this is equivalent to saying that (5.3) is tight, up to a constant), then, for fixed ,
[TABLE]
Similarly, under the assumptions of Theorem 5.2,
[TABLE]
Hence, if there are positive constants and independent of such that and , then, for fixed ,
[TABLE]
Examples where (6.1) or (6.2) hold are given below.
6.2 A Basket option
Consider a set of stocks . For , denote by the price of at time . Assume that . A Basket call option with maturity and strike is a financial derivative that pays the amount at time . Under a standard pricing model [glasserman2004Monte, Subsection 3.2.3], the price of a basket option is , where is a centered Gaussian vector with covariance matrix given by for , and, for ,
[TABLE]
where is the risk-free rate, and is the volatility of . Assume that the ’s are bounded by a constant independent of . It follows from Lemma 6.1 below that is -Lipschitz, where as goes to infinity.
Lemma 6.1**.**
Let , where for . Then is -Lipschitz, where .
Thus, by Theorem 4.1, steps of the MCMC algorithm are sufficient to ensure that , and so . On the other hand, if (which is the case [Hull12, Sec. 25.14] if the volatilities and correlations are lower-bounded by a constant and , for instance), then . In practice, though, is quite small.
6.3 The multivariate normal function
Let , and . Set for , where if and only if for . The following lemma and the analysis in Subsection 6.1 show that, if there are positive constants and independent of such that , , and , then (6.2) holds for fixed
Lemma 6.2**.**
The function is -Lipschitz for any positive semi-definite matrix .
Proof.
Let . It is easy to see that if and, for , and , then . Hence
[TABLE]
By Chebyshev’s inequality, On the other hand, a simple calculation shows that, for , the density of any centered Gaussian random variable at is at most . Hence,
[TABLE]
and a similar relation holds for . Thus,
[TABLE]
Minimizing over implies that
[TABLE]
∎
6.4 The maximum function
For , let . Then is -Lipschitz. Let , where is a correlation matrix. Standard calculations show that for . Thus, it follows after some calculations that for and , and so . On the other hand, since
[TABLE]
where is the first coordinate of ,
[TABLE]
Since , we conclude that . Hence (6.2) holds for fixed if there is a positive constant independent of such that .
6.5 A numerical example
In [gel2004], the temperatures at a given set of locations in at a given future time are modelled as a Gaussian vector where is a known function of , and, for two different locations and ,
[TABLE]
where , and are positive constants, with . By simulating the vector , we can estimate the expected maximum temperature at these locations.
For simplicity of presentation, we assume thereafter that is centered for , and so is a standard Gaussian random variable. The correlation matrix of the Gaussian vector is given by
[TABLE]
for . Since the matrix is positive semi-definite [cressie2015statistics, Section 2.5], . We use the MC and MCMC algorithms to estimate Note that , where , and for . The analysis in Subsection 6.4 shows that (6.2) holds for fixed .
Our numerical simulations assume that , , , and that, for , the first (resp. second) coordinate of equals (resp. (, where . After scaling, the , , and parameters are close to those estimated in [gel2004]. Our experiments were performed on a desktop PC with an Intel Pentium 2.90 GHz processor and 4 Go of RAM, running Windows 7 Professional. The codes were written in the C++ programming language, and the compiler used was Microsoft Visual C++ 2013. Computing times are given in seconds. Table 1 gives computing times of the MC method. Running the Cholesky factorization for without external storage causes memory overflow. Extrapolating the results in Table 1 shows that the Cholesky factorization would take a few weeks for if enough internal memory were available. Table 2 compares the MC and MCMC methods for up to . For the tested parameters, the MCMC method is more efficient than the MC method, and its efficiency increases with . For , and , the MCMC average is , and is calculated in seconds. Lemma 6.2 shows that the MCMC method can also be used to calculate the probability that the maximum temperature over a set of points exceeds a certain level.
6.6 Other examples
Spatial Gaussian processes of various types such as Matérn, powered exponential, and spherical, restricted to any subset of size of , are centered Gaussian vectors with a covariance matrix whose entries can be calculated in time. For instance, the covariance matrix of a powered exponential process restricted to a subset of is given by [diggle2003introduction]
[TABLE]
where and . The techniques of our paper can be used to simulate the restriction of such processes to any finite subset of .
7 Conclusion
We have shown how to simulate a Markov chain , , such that the Wasserstein distance between the distribution of and is at most . It takes time to generate each step of the chain. Whereas the standard Monte Carlo simulation method has storage cost, the storage cost of our method is . Furthermore, by running the chain steps, our method can estimate , where is a centered Gaussian vector with covariance matrix and is a real-valued function of variables. Under certain conditions, we give an explicit upper bound on the mean square error of our estimate, and show that it is inversely proportional to the running time. We also prove that, in certain cases, the total time needed by our method to obtain a given standarized mean square error is time, whereas the standard Monte Carlo method takes time.
Appendix A Proof of Lemma 2.1
Recall first that if and are independent centered -dimensional random vectors such that and are finite, then , , and .
It can be shown by induction that is a linear combination of , and so is a centered Gaussian vector. Hence is also centered and Gaussian. Furthermore, is centered Gaussian since it is a linear combination of and of . Thus, and are independent. For ,
[TABLE]
Thus, since and are independent,
[TABLE]
It follows by induction that , and so . Thus, (2.3) holds when . Furthermore, since and are independent for , it follows from (A.1) that . It follows by induction on that for , Hence (2.3).
Since is a linear combination of , the vectors and are independent. Thus, as and , it follows from (2.2) that
[TABLE]
Hence (2.4), which implies that . On the other hand, by (2.2), and so,
[TABLE]
Hence,
[TABLE]
The second equation follows from the relation . Using (2.4), this implies (2.5). This concludes the proof. ∎
Appendix B Proof of Lemma 4.1
Let be a Gaussian vector in , with , and . We show the following, which immediately implies Lemma 4.1:
[TABLE]
Let
[TABLE]
Since for any centered Gaussian random variable ,
[TABLE]
The last equation follows from the inequalities and (which is a consequence of Taylor’s formula with Lagrange remainder) for any real number . On the other hand, since , and so
[TABLE]
Hence, as for any real number ,
[TABLE]
This implies (B.1).
Appendix C Proof of Lemma 4.2
We first prove the following lemma which implies that, under certain conditions, the covariances between the components of and can be used to bound the covariance between and .
Lemma C.1**.**
Let be a Gaussian column vector in , with and , where and are matrices, with and . Let and be two -Lipschitz functions on . Then
[TABLE]
Proof.
We first note that, if and , are independent, then
[TABLE]
Furthermore, if is a centered Gaussian vector in with and , and is a -Lipschitz function on , then
[TABLE]
The second equation follows from the fact that is a centered Gaussian vector with covariance matrix .
We now prove the lemma. Let , , , and be independent -dimensional Gaussian vectors such that , , and . Note that and exist since and are positive semi-definite. Since and is independent of , . Similarly, . Also, since , and are independent and centered,
[TABLE]
Thus, the covariance matrix of the Gaussian vector is \left(\begin{array}[]{cc}{I}&{A}{B}^{T}\\ {B}{A}^{T}&{I}\\ \end{array}\right). Hence the centered Gaussian vectors and have the same covariance matrix, and so they have the same distribution. Thus,
[TABLE]
The second equation follows by applying (C.1) to each of the pairs , , and . The fourth equation follows from (C.2) and the relations and .
Hence
[TABLE]
Replacing with concludes the proof.∎
We now prove Lemma 4.2. Assume first that the sequence , is deterministic. By Lemma 2.1, for , and
[TABLE]
Let and , with . Since is the product of projection matrices, it follows by induction on that for , and so . Similarly, . Since , it follows from Lemma C.1 that
[TABLE]
Thus,
[TABLE]
The third equation follows from observing that and are centered and have standard deviation . The last equation follows from the fact that each term (resp. ) occurs at most twice in the last line.
Assume now that the sequence , is random. By (C.4),
[TABLE]
We conclude the proof by taking expectations and using the tower law. ∎
Appendix D Proof of Lemma 5.2
Let and, for , let . Since , , is a time-homogeneous Markov Chain and , for . Hence
[TABLE]
and so
[TABLE]
On the other hand, by applying (C.3) with and , it follows that
[TABLE]
with . The second equation follows from Jensen’s inequality. But, since ,
[TABLE]
Furthermore, as , it follows from Theorem 3.1 that
[TABLE]
Since , we conclude that Hence, by Lemma 5.1, the series is absolutely convergent. Thus, the RHS of (D.1) converges to as goes to infinity. But since , and so . This concludes the proof.∎
Appendix E Proof of Lemma 6.1
Since the function is -Lipschitz with respect to , we can assume without loss of generality that . Let be a centered Gaussian vector with and , where and have dimension . Let (resp. ) be the -th component of (resp. ) and . It follows from Lemma 4.1 that
[TABLE]
On the other hand, by the Cauchy-Schwartz inequality,
[TABLE]
Taking expectations, it follows that
[TABLE]
as desired. ∎
Appendix F Estimating the mean square error
This section describes a numerical method to estimate the mean square error of the MCMC method for moderate values of . The method first calculates a matrix satisfying (1.1) using Cholesky factorization. An unbiased estimator of is then computed as follows. Define the Markov chain , , by setting and, for , let
[TABLE]
Thus satisfies the same recursion as . By rewriting (F.1) as
[TABLE]
and using calculations similar to those in the proof of Lemma 2.1, it can be shown by induction that . Thus,
[TABLE]
is an unbiased estimator of . Since equals the variance of plus its square bias,
[TABLE]
In Subsection 6.5, is estimated for using (F.2), each term in the RHS of (F.2) being calculated via independent simulations of and . For any , the same random variables and are used to calculate and via (1.2) and (F.1).
Appendix G Acknowledgments
This research has been presented at the Paris Bachelier Seminar, November 2016, and at the 2nd IMA Conference on the Mathematical Challenges of Big Data, London, December 2016. The author thanks Nicolas Chopin, Petros Dellaportas, Peter Glynn, Emmanuel Gobet, Benjamin Jourdain, and Didier Marteau for helpful conversations. This work was achieved through the Laboratory of Excellence on Financial Regulation (Labex ReFi) under the reference ANR-10-LABX-0095. It benefitted from a French government support managed by the National Research Agency (ANR).
The reference list from the paper itself. Each links out to its DOI / PubMed record.
- 1[1] \harvarditem [Aune et al.]Aune, Eidsvik and Pokern 2013 aune 2013 iterative Aune, E., Eidsvik, J. and Pokern, Y. \harvardyearleft 2013 \harvardyearright . Iterative numerical methods for sampling from high dimensional Gaussian distributions, Statistics and Computing 23 (4): 501–521.
- 2[2] \harvarditem [Bélisle et al.]Bélisle, Romeijn and Smith 1993 BRS 1993 hit Bélisle, C. J., Romeijn, H. E. and Smith, R. L. \harvardyearleft 1993 \harvardyearright . Hit-and-run algorithms for generating multivariate distributions, Mathematics of Operations Research 18 (2): 255–266.
- 3[3] \harvarditem Cousins and Vempala 2016 Vempala Cousins 2016 Cousins, B. and Vempala, S. \harvardyearleft 2016 \harvardyearright . A practical volume algorithm, Mathematical Programming Computation 8 (2): 133–160.
- 4[4] \harvarditem Cressie 2015 cressie 2015 statistics Cressie, N. \harvardyearleft 2015 \harvardyearright . Statistics for spatial data , John Wiley & Sons, New York.
- 5[5] \harvarditem Dellaportas and Roberts 2003 dellaportas 2003 introduction Dellaportas, P. and Roberts, G. O. \harvardyearleft 2003 \harvardyearright . An introduction to MCMC, Spatial statistics and computational methods , Springer New York, pp. 1–41.
- 6[6] \harvarditem Diaconis 2009 diaconis 2009 markov Diaconis, P. \harvardyearleft 2009 \harvardyearright . The Markov chain Monte Carlo revolution, Bulletin of the American Mathematical Society 46 (2): 179–205.
- 7[7] \harvarditem Dietrich and Newsam 1997 dietrich 1997 fast Dietrich, C. and Newsam, G. N. \harvardyearleft 1997 \harvardyearright . Fast and exact simulation of stationary Gaussian processes through circulant embedding of the covariance matrix, SIAM Journal on Scientific Computing 18 (4): 1088–1107.
- 8[8] \harvarditem [Diggle et al.]Diggle, Ribeiro Jr and Christensen 2003 diggle 2003 introduction Diggle, P. J., Ribeiro Jr, P. J. and Christensen, O. F. \harvardyearleft 2003 \harvardyearright . An introduction to model-based geostatistics, Spatial statistics and computational methods , Springer, New York, pp. 43–86.
