Donsker's theorem in Wasserstein-1 distance
L. Coutin (IMT), Laurent Decreusefond (INFRES, LTCI, DIG)

TL;DR
This paper establishes bounds on the Wasserstein-1 distance between a random walk and Brownian motion, providing new estimates and applications to convergence rates of local times.
Contribution
It introduces a novel estimate of the Lipschitz modulus of Stein's equation solution to analyze convergence in Wasserstein-1 distance.
Findings
Derived explicit bounds for Wasserstein-1 distance between random walk and Brownian motion
Provided a rate of convergence for the local time at zero of Brownian motion
Developed a new method based on Lipschitz estimates of Stein's equation
Abstract
We compute the Wasserstein-1 (or Kolmogorov-Rubinstein) distance between a random walk in $\mathbb{R}^d$ and the Brownian motion. The proof is based on a new estimate of the Lipschitz modulus of the solution of Stein's equation. As an application, we can evaluate the rate of convergence towards the local time at 0 of the Brownian motion.
L. Coutin
Institute of Mathematics
Université Toulouse 3
Toulouse, France
and
L. Decreusefond
LTCI, Télécom Paris, Institut polytechnique de Paris
Paris, France
Key words and phrases:
Donsker theorem, Malliavin calculus, Stein’s method, Wasserstein distance
1991 Mathematics Subject Classification:
60F15, 60H07, 60G15, 60G55
The first author is partially supported by ANR MESA
1. Motivations
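For a complete, separable metric space $(X, d_X)$, the topology of convergence in distribution is metrizable [8] by the so-called Kolmogorov-Rubinstein (K-R) or Wasserstein-1 distance:
$$\operatorname{dist}_{\mathrm{KR}}(\mu,\nu)=\sup_{F\in\operatorname{Lip}_1(X)}\left(\int_X F\,d\mu-\int_X F\,d\nu\right),\tag{1}$$
where
$$\operatorname{Lip}_1(X)=\bigl\{F:X\to\mathbb{R},\ |F(x)-F(y)|\le d_X(x,y)\ \text{for all }x,y\in X\bigr\}.$$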
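To make the Kolmogorov-Rubinstein (Wasserstein-1) distance concrete, here is a minimal sketch for the simplest case only: two empirical measures on the real line with the same number of atoms. In that case the distance equals the mean absolute difference of the sorted samples. The function name `wasserstein1_empirical` is hypothetical and not taken from the paper.

```python
import numpy as np

def wasserstein1_empirical(xs, ys):
    """Wasserstein-1 distance between two empirical measures on R.

    Both samples must have the same size; in dimension one the optimal
    coupling simply matches the sorted samples.
    """
    xs = np.sort(np.asarray(xs, dtype=float))
    ys = np.sort(np.asarray(ys, dtype=float))
    if xs.shape != ys.shape:
        raise ValueError("samples must have the same size")
    return float(np.mean(np.abs(xs - ys)))

# Shifting every atom by 0.5 moves the measure by exactly 0.5.
shifted = wasserstein1_empirical([0.0, 1.0, 2.0], [0.5, 1.5, 2.5])
# The distance ignores the order in which the atoms are listed.
same = wasserstein1_empirical([3.0, 1.0, 2.0], [1.0, 2.0, 3.0])
```

This one-dimensional shortcut does not extend to path spaces, which is where the Stein-type arguments of the paper are needed.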
The formulation (1) is well suited to evaluating distances by Stein’s method. In dimension one, there is no particular difficulty in evaluating the K-R distance to the Gaussian distribution. In higher finite dimension, it is only recently (see [9, 12, 15] and references therein) that improvements of the standard Stein’s method have been proposed to get the K-R distance to the Gaussian measure. The bottleneck is the estimate of the Lipschitz modulus of the second order derivative of the solution of Stein’s equation when the test function is only assumed to be Lipschitz continuous. Namely, for , for any , consider the function
[TABLE]
where is the standard Gaussian measure on . In dimension one, Stein’s equation reads as
[TABLE]
so that
[TABLE]
and the subsequent computations only require evaluating the Lipschitz modulus of . For , it is classical that is infinitely differentiable and that
[TABLE]
where is the -th Hermite polynomial. On the other hand, if is -times differentiable, we have
[TABLE]
According to (3), we get
[TABLE]
It is apparent that the Lipschitz modulus of simply depends on the Lipschitz modulus of . However, in higher dimension, Stein’s equation becomes
[TABLE]
whose solution is formally given by (2). The form of (5) entails that we need to estimate the Lipschitz modulus of , which requires using (3) for . Unfortunately, we have to observe that
[TABLE]
Hence, until the very recent papers [9, 15], the strategy was to assume that is Lipschitz, apply (4) once to compute the first derivative of and then apply (3) to this expression:
[TABLE]
This means that instead of computing the supremum in the right-hand side of (1) over Lipschitz functions, it is computed over functions whose first derivative is Lipschitz. This also defines a distance, which induces the same topology, but the accuracy of the bound is degraded.
In infinite dimension, a new problem arises, which is best explained by going back to the roots of Stein’s method in dimension one. Suppose that we want to estimate the K-R distance in the standard Central Limit Theorem. Let be a sequence of independent, identically distributed random variables with and . Let . The Stein-Dirichlet representation formula [6] states that
[TABLE]
where
[TABLE]
with obvious notations. Now,
[TABLE]
The trick, which amounts to an integration by parts for a Malliavin structure on independent random variables (see [7]), is to write
[TABLE]
in view of the independence of the random variables. Then, we use the fundamental theorem of calculus in this expression around the point :
[TABLE]
Since
[TABLE]
we get
[TABLE]
This formula confirms that the crux of the matter is now to estimate uniformly the Lipschitz modulus of . It also shows how we get the order of convergence. We have one occurrence of $n^{-1/2}$ in the definition of , which appears in the expression of . The same factor appears a second time when we proceed to the Taylor expansion, and it appears a third time when we plug (3) into (7). This means that we have a factor $n^{-3/2}$ which is summed up $n$ times, hence the rate of convergence, which is known to be $n^{-1/2}$.
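The order of convergence discussed above can be observed numerically in a toy case. The sketch below assumes Rademacher (plus or minus one) increments, a choice made here for illustration and not taken from the paper. It uses the one-dimensional identity that the K-R distance between two laws on the real line is the integral of the absolute difference of their distribution functions, evaluated by a plain Riemann sum. The function name and the grid parameters are hypothetical.

```python
import math
import numpy as np

# Standard Gaussian distribution function, vectorized over numpy arrays.
_std_normal_cdf = np.vectorize(
    lambda x: 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))
)

def kr_distance_rademacher_clt(n, step=2e-4, bound=8.0):
    """K-R distance between n^{-1/2} (sum of n Rademacher signs) and N(0, 1).

    The normalized sum takes the values (2k - n) / sqrt(n) with binomial
    weights; the distance is the L^1 distance between the two distribution
    functions, approximated on a regular grid of [-bound, bound).
    """
    k = np.arange(n + 1)
    atoms = (2.0 * k - n) / math.sqrt(n)
    pmf = np.array([math.comb(n, int(j)) / 2 ** n for j in k])
    cdf = np.concatenate(([0.0], np.cumsum(pmf)))
    grid = np.arange(-bound, bound, step)
    walk_cdf = cdf[np.searchsorted(atoms, grid, side="right")]
    return float(np.sum(np.abs(walk_cdf - _std_normal_cdf(grid))) * step)

d100 = kr_distance_rademacher_clt(100)
d400 = kr_distance_rademacher_clt(400)
ratio = d100 / d400  # multiplying n by 4 should roughly halve the distance
```

Multiplying the number of summands by four roughly halves the distance, which is consistent with a rate of order $n^{-1/2}$ for this particular distribution.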
Now, if we are interested in the Donsker theorem, the process whose limit we would like to assess is
[TABLE]
where
[TABLE]
For reasons that will be explained below, the analog of the second order derivatives will involve
[TABLE]
where is the Malliavin derivative, is the Cameron-Martin space
[TABLE]
and
[TABLE]
Recall that in the context of Malliavin calculus, this space is identified with its dual, which means that the dual of is not itself. The difficulty is then that we do not have a factor in the definition of and it is easily seen that , hence no multiplicative factor will pop up in (8). In [4], we bypassed this difficulty by assuming enough regularity of so that belongs to the dual of . Then, in the estimate of terms such as those appearing in (8), it is the -norm of which appears and it turns out that , hence the presence of a factor , which saves the proof.
The goal of this paper is to weaken the hypothesis on to be able to upper-bound the true K-R distance between the distribution of and the distribution of a Brownian motion, that is
[TABLE]
The space is a Banach space which we can choose arbitrarily, as long as it can be equipped with the structure of an abstract Wiener space and it contains the sample paths of and .
The main technical result of this article is Theorem 4.4, which gives a new estimate of the Lipschitz modulus of for . The main idea is to introduce a hierarchy of approximations. There is a first scale induced by the time discretization coming from the definition of . Then, we consider a coarser discretization onto which we project our approximations in order to benefit from the averaging effect of the ordinary CLT. It turns out that the optimal ratio is obtained when the mesh of the coarser subdivision is roughly the cube root of the mesh of the reference partition. Moreover, after [3] and [4], we are convinced that it is simpler and as efficient to stick to finite dimension as long as possible. To this end, we consider the affine interpolation of the Brownian motion as an intermediary process. The distance between the Brownian sample paths and their affine interpolation is well known. This reduces the problem to estimating the distance between and the affine interpolation of , a task which can be handled by Stein’s method. It turns out that the bottleneck is in fact the rate of convergence of the Brownian interpolation to the Brownian motion.
This paper is organized as follows. In Section 2, we show how to view fractional Sobolev spaces as Wiener spaces. In Section 3, we explain the line of thought we followed. The proofs are given in Section 4.
2. Preliminaries
2.1. Fractional Sobolev spaces
As in [5, 11], we consider the fractional Sobolev spaces defined for and as the closure of functions with respect to the norm
[TABLE]
For , is the completion of for the norm:
[TABLE]
They are known to be Banach spaces and to satisfy the Sobolev embeddings [1, 10]:
[TABLE]
and
[TABLE]
As a consequence, since is separable (see [2]), so is . We need to compute the norm of primitives of step functions.
Lemma 2.1.
Let and consider
[TABLE]
There exists such that for any , we have
[TABLE]
Proof.
Remark that for any ,
[TABLE]
The result then follows from the definition of the norm. ∎
We denote by the space of continuous (hence bounded) functions on equipped with the uniform norm.
2.2. Fractional spaces as Wiener spaces
Let
[TABLE]
In what follows, we always choose and in . Consider a sequence of independent, standard Gaussian random variables and let be a complete orthonormal basis of . Then, we know from [13] that
[TABLE]
where is a Brownian motion. We clearly have the diagram
[TABLE]
where is the embedding from into . The space is dense in since polynomials do belong to . Moreover, Eqn. (10) and the Parseval identity entail that for any ,
[TABLE]
We denote by the law of on . Then, the diagram (11) and the identity (12) mean that is a Wiener space.
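The series representation of the Brownian motion used above can be sketched numerically. The example assumes that the orthonormal basis is taken in $L^2([0,1])$ and picks the cosine basis $e_0 = 1$, $e_n(s) = \sqrt{2}\cos(n\pi s)$; this particular basis is a choice made here for illustration, not the paper's. The primitives of the basis functions are $t$ and $\sqrt{2}\sin(n\pi t)/(n\pi)$, and by the Parseval identity the variance of the full series at time $t$ is $t$. The function names are hypothetical.

```python
import numpy as np

def brownian_series(t, coeffs):
    """Truncated series coeffs[0] * t + sum_n coeffs[n] * sqrt(2) sin(n pi t) / (n pi)."""
    t = np.asarray(t, dtype=float)
    n = np.arange(1, len(coeffs))
    # Primitives of the cosine basis functions, one column per index n.
    primitives = np.sqrt(2.0) * np.sin(np.pi * np.outer(t, n)) / (np.pi * n)
    return coeffs[0] * t + primitives @ coeffs[1:]

def truncated_variance(t, terms):
    """Variance at time t of the series truncated after `terms` Gaussian coefficients."""
    n = np.arange(1, terms)
    return t ** 2 + float(np.sum(2.0 * np.sin(n * np.pi * t) ** 2 / (n * np.pi) ** 2))

rng = np.random.default_rng(0)
# One approximate Brownian path on a grid of [0, 1], built from 500 Gaussian coefficients.
path = brownian_series(np.linspace(0.0, 1.0, 201), rng.standard_normal(500))
# Parseval check: the variance at t = 1/2 should be close to 1/2.
var_half = truncated_variance(0.5, 2000)
```

The variance check is deterministic and only involves the primitives of the basis, which is the content of the Parseval argument.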
Definition 2.1 (Wiener integral).
The Wiener integral, denoted as , is the isometric extension of the map
[TABLE]
This means that if in ,
[TABLE]
Definition 2.2 (Ornstein-Uhlenbeck semi-group).
For any Lipschitz function on , for any ,
[TABLE]
where .
The dominated convergence theorem entails that is ergodic: For any , with probability ,
[TABLE]
Moreover, the invariance by rotation of Gaussian measures implies that
[TABLE]
Otherwise stated, the Gaussian measure on is the invariant and stationary measure of the semi-group . For details on the Malliavin gradient, we refer to [14, 17].
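A one-dimensional sketch may help fix ideas about the ergodicity and invariance properties just stated. It assumes the usual Mehler form of the Ornstein-Uhlenbeck semi-group, $P_tF(x)=\int F(e^{-t}x+\sqrt{1-e^{-2t}}\,y)\,d\mu(y)$, since the displayed definition is not reproduced here, and evaluates the Gaussian integral by Gauss-Hermite quadrature. The name `ou_semigroup` and the test function are hypothetical.

```python
import numpy as np
from numpy.polynomial.hermite_e import hermegauss

# Quadrature nodes and weights for the weight exp(-y^2 / 2),
# renormalized so that the weights form a probability vector.
NODES, WEIGHTS = hermegauss(40)
WEIGHTS = WEIGHTS / WEIGHTS.sum()

def ou_semigroup(f, t, x):
    """Mehler formula: P_t f(x) = E[f(exp(-t) x + sqrt(1 - exp(-2t)) Y)], Y ~ N(0, 1)."""
    beta = np.sqrt(1.0 - np.exp(-2.0 * t))
    return float(np.dot(WEIGHTS, f(np.exp(-t) * x + beta * NODES)))

def square(y):
    return y ** 2

# For f(y) = y^2, P_t f(x) = exp(-2t) x^2 + 1 - exp(-2t).
value = ou_semigroup(square, 0.3, 1.5)
closed_form = np.exp(-0.6) * 1.5 ** 2 + (1.0 - np.exp(-0.6))
# Ergodicity: for large t, P_t f(x) approaches the Gaussian mean of f, here 1.
long_time = ou_semigroup(square, 20.0, 1.5)
# Invariance: integrating P_t f against the Gaussian measure returns the mean of f.
invariant = float(sum(w * ou_semigroup(square, 0.3, x) for x, w in zip(NODES, WEIGHTS)))
```

The quadrature is exact for polynomial test functions of this degree, so the three checks agree with the closed forms up to rounding.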
Definition 2.3.
Let be a Banach space. A function is said to be cylindrical if it is of the form
[TABLE]
where for any , belongs to the Schwartz space on , are elements of and belong to . The set of such functions is denoted by .
For ,
[TABLE]
which is equivalent to saying
[TABLE]
It is proved in [16, Theorem 4.8] that
Theorem 2.2.
For , for any , for any
[TABLE]
where is a complete orthonormal basis of .
Note that a nontrivial part of this theorem is to prove that the terms are meaningful: that has values in instead of and that is trace-class. Actually, we only need a finite dimensional version of this identity, in which none of these difficulties appear.
3. Donsker’s theorem in
For , let , the regular subdivision of the interval . Let
[TABLE]
and for
[TABLE]
Consider
[TABLE]
where is a family of independent identically distributed, -valued, random variables. We denote by a random variable which has their common distribution. Moreover, we assume that and . Remark that is an orthonormal family in . Let
[TABLE]
For any , the map is the orthogonal projection from onto . Let , for , we write
[TABLE]
where
[TABLE]
where is the affine interpolation of the Brownian motion:
[TABLE]
The two terms and are of the same nature: we have to compare two processes which live on the same probability space. Since is Lipschitz, we can proceed by comparison of their sample paths. The term is different, as the two processes involved live on different probability spaces. It is for this term that Stein’s method will be used.
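The interpolated random walk that enters this decomposition can be sketched as follows. The example assumes real-valued increments (dimension one) and the normalization $n^{-1/2}$, with linear interpolation between the points $k/n$ of the regular subdivision; the paper works with vector-valued increments, and the function name `interpolated_walk` is hypothetical.

```python
import numpy as np

def interpolated_walk(increments, t):
    """Linear interpolation of the normalized random walk at times t in [0, 1].

    The walk equals n^{-1/2} (X_1 + ... + X_k) at time k / n and is affine
    on each sub-interval [k / n, (k + 1) / n].
    """
    increments = np.asarray(increments, dtype=float)
    n = len(increments)
    knots = np.arange(n + 1) / n
    values = np.concatenate(([0.0], np.cumsum(increments))) / np.sqrt(n)
    return np.interp(t, knots, values)

rng = np.random.default_rng(1)
signs = rng.choice([-1.0, 1.0], size=16)
# Values at the subdivision points are the normalized partial sums.
path_at_knots = interpolated_walk(signs, np.arange(17) / 16)
# Halfway through the first sub-interval the walk has covered half of the first step.
midpoint = interpolated_walk(signs, 0.5 / 16)
```

The same helper applied to Gaussian increments gives the affine interpolation of a Brownian motion sampled on the subdivision, which is the intermediary process of the decomposition.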
We know from [11] that
Theorem 3.1.
For any there exists such that
[TABLE]
Moreover, we have
Theorem 3.2.
Let Assume that . There exists a constant such that
[TABLE]
This upper bound is far from being optimal and it is likely that it could be improved to obtain a factor . However, in view of (15), it would bring no improvement to our final result.
Theorem 3.3.
Let Let belong to for some . Then, there exists such that for any ,
[TABLE]
The global upper-bound for (14) is proportional to
[TABLE]
View as a function of and note that this expression is minimal for . Plugging this into the previous expressions yields the main result of this paper:
Theorem 3.4.
Assume that . Then, there exists a constant such that
[TABLE]
As an application of the previous considerations, we obtain as a corollary an approximation theorem for the local time of the Brownian motion.
The reflected Brownian motion is defined as
[TABLE]
and the reflected linear interpolation of random walk is
[TABLE]
The process is an expression of the local time of the Brownian motion at 0. Note that the map is Lipschitz continuous from any into . One interest of our new result is that we can then apply the previous theorem in to and . We get
Corollary 3.5.
Assume that the hypotheses of Theorem 3.4 hold. There exists a constant such that
[TABLE]
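The corollary rests on the Lipschitz continuity of the map that sends a path to its local-time-type functional. Since the displayed definitions are not reproduced here, the sketch below assumes the Skorokhod-type map $f \mapsto \bigl(t \mapsto \sup_{s\le t}\max(-f(s),0)\bigr)$ and checks on a discretized example that it is 1-Lipschitz for the uniform norm. The function name and the perturbation are hypothetical.

```python
import numpy as np

def running_sup_of_negative(path):
    """Discrete analogue of t -> sup_{s <= t} max(-f(s), 0)."""
    negative_part = np.maximum(-np.asarray(path, dtype=float), 0.0)
    return np.maximum.accumulate(negative_part)

rng = np.random.default_rng(2)
# A random-walk path and a small deterministic perturbation of it.
f = np.cumsum(rng.standard_normal(1000)) / np.sqrt(1000)
g = f + 0.05 * np.sin(np.linspace(0.0, 20.0, 1000))

image_f = running_sup_of_negative(f)
image_g = running_sup_of_negative(g)
gap_in = float(np.max(np.abs(f - g)))
gap_out = float(np.max(np.abs(image_f - image_g)))  # never exceeds gap_in
```

Because the map is Lipschitz, composing it with a Lipschitz test function gives another Lipschitz test function, which is how a K-R bound on the paths transfers to the functional.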
4. Proofs
In what follows, denotes an insignificant constant which may vary from line to line. We borrow from the current usage in rough path theory the notation
[TABLE]
As a preparation to the proof of Theorem 3.2, we need the following lemma.
Lemma 4.1.
For all , there exists a constant such that for any sequence of independent, identically distributed random variables with and any sequence .
[TABLE]
where is the cardinality of the set .
Proof.
The Burkholder-Davis-Gundy inequality applied to the discrete martingale yields
[TABLE]
Using Jensen’s inequality, we obtain
[TABLE]
The proof is thus complete. ∎
Proof of Theorem 3.2.
Actually, we already proved in [4] that
[TABLE]
Assume that and belong to the same sub-interval: there exists such that
[TABLE]
Then we have (see (18))
[TABLE]
Using Lemma 4.1, there exists a constant such that
[TABLE]
Note that and there are at most terms such that is nonzero. Thus,
[TABLE]
as tends to infinity. Since ,
[TABLE]
For let and . We have
[TABLE]
Note that for all is the linear interpolation of along the subdivision ; hence, for , . Thus the middle term vanishes and we obtain
[TABLE]
From (20), we deduce that
[TABLE]
and the same holds for . We infer from (19), (20) and (22) that
[TABLE]
A straightforward computation shows that
[TABLE]
The result follows from (23) and (24). ∎
4.1. Stein’s method
We wish to estimate
[TABLE]
using Stein’s method. For the sake of simplicity, we set
[TABLE]
The Stein-Dirichlet representation formula [6] states that, for any ,
[TABLE]
where
[TABLE]
The following lemma is straightforward (see [4, Lemma 4.1]):
Lemma 4.2.
For any , there exists a constant such that for any sequence of independent, centered random vectors such that , for any , we have
[TABLE]
We now show that, as usual, the rate of convergence in Stein’s method is related to the Lipschitz modulus of the second order derivative of the solution of Stein’s equation. Namely, we have
Lemma 4.3.
For any , we have
[TABLE]
Proof of Lemma 4.3.
Let . Since the ’s are independent,
[TABLE]
according to the Taylor formula. Since , we have
[TABLE]
The result follows by difference. ∎
The main difficulty, and hence the main contribution of this paper, is to find an estimate of
[TABLE]
for any
Theorem 4.4.
There exists a constant such that for any , for any , for any ,
[TABLE]
Proof of Theorem 4.4.
We know from [4, 16] that we have the following representation: for any ,
[TABLE]
where
[TABLE]
and is an independent copy of . Since the map is linear with respect to its three arguments,
[TABLE]
Hence,
[TABLE]
From Lemma 4.7, we know that
[TABLE]
for , and the same holds for the other conditional expectation. Use the Cauchy-Schwarz inequality in (27) and take (28) into account to obtain
[TABLE]
since belongs to . Furthermore,
[TABLE]
We already know that
[TABLE]
and that at most two terms are nonzero. Moreover, according to Lemma 2.1,
[TABLE]
Thus,
[TABLE]
Plugging estimate (30) into estimate (29) yields estimate (25). ∎
According to (25) and Lemma 4.3, since the cardinality of is , we obtain the following theorem.
Theorem 4.5.
If belongs to , for any , there exists such that
[TABLE]
If we combine Lemma 4.2 and (31), we get
[TABLE]
Optimizing with respect to yields Theorem 3.3.
It remains to prove (28). For the sake of simplicity, we give the proof for . The general situation is similar but with more involved notations.
We recall that
[TABLE]
where
[TABLE]
Lemma 4.6.
The covariance matrix of the Gaussian vector is invertible and satisfies
[TABLE]
Proof.
Since the are orthogonal in , for any
[TABLE]
Since a sub-interval of intersects at most two sub-intervals of , the matrix is tridiagonal. Furthermore, we know that
[TABLE]
and for each , there are at least terms of this kind which are equal to . Hence,
[TABLE]
Since is tridiagonal, this implies that it is invertible. Moreover, let be the diagonal matrix extracted from . We have proved that
For , there is at most one term of the sum (34) which yields a nonzero scalar product, hence
[TABLE]
Set . The matrix has at most two nonzero entries and
[TABLE]
if . By iteration, we get for any ,
[TABLE]
Moreover,
[TABLE]
Thus,
[TABLE]
The proof is thus complete. ∎
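The iteration step in this proof reads like a Neumann-series argument: a tridiagonal matrix whose off-diagonal entries are small relative to its diagonal can be inverted by expanding around the diagonal part. The displayed formulas are not reproduced here, so the sketch below only illustrates that mechanism on a hypothetical matrix (unit diagonal, off-diagonal entries 0.2) that is not the covariance matrix of the lemma. The function name `neumann_inverse` is hypothetical as well.

```python
import numpy as np

def neumann_inverse(matrix, terms=80):
    """Approximate inverse of A = D + S via A^{-1} = sum_k (-D^{-1} S)^k D^{-1}.

    D is the diagonal part of A and S the off-diagonal part; the series
    converges when the rows of D^{-1} S have absolute sum less than 1.
    """
    diag_part = np.diag(np.diag(matrix))
    diag_inv = np.diag(1.0 / np.diag(matrix))
    ratio = -diag_inv @ (matrix - diag_part)
    total = np.zeros_like(matrix, dtype=float)
    power = np.eye(matrix.shape[0])
    for _ in range(terms):
        total += power          # accumulate (-D^{-1} S)^k
        power = power @ ratio
    return total @ diag_inv

size = 6
# Hypothetical tridiagonal matrix with a dominant diagonal.
gram = np.eye(size) + 0.2 * (np.eye(size, k=1) + np.eye(size, k=-1))
approx = neumann_inverse(gram)
residual = float(np.max(np.abs(approx @ gram - np.eye(size))))
```

In this toy case each row of $D^{-1}S$ has absolute sum at most 0.4, so the series converges geometrically and the entries of the inverse are controlled by those of $D^{-1}$, which is the kind of bound the lemma states.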
Lemma 4.7.
There exists a constant which depends only on the dimension such that for all with , for any
[TABLE]
Proof.
Using the framework of Gaussian vectors, for all
[TABLE]
For any , on the one hand
[TABLE]
and on the other hand,
[TABLE]
This means that
[TABLE]
In view of Lemma 4.6, this entails that
[TABLE]
Once again we invoke (35) and the fact that at most two of the terms are nonzero for a fixed , to deduce that
[TABLE]
Now, according to the very definition of the conditional expectation,
[TABLE]
Hence,
[TABLE]
according to (37). The constant has to be modified when . ∎
References
- [1] R. A. Adams and J. J. F. Fournier. Sobolev Spaces. Academic Press, June 2003.
- [2] H. Brézis. Analyse fonctionnelle. Masson edition, 1987.
- [3] L. Coutin and L. Decreusefond. Stein’s method for Brownian approximations. Communications on Stochastic Analysis, 7(3):349–372, September 2013.
- [4] L. Coutin and L. Decreusefond. Convergence rate in the rough Donsker theorem. arXiv:1707.01269 [math], July 2017.
- [5] L. Decreusefond. Stochastic integration with respect to Volterra processes. Annales de l’Institut Henri Poincaré (B) Probability and Statistics, 41(2):123–149, March 2005.
- [6] L. Decreusefond. The Stein-Dirichlet-Malliavin method. ESAIM: Proceedings, page 11, 2015.
- [7] L. Decreusefond and H. Halconruy. Malliavin and Dirichlet structures for independent random variables. Stochastic Processes and their Applications, August 2018.
- [8] R. M. Dudley. Real Analysis and Probability, volume 74 of Cambridge Studies in Advanced Mathematics. Cambridge University Press, Cambridge, 2002.
