Donsker's theorem in Wasserstein-1 distance

L. Coutin (IMT); Laurent Decreusefond (INFRES; LTCI; DIG)

arXiv:1904.07045·math.PR·April 29, 2025


TL;DR

This paper establishes bounds on the Wasserstein-1 distance between a random walk and Brownian motion, providing new estimates and applications to convergence rates of local times.

Contribution

It introduces a novel estimate of the Lipschitz modulus of the solution of Stein's equation, used to analyze convergence in Wasserstein-1 distance.

Findings

1. Derived explicit bounds for the Wasserstein-1 distance between a random walk and Brownian motion.
2. Provided a rate of convergence for the local time at zero of Brownian motion.
3. Developed a new method based on Lipschitz estimates for the solution of Stein's equation.

Abstract

We compute the Wasserstein-1 (or Kantorovich-Rubinstein) distance between a random walk in $\mathbf{R}^{d}$ and Brownian motion. The proof is based on a new estimate of the Lipschitz modulus of the solution of Stein's equation. As an application, we can evaluate the rate of convergence towards the local time at 0 of Brownian motion.

Equations

\operatorname{dist}_{KR}(\mu,\nu)=\sup_{F\in\operatorname{Lip}_1(X)}\left(\int_X F\,d\mu-\int_X F\,d\nu\right)

\operatorname{Lip}_1(X)=\left\{F:X\to\mathbf{R},\ |F(x)-F(y)|\le\operatorname{dist}_X(x,y),\ \forall x,y\in X\right\}.

P_tf:x\in\mathbf{R}^d\longmapsto\int_{\mathbf{R}^d}f\left(e^{-t}x+\sqrt{1-e^{-2t}}\,y\right)d\mu_d(y)

-xh(x)+h'(x)=f(x)-\int_{\mathbf{R}}f\,d\mu_1,

h(x)=\int_0^\infty P_tf(x)\,dt

(P_tf)^{(k)}(x)=\left(\frac{e^{-t}}{\sqrt{1-e^{-2t}}}\right)^k\int_{\mathbf{R}^d}f\left(e^{-t}x+\sqrt{1-e^{-2t}}\,y\right)H_k(y)\,d\mu_d(y)

(P_tf)^{(k)}=e^{-kt}P_t\left(f^{(k)}\right).

h'(x)=\int_0^\infty\frac{e^{-t}}{\sqrt{1-e^{-2t}}}\int_{\mathbf{R}}f\left(e^{-t}x+\sqrt{1-e^{-2t}}\,y\right)y\,d\mu_1(y)\,dt.

-x\cdot\nabla h(x)+\Delta h(x)=f(x)-\int_{\mathbf{R}^d}f\,d\mu_d,

\left(\frac{e^{-t}}{\sqrt{1-e^{-2t}}}\right)^k\notin L^1([0,+\infty);dt)\quad\text{for }k\ge 2.

\Delta h(x)=\int_0^\infty\frac{e^{-2t}}{\sqrt{1-e^{-2t}}}\int_{\mathbf{R}^d}\nabla f\left(e^{-t}x+\sqrt{1-e^{-2t}}\,y\right)\cdot y\,d\mu_d(y)\,dt.

\mathbf{E}[f(T_n)]-\int_{\mathbf{R}}f\,d\mu_1=-\mathbf{E}\left[\int_0^\infty LP_tf(T_n)\,dt\right]

Lf(x)=-xf'(x)+f''(x)=L_1f(x)+L_2f(x),

L_1P_tf(T_n)=-T_n(P_tf)'(T_n)=-\frac{1}{\sqrt{n}}\sum_{j=1}^nX_j(P_tf)'(T_n).

\mathbf{E}\left[X_j(P_tf)'(T_n)\right]=\mathbf{E}\left[X_j\Bigl((P_tf)'(T_n)-(P_tf)'(T_n-X_j/\sqrt{n})\Bigr)\right]

\mathbf{E}\left[X_j\Bigl((P_tf)'(T_n)-(P_tf)'(T_n-X_j/\sqrt{n})\Bigr)\right]=\frac{1}{\sqrt{n}}\int_0^1\mathbf{E}\left[X_j^2(P_tf)''(T_n^{\neg j}+rX_j/\sqrt{n})\right]dr.

\int_0^1\mathbf{E}\left[X_j^2(P_tf)''(T_n^{\neg j})\right]dr=\mathbf{E}\left[(P_tf)''(T_n^{\neg j})\right],

\mathbf{E}\left[LP_tf(T_n)\right]=-\frac{1}{n}\sum_{j=1}^n\int_0^1\mathbf{E}\left[X_j^2\Bigl((P_tf)''(T_n^{\neg j}+rX_j/\sqrt{n})-(P_tf)''(T_n^{\neg j})\Bigr)\right]dr+\frac{1}{n}\sum_{j=1}^n\mathbf{E}\left[(P_tf)''(T_n)-(P_tf)''(T_n^{\neg j})\right].

S_n(t)=\sum_{j=1}^nX_jh_j^n(t)

h_j^n(t)=\sqrt{n}\int_0^t\mathbf{1}_{[j/n,(j+1)/n)}(s)\,ds.

\left\langle (h_j^n)^{\otimes 2},\nabla^{(2)}(P_tf)(S_n^{\neg j}+rX_jh_j^n)-\nabla^{(2)}(P_tf)(S_n^{\neg j})\right\rangle_{I_{1,2}^{\otimes 2}}

I_{1,2}=\left\{f,\ \exists!\,\dot f\in L^2([0,1],dt)\text{ with }f(t)=\int_0^t\dot f(s)\,ds\right\}

\|f\|_{I_{1,2}}=\|\dot f\|_{L^2}.

\sup_{f\in\operatorname{Lip}_1(X)}\left(\mathbf{E}[f(S_n)]-\mathbf{E}[f(B)]\right).

\|f\|_{\eta,p}^p=\int_0^1|f(t)|^p\,dt+\iint_{[0,1]^2}\frac{|f(t)-f(s)|^p}{|t-s|^{1+p\eta}}\,dt\,ds.

\|f\|_{1,p}^p=\int_0^1|f(t)|^p\,dt+\int_0^1|f'(t)|^p\,dt.

W_{\eta,p}\subset\operatorname{Hol}(\eta-1/p)\quad\text{for }\eta-1/p>0

W_{\eta,p}\subset W_{\gamma,q}\quad\text{for }1\ge\eta\ge\gamma\text{ and }\eta-1/p\ge\gamma-1/q.

h_{s_1,s_2}(t)=\int_0^t\mathbf{1}_{[s_1,s_2]}(r)\,dr.

\|h_{s_1,s_2}\|_{\eta,p}\le c\,|s_2-s_1|^{1/2-\eta}.


Taxonomy

Topics: Geometric Analysis and Curvature Flows · Point processes and geometric inequalities · Random Matrices and Applications

Full text

Donsker’s theorem in Wasserstein-1 distance

L. Coutin

Institute of Mathematics

Université Toulouse 3

Toulouse, France


and

L. Decreusefond

LTCI, Télécom Paris, Institut polytechnique de Paris

Paris, France


Abstract.

We compute the Wasserstein-1 (or Kantorovich-Rubinstein) distance between a random walk in $\mathbf{R}^{d}$ and Brownian motion. The proof is based on a new estimate of the Lipschitz modulus of the solution of Stein’s equation. As an application, we can evaluate the rate of convergence towards the local time at $0$ of Brownian motion.

Key words and phrases:

Donsker theorem, Malliavin calculus, Stein’s method, Wasserstein distance

1991 Mathematics Subject Classification:

60F15, 60H07, 60G15, 60G55

The first author is partially supported by ANR MESA.

1. Motivations

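For a complete, separable metric space $X$, the topology of convergence in distribution is metrizable [8]; on probability measures with a finite first moment, the so-called Kantorovich-Rubinstein or Wasserstein-1 distance metrizes convergence in distribution together with convergence of first moments. It is defined by

$$\operatorname{dist}_{KR}(\mu,\nu)=\sup_{F\in\operatorname{Lip}_1(X)}\left(\int_X F\,d\mu-\int_X F\,d\nu\right),\tag{1}$$

where

$$\operatorname{Lip}_1(X)=\left\{F:X\to\mathbf{R},\ |F(x)-F(y)|\le\operatorname{dist}_X(x,y),\ \forall x,y\in X\right\}.$$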

The formulation (1) is well suited to Stein’s method. When $X=\mathbf{R}$, there is no particular difficulty in evaluating the K-R distance when $\mu$ is the Gaussian distribution. When $X=\mathbf{R}^{d}$, it is only recently (see [9, 12, 15] and references therein) that improvements of the standard Stein’s method have been proposed to obtain the K-R distance to the Gaussian measure on $\mathbf{R}^{d}$. The bottleneck is the estimate of the Lipschitz modulus of the second order derivative of the solution of Stein’s equation when $F$ is only assumed to be Lipschitz continuous. Namely, for $f\,:\,\mathbf{R}^{d}\to\mathbf{R}$ and $t>0$, consider the function

$$P_tf:x\in\mathbf{R}^d\longmapsto\int_{\mathbf{R}^d}f\left(e^{-t}x+\sqrt{1-e^{-2t}}\,y\right)d\mu_d(y),$$

where $\mu_{d}$ is the standard Gaussian measure on $\mathbf{R}^{d}$. In dimension $1$, Stein’s equation reads

$$-xh(x)+h'(x)=f(x)-\int_{\mathbf{R}}f\,d\mu_1,$$

so that

$$h(x)=\int_0^\infty P_tf(x)\,dt\tag{2}$$

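and the subsequent computations only require evaluating the Lipschitz modulus of $h^{\prime}$. For $f\in L^{1}(\mu_d)$, it is classical that $P_{t}f$ is infinitely differentiable and that

$$(P_tf)^{(k)}(x)=\left(\frac{e^{-t}}{\sqrt{1-e^{-2t}}}\right)^k\int_{\mathbf{R}^d}f\left(e^{-t}x+\sqrt{1-e^{-2t}}\,y\right)H_k(y)\,d\mu_d(y),\tag{3}$$

where $H_{k}$ is the $k$-th Hermite polynomial. On the other hand, if $f$ is $k$ times differentiable, we have

$$(P_tf)^{(k)}=e^{-kt}P_t\left(f^{(k)}\right).\tag{4}$$

According to (3) with $k=1$, we get

$$h'(x)=\int_0^\infty\frac{e^{-t}}{\sqrt{1-e^{-2t}}}\int_{\mathbf{R}}f\left(e^{-t}x+\sqrt{1-e^{-2t}}\,y\right)y\,d\mu_1(y)\,dt.$$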
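A short computation makes this dependence explicit. Assuming $f\in\operatorname{Lip}_1(\mathbf{R})$ and starting from the formula for $h'$ above, the Lipschitz constant of $f$ is applied to the spatial argument, which contributes a factor $e^{-t}$; the remaining time integral equals $1$ because $\frac{d}{dt}\sqrt{1-e^{-2t}}=e^{-2t}/\sqrt{1-e^{-2t}}$, and $\int_{\mathbf{R}}|y|\,d\mu_1(y)=\sqrt{2/\pi}$:

$$
\begin{aligned}
|h'(x)-h'(x')| &\le \int_0^\infty \frac{e^{-t}}{\sqrt{1-e^{-2t}}}\int_{\mathbf{R}} \bigl|f(e^{-t}x+\sqrt{1-e^{-2t}}\,y)-f(e^{-t}x'+\sqrt{1-e^{-2t}}\,y)\bigr|\,|y|\,d\mu_1(y)\,dt\\
&\le |x-x'|\int_0^\infty \frac{e^{-2t}}{\sqrt{1-e^{-2t}}}\,dt\int_{\mathbf{R}}|y|\,d\mu_1(y)=\sqrt{\frac{2}{\pi}}\;|x-x'|.
\end{aligned}
$$

Only the first power of $e^{-t}/\sqrt{1-e^{-2t}}$ enters, which is why the one-dimensional case causes no integrability problem.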

It is apparent that the Lipschitz modulus of $h^{\prime}$ depends only on the Lipschitz modulus of $f$. However, in higher dimension, Stein’s equation becomes

$$-x\cdot\nabla h(x)+\Delta h(x)=f(x)-\int_{\mathbf{R}^d}f\,d\mu_d,\tag{5}$$

whose solution is formally given by (2). The form of (5) entails that we need to estimate the Lipschitz modulus of $\Delta h$, which requires using (3) with $k=2$. Unfortunately,

$$\left(\frac{e^{-t}}{\sqrt{1-e^{-2t}}}\right)^k\notin L^1([0,+\infty);dt)\quad\text{for }k\ge 2.$$
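The dichotomy between $k=1$ and $k=2$ can be made concrete. With the substitution $u=e^{-t}$, one finds $\int_\varepsilon^\infty e^{-t}(1-e^{-2t})^{-1/2}\,dt=\arcsin(e^{-\varepsilon})\to\pi/2$, whereas $\int_\varepsilon^\infty e^{-2t}(1-e^{-2t})^{-1}\,dt=-\frac12\log(1-e^{-2\varepsilon})$, which blows up like $\frac12\log(1/\varepsilon)$ as $\varepsilon\to 0$. The Python sketch below compares a midpoint quadrature (an illustrative choice, carried out in the variable $\log t$ to tame the singularity at $0$; the cut-off `upper=40` is also an assumption) with these closed forms.

```python
import math


def kernel(t, k):
    """(e^{-t} / sqrt(1 - e^{-2t}))^k, the factor produced by k Gaussian integrations by parts."""
    return (math.exp(-t) / math.sqrt(-math.expm1(-2.0 * t))) ** k


def integral_from(eps, k, upper=40.0, steps=200_000):
    """Midpoint rule for the integral of kernel(., k) over [eps, upper].

    The substitution t = exp(s) (so dt = t ds) keeps the integrand bounded near t = 0.
    """
    a, b = math.log(eps), math.log(upper)
    ds = (b - a) / steps
    total = 0.0
    for i in range(steps):
        t = math.exp(a + (i + 0.5) * ds)
        total += kernel(t, k) * t * ds
    return total


def closed_form(eps, k):
    """Exact value of the integral over [eps, infinity) for k = 1 and k = 2."""
    if k == 1:
        return math.asin(math.exp(-eps))
    if k == 2:
        return -0.5 * math.log(-math.expm1(-2.0 * eps))
    raise ValueError("closed form implemented only for k = 1, 2")


if __name__ == "__main__":
    for eps in (1e-2, 1e-4, 1e-6):
        print(f"eps={eps:g}  k=1: {closed_form(eps, 1):.6f}  k=2: {closed_form(eps, 2):.6f}")
```

Running it shows the $k=1$ column settling near $\pi/2$ while the $k=2$ column keeps growing as $\varepsilon$ decreases.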

Hence, until the very recent papers [9, 15], the strategy was to assume that $\nabla f$ is Lipschitz, apply (4) once to compute the first derivative of $P_{t}f$ and then apply (3) to this expression:

$$\Delta h(x)=\int_0^\infty\frac{e^{-2t}}{\sqrt{1-e^{-2t}}}\int_{\mathbf{R}^d}\nabla f\left(e^{-t}x+\sqrt{1-e^{-2t}}\,y\right)\cdot y\,d\mu_d(y)\,dt.$$

This means that instead of computing the supremum in the right-hand side of (1) over Lipschitz functions, it is computed over functions whose first derivative is Lipschitz. This also defines a distance, which does not change the induced topology but degrades the accuracy of the bound.

In infinite dimension, a new problem arises, which is best explained by going back to the roots of Stein’s method in dimension $1$. Suppose that we want to estimate the K-R distance in the standard Central Limit Theorem. Let $(X_{n},\,n\geq 1)$ be a sequence of independent, identically distributed random variables, and let $X$ denote a random variable with their common distribution, with $\mathbf{E}\left[X\right]=0$ and $\mathbf{E}\left[X^{2}\right]=1$. Let $T_{n}=n^{-1/2}\sum_{j=1}^{n}X_{j}$. The Stein-Dirichlet representation formula [6] states that

$$\mathbf{E}[f(T_n)]-\int_{\mathbf{R}}f\,d\mu_1=-\mathbf{E}\left[\int_0^\infty LP_tf(T_n)\,dt\right],$$
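The sign in this representation comes from a one-line computation. Writing $L$ for the Ornstein-Uhlenbeck generator $Lg(x)=-xg'(x)+g''(x)$, so that $\partial_tP_tf=LP_tf$, $P_0f=f$ and $P_tf(x)\to\int_{\mathbf{R}}f\,d\mu_1$ as $t\to\infty$ (standard facts, assumed here for $f\in\operatorname{Lip}_1(\mathbf{R})$), one has

$$
f(x)-\int_{\mathbf{R}}f\,d\mu_1=P_0f(x)-\lim_{t\to\infty}P_tf(x)=-\int_0^\infty\frac{d}{dt}P_tf(x)\,dt=-\int_0^\infty LP_tf(x)\,dt,
$$

and it remains to evaluate at $x=T_n$ and take expectations.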

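where

$$Lf(x)=-xf'(x)+f''(x)=L_1f(x)+L_2f(x),$$

with obvious notations. Now,

$$L_1P_tf(T_n)=-T_n(P_tf)'(T_n)=-\frac{1}{\sqrt{n}}\sum_{j=1}^nX_j(P_tf)'(T_n).$$

The trick, which amounts to an integration by parts for a Malliavin structure on independent random variables (see [7]), is to write

$$\mathbf{E}\left[X_j(P_tf)'(T_n)\right]=\mathbf{E}\left[X_j\Bigl((P_tf)'(T_n)-(P_tf)'(T_n-X_j/\sqrt{n})\Bigr)\right],$$

which holds because $T_n-X_j/\sqrt{n}$ is independent of $X_j$ and $\mathbf{E}[X_j]=0$. Then, we apply the fundamental theorem of calculus in this expression around the point $T_{n}^{\neg j}=T_{n}-X_{j}/\sqrt{n}$:

$$\mathbf{E}\left[X_j\Bigl((P_tf)'(T_n)-(P_tf)'(T_n^{\neg j})\Bigr)\right]=\frac{1}{\sqrt{n}}\int_0^1\mathbf{E}\left[X_j^2(P_tf)''(T_n^{\neg j}+rX_j/\sqrt{n})\right]dr.$$

Since, by independence and because $\mathbf{E}[X_j^2]=1$,

$$\int_0^1\mathbf{E}\left[X_j^2(P_tf)''(T_n^{\neg j})\right]dr=\mathbf{E}\left[(P_tf)''(T_n^{\neg j})\right],$$

we get

$$\begin{aligned}\mathbf{E}\left[LP_tf(T_n)\right]&=-\frac{1}{n}\sum_{j=1}^n\int_0^1\mathbf{E}\left[X_j^2\Bigl((P_tf)''(T_n^{\neg j}+rX_j/\sqrt{n})-(P_tf)''(T_n^{\neg j})\Bigr)\right]dr\\&\quad+\frac{1}{n}\sum_{j=1}^n\mathbf{E}\left[(P_tf)''(T_n)-(P_tf)''(T_n^{\neg j})\right].\end{aligned}\tag{7}$$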
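Because every step above is an exact identity, the decomposition of $\mathbf{E}[LP_tf(T_n)]$ can be checked on a small example, including the sign of each term. The Python sketch below assumes Rademacher steps $X_j=\pm 1$ (so that $X_j^2=1$), replaces $P_tf$ by the hypothetical smooth test function $g=\sin$ (the algebra does not use the semi-group structure), averages over all $2^n$ sign patterns, and evaluates the $r$-integral in closed form through $\int_0^1 g''(a+rx)\,dr=(g'(a+x)-g'(a))/x$.

```python
import itertools
import math


def g1(x):
    """First derivative of the hypothetical test function g = sin."""
    return math.cos(x)


def g2(x):
    """Second derivative of g = sin."""
    return -math.sin(x)


def lhs(n):
    """E[-T_n g'(T_n) + g''(T_n)] for T_n = n^{-1/2}(X_1 + ... + X_n) with Rademacher X_j."""
    total = 0.0
    for signs in itertools.product((-1, 1), repeat=n):
        t = sum(signs) / math.sqrt(n)
        total += -t * g1(t) + g2(t)
    return total / 2**n


def rhs(n):
    """Right-hand side of the decomposition, averaged over all 2^n sign patterns."""
    total = 0.0
    for signs in itertools.product((-1, 1), repeat=n):
        t = sum(signs) / math.sqrt(n)
        for x in signs:                      # x plays the role of X_j
            step = x / math.sqrt(n)
            t_not_j = t - step               # T_n^{not j}
            # int_0^1 g''(t_not_j + r * step) dr, computed exactly
            r_integral = (g1(t_not_j + step) - g1(t_not_j)) / step
            total -= x * x * (r_integral - g2(t_not_j)) / n
            total += (g2(t) - g2(t_not_j)) / n
    return total / 2**n


if __name__ == "__main__":
    for n in (2, 4, 6):
        print(n, lhs(n), rhs(n))
```

The two columns agree up to rounding for every $n$; flipping the order of the two terms in the last sum breaks the agreement.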

This formula confirms that the crux of the matter is now to estimate uniformly the Lipschitz modulus of $(P_{t}f)^{\prime\prime}$. It also shows where the order of convergence comes from. A first factor $n^{-1/2}$ comes from the definition of $T_{n}$ and appears in the expression of $L_{1}$. The same factor appears a second time in the Taylor expansion, and a third time when we plug (3) into (7). We thus have a factor $n^{-3/2}$ which is summed $n$ times, hence the well-known rate of convergence $n^{-1/2}$.
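The $n^{-1/2}$ rate can be observed numerically in dimension $1$, where the K-R distance coincides with the $L^1$ distance between distribution functions, $\operatorname{dist}_{KR}(\mu,\nu)=\int_{\mathbf{R}}|F_\mu-F_\nu|\,dx$. The sketch below assumes Rademacher steps $X_j=\pm1$ (an illustrative choice, not taken from the text), computes the exact law of $T_n$ from binomial weights, and integrates $|F_n-\Phi|$ with a midpoint rule on $[-8,8]$; truncating the tails there is an approximation.

```python
import bisect
import math


def normal_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def rademacher_sum_law(n):
    """Atoms and cumulative probabilities of T_n = n^{-1/2} * (sum of n independent +-1 signs)."""
    atoms = [(2 * k - n) / math.sqrt(n) for k in range(n + 1)]
    cumulative, total = [], 0.0
    for k in range(n + 1):
        total += math.comb(n, k) / 2**n     # P(exactly k plus signs)
        cumulative.append(total)
    return atoms, cumulative


def w1_to_gaussian(n, lo=-8.0, hi=8.0, steps=64_000):
    """Approximate int |F_n(x) - Phi(x)| dx over [lo, hi] by the midpoint rule."""
    atoms, cumulative = rademacher_sum_law(n)
    dx = (hi - lo) / steps
    acc = 0.0
    for i in range(steps):
        x = lo + (i + 0.5) * dx
        idx = bisect.bisect_right(atoms, x)  # number of atoms <= x
        f_n = cumulative[idx - 1] if idx > 0 else 0.0
        acc += abs(f_n - normal_cdf(x)) * dx
    return acc


if __name__ == "__main__":
    for n in (25, 100, 400):
        print(n, math.sqrt(n) * w1_to_gaussian(n))
```

The rescaled values $\sqrt{n}\,\operatorname{dist}_{KR}$ stay roughly constant, which is the $n^{-1/2}$ behaviour; for this lattice distribution no faster rate is possible.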

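Now, if we are interested in Donsker’s theorem, the process whose limit we would like to assess is

$$S_n(t)=\sum_{j=1}^nX_jh_j^n(t)$$

where

$$h_j^n(t)=\sqrt{n}\int_0^t\mathbf{1}_{[j/n,(j+1)/n)}(s)\,ds.$$

For reasons that will be explained below, the analog of the second order derivatives will involve

$$\left\langle (h_j^n)^{\otimes 2},\nabla^{(2)}(P_tf)(S_n^{\neg j}+rX_jh_j^n)-\nabla^{(2)}(P_tf)(S_n^{\neg j})\right\rangle_{I_{1,2}^{\otimes 2}}\tag{8}$$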

where $\nabla$ is the Malliavin derivative and $I_{1,2}$ is the Cameron-Martin space

$$I_{1,2}=\left\{f,\ \exists!\,\dot f\in L^2([0,1],dt)\text{ with }f(t)=\int_0^t\dot f(s)\,ds\right\}$$

equipped with the norm

$$\|f\|_{I_{1,2}}=\|\dot f\|_{L^2}.$$

Recall that in the context of Malliavin calculus, this space is identified with its dual, which means that $L^{2}$ cannot at the same time be identified with its own dual. The difficulty is then that there is no $n^{-1/2}$ factor in the definition of $S_{n}$, and it is easily seen that $\|h_{j}^{n}\|_{I_{1,2}}=1$; hence no multiplicative factor pops up in (8). In [4], we bypassed this difficulty by assuming enough regularity of $f$ so that $\nabla^{(2)}P_{t}f$ belongs to the dual of $L^{2}$. Then, in the estimate of terms such as those appearing in (8), it is the $L^{2}$-norm of $h_{j}^{n}$ which appears, and it turns out that $\|h_{j}^{n}\|_{L^{2}}\leq c\,n^{-1/2}$, hence the presence of a factor $n^{-1}$, which saves the proof.
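The two norms quoted above can be checked numerically. The sketch below assumes the normalization $h_j^n(t)=\sqrt{n}\int_0^t\mathbf{1}_{[j/n,(j+1)/n)}(s)\,ds$ with $0\le j\le n-1$ (so that the supports stay in $[0,1]$), which is the normalization under which $\|h_j^n\|_{I_{1,2}}=1$. A direct computation then gives $\|h_j^n\|_{L^2}^2=\frac{1}{3n^2}+\frac1n\bigl(1-\frac{j+1}{n}\bigr)\le\frac1n$, so $c=1$ suffices; the code approximates both integrals with a midpoint rule.

```python
import math


def h(j, n, t):
    """Assumed basis function: sqrt(n) * length of [0, t] intersected with [j/n, (j+1)/n)."""
    return math.sqrt(n) * min(max(t - j / n, 0.0), 1.0 / n)


def h_dot(j, n, t):
    """Derivative of h_j^n: sqrt(n) on [j/n, (j+1)/n), zero elsewhere."""
    return math.sqrt(n) if j / n <= t < (j + 1) / n else 0.0


def norms(j, n, steps=50_000):
    """Return (||h_j^n||_{I_{1,2}}, ||h_j^n||_{L^2}) computed by the midpoint rule on [0, 1]."""
    dt = 1.0 / steps
    cameron_martin = l2 = 0.0
    for i in range(steps):
        t = (i + 0.5) * dt
        cameron_martin += h_dot(j, n, t) ** 2 * dt
        l2 += h(j, n, t) ** 2 * dt
    return math.sqrt(cameron_martin), math.sqrt(l2)


if __name__ == "__main__":
    for n in (10, 100):
        cm, l2 = norms(0, n)
        print(n, cm, l2 * math.sqrt(n))
```

The first printed value stays at $1$ whatever $n$ is, while the rescaled $L^2$ norm stays below $1$: the gain of $n^{-1/2}$ is only visible in the weaker $L^2$ norm.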

The goal of this paper is to weaken the hypothesis on $f$ so as to upper-bound the true K-R distance between the distribution of $S_{n}$ and the distribution of a Brownian motion, that is

$$\sup_{f\in\operatorname{Lip}_1(X)}\left(\mathbf{E}[f(S_n)]-\mathbf{E}[f(B)]\right).$$

The space $X$ is a Banach space, which we can choose arbitrarily as long as it can be equipped with the structure of an abstract Wiener space and contains the sample paths of $S_{n}$ and $B$.

The main technical result of this article is Theorem 4.4, which gives a new estimate of the Lipschitz modulus of $\nabla^{(2)}P_{t}f$ for $t>0$. The main idea is to introduce a hierarchy of approximations. A first scale is induced by the time discretization in the definition of $S_{n}$. Then, we consider a coarser discretization onto which we project our approximations, in order to benefit from the averaging effect of the ordinary CLT. It turns out that the optimal ratio is obtained when the mesh of the coarser subdivision is roughly the cube root of the mesh of the reference partition. Moreover, after [3] and [4], we are convinced that it is simpler, and as efficient, to stick to finite dimension as long as possible. To this end, we use the affine interpolation of the Brownian motion as an intermediary process. The distance between Brownian sample paths and their affine interpolation is well known. This reduces the problem to estimating the distance between $S_{n}$ and the affine interpolation of $B$, a task which can be handled by Stein’s method. It turns out that the bottleneck is in fact the rate of convergence of the Brownian interpolation to the Brownian motion.

This paper is organized as follows. In Section 2, we show how to view fractional Sobolev spaces as Wiener spaces. In Section 3, we explain our line of thought. The proofs are given in Section 4.

2. Preliminaries

2.1. Fractional Sobolev spaces

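As in [5, 11], we consider the fractional Sobolev spaces ${W}_{\eta,p}$, defined for $\eta\in(0,1)$ and $p\geq 1$ as the closure of ${\mathcal{C}}^{1}$ functions with respect to the norm

$$\|f\|_{\eta,p}^p=\int_0^1|f(t)|^p\,dt+\iint_{[0,1]^2}\frac{|f(t)-f(s)|^p}{|t-s|^{1+p\eta}}\,dt\,ds.$$

For $\eta=1$, ${W}_{1,p}$ is the completion of $\mathcal{C}^{1}$ for the norm

$$\|f\|_{1,p}^p=\int_0^1|f(t)|^p\,dt+\int_0^1|f'(t)|^p\,dt.$$

They are known to be Banach spaces and to satisfy the Sobolev embeddings [1, 10]:

$$W_{\eta,p}\subset\operatorname{Hol}(\eta-1/p)\quad\text{for }\eta-1/p>0$$

and

$$W_{\eta,p}\subset W_{\gamma,q}\quad\text{for }1\ge\eta\ge\gamma\text{ and }\eta-1/p\ge\gamma-1/q.$$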

As a consequence, since ${W}_{1,p}$ is separable (see [2]), so is ${W}_{\eta,p}$. We need to compute the ${W}_{\eta,p}$ norm of primitives of step functions.

Lemma 2.1.

Let $0\leq s_{1}<s_{2}\leq 1$ and consider

$$h_{s_1,s_2}(t)=\int_0^t\mathbf{1}_{[s_1,s_2]}(r)\,dr.$$

There exists $c>0$ such that for any $s_{1},s_{2}$, we have

$$\|h_{s_1,s_2}\|_{\eta,p}\le c\,|s_2-s_1|^{1/2-\eta}.$$

Proof.

Remark that for any $s,t\in[0,1]$ ,

[TABLE]

The result then follows from the definition of the ${W}_{\eta,p}$ norm. ∎
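One way to obtain this bound, sketched here under the assumption $\eta<1/2$ (the range relevant for Brownian sample paths) and not necessarily the authors’ exact argument, is to interpolate between the two elementary estimates $|h_{s_1,s_2}(t)-h_{s_1,s_2}(s)|\le|t-s|$ and $|h_{s_1,s_2}(t)-h_{s_1,s_2}(s)|\le|s_2-s_1|$:

$$
|h_{s_1,s_2}(t)-h_{s_1,s_2}(s)|\le\min\bigl(|t-s|,|s_2-s_1|\bigr)\le|t-s|^{\frac12+\eta}\,|s_2-s_1|^{\frac12-\eta},
$$

so that

$$
\iint_{[0,1]^2}\frac{|h_{s_1,s_2}(t)-h_{s_1,s_2}(s)|^p}{|t-s|^{1+p\eta}}\,dt\,ds\le|s_2-s_1|^{p(\frac12-\eta)}\iint_{[0,1]^2}|t-s|^{\frac p2-1}\,dt\,ds,
$$

where the last integral is finite because $\frac p2-1>-1$. Since $0\le h_{s_1,s_2}\le|s_2-s_1|\le 1$, the term $\int_0^1|h_{s_1,s_2}(t)|^p\,dt$ is bounded by $|s_2-s_1|^{p}\le|s_2-s_1|^{p(\frac12-\eta)}$ as well.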

We denote by $W_{0,\infty}$ the space of continuous (hence bounded) functions on $[0,1]$ equipped with the uniform norm.

2.2. Fractional spaces $W_{\eta,p}$ as Wiener spaces

Let

[TABLE]

In what follows, we always choose $\eta$ and $p$ in $\Lambda$. Consider a sequence $(Z_{n},\,n\geq 1)$ of independent, standard Gaussian random variables and let $(z_{n},\,n\geq 1)$ be a complete orthonormal basis of $I_{1,2}$. Then, we know from [13] that

[TABLE]

where $B$ is a Brownian motion. We clearly have the diagram

[TABLE]

where ${\mathfrak{e}}_{\eta,p}$ is the embedding from $I_{1,2}$ into $W_{\eta,p}$ . The space $I_{1,2}$ is dense in $W_{\eta,p}$ since polynomials do belong to $I_{1,2}$ . Moreover, Eqn. (10) and the Parseval identity entail that for any $z\in W^{*}$ ,

[TABLE]

We denote by $\mu_{\eta,p}$ the law of $B$ on $W_{\eta,p}$ . Then, the diagram (11) and the identity (12) mean that $(I_{1,2},W_{\eta,p},\mu_{\eta,p})$ is a Wiener space.
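A concrete instance of this construction, under an assumed choice of basis: the functions $z_n(t)=\sqrt2\,\sin((n-\frac12)\pi t)/((n-\frac12)\pi)$, $n\ge1$, form a complete orthonormal basis of $I_{1,2}$, because their derivatives $\sqrt2\cos((n-\frac12)\pi t)$ form a complete orthonormal basis of $L^2([0,1])$. Parseval’s identity then gives $\sum_n z_n(s)z_n(t)=\langle\mathbf{1}_{[0,s]},\mathbf{1}_{[0,t]}\rangle_{L^2}=\min(s,t)$, the covariance of Brownian motion. The Python sketch below checks the truncated sum and draws a truncated sample path with the standard library’s `random` module (truncation level and seed are illustrative).

```python
import math
import random


def z(n, t):
    """n-th basis function (n >= 1): the primitive of sqrt(2) cos((n - 1/2) pi s)."""
    w = (n - 0.5) * math.pi
    return math.sqrt(2.0) * math.sin(w * t) / w


def truncated_covariance(s, t, terms):
    """Partial sum of z_n(s) z_n(t); by Parseval it converges to min(s, t)."""
    return sum(z(n, s) * z(n, t) for n in range(1, terms + 1))


def sample_path(times, terms, rng):
    """Truncated series sum_{n <= terms} Z_n z_n(t) with i.i.d. standard Gaussian Z_n."""
    coeffs = [rng.gauss(0.0, 1.0) for _ in range(terms)]
    return [sum(c * z(n, t) for n, c in enumerate(coeffs, start=1)) for t in times]


if __name__ == "__main__":
    print(truncated_covariance(0.3, 0.7, 20_000))   # close to min(0.3, 0.7) = 0.3
    path = sample_path([k / 100 for k in range(101)], 500, random.Random(0))
    print(path[:5])
```

Any other complete orthonormal basis of $I_{1,2}$ (for instance the Schauder functions) would give the same law; the trigonometric one is chosen only because its terms are explicit.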

Definition 2.1 (Wiener integral).

The Wiener integral, denoted as $\delta_{\eta,p}$ , is the isometric extension of the map

[TABLE]

This means that if $h=\lim_{n\to\infty}{\mathfrak{e}}_{\eta,p}^{*}(\eta_{n})$ in $I_{1,2}$ ,

[TABLE]

Definition 2.2 (Ornstein-Uhlenbeck semi-group).

For any Lipschitz function $f$ on $W_{\eta,p}$ and any $\tau\geq 0$,

$$P_\tau f(x)=\int_{W_{\eta,p}}f\left(e^{-\tau}x+\beta_\tau y\right)d\mu_{\eta,p}(y),$$

where $\beta_{\tau}=\sqrt{1-e^{-2\tau}}$.

The dominated convergence theorem entails that $P_{\tau}$ is ergodic: For any $x\in W_{\eta,p}$ , with probability $1$ ,

[TABLE]

Moreover, the invariance by rotation of Gaussian measures implies that

[TABLE]

Otherwise stated, the Gaussian measure $\mu_{\eta,p}$ on ${W}_{\eta,p}$ is the invariant measure of the semi-group $P=(P_{\tau},\,\tau\geq 0)$. For details on the Malliavin gradient, we refer to [14, 17].
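In finite dimension these properties can be observed directly. The sketch below works on $\mathbf{R}$ with $\mu_1$ (an assumption: the text works on $W_{\eta,p}$), evaluates $P_\tau f(x)=\int f(e^{-\tau}x+\beta_\tau y)\,d\mu_1(y)$ with Gauss-Hermite quadrature from NumPy (`numpy.polynomial.hermite_e.hermegauss`, whose weight function is $e^{-y^2/2}$), and checks invariance and convergence to $\int f\,d\mu_1$ on the test function $f=\cos$, for which $P_\tau f(x)=e^{-\beta_\tau^2/2}\cos(e^{-\tau}x)$ in closed form (so the Lipschitz constant of $P_\tau f$ is at most $e^{-\tau}$). The number of nodes is an illustrative choice.

```python
import math

import numpy as np
from numpy.polynomial.hermite_e import hermegauss

# Gauss-HermiteE nodes/weights for the weight exp(-y^2/2); dividing by sqrt(2 pi)
# turns the weighted sum into an expectation under the standard Gaussian mu_1.
NODES, RAW_WEIGHTS = hermegauss(60)
WEIGHTS = RAW_WEIGHTS / math.sqrt(2.0 * math.pi)


def gaussian_mean(g):
    """Approximate int g d(mu_1) for a function g that accepts NumPy arrays."""
    return float(np.sum(WEIGHTS * g(NODES)))


def ou_semigroup(f, tau, x):
    """P_tau f(x) = int f(e^{-tau} x + beta_tau y) d mu_1(y) with beta_tau = sqrt(1 - e^{-2 tau})."""
    beta = math.sqrt(-math.expm1(-2.0 * tau))
    return gaussian_mean(lambda y: f(math.exp(-tau) * x + beta * y))


def invariance_gap(f, tau):
    """|int P_tau f d mu_1 - int f d mu_1|, which vanishes because mu_1 is invariant."""
    p_tau_f = np.array([ou_semigroup(f, tau, x) for x in NODES])
    return abs(float(np.sum(WEIGHTS * p_tau_f)) - gaussian_mean(f))


if __name__ == "__main__":
    # For f = cos, P_tau f(x) = exp(-beta_tau^2 / 2) cos(e^{-tau} x) in closed form.
    for tau in (0.1, 1.0, 5.0):
        print(tau, ou_semigroup(np.cos, tau, 1.0), invariance_gap(np.cos, tau))
```

As $\tau$ grows, the printed value of $P_\tau f(1)$ approaches $\int\cos\,d\mu_1=e^{-1/2}$, while the invariance gap stays at rounding level.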

Definition 2.3.

Let $X$ be a Banach space. A function $f\,:\,W_{\eta,p}\to X$ is said to be cylindrical if it is of the form

[TABLE]

where for any $j\in\{1,\cdots,k\}$ , $f_{j}$ belongs to the Schwartz space on $\mathbf{R}^{k}$ , $(h_{1},\cdots,h_{k})$ are elements of $I_{1,2}$ and $(x_{1},\cdots,x_{k})$ belong to $X$ . The set of such functions is denoted by $\mathfrak{C}(X)$ .

For $h\in I_{1,2}$ ,

[TABLE]

which is equivalent to saying

[TABLE]

It is proved in [16, Theorem 4.8] that

Theorem 2.2.

For $f\in\operatorname{Lip}_{1}(W_{\eta,p})$ , for any $t>0$ , for any $x\in W_{\eta,p}$

[TABLE]

where $(h_{n},\,n\geq 1)$ is a complete orthonormal basis of $H$.

Note that a nontrivial part of this theorem is to prove that the terms are meaningful: that $\nabla P_{t}f$ takes values in $W_{\eta,p}^{*}$ instead of $I_{1,2}$ and that $\nabla^{(2)}P_{t}f(x)$ is trace-class. Actually, we only need a finite dimensional version of this identity, in which none of these difficulties appear.

3. Donsker’s theorem in $W_{\eta,p}$

For $m\geq 1$, let $\mathcal{D}^{m}=\{i/m,\,i=0,\cdots,m\}$ be the regular subdivision of the interval $[0,1]$. Let

[TABLE]

and for $a=(a_{1},\,a_{2})\in{\mathcal{A}}^{m}$

[TABLE]

Consider

[TABLE]

where $(X_{a},\,a\in{\mathcal{A}}^{m})$ is a family of independent identically distributed, $\mathbf{R}^{d}$ -valued, random variables. We denote by $X$ a random variable which has their common distribution. Moreover, we assume that $\mathbf{E}\left[X\right]=0$ and $\mathbf{E}\left[\|X\|_{\mathbf{R}^{d}}^{2}\right]=1$ . Remark that $(h_{a}^{m},a\in{\mathcal{A}}^{m})$ is an orthonormal family in $\mathbf{R}^{d}\otimes I_{1,2}:=I_{1,2}^{d}$ . Let

[TABLE]

For any $m>0$, the map $\pi^{m}$ is the orthogonal projection from $H:=I_{1,2}^{d}$ onto ${\mathcal{V}}^{m}$. Let $0<N<m$. For $f\in\operatorname{Lip}_{1}(W_{\eta,p})$, we write

[TABLE]

where

[TABLE]

where $B^{m}$ is the affine interpolation of the Brownian motion:

[TABLE]

The two terms $A_{1}$ and $A_{3}$ are of the same nature: we have to compare two processes which live on the same probability space. Since $f$ is Lipschitz, we can proceed by comparing their sample paths. The term $A_{2}$ is different, as the two processes involved live on different probability spaces. It is for this term that Stein’s method will be used.

We know from [11] that

Theorem 3.1.

For any $(\eta,p)\in\Lambda,$ there exists $c>0$ such that

[TABLE]

Moreover, we have

Theorem 3.2.

Let $(\eta,p)\in\Lambda.$ Assume that $X\in L^{p}(W;\mathbf{R}^{d},\mu_{\eta,p})$ . There exists a constant $c>0$ such that

[TABLE]

This upper bound is far from optimal, and it could likely be improved to a factor $N^{1-\eta}$. However, in view of (15), this would bring no improvement to our final result.

Theorem 3.3.

Let $(\eta,p)\in\Lambda.$ Let $X_{a}$ belong to $L^{p}(W;\mathbf{R}^{d},\mu_{\eta,p})$ for some $p\geq 3$ . Then, there exists $c>0$ such that for any $f\in\operatorname{Lip}_{1}(W_{\eta,p})$ ,

[TABLE]

The global upper-bound for (14) is proportional to

[TABLE]

Regarding $N$ as a function of $m$, this expression is minimal for $N\sim m^{1/3}$. Plugging this into the previous estimates yields the main result of this paper:

Theorem 3.4.

Assume that $X\in L^{p}(W;\mathbf{R}^{d},\mu_{\eta,p})$ . Then, there exists a constant $c>0$ such that

[TABLE]

As an application of the previous considerations, we obtain as a corollary an approximation theorem for the local time of Brownian motion.

The reflected Brownian motion is defined as

[TABLE]

and the reflected linear interpolation of random walk is

[TABLE]

The process $L_{0}(t):=\sup_{0\leq s\leq t}\max\left(0,-B_{s}\right)$ is an expression of the local time of Brownian motion at $0$. Note that the map $f\mapsto\left(t\mapsto f(t)+\sup_{0\leq s\leq t}\max\left(0,-f(s)\right)\right)$ is Lipschitz continuous from any $W_{\eta,p}$ into $W_{0,\infty}$. One of the interests of our new result is that we can then apply the previous theorem in $W_{0,\infty}$ to $L_{0}^{m}$ and $L_{0}$. We get

Corollary 3.5.

Assume that the hypotheses of Theorem 3.4 hold. There exists a constant $c>0$ such that

[TABLE]
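The Lipschitz property of the reflection map used above is elementary in the uniform norm: with $\Gamma(f)(t)=f(t)+\sup_{s\le t}\max(0,-f(s))$, one has $\|\Gamma(f)-\Gamma(g)\|_\infty\le 2\|f-g\|_\infty$, since $f\mapsto\sup_{s\le t}\max(0,-f(s))$ is $1$-Lipschitz for the uniform norm. The Python sketch below checks this at the interpolation nodes of random-walk paths (path length, perturbation size and seed are illustrative assumptions), and also checks that the reflected path is nonnegative.

```python
import itertools
import math
import random


def reflect(path):
    """Discrete version of f -> (t -> f(t) + sup_{s <= t} max(0, -f(s))) at the sample points."""
    running = 0.0                       # running value of sup_{s <= t} max(0, -f(s))
    out = []
    for x in path:
        running = max(running, -x)
        out.append(x + running)
    return out


def random_walk(n, rng):
    """Values at times k/n of n^{-1/2} times a simple random walk started at 0."""
    steps = (rng.choice((-1.0, 1.0)) / math.sqrt(n) for _ in range(n))
    return [0.0] + list(itertools.accumulate(steps))


def sup_dist(f, g):
    """Uniform distance between two paths sampled at the same points."""
    return max(abs(a - b) for a, b in zip(f, g))


if __name__ == "__main__":
    rng = random.Random(1)
    f = random_walk(1000, rng)
    g = [x + rng.uniform(-0.05, 0.05) for x in f]
    print(sup_dist(reflect(f), reflect(g)) / sup_dist(f, g))   # should not exceed 2 (up to rounding)
```

For a linearly interpolated path the supremum of $\max(0,-f)$ is attained at a node, so the node-wise computation of the running supremum is exact.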

4. Proofs

In what follows, $c$ denotes a constant whose value is irrelevant and may vary from line to line. We borrow from the current usage in rough path theory the notation

[TABLE]

As a preparation to the proof of Theorem 3.2, we need the following lemma.

Lemma 4.1.

For all $p\geq 2$, there exists a constant $c_{p}$ such that for any sequence of independent, identically distributed, centered random variables $(X_{i},\,i\in{\mathbf{N}})$ with $X\in L^{p}$ and any sequence $(\alpha_{i},\,i\in{\mathbf{N}})$,

[TABLE]

where $|A|$ is the cardinality of the set $A$ .

Proof.

The Burkholder-Davis-Gundy inequality applied to the discrete martingale $(\sum_{i=1}^{n}\alpha_{i}X_{i},\,n\geq 0)$ yields

[TABLE]

Using Jensen’s inequality, we obtain

[TABLE]

The proof is thus complete. ∎
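Written out, and assuming the $X_i$ are centered (which the martingale property requires), the two steps combine as follows; the second inequality is Jensen’s inequality for the convex map $u\mapsto u^{p/2}$, $p\ge2$, applied to the uniform average over the finite index set $A$:

$$
\mathbf{E}\Bigl[\Bigl|\sum_{i\in A}\alpha_iX_i\Bigr|^p\Bigr]\le c_p\,\mathbf{E}\Bigl[\Bigl(\sum_{i\in A}\alpha_i^2X_i^2\Bigr)^{p/2}\Bigr]\le c_p\,|A|^{\frac p2-1}\sum_{i\in A}|\alpha_i|^p\,\mathbf{E}\bigl[|X|^p\bigr].
$$

This is a sketch of how the cardinality $|A|$ can enter such a bound, not a verbatim restatement of the lemma.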

Proof of Theorem 3.2.

Actually, we already proved in [4] that

[TABLE]

Assume that $s$ and $t$ belong to the same sub-interval: there exists $l\in\{1,\ldots,N\}$ such that

[TABLE]

Then we have (see (18))

[TABLE]

Using Lemma 4.1, there exists a constant $c$ such that

[TABLE]

Note that $|(h_{k}^{m},h_{l}^{N})_{{I_{1,2}}}|\leq\sqrt{\frac{N}{m}}$ and there are at most $\frac{m}{N}+2$ terms such that $(h_{k}^{m},h_{l}^{N})_{{I_{1,2}}}$ is nonzero. Thus,

[TABLE]

as $m/N$ tends to infinity. Since $|t-s|\leq 1/N$ ,

[TABLE]

For $0\leq s\leq t\leq 1$, let $s_{+}^{N}:=\min\{l/N,\,s\leq l/N\}$ and $t_{-}^{N}:=\max\{l/N,\,t\geq l/N\}$. We have

[TABLE]

Note that for all $f\in W_{\eta,p}$, $\pi^{N}(f)$ is the linear interpolation of $f$ along the subdivision $\mathcal{D}_{N}$; hence, for $s,t\in\mathcal{D}_{N}$, $\pi^{N}(S^{m})_{s,t}=S^{m}_{s,t}$. Thus the middle term vanishes and we obtain

[TABLE]

From (20), we deduce that

[TABLE]

and the same holds for $\mathbf{E}\left[\|\pi^{N}(S^{m})_{t_{-}^{N},t}\|^{p}\right]$ . We infer from (19), (20) and (22) that

[TABLE]

A straightforward computation shows that

[TABLE]

The result follows from (23) and (24). ∎

4.1. Stein method

We wish to estimate

[TABLE]

using Stein’s method. For the sake of simplicity, we set

[TABLE]

The Stein-Dirichlet representation formula [6] states that, for any ${\tau_{0}}>0$,

[TABLE]

where

[TABLE]

The following is straightforward (see [4, Lemma 4.1]):

Lemma 4.2.

For any $(\eta,p)\in\Lambda$ , there exists a constant $c>0$ such that for any sequence of independent, centered random vectors $(X_{a},\,a\in{\mathcal{A}}^{m})$ such that $\mathbf{E}\left[\|X\|^{p}\right]<\infty$ , for any $f\in\operatorname{Lip}_{1}(W_{\eta,p})$ , we have

[TABLE]

We now show that, as usual, the rate of convergence in Stein’s method is related to the Lipschitz modulus of the second order derivative of the solution of Stein’s equation. Namely, we have

Lemma 4.3.

For any $f\in\operatorname{Lip}_{1}(W_{\eta,p})$ , we have

[TABLE]

Proof of Lemma 4.3.

Let $S^{m}_{\neg a}=S^{m}-X_{a}h^{m}_{a}$ . Since the $X_{a}$ ’s are independent,

[TABLE]

according to the Taylor formula. Since $\mathbf{E}\left[X_{a}^{2}\right]=1$ , we have

[TABLE]

The result follows by difference. ∎

The main difficulty, and hence the main contribution of this paper, is to find an estimate of

[TABLE]

for any $\varepsilon.$

Theorem 4.4.

There exists a constant $c$ such that for any $\tau>0$ , for any $v\in{\mathcal{V}}^{m}$ , for any $f\in\operatorname{Lip}({W}_{\eta,p})$ ,

[TABLE]

Proof of Theorem 4.4.

We know from [16, 4] that we have the following representation: for any $h\in{I_{1,2}}$ ,

[TABLE]

where

[TABLE]

and $\hat{B}$ is an independent copy of $B$ . Since the map $v$ is linear with respect to its three arguments,

[TABLE]

Hence,

[TABLE]

From Lemma 4.7, we know that

[TABLE]

for $m>8\,N$, and the same holds for the other conditional expectation. Using the Cauchy-Schwarz inequality in (27) and taking (28) into account, we obtain

[TABLE]

since $f_{N}$ belongs to $\operatorname{Lip}_{1}(W_{\eta,p})$ . Furthermore,

[TABLE]

We already know that

[TABLE]

and that at most two terms $\left\langle h_{a}^{m},h_{b}^{N}\right\rangle_{{I_{1,2}}}$ are non zero. Moreover, according to Lemma 2.1

[TABLE]

Thus,

[TABLE]

Plugging estimate (30) into estimate (29) yields estimate (25). ∎

According to (25) and Lemma 4.3, since the cardinality of ${\mathcal{A}}^{m}$ is $dm$ , we obtain the following theorem.

Theorem 4.5.

If $X_{a}$ belongs to $L^{p}$ , for any ${\tau_{0}}>0$ , there exists $c>0$ such that

[TABLE]

If we combine Lemma 4.2 and (31), we get

[TABLE]

Optimizing with respect to $\tau_{0}$ yields Theorem 3.3.

It remains to prove (28). For the sake of simplicity, we give the proof for $d=1$. The general situation is similar but requires more involved notation.

We recall that

[TABLE]

where

[TABLE]

Lemma 4.6.

The covariance matrix $\Gamma$ of the Gaussian vector $(G_{b}^{m,N},\,b=0,\cdots,N-1)$ is invertible and satisfies

[TABLE]

Proof.

Since the $h_{a}^{m}$ are orthogonal in $I_{1,2}$ (that is, their derivatives are orthogonal in $L^{2}$), for any $b,c\in\{0,\cdots,N-1\},$

[TABLE]

Since a sub-interval of $\mathcal{D}_{m}$ intersects at most two sub-intervals of $\mathcal{D}_{N}$ , the matrix $\Gamma$ is tridiagonal. Furthermore, we know that

[TABLE]

and for each $b$ , there are at least $(N/m-3)$ terms of this kind which are equal to $(N/m)^{-1/2}$ . Hence,

[TABLE]

Since $\Gamma$ is tridiagonal, this implies that it is invertible. Moreover, let $D$ be the diagonal matrix extracted from $\Gamma$ . We have proved that $\|D\|_{\infty}\geq 3/4.$

For $|b-c|=1$ , there is at most one term of the sum (34) which yields a non zero scalar product, hence

[TABLE]

Set $S=\Gamma-D$. The matrix $D^{-1}S$ has at most two nonzero entries per row and

[TABLE]

if $m>8N$ . By iteration, we get for any $k\geq 1$ ,

[TABLE]

Moreover,

[TABLE]

Thus,

[TABLE]

The proof is thus complete. ∎
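The inversion argument above is a Neumann series: writing $\Gamma=D(I+D^{-1}S)$ with $\|D^{-1}S\|_\infty<1$, one gets $\Gamma^{-1}=\sum_{k\ge0}(-D^{-1}S)^kD^{-1}$. The Python sketch below illustrates this on a hypothetical tridiagonal matrix whose diagonal entries lie in $[3/4,1]$ and whose off-diagonal entries are small (the entries are random and do not come from the covariance $\Gamma$ of the text), comparing the truncated series with `numpy.linalg.inv`.

```python
import numpy as np


def random_tridiagonal(size, off_scale, rng):
    """Hypothetical symmetric tridiagonal matrix: diagonal in [3/4, 1], off-diagonal in [-off_scale, off_scale]."""
    diag = rng.uniform(0.75, 1.0, size)
    off = rng.uniform(-off_scale, off_scale, size - 1)
    return np.diag(diag) + np.diag(off, 1) + np.diag(off, -1)


def iteration_matrix(gamma):
    """-D^{-1} S, where D is the diagonal part of gamma and S = gamma - D."""
    d = np.diag(np.diag(gamma))
    return -np.linalg.solve(d, gamma - d)


def neumann_inverse(gamma, terms):
    """Truncated Neumann series sum_{k < terms} (-D^{-1} S)^k D^{-1} for gamma^{-1}."""
    m = iteration_matrix(gamma)
    term = np.diag(1.0 / np.diag(gamma))     # k = 0 term: D^{-1}
    total = np.zeros_like(gamma)
    for _ in range(terms):
        total += term
        term = m @ term
    return total


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    gamma = random_tridiagonal(12, 0.1, rng)
    m = iteration_matrix(gamma)
    print("row-sum norm of D^{-1}S:", np.abs(m).sum(axis=1).max())
    print("max error:", np.abs(neumann_inverse(gamma, 60) - np.linalg.inv(gamma)).max())
```

With off-diagonal entries bounded by $0.1$ and diagonal entries at least $3/4$, the row-sum norm of $D^{-1}S$ is at most $0.27$, so the series converges geometrically; the same mechanism bounds the entries of $\Gamma^{-1}$ away from the diagonal.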

Lemma 4.7.

There exists a constant $c$ which depends only on the dimension $d$ such that for all $m,N$ with $m>8N$ , for any $a\in{\mathcal{A}}^{N}$

[TABLE]

Proof.

Using the framework of Gaussian vectors, for all $a\in\{0,\cdots,m-1\}$

[TABLE]

For any $c\in\{0,\cdots,N-1\}$ , on the one hand

[TABLE]

and on the other hand,

[TABLE]

This means that

[TABLE]

In view of Lemma 4.6, this entails that

[TABLE]

Once again we invoke (35) and the fact that at most two of the terms $\left\langle h_{a}^{m},\ h_{c}^{N}\right\rangle_{{I_{1,2}}}$ are non zero for a fixed $a$ , to deduce that

[TABLE]

Now, by the very definition of the conditional expectation,

[TABLE]

Hence,

[TABLE]

according to (37). The constant $8$ has to be modified when $d>1$ . ∎

Bibliography


[1] R. A. Adams and J. J. F. Fournier. Sobolev Spaces. Academic Press, June 2003.
[2] H. Brézis. Analyse fonctionnelle. Masson, 1987.
[3] L. Coutin and L. Decreusefond. Stein’s method for Brownian approximations. Communications on Stochastic Analysis, 7(3):349–372, September 2013.
[4] L. Coutin and L. Decreusefond. Convergence rate in the rough Donsker theorem. arXiv:1707.01269 [math], July 2017.
[5] L. Decreusefond. Stochastic integration with respect to Volterra processes. Annales de l’Institut Henri Poincaré (B) Probability and Statistics, 41(2):123–149, March 2005.
[6] L. Decreusefond. The Stein-Dirichlet-Malliavin method. ESAIM: Proceedings, page 11, 2015.
[7] L. Decreusefond and H. Halconruy. Malliavin and Dirichlet structures for independent random variables. Stochastic Processes and their Applications, August 2018.
[8] R. M. Dudley. Real Analysis and Probability, volume 74 of Cambridge Studies in Advanced Mathematics. Cambridge University Press, Cambridge, 2002.
