Stein's method for normal approximation in Wasserstein distances with application to the multivariate Central Limit Theorem

Thomas Bonis

arXiv:1905.13615·math.PR·May 12, 2020

TL;DR

This paper develops Stein's method to bound Wasserstein distances for normal approximation, providing optimal convergence rates for the multivariate CLT under minimal moment conditions.

Contribution

It introduces a novel approach using stochastic processes to bound Wasserstein distances of any order, extending Stein's method for multivariate normal approximation.

Findings

01. Bounds the Wasserstein distance of order 2 using a stochastic process

02. Extends the bounds to Wasserstein distances of any order p ≥ 1

03. Provides optimal convergence rates for the multivariate CLT

Abstract

We use Stein's method to bound the Wasserstein distance of order $2$ between a measure $ν$ and the Gaussian measure using a stochastic process $(X_{t})_{t \geq 0}$ such that $X_{t}$ is drawn from $ν$ for any $t > 0$ . If the stochastic process $(X_{t})_{t \geq 0}$ satisfies an additional exchangeability assumption, we show it can also be used to obtain bounds on Wasserstein distances of any order $p \geq 1$ . Using our results, we provide optimal convergence rates for the multi-dimensional Central Limit Theorem in terms of Wasserstein distances of any order $p \geq 2$ under simple moment assumptions.

Equations

$$W_{p}(\nu,\mu)^{p}=\inf_{\pi}\int_{\mathbb{R}^{d}\times\mathbb{R}^{d}}\|y-x\|^{p}\,\pi(dx,dy),$$

$$W_{p}(\nu_{n},\gamma)\leq C_{p}\frac{\mathbb{E}[|X_{1}|^{p+2}]^{1/p}}{\sqrt{n}}.$$

$$W_{2}(\nu_{n},\gamma)\leq C\beta\sqrt{\frac{d\log n}{n}}.$$

$$W_{2}(\nu_{n},\gamma)\leq C\beta\sqrt{\frac{d}{n}}$$

$$W_{2}(\nu_{n},\gamma)\leq C_{2}\sqrt{\frac{Cd}{n}}.$$

$$W_{2}(\nu_{n},\gamma)\leq\sqrt{\frac{(C-1)d}{n}}$$

$$\int_{\mathbb{R}^{d}}-x\cdot\nabla\phi(x)+\langle\tau_{\nu}(x),\nabla^{2}\phi(x)\rangle_{HS}\,d\nu(x)=0,$$

$$W_{2}(\nu,\gamma)^{2}\leq\int_{\mathbb{R}^{d}}\|\tau_{\nu}-I_{d}\|_{HS}^{2}\,d\nu,$$

$$\int_{\mathbb{R}^{d}}-x\phi(x)+\tau_{\nu}(x)\nabla\phi(x)\,d\nu(x)=0,$$

$$W_{p}(\nu,\gamma)^{p}\leq C_{p}\int_{\mathbb{R}^{d}}\|\tau_{\nu}-I_{d}\|_{p}^{p}\,d\nu,$$

$$\forall\phi\in\mathcal{C}^{\infty}_{c},\quad\int_{\mathbb{R}^{d}}\mathcal{L}_{\nu}\phi\,d\nu=0,$$

$$\forall\phi\in\mathcal{C}^{\infty}_{c},\ x\in\mathbb{R}^{d},\quad\mathcal{L}_{\gamma}\phi(x)=-x\cdot\nabla\phi(x)+\langle I_{d},\nabla^{2}\phi(x)\rangle_{HS}.$$

$$\forall\phi\in\mathcal{C}^{\infty}_{c},\ x\in\mathbb{R}^{d},\quad\mathcal{L}_{\nu}\phi(x)=\frac{1}{s}\mathbb{E}[(X^{\prime}-X)(\phi(X^{\prime})+\phi(X))\mid X=x],$$

$$\forall\phi\in\mathcal{C}^{\infty}_{c},\ x\in\mathbb{R}^{d},\quad\mathcal{L}_{\nu}\phi(x)=\frac{1}{s}\mathbb{E}\left[\int_{0}^{X^{\prime}}\phi(y)\,dy-\int_{0}^{X}\phi(y)\,dy\;\middle|\;X=x\right].$$

$$\forall\phi\in\mathcal{C}^{\infty}_{c},\ \forall x\in\mathbb{R}^{d},\quad\mathcal{L}_{\nu}\phi(x)=\frac{1}{s}\mathbb{E}[\phi(X^{\prime})-\phi(X)\mid X=x].$$

$$s\mathcal{L}_{\nu}\phi(x)=\mathbb{E}\left[(X^{\prime}-X)\cdot\nabla\phi(X)+\frac{1}{2}\langle(X^{\prime}-X)(X^{\prime}-X)^{T},\nabla^{2}\phi(X)\rangle_{HS}\;\middle|\;X=x\right]+O(\mathbb{E}[\|X^{\prime}-X\|^{3}\mid X=x]).$$

$$\forall t>0,\ \phi\in\mathcal{C}^{\infty}_{c},\ x\in\mathbb{R}^{d},\quad(\mathcal{L}_{\nu})_{t}\phi(x)=\frac{1}{s}\mathbb{E}[\phi(X_{t})-\phi(X_{0})\mid X_{0}=x].$$

$$\forall t\geq 0,\quad(S_{n})_{t}=S_{n}+\frac{(X^{\prime}_{I}-X_{I})\,1_{\|X_{I}\|\vee\|X^{\prime}_{I}\|\leq\sqrt{n(e^{2t}-1)}}}{\sqrt{n}}$$

$$W_{2}(\nu_{n},\gamma)\leq\frac{C\,d^{1/4}\,\|\mathbb{E}[X_{1}X_{1}^{T}\|X_{1}\|^{2}]\|_{HS}^{1/2}}{\sqrt{n}}.$$

$$W_{p}(\nu_{n},\gamma)\leq C_{p}\frac{d^{1/4}\,\|\mathbb{E}[X_{1}X_{1}^{T}\|X_{1}\|^{2}]\|_{HS}^{1/2}+\mathbb{E}[\|X_{1}\|^{p+2}]^{1/p}}{\sqrt{n}}.$$

$$\alpha=(\alpha_{1},\alpha_{2},\dots,\alpha_{d}).$$

$$|\alpha|\coloneqq\sum_{i=1}^{d}\alpha_{i}$$

$$\alpha!\coloneqq\prod_{i=1}^{d}\alpha_{i}!.$$

$$x^{\alpha}\coloneqq\prod_{i=1}^{d}x_{i}^{\alpha_{i}}.$$

$$\forall|\alpha|=k,\quad(x^{\otimes k})_{\alpha}\coloneqq x^{\alpha}.$$

$$\langle x,y\rangle\coloneqq\sum_{|\alpha|=k}\frac{k!}{\alpha!}x_{\alpha}y_{\alpha},$$

$$\|x\|^{2}\coloneqq\sum_{|\alpha|=k}\frac{k!}{\alpha!}x_{\alpha}^{2}.$$

$$\|x^{\otimes k}\|=\|x\|^{k}.$$

$$\partial^{\alpha}\phi=\frac{\partial^{\alpha_{1}}}{\partial x_{1}^{\alpha_{1}}}\frac{\partial^{\alpha_{2}}}{\partial x_{2}^{\alpha_{2}}}\dots\frac{\partial^{\alpha_{d}}}{\partial x_{d}^{\alpha_{d}}}\phi.$$

$$\forall|\alpha|=k,\quad(\nabla^{k}\phi(x))_{\alpha}\coloneqq\partial^{\alpha}\phi.$$

Full text

Stein’s method for normal approximation in Wasserstein distances with application to the multivariate Central Limit Theorem

Thomas Bonis

DataShape team, Inria Saclay, Université Paris-Saclay, Paris, France

1 Introduction

Consider $n$ independent and, for simplicity, identically distributed random variables $X_{1},\dots,X_{n}$ taking values in $\mathbb{R}^{d}$ such that $\mathbb{E}[X_{1}]=0$ and $\mathbb{E}[X_{1}X_{1}^{T}]=I_{d}$ . By the Central Limit Theorem, it is well-known that, as $n$ grows to infinity, the law $\nu_{n}$ of $S_{n}=\frac{1}{\sqrt{n}}\sum_{i=1}^{n}X_{i}$ converges to the $d$ -dimensional Gaussian measure $\gamma$ . In order to strengthen this result, one can quantify this convergence for a given distance on the space of measures on $\mathbb{R}^{d}$ . Let us consider the family of Wasserstein distances of order $p\geq 1$ , defined between any two measures $\mu$ and $\nu$ with finite moment of order $p$ by

$$W_{p}(\nu,\mu)^{p}=\inf_{\pi}\int_{\mathbb{R}^{d}\times\mathbb{R}^{d}}\|y-x\|^{p}\,\pi(dx,dy),$$

where $\|\cdot\|$ denotes the Euclidean norm and $\pi$ is a measure on $\mathbb{R}^{d}\times\mathbb{R}^{d}$ with marginals $\mu$ and $\nu$ . In the univariate setting, rates of convergence for these distances have been obtained in [12] for $p\in[1,2]$ and in [2] for $p>2$ . More precisely, for any $p\geq 1$ , there exists a constant $C_{p}>0$ such that

$$W_{p}(\nu_{n},\gamma)\leq C_{p}\frac{\mathbb{E}[|X_{1}|^{p+2}]^{1/p}}{\sqrt{n}}.\tag{1}$$

Furthermore, Theorem 5.1 [12] guarantees this bound to be tight in the general case. In the multivariate setting, convergence rates for the Wasserstein distance of order $2$ have been obtained under the assumption that $\|X_{1}\|\leq\beta$ with $\beta>0$ , see [17] and [4], in which case there exists $C>0$ such that

$$W_{2}(\nu_{n},\gamma)\leq C\beta\sqrt{\frac{d\log n}{n}}.$$

As this result is $\sqrt{\log n}$ short of optimality in the one-dimensional case, it is conjectured in [17] that

$$W_{2}(\nu_{n},\gamma)\leq C\beta\sqrt{\frac{d}{n}}\tag{2}$$

and such a bound is known to be matched thanks to Proposition 2 [17]. Let us note that, since $\beta$ is greater than $\sqrt{d}$ , this bound scales at least linearly with respect to the dimension which is probably suboptimal in many cases. Indeed, whenever the coordinates of the $X_{i}$ are i.i.d. random variables with fourth moment equal to $C>0$ , one can use (1) to obtain the following bound, scaling with $\sqrt{d}$ ,

$$W_{2}(\nu_{n},\gamma)\leq C_{2}\sqrt{\frac{Cd}{n}}.$$

This optimal scaling with respect to the dimension as well as the optimal dependency in $n$ can be obtained whenever the measure of the $X_{i}$ satisfies a Poincaré inequality with constant $C\geq 1$ in which case Theorem 4.1 [3] guarantees that

$$W_{2}(\nu_{n},\gamma)\leq\sqrt{\frac{(C-1)d}{n}}\tag{3}$$

and similar bounds have also been obtained for Wasserstein distances of any order $p\geq 1$ in [5]. However, for a measure to satisfy a Poincaré inequality is a strong assumption compared to the simple moment assumption required in the univariate case.
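As a numerical illustration of the univariate rate (1), the following sketch computes $W_{p}(\nu_{n},\gamma)$ in dimension one when the $X_{i}$ are Rademacher signs, a choice made here purely for illustration. It relies on the standard one-dimensional fact that the optimal coupling pairs quantiles, so that $W_{p}^{p}$ is the integral over $u\in(0,1)$ of $|F_{\nu_{n}}^{-1}(u)-\Phi^{-1}(u)|^{p}$, and approximates this integral with a midpoint rule. The function names and the grid size are assumptions of this sketch, not objects from the paper.

```python
import math
from statistics import NormalDist


def rademacher_sum_law(n):
    """Atoms and cumulative probabilities of S_n = n^{-1/2} * (sum of n signs)."""
    values = [(2 * k - n) / math.sqrt(n) for k in range(n + 1)]
    cdf, acc = [], 0.0
    for k in range(n + 1):
        acc += math.comb(n, k) / 2.0 ** n
        cdf.append(acc)
    return values, cdf


def wasserstein_to_gaussian(n, p=2, grid=20000):
    """Midpoint-rule approximation of W_p(nu_n, gamma) via quantile coupling."""
    values, cdf = rademacher_sum_law(n)
    gauss = NormalDist()
    total, j = 0.0, 0
    for i in range(grid):
        u = (i + 0.5) / grid
        # Generalized inverse of the discrete cdf (guarded against rounding).
        while j < n and cdf[j] < u:
            j += 1
        total += abs(values[j] - gauss.inv_cdf(u)) ** p
    return (total / grid) ** (1.0 / p)


if __name__ == "__main__":
    for n in (16, 64, 256):
        w = wasserstein_to_gaussian(n)
        print(n, w, math.sqrt(n) * w)
```

If the rate in (1) is sharp for this example, the last printed column, $\sqrt{n}\,W_{2}(\nu_{n},\gamma)$, should stay of order one as $n$ grows.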

Inequality (3) is derived through an approach introduced in [8] relying on an object called a Stein kernel. Given a probability measure $\nu$ supported on $\mathbb{R}^{d}$, a Stein kernel for $\nu$ is a matrix-valued function $\tau_{\nu}$ such that, for any smooth function $\phi$ with compact support,

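$$\int_{\mathbb{R}^{d}}-x\cdot\nabla\phi(x)+\langle\tau_{\nu}(x),\nabla^{2}\phi(x)\rangle_{HS}\,d\nu(x)=0,$$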

where $\langle\cdot,\cdot\rangle_{HS}$ is the Hilbert-Schmidt scalar product and $\nabla^{2}\phi$ denotes the Hessian matrix of $\phi$ . Since $\nu$ is equal to the Gaussian measure $\gamma$ if and only if $\tau_{\nu}=I_{d}$ , one can expect $\nu$ to be close to $\gamma$ whenever $\tau_{\nu}$ is close to $I_{d}$ . This intuition is formalized by the following bound, obtained in Proposition 3.1 [8],

$$W_{2}(\nu,\gamma)^{2}\leq\int_{\mathbb{R}^{d}}\|\tau_{\nu}-I_{d}\|_{HS}^{2}\,d\nu,$$

where $\|\cdot\|_{HS}$ is the Hilbert-Schmidt norm. Furthermore, if $\tau_{\nu}$ also verifies

$$\int_{\mathbb{R}^{d}}-x\phi(x)+\tau_{\nu}(x)\nabla\phi(x)\,d\nu(x)=0,$$

for any suitable function $\phi$ , then, by Proposition 3.4 [8], one also has

$$W_{p}(\nu,\gamma)^{p}\leq C_{p}\int_{\mathbb{R}^{d}}\|\tau_{\nu}-I_{d}\|_{p}^{p}\,d\nu,$$

where $\|\cdot\|_{p}$ is the Schatten $p$ -norm and $C_{p}>0$ is a constant depending only on $p$ . However, as Stein kernels do not necessarily exist for general measures and can be difficult to compute whenever they do exist, they are not an adequate tool to generalize (1).
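To make the notion of Stein kernel concrete, here is a small one-dimensional check. It assumes the standard explicit formula $\tau_{\nu}(x)=\rho(x)^{-1}\int_{x}^{\infty}y\rho(y)\,dy$ for a centred measure with density $\rho$, which is not stated in the text above. For the uniform law on $[-\sqrt{3},\sqrt{3}]$, which has unit variance, this gives $\tau_{\nu}(x)=(3-x^{2})/2$. The sketch verifies the defining identity with the test function $\phi=\exp$ (not compactly supported, which is harmless here because $\nu$ has bounded support) and evaluates $\int(\tau_{\nu}-1)^{2}\,d\nu$. All names are illustrative.

```python
import math

A = math.sqrt(3.0)  # the uniform law on [-A, A] has mean 0 and variance 1


def stein_kernel(x):
    """Assumed closed form of the Stein kernel of the uniform law on [-A, A]."""
    return (3.0 - x * x) / 2.0


def expect_uniform(f, grid=100000):
    """Midpoint-rule approximation of E[f(X)] for X uniform on [-A, A]."""
    h = 2.0 * A / grid
    return sum(f(-A + (i + 0.5) * h) for i in range(grid)) / grid


def stein_identity_residual():
    """E[-X phi'(X) + tau(X) phi''(X)] with phi = exp, so phi' = phi'' = exp."""
    return expect_uniform(lambda x: (-x + stein_kernel(x)) * math.exp(x))


def stein_discrepancy():
    """E[(tau(X) - 1)^2], the right-hand side of the W_2 bound in dimension one."""
    return expect_uniform(lambda x: (stein_kernel(x) - 1.0) ** 2)


if __name__ == "__main__":
    print(stein_identity_residual(), stein_discrepancy())
```

A direct computation gives $\int(\tau_{\nu}-1)^{2}\,d\nu=\frac{1}{4}(\mathbb{E}[X^{4}]-1)=\frac{1}{5}$ for this law, so the Stein kernel bound yields $W_{2}(\nu,\gamma)^{2}\leq 1/5$ in this example.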

In this work, we wish to apply the approach developed in [8] by replacing Stein kernels with more practical operators $\mathcal{L}_{\nu}$ satisfying the following property

$$\forall\phi\in\mathcal{C}^{\infty}_{c},\quad\int_{\mathbb{R}^{d}}\mathcal{L}_{\nu}\phi\,d\nu=0,$$

where $\mathcal{C}^{\infty}_{c}$ denotes the space of smooth functions with compact support. When an operator $\mathcal{L}_{\nu}$ verifies this property, in which case we say $\nu$ is invariant under $\mathcal{L}_{\nu}$ , one can expect $\nu$ to be close to $\gamma$ as soon as $\mathcal{L}_{\nu}$ is similar to the operator $\mathcal{L}_{\gamma}$ defined by

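$$\forall\phi\in\mathcal{C}^{\infty}_{c},\ x\in\mathbb{R}^{d},\quad\mathcal{L}_{\gamma}\phi(x)=-x\cdot\nabla\phi(x)+\langle I_{d},\nabla^{2}\phi(x)\rangle_{HS}.$$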

There are many ways to obtain operators $\mathcal{L}_{\nu}$ under which $\nu$ is invariant; in fact, such operators have been extensively used in Stein’s method. For instance, the original approach of Stein [14] and its extension to the multidimensional setting [11] use pairs of random variables $(X,X^{\prime})$ both drawn from $\nu$ and such that $(X,X^{\prime})$ and $(X^{\prime},X)$ follow the same law. Given such a pair of random variables $(X,X^{\prime})$ , which is called an exchangeable pair, $\nu$ is invariant under the operator $\mathcal{L}_{\nu}$ defined by

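$$\forall\phi\in\mathcal{C}^{\infty}_{c},\ x\in\mathbb{R}^{d},\quad\mathcal{L}_{\nu}\phi(x)=\frac{1}{s}\mathbb{E}[(X^{\prime}-X)(\phi(X^{\prime})+\phi(X))\mid X=x],\tag{4}$$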

where $s>0$ is a rescaling factor. This operator $\mathcal{L}_{\nu}$ can then be compared to $\mathcal{L}_{\gamma}$ using a Taylor expansion. In fact, one does not even need an exchangeable pair to apply Stein’s method in dimension one. Indeed, as shown by [13], one can use two random variables $X,X^{\prime}$ , both drawn from $\nu$ but not necessarily forming an exchangeable pair, to construct operators of the form

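$$\forall\phi\in\mathcal{C}^{\infty}_{c},\ x\in\mathbb{R}^{d},\quad\mathcal{L}_{\nu}\phi(x)=\frac{1}{s}\mathbb{E}\left[\int_{0}^{X^{\prime}}\phi(y)\,dy-\int_{0}^{X}\phi(y)\,dy\;\middle|\;X=x\right].\tag{5}$$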

Similarly, many other constructs used to apply Stein’s method such as zero-bias coupling [6] and size-bias coupling [7] correspond to operators under which $\nu$ is invariant.

Among these various operators, those defined in (5) are perhaps the easiest to obtain as they can be constructed from any two random variables $X,X^{\prime}$ both drawn from the measure $\nu$ . However, since there is no notion of primitive functions in higher dimension, such operators are restricted to the univariate setting. Still, in the multidimensional setting, one can use any two random variables $X$ and $X^{\prime}$ drawn from $\nu$ to define an operator under which $\nu$ is invariant by taking

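$$\forall\phi\in\mathcal{C}^{\infty}_{c},\ \forall x\in\mathbb{R}^{d},\quad\mathcal{L}_{\nu}\phi(x)=\frac{1}{s}\mathbb{E}[\phi(X^{\prime})-\phi(X)\mid X=x].$$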

Then, given any $\phi\in\mathcal{C}^{\infty}_{c}$ , one can use a Taylor expansion to obtain

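$$s\mathcal{L}_{\nu}\phi(x)=\mathbb{E}\left[(X^{\prime}-X)\cdot\nabla\phi(X)+\frac{1}{2}\langle(X^{\prime}-X)(X^{\prime}-X)^{T},\nabla^{2}\phi(X)\rangle_{HS}\;\middle|\;X=x\right]+O(\mathbb{E}[\|X^{\prime}-X\|^{3}\mid X=x]).$$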

Thus, one can expect that if

• $\frac{\mathbb{E}[X^{\prime}-X\mid X]}{s}\approx-X$;

• $\frac{\mathbb{E}[(X^{\prime}-X)(X^{\prime}-X)^{T}\mid X]}{2s}\approx I_{d}$ and

• $\frac{\|X^{\prime}-X\|^{3}}{s}\approx 0$

then $\mathcal{L}_{\nu}$ would be similar to $\mathcal{L}_{\gamma}$ and thus $\nu$ be close to $\gamma$ . However, one cannot prove such a result by applying the approach of [8] to such operators. Instead, we use stochastic processes $(X_{t})_{t\geq 0}$ such that $X_{t}$ is drawn from $\nu$ for any $t\geq 0$ and such that $\mathbb{E}[\|X_{t}-X_{0}\|]$ does not grow too fast with respect to $t$ to define a family of operators under which $\nu$ is invariant by taking

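$$\forall t>0,\ \phi\in\mathcal{C}^{\infty}_{c},\ x\in\mathbb{R}^{d},\quad(\mathcal{L}_{\nu})_{t}\phi(x)=\frac{1}{s}\mathbb{E}[\phi(X_{t})-\phi(X_{0})\mid X_{0}=x].\tag{7}$$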

In Theorem 2, we derive bounds for the Wasserstein distance of order $2$ between $\nu$ and the Gaussian measure from such a family of operators. We also provide bounds on Wasserstein distances of any order $p\geq 1$ for one-dimensional normal approximation in Theorem 7 and for multidimensional normal approximation in Theorem 9. This latter result uses a family of operators of the form (4) and thus requires the pairs $(X_{t},X_{0})$ and $(X_{0},X_{t})$ to follow the same law for any $t>0$ . Let us note that, while we mostly focus on operators defined in (7), proofs of our results can easily be adapted to other operators $\mathcal{L}_{\nu}$ under which $\nu$ is invariant such as size-bias or zero-bias couplings.
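The first two heuristic conditions listed above can be checked by hand for the pair obtained from a normalized sum $S=n^{-1/2}\sum_{i}X_{i}$ by resampling one uniformly chosen summand, with $s=1/n$. The sketch below does so in dimension one for Rademacher summands (an illustrative choice), computing the conditional moments exactly by enumerating the resampled index and the fresh sign. The function name is hypothetical.

```python
import math


def conditional_moments(signs):
    """Exact E[S' - S | X] / s and E[(S' - S)^2 | X] / (2 s) for s = 1 / n.

    `signs` holds X_1, ..., X_n in {-1, +1}; S' is obtained from S by replacing
    one uniformly chosen X_I with an independent fresh sign.
    """
    n = len(signs)
    s = 1.0 / n
    drift, second = 0.0, 0.0
    for x_i in signs:                  # index I is uniform on {1, ..., n}
        for x_new in (-1.0, 1.0):      # fresh sign, each value with probability 1/2
            weight = 1.0 / (2 * n)
            delta = (x_new - x_i) / math.sqrt(n)
            drift += weight * delta
            second += weight * delta ** 2
    return drift / s, second / (2.0 * s)


if __name__ == "__main__":
    signs = [1, -1, 1, 1, -1, 1, 1, 1, -1]
    s_n = sum(signs) / math.sqrt(len(signs))
    print(s_n, conditional_moments(signs))
```

In this example the rescaled drift equals $-S$ and the rescaled second moment equals $1$ exactly, while $|S^{\prime}-S|^{3}/s$ is at most $8/\sqrt{n}$, so all three conditions hold up to an error of order $n^{-1/2}$.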

Our results can be readily applied to obtain rates in the Central Limit Theorem. Indeed, letting $X^{\prime}_{1},\dots,X^{\prime}_{n}$ be independent copies of $X_{1},\dots,X_{n}$ and $I$ be a uniform random variable on $\{1,\dots,n\}$ , the stochastic process $((S_{n})_{t})_{t\geq 0}$ defined by

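$$\forall t\geq 0,\quad(S_{n})_{t}=S_{n}+\frac{(X^{\prime}_{I}-X_{I})\,1_{\|X_{I}\|\vee\|X^{\prime}_{I}\|\leq\sqrt{n(e^{2t}-1)}}}{\sqrt{n}}$$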

is such that $((S_{n})_{t},(S_{n})_{0})$ and $((S_{n})_{0},(S_{n})_{t})$ follow the same law for any $t\geq 0$ . Applying our results to this stochastic process, we obtain the following bounds.

Theorem 1.

Under the above setting, if $\mathbb{E}[\|X_{1}\|^{4}]<\infty$ , then there exists $C<14$ such that

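$$W_{2}(\nu_{n},\gamma)\leq\frac{C\,d^{1/4}\,\|\mathbb{E}[X_{1}X_{1}^{T}\|X_{1}\|^{2}]\|_{HS}^{1/2}}{\sqrt{n}}.\tag{8}$$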

Furthermore, if $\mathbb{E}[\|X_{1}\|^{p+2}]<\infty$ for $p\geq 2$ , then there exists $C_{p}>0$ depending only on $p$ and such that

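$$W_{p}(\nu_{n},\gamma)\leq C_{p}\frac{d^{1/4}\,\|\mathbb{E}[X_{1}X_{1}^{T}\|X_{1}\|^{2}]\|_{HS}^{1/2}+\mathbb{E}[\|X_{1}\|^{p+2}]^{1/p}}{\sqrt{n}}.\tag{9}$$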

This result both proves (2) and generalizes (1). However, our bound still scales at least linearly with respect to the dimension $d$ and thus fails to generalize (3) which can scale with $\sqrt{d}$ . Our approach can also be used to obtain more general results, presented in Theorems 11 and 12, which only require the random variables $X_{1},\dots,X_{n}$ to be independent and provide intermediary rates of convergence under weaker moment assumptions.
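To illustrate the dependence on the dimension, the following sketch evaluates the factor $d^{1/4}\|\mathbb{E}[X_{1}X_{1}^{T}\|X_{1}\|^{2}]\|_{HS}^{1/2}$ of Theorem 1 when $X_{1}$ has independent Rademacher coordinates, an example chosen here for convenience. The expectation is computed exactly by enumerating the $2^{d}$ sign vectors, so the sketch is only meant for small $d$; the function name is hypothetical.

```python
import itertools
import math


def dimension_factor(d):
    """d^{1/4} * ||E[X X^T |X|^2]||_HS^{1/2} for X uniform on {-1, +1}^d."""
    points = list(itertools.product((-1.0, 1.0), repeat=d))
    moment = [[0.0] * d for _ in range(d)]
    for x in points:
        sq_norm = sum(c * c for c in x)
        for i in range(d):
            for j in range(d):
                moment[i][j] += x[i] * x[j] * sq_norm / len(points)
    # Hilbert-Schmidt norm of the d x d moment matrix.
    hs_norm = math.sqrt(sum(v * v for row in moment for v in row))
    return d ** 0.25 * math.sqrt(hs_norm)


if __name__ == "__main__":
    for d in (1, 2, 3, 4, 5):
        print(d, dimension_factor(d))
```

Here $\|X_{1}\|^{2}=d$ almost surely, so the moment matrix is $d\,I_{d}$ and the factor equals $d$, in line with the linear scaling in the dimension discussed above.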

The paper is organized as follows. In Section 2, we introduce the notations used in the paper. In Section 3, we present the main arguments we use to apply Stein’s method and obtain bounds on the Wasserstein distance of order $2$ in normal approximation. The approach followed to obtain bounds on Wasserstein distances of any order $p$ is then detailed in Section 4. The computations required to apply our general Wasserstein bounds to obtain rates of convergence in the Central Limit Theorem are presented in Sections 5 and 6. Finally, Sections 7 and 8 contain technical results and approximation arguments used in the course of this paper.

2 Notations and definitions

Let $d$ be a positive integer. A $d$ -dimensional multi-index $\alpha$ is a $d$ -tuple of non-negative integers

$$\alpha=(\alpha_{1},\alpha_{2},\dots,\alpha_{d}).$$

The absolute value of a multi-index $\alpha$ is given by

$$|\alpha|\coloneqq\sum_{i=1}^{d}\alpha_{i}$$

and its factorial by

$$\alpha!\coloneqq\prod_{i=1}^{d}\alpha_{i}!.$$

For any $x\in\mathbb{R}^{d}$ and any multi-index $\alpha$ , let

$$x^{\alpha}\coloneqq\prod_{i=1}^{d}x_{i}^{\alpha_{i}}.$$

For any $k\in\mathbb{N},x\in\mathbb{R}^{d}$ , we denote by $x^{\otimes k}$ the family indexed by multi-indices with absolute value $k$ and such that

$$\forall|\alpha|=k,\quad(x^{\otimes k})_{\alpha}\coloneqq x^{\alpha}.$$

In this work, we identify any symmetric matrix $M$ with the family indexed by multi-indices with absolute value $2$ by taking $M_{\alpha}\coloneqq M_{i,i}$ when $\alpha(i)=2$ and $M_{\alpha}\coloneqq M_{i,j}$ when $\alpha(i)=\alpha(j)=1$. Let $\langle\cdot,\cdot\rangle$ be the Hilbert-Schmidt scalar product defined between any two families $(x_{\alpha})_{|\alpha|=k},(y_{\alpha})_{|\alpha|=k}$ by

$$\langle x,y\rangle\coloneqq\sum_{|\alpha|=k}\frac{k!}{\alpha!}x_{\alpha}y_{\alpha},$$

and, by extension,

$$\|x\|^{2}\coloneqq\sum_{|\alpha|=k}\frac{k!}{\alpha!}x_{\alpha}^{2}.$$

Let us remark that, for any $k\in\mathbb{N},x\in\mathbb{R}^{d}$ , we have

$$\|x^{\otimes k}\|=\|x\|^{k}.$$
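The multi-index conventions of this Section can be sketched in a few lines. The code below enumerates the multi-indices of absolute value $k$, builds $x^{\otimes k}$ as the family $(x^{\alpha})_{|\alpha|=k}$, and evaluates the scalar product with weights $k!/\alpha!$, so that the relation between the norm of $x^{\otimes k}$ and the Euclidean norm of $x$ can be checked numerically. The function names are illustrative, and the brute-force enumeration is only meant for small $d$ and $k$.

```python
import itertools
import math


def multi_indices(d, k):
    """All d-tuples of non-negative integers whose entries sum to k."""
    return [a for a in itertools.product(range(k + 1), repeat=d) if sum(a) == k]


def tensor_power(x, k):
    """The family (x^alpha)_{|alpha| = k}, i.e. x^{tensor k} in multi-index form."""
    return {a: math.prod(xi ** ai for xi, ai in zip(x, a))
            for a in multi_indices(len(x), k)}


def inner(u, v, k):
    """Scalar product sum over |alpha| = k of (k! / alpha!) * u_alpha * v_alpha."""
    total = 0.0
    for a in u:
        alpha_factorial = math.prod(math.factorial(ai) for ai in a)
        total += math.factorial(k) / alpha_factorial * u[a] * v[a]
    return total


if __name__ == "__main__":
    x, k = (1.0, -2.0, 0.5), 3
    xk = tensor_power(x, k)
    euclidean = math.sqrt(sum(c * c for c in x))
    print(math.sqrt(inner(xk, xk, k)), euclidean ** k)
```

The two printed values agree because, by the multinomial theorem, the weighted sum of the $x^{2\alpha}$ over $|\alpha|=k$ equals $(\sum_{i}x_{i}^{2})^{k}$.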

Let $\mathcal{C}^{k}$ be the set of functions from $\mathbb{R}^{d}$ to $\mathbb{R}$ with partial derivatives of order $k\in\mathbb{N}$ and let $\mathcal{C}^{k}_{c}$ be the set of such functions with compact support. For any multi-index $\alpha$ and any $\phi\in\mathcal{C}^{|\alpha|}$, let

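$$\partial^{\alpha}\phi=\frac{\partial^{\alpha_{1}}}{\partial x_{1}^{\alpha_{1}}}\frac{\partial^{\alpha_{2}}}{\partial x_{2}^{\alpha_{2}}}\dots\frac{\partial^{\alpha_{d}}}{\partial x_{d}^{\alpha_{d}}}\phi.$$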

Let $\nabla^{k}\phi(x)\in(\mathbb{R}^{d})^{\otimes k}$ be the $k$ -th gradient of $\phi$ at $x$ defined by

$$\forall|\alpha|=k,\quad(\nabla^{k}\phi(x))_{\alpha}\coloneqq\partial^{\alpha}\phi.$$

Let $\gamma$ denote the $d$-dimensional Gaussian measure and let $\mathcal{L}_{\gamma}$ be the operator defined by

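$$\forall\phi\in\mathcal{C}^{\infty}_{c},\ x\in\mathbb{R}^{d},\quad\mathcal{L}_{\gamma}\phi(x)=-x\cdot\nabla\phi(x)+\langle I_{d},\nabla^{2}\phi(x)\rangle_{HS}.$$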

This operator is the infinitesimal generator of the Ornstein-Uhlenbeck semigroup $(P_{t})_{t\geq 0}$ whose reversible measure is $\gamma$ ; see e.g. [1] for a thorough presentation of this semigroup and its properties.
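For readers who prefer a computational view, the sketch below evaluates the Ornstein-Uhlenbeck semigroup in dimension one through the representation $P_{t}\phi(x)=\mathbb{E}[\phi(e^{-t}x+\sqrt{1-e^{-2t}}Z)]$ with $Z$ standard normal, which is the form used later in the paper via the variable $F_{t}$. The expectation is approximated by a midpoint rule in quantile space, an implementation choice of this sketch, and compared with the closed form $P_{t}\sin(x)=e^{-(1-e^{-2t})/2}\sin(e^{-t}x)$.

```python
import math
from statistics import NormalDist

_GAUSS = NormalDist()


def ou_semigroup(phi, t, x, grid=20000):
    """Approximate P_t phi(x) = E[phi(e^{-t} x + sqrt(1 - e^{-2t}) Z)] in dimension one."""
    a = math.exp(-t)
    b = math.sqrt(1.0 - math.exp(-2.0 * t))
    total = 0.0
    for i in range(grid):
        z = _GAUSS.inv_cdf((i + 0.5) / grid)  # Gaussian quantile at a midpoint
        total += phi(a * x + b * z)
    return total / grid


def ou_sine_closed_form(t, x):
    """Closed form of P_t sin(x), from E[sin(c + b Z)] = sin(c) exp(-b^2 / 2)."""
    b_squared = 1.0 - math.exp(-2.0 * t)
    return math.exp(-b_squared / 2.0) * math.sin(math.exp(-t) * x)


if __name__ == "__main__":
    t, x = 0.7, 0.3
    print(ou_semigroup(math.sin, t, x), ou_sine_closed_form(t, x))
```

Letting $t$ grow in the closed form shows $P_{t}\sin(x)$ converging to $\int\sin\,d\gamma=0$, which reflects the convergence to the Gaussian measure.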

3 Bounds for the Wasserstein distance of order $2$

In this Section, we prove the following result.

Theorem 2.

Let $\nu$ be a probability measure on $\mathbb{R}^{d}$ with finite second moment and let $(X_{t})_{t\geq 0}$ be a stochastic process such that $X_{t}$ is drawn from $\nu$ for any $t>0$ . Suppose that

[TABLE]

Then, for any $s>0$ ,

[TABLE]

where

[TABLE]

Let $\nu$ be a measure on $\mathbb{R}^{d}$ and let $(X_{t})_{t\geq 0}$ be a stochastic process such that $X_{t}$ is drawn from $\nu$ for any $t\geq 0$ . Let us assume the measure $\nu$ admits a density $h$ with respect to $\gamma$ such that $h=\epsilon+f$ for some constant $\epsilon>0$ and $f\in\mathcal{C}^{\infty}_{c}$ and suppose the stochastic process $\|X_{t}-X_{0}\|$ is bounded for any $t>0$ . Let us note that, while such assumptions imply a Stein kernel exists, approximation arguments developed in Section 8 allow us to lift them in favor of the weaker (10).

For $t>0$ , let $\nu_{t}$ be the measure with density $P_{t}h$ . Since $\gamma$ is the reversible measure of $P_{t}$ , $\nu_{t}$ converges to $\gamma$ when $t$ grows to infinity. One can thus bound $W_{2}(\nu,\gamma)$ by controlling $W_{2}(\nu,\nu_{t})$ for any $t>0$ and letting $t$ grow. To this end, we use the following inequality, obtained in Lemma 2 [9],

[TABLE]

which yields

[TABLE]

The quantity $I(\nu_{t})$ is the Fisher information of the measure $\nu_{t}$ with respect to $\gamma$ . In Proposition 2.4 [8], this quantity is bounded using Stein kernels. In this work, we bound $I(\nu_{t})$ using the stochastic process $(X_{t})_{t\geq 0}$ .

Proposition 3.

Under the above setting, we have

[TABLE]

where $S(t)$ is defined in Theorem 2.

As injecting this bound in (11) and using the approximation arguments of Section 8 concludes the proof of Theorem 2, the remainder of this Section is dedicated to the proof of this Proposition.

Let $t>0$ and let $v_{t}\coloneqq\log P_{t}h$ . By Equation (2.12) [8], we have

[TABLE]

Hence, if an operator $\mathcal{L}_{\nu}$ verifies

[TABLE]

then

[TABLE]

Now, let $s>0$ and let $\mathcal{L}_{\nu}$ be the operator such that, for any $\phi\in L^{1}(\nu)$ and any $x\in\mathbb{R}^{d}$ ,

[TABLE]

Since $X_{t}$ and $X_{0}$ are drawn from the same law, integrating this operator with respect to $\nu$ gives

[TABLE]

Let us rewrite $\mathcal{L}_{\nu}$ using a Taylor expansion.

Lemma 4.

Let $\phi$ be a bounded and measurable function and let $t>0$ and $\alpha$ be a multi-index. Under the above setting, we have that

[TABLE]

exists and that

[TABLE]

We delay the proof of this result to Section 7.1. Let $k>0$ be an integer. After rearranging terms, we have

[TABLE]

Thus,

[TABLE]

Then, by (12),

[TABLE]

Let $\phi$ be a bounded and measurable function. By Equation (2.7.3) [1],

[TABLE]

In particular if $\phi$ is a function such that $\|\nabla\phi\|$ is bounded, we have $\nabla P_{t}\phi=e^{-t}P_{t}\nabla\phi$ . For any multi-index $\alpha$ , let $H_{\alpha}$ be the multivariate Hermite polynomial of index $\alpha$ , defined for any $x\in\mathbb{R}^{d}$ by

[TABLE]

Let $\phi\in\mathcal{C}^{\infty}$ be a bounded function. For any multi-index $\alpha$ , starting with (15) and integrating $|\alpha|$ times with respect to the Gaussian measure, we obtain

[TABLE]

Since Hermite polynomials form an orthogonal basis of $L^{2}(\gamma)$ with norms

[TABLE]

applying (16) to the vector field $\nabla v_{t}$ yields, for any $x\in\mathbb{R}^{d}$ and any multi-index $\alpha$ ,

[TABLE]

Therefore,

[TABLE]
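The orthogonality of Hermite polynomials used above can be verified exactly in dimension one. The sketch below assumes the probabilists' normalization, generated by the recurrence $H_{j+1}(x)=xH_{j}(x)-jH_{j-1}(x)$, for which $\mathbb{E}[H_{j}(Z)H_{k}(Z)]$ equals $k!$ if $j=k$ and $0$ otherwise. It works with integer polynomial coefficients and the Gaussian moments $\mathbb{E}[Z^{2m}]=(2m-1)!!$, so no quadrature is involved. The function names are illustrative.

```python
import math


def hermite_coeffs(k):
    """Coefficients (lowest degree first) of H_k via H_{j+1} = x H_j - j H_{j-1}."""
    prev, cur = [1], [0, 1]
    if k == 0:
        return prev
    for j in range(1, k):
        nxt = [0] + cur                  # multiply H_j by x
        for i, c in enumerate(prev):
            nxt[i] -= j * c              # subtract j * H_{j-1}
        prev, cur = cur, nxt
    return cur


def gaussian_moment(m):
    """E[Z^m] for a standard normal Z: 0 for odd m, (m - 1)!! for even m."""
    if m % 2:
        return 0
    return math.prod(range(1, m, 2))


def hermite_inner(j, k):
    """Exact value of E[H_j(Z) H_k(Z)] computed from the coefficients."""
    hj, hk = hermite_coeffs(j), hermite_coeffs(k)
    return sum(a * b * gaussian_moment(p + q)
               for p, a in enumerate(hj) for q, b in enumerate(hk))


if __name__ == "__main__":
    for j in range(4):
        print([hermite_inner(j, k) for k in range(4)])
```

In higher dimension the polynomials $H_{\alpha}$ factor over coordinates under this normalization, so the same computation applies coordinate by coordinate.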

Now, let

[TABLE]

Applying the Cauchy-Schwarz inequality to (14) and using (17), we obtain

[TABLE]

Then, since $v_{t}=\log(P_{t}h)$ ,

[TABLE]

Finally, since $I(\nu_{t})$ is finite,

[TABLE]

and rearranging terms in $S(t)$ using (13) concludes the proof of Proposition 3.

4 Gaussian measure and Wasserstein distances of any order

Let $p\geq 1$ and let $\nu$ be a measure on $\mathbb{R}^{d}$ . Let us assume the measure $\nu$ admits a density $h$ with respect to $\gamma$ such that $h=\epsilon+f$ with $\epsilon>0$ and $f\in\mathcal{C}^{\infty}_{c}$ .

In order to bound the $W_{p}$ distance between $\nu$ and the $d$ -dimensional Gaussian measure $\gamma$ , it is possible to use Stein kernels to obtain a version of the score function $\nabla v_{t}\coloneqq\nabla\log P_{t}h$ [8]. Indeed, by Section 3 [16], this score function can be used to bound the Wasserstein distances between $\nu$ and $\gamma$ as

[TABLE]

leading to

[TABLE]

Let us provide a version of $v_{t}$. Let $Z$ be a Gaussian random variable, $X_{0}$ be a random variable drawn from $\nu$ and let $F_{t}\coloneqq e^{-t}X_{0}+\sqrt{1-e^{-2t}}Z$.

Lemma 5.

Let $t>0$ . Then, under the above notations,

[TABLE]

is a version of $\nabla v_{t}(F_{t})$ .

Proof.

Let $t>0$ . Integrating by parts with respect to $\gamma$ , we have, for any $\phi\in\mathcal{C}^{\infty}_{c}$ ,

[TABLE]

Thus,

[TABLE]

In fact, this property completely characterizes $\nabla v_{t}$ : if another vector field $\xi:\mathbb{R}^{d}\rightarrow\mathbb{R}^{d}$ satisfies

[TABLE]

then

[TABLE]

implying that $\xi=\nabla v_{t}$ almost everywhere with respect to the measure $\nu_{t}$ .

Now, let $\phi\in\mathcal{C}^{\infty}_{c}$ . Integrating by parts with respect to the Gaussian measure, we have

[TABLE]

implying that $\rho_{t}$ is a version of $\nabla v_{t}$. ∎

Bounding $W_{p}(\nu,\gamma)$ can thus be achieved by estimating $\mathbb{E}[\|\rho_{t}\|^{p}]$ , where $\rho_{t}$ is defined in Lemma 5. To this end, suppose there exists a quantity $\tau_{t}$ such that $\mathbb{E}[\tau_{t}\mid F_{t}]=0$ almost surely. Then,

[TABLE]

and, by Jensen’s inequality,

[TABLE]

Therefore, if such a quantity $\tau_{t}$ is close to $e^{-t}X_{0}-\frac{e^{-2t}}{\sqrt{1-e^{-2t}}}Z$ then $\mathbb{E}[\|\rho_{t}\|^{p}]$ is small and, by (18), so is $W_{p}(\nu,\gamma)$ . Before showing how to compute such quantities in the following Sections, let us state the following result, proved in Section 7.2.
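A quick sanity check of the target quantity $e^{-t}X_{0}-\frac{e^{-2t}}{\sqrt{1-e^{-2t}}}Z$ is the case $\nu=\gamma$ in dimension one, where $W_{p}(\nu,\gamma)=0$. Assuming $X_{0}$ and $Z$ are independent standard normal variables, the target and $F_{t}$ are jointly Gaussian and centred, so the conditional expectation of the target given $F_{t}$ is linear in $F_{t}$ with slope proportional to their covariance. The hypothetical helper below computes this covariance in closed form.

```python
import math


def target_covariance(t):
    """Cov(e^{-t} X_0 - c_t Z, F_t) for independent standard normal X_0 and Z."""
    a = math.exp(-t)                          # coefficient of X_0 in F_t
    b = math.sqrt(1.0 - math.exp(-2.0 * t))   # coefficient of Z in F_t
    c = math.exp(-2.0 * t) / b                # coefficient of Z in the target
    # F_t = a X_0 + b Z and the target is a X_0 - c Z, so Cov = a * a - c * b.
    return a * a - c * b


if __name__ == "__main__":
    for t in (0.1, 1.0, 3.0):
        print(t, target_covariance(t))
```

The covariance vanishes for every $t>0$, so the conditional expectation is zero in the Gaussian case, as one would expect.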

Lemma 6.

Let $Z$ be a normal random variable and let $(M_{\alpha})_{\alpha\in\mathbb{N}^{d}}\in\mathbb{R}^{d}$ . Then,

[TABLE]

4.1 One-dimensional case

In this Section, we bound the $W_{p}$ distance between $\nu$ and $\gamma$ in the case $d=1$ and obtain the following result.

Theorem 7.

Let $p\geq 1$ and let $\nu$ be a probability measure on $\mathbb{R}$ with finite moment of order $p$ . Let $(X_{t})_{t\geq 0}$ be a stochastic process such that $X_{t}$ is drawn from $\nu$ for any $t>0$ . Suppose that

[TABLE]

Then, for any $s>0$ ,

[TABLE]

where

[TABLE]

Let $s>0$ and let $(X_{t})_{t\geq 0}$ be a stochastic process such that for any $t\geq 0$ , $X_{t}$ is drawn from $\nu$ and $\|X_{t}-X_{0}\|$ is bounded. Again, thanks to approximation arguments developed in Section 8, this assumption as well as the assumptions made on the smoothness of the measure $\nu$ can be lifted in favor of the more general (21). For now, let us start by using $(X_{t})_{t\geq 0}$ to obtain a quantity $\tau_{t}$ such that $\mathbb{E}[\tau_{t}\mid F_{t}]=0$ .

Lemma 8.

Let $s,t>0$ . Letting

[TABLE]

where $H_{k}$ is the one-dimensional $k$ -th Hermite polynomial, we have

[TABLE]

Proof.

Let $\phi\in\mathcal{C}^{\infty}_{c}$ . For any $k\in\mathbb{N}$ , we denote by $\phi^{(k)}$ the $k$ -th derivative of $\phi$ . Let $k\in\mathbb{N}$ . Since $X_{0}$ and $Z$ are independent, applying (16) yields

[TABLE]

Thus,

[TABLE]

Now, let $\Phi$ be a primitive function of $\phi$ . By Lemma 4, the function $x\rightarrow\mathbb{E}[\Phi(F_{t})\mid X_{0}=x]=P_{t}\Phi(x)$ satisfies

[TABLE]

Then, since $X_{t}$ and $X_{0}$ are both drawn from $\nu$ ,

[TABLE]

implying that $\mathbb{E}[\tau_{t}\mid F_{t}]=0$ almost surely. ∎

Returning to the proof of Theorem 7, letting $s,t>0$ and using Lemma 8 along with Lemma 5 and Jensen’s inequality, we obtain

[TABLE]

Then, by Lemma 6,

[TABLE]

where

[TABLE]

Finally, by (18),

[TABLE]

and using approximation arguments concludes the proof of Theorem 7.

4.2 Multi-dimensional case

Unfortunately, it is not possible to use a multi-dimensional generalization of the random vector $\tau_{t}$ defined in Lemma 8 as we would only be able to show that

[TABLE]

which is not sufficient to assert that $\mathbb{E}[\tau_{t}\mid F_{t}]=0$ . Instead, one can add an exchangeability assumption on the stochastic process $(X_{t})_{t\geq 0}$ to obtain the following result.

Theorem 9.

Let $p\geq 1$ and let $\nu$ be a probability measure on $\mathbb{R}^{d}$ with finite moment of order $p$ . Let $(X_{t})_{t\geq 0}$ be a stochastic process such that $X_{0}$ is drawn from $\nu$ and such that the pairs $(X_{0},X_{t})$ and $(X_{t},X_{0})$ follow the same law for any $t>0$ . Suppose that, for any $\epsilon>0$ ,

[TABLE]

Then, for any $s>0$ ,

[TABLE]

where

[TABLE]

Let $s>0$ and let $(X_{t})_{t\geq 0}$ be a stochastic process such that, for any $t\geq 0$ , $(X_{t},X_{0})$ and $(X_{0},X_{t})$ follow the same law and $\|X_{t}-X_{0}\|$ is bounded. Again, this last assumption as well as our previous smoothness assumptions on the measure $\nu$ can be replaced by (22) thanks to approximation arguments derived in Section 8. Let us start by using the stochastic process $(X_{t})_{t\geq 0}$ to define a quantity $\tau_{t}$ such that $\mathbb{E}[\tau_{t}\mid F_{t}]=0$ .

Lemma 10.

Let $s,t>0$ . The quantity

[TABLE]

satisfies

[TABLE]

Proof.

Let $\phi\in\mathcal{C}^{\infty}_{c}$ . We have

[TABLE]

Hence, by (16),

[TABLE]

Let $F^{\prime}_{t}\coloneqq e^{-t}X_{t}+\sqrt{1-e^{-2t}}Z$ . By Lemma 4, we have

[TABLE]

Then, since the pairs $(X_{0},X_{t})$ and $(X_{t},X_{0})$ follow the same law,

[TABLE]

and thus $\mathbb{E}[\tau_{t}\mid F_{t}]=0$ . ∎

Returning to the proof of Theorem 9 and using Lemma 10 along with Lemma 5 and Jensen’s inequality, we obtain

[TABLE]

Thus, by Lemma 6,

[TABLE]

where

[TABLE]

Then, injecting this bound in (18) yields

[TABLE]

Finally, rearranging terms in $S_{p}(t)$ using (13) and using approximation arguments concludes the proof of Theorem 9.

5 Central Limit Theorem for the $W_{2}$ distance

Let $m\in(0,2]$ and $X_{1},\dots,X_{n}$ be independent random variables taking values in $\mathbb{R}^{d}$ and such that

• $\forall i\in\{1,\dots,n\},\mathbb{E}[X_{i}]=0$;

• $\sum_{i=1}^{n}\mathbb{E}[X_{i}^{\otimes 2}]=nI_{d}$ and

• $\sum_{i=1}^{n}\mathbb{E}[\|X_{i}\|^{2+m}]<\infty$.

It is known that the measure $\nu_{n}$ of the random variable $S_{n}\coloneqq n^{-1/2}\sum_{i=1}^{n}X_{i}$ converges to the Gaussian measure $\gamma$ . The remainder of this Section is dedicated to quantifying this convergence for the Wasserstein distance of order $2$ in order to obtain the following result.

Theorem 11.

Under the above setting, taking

[TABLE]

we have, for any $n>4$ ,

[TABLE]

Let $X^{\prime}_{1},\dots,X^{\prime}_{n}$ be independent copies of the variables $X_{1},\dots,X_{n}$ . For any $t>0$ , let $\Delta(t)\coloneqq e^{2t}-1$ and

[TABLE]

where $I$ is a uniform random variable taking values in $\{1,\dots,n\}$ and $\|X^{\prime}_{I}\|\vee\|X_{I}\|$ denotes the maximum between $\|X^{\prime}_{I}\|$ and $\|X_{I}\|$ .
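The construction of the process $((S_{n})_{t})_{t\geq 0}$ can be sketched as follows. The truncation level $\sqrt{n\Delta(t)}$ and the normalization by $\sqrt{n}$ used in this sketch are assumptions, chosen to be consistent with the definition of $S_{n}$ and with the bound on $\|(S_{n})_{t}-S_{n}\|$ used in this Section. The function name and the argument layout are hypothetical.

```python
import math
import random


def euclidean_norm(v):
    return math.sqrt(sum(c * c for c in v))


def sample_process(xs, xs_new, index, t):
    """Return (S_n)_t given X_1..X_n, independent copies X'_1..X'_n and the index I.

    Assumption of this sketch: X_I is swapped for X'_I only when both norms are
    at most sqrt(n * Delta(t)), with Delta(t) = exp(2 t) - 1.
    """
    n, d = len(xs), len(xs[0])
    s_n = [sum(x[j] for x in xs) / math.sqrt(n) for j in range(d)]
    delta = math.exp(2.0 * t) - 1.0
    largest = max(euclidean_norm(xs[index]), euclidean_norm(xs_new[index]))
    if largest <= math.sqrt(n * delta):
        return [s + (b - a) / math.sqrt(n)
                for s, a, b in zip(s_n, xs[index], xs_new[index])]
    return s_n


if __name__ == "__main__":
    rng = random.Random(0)
    n, d = 50, 2
    xs = [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(n)]
    xs_new = [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(n)]
    i = rng.randrange(n)
    for t in (0.0, 0.01, 1.0):
        print(t, sample_process(xs, xs_new, i, t))
```

The same index $I$ and the same copies $X^{\prime}_{i}$ are used for all $t$, so the path is constant until $t$ is large enough for the swap to be allowed and constant afterwards.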

For any $t\geq 0$ , $(S_{n})_{t}$ is drawn from the same measure as $S_{n}$ and $\|(S_{n})_{t}-S_{n}\|\leq 2\sqrt{n\Delta(t)}$ . Thus, we can apply Theorem 2 to the measure $\nu_{n}$ of $S_{n}$ using the stochastic process $((S_{n})_{t})_{t\geq 0}$ with $s=\frac{1}{n}$ to obtain

[TABLE]

where

[TABLE]

Let us bound $S(t)$ for $t>0$ . First, since $I$ and $S_{n}$ are independent,

[TABLE]

Then, since $\mathbb{E}[X^{\prime}_{i}]=0$ and since $X^{\prime}_{i}$ and $S_{n}$ are independent,

[TABLE]

and, since $\sum_{i=1}^{n}\mathbb{E}[X_{i}^{\otimes 2}]=nI_{d}$ ,

[TABLE]

Now, taking

[TABLE]

and applying Jensen’s inequality yields

[TABLE]

Therefore,

[TABLE]

From here, developing the squared terms and using the independence of the $(X_{i})_{1\leq i\leq n}$ and $(X^{\prime}_{i})_{1\leq i\leq n}$ , we obtain

[TABLE]

where, for any $i\in\{1,\dots,n\}$ ,

[TABLE]

and

[TABLE]

5.1 Bounding $I_{k}$

Let $i\in\{1,\dots,n\}$ and let $k$ be an odd integer. Since $X_{i}$ and $X^{\prime}_{i}$ are i.i.d.,

[TABLE]

Let us now deal with $I_{2,i}(t)$ . Since $\mathbb{E}[(X^{\prime}_{i}-X_{i})^{\otimes 2}]=2\mathbb{E}[X_{i}^{\otimes 2}]$ , we have

[TABLE]

First, since $(X^{\prime}_{i}-X_{i})^{\otimes 2}$ is positive,

[TABLE]

Now, taking $l\in(0,m]$ , we have

[TABLE]

and, since this bound is valid for any $l\in(0,m]$ ,

[TABLE]

Similarly, for any even integer $k>2$ and any $l\in[0,m]$ ,

[TABLE]

leading to

[TABLE]

Let us introduce the quantity $M(l)$ defined for $l=0$ by

[TABLE]

and, for any $l\in(0,m]$ , by

[TABLE]

By combining our bounds on the $I_{k,i}$ , we obtain

[TABLE]

5.2 Bounding $J_{k}$

Again, taking $i\in\{1,\dots,n\}$ , we have

[TABLE]

Then,

[TABLE]

Finally, for any integer $k>2$ ,

[TABLE]

Overall, letting

[TABLE]

we obtain

[TABLE]

5.3 Integration with respect to $t$

Thanks to the previous computations, we have

[TABLE]

and thus

[TABLE]

The next step of the proof consists in integrating $e^{-t}\mathbb{E}[R(t)]^{1/2}$ with respect to $t$ . First,

[TABLE]

And, since $\int_{0}^{\infty}e^{-t}dt=1$ , we have, by Jensen’s inequality,

[TABLE]

Let us now deal with the remaining term. Let us first assume that $m\leq 1$ . Taking $t_{0}\geq 0$ , we have

[TABLE]

Since $n\geq 4>e^{2}/2$ , taking $t_{0}=\frac{1}{n}$ , we have $-\frac{\log(t_{0}/2)}{2}>1$ and

[TABLE]
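The numerical condition on $n$ used for the choice $t_{0}=\frac{1}{n}$ can be unpacked as follows; this is only a worked restatement of the inequality quoted above.

```latex
\[
-\frac{\log(t_0/2)}{2}
= \frac{\log(2n)}{2} > 1
\iff 2n > e^{2}
\iff n > \frac{e^{2}}{2} \approx 3.69,
\]
% so the inequality holds for every integer n \geq 4.
```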

If $1\leq m\leq 2$ , performing the same computations with $l=1$ for $t>t_{0}$ yields

[TABLE]

Finally, if $m=2$ ,

[TABLE]

Then, taking $t_{0}=\frac{M(2)}{2M(0)\sqrt{n}}$ ,

[TABLE]

which concludes the proof of Theorem 11.

5.4 Simplifications whenever $\mathbb{E}[X_{i}^{\otimes 2}]=I_{d}$

Let us now assume that $\mathbb{E}[X_{i}^{\otimes 2}]=I_{d}$ for any $i\in\{1,\dots,n\}$ and let $i\in\{1,\dots,n\}$ . We have

[TABLE]

Furthermore,

[TABLE]

leading to

[TABLE]

Similarly,

[TABLE]

Therefore, taking

[TABLE]

we have

[TABLE]

Finally, remarking that $C^{\prime}<14$ and that $\mathbb{E}[X_{i}^{\otimes 2}]=I_{d}$ for all $i$ whenever $(X_{i})_{i\in\{1,\dots,n\}}$ are identically distributed concludes the proof of (8).

6 Rates of the multi-dimensional CLT for $W_{p}$ distances

Let $p>2,q\in[0,2]$ and $m=\min(2,p+q-2)$ . Let $X_{1},\dots,X_{n}$ be independent random variables taking values in $\mathbb{R}^{d}$ and such that

• $\forall i\in\{1,\dots,n\},\ \mathbb{E}[X_{i}]=0$ ;

• $\sum_{i=1}^{n}\mathbb{E}[X_{i}^{\otimes 2}]=nI_{d}$ ; and

• $\sum_{i=1}^{n}\mathbb{E}[\|X_{i}\|^{p+q}]<\infty$ .

The aim of this Section is to prove the following result.

Theorem 12.

Under the above setting, taking

[TABLE]

we have that there exists $C_{p}>0$ such that

[TABLE]

Taking $((S_{n})_{t})_{t\geq 0}$ as in the previous Section, we have that $((S_{n})_{0},(S_{n})_{t})$ and $((S_{n})_{t},(S_{n})_{0})$ follow the same law for any $t>0$ . Therefore, we can apply Theorem 9 and perform computations similar to those of the previous Section in order to obtain

[TABLE]

with

[TABLE]

Then, using a multi-dimensional version of Rosenthal's inequality such as Theorem 5.2 [10], we obtain that there exists $C_{p}>0$ such that

[TABLE]

where the $I_{k,i}(t)$ and $J_{k,i}(t)$ are the same as in the previous Section and

[TABLE]

Then, using arguments similar to the ones used to bound the $J_{k,i}$ ,

[TABLE]

Therefore, there exists $C_{p}>0$ such that

[TABLE]

and integrating with respect to $t$ following the arguments of the previous Section concludes the proof of Theorem 12 while (9) is obtained following the same computations as in Section 5.4.
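As a purely numerical illustration of the kind of statement proved in this Section (not a check of the constants of Theorem 12), the following Python sketch computes $W_{p}$ between the law of a normalized sum of Rademacher variables and the standard Gaussian in dimension one. Everything specific here is an assumption made for the example: the Rademacher summands, the choice $p=3$, the one-dimensional quantile formula $W_{p}^{p}=\int_{0}^{1}|F^{-1}(u)-G^{-1}(u)|^{p}du$, the midpoint grid and the function names.

```python
import math
from functools import lru_cache
from statistics import NormalDist

import numpy as np


@lru_cache(maxsize=None)
def gaussian_quantiles(grid):
    """Standard Gaussian quantiles at the midpoints (j + 0.5) / grid."""
    gauss = NormalDist()
    u = (np.arange(grid) + 0.5) / grid
    return np.array([gauss.inv_cdf(float(x)) for x in u])


def wp_rademacher_sum_to_gaussian(n, p=3.0, grid=100_000):
    """Approximate W_p between n^{-1/2} * (sum of n Rademacher) and N(0, 1).

    Uses the one-dimensional quantile coupling, discretised on a midpoint grid.
    """
    k = np.arange(n + 1)
    atoms = (2 * k - n) / math.sqrt(n)  # support of the normalized sum
    pmf = np.array([math.comb(n, int(j)) for j in k], dtype=float) / 2.0**n
    cdf = np.cumsum(pmf)
    u = (np.arange(grid) + 0.5) / grid
    # Generalised inverse: first atom whose cumulative mass reaches u.
    idx = np.minimum(np.searchsorted(cdf, u, side="left"), n)
    q_sum = atoms[idx]
    q_gauss = gaussian_quantiles(grid)
    return float(np.mean(np.abs(q_sum - q_gauss) ** p) ** (1.0 / p))


if __name__ == "__main__":
    for n in (4, 16, 64, 256):
        print(n, wp_rademacher_sum_to_gaussian(n))
```

In this toy setting the computed distance shrinks as $n$ grows, which is the qualitative behaviour the bounds of this Section quantify.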

7 Technical results

In this Section, we provide the proofs of the intermediary results used to derive Theorems 2, 7 and 9.

7.1 Proof of Lemma 4

Let $\phi$ be a bounded and measurable function on $\mathbb{R}^{d}$ , let $t>0$ and let $\alpha$ be a multi-index. By (16), we have

[TABLE]

and, since $\phi$ is bounded, there exists $M>0$ such that

[TABLE]

Then, since $\|X_{t}-X_{0}\|$ is bounded as well, we have that there exists $C>0$ such that

[TABLE]

almost surely. Therefore

[TABLE]

and

[TABLE]

exists.

Now, using a Taylor expansion with remainder, we obtain that there exists $\xi$ on the segment $[X_{0},X_{t}]$ such that

[TABLE]

From here, we have

[TABLE]

Then, by (25),

[TABLE]

and

[TABLE]

7.2 Proof of Lemma 6

Let $(M_{\alpha})_{\alpha\in\mathbb{N}^{d}}$ be such that $M_{\alpha}\in\mathbb{R}^{d}$ for any multi-index $\alpha$ and let $Z$ be a Gaussian random variable. Let us start with the case $1\leq p<2$ . By Jensen’s inequality,

[TABLE]

Then, since $\mathbb{E}[H_{\alpha}(Z)H_{\alpha^{\prime}}(Z)]=0$ for any two different multi-indices $\alpha,\alpha^{\prime}$ ,

[TABLE]

Now, let $p>2$ and $t\coloneqq\log(\sqrt{p-1})$ . Since the Ornstein-Uhlenbeck semigroup $(P_{t})_{t\geq 0}$ is hypercontractive (see e.g. Theorem 5.2.3 [1]), we have

[TABLE]

This inequality can be readily extended to vector-valued functions $\phi$ , in which case we have

[TABLE]

For any multi-index $\alpha$ , the Hermite polynomial $H_{\alpha}$ is an eigenvector of $P_{t}$ with eigenvalue $e^{-|\alpha|t}=(p-1)^{-|\alpha|/2}$ . Therefore,

[TABLE]

concluding the proof.
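The two facts about Hermite polynomials used in this proof, orthogonality under the Gaussian measure and the eigenvalue relation for the Ornstein-Uhlenbeck semigroup at $t=\log(\sqrt{p-1})$ , can be checked numerically in dimension one. The sketch below assumes the probabilists' normalisation $He_{k}$ (the paper's $H_{\alpha}$ may be normalised differently), uses Mehler's formula $P_{t}\phi(x)=\mathbb{E}[\phi(e^{-t}x+\sqrt{1-e^{-2t}}Z)]$ , and evaluates expectations by Gauss-Hermite quadrature; the function names and the choice $p=4$ are illustrative.

```python
import math

import numpy as np
from numpy.polynomial import hermite_e as He

# Gauss-Hermite nodes and weights for the weight exp(-x^2 / 2); normalising
# the weights turns the quadrature rule into an expectation under N(0, 1).
NODES, WEIGHTS = He.hermegauss(60)
WEIGHTS = WEIGHTS / WEIGHTS.sum()


def hermite(k, x):
    """Probabilists' Hermite polynomial He_k evaluated at x."""
    coeffs = np.zeros(k + 1)
    coeffs[k] = 1.0
    return He.hermeval(x, coeffs)


def ou_semigroup(phi, t, x):
    """P_t phi(x) = E[phi(e^{-t} x + sqrt(1 - e^{-2t}) Z)], Z ~ N(0, 1)."""
    y = math.exp(-t) * x + math.sqrt(1.0 - math.exp(-2.0 * t)) * NODES
    return float(np.sum(WEIGHTS * phi(y)))


def gaussian_inner(j, k):
    """E[He_j(Z) He_k(Z)] computed by quadrature."""
    return float(np.sum(WEIGHTS * hermite(j, NODES) * hermite(k, NODES)))


p = 4.0
t = math.log(math.sqrt(p - 1.0))  # time used in the hypercontractivity step

if __name__ == "__main__":
    x = 0.7
    for k in range(1, 5):
        lhs = ou_semigroup(lambda y, k=k: hermite(k, y), t, x)
        rhs = (p - 1.0) ** (-k / 2.0) * hermite(k, x)
        print(k, lhs, rhs)  # the two columns should agree
    print(gaussian_inner(2, 3), gaussian_inner(3, 3))
```

The quadrature is exact for the polynomial integrands involved, so the agreement is limited only by floating-point rounding.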

8 Approximation arguments

In this Section, we present the approximation arguments necessary to conclude the proof of Theorem 9. Similar arguments can be used to obtain Theorems 2 and 7.

Suppose the measure $\nu$ and the stochastic process $(X_{t})_{t\geq 0}$ satisfy the assumptions of Theorem 9. Let $s>0$ and

[TABLE]

Let $R>1,\epsilon_{1}=R^{-1}$ and $0<\epsilon_{2}<1$ . For any $t>0$ , let $X_{t}^{R}$ be the orthogonal projection of $X_{t}$ on $\mathcal{B}(0,R)$ , the ball of radius $R$ centered at $0$ . Let $Z$ be a standard normal random variable, $N$ be a random variable with smooth density and taking values in the ball of radius $1$ and let $I$ be a Bernoulli random variable with parameter $\epsilon_{1}$ such that $(X_{t})_{t\geq 0},Z,N$ and $I$ are independent. Finally, let $U=\epsilon_{1}N$ . For any $t>0$ , let

[TABLE]

Let $\tilde{\nu}_{R}$ be the law of $\tilde{X}_{0}$ . This measure admits a density $h$ with respect to the measure $\gamma$ such that $h=\epsilon_{1}+f$ with $f\in\mathcal{C}^{\infty}_{c}$ . Furthermore, for any $t>0$ , $(\tilde{X}_{0},\tilde{X}_{t})$ and $(\tilde{X}_{t},\tilde{X}_{0})$ follow the same law. Therefore, we can follow the computations of Section 4.2 and use the triangle inequality to obtain

[TABLE]

where

[TABLE]

and

[TABLE]

First, since $Z$ admits a finite moment of order $p$ , there exists $C>0$ such that

[TABLE]

Then, since $X_{0}^{R}$ is the orthogonal projection of $X_{0}$ on $\mathcal{B}(0,R)$ ,

[TABLE]

and, since $\nu$ admits a finite moment of order $p$ , there exists $C>0$ such that

[TABLE]

Therefore, there exists $C>0$ such that

[TABLE]
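The effect of the projection on $\mathcal{B}(0,R)$ can be illustrated on a sample. The Python sketch below is an assumption-laden toy example: it takes the coupling $(X_{0},X_{0}^{R})$ , for which $\|X_{0}-X_{0}^{R}\|=(\|X_{0}\|-R)_{+}$ , and compares the resulting transport cost with the tail moment $\mathbb{E}[\|X_{0}\|^{p}1_{\|X_{0}\|>R}]$ . The Student distribution, the dimension, the value of $p$ and all function names are chosen for illustration and do not come from the paper.

```python
import numpy as np


def project_on_ball(x, radius):
    """Orthogonal projection of each row of x onto the closed ball B(0, radius)."""
    norms = np.linalg.norm(x, axis=1, keepdims=True)
    scale = np.minimum(1.0, radius / np.maximum(norms, 1e-300))
    return x * scale


def truncation_cost(x, radius, p):
    """Empirical E[||X - X^R||^p] for the coupling (X, X^R)."""
    gap = np.linalg.norm(x - project_on_ball(x, radius), axis=1)
    return float(np.mean(gap**p))


def tail_moment(x, radius, p):
    """Empirical E[||X||^p 1{||X|| > R}]."""
    norms = np.linalg.norm(x, axis=1)
    return float(np.mean(np.where(norms > radius, norms**p, 0.0)))


rng = np.random.default_rng(0)
sample = rng.standard_t(df=6, size=(50_000, 3))  # finite moment of order p = 3
p = 3.0

if __name__ == "__main__":
    for radius in (1.0, 2.0, 4.0, 8.0):
        print(radius, truncation_cost(sample, radius, p), tail_moment(sample, radius, p))
```

Both columns decrease as the radius grows, and the coupling cost never exceeds the tail moment, which is why a finite moment of order $p$ is enough to make this contribution vanish as $R$ goes to infinity.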

Now, let

[TABLE]

By the triangle inequality, we have that

[TABLE]

and, since $\|U\|\leq\epsilon_{1}$ ,

[TABLE]

Finally, let

[TABLE]

Since $(X^{R}_{t})_{t\geq 0}$ and $U$ are independent and since $X^{R}_{0}$ is $X_{0}$ -measurable, we have

[TABLE]

From here,

[TABLE]

Thus, applying the triangle inequality and Jensen’s inequality yields

[TABLE]

Since $X^{R}_{t}$ is the orthogonal projection of $X_{t}$ on the convex set $\mathcal{B}(0,R)$ , we have $\|X_{0}^{R}\|\leq\|X_{0}\|$ and $\|X^{R}_{t}-X^{R}_{0}\|\leq\|X_{t}-X_{0}\|$ . Hence,

[TABLE]
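The two inequalities $\|X_{0}^{R}\|\leq\|X_{0}\|$ and $\|X^{R}_{t}-X^{R}_{0}\|\leq\|X_{t}-X_{0}\|$ express that the orthogonal projection onto a closed convex set containing the origin is non-expansive. A quick numerical check is sketched below; the Gaussian samples standing in for $(X_{0},X_{t})$ , the radius and the function name are arbitrary choices made for the example.

```python
import numpy as np


def project_on_ball(x, radius):
    """Orthogonal projection of each row of x onto the closed ball B(0, radius)."""
    norms = np.linalg.norm(x, axis=1, keepdims=True)
    scale = np.minimum(1.0, radius / np.maximum(norms, 1e-300))
    return x * scale


rng = np.random.default_rng(1)
x0 = rng.normal(scale=3.0, size=(10_000, 4))       # stand-in for X_0
xt = x0 + rng.normal(scale=2.0, size=(10_000, 4))  # stand-in for X_t
radius = 2.5

p0 = project_on_ball(x0, radius)
pt = project_on_ball(xt, radius)

# Both gaps should be nonnegative for every sample point.
norm_gap = np.linalg.norm(x0, axis=1) - np.linalg.norm(p0, axis=1)
increment_gap = np.linalg.norm(xt - x0, axis=1) - np.linalg.norm(pt - p0, axis=1)

if __name__ == "__main__":
    print(norm_gap.min(), increment_gap.min())
```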

By (22), there exist $\xi,M>0$ , depending on $\epsilon_{2}$ , such that, for any $t\in[\epsilon_{2},\epsilon_{2}^{-1}]$ ,

[TABLE]

Hence, using Hölder’s inequality, we obtain that there exist $M^{\prime}(\epsilon_{2}),C(\epsilon_{2})>0$ such that

[TABLE]

Combining this bound with (26), (27), (28), (29) and (30), we obtain that there exist $C>0$ and $C_{1}(\epsilon_{2}),C_{2}(\epsilon_{2})>0$ such that

[TABLE]

Since $X_{0}$ has a finite moment of order $p$ and since $\epsilon_{1}=R^{-1}$ , letting $R$ go to infinity and $\epsilon_{2}$ go to zero yields

[TABLE]

On the other hand, when $R$ goes to infinity, we have that $\tilde{\nu}_{R}$ converges weakly to $\nu$ and the $p$ -moment of $\tilde{\nu}_{R}$ converges to the $p$ -moment of $\nu$ . Thus, by Theorem 6.9 [15], $W_{p}(\tilde{\nu}_{R},\nu)$ converges to zero as $R$ goes to infinity. Therefore,

[TABLE]

concluding the proof of Theorem 9.

Acknowledgements

The author would like to thank Michel Ledoux for his many comments and advice regarding the writing of this paper as well as Jérôme Dedecker, Yvik Swan, Frédéric Chazal and anonymous reviewers for their multiple remarks.

Bibliography

[1] Bakry, D., Gentil, I., Ledoux, M.: Analysis and Geometry of Markov Diffusion Operators. Grundlehren der mathematischen Wissenschaften, Vol. 348. Springer (2014)
[2] Bobkov, S.G.: Entropic approach to E. Rio’s central limit theorem for $W_{2}$ transport distance. Statistics and Probability Letters 83 (7), 1644–1648 (2013)
[3] Courtade, T.A., Fathi, M., Pananjady, A.: Existence of Stein Kernels under a Spectral Gap, and Discrepancy Bound. ArXiv e-prints (2017)
[4] Eldan, R., Mikulincer, D., Zhai, A.: The CLT in high dimensions: quantitative bounds via martingale embedding. ArXiv e-prints (2018)
[5] Fathi, M.: Stein kernels and moment maps. ArXiv e-prints (2018)
[6] Goldstein, L., Reinert, G.: Stein’s method and the zero bias transformation with application to simple random sampling. Ann. Appl. Probab. 7 (4), 935–952 (1997)
[7] Goldstein, L., Rinott, Y.: Multivariate normal approximations by Stein’s method and size bias couplings. Journal of Applied Probability 33, 1–17 (1996)
[8] Ledoux, M., Nourdin, I., Peccati, G.: Stein’s method, logarithmic Sobolev and transport inequalities. Geometric and Functional Analysis 25 (1), 256–306 (2015)
