Gaussian Approximations for Maxima of Random Vectors under $(2+\iota)$-th Moments

Qiang Sun

arXiv:1905.11014·math.ST·May 28, 2019

TL;DR

This paper establishes a nonasymptotic Gaussian approximation for the maximum of sums of random vectors with $(2+\iota)$-th moments, providing a versatile tool for statistical learning applications.

Contribution

It introduces a novel nonasymptotic Gaussian approximation theorem applicable to sums of random vectors with limited moments, using new technical methods.

Findings

01 Provides a general Gaussian approximation result for maxima of random vectors

02 Applicable to various statistical learning problems

03 Employs innovative proof techniques including Lindeberg telescoping

Abstract

We derive a Gaussian approximation result for the maximum of a sum of random vectors under $(2+\iota)$-th moments. Our main theorem is abstract and nonasymptotic, and can be applied to a variety of statistical learning problems. The proof uses the Lindeberg telescopic sum device along with some other newly developed technical results.

Equations

$$L_{n}(\gamma,\delta,\iota)=\min\bigg\{\gamma^{2}\delta^{-1}\mathbb{E}\bigg(\max_{j}\sum\big|X_{ij}\big|^{3}+\max_{j}\sum\big|Y_{ij}\big|^{3}\bigg),\ \gamma^{\frac{4+2\iota}{3}}\delta^{-\frac{2+\iota}{3}}\sum_{i=1}^{n}C_{i}(2+\iota)\bigg\},$$

$$\mathbb{P}\big(|Z-Z^{\dagger}|\geq c_{\gamma}+3\delta\big)\lesssim\frac{\varepsilon+L_{n}(\gamma,\delta,\iota)}{1-\varepsilon}.$$

$$\mathbb{P}\big(Z\in A\big)-\mathbb{P}\big(Z^{\dagger}\in A^{c_{\gamma}+3\delta}\big)\leq\frac{\varepsilon+L_{n}(\gamma,\delta,\iota)}{1-\varepsilon}.$$

$$\max_{1\leq j\leq d}x_{j}\leq\psi_{\gamma}(x)\leq\max_{1\leq j\leq d}x_{j}+c_{\gamma},$$

$$\mathbb{P}\big(Z\in A\big)\leq\mathbb{P}\big(\psi_{\gamma}(S_{n})\in A^{c_{\gamma}}\big)=\mathbb{E}\big[1_{A^{c_{\gamma}}}\{\psi_{\gamma}(S_{n})\}\big].$$

$$(1-\varepsilon)1_{A}(t)\leq g(t)\leq\varepsilon+(1-\varepsilon)1_{A^{3\delta}}(t)\quad\text{for all }t\in\mathbb{R},$$

$$\mathbb{E}\big[1_{A^{c_{\gamma}}}\{\psi_{\gamma}(S_{n})\}\big]\leq(1-\varepsilon)^{-1}\mathbb{E}\{g\circ\psi_{\gamma}(S_{n})\}.$$

$$\big|\mathbb{E}f(S_{n})-\mathbb{E}f(S_{n}^{\dagger})\big|\lesssim L_{n}(\gamma,\delta,\iota),$$

$$\begin{aligned}
\mathbb{P}\big(Z\in A\big)-\mathbb{P}\big(Z^{\dagger}\in A^{c_{\gamma}+3\delta}\big)
&\leq\mathbb{E}\big[1_{A^{c_{\gamma}}}\{\psi_{\gamma}(S_{n})\}\big]-\mathbb{P}\big(Z^{\dagger}\in A^{c_{\gamma}+3\delta}\big)\leq(1-\varepsilon)^{-1}\mathbb{E}f(S_{n})-\mathbb{P}\big(Z^{\dagger}\in A^{c_{\gamma}+3\delta}\big)\\
&\lesssim\frac{\mathbb{E}f(S_{n}^{\dagger})}{1-\varepsilon}-\mathbb{P}\big(Z^{\dagger}\in A^{c_{\gamma}+3\delta}\big)+\frac{L_{n}(\gamma,\delta,\iota)}{1-\varepsilon}\leq\frac{\varepsilon+L_{n}(\gamma,\delta,\iota)}{1-\varepsilon},
\end{aligned}$$

$$\big|\mathbb{E}f(S_{n})-\mathbb{E}f(S_{n}^{\dagger})\big|\leq L_{n}(\gamma,\delta,\iota),$$

$$\mathbb{E}f(S_{n})-\mathbb{E}f(S_{n}^{\dagger})=\sum_{i=1}^{n}\big\{\mathbb{E}f(T_{i})-\mathbb{E}f(T_{i+1})\big\}.$$

$$f(T_{i})-f(T_{i+1})=\underbrace{(T_{i}-T_{i+1})^{\mathrm{T}}\nabla f(L_{i})}_{\text{I}_{i}}+\underbrace{\frac{1}{2}X_{i}^{\mathrm{T}}\nabla^{2}f(L_{i})X_{i}-\frac{1}{2}Y_{i}^{\mathrm{T}}\nabla^{2}f(L_{i})Y_{i}}_{\text{II}_{i}}+R_{i},$$

$$\mathbb{E}\text{I}=\mathbb{E}\sum_{i=1}^{n}\text{I}_{i}=\sum_{i=1}^{n}\big\{\mathbb{E}(T_{i}-T_{i+1})\big\}^{\mathrm{T}}\mathbb{E}\big\{\nabla f(L_{i})\big\}=0.$$

$$\mathbb{E}\text{II}=2^{-1}\sum_{i=1}^{n}\sum_{j,k}\mathbb{E}\big\{\partial_{jk}f(L_{i})\big\}\mathbb{E}\big\{X_{ij}X_{ik}-Y_{ij}Y_{ik}\big\}=0.$$

$$\begin{aligned}
R_{i}&=6^{-1}\mathbb{E}_{\theta}\bigg\{\sum_{j,k,\ell}(1+\theta)^{2}X_{ij}X_{ik}X_{i\ell}\,\partial_{jk\ell}f(L_{i}+\theta X_{i})\bigg\}\\
&\qquad+6^{-1}\mathbb{E}_{\theta}\bigg\{\sum_{j,k,\ell}(1+\theta)^{2}Y_{ij}Y_{ik}Y_{i\ell}\,\partial_{jk\ell}f(L_{i}+\theta Y_{i})\bigg\},
\end{aligned}$$

$$\begin{aligned}
\mathbb{E}R&=6^{-1}\mathbb{E}\bigg\{\sum_{i=1}^{n}\sum_{j,k,\ell}(1+\theta)^{2}X_{ij}X_{ik}X_{i\ell}\,\partial_{jk\ell}f(L_{i}+\theta X_{i})\bigg\}\\
&\qquad+6^{-1}\mathbb{E}\bigg\{\sum_{i=1}^{n}\sum_{j,k,\ell}(1+\theta)^{2}Y_{ij}Y_{ik}Y_{i\ell}\,\partial_{jk\ell}f(L_{i}+\theta Y_{i})\bigg\}\\
&\leq 6^{-1}\mathbb{E}\bigg\{\sum_{j,k,\ell}\|\partial_{jk\ell}f\|_{\infty}\max_{j,k,\ell}\sum\big|X_{ij}X_{ik}X_{i\ell}\big|\bigg\}\\
&\qquad+6^{-1}\mathbb{E}\bigg\{\sum_{j,k,\ell}\|\partial_{jk\ell}f\|_{\infty}\max_{j,k,\ell}\sum\big|Y_{ij}Y_{ik}Y_{i\ell}\big|\bigg\}\\
&=A+B.
\end{aligned}$$

$$\sum_{j,k,\ell}^{d}\big|\partial_{jk\ell}f(x)\big|\leq\|g^{\prime\prime\prime}\|_{\infty}+6\gamma\|g^{\prime\prime}\|_{\infty}+6\gamma^{2}\|g^{\prime}\|_{\infty}\leq(7C+6)\gamma^{2}\delta^{-1}\lesssim\gamma^{2}\delta^{-1}.$$

$$A\lesssim\gamma^{2}\delta^{-1}\mathbb{E}\bigg\{\max_{j,k,\ell}\sum\big|X_{ij}X_{ik}X_{i\ell}\big|\bigg\}\lesssim\gamma^{2}\delta^{-1}\mathbb{E}\bigg\{\max_{j}\sum\big|X_{ij}\big|^{3}\bigg\}.$$

$$B\lesssim\gamma^{2}\delta^{-1}\mathbb{E}\bigg\{\max_{j}\sum\big|Y_{ij}\big|^{3}\bigg\}.$$

$$\mathbb{E}R_{i}\ \cdots\ +\gamma^{2}\Big(\max_{1\leq j\leq d}|X_{ij}|^{2}+\max_{1\leq j\leq d}|Y_{ij}|^{2}\Big),\ \gamma^{3}\Big(\max_{1\leq j\leq d}|X_{ij}|^{3}+\max_{1\leq j\leq d}|Y_{ij}|^{3}\Big)\Big\}.$$

$$\min\big\{a+x+x^{2},x^{3}\big\}\leq 3a^{(1-\iota)/3}x^{2+\iota}.$$

Taxonomy

Topics: Stochastic processes and statistical mechanics · Probability and Risk Models · Bayesian Methods and Mixture Models

Full text

Gaussian Approximations for Maxima of Random Vectors under $(2+\iota)$-th Moments

Qiang Sun, Department of Statistical Sciences, University of Toronto, 100 St. George Street, Toronto, ON M5S 3G3, Canada; E-mail: [email protected].

Abstract

We derive a Gaussian approximation result for the maximum of a sum of random vectors under $(2+\iota)$-th moments. Our main theorem is abstract and nonasymptotic, and can be applied to a variety of statistical learning problems. The proof uses the Lindeberg telescopic sum device along with some other newly developed technical results.

Keywords: Gaussian approximation, maxima.

1 Introduction and Main Result

We derive a Gaussian approximation result for maxima of sums of high-dimensional random vectors under $(2+\iota)$-th moments for some $0\leq\iota\leq 1$. This complements the results of Chernozhukov et al. (2014), which require a third-moment condition; see Theorem 4.1 therein. Later, Chernozhukov et al. (2017) provided high-dimensional central limit and bootstrap theorems for sparsely convex sets. Our derivation utilizes the Lindeberg telescopic sum device along with some other newly developed technical results.

Let $X_{1},\ldots,X_{n}$ be independent random vectors in $\mathbb{R}^{d}$ with mean zero and finite $(2+\iota)$-th moments, that is, $\mathbb{E}(X_{ij})=0$ and $\mathbb{E}\big(|X_{ij}|^{2+\iota}\big)<\infty$ for some $0\leq\iota\leq 1$. Let $\Sigma\equiv\mathbb{E}\big(X_{i}X_{i}^{\mathrm{T}}\big)$. Consider the statistic $Z=\max_{1\leq j\leq d}\sum_{i=1}^{n}X_{ij}$. Let $Y_{1},\ldots,Y_{n}$ be independent random vectors in $\mathbb{R}^{d}$ with $Y_{i}\sim\mathcal{N}(0,\Sigma)$. For $0\leq\iota\leq 1$ and $\gamma,\delta>0$ such that $\gamma\delta>1$, let

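$$L_{n}(\gamma,\delta,\iota)=\min\bigg\{\gamma^{2}\delta^{-1}\mathbb{E}\bigg(\max_{j}\sum\big|X_{ij}\big|^{3}+\max_{j}\sum\big|Y_{ij}\big|^{3}\bigg),\ \gamma^{\frac{4+2\iota}{3}}\delta^{-\frac{2+\iota}{3}}\sum_{i=1}^{n}C_{i}(2+\iota)\bigg\},\tag{1.1}$$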

where $C_{i}(q)=\mathbb{E}\big{(}\max_{1\leq j\leq d}|X_{ij}|^{q}+\max_{1\leq j\leq d}|Y_{ij}|^{q}\big{)}.$

Let “ $\lesssim$ ” stand for “ $\leq$ ” up to a universal constant. Our main result follows.

Theorem 1.1.

For any positive scalars $\delta,\gamma$ such that $\delta\gamma>1$ and $\varepsilon=\gamma\delta\exp\{-(\gamma^{2}\delta^{2}-1)/2\}<1$, there exists a random variable $Z^{\dagger}\overset{d}{=}\max_{1\leq j\leq d}\sum_{i=1}^{n}Y_{ij}$ such that

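$$\mathbb{P}\big(|Z-Z^{\dagger}|\geq c_{\gamma}+3\delta\big)\lesssim\frac{\varepsilon+L_{n}(\gamma,\delta,\iota)}{1-\varepsilon}.$$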
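As an informal illustration of the setting of Theorem 1.1, the following sketch simulates $Z$ for heavy-tailed summands and its Gaussian counterpart $Z^{\dagger}$, then prints a few empirical quantiles side by side. Everything specific here is an assumption made for illustration and is not taken from the paper: the summands are i.i.d. rescaled Student $t_3$ variables (finite moments of every order below 3, unit variance), the coordinates are independent so that $\Sigma$ is the identity, and the sizes `n`, `d`, `reps` and the helper names are hypothetical.

```python
import math
import random


def sample_max_of_sums(n, d, draw, rng):
    """Return max_j sum_i X_ij for an n-by-d array of i.i.d. draws."""
    sums = [0.0] * d
    for _ in range(n):
        for j in range(d):
            sums[j] += draw(rng)
    return max(sums)


def student_t3_unit_variance(rng):
    # t_3 = N(0,1) / sqrt(chi2_3 / 3); Var(t_3) = 3, so rescale by sqrt(3).
    z = rng.gauss(0.0, 1.0)
    chi2 = rng.gammavariate(1.5, 2.0)  # chi-square with 3 degrees of freedom
    return z / math.sqrt(chi2 / 3.0) / math.sqrt(3.0)


def standard_gaussian(rng):
    return rng.gauss(0.0, 1.0)


def empirical_quantile(values, p):
    """Simple order-statistic quantile (no interpolation)."""
    ordered = sorted(values)
    return ordered[min(len(ordered) - 1, int(p * len(ordered)))]


rng = random.Random(0)
n, d, reps = 100, 5, 200  # illustrative sizes only
z = [sample_max_of_sums(n, d, student_t3_unit_variance, rng) for _ in range(reps)]
z_dagger = [sample_max_of_sums(n, d, standard_gaussian, rng) for _ in range(reps)]

for p in (0.25, 0.5, 0.9):
    print(p, empirical_quantile(z, p), empirical_quantile(z_dagger, p))
```

The simulation only compares the two marginal laws; the coupling between $Z$ and $Z^{\dagger}$ asserted by the theorem is an existence statement and is not constructed here.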

Proof of Theorem 1.1.

The proof exploits smooth approximations to the nonsmooth $\max$ and indicator functions, together with the device of Lindeberg’s telescopic sum (Lindeberg, 1922). Because the $X_{ij}$’s only have bounded $(2+\iota)$-th moments, the Gaussian comparison inequalities developed previously (Chernozhukov et al., 2014) cannot be applied, at least not immediately. The key technical difference is Lemma 2.1, where we use the device of Lindeberg’s telescopic sum.

The rest of the proof follows that in Chernozhukov et al. (2014); we outline it here for completeness. We start from a version of Strassen’s theorem, namely Lemma 4.1 in Chernozhukov et al. (2014). By this lemma, the conclusion follows immediately if we can prove that for every Borel subset $A$ of $\mathbb{R}$,

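$$\mathbb{P}\big(Z\in A\big)-\mathbb{P}\big(Z^{\dagger}\in A^{c_{\gamma}+3\delta}\big)\leq\frac{\varepsilon+L_{n}(\gamma,\delta,\iota)}{1-\varepsilon}.$$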

We fix a Borel subset $A$ of $\mathbb{R}$ throughout the proof. The first two steps are standard and involve smooth approximations to the non-smooth maps discussed previously. We first approximate the non-smooth map $\mathbb{R}^{d}\to\mathbb{R}$, $x\mapsto\max_{1\leq j\leq d}x_{j}$, by the smooth function $\psi_{\gamma}:\mathbb{R}^{d}\to\mathbb{R}$ defined by $\psi_{\gamma}(x)=\gamma^{-1}\log\big(\sum_{j=1}^{d}e^{\gamma x_{j}}\big)$ for $x\in\mathbb{R}^{d}$. By elementary calculations, we have for any $x=(x_{1},\ldots,x_{d})^{\mathrm{T}}$,

$$\max_{1\leq j\leq d}x_{j}\leq\psi_{\gamma}(x)\leq\max_{1\leq j\leq d}x_{j}+c_{\gamma},$$

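where $c_{\gamma}=\gamma^{-1}\log d$. Let $S_{n}=\sum_{i=1}^{n}X_{i}$ and let $S_{n}^{\dagger}=\sum_{i=1}^{n}Y_{i}$ be the Gaussian analogue of $S_{n}$. Since $Z$ is the largest coordinate of $S_{n}$, the bounds above give $|\psi_{\gamma}(S_{n})-Z|\leq c_{\gamma}$, so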

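$$\mathbb{P}\big(Z\in A\big)\leq\mathbb{P}\big(\psi_{\gamma}(S_{n})\in A^{c_{\gamma}}\big)=\mathbb{E}\big[1_{A^{c_{\gamma}}}\{\psi_{\gamma}(S_{n})\}\big].$$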
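The sandwich bound $\max_{j}x_{j}\leq\psi_{\gamma}(x)\leq\max_{j}x_{j}+\gamma^{-1}\log d$ for the smooth maximum can be checked numerically. The sketch below is a minimal illustration; the function name `psi`, the test vector, and the values of `gamma` are arbitrary choices, and the maximum is subtracted before exponentiating purely for numerical stability.

```python
import math


def psi(x, gamma):
    """Smooth max: gamma^{-1} * log(sum_j exp(gamma * x_j)), computed stably."""
    m = max(x)
    return m + math.log(sum(math.exp(gamma * (xj - m)) for xj in x)) / gamma


x = [0.3, -1.2, 2.5, 2.4]  # arbitrary test vector
for gamma in (1.0, 10.0, 100.0):
    c_gamma = math.log(len(x)) / gamma
    value = psi(x, gamma)
    # max_j x_j <= psi_gamma(x) <= max_j x_j + c_gamma
    assert max(x) <= value <= max(x) + c_gamma
    print(gamma, value, c_gamma)
```

Larger values of `gamma` tighten the approximation, at the price of larger derivatives of $\psi_{\gamma}$, which is the trade-off the parameter $\gamma$ controls in the bounds.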

Then we approximate the indicator function $t\mapsto 1_{A}(t)$ by a smooth function. We utilize the following lemma, which is taken from Chernozhukov et al. (2014) and can be traced back to Pollard (2002).

Lemma 1.2.

Let $\gamma>0$ and $\delta>\gamma^{-1}$. For every Borel subset $A$ of $\mathbb{R}$, there exists a smooth function $g:\mathbb{R}\to\mathbb{R}$ such that $\|g^{\prime}\|_{\infty}\leq\delta^{-1}$, $\|g^{\prime\prime}\|_{\infty}\leq C\delta^{-1}\gamma$, $\|g^{\prime\prime\prime}\|_{\infty}\leq C\delta^{-1}\gamma^{2}$ and

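$$(1-\varepsilon)1_{A}(t)\leq g(t)\leq\varepsilon+(1-\varepsilon)1_{A^{3\delta}}(t)\quad\text{for all }t\in\mathbb{R},$$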

where $C>0$ is an absolute constant and $\varepsilon=\varepsilon(\gamma,\delta)=\gamma\delta\exp\{-(\gamma^{2}\delta^{2}-1)/2\}<1$ .
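To get a feel for the size of $\varepsilon(\gamma,\delta)=\gamma\delta\exp\{-(\gamma^{2}\delta^{2}-1)/2\}$, the following small sketch evaluates it for a few products $\gamma\delta>1$. The function name `epsilon` and the sample values are illustrative assumptions, not part of the paper.

```python
import math


def epsilon(gamma, delta):
    """epsilon(gamma, delta) = gamma*delta * exp(-((gamma*delta)^2 - 1) / 2)."""
    u = gamma * delta
    if u <= 1.0:
        raise ValueError("the lemma requires gamma * delta > 1")
    return u * math.exp(-(u * u - 1.0) / 2.0)


for gamma, delta in ((2.0, 1.0), (3.0, 1.0), (10.0, 0.5)):
    print(gamma, delta, epsilon(gamma, delta))
```

Since $u\mapsto u\exp\{-(u^{2}-1)/2\}$ equals 1 at $u=1$ and is decreasing for $u>1$, the requirement $\varepsilon<1$ holds whenever $\gamma\delta>1$.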

We apply Lemma 1.2 to the set $A^{c_{\gamma}}$ to obtain a suitable function $g$, which gives

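$$\mathbb{E}\big[1_{A^{c_{\gamma}}}\{\psi_{\gamma}(S_{n})\}\big]\leq(1-\varepsilon)^{-1}\mathbb{E}\{g\circ\psi_{\gamma}(S_{n})\}.$$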

For simplicity, we write $f=g\circ\psi_{\gamma}$, i.e., $f(x)=g(\psi_{\gamma}(x))$ for $x\in\mathbb{R}^{d}$. It then suffices to compare $\mathbb{E}\{f(S_{n})\}$ and $\mathbb{E}\{f(S_{n}^{\dagger})\}$ using the smoothness of $f$. Lemma 2.1 provides the inequality

$$\big|\mathbb{E}f(S_{n})-\mathbb{E}f(S_{n}^{\dagger})\big|\lesssim L_{n}(\gamma,\delta,\iota).\tag{1.4}$$

Applying Lemma 1.2 again, it follows that

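$$\begin{aligned}
\mathbb{P}\big(Z\in A\big)-\mathbb{P}\big(Z^{\dagger}\in A^{c_{\gamma}+3\delta}\big)
&\leq\mathbb{E}\big[1_{A^{c_{\gamma}}}\{\psi_{\gamma}(S_{n})\}\big]-\mathbb{P}\big(Z^{\dagger}\in A^{c_{\gamma}+3\delta}\big)\leq(1-\varepsilon)^{-1}\mathbb{E}f(S_{n})-\mathbb{P}\big(Z^{\dagger}\in A^{c_{\gamma}+3\delta}\big)\\
&\lesssim\frac{\mathbb{E}f(S_{n}^{\dagger})}{1-\varepsilon}-\mathbb{P}\big(Z^{\dagger}\in A^{c_{\gamma}+3\delta}\big)+\frac{L_{n}(\gamma,\delta,\iota)}{1-\varepsilon}\leq\frac{\varepsilon+L_{n}(\gamma,\delta,\iota)}{1-\varepsilon},
\end{aligned}$$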

where we used the property of the smooth approximation $\psi_{\gamma}$ in the last inequality. It therefore remains only to prove (1.4), which is done in Lemma 2.1. This completes the proof. ∎

2 Statement and Proof of Lemma 2.1

Lemma 2.1.

Recall the definitions of $f$, $S_{n}$ and $S_{n}^{\dagger}$ in the proof of Theorem 1.1. Then, for any $0\leq\iota\leq 1$, we have

$$\big|\mathbb{E}f(S_{n})-\mathbb{E}f(S_{n}^{\dagger})\big|\leq L_{n}(\gamma,\delta,\iota),$$

where $L_{n}(\gamma,\delta,\iota)$ is defined in (1.1).

Proof of Lemma 2.1.

We use the device of Lindeberg’s telescopic sum (Lindeberg, 1922) to prove this lemma. Let $T_{i}=\sum_{k=1}^{i-1}Y_{k}+\sum_{k=i}^{n}X_{k}$ , with $T_{1}=\sum_{k=1}^{n}X_{k}$ . Then, we write $\mathbb{E}f(S_{n})-\mathbb{E}f(S_{n}^{\dagger})$ as a telescopic sum:

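$$\mathbb{E}f(S_{n})-\mathbb{E}f(S_{n}^{\dagger})=\sum_{i=1}^{n}\big\{\mathbb{E}f(T_{i})-\mathbb{E}f(T_{i+1})\big\}.$$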

In order to bound the left-hand side of the above identity, we instead bound the telescopic sum. Let $\Delta_{i}=T_{i}-T_{i+1}$ and $L_{i}=\sum_{k=1}^{i-1}Y_{k}+\sum_{k=i+1}^{n}X_{k}$, so that $T_{i}=L_{i}+X_{i}$ and $T_{i+1}=L_{i}+Y_{i}$. We use $\nabla f$ to denote the gradient, and $\nabla^{2}f=(\partial_{jk}f)_{1\leq j,k\leq d}$ the Hessian. Then $f(T_{i})-f(T_{i+1})$ can be decomposed as follows:

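$$f(T_{i})-f(T_{i+1})=\underbrace{(T_{i}-T_{i+1})^{\mathrm{T}}\nabla f(L_{i})}_{\text{I}_{i}}+\underbrace{\frac{1}{2}X_{i}^{\mathrm{T}}\nabla^{2}f(L_{i})X_{i}-\frac{1}{2}Y_{i}^{\mathrm{T}}\nabla^{2}f(L_{i})Y_{i}}_{\text{II}_{i}}+R_{i},$$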

where $R_{i}$ is the remainder term such that $R_{i}=f(T_{i})-f(T_{i+1})-\text{I}_{i}-\text{II}_{i}$ .
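The bookkeeping behind the Lindeberg swap can be made concrete on a single realization. The sketch below builds the hybrid sums $T_{i}$ and the leave-one-out sums $L_{i}$, then checks that $T_{1}=S_{n}$, $T_{n+1}=S_{n}^{\dagger}$, $T_{i}=L_{i}+X_{i}$, $T_{i+1}=L_{i}+Y_{i}$, and that the increments $f(T_{i})-f(T_{i+1})$ telescope. The test function `f` (a `tanh` of the smooth maximum), the uniform draws standing in for the $X_{i}$, and all helper names are hypothetical choices for illustration only; in particular `tanh` is not the function $g$ of Lemma 1.2.

```python
import math
import random


def vec_sum(vectors, d):
    """Coordinate-wise sum of a list of d-dimensional vectors."""
    total = [0.0] * d
    for v in vectors:
        for j in range(d):
            total[j] += v[j]
    return total


def T(i, xs, ys):
    """T_i = Y_1 + ... + Y_{i-1} + X_i + ... + X_n, for i = 1, ..., n + 1."""
    return vec_sum(ys[: i - 1] + xs[i - 1:], len(xs[0]))


def L(i, xs, ys):
    """L_i = Y_1 + ... + Y_{i-1} + X_{i+1} + ... + X_n (i-th summand dropped)."""
    return vec_sum(ys[: i - 1] + xs[i:], len(xs[0]))


def f(x, gamma=5.0):
    # Hypothetical smooth test function: tanh of the log-sum-exp smooth max.
    m = max(x)
    psi = m + math.log(sum(math.exp(gamma * (xj - m)) for xj in x)) / gamma
    return math.tanh(psi)


rng = random.Random(1)
n, d = 6, 3
xs = [[rng.uniform(-1.0, 1.0) for _ in range(d)] for _ in range(n)]
ys = [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(n)]

# Pathwise telescoping: f(S_n) - f(S_n^dagger) = sum_i {f(T_i) - f(T_{i+1})}.
telescoped = sum(f(T(i, xs, ys)) - f(T(i + 1, xs, ys)) for i in range(1, n + 1))
direct = f(vec_sum(xs, d)) - f(vec_sum(ys, d))
print(abs(telescoped - direct))  # zero up to floating-point rounding
```

Each swap replaces exactly one $X_{i}$ by $Y_{i}$ while $L_{i}$ stays fixed, which is why independence of $(X_{i},Y_{i})$ from $L_{i}$ can be used term by term.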

Let $R=\sum_{i=1}^{n}R_{i},\ \text{I}=\sum_{i=1}^{n}\text{I}_{i},\ \text{and}\ \text{II}=\sum_{i=1}^{n}\text{II}_{i}.$ Then $\mathbb{E}f(S_{n})-\mathbb{E}f(S_{n}^{\dagger})=\mathbb{E}\text{I}+\mathbb{E}\text{II}+\mathbb{E}{R}$. In what follows, we bound the expectations of I, II, and $R$ in turn. Starting with I, because $T_{i}-T_{i+1}=X_{i}-Y_{i}$ has mean zero and is independent of $L_{i}$, we have

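$$\mathbb{E}\text{I}=\mathbb{E}\sum_{i=1}^{n}\text{I}_{i}=\sum_{i=1}^{n}\big\{\mathbb{E}(T_{i}-T_{i+1})\big\}^{\mathrm{T}}\mathbb{E}\big\{\nabla f(L_{i})\big\}=0.$$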

For II, because $X_{i}$ and $Y_{i}$ are independent of $L_{i}$ and share the covariance matrix $\Sigma$, the expectation of II vanishes:

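$$\mathbb{E}\text{II}=2^{-1}\sum_{i=1}^{n}\sum_{j,k}\mathbb{E}\big\{\partial_{jk}f(L_{i})\big\}\mathbb{E}\big\{X_{ij}X_{ik}-Y_{ij}Y_{ik}\big\}=0.$$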

In the following lemma, we give an upper bound for the expectation of $R$ .

Lemma 2.2.

Let $f:\mathbb{R}^{d}\to\mathbb{R}$ be defined as in the proof of Theorem 1.1. Then we have

$$\mathbb{E}R\lesssim L_{n}(\gamma,\delta,\iota).$$

Proof of Lemma 2.2.

Recall the definition of $R=\sum_{i=1}^{n}R_{i}$. Let $\theta$ be a uniformly distributed random variable over $[0,1]$, independent of all other random variables. Using the third-order Taylor approximation for multivariate functions, we obtain

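$$\begin{aligned}
R_{i}&=6^{-1}\mathbb{E}_{\theta}\bigg\{\sum_{j,k,\ell}(1+\theta)^{2}X_{ij}X_{ik}X_{i\ell}\,\partial_{jk\ell}f(L_{i}+\theta X_{i})\bigg\}\\
&\qquad+6^{-1}\mathbb{E}_{\theta}\bigg\{\sum_{j,k,\ell}(1+\theta)^{2}Y_{ij}Y_{ik}Y_{i\ell}\,\partial_{jk\ell}f(L_{i}+\theta Y_{i})\bigg\},
\end{aligned}$$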

where the first and second-order terms canceled out. Therefore, $\mathbb{E}R$ can be bounded as

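$$\begin{aligned}
\mathbb{E}R&=6^{-1}\mathbb{E}\bigg\{\sum_{i=1}^{n}\sum_{j,k,\ell}(1+\theta)^{2}X_{ij}X_{ik}X_{i\ell}\,\partial_{jk\ell}f(L_{i}+\theta X_{i})\bigg\}\\
&\qquad+6^{-1}\mathbb{E}\bigg\{\sum_{i=1}^{n}\sum_{j,k,\ell}(1+\theta)^{2}Y_{ij}Y_{ik}Y_{i\ell}\,\partial_{jk\ell}f(L_{i}+\theta Y_{i})\bigg\}\\
&\leq 6^{-1}\mathbb{E}\bigg\{\sum_{j,k,\ell}\|\partial_{jk\ell}f\|_{\infty}\max_{j,k,\ell}\sum\big|X_{ij}X_{ik}X_{i\ell}\big|\bigg\}\\
&\qquad+6^{-1}\mathbb{E}\bigg\{\sum_{j,k,\ell}\|\partial_{jk\ell}f\|_{\infty}\max_{j,k,\ell}\sum\big|Y_{ij}Y_{ik}Y_{i\ell}\big|\bigg\}\\
&=A+B.
\end{aligned}$$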

Now we bound $A$ and $B$ respectively. We start with $A$ . Following elementary calculations along with Lemma 1.2, we obtain

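$$\sum_{j,k,\ell}^{d}\big|\partial_{jk\ell}f(x)\big|\leq\|g^{\prime\prime\prime}\|_{\infty}+6\gamma\|g^{\prime\prime}\|_{\infty}+6\gamma^{2}\|g^{\prime}\|_{\infty}\leq(7C+6)\gamma^{2}\delta^{-1}\lesssim\gamma^{2}\delta^{-1}.$$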

which, combined with equation (2), yields

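$$A\lesssim\gamma^{2}\delta^{-1}\mathbb{E}\bigg\{\max_{j,k,\ell}\sum\big|X_{ij}X_{ik}X_{i\ell}\big|\bigg\}\lesssim\gamma^{2}\delta^{-1}\mathbb{E}\bigg\{\max_{j}\sum\big|X_{ij}\big|^{3}\bigg\}.$$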

Similarly,

$$B\lesssim\gamma^{2}\delta^{-1}\mathbb{E}\bigg\{\max_{j}\sum\big|Y_{ij}\big|^{3}\bigg\}.$$

Now using the fact that $0\leq f(x)\leq 1$ and $\mathbb{E}\text{I}=\mathbb{E}\text{II}=0$ , we obtain

[TABLE]

Putting the upper bounds (2.3), (2.4), and (2.5) together yields

[TABLE]

Using the fact that $R_{i}=f(T_{i})-f(T_{i+1})-\text{I}_{i}-\text{II}_{i}$ and a similar argument, we obtain

[TABLE]

We need the following lemma, which enables the relaxation of the moment conditions.

Lemma 2.3.

Let $a\geq 1$ and $x\geq 0$ . For any $0\leq\iota\leq 1$ , we have

$$\min\big\{a+x+x^{2},x^{3}\big\}\leq 3a^{(1-\iota)/3}x^{2+\iota}.$$

Proof of Lemma 2.3.

Using the fact that $a\geq 1$ and splitting the support of $x$, we obtain

[TABLE]

∎
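The inequality of Lemma 2.3, $\min\{a+x+x^{2},x^{3}\}\leq 3a^{(1-\iota)/3}x^{2+\iota}$ for $a\geq 1$, $x\geq 0$ and $0\leq\iota\leq 1$, can be sanity-checked on a grid. The sketch below is only a numerical spot check under arbitrary grid choices (the names `lhs`, `rhs`, `violations` and the grid values are illustrative), not a substitute for the proof.

```python
def lhs(a, x):
    """Left-hand side of Lemma 2.3: min{a + x + x^2, x^3}."""
    return min(a + x + x * x, x ** 3)


def rhs(a, x, iota):
    """Right-hand side of Lemma 2.3: 3 * a^{(1 - iota)/3} * x^{2 + iota}."""
    return 3.0 * a ** ((1.0 - iota) / 3.0) * x ** (2.0 + iota)


a_grid = [1.0, 1.5, 2.0, 10.0, 1000.0]
x_grid = [k / 20.0 for k in range(0, 401)]   # x in [0, 20]
iota_grid = [0.0, 0.25, 0.5, 0.75, 1.0]

# Collect any grid point where the claimed inequality fails (with tiny slack).
violations = [
    (a, x, iota)
    for a in a_grid
    for x in x_grid
    for iota in iota_grid
    if lhs(a, x) > rhs(a, x, iota) * (1.0 + 1e-12)
]
print(len(violations))
```

The split used in the proof is visible in the code: for $x\leq a^{1/3}$ the minimum is controlled by $x^{3}$, and for $x>a^{1/3}$ by $a+x+x^{2}$.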

Applying Lemma 2.3 with $x=\gamma\max(|X_{ij}|,|Y_{ij}|)$ , we obtain

[TABLE]

where $C_{i}(2+\iota)=\mathbb{E}\big(\max_{1\leq j\leq d}|X_{ij}|^{2+\iota}+\max_{1\leq j\leq d}|Y_{ij}|^{2+\iota}\big)$. Combining the two bounds yields Lemma 2.2. ∎

∎

Bibliography

1. Chernozhukov, V., Chetverikov, D. and Kato, K. (2014). Gaussian approximation of suprema of empirical processes. The Annals of Statistics 42 1564–1597.
2. Chernozhukov, V., Chetverikov, D. and Kato, K. (2017). Central limit theorems and bootstrap in high dimensions. The Annals of Probability 45 2309–2352.
3. Lindeberg, J. W. (1922). Eine neue Herleitung des Exponentialgesetzes in der Wahrscheinlichkeitsrechnung. Mathematische Zeitschrift 15 211–225.
4. Pollard, D. (2002). A User’s Guide to Measure Theoretic Probability, vol. 8. Cambridge University Press.
