Gaussian Approximations for Maxima of Random Vectors under $(2+\iota)$-th Moments
Qiang Sun

TL;DR
This paper establishes a nonasymptotic Gaussian approximation for the maximum of sums of random vectors with $(2+\iota)$-th moments, providing a versatile tool for statistical learning applications.
Contribution
It introduces a novel nonasymptotic Gaussian approximation theorem applicable to sums of random vectors with limited moments, using new technical methods.
Findings
Provides a general Gaussian approximation result for maxima of random vectors
Applicable to various statistical learning problems
Employs innovative proof techniques including Lindeberg telescoping
Abstract
We derive a Gaussian approximation result for the maximum of a sum of random vectors under $(2+\iota)$-th moments. Our main theorem is abstract and nonasymptotic, and can be applied to a variety of statistical learning problems. The proof uses the Lindeberg telescopic sum device along with some other newly developed technical results.
Qiang Sun Department of Statistical Sciences, University of Toronto, 100 St. George Street, Toronto, ON M5S 3G3, Canada; E-mail: [email protected].
Keywords: Gaussian approximation, maxima.
1 Introduction and Main Result
We derive a Gaussian approximation result for maxima of sums of high-dimensional random vectors under $(2+\iota)$-th moments for some $\iota>0$. This complements the results of Chernozhukov et al. (2014), which require a third-moment condition; see Theorem 4.1 therein. Later, Chernozhukov et al. (2017) provided high-dimensional central limit and bootstrap theorems for sparsely convex sets. Our derivation uses the Lindeberg telescopic sum device along with some other newly developed technical results.
Let the $X_{i}$ be independent random vectors in $\mathbb{R}^{d}$ with mean zero and finite $(2+\iota)$-th moments, that is, $\mathbb{E}(X_{ij})=0$ and $\mathbb{E}\big(|X_{ij}|^{2+\iota}\big)<\infty$, for some $\iota>0$. Let $\Sigma\equiv\mathbb{E}\big(X_{i}X_{i}^{\mathrm{\scriptscriptstyle T}}\big)$. Consider the statistic Let the $Y_{i}$ be independent random vectors in $\mathbb{R}^{d}$ with For and such that , let
[TABLE]
where $C_{i}(q)=\mathbb{E}\big(\max_{1\leq j\leq d}|X_{ij}|^{q}+\max_{1\leq j\leq d}|Y_{ij}|^{q}\big)$.
Let “$\lesssim$” stand for “$\leq$” up to a universal constant. Our main result follows.
Theorem 1.1.
For any positive scalars such that and , there exists a random variable such that
[TABLE]
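As a rough numerical illustration of the kind of statement in Theorem 1.1, the following sketch compares simulated quantiles of a maximum of normalized sums of heavy-tailed vectors with those of its Gaussian counterpart. It rests on several assumptions that are not taken from the text: the statistic is assumed to have the form $\max_{1\leq j\leq d} n^{-1/2}\sum_{i=1}^{n}X_{ij}$, as in the setting of Chernozhukov et al. (2014); the coordinates are taken to be independent Student-t variables with 2.5 degrees of freedom, rescaled to unit variance, so that $(2+\iota)$-th moments are finite for $\iota<0.5$ while the third moment is not; and the names `student_t`, `max_stat`, and `simulate`, together with the values of `n`, `d`, and `reps`, are purely illustrative.

```python
import math
import random

def student_t(nu, rng):
    """Draw a Student-t variate with nu degrees of freedom: normal / sqrt(chi2 / nu)."""
    z = rng.gauss(0.0, 1.0)
    v = rng.gammavariate(nu / 2.0, 2.0)  # chi-square with nu degrees of freedom
    return z / math.sqrt(v / nu)

def max_stat(rows, n):
    """Assumed statistic: max over coordinates j of n^{-1/2} * sum_i rows[i][j]."""
    d = len(rows[0])
    return max(sum(r[j] for r in rows) / math.sqrt(n) for j in range(d))

def simulate(n, d, reps, nu, seed=0):
    """Return sorted draws of the statistic for heavy-tailed and Gaussian inputs."""
    rng = random.Random(seed)
    scale = math.sqrt((nu - 2.0) / nu)  # rescale t_nu to unit variance (needs nu > 2)
    heavy, gauss = [], []
    for _ in range(reps):
        X = [[scale * student_t(nu, rng) for _ in range(d)] for _ in range(n)]
        Y = [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(n)]  # N(0, I) rows
        heavy.append(max_stat(X, n))
        gauss.append(max_stat(Y, n))
    return sorted(heavy), sorted(gauss)

heavy, gauss = simulate(n=100, d=5, reps=200, nu=2.5)
for p in (0.25, 0.5, 0.75):
    k = int(p * len(heavy))
    # Empirical p-quantiles of the heavy-tailed and Gaussian statistics.
    print(p, round(heavy[k], 3), round(gauss[k], 3))
```

With only a few hundred replications the printed quantiles are noisy, so the comparison is a qualitative sanity check and not a verification of the bound in the theorem.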
Proof of Theorem 1.1.
The proof of this theorem exploits smooth approximations of the nonsmooth maximum and indicator functions, and the device of Lindeberg’s telescopic sum (Lindeberg, 1922). Because the $X_{i}$’s only have bounded $(2+\iota)$-th moments, the Gaussian comparison inequalities developed previously (Chernozhukov et al., 2014) cannot be applied, at least not immediately. The key technical difference is Lemma 2.1, where we use the device of Lindeberg’s telescopic sum.
The rest of the proof follows that in Chernozhukov et al. (2014). We outline it here for completeness. We start from a version of Strassen’s theorem, namely Lemma 4.1 in Chernozhukov et al. (2014). By this lemma, the conclusion follows immediately if we can prove that for every Borel subset of ,
[TABLE]
We shall fix any Borel subset of throughout the proof. The first two steps are standard and involve smooth approximations to the non-smooth maps discussed previously. We first approximate the non-smooth map $x\mapsto\max_{1\leq j\leq d}x_{j}$ by the smooth function $\psi_{\gamma}$ defined by $\psi_{\gamma}(x)=\gamma^{-1}\log\big(\sum_{j=1}^{d}e^{\gamma x_{j}}\big)$ for . By elementary calculations, we have for any ,
[TABLE]
where . Similarly, let and , the Gaussian analogue of . Then
[TABLE]
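The quality of the log-sum-exp approximation can be checked numerically. The sketch below evaluates $\psi_{\gamma}$ on a small vector and compares the gap $\psi_{\gamma}(x)-\max_{j}x_{j}$ with the standard bound $\gamma^{-1}\log d$. The function name `smooth_max` and the test vector are illustrative choices, not notation from the paper.

```python
import math

def smooth_max(x, gamma):
    """Log-sum-exp surrogate psi_gamma(x) = log(sum_j exp(gamma * x_j)) / gamma."""
    m = max(x)  # subtract the max before exponentiating to avoid overflow
    return m + math.log(sum(math.exp(gamma * (xj - m)) for xj in x)) / gamma

x = [0.3, -1.2, 0.9, 0.85]
d = len(x)
for gamma in (1.0, 10.0, 100.0):
    gap = smooth_max(x, gamma) - max(x)
    # The gap is nonnegative and at most log(d) / gamma, so it shrinks as gamma grows.
    print(gamma, gap, math.log(d) / gamma)
```

A larger $\gamma$ tightens the approximation but inflates the derivatives of $\psi_{\gamma}$, which is the trade-off the smoothing argument has to balance.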
Then we approximate the indicator function by a smooth function. We utilize the following lemma, which is taken from Chernozhukov et al. (2014) and can be traced back to Pollard (2002).
Lemma 1.2.
Let and . For every Borel subset of , there exists a smooth function such that , , and
[TABLE]
where is an absolute constant and .
We take a suitable function as justified in Lemma 1.2 to the set and obtain
[TABLE]
For simplicity, we write , i.e., for . Then, it suffices to compare and using the smoothness of . Suppose that we can establish the following inequality,
[TABLE]
which is provided by Lemma 2.1. Then, applying Lemma 1.2 again, it follows that
[TABLE]
where we used the property of the smooth approximation in the last inequality. Therefore, we only need to prove (1.4). This completes the proof. ∎
2 Statement and Proof of Lemma 2.1
Lemma 2.1.
Recall the definitions for , and in the proof of Theorem 1.1. Then, for any , we have
[TABLE]
where is defined in (1.1).
Proof of Lemma 2.1.
We use the device of Lindeberg’s telescopic sum (Lindeberg, 1922) to prove this lemma. Let , with . Then, we write as a telescopic sum:
[TABLE]
In order to bound the left-hand side in the above identity, we instead bound the telescopic sum. Let and . We use to denote the derivative, and the Hessian. can be decomposed as follows:
[TABLE]
where is the remainder term such that .
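The telescoping identity behind Lindeberg’s device can be verified numerically. The sketch below swaps one summand at a time and checks that the one-step differences add up to the total difference. The particular interpolation (the first $k$ summands taken from the $X_{i}$, the rest from the $Y_{i}$), the $n^{-1/2}$ scaling, the choice of the log-sum-exp function as the smooth test function, and all names are assumptions made for illustration; the ordering used in the paper may differ.

```python
import math
import random

def smooth_max(x, gamma):
    """Log-sum-exp surrogate for the coordinatewise maximum."""
    m = max(x)
    return m + math.log(sum(math.exp(gamma * (xj - m)) for xj in x)) / gamma

random.seed(0)
n, d, gamma = 6, 4, 5.0
X = [[random.uniform(-1.0, 1.0) for _ in range(d)] for _ in range(n)]
Y = [[random.gauss(0.0, 1.0) for _ in range(d)] for _ in range(n)]

def hybrid_sum(k):
    """Scaled sum using X_1..X_k followed by Y_{k+1}..Y_n (assumed interpolation)."""
    rows = X[:k] + Y[k:]
    return [sum(r[j] for r in rows) / math.sqrt(n) for j in range(d)]

def f(v):
    return smooth_max(v, gamma)

# Each increment swaps exactly one summand: Y_k is replaced by X_k.
increments = [f(hybrid_sum(k)) - f(hybrid_sum(k - 1)) for k in range(1, n + 1)]
lhs = f(hybrid_sum(n)) - f(hybrid_sum(0))  # f(all X) minus f(all Y)
print(lhs, sum(increments))
```

Because consecutive hybrid sums differ in a single summand, each increment can be Taylor-expanded around the shared part of the sum, which is what makes the term-by-term bounds tractable.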
Let Then . In what follows, we bound the expectation of terms I, II, and respectively. Starting with I, because , which is independent of , we have
[TABLE]
The expectation of II can be bounded by
[TABLE]
In the following lemma, we give an upper bound for the expectation of .
Lemma 2.2.
Let be defined as in Theorem 1.1. Then we have
[TABLE]
Proof of Lemma 2.2.
Recall the definition of . Let be a uniformly distributed random variable over , independent of all other random variables. Using the third-order Taylor approximation for multivariate functions, we obtain
[TABLE]
where the first and second-order terms canceled out. Therefore, can be bounded as
[TABLE]
Now we bound and respectively. We start with . Following elementary calculations along with Lemma 1.2, we obtain
[TABLE]
which, combined with equation (2), yields
[TABLE]
Similarly,
[TABLE]
Now using the fact that and , we obtain
[TABLE]
Putting the upper bounds (2.3), (2.4), and (2.5) together yields
[TABLE]
Using the fact that and a similar argument, we obtain
[TABLE]
We need the following lemma, which enables the relaxation of the moment conditions.
Lemma 2.3.
Let and . For any , we have
[TABLE]
Proof of Lemma 2.3.
Using the fact that and splitting the support of , we obtain
[TABLE]
∎
Applying Lemma 2.3 with , we obtain
[TABLE]
where $C_{i}(2+\iota)=\mathbb{E}\big(\max_{1\leq j\leq d}|X_{ij}|^{2+\iota}+\max_{1\leq j\leq d}|Y_{ij}|^{2+\iota}\big)$. Combining the two bounds yields Lemma 2.2. ∎
∎
References
- Chernozhukov, V., Chetverikov, D. and Kato, K. (2014). Gaussian approximation of suprema of empirical processes. The Annals of Statistics 42 1564–1597.
- Chernozhukov, V., Chetverikov, D. and Kato, K. (2017). Central limit theorems and bootstrap in high dimensions. The Annals of Probability 45 2309–2352.
- Lindeberg, J. W. (1922). Eine neue Herleitung des Exponentialgesetzes in der Wahrscheinlichkeitsrechnung. Mathematische Zeitschrift 15 211–225.
- Pollard, D. (2002). A User’s Guide to Measure Theoretic Probability, vol. 8. Cambridge University Press.
