Rates of convergence in the CLT for nonlinear statistics under relaxed moment conditions
Nguyen Tien Dung

TL;DR
This paper establishes explicit convergence rates in the central limit theorem for nonlinear statistics under relaxed moment conditions using Stein's method, extending previous results to broader moment assumptions.
Contribution
It provides new explicit rates of convergence for nonlinear statistics with relaxed moment conditions, including cases with vanishing third moments.
Findings
When applied to specific examples, the rates are of the optimal order $O(n^{-\frac{\delta}{2}})$ and $O(n^{-\frac{1+\delta}{2}})$.
Results apply to nonlinear statistics with finite absolute moments of order $2+\delta$ and $3+\delta$ (the latter with vanishing third moment).
Method uses covariance identities and solutions to Stein's equation.
Abstract
This paper is concerned with normal approximation under relaxed moment conditions using Stein's method. We obtain explicit rates of convergence in the central limit theorem for (i) nonlinear statistics with finite absolute moment of order $2+\delta$, and (ii) nonlinear statistics with vanishing third moment and finite absolute moment of order $3+\delta$, where $\delta\in(0,1]$. When applied to specific examples, these rates are of the optimal order $O(n^{-\frac{\delta}{2}})$ and $O(n^{-\frac{1+\delta}{2}})$, respectively. Our proofs are based on the covariance identity formula and simple observations about the solution of Stein's equation.
Nguyen Tien Dung, Department of Mathematics, VNU University of Science, Vietnam National University, Hanoi, 334 Nguyen Trai, Thanh Xuan, Hanoi, Vietnam.
Keywords: Central limit theorem, rate of convergence, nonlinear statistics, Stein’s method.
2010 Mathematics Subject Classification: 60F05, 62E17.
1 Introduction
Let $X=(X_1,\ldots,X_n)$ be a vector of independent random variables (not necessarily identically distributed). We consider the problem of normal approximation for nonlinear statistics of the form
$$F=f(X_1,\ldots,X_n).\qquad(1.1)$$
We recall that this is one of the most fundamental problems in the theory of mathematical statistics. The main task is to investigate the rate of convergence in the central limit theorem (CLT) for $F$. When $F$ has finite absolute moments of order three or higher, this problem has been well studied, and many normal approximation results for $F$ and its special forms can be found in the literature. The reader can consult the monograph [8] for a detailed presentation of this topic.
We now consider the case where $F$ only has finite absolute moments of order $2+\delta$ with $\delta\in(0,1]$. This case is more difficult to study and requires new ideas. To the best of our knowledge, few general results are available in the literature.
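- For the partial sum of independent $\mathbb{R}$-valued random variables $X_1,\ldots,X_n$, the classical result proved by Lyapunov in 1901 says that $W_n:=\frac{1}{B_n}\sum_{k=1}^{n}(X_k-\mu_k)$ converges in distribution to a standard normal random variable $N$ if
  $$\lim_{n\to\infty}\frac{1}{B_n^{2+\delta}}\sum_{k=1}^{n}E|X_k-\mu_k|^{2+\delta}=0\quad\text{for some }\delta>0,\qquad(1.2)$$
  where $\mu_k=E[X_k]$ and $B_n^2=\sum_{k=1}^{n}\mathrm{Var}(X_k)$. Sixty-five years later, the rate of convergence in Lyapunov’s central limit theorem was established by Bikjalis [3] and Ibragimov [14]. For $\delta\in(0,1]$, they obtained the error bound
  $$d_W(W_n,N)\le \frac{K_\delta}{B_n^{2+\delta}}\sum_{k=1}^{n}E|X_k-\mu_k|^{2+\delta}\qquad(1.3)$$
  for some constant $K_\delta$ depending only on $\delta$, where $d_W$ denotes the Wasserstein distance. If, in addition, the random variables are identically distributed, then the right-hand side of (1.3) is of order $n^{-\frac{\delta}{2}}$, and this rate is optimal, see e.g. [17] for a short survey.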
- More recently, optimal rates of convergence for certain nonlinear statistics were also obtained in [2, 6, 7]: Bentkus et al. [2] focus on $U$-statistics, while Chen and Shao [6, 7] investigate sums of locally dependent random variables and nonlinear statistics that can be written as $T=W+\Delta$, where $W$ is a linear statistic and $\Delta$ is an error term.
- Surprisingly, to the best of our knowledge, a systematic study of nonlinear statistics of the general form (1.1) is still missing. This is the first motivation of the present paper. Our Theorem 2.1 partially fills this gap by providing explicit bounds on the Wasserstein distance, and hence rates of convergence.
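As a numerical illustration of the Bikjalis and Ibragimov rate recalled in the first item of the list above, the Python sketch below computes the Wasserstein distance between $W_n=n^{-1/2}(X_1+\cdots+X_n)$ and a standard normal variable, assuming (our choice, for illustration only) i.i.d. Rademacher $X_k$. These variables have all moments, so this is the case $\delta=1$ with expected order $n^{-1/2}$. The sketch uses the real-line identity $d_W(\mu,\nu)=\int_{\mathbb{R}}|F_\mu(x)-F_\nu(x)|\,dx$ and the antiderivative $x\Phi(x)+\varphi(x)$ of $\Phi$; the function name `w1_rademacher_sum` is hypothetical.

```python
import numpy as np
from scipy.stats import binom, norm


def w1_rademacher_sum(n: int) -> float:
    """Exact W1 distance between n^{-1/2}(X_1 + ... + X_n), X_k i.i.d. Rademacher,
    and N(0, 1), computed as the integral of |F_n(x) - Phi(x)| over the real line."""
    k = np.arange(n + 1)
    atoms = (2 * k - n) / np.sqrt(n)   # support points of the normalized sum
    cdf = binom.cdf(k, n, 0.5)         # F_n equals cdf[k] on [atoms[k], atoms[k+1])

    def A(x):
        # Antiderivative of Phi: d/dx [x Phi(x) + phi(x)] = Phi(x)
        return x * norm.cdf(x) + norm.pdf(x)

    # Left tail, where F_n = 0: integral of Phi over (-inf, atoms[0]]
    total = A(atoms[0])
    # Right tail, where F_n = 1: integral of 1 - Phi over [atoms[n], +inf)
    total += norm.pdf(atoms[-1]) - atoms[-1] * norm.sf(atoms[-1])

    # Inner cells [p, q] with F_n = c; c - Phi(x) changes sign at s = Phi^{-1}(c)
    p, q, c = atoms[:-1], atoms[1:], cdf[:-1]
    s = np.clip(norm.ppf(c), p, q)
    left = c * (s - p) - (A(s) - A(p))    # Phi <= c on [p, s]
    right = (A(q) - A(s)) - c * (q - s)   # Phi >= c on [s, q]
    total += np.sum(left + right)
    return float(total)


for n in (25, 100, 400):
    d = w1_rademacher_sum(n)
    print(n, round(d, 5), round(np.sqrt(n) * d, 4))
```

If the rate is of order $n^{-1/2}$, the printed products $\sqrt{n}\,d_W$ should stay roughly constant as $n$ grows.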
Another motivation for this paper comes from the vanishing third moment phenomenon discussed in Section 4.8 of [8]. This phenomenon says that, under additional moment assumptions, the standard convergence rate $n^{-1/2}$ can be improved to $n^{-1}$. Let us recall the following.
Proposition 1.1**.**
Let be a standard normal distribution and be independent and identically distributed mean zero, variance one random variables with and finite. Then, for we have
[TABLE]
where and denotes the supremum norm.
The finite fourth moment condition is the best possible one for achieving the rate $n^{-1}$. So two open questions arise here: (i) find the rate of convergence when the variables only have a finite absolute moment of order $3+\delta$ for some $\delta\in(0,1)$; (ii) generalize the above phenomenon to nonlinear statistics of the form (1.1). Our Theorem 2.2 provides a complete answer to both questions. In fact, when applied to the normalized sum in Proposition 1.1, it yields the rate $n^{-\frac{1+\delta}{2}}$.
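The gain from a vanishing third moment can be seen with a single smooth test function. The Python sketch below is our own illustration, not the paper's construction: it takes $h(x)=\sin x+\cos x$ and $W_n=n^{-1/2}(X_1+\cdots+X_n)$ with i.i.d. mean-zero, variance-one $X_k$, and uses $E e^{iW_n}=\phi(n^{-1/2})^n$, where $\phi$ is the characteristic function of $X_1$, so that $E h(W_n)$ is exact. Writing $N$ for a standard normal variable, $E h(N)=e^{-1/2}$. For Rademacher variables ($EX_1^3=0$) the error $E h(W_n)-E h(N)$ behaves like a constant times $n^{-1}$, whereas for centered exponential variables ($EX_1^3=2$) it behaves like a constant times $n^{-1/2}$. The helper names are hypothetical.

```python
import cmath
import math


def smooth_error(phi, n: int) -> float:
    """E[h(W_n)] - E[h(N)] for h(x) = sin x + cos x, where W_n = n^{-1/2} (X_1 + ... + X_n)
    and phi is the characteristic function of the i.i.d., mean-zero, variance-one X_k."""
    cf = phi(1.0 / math.sqrt(n)) ** n   # E[exp(i W_n)] by independence
    e_h_w = cf.real + cf.imag           # E[cos W_n] + E[sin W_n]
    e_h_n = math.exp(-0.5)              # E[cos N] = e^{-1/2} and E[sin N] = 0
    return e_h_w - e_h_n


def phi_rademacher(t):
    # P(X = 1) = P(X = -1) = 1/2: mean 0, variance 1, E X^3 = 0
    return complex(math.cos(t), 0.0)


def phi_centered_exp(t):
    # X = E - 1 with E ~ Exp(1): mean 0, variance 1, E X^3 = 2
    return cmath.exp(-1j * t) / (1 - 1j * t)


for n in (100, 1_000, 10_000):
    r = smooth_error(phi_rademacher, n)
    e = smooth_error(phi_centered_exp, n)
    # n * r and sqrt(n) * e should both stabilize as n grows
    print(n, n * r, math.sqrt(n) * e)
```

A cumulant expansion of $n\log\phi(n^{-1/2})$ suggests the limits $-e^{-1/2}/12$ for $n\,r$ and $-e^{-1/2}/3$ for $\sqrt{n}\,e$.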
Stein’s method is the main tool used to prove Theorems 2.1 and 2.2. Recall that this method was introduced by Stein in the 1970s, and many different techniques have since been developed to apply it. The present paper continues to employ the technique based on difference operators used in our recent paper [13]. The key to relaxing the moment conditions lies in some simple observations about the solution of Stein’s equation, see Propositions 4.2 and 4.3.
The rest of the paper is organized as follows. Our main results (Theorems 2.1 and 2.2) are described in Section 2. Some illustrative examples with detailed computations are given in Section 3. Proofs of the main theorems are given in Section 4. Some useful moment inequalities are provided in Section 5.
2 The main results
Throughout this paper let denote a standard normal random variable. To measure the distance to normality of a random variable we will use the following two distances
-distance (or Wasserstein distance) defined by
[TABLE]
-distance defined by
[TABLE]
where is the space of -times differentiable real-valued functions on and denotes the supremum norm.
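For numerical experiments, the Wasserstein distance between an empirical distribution and the standard normal law can be computed exactly from the quantile representation $d_W(\mu,\nu)=\int_0^1|F_\mu^{-1}(u)-F_\nu^{-1}(u)|\,du$, valid on the real line. The Python sketch below is one way to do this, assuming NumPy and SciPy; the function name `w1_to_standard_normal` is ours. It relies on the antiderivative $a\Phi(z)+\varphi(z)$ of $(a-z)\varphi(z)$.

```python
import numpy as np
from scipy.stats import norm


def w1_to_standard_normal(samples) -> float:
    """Exact W1 distance between the empirical law of `samples` and N(0, 1),
    computed as the integral over u in (0, 1) of |F_emp^{-1}(u) - Phi^{-1}(u)|."""
    x = np.sort(np.asarray(samples, dtype=float))
    m = x.size
    # On the quantile interval (i/m, (i+1)/m) the empirical quantile equals x[i];
    # the substitution u = Phi(z) maps it to the z-interval [z[i], z[i+1]].
    z = norm.ppf(np.arange(m + 1) / m)   # z[0] = -inf, z[m] = +inf
    lo, hi = z[:-1], z[1:]

    def G(a, t):
        # Antiderivative in t of (a - t) * phi(t)
        return a * norm.cdf(t) + norm.pdf(t)

    # |a - t| changes sign at t = a; clip that split point into each interval
    mid = np.clip(x, lo, hi)
    # integral of (a - t) phi on [lo, mid] plus integral of (t - a) phi on [mid, hi]
    pieces = (G(x, mid) - G(x, lo)) - (G(x, hi) - G(x, mid))
    return float(np.sum(pieces))


rng = np.random.default_rng(0)
# 5000 replicates of a normalized sum of 50 centered exponential variables
w = (rng.exponential(size=(5000, 50)) - 1.0).sum(axis=1) / np.sqrt(50)
print(w1_to_standard_normal(w))
```

With a finite sample the result also contains Monte Carlo error of order $m^{-1/2}$, so it is only a rough proxy for the true distance.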
We now describe the main results of this paper. Let be a measurable space and be a vector of independent random variables, defined on some probability space and taking values in Let be a measurable function, the random variable is called a nonlinear statistic. We introduce the -fields
[TABLE]
and
[TABLE]
Definition 2.1**.**
Given a random variable we define the difference operators by
[TABLE]
where denotes the expectation with respect to In addition, for each we define
[TABLE]
The difference operators are very useful in the study of concentration inequalities, see e.g. [4]. In the context of normal approximation, recent papers [10, 13] have successfully used these operators for nonlinear statistics with finite fourth moments. The following two properties of these operators will be used in the present work.
Proposition 2.1**.**
Let be in for some Then, for every we have
(i)
(ii) for all
Proof.
Point (i) was already proved in Proposition 2.2 of [13]. Let us prove point (ii). Using the discrete Hölder inequality, we get
[TABLE]
Then, by Lyapunov’s inequality,
[TABLE]
This finishes the proof of the proposition. ∎
The next theorem is the first main result of the present paper, in which we provide explicit bounds on the Wasserstein distance for nonlinear statistics with finite absolute moment of order $2+\delta$.
Theorem 2.1**.**
Fix and let be centered with For any we always have
[TABLE]
where
Since convergence in Wasserstein distance implies convergence in distribution, we obtain the following CLT for nonlinear statistics, which can be considered a natural generalization of the classical Lyapunov central limit theorem. Note that, in the case of partial sums, condition (2.3) is automatically satisfied and condition (2.4) is exactly (1.2).
Corollary 2.1** (Lyapunov’s CLT).**
Let be a sequence of nonlinear statistics in with We put
[TABLE]
Assume that there exist and such that
[TABLE]
where Then, converges in distribution to a standard normal random variable as if
[TABLE]
Proof.
Follows directly from the bound (2.2) and the following relation
[TABLE]
∎
The second main result of the present paper is formulated in the next theorem, where we investigate the vanishing third moment phenomenon under a relaxed moment condition.
Theorem 2.2**.**
Fix and let be centered with and For any we always have
[TABLE]
where is as in Theorem 2.1 and
[TABLE]
Remark 2.1*.*
By the fundamental inequality we have
[TABLE]
Hence, the point of Proposition 2.1 tells us that the condition implies and So our bounds (2.1) and (2.2) are well defined. Similarly, the condition ensures that the bounds (2.5) and (2.6) are well defined.
Remark 2.2*.*
In the statement of Theorems 2.1 and 2.2, we introduced three new parameters and Let us give here an example to show the role of those parameters. Consider the sequence
[TABLE]
where be the independent and identically distributed random variables with Note that and Hence, in both Theorems, the moment condition is satisfied with and we expect to obtain the optimal rates of convergence and for the distances and respectively.
We have
[TABLE]
[TABLE]
and hence,
[TABLE]
The choice . The bound (2.1) with becomes
[TABLE]
So this choice fails to prove the central limit theorem for because as
The choice . We have and and Now the bound (2.1) with will yield the optimal rate for the Wasserstein distance. Indeed, we have
[TABLE]
Furthermore, we have Hence, by choosing the bound (2.5) with gives us
[TABLE]
The reader can verify that the choice will fail to give the above optimal rate.
3 Examples
In this section, we provide some examples to illustrate the applicability of our abstract results. Although these examples are basic, to the best of our knowledge the results of this section are new (except for the bound (3.1), which was already obtained in [3, 14]).
3.1 Partial sums
Let are independent -valued random variables with and for some Define and
We have
[TABLE]
Hence, for any
[TABLE]
and, for any
[TABLE]
It is easy to see that
[TABLE]
So our bound (2.2) yields
[TABLE]
which recovers the classical bound (1.3) with
Let us now investigate the vanishing third moment phenomenon for
Proposition 3.1**.**
Suppose that and for and for some Then, we have
[TABLE]
where $C_{\delta}:=8+2^{2+\delta}\big(3^{\frac{\delta}{3}}(2+3^{\frac{3+\delta}{3}})\big)^{\frac{3}{3+\delta}}+2^{4+\delta}$.
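To get a feel for the size of the constant $C_\delta$ in Proposition 3.1, the short Python sketch below evaluates the displayed formula. It only transcribes the definition; the function name `c_delta` is ours.

```python
def c_delta(delta: float) -> float:
    """C_delta = 8 + 2^{2+d} * (3^{d/3} * (2 + 3^{(3+d)/3}))^{3/(3+d)} + 2^{4+d} (Proposition 3.1)."""
    inner = 3 ** (delta / 3) * (2 + 3 ** ((3 + delta) / 3))
    return 8 + 2 ** (2 + delta) * inner ** (3 / (3 + delta)) + 2 ** (4 + delta)


for d in (0.25, 0.5, 1.0):
    print(d, round(c_delta(d), 3))
```

As $\delta\to 0$ the formula tends to $8+4\cdot 5+16=44$, and at $\delta=1$ it is close to $82$.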
Proof.
We have
[TABLE]
and hence,
[TABLE]
So we can get
[TABLE]
We also have
[TABLE]
Since this allows us to use the bound (2.6) and we obtain (3.2). ∎
Clearly, when the random variables have the same distribution with mean zero and variance one, the bound (3.2) becomes This is a generalization of Proposition 1.1 because
3.2 A sum of dependent random variables
Fix an integer number and let be independent random variables taking values in Let are measurable functions. In this section, we generalize the classical Lyapunov bound (1.3) to the following sum of dependent random variables
[TABLE]
where We note that the run and scan statistics are two important examples of the form (3.3), see e.g. [1].
Proposition 3.2**.**
We consider the nonlinear statistic defined by (3.3).
I.* Assume that for and for some Then, we have*
[TABLE]
where $c_{m,\delta}:=(2m)^{2+\delta}\big(8(2m-1)+2^{2+\delta}\big)$.
II.* Assume that and for and for some Then, we have*
[TABLE]
where $C_{m,\delta}:=(2m)^{3+\delta}\big(16(4m-2)^{2}+2^{3+\delta}(4m-2)+2^{3+\delta}\big)$.
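The constants in Proposition 3.2 depend on the dependence range $m$. The Python sketch below transcribes the two displayed formulas (the function names are ours) so that their growth in $m$ can be inspected.

```python
def c_small(m: int, delta: float) -> float:
    """c_{m,delta} = (2m)^{2+delta} * (8(2m - 1) + 2^{2+delta}), Proposition 3.2, part I."""
    return (2 * m) ** (2 + delta) * (8 * (2 * m - 1) + 2 ** (2 + delta))


def c_big(m: int, delta: float) -> float:
    """C_{m,delta} = (2m)^{3+delta} * (16(4m-2)^2 + 2^{3+delta}(4m-2) + 2^{3+delta}), part II."""
    a = 4 * m - 2
    return (2 * m) ** (3 + delta) * (16 * a ** 2 + 2 ** (3 + delta) * a + 2 ** (3 + delta))


for m in (1, 2, 3):
    print(m, c_small(m, 1.0), c_big(m, 1.0))
```

Reading off the leading terms, $c_{m,\delta}$ grows like $m^{3+\delta}$ and $C_{m,\delta}$ like $m^{5+\delta}$.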
Proof.
I. In Theorem 2.1, we choose to use Then, the bound (2.2) gives us
[TABLE]
where
We observe that if does not depend on Hence, using the convention if or we obtain
[TABLE]
Then, we can get
[TABLE]
with the convention if By the fundamental inequality and the point of Proposition 2.1, we deduce
[TABLE]
Consequently,
[TABLE]
By using the Lyapunov and Hölder inequalities, we get So it holds that
[TABLE]
Inserting this relation into (3.6) yields
[TABLE]
Furthermore, from (3.7) and the point of Proposition 2.1, we deduce
[TABLE]
and hence,
[TABLE]
So (3.4) follows.
II. Choosing the bound (2.6) gives us
[TABLE]
where is as in the part I and
[TABLE]
Using the same arguments as in the proof of (3.8) we obtain
[TABLE]
On the other hand, we have, for
[TABLE]
So we can deduce
[TABLE]
We now observe that
[TABLE]
which, in turn, implies that
[TABLE]
We therefore obtain
[TABLE]
Inserting (3.10) and (3.11) into (3.9) we obtain the bound (3.5) because
[TABLE]
The proof of the proposition is complete. ∎
Remark 3.1*.*
In the proof of Proposition 3.2, we used The reader can verify that other choices of and give bounds similar to (3.4) and (3.5); only the values of the constants $c_{m,\delta}$ and $C_{m,\delta}$ may change.
Example 3.1**.**
Let be independent and identically distributed -valued random variables with zero mean and unit variance. We consider the sequence of -runs defined by
[TABLE]
It is easy to see that and Hence, if then the bound (3.4) gives us
[TABLE]
If and then the bound (3.5) gives us
[TABLE]
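Example 3.1 relies on the normalized run statistic being centered with unit variance. The Python sketch below simulates a statistic of the form (3.3), assuming for illustration the normalization $F_n=n^{-1/2}\sum_{k=1}^{n}X_kX_{k+1}\cdots X_{k+m-1}$ and standard normal $X_j$ (both our choices). Because every cross term contains some variable to the first power, such a statistic has mean $0$ and variance $1$, which the simulation should reproduce approximately. The function name `simulate_m_runs` is hypothetical.

```python
import numpy as np


def simulate_m_runs(n: int, m: int, reps: int, rng) -> np.ndarray:
    """Draw `reps` copies of the assumed normalized m-run statistic
    F_n = n^{-1/2} * sum_{k=1}^n X_k X_{k+1} ... X_{k+m-1}, with X_j i.i.d. N(0, 1)."""
    x = rng.standard_normal(size=(reps, n + m - 1))
    # Product over each window of m consecutive variables, one window per k
    prods = np.ones((reps, n))
    for j in range(m):
        prods *= x[:, j:j + n]
    return prods.sum(axis=1) / np.sqrt(n)


rng = np.random.default_rng(2024)
f = simulate_m_runs(n=100, m=2, reps=20_000, rng=rng)
print(f.mean(), f.var())
```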
3.3 CLT for quadratic forms without finite fourth moment
Let be independent -valued random variables with zero means, unit variances and be a symmetric matrix with vanishing diagonal, where each is a real number depending on For simplicity of notation, we will write instead of The central limit theorem and normal approximation results for the quadratic form
[TABLE]
have been extensively discussed in the literature. Most of these works require a finite fourth moment condition. The best-known result, proved by de Jong [11], states that the normalized quadratic form converges to a standard normal random variable in distribution if
[TABLE]
where and $\mathrm{Tr}(A^{4})=\sum_{u,v=1}^{n}\big(\sum_{k=1}^{n}a_{ku}a_{kv}\big)^{2}$. See also [12, 18] for rates of convergence.
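The identity $\mathrm{Tr}(A^4)=\sum_{u,v}\big(\sum_k a_{ku}a_{kv}\big)^2$ holds because, for symmetric $A$, the inner sum is the $(u,v)$ entry of $A^\top A=A^2$. The NumPy sketch below checks it on a random symmetric matrix with zero diagonal, and also checks by exact enumeration the standard fact that, for independent Rademacher $X_k$ (our choice of distribution), $\mathrm{Var}\big(\sum_{u,v}a_{uv}X_uX_v\big)=2\sum_{u,v}a_{uv}^2=2\,\mathrm{Tr}(A^2)$.

```python
import itertools

import numpy as np

rng = np.random.default_rng(1)
n = 8
B = rng.standard_normal((n, n))
A = (B + B.T) / 2
np.fill_diagonal(A, 0.0)   # symmetric with vanishing diagonal

# Tr(A^4) as the double sum over u, v of (sum_k a_{ku} a_{kv})^2
inner = np.einsum("ku,kv->uv", A, A)   # entries of A^T A
tr_a4_sum = float(np.sum(inner ** 2))
tr_a4_direct = float(np.trace(np.linalg.matrix_power(A, 4)))

# Exact variance of Q = sum_{u,v} a_{uv} X_u X_v over all 2^n equally likely sign vectors
signs = (np.array(s) for s in itertools.product((-1.0, 1.0), repeat=n))
qs = np.array([s @ A @ s for s in signs])
var_q = float(qs.var())   # population variance = exact variance under uniform weights

print(tr_a4_sum, tr_a4_direct, var_q, 2 * np.sum(A ** 2))
```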
In the next proposition, we only require the random variables to have finite absolute moments of order $2+\delta$. This is a significant addition to the literature.
Proposition 3.3**.**
Assume that
[TABLE]
and
[TABLE]
for some Then, converges in distribution to a standard normal random variable as Moreover, we have
[TABLE]
where is a positive constant depending only on
Proof.
We first use the bound (2.2) to prove (3.14). We have
[TABLE]
We choose to use Then, we obtain
[TABLE]
and
[TABLE]
where
[TABLE]
We observe that $\mathfrak{D}_{l}Z^{(\alpha)}_{k}=(X^{2}_{k}-1)\big(\sum_{u=1}^{n}a_{ku}X_{u}\big)^{2}$ if and for
[TABLE]
Hence, we obtain
[TABLE]
It follows from the fundamental inequality that
[TABLE]
By the inequalities (5.1) and (5.2) below we deduce
[TABLE]
and
[TABLE]
To estimate the third term on the right-hand side of (3.15), we put
[TABLE]
We have and
[TABLE]
Hence, by the inequality (5.2), we obtain
[TABLE]
Once again, we use the inequality (5.2) to get
[TABLE]
and
[TABLE]
Combining the above computations, we obtain from (3.15) that
[TABLE]
where is a positive constant depending only on Consequently, for some
[TABLE]
On the other hand, we use the inequality (5.1) to get
[TABLE]
and hence,
[TABLE]
Inserting the estimates (3.16) and (3.17) into (2.2) gives us the bound (3.14).
To finish the proof, we observe that
[TABLE]
Hence, the conditions (3.12) and (3.13) ensure that as So converges in distribution to
The proof of the proposition is complete. ∎
Remark 3.2*.*
In the proof of Proposition 3.3, we used because we want to obtain conclusions similar to those in [11]. If we choose to use then the bound (3.14) depends on $\sum_{u,v=1}^{n}\big|\sum_{k=u}^{n}a_{ku}a_{kv}\big|^{\frac{2+\delta}{2}}$ instead of $\sum_{u,v=1}^{n}\big|\sum_{k=1}^{n}a_{ku}a_{kv}\big|^{\frac{2+\delta}{2}}$.
4 Proofs of the main results
Our proofs repeatedly use the following covariance formula, see Proposition 2.3 in [13].
Proposition 4.1**.**
Let and be two random variables in For any we have
[TABLE]
where we recall that
Here, we note that the condition can be replaced by and for some with In particular, if is bounded, we only need This is due to the fact that, under such conditions, all expectations in (4.1) exist and hence, this formula still holds true. We also note that the formula (4.1) can be seen as an extension of the covariance identity on page 7 of [15].
In the proofs, we also use the following notation. We let be an independent copy of Given a random variable for each we write and denote by the expectation with respect to
4.1 Proof of Theorem 2.1
As mentioned in the Introduction, the key to relaxing the moment conditions lies in some simple observations about the solution of Stein’s equation. We have the following.
Proposition 4.2**.**
Given an absolutely continuous function we consider Stein’s equation
[TABLE]
Then there exists a solution to equation (4.2) that satisfies, for any
[TABLE]
Proof.
It is known from Lemma 2.4 in [8] that there exists a solution to the equation (4.2) that satisfies and Hence, if then
[TABLE]
If we have
[TABLE]
This finishes the proof. ∎
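For the reader's convenience, the two-case argument in the proof above can be written out as follows. This is a sketch assuming, as in Lemma 2.4 of [8], that the solution $f_h$ satisfies $\|f_h'\|\le\sqrt{2/\pi}\,\|h'\|$ and $\|f_h''\|\le 2\|h'\|$; the constant $2$ obtained here is what this particular argument gives and may differ from the constant displayed in (4.3).

```latex
\begin{aligned}
&\text{Let } x,y\in\mathbb{R} \text{ and } \delta\in(0,1].\\
&\text{If } |x-y|\le 1:\quad |f_h'(x)-f_h'(y)|\le \|f_h''\|\,|x-y|\le 2\|h'\|\,|x-y|\le 2\|h'\|\,|x-y|^{\delta},\\
&\text{If } |x-y|> 1:\quad |f_h'(x)-f_h'(y)|\le 2\|f_h'\|\le 2\sqrt{2/\pi}\,\|h'\|\le 2\|h'\|\le 2\|h'\|\,|x-y|^{\delta}.\\
&\text{Hence } |f_h'(x)-f_h'(y)|\le 2\|h'\|\,|x-y|^{\delta}\ \text{ for all } x,y\in\mathbb{R}.
\end{aligned}
```

The first case uses $|x-y|\le|x-y|^{\delta}$ when $|x-y|\le 1$, and the second uses $|x-y|^{\delta}>1$ when $|x-y|>1$.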
Proof of Theorem 2.1. Without loss of generality, we can and will assume that Let be the solution to Stein’s equation (4.2) as in Proposition 4.2. Then, the Wasserstein distance can be represented as follows
[TABLE]
We separate the proof into three steps.
Step 1. In this step, we claim that, for every
[TABLE]
where is bounded by
[TABLE]
To prove (4.4), we observe that and
[TABLE]
Then, by the mean value theorem, there exists a random variable lying between and such that
[TABLE]
Taking the expectation with respect to we obtain (4.4) with defined by
[TABLE]
It follows from the estimate (4.3) that
[TABLE]
So the claim (4.4) is verified.
Step 2. We now use the covariance formula (4.1) to get, for any
[TABLE]
As a consequence,
[TABLE]
We note that is bounded, finite and Hence, once again, we can use the covariance formula (4.1) to get
[TABLE]
for any Since the estimate (4.3) gives us
[TABLE]
We therefore obtain
[TABLE]
For the second term on the right-hand side of (4.6), recalling (4.5), we obtain the following estimate
[TABLE]
Thus we can conclude that
[TABLE]
Then taking the supremum over all satisfying yields
[TABLE]
and the bound (2.1) follows replacing by
Step 3. In this step, we verify the bound (2.2). We use the Hölder inequality and the point of Proposition 2.1 to get
[TABLE]
On the other hand, by independence, we have Then, by Lyapunov’s inequality, we obtain and hence, we also have
[TABLE]
So it holds that
[TABLE]
By the same arguments as above, we obtain
[TABLE]
Inserting (4.7) and (4.8) into (2.1) we obtain the bound (2.2).
The proof of Theorem 2.1 is complete.
4.2 Proof of Theorem 2.2
For the distance we need the following observation about the solution of Stein’s equation.
Proposition 4.3**.**
Let with bounded derivatives. Then the equation (4.2) has a solution in that satisfies, for any
[TABLE]
Proof.
It is known from Theorem 1.1 in [9] that there exists a solution to the equation (4.2) that satisfies and Hence, the proof of (4.9) is similar to that of (4.3). So we omit it. ∎
We also need a technical lemma.
Lemma 4.1**.**
Let be centered and be as in Theorem 2.2. It holds that
[TABLE]
Proof.
It is easy to check that
[TABLE]
Hence, by the covariance formula (4.1), we obtain
[TABLE]
This completes the proof. ∎
Proof of Theorem 2.2. It suffices to consider Let be a solution of the equation (4.2) that has the properties mentioned in Proposition 4.3. We have
[TABLE]
We separate the proof into three steps.
Step 1. We claim that, for every
[TABLE]
where is bounded by
[TABLE]
By the Taylor expansion, there exists a random variable lying between and such that
[TABLE]
We observe that Hence, by taking the expectation with respect to we obtain (4.10) with defined by
[TABLE]
Thanks to the estimate (4.9) we get
[TABLE]
This completes the proof of (4.10).
Step 2. For any by the covariance formula (4.1) and the result of the previous step, we deduce
[TABLE]
where
[TABLE]
Since we obtain
[TABLE]
for any
By using the same argument as in the proof of (4.4) we have
[TABLE]
where Moreover, it follows from the estimate (4.9) that is bounded by
[TABLE]
Inserting (4.13) into (4.12) yields
[TABLE]
From Lemma 4.1, Hence, once again, we can employ the covariance formula (4.1) to rewrite (4.15) as follows
[TABLE]
for any Furthermore, by the estimate (4.9), we have
[TABLE]
These, combined with (4.11) and (4.14), imply that
[TABLE]
As a consequence, by taking the supremum over all satisfying we deduce
[TABLE]
So we obtain the bound (2.5) by replacing by
Step 3. This step is similar to Step 3 in the proof of Theorem 2.1. We have
[TABLE]
[TABLE]
[TABLE]
So the bound (2.6) follows from (2.5).
The proof of Theorem 2.2 is complete.
5 Appendix: Moment inequalities
In this section, to keep the paper self-contained, we provide some useful moment inequalities stated in terms of the difference operators. More moment inequalities for nonlinear statistics can be found in Chapter 15 of [5].
Proposition 5.1** (Marcinkiewicz-Zygmund type inequality).**
Let for some We have
[TABLE]
where denotes the norm in
Proof.
It follows from the proof of Proposition 2.3 in [13] that we can write where Then, the inequality (5.1) follows directly from Theorem 2.1 in [16] and the fact that ∎
Proposition 5.2** (von Bahr-Esseen type inequality).**
Let for some We have
[TABLE]
In particular, for we have the Efron-Stein inequality that reads
[TABLE]
Proof.
Put and It is easy to check that
[TABLE]
Hence,
[TABLE]
By Taylor’s expansion we have
[TABLE]
An application of Proposition 4.1 gives us
[TABLE]
This finishes the proof of (5.2). When we have
[TABLE]
This completes the proof of the proposition. The reader can consult Section 3.1 of [5] for different versions of the Efron-Stein inequality. ∎
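The Efron-Stein inequality can be checked exactly on small discrete examples. The Python sketch below uses the classical form $\mathrm{Var}(F)\le\frac12\sum_{k=1}^{n}E\big[(F-F^{(k)})^2\big]$, where $F^{(k)}$ is obtained from $F$ by replacing $X_k$ with an independent copy $X_k'$; this is the standard textbook form, written without the paper's difference-operator notation. It enumerates all outcomes with exact rational arithmetic; the function name `efron_stein_check` is ours.

```python
from fractions import Fraction
from itertools import product


def efron_stein_check(f, n, values, probs):
    """Return (Var F, Efron-Stein bound) for F = f(X_1, ..., X_n), where the X_k are i.i.d.
    with P(X_k = values[j]) = probs[j]. Exact computation by enumerating all outcomes."""
    outcomes = list(product(range(len(values)), repeat=n))

    def prob(idx):
        p = Fraction(1)
        for j in idx:
            p *= probs[j]
        return p

    def F(idx):
        return f([values[j] for j in idx])

    mean = sum(prob(o) * F(o) for o in outcomes)
    var = sum(prob(o) * (F(o) - mean) ** 2 for o in outcomes)

    # (1/2) * sum_k E[(F - F^{(k)})^2], where F^{(k)} uses an independent copy of X_k
    bound = Fraction(0)
    for o in outcomes:
        for k in range(n):
            for j, pj in enumerate(probs):
                o_k = o[:k] + (j,) + o[k + 1:]
                bound += prob(o) * pj * (F(o) - F(o_k)) ** 2
    return var, bound / 2


half = [Fraction(1, 2), Fraction(1, 2)]
print(efron_stein_check(sum, 3, [0, 1], half))   # linear statistic: equality case
print(efron_stein_check(max, 3, [0, 1], half))   # nonlinear statistic: strict inequality
```

For the sum of three fair Bernoulli variables both sides equal $3/4$, while for their maximum the variance is $7/64$ and the bound is $3/16$.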
Acknowledgments. The author thanks the anonymous referee for valuable comments for improving the paper. This research was funded by Viet Nam National Foundation for Science and Technology Development (NAFOSTED) under grant number 101.03-2019.08. A part of this paper was done while the author was visiting the Vietnam Institute for Advanced Study in Mathematics (VIASM). The author would like to thank the VIASM for financial support and hospitality.
References
- [1] N. Balakrishnan, M. V. Koutras, Runs and scans with applications. Wiley Series in Probability and Statistics. Wiley-Interscience [John Wiley & Sons], New York, 2002.
- [2] V. Bentkus, B.-Y. Jing, W. Zhou, On normal approximations to U-statistics. Ann. Probab. 37 (2009), no. 6, 2174–2199.
- [3] A. Bikjalis, Estimates of the remainder term in the central limit theorem. (Russian) Litovsk. Mat. Sb. 6 (1966), 323–346.
- [4] S. G. Bobkov, F. Götze, H. Sambale, Higher order concentration of measure. Commun. Contemp. Math. 21 (2019), no. 3, 1850043, 36 pp.
- [5] S. Boucheron, G. Lugosi, P. Massart, Concentration inequalities. A nonasymptotic theory of independence. With a foreword by Michel Ledoux. Oxford University Press, Oxford, 2013.
- [6] L. H. Y. Chen, Q.-M. Shao, Normal approximation under local dependence. Ann. Probab. 32 (2004), no. 3A, 1985–2028.
- [7] L. H. Y. Chen, Q.-M. Shao, Normal approximation for nonlinear statistics using a concentration inequality approach. Bernoulli 13 (2007), no. 2, 581–599.
- [8] L. H. Y. Chen, L. Goldstein, Q.-M. Shao, Normal approximation by Stein’s method. Probability and its Applications (New York). Springer, Heidelberg, 2011.
