Total variation distance between stochastic polynomials and invariance principles
Vlad Bally, Lucia Caramellino

Abstract
The goal of this paper is to estimate the total variation distance between two general stochastic polynomials. As a consequence one obtains an invariance principle for such polynomials. This generalizes known results concerning the total variation distance between two multiple stochastic integrals on one hand, and invariance principles in Kolmogorov distance for multi-linear stochastic polynomials on the other hand. As an application we first discuss the asymptotic behavior of U-statistics associated to polynomial kernels. Moreover we also give an example of CLT associated to quadratic forms.
Vlad Bally: Université Paris-Est, LAMA (UMR CNRS, UPEMLV, UPEC), INRIA, F-77454 Marne-la-Vallée, France. Email: [email protected]
Lucia Caramellino: Dipartimento di Matematica and INDAM-GNAMPA, Università di Roma “Tor Vergata”, Via della Ricerca Scientifica 1, I-00133 Roma, Italy. Email: [email protected]
AMS 2010 Mathematics Subject Classification: 60F17, 60H07.
Keywords: Stochastic polynomials; Invariance principles; Quadratic Central Limit Theorem; U-statistics; Abstract Malliavin calculus.
1 Introduction
This paper deals with stochastic polynomials of the following type: given a sequence of independent random variables which have finite moments of any order and, given and one looks to
[TABLE]
The coefficients are symmetric and null on the diagonals (that is, if for ) and only a finite number of them are non-null, so the above sum is finite. Let us mention that here, for notational simplicity, we take but in the paper we work with Note also that we use the centred random variables , , but, if the polynomial is given in terms of we may always rewrite it in terms of centred random variables.
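As a rough illustration of the objects described above, the following Python sketch evaluates a small stochastic polynomial: a finite sum of coefficients multiplied by products of centred powers of the variables, taken over strictly increasing variable indices. The dictionary layout, the function name `eval_polynomial` and the callable `moment` are assumptions made for this illustration and are not notation from the paper.

```python
from math import prod

def eval_polynomial(coeffs, x, moment):
    """Evaluate sum_alpha c(alpha) * prod_j (x[n_j]**k_j - E[X_{n_j}**k_j]).

    coeffs : dict mapping a tuple of (n, k) pairs, with strictly increasing n,
             to a real coefficient; the empty tuple is the constant term.
    x      : sequence of realised values x[0], x[1], ...
    moment : callable (n, k) -> E[X_n**k], assumed to be known.
    """
    total = 0.0
    for alpha, c in coeffs.items():
        ns = [n for n, _ in alpha]
        # "Null on the diagonals": each variable index appears at most once.
        assert ns == sorted(set(ns)), "indices must be strictly increasing"
        # Product of centred powers; the empty product equals 1.
        total += c * prod(x[n] ** k - moment(n, k) for n, k in alpha)
    return total

# Hypothetical example with standard Gaussian moments E[X] = 0, E[X^2] = 1.
gauss_moment = lambda n, k: {1: 0.0, 2: 1.0}[k]
coeffs = {
    (): 0.5,                  # constant term
    ((0, 1),): 2.0,           # 2 * X_0
    ((0, 2), (1, 1)): -1.0,   # -(X_0^2 - 1) * X_1
}
value = eval_polynomial(coeffs, x=[2.0, 3.0], moment=gauss_moment)
print(value)
```

A polynomial written in terms of non-centred powers can be brought to this form by expanding each power around its mean, which only changes the coefficients.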
Our goal is to estimate the total variation distance between the laws of two such polynomials and moreover to establish an invariance principle, that is, to estimate the error made by replacing by a centred Gaussian random variable which has the same covariance matrix as . Note that this Gaussian vector does not keep the structure given by the powers in the original vector
Since the total variation distance concerns measurable functions, a “regularization effect” has to be at work. This leads us to make the following assumption (known as Doeblin’s condition): there exist and such that and on the ball It is easy to see that this is equivalent to saying that
[TABLE]
where is a probability density with support included in and is a probability measure. The decomposition (1.3) being given, one constructs three independent random variables with and Bernoulli with parameter and then employs the identity of laws
[TABLE]
The density may be chosen (see (3.6)) in order that has nice properties and this allows one to build an abstract Malliavin type calculus based on and to use this calculus in order to obtain the “regularization effect” which is needed. We have already used this argument in [1, 3, 4, 5]. Independently, Nourdin and Poly in [30] have used similar arguments in a similar problem: they take so has a uniform distribution, and they use a chaos type decomposition obtained in [6]. Note also that hypothesis (1.3) is in fact necessary: in his seminal paper [36] Prohorov proved that (1.3) is (essentially) necessary and sufficient in order to obtain convergence in total variation distance in the Central Limit Theorem (see [1] for details).
The decomposition (1.4) was introduced by Nummelin (see [22] and [20]) in order to produce atoms which allow one to use renewal theory for studying the convergence to equilibrium of Markov chains; this is why it is also known as “the Nummelin splitting method”. It has also been used by Poly in his PhD thesis [35] and, to our knowledge, this is the first place where the idea of using the regularization given by the noise appears.
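The splitting can be made concrete on a toy example. The sketch below assumes a variable that is uniform on [-1, 1] (density 1/2), takes the hypothetical Doeblin constants `EPS = 0.25` and `R = 0.5`, and uses the plain indicator density on [-R, R] as the regular part. The paper chooses a smooth density instead, so this is only an illustration of the identity of laws, with all names (`chi`, `psi`, `density_nu`, `sample_split`) invented here.

```python
import random

EPS, R = 0.25, 0.5       # hypothetical constants: density >= EPS on [-R, R]
P = EPS * 2 * R          # Bernoulli parameter: EPS times the length of the ball

def density_x(x):
    """Density of the toy law: uniform on [-1, 1]."""
    return 0.5 if -1.0 <= x <= 1.0 else 0.0

def psi(x):
    """Regular part: uniform density on [-R, R] (an indicator, not smooth)."""
    return 1.0 / (2 * R) if -R <= x <= R else 0.0

def density_nu(x):
    """Remainder law, chosen so that density_x = P * psi + (1 - P) * nu."""
    return (density_x(x) - P * psi(x)) / (1.0 - P)

# Mass that the remainder law puts on [-R, R].
INNER_MASS = (0.5 - EPS) * 2 * R / (1.0 - P)

def sample_split(rng):
    """Sample the toy law through the splitting chi * V + (1 - chi) * U."""
    chi = rng.random() < P                  # Bernoulli(P)
    if chi:
        return rng.uniform(-R, R)           # V with density psi
    if rng.random() < INNER_MASS:           # U with density nu, inner part
        return rng.uniform(-R, R)
    return rng.choice((-1.0, 1.0)) * rng.uniform(R, 1.0)   # U, outer part

rng = random.Random(0)
samples = [sample_split(rng) for _ in range(10_000)]
print(sum(samples) / len(samples))          # should be close to 0
```

Only the component drawn from `psi` carries the regularity that is exploited; the remainder may be an arbitrary probability measure.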
In order to present our results we have to introduce some more notation. Given the coefficient in (1.2) we denote
[TABLE]
The quantity is essentially equivalent (up to a multiplicative factor) to the variance of and is essentially equivalent to the “low influence factor” as it is defined and used in [21] (and we follow several ideas from this paper). These are the quantities which appear in the error estimates.
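To fix ideas, here is a minimal sketch of the two kinds of quantities, assuming the usual convention of Mossel et al. [21]: the squared norm is the sum of squared coefficients, and the influence of a variable is the sum of squared coefficients over all multi-indexes containing that variable. The exact normalisation used in the displayed definition is not reproduced here, and the function names are hypothetical.

```python
from math import sqrt

def squared_norm(coeffs):
    """Sum of squared coefficients (comparable to the variance of the polynomial)."""
    return sum(c * c for c in coeffs.values())

def max_influence(coeffs):
    """Largest influence: max over n of the sum of c(alpha)^2 over alpha containing n."""
    influence = {}
    for alpha, c in coeffs.items():
        for n, _k in alpha:
            influence[n] = influence.get(n, 0.0) + c * c
    return max(influence.values(), default=0.0)

# Hypothetical example: the normalised sum (X_1 + ... + X_N) / sqrt(N).
N = 100
clt_coeffs = {((n, 1),): 1.0 / sqrt(N) for n in range(N)}
print(squared_norm(clt_coeffs), max_influence(clt_coeffs))
```

For the normalised sum the squared norm stays equal to 1 while the largest influence is 1/N, which is the "low influence" regime in which invariance principles are informative.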
For we denote by the supremum norm of and of its derivatives of order less or equal to and, for two random variables and we define the distances
[TABLE]
For is the total variation distance, and, if and then is the Fortet-Mourier distance which metrizes the convergence in law. We also consider the Kolmogorov distance
[TABLE]
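The difference between these distances is easy to see on finitely supported laws. The sketch below assumes the convention in which the total variation distance is the supremum over test functions bounded by 1 (so it equals the sum of absolute differences of the probability masses, twice the supremum over sets), and computes the Kolmogorov distance as the largest gap between distribution functions. The function names and the two example laws are illustrative only.

```python
def total_variation(p, q):
    """sup over |f| <= 1 of |E f(F) - E f(G)| for discrete laws given as dicts."""
    support = set(p) | set(q)
    return sum(abs(p.get(x, 0.0) - q.get(x, 0.0)) for x in support)

def kolmogorov(p, q):
    """sup over x of |P(F <= x) - P(G <= x)| for discrete laws given as dicts."""
    cdf_p = cdf_q = worst = 0.0
    for x in sorted(set(p) | set(q)):
        cdf_p += p.get(x, 0.0)
        cdf_q += q.get(x, 0.0)
        worst = max(worst, abs(cdf_p - cdf_q))
    return worst

law_f = {-1: 0.5, 1: 0.5}             # symmetric two-point law
law_g = {-1: 0.25, 0: 0.5, 1: 0.25}   # three-point law with the same mean
print(total_variation(law_f, law_g), kolmogorov(law_f, law_g))
```

The total variation distance sees every mismatch of mass, whereas the Kolmogorov distance only tests half-lines, which is why convergence in total variation needs a regularity assumption on the noise.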
We are now able to give our first result, Theorem 3.3, concerning the distance between two polynomials and . Assume that and satisfy the Doeblin’s condition (see (1.3)) and moreover assume that the non degeneracy condition holds for some and denote . Then we prove (see (3.17)) that for every and
[TABLE]
where denotes a quantity which depends on the coefficients and in an explicit way (see (3.17)). If then so this term no longer appears. Theorem 3.3 is the main result of our paper.
In Theorem 3.7 we give a variant of this result in Kolmogorov distance: we prove (see (3.21)) that
[TABLE]
is again a positive quantity explicitly depending on and (see (3.21)). The estimate (1.8) holds for general laws for and (without assuming Doeblin’s condition). However, now we have to assume that the covariance matrix of both and is invertible. The proof of (1.8) is a direct consequence of the results of Mossel et al. in [21].
In the case (multilinear stochastic polynomials) and if and are Gaussian random variables, and are multiple stochastic integrals. In this special case we may drop and in (1.7) (see Theorem 3.4). Estimates in total variation for such integrals have already been studied: the inequality (1.7) for multiple stochastic integrals (for ) was first announced in [10] with the power instead of above, but the proof was only sketched. It was rigorously proved in [29] with power and recently improved in [8] where the power is obtained. So (1.7) is a generalization of the above results on multiple stochastic integrals to general polynomials depending on a general noise. But, as the above discussion suggests, (1.7) is not the best possible estimate (the approach in [8] does not seem to work in our general framework, so for the moment we are not able to improve it).
A second result, given in Theorem 3.9, concerns the invariance principle. We consider a sequence of independent centred Gaussian random variables and we assume that the covariance matrix of coincides with the covariance matrix of where We denote by the polynomial in which is replaced by We stress that is multi-linear with respect to in contrast to which is a general polynomial with respect to In Theorem 3.9 we prove that, if for some then for every ,
[TABLE]
being explicitly dependent on (see (3.22)). A result going in the same direction was previously obtained by Nourdin and Poly in [30]. They take , so is a multi-linear polynomial, and they assume Doeblin’s condition for Then they prove that, if is a sequence of coefficients such that then The progress achieved in our paper consists in the fact that we deal with general polynomials on the one hand and we obtain an estimate of the error on the other hand.
A similar estimate with instead of represents the main result in [21] (see Theorem 3.19 therein). Let us be more precise. In [21] one considers “orthonormal ensembles” which are nothing other than multi-dimensional random variables such that and (the Kronecker delta). One denotes the polynomial defined in (1.1) in which is replaced by And in [21] (Theorem 3.19 therein) they prove that if then
[TABLE]
Note that in this theorem one does not need Doeblin’s condition to hold. Note also that the orthonormality condition for is no more restrictive than saying that the covariance matrix of is invertible and the lower eigenvalues satisfy for every (see the proof of Theorem 2.3). So, by taking one also obtains (1.9) (under the above hypothesis on The difference with respect to their result is just that we deal with convergence in total variation distance instead of Kolmogorov distance.
An important consequence of (1.9) is that it allows one to replace the study of the asymptotic behavior of a sequence of general stochastic polynomials by the study of which are elements of a finite number of Wiener chaoses. Of course, the central example is the classical CLT, where and , so is just a Gaussian random variable. But, starting with the proof of the “fourth moment theorem” by Nualart and Peccati [33] and Nourdin and Peccati [25], a lot of work has been done in order to characterize the convergence to normality of elements of a finite number of Wiener chaoses (see [23, 28, 32, 34] or [24] for an overview). Moreover, convergence to a distribution has been treated in [25]. We give the consequences of these results in Theorem 3.11 and Theorem 3.13.
Finally we give two more applications. The first one concerns U-statistics. The problem is the following: given a probability law an integer and a symmetric kernel one wants to estimate
[TABLE]
on the basis of a sample of independent random variables of law An unbiased estimator of is constructed by
[TABLE]
in which if any two indexes are equal, otherwise . In the case when is a polynomial this enters in our framework. This covers an important class of kernels: for example gives the estimator of the variance. But not all: for example is out of reach. Say that Then
[TABLE]
This fits in (1.1) except that is not centred. It turns out that the procedure which consists in centering coincides, in this framework, with Hoeffding’s decomposition, which is a central tool in U-statistics theory. After doing this one obtains
[TABLE]
for some appropriate coefficients , and we are back in our framework. In U-statistics theory one says that the kernel is degenerate at order if for and Then one writes
[TABLE]
with It follows that the asymptotic behavior of is controlled by Using this decomposition, in Theorem 4.3 we characterize the limit of as a linear combination of multiple stochastic integrals. The limit is considered both in Kolmogorov distance under general conditions and in total variation distance under Doeblin’s condition for . Let us mention that a number of results are already known concerning the convergence in Kolmogorov distance for U-statistics: they represent generalizations of the Berry–Esseen theorem (we refer to [19] and [18]). But the result in total variation distance, which generalizes Prohorov’s theorem for the CLT, seems to be new.
Another closely related subject is that of quadratic forms. Here also the asymptotic behavior in Kolmogorov distance is well understood (see de Jong [11, 12], Rotar’ et al. [13, 37] and Götze et al. [14]) but we have not found results concerning the convergence in total variation. We do not treat this subject in full generality but restrict ourselves to the following interesting example: for we define
[TABLE]
where are independent identically distributed random variables with and And for and For we prove that and for one has with a standard normal random variable. Thus, there is a change of regime in As before, the convergence takes place in Kolmogorov distance for a general and in total variation distance under Doeblin’s condition.
The paper is organized as follows. In Section 2, we fix our settings and we give some preliminary results. Section 3 is devoted to our main results: we first precisely define Doeblin’s condition and the Nummelin splitting (Section 3.1); then we introduce our main result Theorem 3.3 and its several consequences (Section 3.2); finally we analyze the Gaussian and Gamma approximation (Section 3.3). The main examples are developed in Section 4: in Section 4.1 we study the asymptotic behavior of U-statistics written on polynomial kernels and in Section 4.2 we study the convergence of the above quadratic CLT result. Finally, Section 5 contains the proof of our main Theorem 3.3, which is given in the last Section 5.5: in Section 5.1 we introduce the abstract Malliavin calculus, in Section 5.2 we state the regularization lemma we use in this paper, Section 5.3 is devoted to proper estimates of the Sobolev norms and Section 5.4 refers to the non-degeneracy result of the Malliavin covariance matrix. The paper concludes with two appendices: Appendix A studies an iterated Hoeffding’s inequality for martingales and Appendix B gives useful estimates for the Sobolev norms which are used in the Malliavin integration by parts formula.
Acknowledgments. We thank Cristina Butucea and Dan Timotin for useful discussions.
2 Notation, basic objects and preliminary results
In this section we introduce multi-linear stochastic polynomials based on a sequence of abstract independent random variables In the next section, when dealing with general polynomials as in (1.1), we will take
The basic noise. We assume that and that has finite moments of any order: for every there exists some such that for every and
[TABLE]
Multi-indexes. We will use “double” multi-indexes with with and We always assume that So we work with “ordered” multi-indexes. We also denote , and The set of such multi-indexes is denoted by and we set . We stress that we also consider the void multi-index and in this case we put Moreover, for a sequence we denote
[TABLE]
with if .
Coefficients. We consider a Hilbert space with norm and for a valued random variable , we denote In a first stage we have just but in Section 5, when considering stochastic derivatives, we have to use some general space . We denote . These are the coefficients we will use. We define
[TABLE]
and
[TABLE]
The notation means that for some When , we shall omit the subscript , so we simply write , , and . For several authors (see e.g. [21] or [27]), is called the “influence” factor.
Multi-linear polynomials. Given we define
[TABLE]
In the sequel we use several times Burkholder’s inequality for Hilbert space valued martingales: if is a martingale then for every there exists such that
[TABLE]
the second inequality being obtained by using the triangle inequality with respect to
Moreover, as an immediate consequence of (2.1), for every and every we have
[TABLE]
Using these two inequalities we obtain
Lemma 2.1
Suppose that (2.1) holds and denote Then
[TABLE]
and
[TABLE]
Proof. We proceed by recurrence on For we have so (2.8) is obvious. For with we denote
[TABLE]
and we write
[TABLE]
Note that, if , and are independent. So, using (2.6) first and (2.7) then we get
[TABLE]
and by the recurrence hypothesis,
[TABLE]
So (2.8) is proved.
We now prove (2.9) again by induction. The case follows from (2.8). For , we have
[TABLE]
If , and are independent, so and are independent as well. Therefore we can apply (2.6) and (2.7) and we obtain
[TABLE]
and by the recurrence hypothesis,
[TABLE]
We give now the basic invariance principle. We take and for we denote by the supremum norm of and its derivatives up to order three.
Theorem 2.2
Let be a sequence of centred independent random variables which verify (2.1) and let be a sequence of independent centred Gaussian random variables such that Then, for every
[TABLE]
with
[TABLE]
in which
Proof. The proof is based on Lindeberg’s method (we follow the argument from [21]). We fix we denote and we define For we define the intermediate sequences , with and , and we write
[TABLE]
We denote and, for with we define
[TABLE]
This means that, if does not contain we insert in the convenient position. We put
[TABLE]
and then
[TABLE]
Moreover, with defined by we get
[TABLE]
We use now Taylor’s expansion of order three around [math] for both and . Since and are independent of and and the first and second moments of and coincide, the first and second order terms in the Taylor expansion cancel and we obtain
[TABLE]
We have
[TABLE]
The same is true for , so (recall that and are independent of
[TABLE]
Using (2.9),
[TABLE]
and this gives
[TABLE]
We sum over and we get
[TABLE]
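The replacement scheme behind Lindeberg's method can be sketched numerically. The code below assumes a bilinear polynomial with hypothetical coefficients, builds hybrid sequences in which the first k entries come from one sequence and the remaining ones from the other, and checks that the one-variable-at-a-time increments telescope to the global difference. It illustrates the bookkeeping only; the direction of the swap and all names are assumptions, and the Taylor and moment estimates of the proof are not reproduced.

```python
import math
import random

def q_bilinear(c, z):
    """Multi-linear polynomial: sum over i < j of c[i][j] * z[i] * z[j]."""
    n = len(z)
    return sum(c[i][j] * z[i] * z[j] for i in range(n) for j in range(i + 1, n))

def lindeberg_increments(f, c, x, g):
    """Increments f(Q(Z^k)) - f(Q(Z^(k-1))), where Z^k = x[:k] followed by g[k:]."""
    increments = []
    for k in range(1, len(x) + 1):
        z_prev = x[:k - 1] + g[k - 1:]   # differs from z_next only in entry k - 1
        z_next = x[:k] + g[k:]
        increments.append(f(q_bilinear(c, z_next)) - f(q_bilinear(c, z_prev)))
    return increments

rng = random.Random(1)
n = 6
c = [[1.0 / n if i < j else 0.0 for j in range(n)] for i in range(n)]
x = [rng.choice((-1.0, 1.0)) for _ in range(n)]   # centred, variance 1
g = [rng.gauss(0.0, 1.0) for _ in range(n)]       # Gaussian with the same variance
f = math.cos                                      # smooth test function

incs = lindeberg_increments(f, c, x, g)
direct = f(q_bilinear(c, x)) - f(q_bilinear(c, g))
print(sum(incs), direct)
```

Since consecutive hybrids differ in a single coordinate and the polynomial is affine in that coordinate, each increment can be expanded to third order, which is where the matching of the first two moments is used.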
We recall now the main result from [21] concerning the invariance principle in Kolmogorov distance (defined in (1.6)).
Theorem 2.3
Let be a sequence of centred independent random variables which verify (2.1) and let denote the covariance matrix of We assume that there exists such that for every
[TABLE]
Let be a sequence of independent centred Gaussian random variables such that Then
[TABLE]
with
[TABLE]
Proof. We denote and we define so that are orthonormal. In the formalism in [21], is called an “orthonormal ensemble”. Then we define
[TABLE]
and we notice that, with this definition,
[TABLE]
Moreover one easily checks that
[TABLE]
Let us check that is hypercontractive in the sense of [21]. We notice that and we take Then, for any coefficients we have (with
[TABLE]
and this means, in the formalism from [21] that is hypercontractive. Now we are able to use Theorem 3.19 in [21] (which is written in terms of ), and this yields (2.15).
3 Main results
3.1 Doeblin’s condition and splitting
We fix and , we denote and we work with a sequence of independent random variables We deal with general polynomials with variables that is, with linear combinations of monomials Because of the powers , this is no more a multi-linear polynomial. In order to come back to multi-linear polynomials we define by
[TABLE]
With this definition, if , with and , then
[TABLE]
where , , with
[TABLE]
in which the symbols and denote the integer and the fractional part of respectively. We denote
[TABLE]
that is
[TABLE]
which agrees with (1.1)-(1.2) in dimension 1 ().
The crucial hypothesis in this section is that for every the law of is locally lower bounded by the Lebesgue measure - this is Doeblin’s condition. Let us be more precise.
Hypothesis . Let , and be fixed. We say that satisfies hypothesis if there exist such that for every measurable set
[TABLE]
denoting the Lebesgue measure on , and
[TABLE]
Note that there is no assumption about , , being identically distributed, but the fact that the parameters , and are the same for every represents a uniformity assumption. Note also that this property never holds for This is why we are obliged to work with only.
Hypothesis . We say that satisfies hypothesis if holds and if for every one has .
Note that if Assumption holds then verifies (2.1).
The interesting point about random variables which verify is that one may use a splitting method in order to obtain a nice representation for (in law). We introduce the auxiliary functions defined by
[TABLE]
and we denote
[TABLE]
Let and be independent random variables with laws
[TABLE]
Note that the hypothesis ensures that so that the law of is well defined. It is easy to check that has the same law as . Since all our statements concern only the law of , from now on we assume that
[TABLE]
Let us mention a nice property for the function : it is easy to check that for each there exists a universal constant such that
[TABLE]
where denotes the derivative of order of
Actually, the uniformity property (3.5) has not been used so far. We see now that it gives a “non degeneracy” for the powers of the components of uniformly in . More precisely, we define the random vector in , that is
[TABLE]
where and are given in (3.2). Then, one has the following result.
Lemma 3.1
Let be such that (3.5) holds and let denote the covariance matrix of . Then there exists such that
[TABLE]
for every and .
Proof. For and we define
[TABLE]
If then for in an open set, and this implies that Since is continuous, it follows that And since is continuous it follows that one may find such that . Now, we note that and . Thus, if we get , and (3.12) follows.
We conclude with an inequality which will be useful later on.
Lemma 3.2
Let be such that (3.5) holds and let be given in Lemma 3.1. Then for every ,
[TABLE]
with defined in (3.11).
Proof. We first fix an integer , and we consider , . We prove that
[TABLE]
We define the random variable
[TABLE]
We notice that are independent of and that
[TABLE]
So,
[TABLE]
the above lower bound following from (3.12). By iteration, one gets (3.13).
Consider now the general case. We recall that, for any two multi-indexes and , if and only if . This gives
[TABLE]
where, for fixed , we have set . The statement now follows from (3.14).
3.2 Main results
Our goal is to estimate the total variation distance between two polynomials of type , which we write as in (3.3), that is
[TABLE]
where is defined in (3.1) and with , .
We will use the following quantities related to the coefficients We work first with the Hilbert space (so, we drop from the notation) and we recall that is defined in (2.2) and is defined in (2.3). Moreover, for we define
[TABLE]
Finally we assume that verifies and we denote
[TABLE]
Notice that if and satisfy respectively then they both satisfy so we may assume that , and are the same.
For we define the distances
[TABLE]
Note that is the total variation distance and is the Fortet Mourier distance (which metrizes the convergence in law). We give now our first result:
Theorem 3.3
Suppose that and verify Hypothesis (that is (2.1) and ) and let be two families of coefficients. We fix and and and such that and and we denote We also assume that
[TABLE]
Let . Then there exist and , which depend on the parameters and the moment bounds , for a suitable but independent of the coefficients , such that
[TABLE]
and being defined in (3.15).
In practical situations, one has or both and are very small, so in (3.16) is actually the -distance between and .
The proof of Theorem 3.3 uses a Malliavin type calculus based on which we present in Section 5, so we postpone it to Section 5.5. It represents the main effort of our paper.
As an immediate consequence, we give the following estimate of the total variation distance between two multiple stochastic integrals. We consider a dimensional Brownian motion we fix and, for a symmetric kernel we denote
[TABLE]
Theorem 3.4
Let Then, for every and there exist and (both depending on and ) such that
[TABLE]
Remark 3.5
In the case the above result was first announced in [10] with the power instead of above, but the proof was only sketched. It was rigorously proved in [29] with power and recently improved in [8] where the power is obtained. So (3.18) is not the best possible estimate. This also indicates that the power in (3.17) is not optimal (but the approach in [8] does not seem to work in our general framework, so for the moment we are not able to improve it).
Remark 3.6
Theorem 3.4, with exactly the same proof, extends to general random variables which live in a finite sum of Wiener chaoses: let and be two random variables belonging to where is the chaos of order We denote by the projection on and we put and Then, with
[TABLE]
where and depend on .
Proof of Theorem 3.4. Let For , we denote and we define
[TABLE]
Note that is the conditional expectation of with respect to the partition and to the uniform law on Take now with and . We denote
[TABLE]
so that
[TABLE]
We are now in the framework of Theorem 3.3 and we compare and . We take and Then . Let us estimate the parameters associated to By the convergence theorem for martingales We estimate now . By using Hölder’s inequality,
[TABLE]
so that and as .
Now (3.17) gives, for and
[TABLE]
where . We take and we notice that so that the above inequality gives as It follows that the sequences and are Cauchy in and we may pass to the limit in (3.20) in order to obtain (3.18).
We now give the analogue of Theorem 3.3 in terms of Kolmogorov distance. Here one no longer needs Doeblin’s condition or non-degeneracy conditions.
Theorem 3.7
Suppose that and verify (2.1) and are such that and both satisfy (2.14). Let be two families of coefficients such that and with . Then, for every and there exist and such that
[TABLE]
where denotes a constant depending on , suitable moments of and and on the lower bounds in (2.14) applied to and .
Remark 3.8
Note that the estimate (3.21) is in terms of whereas in (3.17) there appears , which is much smaller. But (3.17) requires that and satisfy Doeblin’s condition.
Proof. We consider the Gaussian random variables and corresponding to and respectively and we use Theorem 2.3 (see (2.15)) in order to obtain
[TABLE]
Using the same argument as in the proof of Theorem 2.3 we may assume that and are standard Gaussian random variables so that and are multiple stochastic integrals. By and by (3.19) first and (2.12) (recall that then
[TABLE]
We give now the invariance principle:
Theorem 3.9
Let be a sequence of independent valued random variables which verify Hypothesis and a sequence of independent and centred Gaussian random variables such that Suppose that for some one has Let . Then there exist and , which depend on the parameters and the moment bounds , for a suitable but independent of the coefficients , such that
[TABLE]
Proof. This is an immediate consequence of Theorem 3.3 and of Theorem 2.2.
In a number of concrete applications (see Theorem 4.3 for example), one takes and, asymptotically, represents the principal term. Having in mind this we give the following corollary:
Theorem 3.10
Let be such that for and Suppose .
A. If denote independent centred Gaussian random variables then, for every there exists such that
[TABLE]
B. Let satisfy and let be a sequence of independent and centred Gaussian random variables such that . Then for every there exists such that
[TABLE]
C. If satisfies (2.14) then for every there exists such that
[TABLE]
In the above estimates (3.23), (3.24) and (3.25), denotes a constant independent of the coefficients .
Proof. One has
[TABLE]
so (3.23) follows from Theorem 3.3 (see (3.17)). Using (3.23) and (3.22) we obtain (3.24). And (3.25) follows from (3.23) and (2.15).
3.3 Gaussian and Gamma approximation
Theorem 3.10 has the following interesting application: if one considers a sequence of coefficients the study of the asymptotic behavior of reduces to the study of the asymptotic behavior of , where is a sequence of independent and centred Gaussian random variables such that . Since is (nearly) a multiple Wiener stochastic integral of order this problem has already been treated in at least two significant cases: the convergence to normality and the convergence to a Gamma distribution. In fact, the convergence to normality of the law of is controlled by the Fourth Moment Theorem due to Nualart and Peccati [33] and Nourdin and Peccati [25]. And the convergence to a Gamma distribution (and in particular to a distribution) is treated in [25]. In order to give the consequences of these results in our framework we have to identify the link between the notation in our paper and in the above mentioned works. Note that the coefficients have been defined as with , , with on the simplex We extend them by symmetry on the whole and we denote by this extension (with the convention that is zero if for . So we will have
[TABLE]
The second point is to write the sequence of multi-dimensional random variables as a sequence of one-dimensional random variables and to re-index the coefficients in a corresponding way. But we have to note first that are not a priori independent, because is not the identity matrix. So we assume that is invertible and we first use (2.17) in order to write
[TABLE]
with defined in (2.16). Now are independent and we are ready to write them as a sequence. We define by . Setting and the integer respectively the fractional part of , the inverse function is then defined as follows: if and if . We extend this definition to multi-indexes: if then And to coefficients: if we define by Moreover, we consider the sequence Then
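One possible way to realise such a re-indexing is sketched below. It assumes that the pair (n, i), with n at least 1 and i between 1 and a fixed dimension `dim`, is sent to the single index (n - 1) * dim + i, and that the inverse is recovered from the integer part and the remainder of the division by `dim`. The names `flatten`, `unflatten` and `dim` are hypothetical, and the paper's own enumeration may differ.

```python
def flatten(n, i, dim):
    """Map a double index (n, i), n >= 1 and 1 <= i <= dim, to a single index k >= 1."""
    return (n - 1) * dim + i

def unflatten(k, dim):
    """Inverse map, using the integer part and the remainder of k / dim."""
    q, r = divmod(k, dim)
    # When k is a multiple of dim the pair is (q, dim); otherwise it is (q + 1, r).
    return (q, dim) if r == 0 else (q + 1, r)

def flatten_multi_index(alpha, dim):
    """Apply the re-indexing to every entry of a multi-index of (n, i) pairs."""
    return tuple(flatten(n, i, dim) for n, i in alpha)

print(flatten_multi_index(((1, 2), (3, 1)), dim=3))
```

Because the map is increasing in n, strictly increasing variable indices stay strictly increasing after flattening, so ordered multi-indexes remain ordered.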
[TABLE]
with the convention that now we work with the multi-index Note that is a multiple stochastic integral of order
We introduce now the “contraction operators”. For and one denotes with the convention that for we put and for Note that, even if is symmetric, is not symmetric, so we introduce to be the symmetrization of
We introduce now
[TABLE]
It is known (see [25]) that is equal to the fourth cumulant of and moreover, it is proved in [25] that, if is a standard normal random variable, then
[TABLE]
Using this and Theorem 3.10 we immediately obtain
Theorem 3.11
Let be a standard normal random variable.
A. If satisfies and, for every is invertible, then for every there exists such that
[TABLE]
B. If satisfies (2.1) and (2.14) then for every there exists such that
[TABLE]
In the above estimates (3.27) and (3.28), denotes a constant independent of the coefficients .
Remark 3.12
This is a generalization of the “fourth moment theorem” to stochastic polynomials. However there is a difference because the influence factor appears in (3.27). One may ask if it is possible to control the distance between stochastic polynomials and the normal distribution in terms of only. An affirmative answer has recently been given in the following more particular framework: assume that so that is a multi-linear polynomial. Assume also that the random variables are identically distributed. Then, if the convergence to normality is controlled by only (see Theorem 2.3 in [26]).
We discuss now the convergence to a Gamma distribution. For we consider a centred Gamma distribution of parameter : where has a Gamma law with parameter (that is, with density ). If is integer then is a centred chi-square distribution with degrees of freedom. We introduce
[TABLE]
with $\theta_{m}=\frac{1}{4}\,(m/2)!\,\binom{m}{m/2}$. Combining Theorem 3.11 and Proposition 3.13 from [25] one obtains
[TABLE]
If is an integer then has a centred distribution, so may be represented as a polynomial of degree two of Gaussian random variables. Then, using Theorem 5.9 in [8] one obtains
[TABLE]
Then, using Theorem 3.10 we obtain
Theorem 3.13
Let be a random variable with a centred distribution with degrees of freedom.
A. If satisfies and, for every is invertible, then for every there exists such that
[TABLE]
B. If satisfies (2.1) and (2.14) then for every there exists such that
[TABLE]
In the above estimates (3.30) and (3.31), denotes a constant independent of the coefficients .
4 Examples
4.1 U-statistics associated to polynomial kernels
Let us first briefly recall how U-statistics arise. One considers a class of distributions and aims to estimate a functional with In order to do so one has at hand a sequence of independent random variables with law but does not know which law this is. The goal is to construct an unbiased estimator, that is a sequence of functions such that the estimator converges to and moreover for every This means that the estimator is unbiased, and this is the origin of the name U-statistics. In 1948 Halmos [15] asked whether such an unbiased estimator exists and whether it is unique. It turns out that the necessary and sufficient condition in order to be able to construct such an estimator is that has the following particular form: there exist and a measurable function such that
[TABLE]
In this case one may construct the symmetric unbiased estimator (and if is sufficiently large, this estimator is unique in the class of the symmetric estimators) in the following way:
[TABLE]
where the sum is taken over all the subsets such that for . It is clear that may be taken to be symmetric (if not, one takes its symmetrization and this changes nothing).
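The construction just described, averaging a symmetric kernel over all subsets of distinct indices, can be sketched in a few lines. This is a minimal illustration and not the authors' code: the function names `u_statistic` and `variance_kernel`, the sample values, and the choice of the kernel $(x-y)^2/2$ are assumptions made here.

```python
from itertools import combinations
from math import comb


def u_statistic(sample, kernel, order):
    """Average a symmetric kernel over all subsets of `order` distinct indices."""
    n = len(sample)
    if n < order:
        raise ValueError("need at least `order` observations")
    total = sum(
        kernel(*(sample[i] for i in subset))
        for subset in combinations(range(n), order)
    )
    return total / comb(n, order)


def variance_kernel(x, y):
    # Illustrative symmetric kernel of order 2 (an assumption of this sketch):
    # E[(X - Y)^2 / 2] = Var(X) for independent X, Y with the same law.
    return 0.5 * (x - y) ** 2


# Hypothetical data, chosen only to exercise the function.
data = [1.0, 4.0, 2.0, 7.0, 3.0]
u_var = u_statistic(data, variance_kernel, 2)
```

With this kernel the U-statistic coincides with the usual unbiased sample variance, which gives a quick sanity check of the averaging over subsets.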
When is a polynomial, this fits in our framework and our results apply, but, for example is out of reach. We first treat two standard examples.
Example 1. (Variance estimator) We denote and . We take so that
[TABLE]
In order to come back to our framework we write
[TABLE]
It follows that
[TABLE]
thus
[TABLE]
In our notation, we have
[TABLE]
where if and
[TABLE]
The quantities which appear in our convergence theorem are
[TABLE]
Our invariance principle (Theorem 3.9) says that is asymptotically equivalent in total variation distance with
[TABLE]
where are Gaussian random variables with the same mean and covariance as Then is a centred Gaussian random variable with variance so, if holds, then Theorem 3.9 and Theorem 3.10 yield
[TABLE]
for every , with a standard normal random variable.
Remark 4.1
Another way to do things, used in U-statistics theory, is the following. One employs the two dimensional CLT in order to prove that the term normalized with converges in law to and then one notes that the remaining term is smaller, so it may be ignored.
Example 2. We look at the U-statistic associated to We set and . Here is not invariant with respect to translations and we have two different limits according to whether is null or not. We write
[TABLE]
so that
[TABLE]
Case 1: . Then
[TABLE]
with if and
[TABLE]
One has
[TABLE]
Using Theorem 3.9 and Theorem 3.10, the asymptotic behavior of is equivalent to the behavior of
[TABLE]
with standard normal.
Case 2: . Then
[TABLE]
where if and
[TABLE]
Here,
[TABLE]
Using the invariance principle (Theorem 3.9) this is close to with independent standard normal random variables. We define and Then the law of coincides with the law of the double Itô integral Setting , we recall that the law of coincides with the law of where is standard normal. Then, using Theorem 3.9 (with and Theorem 3.4 (with one obtains, for every
[TABLE]
An alternative way to solve the problem is to write
[TABLE]
and to use the CLT in order to replace with and to say that by the law of large numbers the last term goes to . This gives the convergence in law of to
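The alternative route above rests on rewriting a sum over pairs in terms of a squared sum and a sum of squares. The sketch below checks that algebraic identity numerically, under the assumption (suggested by the discussion but not confirmed by the elided formulas) that the kernel of Example 2 is the product $\psi(x,y)=xy$. The function names and the data are hypothetical.

```python
from itertools import combinations
from math import comb


def product_u_statistic(sample):
    """U-statistic of the assumed product kernel, averaged over unordered pairs."""
    n = len(sample)
    return sum(x * y for x, y in combinations(sample, 2)) / comb(n, 2)


def product_u_statistic_via_sums(sample):
    """Same quantity written as ((sum x_i)^2 - sum x_i^2) / (n (n - 1))."""
    n = len(sample)
    s1 = sum(sample)
    s2 = sum(x * x for x in sample)
    return (s1 * s1 - s2) / (n * (n - 1))


# Hypothetical data, chosen only to exercise both formulas.
data = [1.0, 4.0, 2.0, 7.0, 3.0]
direct = product_u_statistic(data)
via_sums = product_u_statistic_via_sums(data)
```

In the second form the first term is the square of a normalized sum, to which the CLT applies, and the second is an empirical mean of squares, to which the law of large numbers applies.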
Remark 4.2
The above two examples suggest the following rough comparison of the strategies employed in the U-statistics theory on one hand and in our paper on the other hand. In the U-statistics theory one tries to make blocks of terms such that in the end appears as a continuous function of blocks of the form or and then use the CLT, respectively the law of large numbers, in order to replace them, asymptotically, by a Gaussian random variable respectively by a constant. Alternatively, in our paper one begins by using the invariance principle in order to change and by Gaussian random variables and And then one solves the problem of the asymptotic behavior in the framework of Wiener chaoses.
Let us go on and look to general polynomials. We fix , we denote and we define
[TABLE]
with symmetric coefficients which are null on the diagonals. So is a general symmetric polynomial of order in the variables We associate to the U-statistic defined in (4.2):
[TABLE]
The above quantity is linked with the stochastic polynomials defined in the previous sections in the following way. One takes and and constructs coefficients such that with associated to in (3.1): The problem is that is centred whereas which appears in (4.4), is not. It turns out that the operation which consists in centering in (4.4) is exactly the Hoeffding decomposition, introduced by Hoeffding in [16, 17], which plays a crucial role in the theory of U-statistics. Let us recall it. For one defines the kernels
[TABLE]
Then Hoeffding’s decomposition is the following:
[TABLE]
where is the U-statistic associated to in the first equality from (4.4) (with replaced by . See for example Theorem 1 in Section 1.6 in [19] for the proof of (4.5).
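For a kernel of order two, the decomposition can be checked by hand on a finite law. The sketch below uses the textbook form of Hoeffding's decomposition, $U = \theta + 2\,U^{(1)}(h_1) + U^{(2)}(h_2)$, which is assumed here to agree with (4.5) up to notation. The law `LAW`, the kernel, and all function names are hypothetical choices made for illustration.

```python
from itertools import combinations
from math import comb, isclose

# Hypothetical finite law: P(X = v) = p for each (v, p) in LAW.
LAW = [(-1.0, 0.25), (0.0, 0.25), (2.0, 0.5)]


def kernel(x, y):
    # Illustrative symmetric polynomial kernel of order 2.
    return x * y + x + y


def expect(f):
    """Expectation of f(X) under LAW."""
    return sum(p * f(v) for v, p in LAW)


# theta = E[kernel(X, Y)] for independent X, Y with law LAW.
theta = expect(lambda x: expect(lambda y: kernel(x, y)))


def h1(x):
    # First-order projection: E[kernel(x, Y)] - theta.
    return expect(lambda y: kernel(x, y)) - theta


def h2(x, y):
    # Second-order remainder; degenerate by construction.
    return kernel(x, y) - h1(x) - h1(y) - theta


def u_stat(sample, f, order):
    """U-statistic of `f` over all subsets of `order` sample points."""
    n = len(sample)
    return sum(f(*c) for c in combinations(sample, order)) / comb(n, order)


sample = [2.0, -1.0, 0.0, 2.0, 2.0, -1.0]
lhs = u_stat(sample, kernel, 2)
rhs = theta + 2 * u_stat(sample, h1, 1) + u_stat(sample, h2, 2)
```

The identity `lhs == rhs` holds for any sample, because it follows from the pointwise definition of `h2`. The probabilistic content lies in the centering properties: `h1` has mean zero, and `h2` has mean zero in each argument when the other is fixed.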
We denote and we compute
[TABLE]
so we obtain
[TABLE]
We conclude that
[TABLE]
In the theory of U-statistics one says that is degenerate at order if for and which amounts to
[TABLE]
We assume that (4.6) holds and we write
[TABLE]
with
[TABLE]
By (4.6), the U-statistic is degenerate at order if and only if
[TABLE]
which is the same non-degeneracy condition we are interested in.
We recall that and that in (2.14) we have introduced the covariance matrix , that is
[TABLE]
We consider a correlated Brownian motion with we define the multiple stochastic integrals
[TABLE]
and we denote
[TABLE]
Theorem 4.3
A. If verifies and (4.6) holds then for every
[TABLE]
B. Suppose that has finite moments of any order and that If (4.6) holds then, for every
[TABLE]
Proof. In order to use Theorem 3.10 we estimate
[TABLE]
Finally we study the influence factor:
[TABLE]
Then (3.24) gives
[TABLE]
And by employing (3.25) one has
[TABLE]
4.2 A quadratic central limit theorem
For , we look at the quadratic form
[TABLE]
where are centred independent random variables which have finite moments of any order. The aim of this section is to prove that if then converges to a double stochastic integral while for the limit is a standard Gaussian random variable. In our notation, we have , , and
[TABLE]
where for and if ,
[TABLE]
Theorem 4.4
Let be a sequence of independent and centred random variables, with and which have finite moments of any order.
A. Let . We denote and , being a Brownian motion. Then for every there exists and such that for
[TABLE]
Suppose moreover that holds. Then for every there exists and such that for
[TABLE]
B. Let . We denote a standard normal random variable. There exists and such that for
[TABLE]
Suppose moreover that holds. Then (4.12) holds with instead of
Proof A. We extend by symmetry the coefficients to all indexes with . We denote and we define
[TABLE]
Let us prove that
[TABLE]
We take and we write
[TABLE]
with
[TABLE]
Note that if then
[TABLE]
so that
[TABLE]
Moreover
[TABLE]
Finally, by comparing Riemann sums with the corresponding integral,
[TABLE]
Since we obtain (4.13). It follows that, for sufficiently large
[TABLE]
And we also have
[TABLE]
Note that and Using Theorem 2.3 (with ), Theorem 3.4 (see (3.18) with ) and (4.13) we obtain
[TABLE]
so (4.10) is proved for
We suppose now that verifies (3.4) and we use Theorem 3.9 (see (3.22) with in order to obtain
[TABLE]
so (4.12) is proved for also.
B. We have with (recall that
[TABLE]
We note first that
[TABLE]
These inequalities are easily obtained by comparing with It immediately follows that
[TABLE]
and Now, using Theorem 2.3
[TABLE]
and, if satisfies , we use Theorem 3.9 and we obtain
[TABLE]
Now we have to estimate the total variation distance between and the normal random variable In order to do it we use (3.26), so we have to estimate the kurtosis We denote and we write
[TABLE]
In order to obtain the last inequality one just looks to the graphs of the functions and to the graph of the step approximation of this function. And the step approximation is below the function in these regions. Moreover (see [3] Lemma B1 for a complete computation)
[TABLE]
It follows that
[TABLE]
5 Stochastic calculus of variation under the Doeblin’s condition
We assume that the sequence , of independent random variables satisfies Hypothesis , that is the Doeblin’s condition and the moment finiteness one. We strongly use here the representation (3.9) discussed in Section 3.1, that is,
[TABLE]
where are independent with laws given in (3.8). The goal of this section is to present a differential calculus based on which has been introduced in [1, 4] (and which is inspired by the Malliavin calculus [31]).
5.1 Abstract Malliavin calculus and Sobolev spaces
To begin we introduce the space of the simple functionals. We denote by the multi-indexes with (that is, we do not impose that ). We consider polynomials with random coefficients
[TABLE]
where with and The coefficients are random variables which are measurable with respect to and so, in particular, are independent of And we define to be the space of the polynomials computed in that is if
[TABLE]
The simple functionals will be In particular our polynomials belong to Note that is dense in with . So we will define first our differential operators on and we extend them in the canonical way to their domains in .
We assume that (so it is a finite dimensional Hilbert space). Let , so . For and we define the first order derivatives
[TABLE]
We look to as to a random element of the following Hilbert space :
[TABLE]
So The Malliavin covariance matrix of is defined by
[TABLE]
Moreover we define the higher order derivatives in the following way. Let be fixed and let with For , we define
[TABLE]
We look to as to a random element of so . For , we have .
We define now the divergence operator
[TABLE]
Standard integration by parts on gives the following duality relation: for every
[TABLE]
We define now the Sobolev norms. For we set
[TABLE]
Moreover we define
[TABLE]
and
[TABLE]
Finally we define the Sobolev spaces
[TABLE]
The duality relation (5.6) implies that the operators and are closable so we may extend these operators to in a standard way. But in this work we will restrict ourselves to .
We recall now the basic computational rules. For and we have
[TABLE]
and for
[TABLE]
In particular for
[TABLE]
Let us stress the following fact which is specific to our framework. In order to establish the integration by parts formula in the classical Malliavin calculus one needs that is almost surely invertible. And this is always false here: indeed if then on the set which has strictly positive probability. This is why we have to use a localized version of the integration by parts formula. Given we consider a function such that and for every Then we define and we notice that on the set we have , so is invertible. We denote
[TABLE]
Theorem 5.1
Let and and, for we denote Then for every and every
[TABLE]
with
[TABLE]
Moreover let and Suppose that and Then
[TABLE]
with defined by
Proof. The proof is standard so we just sketch it. Using the chain rule so that
[TABLE]
It follows that, on the set one has . Then, by using (5.13) and the duality formula (5.6),
[TABLE]
We use once again (5.13) in order to obtain in (5.15). By iteration one obtains the higher order integration by parts formulae.
We give now useful estimates for the weights which appear in (5.16). For we denote
[TABLE]
Lemma 5.2
Let and and There exists a universal constant (depending on only) such that for every multi index with and every one has
[TABLE]
In particular, taking and we have
[TABLE]
The proof is straightforward but technical so we leave it for Appendix B.
5.2 Regularization results
We deal here with functions and their derivatives on . So, we use a slightly different definition for multi-indexes. Here, for , a multi-index of length is given by and we set its length. For , we set . We allow the case by setting and, for , .
We recall that a super kernel is a function which belongs to the Schwartz space (infinitely differentiable functions which decrease in a polynomial way to infinity), and such that for every multi-index with one has
[TABLE]
For we define and for a function we denote , the symbol denoting convolution. For we define and to be some constants such that
[TABLE]
We give now a “regularization lemma” which is an improvement of Lemma 2.5 in [2].
Lemma 5.3
Let and There exists some constant depending on and only, such that for every every multi index with and every
[TABLE]
with defined in (5.17) and . Moreover, for every
[TABLE]
Proof. Using Taylor expansion of order ,
[TABLE]
with
[TABLE]
Using (5.20) we obtain and by a change of variable we get
[TABLE]
So that
[TABLE]
Using integration by parts formula (5.16) (with
[TABLE]
The upper bound from (5.19) (with gives
[TABLE]
And since
[TABLE]
we conclude that
[TABLE]
In order to prove (5.24), we write
[TABLE]
So the proof of (5.24) will be completed as soon as we check that and We write
[TABLE]
As a consequence, we get a regularization result involving functions which are just continuous and bounded.
Lemma 5.4
Let and There exists some constant depending on and only, such that for every , every and ,
[TABLE]
with defined in (5.17).
Proof. Let denote the density of the standard -dimensional normal law and for , set . We notice that , . Moreover, and , for every . So, we can apply (5.3) with and we obtain
[TABLE]
We now let tend to 0 and obtain (5.24).
5.3 Estimates of the Sobolev norms
Throughout this section we assume that verifies (that is (2.1) and and we estimate the Sobolev norms of and of We will give our estimates in terms of the norms defined in (2.2).
Proposition 5.5
Let and be given and let with Then
[TABLE]
Remark 5.6
(5.25) says in particular that if (recall that is a sum up to , see (2.2)) then the infinite series belongs to Let us compare this result with the corresponding one for functionals on the Wiener space. We take and to be standard normal distributed. Then is a multiple integral of order associated to the kernel which is constant on cubes and equal to the corresponding So where denotes the iterated stochastic integral and is the multiple stochastic integral. Note that and so So we have
[TABLE]
It is known that is times differentiable in in Malliavin sense if and only if the quantity in the right hand side is finite. And this is the same in our framework. But in our calculus we need estimates for a large and then This is why, in this paper, we give up the case of infinite series and restrict ourselves to finite sums.
Proof. Step 1. For simplicity of notation, we set here . For fixed , and we set as the set of the multi-indexes of length which do not contain the pair , the case giving the set made just by the null multi-index. Then, by observing that for every and , one has
[TABLE]
where if and for ,
[TABLE]
It can be easily checked that
[TABLE]
where, for ,
[TABLE]
and the above coefficients are
[TABLE]
We study . First,
[TABLE]
Moreover, for ,
[TABLE]
and similarly,
[TABLE]
We put all this together and we obtain
[TABLE]
Step 2. Starting from formula (5.26), we use Burkholder’s inequality (2.9) in order to obtain
[TABLE]
In order to treat we need the following auxiliary lemma:
Proposition 5.7
A. Let be random variables such that for every and is measurable. We fix and we consider the process
[TABLE]
For every and there exists a universal constant depending on and on only, such that
[TABLE]
with
[TABLE]
B. If
[TABLE]
then
[TABLE]
Proof. In the following denotes a constant depending on and on only and which may change from one line to another.
Step 1. We will use the following facts. First, by the duality formula Moreover using the computational rules (see (5.12))
[TABLE]
It follows that
[TABLE]
It is easy to check that and a similar estimate holds for Moreover it is proved in Lemma 3.2 in [1] that there exists a universal constant such that so that
[TABLE]
Step 2. Let so that We have to check that
[TABLE]
Since is measurable and it follows that is a martingale. By (2.6)
[TABLE]
Since and are independent,
[TABLE]
From , we conclude that
[TABLE]
so the statement holds for .
Step 3. We estimate the derivatives of . We have
[TABLE]
where is -measurable and Notice that , and take values in (defined in (5.1)). So, by applying the step above, we get
[TABLE]
where
[TABLE]
If we prove that
[TABLE]
then we obtain
[TABLE]
And by iteration, we get (5.30) for every . So, let us prove (5.36).
We have . We analyze now First, . Let . Since if we obtain
[TABLE]
Recalling that and are independent and that , we can write
[TABLE]
By inserting all these estimates, we get (5.36). So A is proved. The proof of B is just identical so we skip it.
Proposition 5.8
For every and there exists a universal constant depending on and only such that
[TABLE]
where and is given in (5.31).
Proof. We prove this by recurrence on . The case is straightforward, so we suppose . We recall that, if then and we write
[TABLE]
where . Since we get (see (5.12))
[TABLE]
So we are in the framework of the previous lemma with and
[TABLE]
Notice that
[TABLE]
Then, using (5.33) (recall that and the recurrence hypothesis
[TABLE]
Moreover, by the estimates of the Sobolev norms given in (5.25), and the same computations as above
[TABLE]
Remark 5.9
By using Proposition 5.5 and 5.8, we give here an upper estimate of the -norm of the constant defined in (5.17). This will be very useful in the sequel. By using the Hölder inequality we easily get
[TABLE]
By applying the estimates (5.25) and (5.37) we obtain
[TABLE]
denoting a constant depending on and the moment bound for a suitable and independent of the coefficients .
5.4 Estimates of the covariance matrix
In this section we give estimates for the Malliavin covariance matrix of which we denote for short by . We restrict ourselves to the scalar case, so that and is just a scalar. We start from the formula of the Malliavin derivative of already discussed in the proof of Proposition 5.5, that is,
[TABLE]
where denotes the multi-indexes of length which do not contain the pair and where if and for ,
[TABLE]
The aim of this section is to prove the non-degeneracy estimate (5.44) in next Lemma 5.11. But we first need to study the conditional expectation of given the randomness from and .
Lemma 5.10
Assume . We denote by the conditional expectation with respect to Then
[TABLE]
where is given in Lemma 3.1 and for , we set and .
Proof. We set here . We recall that and we define (with and defined in (3.2))
[TABLE]
Then
[TABLE]
So, we have
[TABLE]
where
[TABLE]
One has
[TABLE]
This is because , so there is at least one and For the same reason, one has
[TABLE]
We recall that and we use (5.39) in order to write
[TABLE]
denoting the multi-indexes of length which do not contain the pair . By (5.42) and (5.43), one has for every and and for every . Thus, is orthogonal (in ) to , so that
[TABLE]
Therefore,
[TABLE]
Now, we write
[TABLE]
For every there exists at most one such that so that
[TABLE]
By using (3.13),
[TABLE]
and the statement holds.
We can now prove the main result of this section.
Lemma 5.11
Assume . Let with . For every ,
[TABLE]
where is a universal constant (the one in the Carbery–Wright inequality) and is given in Lemma 3.1.
Remark 5.12
Sometimes is small and we would like to use instead, with We denote . Then for every there exists such that
[TABLE]
Indeed: we denote and we use the inequality
[TABLE]
in order to obtain
[TABLE]
Using Chebyshev’s inequality and Proposition 5.5, for every ,
[TABLE]
so the proof of (5.45) is completed.
Proof of Lemma 5.11. We will use the Carbery–Wright inequality that we recall here (see Theorem 8 in [9]). Let be a probability law on which is absolutely continuous with respect to the Lebesgue measure and has a log-concave density. There exists a universal constant such that for every polynomial of order and for every one has
[TABLE]
We will use this result in the following framework. We recall that the coefficients are null except for a finite number of them. So we may find such that, if and then It follows that we may write (see (5.39))
[TABLE]
where is a polynomial of order with unknowns and coefficients depending on and Moreover we recall that is the conditional probability with respect to We denote by the law of under : this is a product of laws of the form so it is log-concave. So we are able to use (5.46). Using (5.41)
[TABLE]
We take now (to be chosen in a moment) and we use (5.46) in order to obtain
[TABLE]
The first term in the above inequality is estimated in Appendix A. In order to fit in the notation used there we denote and Then
[TABLE]
Now we apply Lemma A.1 with Recall that and we have the restriction
[TABLE]
We have and
[TABLE]
Then (A.2) gives
[TABLE]
Inserting this in (5.47) we obtain
[TABLE]
Now, is any constant satisfying the restriction (5.48). So, by letting , we finally obtain (5.44).
5.5 Proof of Theorem 3.3
The goal of this section is to give the proof of Theorem 3.3 so we use the notation from Section 3.
We take , , and we consider the sequence . Since as , we can find such that such that . And since , we get . We work with this value of and we write simply in place of . Moreover, in the following, stands for a constant which may vary from line to line and which depends on the parameters in the statements but not on the coefficients .
We define , so . We consider , to be chosen in the sequel, and we use the regularization Lemma 5.4 (see (5.24)) with the above choice of and . This gives
[TABLE]
the latter inequality following from (5.38). Moreover by (5.45) (therein, ), for every (recall that
[TABLE]
So,
[TABLE]
A similar estimate holds for We use now defined in (3.16). Since one has
[TABLE]
Putting this together, we get
[TABLE]
We optimize first on we take and we obtain (recall that ),
[TABLE]
It follows that
[TABLE]
We optimize now on we take , so that
[TABLE]
the latter inequality follows from and, since , . By inserting,
[TABLE]
Since
[TABLE]
We note that the above exponent is positive because . So, we choose and such that
[TABLE]
so that
[TABLE]
A similar estimate holds with replaced by . We then obtain
[TABLE]
The statement now follows by recalling that and, from (3.16),
Appendix A An iterated Hoeffding’s inequality
In this section we work with multi-indexes with and we look to
[TABLE]
where , , denote independent Bernoulli random variables and . We denote
[TABLE]
Lemma A.1
Let . If
[TABLE]
then
[TABLE]
Proof. We proceed by recurrence on If we have
[TABLE]
the latter inequality following from (A.1). And by Hoeffding’s inequality
[TABLE]
Since
[TABLE]
(A.2) follows for . We suppose now that (A.2) holds for and we prove it for For with we define and we write
[TABLE]
Then
[TABLE]
We estimate first We write
[TABLE]
Notice that
[TABLE]
and
[TABLE]
We also have
[TABLE]
so we can use the recurrence hypothesis and we get
[TABLE]
We estimate now We use Corollary 1.4, p. 1654 in Bentkus [7], which asserts the following: if is a martingale such that almost surely, then, for every
[TABLE]
Since we have
[TABLE]
Notice that so that
[TABLE]
So, using (A.4)
[TABLE]
This, together with (A.3), gives (A.2).
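The base case of the recurrence rests on Hoeffding's inequality for a weighted sum of independent Bernoulli variables. The sketch below checks the classical one-sided bound $P(S - E[S] \le -t) \le \exp(-2t^2/\sum_i c_i^2)$ by exact enumeration. The weights, the Bernoulli parameter and the function names are hypothetical, and the precise constants in (A.2) are not reproduced here.

```python
from itertools import product
from math import exp


def lower_tail(weights, p, t):
    """Exact P(S - E[S] <= -t) for S = sum_i weights[i] * eps_i,
    where the eps_i are i.i.d. Bernoulli(p)."""
    n = len(weights)
    mean = p * sum(weights)
    prob = 0.0
    for eps in product((0, 1), repeat=n):
        s = sum(w * e for w, e in zip(weights, eps))
        if s - mean <= -t:
            k = sum(eps)  # number of ones in this outcome
            prob += p ** k * (1 - p) ** (n - k)
    return prob


def hoeffding_bound(weights, t):
    """Classical Hoeffding bound: each summand ranges over an interval of length |w|."""
    return exp(-2.0 * t * t / sum(w * w for w in weights))


# Hypothetical weights and parameter, small enough for full enumeration.
weights = [0.5, 1.0, 1.5, 1.0, 0.5, 2.0, 1.0, 0.5]
p = 0.3
checks = [
    (t, lower_tail(weights, p, t), hoeffding_bound(weights, t))
    for t in (0.5, 1.0, 1.5, 2.0)
]
```

Each triple in `checks` holds a threshold, the exact lower-tail probability and the Hoeffding bound. The enumeration is exponential in the number of weights, so this is only meant as a sanity check on small examples.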
Appendix B Norms
The aim of this section is to prove Lemma 5.2. For We work with the norms
[TABLE]
To begin we give several easy computational rules:
[TABLE]
Now, for we consider the Malliavin covariance matrix and, if we denote We write
[TABLE]
where is the algebraic complement . Then, using (B.1)
[TABLE]
By (B.1) and (B.2), and Then, using (B.3)
[TABLE]
so that
[TABLE]
We denote
[TABLE]
and
[TABLE]
We also recall that for we consider a function such that and Then we take
Lemma B.1
A. For every there exists a universal constant (depending on and such that, for such that
[TABLE]
B. For every
[TABLE]
Proof A. We first prove (B.7) for . We have
[TABLE]
Using (B.1)
[TABLE]
For , we use recurrence and we obtain
[TABLE]
Then, using (B.1) first and (B.4) secondly, (B.7) follows.
B. Let For every one has Moreover one has So (B.7) implies (B.8).
- [1] Bally V., Caramellino L.: Asymptotic development for the CLT in total variation distance. Bernoulli 22, 2442–2485 (2016).
- [2] Bally V., Caramellino L.: On the distances between probability density functions. Electronic Journal of Probability 19, no. 110, 1–33 (2014).
- [3] Bally V., Caramellino L.: An Invariance principle for Stochastic Series II. Non Gaussian limits. Preprint arXiv:1607.04544 (2016).
- [4] Bally V., Caramellino L., Poly G.: Convergence in distribution norms in the CLT for non identical distributed random variables. Preprint arXiv:1606.01629 (2016).
- [5] Bally V., Ray C.: Approximation of Markov semigroups in total variation distance. Electronic Journal of Probability 21, no. 12 (2016).
- [6] Bakry D., Gentil I., Ledoux M.: Analysis and Geometry of Markov Diffusion Semigroups. Springer (2014).
- [7] Bentkus V.: On Hoeffding’s inequalities. Ann. Probab. 32, 1650–1673 (2004).
- [8] Bogachev V.I., Kosov V.I., Zelenov G.I.: Fractional smoothness of distributions of polynomials and fractional analog of the Hardy-Landau-Littlewood inequality. arXiv:1602.05207v2.
