Concentration inequalities for bounded functionals via generalized log-Sobolev inequalities
Friedrich Götze, Holger Sambale, Arthur Sinulis

TL;DR
This paper establishes multilevel concentration inequalities for bounded functionals of random variables, extending to dependent cases and providing applications in empirical processes, chaos, U-statistics, and random graphs.
Contribution
It introduces new concentration inequalities based on generalized log-Sobolev inequalities, applicable to both independent and dependent variables, with explicit constants involving higher order differences.
Findings
Derived tail bounds for empirical processes and chaos.
Provided concentration inequalities for U-statistics with bounded kernels.
Extended results to dependent variables and random graph models.
Friedrich Götze, Holger Sambale, Arthur Sinulis
Fakultät für Mathematik, Universität Bielefeld, Postfach 10 01 31, 33501 Bielefeld, Germany
Concentration inequalities for bounded functionals via log-Sobolev-type inequalities
This research was supported by the German Research Foundation (DFG) via CRC 1283 “Taming uncertainty and profiting from randomness and low regularity in analysis, stochastics and their applications”.
Friedrich Götze
Holger Sambale
Arthur Sinulis∗
Abstract
In this paper we prove multilevel concentration inequalities for bounded functionals of random variables that are either independent or satisfy certain logarithmic Sobolev inequalities. The constants in the tail estimates depend on the operator norms of -tensors of higher order differences of .
We provide applications for both dependent and independent random variables. This includes deviation inequalities for empirical processes and suprema of homogeneous chaos in bounded random variables in the Banach space case . The latter application is comparable to earlier results of Boucheron–Bousquet–Lugosi–Massart and provides the upper tail bounds of Talagrand. In the case of Rademacher random variables, we give an interpretation of the results in terms of quantities familiar in Boolean analysis. Further applications are concentration inequalities for U-statistics with bounded kernels and for the number of triangles in an exponential random graph model.
MSC:
60E15 05C80
1 Introduction
During the last forty years, the concentration of measure phenomenon has become an established part of probability theory with applications in numerous fields, as is witnessed by the monographs MS86; Led01; BLM13; RS14; vH16. One way to prove concentration of measure is by using functional inequalities, more specifically the entropy method. It has emerged as a way to prove several groundbreaking concentration inequalities in product spaces by Talagrand Tal91; Tal96a, mainly in the works Led97 and BL97, and further developed in Ma00.
To convey the idea, let us recall that the logarithmic Sobolev inequality for the standard Gaussian measure in (see Gr75 ) states that for any we have
[TABLE]
where is the entropy functional. Informally, it bounds the disorder of a function (under ) by its average local fluctuations, measured in terms of the length of the gradient. It is by now standard that (1) implies subgaussian tail decay for Lipschitz functions (e. g. by means of the Herbst argument). In particular, if is a function such that a.s., we have for any .
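As a quick numerical sanity check of the subgaussian tail behavior that the Herbst argument delivers, the following sketch compares the exact upper tail of a standard Gaussian variable with the bound exp(-t^2/2). It assumes the simplest possible case, the 1-Lipschitz function f(x) = x in dimension one; this choice and the helper names are illustrative and not taken from the text.

```python
import math


def gaussian_upper_tail(t: float) -> float:
    """Exact P(Z >= t) for a standard normal Z, via the complementary error function."""
    return 0.5 * math.erfc(t / math.sqrt(2.0))


def subgaussian_bound(t: float) -> float:
    """Subgaussian bound exp(-t^2 / 2) for a centered 1-Lipschitz function of a Gaussian."""
    return math.exp(-t * t / 2.0)


if __name__ == "__main__":
    # Illustrative choice: f(x) = x, so f - E f is itself standard normal.
    for t in (0.5, 1.0, 2.0, 3.0, 4.0):
        print(f"t={t:.1f}  tail={gaussian_upper_tail(t):.3e}  bound={subgaussian_bound(t):.3e}")
```

In this one-dimensional case the bound is not tight, but it captures the Gaussian rate of decay, which is the point of the dimension-free statement.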
If is a probability measure on a discrete set (or a more abstract set not allowing for an immediate replacement for ), then there are several ways to reformulate equation (1), see e. g. DSC96 or BT06 . We continue these ideas by working in the framework of difference operators. Given a probability space , we call any operator satisfying for all , a difference operator. Accordingly, we say that satisfies a , if for all bounded measurable functions we have
[TABLE]
Apart from the domain of , it is clear that (2) can be seen as a generalization of (1) by defining on .
Another route to obtain concentration inequalities is to modify the entropy method, which was done in the framework of so-called -entropies. The idea is to replace the function in the definition of the entropy by other functions . This has been studied in LO00; BLM03; Cha04. In the seminal work BBLM05 the authors proved inequalities for -entropies for power functions , leading to moment inequalities for independent random variables.
Originally, the entropy method was primarily used to prove sub-Gaussian concentration inequalities for Lipschitz-type functions. However, there are many situations of interest in which the functions under consideration are not Lipschitz or have Lipschitz constants which grow as the dimension increases even after a renormalization which asymptotically stabilizes the variance. Among the simplest examples are polynomial-type functions. Here, the boundedness of the gradient typically has to be replaced by more elaborate conditions on higher order derivatives (up to some order ). Moreover, we cannot have subgaussian tail decay anymore. This is already obvious if we consider the product of two independent standard normal random variables, which leads to subexponential tails. We refer to this topic as higher order concentration.
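The failure of subgaussian decay for the product of two independent standard normal random variables can be seen from a short moment generating function computation. The symbols X, Y and λ below are introduced only for this illustration.

```latex
% X, Y independent standard normal. Condition on Y and use E e^{sX} = e^{s^2/2},
% then E e^{sY^2} = (1 - 2s)^{-1/2} for s < 1/2 with s = \lambda^2 / 2.
\[
  \mathbb{E}\, e^{\lambda XY}
  = \mathbb{E}\, e^{\lambda^{2} Y^{2}/2}
  = (1-\lambda^{2})^{-1/2}
  \quad \text{for } |\lambda| < 1,
  \qquad
  \mathbb{E}\, e^{\lambda XY} = \infty
  \quad \text{for } |\lambda| \ge 1 .
\]
```

A subgaussian random variable has a finite moment generating function for every real λ, so XY cannot be subgaussian. Finiteness in a neighborhood of the origin corresponds to the subexponential tails mentioned above.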
The earliest higher order concentration results date back to the late 1960s. Already in Bo68; Bo70 and Ne73, the growth of norms and hypercontractive estimates of polynomial-type functions in Rademacher or Gaussian random variables respectively have been studied. The question of estimating the growth of norms of multilinear polynomials in Gaussian random variables was considered in Bor84, AG93 and La06. In the context of Erdős–Rényi graphs and the triangle problem, concentration inequalities for polynomial functions gained considerable attention, in papers such as KV00.
More recently, multilevel concentration inequalities have been proven in Ad06; Wo13; AW15 for many classes of functions. These included U-statistics in independent random variables, functions of random vectors satisfying Sobolev-type inequalities and polynomials in sub-Gaussian random variables respectively. We refer to inequalities of the type
[TABLE]
as multilevel or higher order (-th order) concentration inequalities. This means that the tails might have different decay properties in some regimes of . Usually, we have for some constant which typically depends on the -th order derivatives.
To convey the basic idea of multilevel concentration inequalities, let us once again consider the case , e. g. a quadratic form of independent, say, Gaussian random variables. As sketched above, in this case the tails decay subexponentially in general. By means of a multilevel concentration inequality (the so-called Hanson–Wright inequality, which we address in more detail at a later point), we can show that while for large, subexponential tail decay holds, for small we even get subgaussian decay. In this sense, multilevel concentration inequalities provide refined tail estimates which do not only cover the behavior for large .
Our own work started with a second order concentration inequality on the sphere in BCG17 and was continued in BGS18 for bounded functionals of various classes of random variables (e. g. independent random variables or in the presence of a logarithmic Sobolev inequality (1)), and in GSS18 for weakly dependent random variables (e. g. the Ising model). In these papers, we studied higher order concentration, arriving at multilevel tail inequalities of type (3). If the underlying measure satisfies a logarithmic Sobolev inequality, (BGS18, Corollary 1.11) yields with for and , where denotes the operator norm of the respective tensors of -th order partial derivatives. A downside in both BGS18 and GSS18 is that for functions of independent or weakly dependent random variables, comparable estimates involve Hilbert–Schmidt instead of operator norms, leading to weaker estimates in general.
A central aspect of the present article is to fix this drawback by a slightly more elaborate approach. Here, we consider both independent and dependent random variables. In either case, we prove multilevel concentration inequalities of the same type, and apply them to different forms of functionals. We provide improvements of earlier higher order concentration results like (BGS18, Theorem 1.1) or (GSS18, Theorem 1.5), replacing the Hilbert–Schmidt norms appearing therein by operator norms. This leads to sharper bounds and a wider range of applicability.
A special emphasis is placed on providing uniform versions of the higher order concentration inequalities. By this, we mean that we consider functionals of supremum type , which includes suprema of polynomial chaoses, or empirical processes. Two more applications are given by U-statistics in independent and weakly dependent random variables as well as a triangle counting statistic in some models of random graphs, for which we prove concentration inequalities.
Notations. Throughout this note, is a random vector taking values in some product space (equipped with the product -algebra) with law , defined on a probability space . By abuse of language, we say that satisfies a , if its distribution does. In any finite-dimensional vector space, we let be the Euclidean norm, and for brevity, we write for any . Given a vector we write . To any -tensor we define the Hilbert–Schmidt norm and the operator norm
[TABLE]
using the outer product . For brevity, for any random -tensor and any we abbreviate as well as . Lastly, we ignore any measurability issues that may arise. Thus, we assume that all the suprema used in this work are either countable or defined as .
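To make the gap between the two tensor norms concrete, here is a minimal sketch for the case of 2-tensors, i. e. matrices, where the operator norm is the largest singular value. The function names and the use of power iteration are choices made for this illustration only; for tensors of order three or higher the operator norm is a supremum over several unit vectors and is not covered by this sketch.

```python
import math


def hilbert_schmidt_norm(a):
    """Square root of the sum of squared entries of the matrix a (a list of rows)."""
    return math.sqrt(sum(x * x for row in a for x in row))


def operator_norm(a, iterations=500):
    """Largest singular value of a, approximated by power iteration on a^T a."""
    rows, cols = len(a), len(a[0])
    v = [1.0 + 0.1 * j for j in range(cols)]  # generic, non-symmetric start vector
    for _ in range(iterations):
        av = [sum(a[i][j] * v[j] for j in range(cols)) for i in range(rows)]  # a v
        w = [sum(a[i][j] * av[i] for i in range(rows)) for j in range(cols)]  # a^T (a v)
        norm_w = math.sqrt(sum(x * x for x in w))
        if norm_w == 0.0:
            return 0.0
        v = [x / norm_w for x in w]
    av = [sum(a[i][j] * v[j] for j in range(cols)) for i in range(rows)]
    return math.sqrt(sum(x * x for x in av))


if __name__ == "__main__":
    identity = [[1.0 if i == j else 0.0 for j in range(3)] for i in range(3)]
    ones = [[1.0, 1.0], [1.0, 1.0]]
    for name, m in (("identity", identity), ("all-ones", ones)):
        print(name, hilbert_schmidt_norm(m), operator_norm(m))
```

For the n-by-n identity matrix the Hilbert–Schmidt norm grows like the square root of n while the operator norm stays equal to one, which is the kind of improvement obtained by passing from Hilbert–Schmidt to operator norms.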
1.1 Main results
To formulate our main results, we introduce a difference operator labeled which is frequently used in the method of bounded differences. Let be an independent copy of , defined on the same probability space. Given , define for each
[TABLE]
and
[TABLE]
where denotes the -norm with respect to . The difference operator is given as the Euclidean norm of the vector .
We shall also need higher order versions of , denoted by . They can be thought of as analogues of the -tensors of all partial derivatives of order in an abstract setting. To define the -tensor , we specify it on its “coordinates”. That is, given distinct indices , we set
[TABLE]
where exchanges the random variables by , and denotes the -norm with respect to the random variables and . For instance, for ,
[TABLE]
Using the definition (4), we define tensors of -th order differences as follows:
[TABLE]
Whenever no confusion is possible, we omit writing the random vector , i. e. we freely write instead of and instead of .
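The first and second order differences can be evaluated by brute force on a small example. The sketch below assumes the bounded-differences form described above: the first order difference in coordinate i is the supremum, over all values of that coordinate and of its independent copy, of the absolute change of f, and the second order difference uses the corresponding alternating sum over two coordinates. The state space {-1, +1} and all function names are assumptions made for this illustration.

```python
from itertools import product

VALUES = (-1, 1)  # assumed state space of each coordinate


def first_order_difference(f, x, i):
    """sup over a, b in VALUES of |f(x with x_i = a) - f(x with x_i = b)|."""
    results = []
    for a in VALUES:
        y = list(x)
        y[i] = a
        results.append(f(tuple(y)))
    return max(results) - min(results)


def second_order_difference(f, x, i, j):
    """sup over (a_i, b_i, a_j, b_j) of |f(a_i,a_j) - f(b_i,a_j) - f(a_i,b_j) + f(b_i,b_j)|."""
    def at(u, v):
        y = list(x)
        y[i], y[j] = u, v
        return f(tuple(y))

    best = 0.0
    for ai, bi, aj, bj in product(VALUES, repeat=4):
        best = max(best, abs(at(ai, aj) - at(bi, aj) - at(ai, bj) + at(bi, bj)))
    return best


def second_order_tensor(f, x):
    """Matrix of second order differences, with zeros on the diagonal."""
    n = len(x)
    return [[0.0 if i == j else second_order_difference(f, x, i, j) for j in range(n)]
            for i in range(n)]


if __name__ == "__main__":
    f = lambda x: x[0] * x[1] + 0.5 * x[1] * x[2]  # hypothetical quadratic form
    x = (1, 1, 1)
    print([first_order_difference(f, x, i) for i in range(3)])
    print(second_order_tensor(f, x))
```

For a quadratic form the second order differences do not depend on the point x, which mirrors the fact that second derivatives of a quadratic polynomial are constant.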
Our first main theorem is a concentration inequality for general, bounded functionals of independent random variables .
Theorem 1.1
Let be a random vector with independent components, a measurable function satisfying , and define . We have for any
[TABLE]
For the sake of illustration, let us consider the case of . Assuming that satisfy , and a.s., let be the quadratic form . Here, for all , and is the symmetric matrix with zero diagonal and entries if . In this case, it is easy to see that and , where is the matrix given by . As a result,
[TABLE]
This is a version of the famous Hanson–Wright inequality. For the various forms of the Hanson–Wright inequality we refer to HW71; W73; HKZ12; RV13; VW15; Ad15; ALM18.
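For comparison, the Hanson–Wright inequality is commonly stated in the following form for a random vector X with independent, centered, subgaussian components and a fixed matrix A. The absolute constant c and the subgaussian parameter K are generic symbols introduced for this sketch; they are not the constants of Theorem 1.1.

```latex
\[
  \mathbb{P}\bigl( \lvert X^{T} A X - \mathbb{E}\, X^{T} A X \rvert \ge t \bigr)
  \le 2 \exp\Bigl( - c \min\Bigl(
        \frac{t^{2}}{K^{4} \lVert A \rVert_{\mathrm{HS}}^{2}},\,
        \frac{t}{K^{2} \lVert A \rVert_{\mathrm{op}}} \Bigr) \Bigr),
  \qquad t \ge 0 .
\]
```

The term involving the Hilbert–Schmidt norm governs small values of t (subgaussian regime), while the term involving the operator norm governs large values of t (subexponential regime).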
Note that by a modification of our proofs (using arguments especially adapted to polynomials), it is possible to replace by , thus avoiding the drawback of switching to a matrix with a possibly larger operator norm. See Sections 2.1 and 2.4 for details. On the other hand, Theorem 1.1 allows for any function , not just quadratic forms, and the case of can in this sense be considered as a generalization of the Hanson–Wright inequality.
For a certain class of weakly dependent random variables , we can prove similar estimates as in Theorem 1.1. To this end, we introduce another difference operator, which is more familiar in the context of logarithmic Sobolev inequalities for Markov chains as developed in DSC96 . Assume that for some finite sets , equipped with a probability measure and let denote the conditional measure (interpreted as a measure on ) and the marginal on . Finally, set
[TABLE]
This difference operator appears naturally in the Dirichlet form associated to the Glauber dynamic of , given by
[TABLE]
In the next theorem, we require a –LSI for the underlying random variables . A number of models which satisfy this assumption will be discussed below.
Theorem 1.2
Let be a random vector satisfying a and a measurable function with . With the constant we have for any
[TABLE]
Again, if , assuming that , , a.s. and if , we arrive at a Hanson–Wright type inequality, this time including dependent situations. Similar results still hold if we remove the uncorrelatedness condition.
Let us discuss the –LSI condition in more detail. First, any collection of independent random variables with finitely many values satisfies a with depending on the minimal non-zero probability of the (cf. Proposition 6). In this situation, Theorem 1.1 and Theorem 1.2 only differ by constants.
However, the –LSI condition is also satisfied in numerous models of dependent random variables, as in (GSS18, Proposition 1.1) (the Ising model) or (SS18, Theorem 3.1) (various different models). Let us recall some of them. The Ising model is the probability measure on defined by normalizing for a symmetric matrix with zero diagonal and some . In (GSS18, Proposition 1.1), we have shown that if and , the Ising model satisfies a with depending on and only. For the special case of and for all , we obtain the Curie–Weiss model. Here, the two conditions required above reduce to .
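For small systems the Ising model can be written down by direct enumeration. The sketch below assumes the common parametrization in which the unnormalized weight of a spin configuration sigma is exp(1/2 * sum_{i,j} J_ij sigma_i sigma_j + sum_i h_i sigma_i), and it assumes the Curie–Weiss convention J_ij = beta / n off the diagonal with zero external field. Both conventions, and all names in the code, are assumptions made for this illustration.

```python
import math
from itertools import product


def ising_probabilities(J, h):
    """Map each configuration in {-1, +1}^n to its probability under the assumed Ising weight."""
    n = len(h)
    weights = {}
    for sigma in product((-1, 1), repeat=n):
        interaction = 0.5 * sum(J[i][j] * sigma[i] * sigma[j]
                                for i in range(n) for j in range(n))
        field = sum(h[i] * sigma[i] for i in range(n))
        weights[sigma] = math.exp(interaction + field)
    z = sum(weights.values())  # normalizing constant (partition function)
    return {sigma: w / z for sigma, w in weights.items()}


def curie_weiss_couplings(n, beta):
    """Symmetric interaction matrix with zero diagonal and constant entries beta / n."""
    return [[0.0 if i == j else beta / n for j in range(n)] for i in range(n)]


if __name__ == "__main__":
    n, beta = 4, 0.5  # hypothetical parameters in the weak-dependence regime beta < 1
    probs = ising_probabilities(curie_weiss_couplings(n, beta), [0.0] * n)
    print(probs[(1, 1, 1, 1)], probs[(1, -1, 1, -1)])
```

With a positive coupling, aligned configurations receive more mass than alternating ones, and for beta equal to zero the measure reduces to the uniform product measure on the hypercube.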
Another simple model in which a –LSI holds is the random coloring model. If is a finite graph and is a set of colors, we denote by the set of all proper colorings, i. e. the set of all such that . In (SS18, Theorem 3.1), we have shown that the uniform distribution on satisfies a –LSI if the maximum degree is uniformly bounded and (strictly speaking, we consider sequences of graphs here). In (SS18, Theorem 3.1), we moreover prove –LSIs for the (vertex-weighted) exponential random graph model and the hard-core model. We will further discuss the exponential random graph model in Section 2.4.
The common feature in all these models is that the dependencies which appear can be controlled (e. g. by means of a coupling matrix which measures the interactions between the particles of the system under consideration, cf. (GSS18, Theorem 4.2)) in such a way that the model is not “too far” from a product measure. For instance, in the Curie–Weiss model, this just translates to .
As a final remark, we discuss the LSI property with respect to various difference operators in Section 5. In particular, we show that the restriction to finite spaces which is implicit in Theorem 1.2 is natural since the property requires the underlying space to be finite. By contrast, we prove that any set of independent random variables satisfies an –LSI. However, it seems that it is not possible to use the entropy method based on –LSIs.
The upper bound in Theorem 1.2 admits a “uniform version”, i. e. we can prove deviation inequalities for suprema of functions, in the following sense. Let be a family of uniformly bounded, real-valued, measurable functions and set
[TABLE]
For any and let .
Theorem 1.3
Assume that either are independent or satisfies a and let be as in (7). With the same constant as in Theorem 1.1 or 1.2 respectively, we have for any the deviation inequality
[TABLE]
As mentioned before, Theorem 1.3 yields bounds for the upper tail only. The background is that the entropy method has certain limitations when it is applied to suprema of functions, cf. also Proposition 1 or Theorem 2.1 below. Roughly sketched, the reason is that when evaluating difference operators of suprema, if a positive part is involved we may typically choose a coordinate-independent maximizer of the terms involved. Without a positive part, this is no longer possible. See in particular the proof of Theorem 2.1, where we provide some further details.
Functionals of the form (7) have been considered in various works, starting from the first results in (Tal96a, Theorem 1.4), and continued in (Rio02, Théorème 1.1), (Ma00, Theorem 3) and (Bo02, Theorem 2.3) in the special case of
[TABLE]
Further research has been done in KR05, (Sam07, Section 3) and more recently (Mar18, Proposition 5.4). In these works, Bennett-type inequalities have been proven for general independent random variables. Furthermore, (BBLM05, Theorem 10) treats the case for Rademacher random variables and a compact set of vectors . As a byproduct of our method, we prove a deviation inequality for which can be regarded as a uniform bounded differences inequality.
Proposition 1
Assume that satisfies a , let be as in (8), and let be such that . For any we have
[TABLE]
Let us put Proposition 1 into context. In the above mentioned works, the authors derive Bennett-type inequalities for independent random variables , whereas in our case the concentration inequalities have sub-Gaussian tails. Proposition 1 may be compared to the sub-Gaussian tail estimates for Bernoulli processes, see e. g. (Tal14, Theorem 5.3.2). However, the property is both more and less general. On the one hand, it is possible to include possibly dependent random vectors, but on the other hand for independent random variables it is only applicable if the take finitely many values.
1.2 Outline
In Section 2, we present a number of applications and refinements of our main results. Section 3 contains the proofs of our main theorems. The proofs of the results from Section 2 are deferred to Section 4. We close the paper by discussing different forms of logarithmic Sobolev inequalities with respect to various difference operators in Section 5.
2 Applications
In the sequel, we consider various situations in which our results can be applied. Some of them can be regarded as sharpenings of our main theorems for functions which have a special structure.
2.1 Uniform bounds
If the functions under consideration are of polynomial type, we may somewhat refine the results from the previous section. Here we focus on uniform bounds as discussed in Theorem 1.3.
Let denote the family of subsets of with elements, fix a Banach space with its dual space , a compact subset and let be the -ball in with respect to . Let be a random vector with support in for some real numbers and define
[TABLE]
where . For any we let
[TABLE]
where for we use the convention and .
One can interpret the quantities as follows: If is the corresponding polynomial in variables, and is the -tensor of all partial derivatives of order , then . In this sense, we are considering the same quantities as in Theorem 1.3 but replace the difference operator by formal derivatives of the polynomial under consideration.
Furthermore, the concentration inequalities are phrased with the help of the quantities
[TABLE]
Clearly holds for all .
Concentration properties for functionals as in (9) have been studied for independent Rademacher variables (i. e. ) and in (BBLM05, Theorem 14) for all , and under certain technical assumptions in Ad15. We prove deviation inequalities in the weakly dependent setting, and afterwards discuss how these compare to the particular result in BBLM05. It is easily possible to derive a similar result for functions of independent random variables (in the spirit of Theorem 1.1). As the corresponding proof is easily done by generalizing the proof of (BBLM05, Theorem 14), we omit it.
Theorem 2.1
Let be a random vector in with support in satisfying a . For as in (9) and all we have
[TABLE]
Consequently, for any
[TABLE]
and the same concentration inequalities hold with replaced by .
Note that independent Rademacher random variables satisfy a (see e. g. (Gr75, Theorem 3) or (DSC96, Example 3.1)). Therefore, we get back (BBLM05, Theorem 14) from Theorem 2.1 (with slightly different constants). However, Theorem 2.1 moreover includes many models with dependencies like those discussed in the introduction. Therefore, it may be considered as an extension of (BBLM05, Theorem 14) to dependent situations and moreover to coefficients from any Banach space . For instance, we may consider an Ising chaos as a natural generalization of a Rademacher chaos to a dependent situation. In this case, Theorem 2.1 shows that we still obtain basically the same concentration properties if the dependencies are sufficiently weak (which is guaranteed by the conditions outlined in the introduction).
To illustrate our results further, let us consider the case of separately. Here we write
[TABLE]
The following corollary follows directly from Theorem 2.1.
Corollary 1
Assume that satisfies a and is supported in and let be as in (9) with . We have for all
[TABLE]
For the case of independent Rademacher variables, this recovers the upper tail in a famous result by Talagrand (Tal96a, Theorem 1.2) on concentration properties of quadratic forms in Banach spaces, which was also obtained in BBLM05. Note that for , we have
[TABLE]
where is the symmetric matrix with zero diagonal and entries if . If consists of a single element only, we have . Hence, Corollary 1 can be regarded as a generalized Hanson–Wright inequality.
2.2 The Boolean hypercube
The case of independent Rademacher random variables above can be interpreted in terms of quantities from Boolean analysis. Recall that any function can be decomposed using the orthonormal Fourier–Walsh basis given by for . More precisely, we have
[TABLE]
where the are given by and are called the Fourier coefficients of . For any we define the Fourier weight of order as . It is clear that . The following multilevel concentration inequality can now be easily deduced.
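The Fourier–Walsh decomposition can be computed by brute force for small n. The sketch below assumes the usual conventions of Boolean analysis: the coefficient attached to a subset S is the expectation of f(X) times the product of the X_i with i in S under the uniform measure on {-1, +1}^n, and the Fourier weight of order k is the sum of the squared coefficients over all subsets of size k. The function names and the example f are assumptions made for this illustration.

```python
from itertools import combinations, product


def fourier_coefficients(f, n):
    """Return {S: E[f(X) * prod_{i in S} X_i]} for X uniform on {-1, +1}^n."""
    points = list(product((-1, 1), repeat=n))
    coefficients = {}
    for k in range(n + 1):
        for subset in combinations(range(n), k):
            total = 0.0
            for x in points:
                character = 1
                for i in subset:
                    character *= x[i]
                total += f(x) * character
            coefficients[subset] = total / len(points)
    return coefficients


def fourier_weights(coefficients, n):
    """Weight of order k = sum of squared coefficients over subsets of size k."""
    weights = [0.0] * (n + 1)
    for subset, value in coefficients.items():
        weights[len(subset)] += value * value
    return weights


if __name__ == "__main__":
    f = lambda x: x[0] * x[1] + 0.5 * x[2]  # hypothetical example function
    coeffs = fourier_coefficients(f, 3)
    print({s: c for s, c in coeffs.items() if c != 0.0})
    print(fourier_weights(coeffs, 3))
```

By Parseval's identity the weights sum to the second moment of f, which gives a simple consistency check for the computation.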
Proposition 2
Let be independent Rademacher random variables and let be a function given in the Fourier–Walsh basis as for some . For any we have
[TABLE]
In other words, the event holds with probability at least .
The literature on Boolean functions is vast, and a modern overview is given in OD14. Especially for concentration results we may highlight (AW15, Theorem 1.4) (which in particular holds for Boolean functions), which we discuss further and partially generalize to dependent models in Section 2.4. Proposition 2 may be of interest due to the direct use of quantities from Fourier analysis. Finally, we should add that while many concentration results for Boolean functions like (AW15, Theorem 1.4) or also Proposition 2 are valid for functions whose Fourier–Walsh decomposition stops at some order , Theorem 1.1 or Theorem 1.2 work for functions with Fourier–Walsh decomposition possibly up to order .
2.3 Concentration properties of U-statistics
A further application of Theorems 1.1 and 1.2 concerns concentration properties of so-called U-statistics, which frequently arise in statistical theory. We refer to PG99 for an excellent monograph. More recently, concentration inequalities for U-statistics have been considered in Ad06, (AW15, Section 3.1.2) and (BGS18, Corollary 1.3).
Let and assume that are either independent random variables, or the vector satisfies a . Let be a measurable, symmetric function with for any , and define . We are interested in the concentration properties of the U-statistic with kernel , i. e. of
[TABLE]
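A U-statistic with a bounded symmetric kernel is straightforward to evaluate for small samples. The sketch below assumes the unnormalized form, i. e. the sum of the kernel over all ordered tuples of pairwise distinct indices; whether (14) carries an additional normalizing factor is an assumption left open here, and the kernel and function names are hypothetical.

```python
from itertools import permutations


def u_statistic_sum(kernel, sample, order):
    """Sum of kernel(X_{i_1}, ..., X_{i_d}) over all ordered tuples of distinct indices."""
    indices = range(len(sample))
    return sum(kernel(*(sample[i] for i in idx)) for idx in permutations(indices, order))


def falling_factorial(n, d):
    """Number of ordered d-tuples of distinct indices from {1, ..., n}."""
    result = 1
    for k in range(d):
        result *= n - k
    return result


if __name__ == "__main__":
    variance_kernel = lambda x, y: 0.5 * (x - y) ** 2  # symmetric kernel of order 2
    sample = [1.0, 2.0, 3.0, 4.0]
    total = u_statistic_sum(variance_kernel, sample, 2)
    # Dividing by n (n - 1) turns the sum into the unbiased sample variance.
    print(total, total / falling_factorial(len(sample), 2))
```

With this kernel the normalized statistic coincides with the unbiased sample variance, a classical example of a U-statistic of order two.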
Proposition 3
Let be as above and be as in (14). There exists a constant (the same as in Theorems 1.1 and 1.2) such that for any
[TABLE]
and for some
[TABLE]
The normalization in (15) is of the right order for U-statistics generated by a non-degenerate kernel , i. e. , see (PG99, Remarks 4.2.5). In the case of i.i.d. random variables it states that
[TABLE]
whenever . Actually, (15) shows that for we have sub-Gaussian tails for any finite for bounded kernels .
Proposition 3 improves upon our old result (BGS18, Corollary 1.3) by providing multilevel tail bounds, thus yielding much finer estimates than the exponential moment bound given in the earlier paper. Moreover, it does not only address independent random variables but also weakly dependent models. As compared to the results from Ad06 and (AW15, Section 3.1.2), Proposition 3 covers different types of measures, since in Ad06 independent random variables were considered, while in AW15 a Sobolev-type inequality was required, which does not include the various discrete models for which a –LSI holds.
2.4 Polynomials and subgraph counts in exponential random graph models
Lastly, let us once again consider polynomial functions. The case of independent random variables has been treated in (AW15, Theorem 1.4) under more general conditions, so we omit it and concentrate on weakly dependent random variables.
Let be a multilinear (also called tetrahedral) polynomial of degree , i. e. of the form
[TABLE]
for symmetric -tensors with vanishing diagonal. Here, a -tensor is called symmetric, if for any permutation , and the (generalized) diagonal is defined as . Denote by the -tensor of all partial derivatives of order of .
For the next result, given some , we recall a family of norms on the space of -tensors for each partition of . The family has been first introduced in La06 , where it was used to prove two-sided estimates for norms of Gaussian chaos, and the definitions given below agree with the ones from La06 as well as AW15 and AKPS18 . For brevity, write for the set of all partitions of . For each we denote by a vector in , and for a -tensor set
[TABLE]
We can regard the as a family of operator-type norms. In particular, it is easy to see that and .
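For orientation, the case of matrices can be written out in full. The set {1, 2} has exactly two partitions, and with the definition from La06 the two resulting norms are the Hilbert–Schmidt norm and the operator norm. The entries a_{ij} and the test vectors x, y are notation introduced for this illustration.

```latex
% Case d = 2: the two partitions of {1, 2} and the corresponding norms of A = (a_{ij}).
\begin{align*}
  \lVert A \rVert_{\{\{1,2\}\}}
    &= \sup\Bigl\{ \sum_{i,j} a_{ij} x_{ij} \;:\; \sum_{i,j} x_{ij}^{2} \le 1 \Bigr\}
     = \lVert A \rVert_{\mathrm{HS}}, \\
  \lVert A \rVert_{\{\{1\},\{2\}\}}
    &= \sup\Bigl\{ \sum_{i,j} a_{ij} x_{i} y_{j} \;:\;
         \sum_{i} x_{i}^{2} \le 1,\ \sum_{j} y_{j}^{2} \le 1 \Bigr\}
     = \lVert A \rVert_{\mathrm{op}} .
\end{align*}
```

For tensors of order three there are five partitions, and the corresponding norms interpolate between these two extremes.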
The following result has been proven in the context of Ising models (in the Dobrushin uniqueness regime) in AKPS18 , and can easily be extended to any vector satisfying a . By invoking the family of norms , it provides a refinement of our general result for the special case of multilinear polynomials.
Theorem 2.2
Let be a random vector supported in and satisfying a , and be as in (16). There exists a constant depending on only such that for all
[TABLE]
For illustration, let us once again consider the case of . In the notation of (16), we take and , i. e. for a symmetric matrix with vanishing diagonal. In this case, assuming the components of to be centered (so that the term vanishes), Theorem 2.2 reads
[TABLE]
i. e. we obtain a Hanson–Wright inequality in this situation. For higher orders, we arrive at similar bounds. Altogether, for the class of multilinear polynomials, Theorem 2.2 yields finer bounds than Theorem 1.2 (by virtue of the large class of norms involved), though for explicit calculations of the norms involved can be difficult.
To point out one possible application, Theorem 2.2 can be used in the context of the exponential random graph model (ERGM). Let us briefly recall the definitions. Given real numbers and simple graphs (with being a single edge by convention), the ERGM with parameter is a probability measure on the space of all graphs on vertices given by the weight function , where is the number of copies of in the graph and is the number of vertices of . For details, see CD13 or SS18. One can think of the ERGM as an extension of the famous Erdős–Rényi model (which corresponds to the choice ) to account for dependencies between the edges.
By way of example we show concentration properties of the number of triangles (where denotes the set of all three edges forming a triangle). To formulate our results, we need to recall the function which frequently appears in the discussion of the ERGM. Moreover, we set . In the following corollary, the condition ensures weak dependence in the sense that a –LSI holds. As outlined above, in comparison to earlier results like (SS18, Theorem 3.2), using Theorem 2.2 yields sharper tail estimates.
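The number of triangles is a multilinear polynomial of degree three in the edge indicators, which is what places it within the scope of Theorem 2.2. The following minimal sketch evaluates this polynomial for a given edge set; the vertex labelling 0, ..., n-1 and the function names are assumptions made for this illustration, and the sketch does not sample from an ERGM.

```python
from itertools import combinations


def triangle_count(n, edges):
    """Evaluate sum_{i<j<k} x_ij * x_jk * x_ik, where x_e is 1 if edge e is present and 0 otherwise."""
    edge_set = {frozenset(e) for e in edges}

    def x(i, j):
        return 1 if frozenset((i, j)) in edge_set else 0

    return sum(x(i, j) * x(j, k) * x(i, k) for i, j, k in combinations(range(n), 3))


if __name__ == "__main__":
    complete_graph = list(combinations(range(4), 2))  # K_4
    path = [(0, 1), (1, 2)]
    print(triangle_count(4, complete_graph), triangle_count(3, path))
```

Each monomial involves three distinct edge variables, so the polynomial is tetrahedral in the sense of (16).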
Corollary 2
Let be an exponential random graph model with parameter such that . There is a constant such that for all
[TABLE]
3 Concentration inequalities under logarithmic Sobolev inequalities: Proofs
In this section, we give the proofs of our main results. All of them work by first establishing a growth rate on the norms of which will then be iterated. For technical reasons, we need to introduce some auxiliary difference operators which are closely related to . For let
[TABLE]
[TABLE]
where shall denote the norm with respect to .
The norm inequalities which form the core of our proofs can be found in (BGS18, Theorem 2.3, Corollary 2.6) (building upon the earlier results in BBLM05). Note that as compared to BGS18, a different choice of normalization for leads to slightly different constants.
Theorem 3.1
If are independent random variables and , with the constant , we have for any ,
[TABLE]
Consequently, this leads to
[TABLE]
Furthermore, we need an auxiliary statement relating differences of consecutive order. In BGS18, we have proven that . Moreover, we explained that a similar estimate with the Hilbert–Schmidt norms replaced by operator norms cannot be true. As we will see next, the key step in order to be able to invoke operator norms nevertheless is to work with .
Here we need the following simple but crucial observation: if is a -tensor, the supremum in the definition of is attained, and if is a non-negative tensor (i. e. for all ), the maximizing vectors can be chosen to have all positive entries. Indeed, since , we can define by taking the absolute value element-wise.
Lemma 1
For any
[TABLE]
Proof
We have
[TABLE]
Here, in the first inequality we insert the vectors maximizing the supremum and use the monotonicity of , and the second and third inequality follow from the triangle inequality. Taking the square root yields the claim.
As a final step, we need to establish a connection between norm estimates and multilevel concentration inequalities. This is given by the following proposition, which was proven in (Ad06, Theorem 7) and (AW15, Theorem 3.3). We state it in the form given in (SS18, Proof of Theorem 3.6) with slight modifications.
Proposition 4
Assume that a random variable satisfies for any and some constants for some , and let . For any we have
[TABLE]
We will not give a proof of Proposition 4 and refer to the aforementioned works. However, the proof is almost identical to the proof of Proposition 2. The two important cases will be (for independent random variables) as well as (in the weakly dependent setting).
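The mechanism that turns moment bounds into multilevel tail bounds can be sketched in a few lines. The version below is a crude form of the argument under a simplified hypothesis; its constants are not those of Proposition 4, and the symbols Z, C_k and η are notation introduced for this sketch.

```latex
% Assumption for this sketch: \lVert Z \rVert_p \le \sum_{k=1}^{d} C_k\, p^{k/2} for all p \ge 2.
\begin{align*}
  \eta(t) &:= \min_{k=1,\ldots,d} \Bigl( \frac{t}{d e\, C_k} \Bigr)^{2/k}, \\
  \text{if } \eta(t) \ge 2 \text{ and } p = \eta(t): \quad
  e \lVert Z \rVert_p
    &\le e \sum_{k=1}^{d} C_k\, p^{k/2}
     \le e \cdot d \cdot \frac{t}{d e} = t, \\
  \mathbb{P}(\lvert Z \rvert \ge t)
    &\le \mathbb{P}\bigl( \lvert Z \rvert \ge e \lVert Z \rVert_p \bigr)
     \le e^{-p} = e^{-\eta(t)}
     \quad \text{(Markov's inequality applied to } \lvert Z \rvert^{p}\text{)}, \\
  \text{if } \eta(t) < 2: \quad
  \mathbb{P}(\lvert Z \rvert \ge t) &\le 1 \le e^{2} e^{-\eta(t)} .
\end{align*}
```

In both cases the tail is bounded by e^2 exp(-η(t)), so each term of the moment bound contributes one level t^{2/k} to the minimum in the exponent.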
The proof of Theorem 1.1 is now easily completed.
Proof (Proof of Theorem 1.1)
Since are independent, Theorem 3.1 yields
[TABLE]
where we have used that for any positive random variable
[TABLE]
The second term on the right hand side can now be estimated using Theorem 3.1 again, which in combination with Lemma 1 gives
[TABLE]
This can be easily iterated to obtain for any
[TABLE]
Now it remains to apply Proposition 4.
To prove Theorem 1.2, we shall require the following proposition, which is proven in (GSS18, Proposition 2.4). (Note that the definition of there differed by a factor of .) The estimate (20) does not appear therein, but is an easy modification of the proof.
Proposition 5
Let be a measure on a product of Polish spaces satisfying a . Then, for any and any we have
[TABLE]
and
[TABLE]
Proof (Proof of Theorem 1.2)
The proof is very similar to the proof of Theorem 1.1. In the first step, using (19) leads to
[TABLE]
Equation (20) can be used to estimate the second term on the right hand side. So, for any we have by an iteration
[TABLE]
Again we can apply Proposition 4 to obtain the concentration inequality.
To prove Theorem 1.3 we shall need the following lemma.
Lemma 2
Let be a Banach space and a family of uniformly norm-bounded, -valued, measurable functions and set . We have
[TABLE]
Proof
Fix an and choose for any a function such that . This yields
[TABLE]
where the first inequality follows by monotonicity of and the second one is a consequence of for . Thus we have
[TABLE]
Taking the limit yields the claim.
Proof (Proof of Theorem 1.3)
Note that in the real-valued case, the estimate holds. For brevity, let . Using this in combination with Proposition 5 and Lemma 2 yields
[TABLE]
We can apply Proposition 5 again on the right hand side, which gives
[TABLE]
A combination of Lemmas 1 and 2 shows that , and so by an iteration we obtain
[TABLE]
In the case of independent random variables we replace the first step using Theorem 3.1. Here, and .
Proof (Proof of Proposition 1)
The proof shares some similarities with the proof of Lemma 2. Since satisfies a , we have for any
[TABLE]
Moreover, for any and , if a maximizer of exists, we obtain
[TABLE]
If a maximizer does not exist, these estimates remain valid by an approximation argument as in the proof of Lemma 2. Consequently, we have The claim now follows from Proposition 4.
4 Suprema of chaos, U-statistics and polynomials: Proofs
Proof (Proof of Theorem 2.1)
Let us first consider the case that satisfies a . Recall that we have by (20)
[TABLE]
We shall make use of the pointwise inequality To see this, let be the tuple satisfying . We have
[TABLE]
proving the first part. Consequently,
[TABLE]
As in BBLM05 , this can now be iterated, i. e. we have for any . Here we may argue as above, where the only difference is to choose and which maximize . This finally leads to
[TABLE]
using that is constant. This proves (11). The same arguments are also valid without a property, if one considers and applies Theorem 3.1 instead.
Lastly, to prove (12), let us first consider why we cannot argue as before. Note that the argument heavily relies on the positive part of the difference operator , which allows us to choose the maximizers independent of . This is no longer possible for the concentration inequality. Here, Theorem 3.1 yields
[TABLE]
Thus this argument fails if we try to use these inequalities. However, we can rewrite , where the is to be understood with respect to the support of . As a consequence, we have for each fixed (again choosing by maximizing the first summand in the brackets)
[TABLE]
This implies
[TABLE]
The proof is now completed using the same arguments as in the first part, with replaced by . The same argument is valid for satisfying a .
Proof (Proof of Proposition 2)
The proposition can be proven using a similar technique as before, since the Hilbert–Schmidt norms of higher order differences act as Fourier projections. We choose to take an alternate route as follows. The proof of (OD14, Theorem 9.21) shows that for any with degree at most and any
[TABLE]
First off, by Chebyshev’s inequality we have for any
[TABLE]
We want to apply this to a -dependent parameter given by the function
[TABLE]
If , (21) yields , which combined with the trivial estimate gives
[TABLE]
as claimed.
Proof (Proof of Proposition 3)
We apply Theorems 1.1 and 1.2 in the respective cases. To this end, we make use of the general bound for . For any distinct write , so that
[TABLE]
Now it is easy to see that unless (for example, this follows if one writes the sum inside the norm as ), and in these cases one can upper bound the supremum by , from which we infer
[TABLE]
Consequently, this leads to
[TABLE]
Thus, an application of Theorem 1.1 or 1.2 respectively yields for any and for as given therein
[TABLE]
For the second part, choose for to obtain
[TABLE]
A short calculation shows that the minimum is attained for in the range and for otherwise, i. e.
[TABLE]
Proof (Proof of Theorem 2.2)
We give a sketch of the proof only and refer to (AKPS18, Proof of Theorem 2.2) for details. Recall that by (19) we have the inequality
[TABLE]
Using the arguments and notations from (AKPS18, Proof of Theorem 2.2) leads to
[TABLE]
where is an absolute constant and is a sequence of independent standard Gaussian random variables, independent of . Furthermore, a result by Latała La06 yields
[TABLE]
The rest now follows as in the previous proofs.
Proof (Proof of Corollary 2)
In SS18 the authors have proven that implies a for with a constant depending on the parameter only. Thus, it remains to bound the norms in (17). Note that due to the structure of the exponential random graph model, the expectations of and are equal whenever and are isomorphic. Thus, we define (where is a -star) and .
The Euclidean norms can be easily bounded:
[TABLE]
and it remains to estimate the three remaining norms. However, in (AW15, Section 5.1), the authors give estimates for such norms in the Erdős–Rényi case, and it is easy to adapt these to any model with the property that depends only on the isomorphism class of (in the complete graph). In particular, due to the structure of the exponential random graph models, this is true in this setting as well. This gives
[TABLE]
Inserting these estimates into (17) finishes the proof.
5 Logarithmic Sobolev inequalities and difference operators
To conclude this paper, we discuss the LSI property (2) for different choices of difference operators . Here, we always assume that the probability measure is defined on a product of Polish spaces with product Borel -algebra .
In this situation, we can make use of the disintegration theorem on Polish spaces (see (DM78, Chapter III) and (AGS08, Theorem 5.3.1)): If is a measure on , then for each we can decompose using the marginal measure (as a measure on ) and a conditional measure on , which we denote by . More precisely, for any we have $\mu(A)=\int_{\otimes_{j\neq i}\mathcal{X}_{j}}\int_{\mathcal{X}_{i}}\mathbbm{1}_{A}(x_{i^{c}},x_{i})\,d\mu(x_{i}\mid x_{i^{c}})\,d\mu_{i^{c}}(x_{i^{c}})$.
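On a finite product space the disintegration reduces to elementary conditional probabilities, and the identity can be verified directly. The following is a minimal sketch, assuming a joint law stored as a dictionary from tuples to probabilities; the representation and the function names `disintegrate` and `measure_via_disintegration` are hypothetical and serve only as an illustration.

```python
def disintegrate(mu, i):
    """Split a finite joint law into the marginal of all coordinates except i
    and the conditional law of coordinate i given the remaining ones.

    mu maps tuples x to probabilities. The marginal is keyed by the tuple
    with coordinate i removed, the conditional by (rest, x_i).
    """
    marginal = {}
    for x, prob in mu.items():
        rest = x[:i] + x[i + 1:]
        marginal[rest] = marginal.get(rest, 0.0) + prob
    conditional = {}
    for x, prob in mu.items():
        rest = x[:i] + x[i + 1:]
        if marginal[rest] > 0:
            conditional[(rest, x[i])] = prob / marginal[rest]
    return marginal, conditional


def measure_via_disintegration(mu, i, event):
    """Recompute mu(event) by integrating the indicator of the event first
    against the conditional law and then against the marginal law."""
    marginal, conditional = disintegrate(mu, i)
    total = 0.0
    for (rest, xi), cond_prob in conditional.items():
        # Reinsert coordinate i to recover the full point.
        x = rest[:i] + (xi,) + rest[i:]
        if x in event:
            total += cond_prob * marginal[rest]
    return total


# Example: a dependent law on {0, 1}^2 and the event {second coordinate = 1}.
EXAMPLE_MU = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.2, (1, 1): 0.3}
EXAMPLE_EVENT = {(0, 1), (1, 1)}
```

For the example law, `measure_via_disintegration(EXAMPLE_MU, i, EXAMPLE_EVENT)` agrees with the direct sum over the event for either choice of the coordinate `i`, up to floating-point error.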
For finite spaces, is just the ordinary conditional measure as used in the definition of the difference operator . Note that the definition of can in principle be rewritten for products of arbitrary Polish spaces. However, our first result shows that the -LSI property in fact requires the underlying space to be finite. More precisely, we say that has finite support if there is no sequence of sets with for any and .
Proposition 6
Let be a product of Polish spaces, and let be a probability measure on . If satisfies a -LSI, then has finite support. Moreover, if is a product probability measure, then satisfies a -LSI iff has finite support.
Proof
First assume does not have finite support, i. e. there is a sequence with . Choosing $f_{n}\coloneqq\mathbbm{1}_{A_{n}}\in L^{\infty}(\mu)$ and assuming a -LSI holds, we obtain
[TABLE]
This easily leads to a contradiction.
On the other hand, let be a product probability measure with finite support. By tensorization, it suffices to consider , and we may moreover assume to have finitely many elements only. Then, by (BT06, Remark 6.6), satisfies a -LSI with , which finishes the proof.
In fact, Proposition 6 can be adapted to the difference operator as well. To see this, note that (23) can easily be rewritten for the difference operator (with only minor changes) and . In particular, the - and -LSI properties are not essentially different.
The situation drastically changes if we consider -LSIs instead. Here, a sufficient condition for the property to hold is that the measure satisfies an approximate tensorization (AT) property. As a consequence, for product probability measures, satisfying an -LSI is in fact a universal property.
Theorem 5.1
Let be a product of Polish spaces, and let be a probability measure on . If satisfies an approximate tensorization property
[TABLE]
then also satisfies an . In particular, any product probability measure satisfies an .
To the best of our knowledge, Theorem 5.1 is new. For product measures, it might be compared to the Efron–Stein inequality (see e. g. ES81 ; St86 ) which establishes the tensorization property for the variance, and can be regarded as a universal Poincaré inequality with respect to (see e. g. BGS18 for such an interpretation). However, note that Theorem 5.1 (i. e. more precisely the for product measures) does not imply the Efron–Stein inequality, as the difference operator is instead of . Unfortunately, as Proposition 6 demonstrates, there is no “entropy version” of the Efron–Stein inequality of the form (for any product probability measure and some universal constant ).
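For orientation, the Efron–Stein inequality in its conditional-variance form states that the variance of a function of independent random variables is at most the sum of its expected coordinate-wise conditional variances. The sketch below verifies this by exact enumeration, assuming independent Bernoulli(p) coordinates on the discrete cube; this setting, the choice of test functions and the helper names are hypothetical and not taken from the text.

```python
from itertools import product


def product_bernoulli(n, p):
    """Law of n independent Bernoulli(p) variables as a dict: point -> probability."""
    law = {}
    for x in product((0, 1), repeat=n):
        weight = 1.0
        for xi in x:
            weight *= p if xi == 1 else 1.0 - p
        law[x] = weight
    return law


def variance(f, mu):
    """Variance of f under the finite law mu."""
    mean = sum(f(x) * w for x, w in mu.items())
    return sum((f(x) - mean) ** 2 * w for x, w in mu.items())


def efron_stein_bound(f, n, p):
    """Sum over i of E[Var_i f], where Var_i is the variance in coordinate i
    alone, with all other coordinates held fixed."""
    mu = product_bernoulli(n, p)
    total = 0.0
    for i in range(n):
        for x, w in mu.items():
            f0 = f(x[:i] + (0,) + x[i + 1:])
            f1 = f(x[:i] + (1,) + x[i + 1:])
            # Variance of a two-point law with masses 1 - p and p.
            total += w * p * (1.0 - p) * (f1 - f0) ** 2
    return total
```

For a sum of the coordinates both sides coincide, whereas for a nonlinear function such as `lambda x: max(x) * (1 + sum(x)) ** 2` the right-hand side is strictly larger.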
As by Theorem 5.1, any set of independent random variables satisfies an -LSI, it might be tempting to regard Theorem 1.1 as an -LSI analogue of Theorem 1.2. However, it seems that it is not possible to use the entropy method based on -LSIs, so that this interpretation is not fully accurate. More precisely, Theorem 5.1 cannot be used to estimate the growth of norms as in the setting of a . Indeed, it is impossible to prove the required moment inequalities
[TABLE]
under an . For example, the measure satisfies with (for ), so that (25) would imply for an upper bound on the Orlicz norm associated to
[TABLE]
However, a simple calculation shows that $\mathbb{E}\exp\big(\frac{(f-\mathbb{E}f)^{2}}{16e^{2}\sigma_{p}^{2}}\big)\to\infty$ as .
The approximate tensorization property in Theorem 5.1 is interesting in its own right, but it is not yet well-studied. For finite spaces Ma15 gives sufficient conditions for a measure to satisfy an approximate tensorization property. Similar results have been derived in CMT15 , which can be applied in discrete and continuous settings. For example, if one considers a measure of the form
[TABLE]
for some countable spaces , , measures on and bounded functions , under certain technical conditions satisfies an approximate tensorization property. This does not require any functional inequality for . Very recently, in (AKPS18, Proposition 5.4) it has been shown that the property implies dimension-free concentration inequalities for convex functions.
Note that the property requires a certain weak dependence assumption in general. For example, the push-forward of a random permutation of to cannot satisfy an approximate tensorization property. It is an interesting question to find necessary and sufficient conditions for the approximate tensorization property to hold.
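One way to see the obstruction for permutations, sketched here under our own assumptions, is that each value of a permutation is determined by the remaining values. All conditional laws are then point masses, so every coordinate-wise conditional entropy vanishes while the total entropy of a non-constant function is positive. The code below checks this by enumeration for the uniform law on permutations of {0, ..., n-1}, encoded as the tuple of their values; the uniform law, the encoding and the function names are hypothetical choices for illustration.

```python
import math
from itertools import permutations


def entropy(values, weights):
    """Ent(g) = E[g log g] - E[g] log E[g] for positive values g."""
    mean = sum(v * w for v, w in zip(values, weights))
    return sum(v * math.log(v) * w for v, w in zip(values, weights)) - mean * math.log(mean)


def permutation_entropies(f, n):
    """Return (Ent(f), sum_i E[Ent_i(f)]) under the uniform law on
    permutations of range(n), where Ent_i is the entropy with respect to
    the conditional law of coordinate i given all other coordinates."""
    points = list(permutations(range(n)))
    weight = 1.0 / len(points)
    total = entropy([f(x) for x in points], [weight] * len(points))
    conditional_sum = 0.0
    for i in range(n):
        for x in points:
            rest = x[:i] + x[i + 1:]
            # All points that agree with x outside coordinate i.
            fibre = [y for y in points if y[:i] + y[i + 1:] == rest]
            cond_weights = [1.0 / len(fibre)] * len(fibre)
            conditional_sum += weight * entropy([f(y) for y in fibre], cond_weights)
    return total, conditional_sum
```

For instance, `permutation_entropies(lambda x: 1 + x[0], 3)` returns a strictly positive first entry and a second entry equal to zero, so no finite constant can bound the former by a multiple of the latter.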
Proof (Proof of Theorem 5.1)
Let be a -valued random vector with law . First we consider the case . By homogeneity of both sides, we may assume . Since is bounded, we have -a.s., where is the essential supremum of and the essential infimum. Due to the constraints on the integral this leads to . (Actually the cases or are trivial, since then -a.s., but we will not make this distinction.) Let . In particular
[TABLE]
Using the partial integration formula (see e. g. (HS75, Theorem 21.67 and Remark 21.68)) in connection with (Bu07, Theorem 7.7.1) yields
[TABLE]
The first integral can be calculated explicitly
[TABLE]
and moreover we have due to on
[TABLE]
Plugging in these two estimates yields
[TABLE]
Next, if we show that
[TABLE]
we can further estimate (as is a deterministic quantity in the case )
[TABLE]
To prove (26), define
[TABLE]
Now it is easy to see that since for and . Moreover
[TABLE]
so that is decreasing on every strip , and thus for all . This finishes the proof for .
For arbitrary , the proof is now easily completed. Assume that , i. e. -a.s. we have . For these , by the case we therefore obtain
[TABLE]
Plugging this into the assumption leads to
[TABLE]
As for the second part, it is a classical fact that independent random variables satisfy the tensorization property (i. e. ), see for example (Led01, Proposition 5.6), (BBLM05, Theorem 4.10) or (vH16, Theorem 3.14). In the case of independent random variables, the assumption that is a product of Polish spaces can be dropped by simply defining .
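The tensorization property of the entropy for independent random variables can be checked by exact enumeration. The following minimal sketch assumes independent Bernoulli coordinates with possibly different success probabilities and a strictly positive function on the discrete cube; these choices and the helper names are hypothetical and only meant as an illustration of the classical inequality.

```python
import math
from itertools import product


def entropy(values, weights):
    """Ent(g) = E[g log g] - E[g] log E[g] for positive values g."""
    mean = sum(v * w for v, w in zip(values, weights))
    return sum(v * math.log(v) * w for v, w in zip(values, weights)) - mean * math.log(mean)


def tensorization_sides(f, probs):
    """Return (Ent(f), sum_i E[Ent_i(f)]) for independent Bernoulli
    coordinates with P(X_i = 1) = probs[i]."""
    n = len(probs)
    points = list(product((0, 1), repeat=n))
    weights = []
    for x in points:
        w = 1.0
        for xi, p in zip(x, probs):
            w *= p if xi == 1 else 1.0 - p
        weights.append(w)
    total = entropy([f(x) for x in points], weights)
    rhs = 0.0
    for i, p in enumerate(probs):
        for x, w in zip(points, weights):
            f0 = f(x[:i] + (0,) + x[i + 1:])
            f1 = f(x[:i] + (1,) + x[i + 1:])
            # Entropy in coordinate i alone, other coordinates held fixed.
            rhs += w * entropy([f0, f1], [1.0 - p, p])
    return total, rhs
```

For a function that factorizes over the coordinates, such as `lambda x: (1 + x[0]) * (2 + x[1])`, both sides agree up to rounding, while for a generic positive function the first entry is at most the second.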
References
- (1) Adamczak, R.: Moment inequalities for U-statistics. Ann. Probab. 34(6), 2288–2314 (2006). DOI 10.1214/009117906000000476
- (2) Adamczak, R.: A note on the Hanson-Wright inequality for random vectors with dependencies. Electron. Commun. Probab. 20, no. 72, 13 (2015). DOI 10.1214/ECP.v20-3829
- (3) Adamczak, R., Kotowski, M., Polaczyk, B., Strzelecki, M.: A note on concentration for polynomials in the Ising model. arXiv preprint (2018)
- (4) Adamczak, R., Latała, R., Meller, R.: Hanson–Wright inequality in Banach spaces. arXiv preprint (2018)
- (5) Adamczak, R., Wolff, P.: Concentration inequalities for non-Lipschitz functions with bounded derivatives of higher order. Probab. Theory Related Fields 162(3-4), 531–586 (2015). DOI 10.1007/s00440-014-0579-3
- (6) Aida, S., Stroock, D.W.: Moment estimates derived from Poincaré and logarithmic Sobolev inequalities. Math. Res. Lett. 1(1), 75–86 (1994). DOI 10.4310/MRL.1994.v1.n1.a9
- (7) Ambrosio, L., Gigli, N., Savaré, G.: Gradient flows in metric spaces and in the space of probability measures, second edn. Lectures in Mathematics ETH Zürich. Birkhäuser Verlag, Basel (2008)
- (8) Arcones, M.A., Giné, E.: On decoupling, series expansions, and tail behavior of chaos processes. J. Theoret. Probab. 6(1), 101–122 (1993). DOI 10.1007/BF01046771
