On a contraction property of Bernoulli canonical processes
Witold Bednorz, Rafał Martynek

TL;DR
This paper advances Bernoulli comparison by relaxing the contraction condition on functions, enabling comparison of Rademacher sums through a new inequality involving Gaussian increments and subset sums.
Contribution
It introduces a generalized comparison inequality for Bernoulli processes that relaxes the contraction assumption to a condition based on Gaussian increments and subset sums.
Findings
Improved Bernoulli comparison under relaxed conditions
Established a new inequality involving subset sums and Gaussian increments
Applicable to independent Rademacher variables and functions with certain properties
Abstract
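In this paper we improve Bernoulli comparison. The result works for independent Rademacher random variables $(\varepsilon_i)_{i\geq 1}$ and states that we can compare $\mathbf{E}\sup_{t\in T}\sum_{i=1}^{\infty}\varphi_i(t)\varepsilon_i$ with $\mathbf{E}\sup_{t\in T}\sum_{i=1}^{\infty}t_i\varepsilon_i$, where a function $\varphi=(\varphi_i)_{i\geq 1}\colon \ell^2\supset T\rightarrow \ell^2$ satisfies certain conditions. Originally, it is assumed that each of the $\varphi_i$ is a contraction. We relax this assumption towards comparison of Gaussian parts of increments, which can be described in the following way. For all $p\geq 0$, $s,t\in T$,
$$\inf_{|I^c|\leq Cp}\sum_{i\in I}|\varphi_i(t)-\varphi_i(s)|^2\leq C^2\inf_{|I^c|\leq p}\sum_{i\in I}|t_i-s_i|^2,$$
where $C\geq 1$ is an absolute constant and $I\subset\mathbb{N}$, $I^c=\mathbb{N}\setminus I$.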
Witold Bednorz & Rafał Martynek
Subject classification: 60G15, 60G17.
Keywords and phrases: VC classes, inequality.
Research partially supported by MNiSW Grant N N201 608740.
Institute of Mathematics, University of Warsaw, Banacha 2, 02-097 Warszawa, Poland.
Abstract
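In this paper we give several results concerning the supremum of canonical processes. The main theorem concerns a contraction property of Bernoulli canonical processes which generalizes the one proved by Talagrand (Theorem 2.1 in [16]). The result works for independent Rademacher random variables $(\varepsilon_i)_{i\geq 1}$ and states that we can compare $\mathbf{E}\sup_{t\in T}\sum_{i=1}^{\infty}\varphi_i(t)\varepsilon_i$ with $\mathbf{E}\sup_{t\in T}\sum_{i=1}^{\infty}t_i\varepsilon_i$, where a function $\varphi=(\varphi_i)_{i\geq 1}\colon \ell^2\supset T\rightarrow \ell^2$ satisfies certain conditions. Originally, it is assumed that each of the $\varphi_i$ is a contraction. We relax this assumption towards comparison of Gaussian parts of increments, which can be described in the following way. For all $p\geq 0$, $s,t\in T$,
$$\inf_{|I^c|\leq Cp}\sum_{i\in I}|\varphi_i(t)-\varphi_i(s)|^2\leq C^2\inf_{|I^c|\leq p}\sum_{i\in I}|t_i-s_i|^2,$$
where $C\geq 1$ is an absolute constant and $I\subset\mathbb{N}$, $I^c=\mathbb{N}\setminus I$.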
1 Introduction and notation
Throughout this paper we will use the following notation. For the set the number of elements in will be denoted as . If , is a sequence of real numbers and then and is the space of all sequences with . If then . For a random variable and we put . If , is a sequence of independent, identically distributed random variables such that , and then the random variable
[TABLE]
is well-defined. For each with the process is called canonical. The convergence of the above series holds in the sense of which means that
[TABLE]
Clearly,
[TABLE]
Remark 1
The almost sure convergence in (1) may also be guaranteed when the independence assumption on the underlying random variables is dropped. In such a case we may consider a finite-dimensional version of (1), where . The most studied example is when these variables have log-concave tails, i.e. for convex , and may be dependent.
We want to distinguish two types of canonical processes which will be of special interest. If and then the process is called canonical Bernoulli and it is denoted by . This class of processes is important for various applications, e.g. infinitely divisible processes [16] and empirical processes (see [17] for a comprehensive study). If and are distributed by the normal law then the process is called canonical Gaussian and it is denoted by . In fact, canonical Gaussian processes can be seen as a motivation to study canonical processes in general. The reason is the Karhunen-Loève representation of a separable Gaussian process by a canonical Gaussian process (see e.g. [10], Corollary 5.3.4).
The main object studied will be the suprema of canonical processes. For any set and a stochastic process we define
[TABLE]
where the supremum is taken over all finite subsets of . Usually, by considering the separable modification of it is possible to guarantee that is a well-defined random variable (for the definition of a separable version of the process and a discussion of the measurability of the supremum in the general setting of a Banach space which is not necessarily separable, see Ch. 2 in [9]). In this case coincides with the usual expectation of the supremum over , namely
[TABLE]
Let us finish this section with a few important technicalities which will be helpful in dealing with canonical processes. We have that , where so we may always require that . Moreover, and , where is the convex hull of the set and is the closure of in .
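The two technicalities above can be checked numerically on a small example. The sketch below assumes that the stripped identities say that the expected supremum of a canonical Bernoulli process is unchanged when the index set is translated by a fixed point and when convex combinations of its points are added. The helper name `bernoulli_sup` and the sample set `T` are hypothetical and serve only as an illustration.

```python
from itertools import product


def bernoulli_sup(T):
    """Exact E sup_{t in T} sum_i t_i * eps_i for a finite T in R^n (n small),
    computed by averaging over all 2^n sign patterns."""
    n = len(T[0])
    total = 0.0
    for eps in product((-1.0, 1.0), repeat=n):
        total += max(sum(ti * ei for ti, ei in zip(t, eps)) for t in T)
    return total / 2 ** n


# Hypothetical finite index set in R^3.
T = [(1.0, 0.5, -0.25), (0.0, 2.0, 1.0), (-1.5, 0.5, 0.5)]
base = bernoulli_sup(T)

# Translation: subtracting a fixed point changes each sum by a mean-zero term.
t0 = T[0]
shifted = [tuple(a - b for a, b in zip(t, t0)) for t in T]

# Convex hull: a linear functional attains its supremum over conv(T) on T,
# so adding convex combinations of points of T cannot raise the supremum.
mid = tuple(0.5 * (a + b) for a, b in zip(T[0], T[1]))
bary = tuple(sum(c) / len(T) for c in zip(*T))
hull_sample = T + [mid, bary]

print(base, bernoulli_sup(shifted), bernoulli_sup(hull_sample))
```

The enumeration is exponential in the dimension, so this is only meant for toy examples with a handful of coordinates.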
We follow the convention that numerical constants denoted by the same letter might vary from line to line. The same constants will be subindexed e.g. etc.
2 Suprema of canonical processes via chaining
First, we recall the basics of the chaining approach to upper bounds for stochastic processes. We say that the sequence of partitions of is an admissible partition of T if and for (usually it is required also that these partitions are nested i.e. for any set , there is a set such that ). For we denote by the unique element of partition which contains . Let be a sequence of points such that its -th element is defined so that for all . We will denote it by . Let
[TABLE]
where the infimum is taken over all admissible sequences of partitions. We denote by the family of all , . In words, at each step of partitioning we choose some point which belongs to the same partition set as t. Clearly, and . By the chain we mean writing as the sum of consecutive approximations i.e.
[TABLE]
Let us observe the following property of .
Lemma 1
Let and be some index sets. Suppose that for some stochastic process , and are well-defined and for
[TABLE]
Then,
[TABLE]
In particular, for canonical Bernoulli and canonical Gaussian processes the above inequality holds with .
Proof. Let and . Let and be admissible partitions of and respectively. Define as all possible sums of these partitions i.e. and . It is obviously admissible since . Moreover, for and let . We also put . Clearly, for and
[TABLE]
So, by (2) we obtain
[TABLE]
The conclusion follows since , were arbitrary as well as the partitions and . That the inequality (2) holds with constant for canonical Gaussian processes is a straightforward consequence of the fact that for we have , where . The result for canonical Bernoulli processes follows from the general Kahane inequality (see e.g. [3], Theorem 13.2.1).
In [7], [11] it was proved that under suitable regularity assumptions , where is a universal constant. Let us give a short argument for a similar upper bound.
Theorem 1
For a stochastic process for which is well-defined we have that
[TABLE]
.
Proof. Let be any admissible partition of . For any set and we denote by its -parent i.e. and . Consequently, if then . The proof is based on the analysis of the partition sequence. Let be fixed and consider . The chaining argument gives
[TABLE]
We show that
[TABLE]
Indeed, denoting we have and hence
[TABLE]
Therefore,
[TABLE]
This ends the proof.
The question about lower bounds for the suprema of canonical processes is much more involved. Let us summarize the processes for which the full characterization of the supremum (i.e. lower and upper bound) can be provided with the use of . The seminal result of Fernique and Talagrand known as the Majorizing Measure Theorem (see [2], [14] or [17] for the modern formulation) is equivalent to the statement that is comparable with up to a numerical constant. In [15] it was proved that is comparable with a quantity which, in a sense, is equivalent to for canonical processes generated by random variables which are symmetric and satisfy for a fixed . A similar result holds for , yet it is only possible to show that there exists a set (which may significantly differ from ) such that is comparable with up to a numerical constant. Note that the limiting case, when , is the question about canonical Bernoulli processes. Later, the idea of [15] was slightly generalized by R. Latała in [6] for canonical processes generated by with log-concave tails, yet under specific regularity assumptions. Finally, in [8] it was proved that it suffices to assume only certain conditions on the moment growth of . Unfortunately, this result still does not apply to Bernoulli processes. The question of characterization of was a long-standing problem posed by M. Talagrand and known as the Bernoulli conjecture. It was finally proved in [1]. In order to explain this result we need to provide a family of distances relevant to canonical Bernoulli processes which follow from some properties of Bernoulli-type random variables. By known results (see [4], [12], and [5] for the formulation below), for any ,
[TABLE]
where is the rearrangement of such that . Now, if we denote by some index set, we can think of (4) as a decomposition of the norm into the part
[TABLE]
and the Gaussian part
[TABLE]
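The decomposition above can be illustrated numerically. The sketch below assumes that (4) is the classical moment estimate for Rademacher sums from [4] and [5], in which the $p$-th moment of $\sum_i t_i\varepsilon_i$ is comparable, up to universal constants, to $\sum_{i\leq p}t_i^* + \sqrt{p}\,(\sum_{i>p}(t_i^*)^2)^{1/2}$, where $(t_i^*)$ is the nonincreasing rearrangement of $(|t_i|)$. The function names and the test vector are hypothetical.

```python
from itertools import product
from math import sqrt


def rademacher_moment(t, p):
    """Exact ||sum_i t_i eps_i||_p, computed by enumerating all sign patterns."""
    n = len(t)
    total = 0.0
    for eps in product((-1.0, 1.0), repeat=n):
        total += abs(sum(ti * ei for ti, ei in zip(t, eps))) ** p
    return (total / 2 ** n) ** (1.0 / p)


def two_part_functional(t, p):
    """Sum of the p largest |t_i| plus sqrt(p) times the l2 norm of the rest
    (the assumed right-hand side of (4), without constants)."""
    s = sorted((abs(x) for x in t), reverse=True)
    head, tail = s[:p], s[p:]
    return sum(head) + sqrt(p) * sqrt(sum(x * x for x in tail))


# Hypothetical coefficient vector with a few large and several small entries.
t = [3.0, -2.0, 1.5, 1.0, 1.0, -0.5, 0.5, 0.25, 0.25, 0.1]
ratios = {p: rademacher_moment(t, p) / two_part_functional(t, p) for p in (2, 4, 6)}
for p, r in ratios.items():
    print(p, r)
```

For integer $p\geq 2$ the ratio cannot exceed 1 (triangle inequality on the large coordinates and Khintchine's inequality with constant $\sqrt{p-1}$ on the rest) and cannot fall below $1/\sqrt{2p}$ (monotonicity of moments and the Cauchy-Schwarz inequality). The substance of (4) is that the lower bound can in fact be taken independent of $p$.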
In fact a characterization similar to (4) can be formulated for a broad class of processes, including processes with log-concave distributions. In particular, in [7] there is a characterization of for canonical processes based on one-unconditional log-concave random variables. As we have mentioned, the characterization of was known as the Bernoulli conjecture and was finally proved in [1]. It states that similarly to (4) the understanding of can be decomposed into the Gaussian and part. More precisely, there must exist a decomposition of into such that and moreover dominates up to a universal constant both and . Usually such a decomposition is formulated in the language of existence of a mapping which defines and . Recall that we can always assume that and . We now turn to prove that the Bernoulli Theorem [1] implies that there must exist a subset such that is comparable to . The idea of the proof works also for other classes of canonical processes for which we can characterize in terms of increments, see Remark 2 below.
Theorem 2
There exists a function such that
[TABLE]
where is a universal constant, and .
Proof. First, we have to notice that it suffices to prove the result for countable sets . Indeed for any dense countable set it is true that . Suppose we have a decomposition of into and so that (5) holds. It is easy to observe that and moreover, and must be bounded since otherwise or is infinite and hence also . Therefore, is compact and contains . Consequently, with no loss in generality we can assume that is countable. Then, by the main result of [1] we get the existence of and consequently the existence of the decomposition into countable sets such that and
[TABLE]
where is a universal constant. By the theorems of Pisier [13] and Talagrand [17] we have that is comparable with . Let be a standard normal variable independent of , . Observe that for any
[TABLE]
and hence . On the other hand, we can choose an admissible sequence such that . Fix any given point in . Define
[TABLE]
If
[TABLE]
Therefore, by the triangle inequality
[TABLE]
In this way we have proved that
[TABLE]
On the other hand, we have a trivial upper bound
[TABLE]
by Theorem 1.
Let us also observe that for , we could give a similar proof. It is based on the fact that for any there is Talagrand's characterization [17] of .
Remark 2
For the class of canonical processes based on independent symmetric such that , , is comparable with up to a constant for some that contains . The role of may be again addressed to the Gaussian reason, whereas for .
In general, we conjecture that the same is true for canonical processes based on log-concave random variables.
Conjecture 1
If , is a sequence of independent log-concave random variables with mean 0 and variance 1 then there exists and sets , such that
[TABLE]
where is a universal constant.
3 Contractions of canonical Bernoulli processes
Suppose we have a map . The main question we treat in this paper is under what assumptions on , and we can show that is bounded by up to a numerical constant. In particular we are interested in the case of canonical Bernoulli processes. Let us start with classical results concerning comparison of Gaussian processes. It is well-known that if and , are centered Gaussian processes and , then for each finite subset
[TABLE]
This comparison is a consequence of Slepian's Lemma (Corollary 3.14 in [9] provides the proof with constant 2, the proof with the best possible constant is Corollary 2.1.3 in [2]). Note also that by the Majorizing Measure Theorem the result can be generalized to the case when we compare a centered Gaussian process with a centered process for which we only require the sub-gaussianity property, see Theorem 12.16 in [9]. We start with a discussion of possible extensions of this result. It is natural to ask for other cases when similar comparison results hold. From Theorem 1 it can be easily deduced that if we can compare moments then we can compare -type upper bounds.
Corollary 1
Suppose that is a canonical process and suppose that for each , and constant it satisfies
[TABLE]
then .
Proof. Clearly, by Theorem 1 we have .
This means that if we could show that , then by Corollary 1 we would be able to prove that . Unfortunately, in general, there is no proof that is comparable with . On the other hand, as it was discussed before there are cases where the idea works. In particular, we could use Corollary 1 in order to recover the Gaussian comparison result with some absolute constant. However, in the Gaussian setting, one can simply refer to (7) rewriting it in the following way
[TABLE]
We now move to the case of canonical Bernoulli processes. The only known comparison result is Theorem 2.1 in [16] and Theorem 4.12 in [9]. It states that if , where are contractions then dominates with the constant , namely
[TABLE]
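The classical contraction comparison just stated can be verified exactly on a toy example. The sketch below assumes that (10) compares the expected supremum of $\sum_i\varphi_i(t_i)\varepsilon_i$ with that of $\sum_i t_i\varepsilon_i$, with each $\varphi_i$ a 1-Lipschitz function vanishing at 0 and acting on the $i$-th coordinate only. The index set, the chosen contractions and the helper names are hypothetical.

```python
from itertools import product
from math import tanh


def bernoulli_sup(T):
    """Exact E sup_{t in T} sum_i t_i * eps_i over all 2^n sign patterns."""
    n = len(T[0])
    total = 0.0
    for eps in product((-1.0, 1.0), repeat=n):
        total += max(sum(ti * ei for ti, ei in zip(t, eps)) for t in T)
    return total / 2 ** n


def apply_coordinatewise(phis, T):
    """Image of T under t -> (phi_1(t_1), ..., phi_n(t_n))."""
    return [tuple(phi(x) for phi, x in zip(phis, t)) for t in T]


def clip(x):
    """Truncation to [-1, 1]; 1-Lipschitz with clip(0) = 0."""
    return max(-1.0, min(1.0, x))


def halve(x):
    """Scaling by 1/2; 1-Lipschitz with halve(0) = 0."""
    return 0.5 * x


# Hypothetical finite index set in R^4 and one contraction per coordinate.
T = [(1.0, -2.0, 0.5, 0.0), (0.5, 1.5, -1.0, 2.0),
     (-1.0, 0.0, 0.25, -0.75), (2.0, 1.0, 1.0, 1.0)]
phis = [abs, tanh, clip, halve]

b_T = bernoulli_sup(T)
b_phi_T = bernoulli_sup(apply_coordinatewise(phis, T))
print(b_phi_T, b_T)
```

The computed value for the contracted set should not exceed the one for the original set, in line with the constant 1 in the comparison.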
Note that if we are interested in the comparison up to a numerical constant (not necessarily equal ) then the requirement of coordinate contractions is too demanding. However, it is known that the result analogous to (7), where we assume that is a Lipschitz contraction does not hold for Bernoulli processes. Therefore some additional assumptions on or are required. As we show in this paper, the comparison for canonical Bernoulli processes should depend on a suitable family of distances already presented in (4). The straightforward consequence of Theorem 2 is the following comparison result.
Corollary 2
Suppose that can be extended to in such a way that for any
[TABLE]
then , where is a universal constant.
Proof. Clearly, by Theorem 1 we have . Hence, by Theorem 2
[TABLE]
Note that the trouble with application of the above result is that may be much larger than . We conjecture the following generalization of the above result.
Conjecture 2
Let . If
[TABLE]
then , for an absolute constant .
Towards this aim we prove a weaker form of the conjecture. As we have explained the norm can be decomposed into the Gaussian and part. Our condition states that if Gaussian part of dominates Gaussian part of , for all and then dominates up to an absolute constant.
Theorem 3
Suppose that for all and all natural such that we have
[TABLE]
for an absolute constant . Then , where is a universal constant.
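To make the hypothesis concrete, the sketch below assumes that condition (12) has the form given in the abstract of the paper: for all natural $p$ and all $s,t\in T$, the infimum over $|I^c|\leq Cp$ of $\sum_{i\in I}|\varphi_i(t)-\varphi_i(s)|^2$ is at most $C^2$ times the infimum over $|I^c|\leq p$ of $\sum_{i\in I}|t_i-s_i|^2$. Such an infimum is attained by discarding the largest squared coordinates, which makes the condition easy to test for a pair of points. All function names and sample vectors are hypothetical, and $C$ is taken to be an integer for simplicity.

```python
from math import tanh


def tail_energy(x, m):
    """inf over I with |I^c| <= m of sum_{i in I} x_i^2:
    drop the m largest squares and sum the rest."""
    squares = sorted((v * v for v in x), reverse=True)
    return sum(squares[m:])


def gaussian_part_condition(s, t, phi, C, p_max):
    """Check the assumed form of (12) for the pair (s, t) and p = 0, ..., p_max."""
    d = [a - b for a, b in zip(t, s)]
    d_phi = [a - b for a, b in zip(phi(t), phi(s))]
    return all(
        tail_energy(d_phi, int(C * p)) <= C * C * tail_energy(d, p) + 1e-12
        for p in range(p_max + 1)
    )


def contraction(x):
    """Coordinate-wise contraction."""
    return [tanh(v) for v in x]


def rotation(x):
    """Cyclic shift of coordinates; phi_i depends on t_{i+1}, not on t_i."""
    return x[1:] + x[:1]


def tripled(x):
    """Scaling by 3; not a contraction."""
    return [3.0 * v for v in x]


s = [0.3, -1.2, 2.0, 0.0, 0.7]
t = [1.1, 0.4, -0.5, 0.9, 0.7]

checks = {
    "contraction, C=1": gaussian_part_condition(s, t, contraction, 1, len(s)),
    "rotation, C=1": gaussian_part_condition(s, t, rotation, 1, len(s)),
    "tripled, C=1": gaussian_part_condition(s, t, tripled, 1, len(s)),
    "tripled, C=3": gaussian_part_condition(s, t, tripled, 3, len(s)),
}
print(checks)
```

Under this reading, coordinate-wise contractions satisfy the condition with $C=1$, and so does a permutation of coordinates, which is not of the coordinate-wise form required in (10). A threefold dilation fails with $C=1$ but passes with $C=3$.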
The result is stronger than the comparison for Bernoulli processes (10). In this way Theorem 3 supports the conjecture that (11) suffices to prove that . Note that there is an important case for which the conjecture is true, namely when we assume that all supports of are disjoint. It is crucial to understand that in this case the decomposition postulated in the Bernoulli Theorem can have a special form: and , where and are disjoint and . We show this fact when proving the following result.
Theorem 4
Suppose that (11) is satisfied and supports are disjoint for all then , where is a universal constant.
As we show in the last section, results of this type are of interest when one wants to compare weak and strong moments for random series in a Banach space. The question was proposed by K. Oleszkiewicz in private communication.
4 Proof of the main result
In this section we prove Theorem 3 and Theorem 4.
Proof of Theorem 3. The main step in the proof of the Bernoulli theorem (Proposition 6.2 in [1]) is to show the existence of a suitable admissible sequence of partitions. Consequently, if and then it is possible to define nested partitions of such that . Moreover, for each it is possible to find and (we use the notation and , where ) which satisfy the following conditions
- (i) , for ;
- (ii) if , then
  - (a) either and
  - (b) or , and
[TABLE]
where for any
[TABLE]
- (iii) Moreover, numbers , , satisfy
[TABLE]
where is an absolute constant.
As proved in Theorem 3.1 in [1] the existence of the quantities that satisfy conditions (i) and (ii) formulated above implies the existence of a decomposition , such that
[TABLE]
Together with the condition (iii) we get (6). Our aim is to use the mapping to transport all the required quantities to . Before we do it we formulate an auxiliary fact about sets , namely we show that we can get rid of truncation in (13) if we skip a well controlled number of coordinates. We observe that for each there must exist set such that and
[TABLE]
The fact will be proved in two steps. First, we show that . We may only prove that , if , which implies and . Therefore, there exists such that
[TABLE]
and hence and , so by the construction of
[TABLE]
Consequently,
[TABLE]
Obviously,
[TABLE]
Therefore by the induction, . Let
[TABLE]
The second step is to establish that . Again it suffices to prove the result only for such that . Note that by (13)
[TABLE]
and hence the result holds. It remains to observe that
[TABLE]
We turn to construct an admissible partition sequence together with all the supporting quantities for the set . Let consist of , . Obviously partitions are admissible, nested and . Moreover, for each and we define
[TABLE]
and obviously
[TABLE]
As we have mentioned at the beginning of this proof, in order to use Theorem 3.1 in [1] we have to verify conditions (i) and (ii) for the new sequence as well as for , . For this we need our main condition (12). First, it is obvious that (12) implies for that
[TABLE]
If and then either
[TABLE]
and
[TABLE]
or . In this case we have and it suffices to show that
[TABLE]
Obviously, the problem now is that we know little about the structure of the set . Therefore, we simply prove that
[TABLE]
It is obvious that
[TABLE]
We can choose in a way that by (12) we get
[TABLE]
[TABLE]
which proves (16) with . We have proved that assumptions required in Theorem 3.1 in [1] are satisfied for and the supporting quantities. Consequently, there exists a decomposition such that and
[TABLE]
Since and we have (14) for we obtain that
[TABLE]
It implies that
[TABLE]
for a universal constant and ends the proof.
The second case we consider is when for all supports are disjoint. The proof requires the following notation. For any and we define such that for and otherwise.
Proof of Theorem 4. Obviously, we may require that . We additionally assume that . It simplifies the proof, but it works also for the general case as we will point out at the end. Recall that by the Bernoulli Theorem [1] there exists a decomposition such that
[TABLE]
where is an absolute constant. Obviously, we may think of as suitably large. We can represent the decomposition by in a way that and . We show that under the disjoint supports assumption we may additionally require that and where and are disjoint subsets of such that . Moreover, , for some suitably chosen .
In order to prove the result we have to look closer into the definition of in the proof of Theorem 3.1 in [1]. The definition is based on the construction of admissible partitions we have described in the proof of Theorem 3 above. Using the notation introduced there let
[TABLE]
Note that is comparable with . Therefore, if is finite then necessarily for all . From the partition construction used in Section 6 in [1] we know that we can additionally assume a regularity condition on , , namely
[TABLE]
and for technical purpose we take . As in the proof of Theorem 3.1 in [1] the Bernoulli decomposition is given by , where if the definition means that and the limit exists. Consequently, denoting and we get
[TABLE]
Clearly, , and are disjoint. Note also that if and , then there must exist such that for all . Due to the disjoint supports assumption it is only possible if there exists such that . Now, if there exists such that we define
[TABLE]
The moment is of special nature in the sense that without loss of generality we may assume that for it is true that . This is due to the fact that the partitioning stops after this moment. Now, we define
[TABLE]
We can now introduce the improved version of denoted by and given by
[TABLE]
It is clear that
[TABLE]
For let
[TABLE]
Observe, that . If , , then we may find such that . Consequently, using the definition (13) of for all
[TABLE]
We need to show that the decomposition is of the right form i.e. satisfies (18). For this aim we need to investigate a few cases following from different possible paths of approximations . First suppose that . Then we may use the above inequality for and due to the disjoint supports we have
[TABLE]
The same inequality holds if but . We show that . Indeed, suppose that . It means that for some we have . This may concern only if or and , but then it means that i.e. . It concludes the argument that . For it implies that
[TABLE]
For we use simply that and hence
[TABLE]
Now suppose that and . If either or , then . Otherwise . If , then by the above argument
[TABLE]
and thus using that and , we have
[TABLE]
We have the remaining bound
[TABLE]
Combining (20), (21) and (22) we conclude by (14)
[TABLE]
where is an absolute constant.
Now consider , . In order to prove that
[TABLE]
we have to argue that , for all . Note that and . Moreover, and are disjoint. Obviously, it suffices to show the argument that .
First, note that . Indeed if the set was non-empty then for a given we would have , but then for all and therefore . This would imply which is a contradiction. Suppose that and . This is only possible if and and . Let be such that , then either or and , which means that and . Therefore, and . If , then
[TABLE]
which is a contradiction. If , then the argument is trivial.
Summing up, by (23) we have
[TABLE]
and by (24) and the Gaussian comparison we have , which means that our improved version of satisfies
[TABLE]
where is a universal constant. In this way we have proved that we may additionally require that and for some disjoint such that . Recall that in each case is of the form , for a given .
We turn to the main part of the proof. Let be the smallest positive integer such that
[TABLE]
Note that it is possible that in which case we may think of as equal . Since is large enough and it is clear that must be at least greater than, say, . Consequently, by the choice of
[TABLE]
The last step is to define a suitable decomposition for . For each we define and , where and are defined by the decomposition of the norm i.e.
[TABLE]
and
[TABLE]
Consequently by the decomposition (4) and the main assumption (11),
[TABLE]
[TABLE]
Moreover, by (25)
[TABLE]
It implies that
[TABLE]
Therefore, by the Gaussian comparison, we get and hence finally
[TABLE]
It ends the proof in the case when . For the general case the proof follows the same lines, where instead of we consider . Notice that formally this may not obey the disjoint supports assumption, but it does not affect qualitatively the argument presented above.
Note that the above proof works since in the case of disjoint supports we have almost perfect knowledge about the decomposition in Bernoulli Theorem. On the other hand, it is not difficult to give an alternative proof based on the independence of variables , , but it is worth seeing what the decomposition in Theorem 3.1 in [1] should be in order to make Bernoulli comparison possible.
5 The Oleszkiewicz problem
In this section we give an example of how to apply our result to compare expectations of norms of random series in a Banach space. First, we prove a general result which concerns where is linear, is convex and . Then, the assumption (8) becomes
[TABLE]
where is the linear space spanned by the set . It is because by the assumptions on any point can be represented as , where and . By the linearity of
[TABLE]
On the other hand, we can easily extend the condition (27) on the closure of . We turn to prove that if then (27) implies that dominates .
Theorem 5
Suppose that , is convex and . If is linear and satisfies (8), then , where is a universal constant.
Proof. By the Bernoulli theorem [1] we have that there exist such that and
[TABLE]
Since is linear it can be easily extended to and thus we can define , . Obviously moreover (27) implies in particular that
[TABLE]
and
[TABLE]
Consequently
[TABLE]
and
[TABLE]
Therefore
[TABLE]
It ends the proof.
We aim to study the question posed by Oleszkiewicz that concerns comparability of weak and strong moments for Bernoulli series in a Banach space. Let , , be vectors in a Banach space . Suppose that for all and
[TABLE]
This property is called weak tail domination. As we have explained in the introduction the weak tail domination can be understood in terms of comparability of weak moments, i.e. for any integer and
[TABLE]
Oleszkiewicz asked whether or not it implies the comparability of strong moments. Namely whether (28) or rather (29) implies that
[TABLE]
where is an absolute constant. Note that in the Oleszkiewicz problem one may assume that is a separable space since we can easily restrict to the closure of . Therefore we have that
[TABLE]
where the supremum is taken over all finite sets contained in . We may assume that since otherwise there is nothing to prove. Consequently for each series is convergent which is equivalent to . Let be defined by . It is clear that is a linear isomorphism on the closed linear subspace of . We apply Theorem 5 to get the following result.
Corollary 3
Suppose that is onto. Then (28) implies (30).
Unfortunately, if is not onto then the above argument fails. Still, it is believed that the comparison holds. A partial result can be deduced from Theorem 3, namely
Corollary 4
Suppose that for each and
[TABLE]
Then (30) holds, i.e.
[TABLE]
Proof. It suffices to notice that (31) implies (12) and then apply Theorem 3.
Acknowledgments
We would like to thank Prof. Kwapień for comments on the shape of this paper and helpful discussion about Theorem 1.
References
[1] Bednorz, W and Latała, R: On the boundedness of Bernoulli processes, Ann. Math., 180, (2014), 1167–1203.
[2] Fernique, X: Régularité des trajectoires des fonctions aléatoires gaussiennes. (French) École d'Été de Probabilités de Saint-Flour, IV-1974. Lecture Notes in Math. 480, (1975), 1–96, Springer, Berlin.
[3] Garling, D. J. H.: Inequalities: A Journey into Linear Analysis, (2007), Cambridge University Press, Cambridge.
[4] Hitczenko, P: Domination inequality for martingale transforms of a Rademacher sequence, Israel J. Math., 84, (1993), 161–178.
[5] Hitczenko, P and Kwapień, S: On the Rademacher Series. In: Hoffmann-Jørgensen J., Kuelbs J., Marcus M.B. (eds) Probability in Banach Spaces, 9. Progress in Probability, 35. (1994) Birkhäuser, Boston, MA.
[6] Latała, R: Sudakov minoration principle and supremum of some processes. Geom. Funct. Anal., 7(5), (1997), 936–953.
[7] Latała, R: Moments of unconditional logarithmically concave vectors, in Geometric Aspects of Functional Analysis, Israel Seminar 2006-2010, Lecture Notes in Math. 2050, (2012), 301–315, Springer.
[8] Latała, R and Tkocz, T: A note on suprema of canonical processes based on random variables with regular moments. Electron. J. Probab., 20(36), (2015), 1–17.
