On the geometry of random polytopes
Shahar Mendelson

TL;DR
This paper gives a simple proof of a recent result: under minimal assumptions, the random polytope generated by the rows of a random matrix contains a large canonical convex body with high probability.
Contribution
It offers a straightforward proof, based on the small-ball method, of a containment result for random polytopes generated by matrices whose entries are independent copies of a symmetric random variable.
Findings
The random polytope generated by the matrix rows contains a large canonical convex body.
The containment holds with high probability.
Only minimal assumptions on the distribution of the matrix entries are needed.
Shahar Mendelson LPSM, Sorbonne University, and Mathematical Sciences Institute, The Australian National University. Email: [email protected]
Abstract
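We present a simple proof of a fact recently established in [5]: let $\xi$ be a symmetric random variable that has variance $1$, let $\Gamma=(\xi_{ij})$ be an $N \times n$ random matrix whose entries are independent copies of $\xi$, and set $X_1,\dots,X_N$ to be the rows of $\Gamma$. Then under minimal assumptions on $\xi$ and as long as $N \geq c_1 n$,
$$c_2\left(B_\infty^n \cap \sqrt{\log(N/n)}\, B_2^n\right) \subset \mathrm{absconv}(X_1,\dots,X_N)$$
with high probability.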
1 Introduction
Let be a symmetric random variable that has variance and let be the random vector whose coordinates are independent copies of . Consider a random matrix whose rows are independent copies of . In this note we explore the geometry of the random polytope
[TABLE]
specifically, we study whether is likely to contain a large canonical convex body.
One of the first results in this direction is from [4], where it is shown that if is the standard gaussian random variable, and , then
[TABLE]
with probability at least . It should be noted that this estimate cannot be improved—up to the dependence of the constants on (see, for example, the discussion in Section 4 of [9]).
The proof of (1.1) relies heavily on the tail behaviour of the gaussian random variable. It is therefore natural to try and extend (1.1) beyond the gaussian case, to random polytopes generated by more general random variables that still have ‘well-behaved’ tails. The optimal subgaussian estimate was established in [9]:
Theorem 1.1.
Let be a mean-zero random variable that has variance and is -subgaussian [Footnote 1: A centred random variable is -subgaussian if for every , .]. Let and set . Then with probability at least
[TABLE]
where and are constants that depend on and is an absolute constant.
Remark 1.2.
Note that the body contains in (1.2) is slightly smaller than in (1.1), as one has to intersect the Euclidean ball from (1.1) with the unit cube.
While Theorem 1.1 resolves the problem when is subgaussian, the situation is less clear when is heavy-tailed. That naturally leads to the following question:
Question 1.3.
Under what conditions on does one still have that for ,
[TABLE]
with high probability?
Following the progress in [7], where Question 1.3 had been studied under milder moment assumptions on than in Theorem 1.1, Question 1.3 was answered in [5] under a minimal small-ball condition on .
Definition 1.4.
A mean-zero random variable satisfies a small-ball condition with constants and if
[TABLE]
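As a rough illustration of Definition 1.4, the sketch below estimates the fraction of a sample whose absolute value is at least a given threshold. It assumes that (1.4) is a lower bound of the form P(|xi| >= kappa) >= delta, which is the usual small-ball formulation; the displayed condition is not reproduced above, so the symbols kappa and delta, the function names and the heavy-tailed example distribution are illustrative assumptions rather than the paper's notation. The example variable is symmetric but is not normalised to have variance one.

```python
import random


def small_ball_fraction(samples, kappa):
    """Empirical estimate of P(|xi| >= kappa) from a list of samples."""
    return sum(1 for s in samples if abs(s) >= kappa) / len(samples)


def symmetric_pareto(rng, tail_index=2.5):
    """Draw a symmetric heavy-tailed variable: a random sign times a Pareto variable.

    Hypothetical example distribution, not taken from the paper.
    """
    sign = rng.choice((-1.0, 1.0))
    return sign * rng.paretovariate(tail_index)


rng = random.Random(0)
sample = [symmetric_pareto(rng) for _ in range(10_000)]

# A Pareto variable is always at least 1, so every draw has |xi| >= 1,
# while only some of the draws exceed the larger threshold 2.
print(small_ball_fraction(sample, kappa=1.0))
print(small_ball_fraction(sample, kappa=2.0))
```

Under the assumed form of (1.4), a condition of this type only asks that the variable is not too concentrated near zero. It places no restriction on the tails, which is why it can hold for heavy-tailed variables.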
Theorem 1.5.
[5] Let be a symmetric, variance random variable that satisfies (1.4) with constants and . For there are constants and that depend on and for which the following holds. If then with probability at least ,
[TABLE]
Remark 1.6.
The assumption made in [5] is slightly stronger than in Theorem 1.5; namely, that for every , . However, (1.4) suffices for the proof. At the same time, in [5] the random variables are only assumed to be independent, symmetric and variance , with each one of the ’s satisfying (1.4) with the same constants and . In what follows we consider only the case in which are independent copies of a single random variable —though extending the presentation to the independent case is straightforward.
The original proof of Theorem 1.5 is based on the construction of a well-chosen net, and that construction is rather involved. Here we present a much simpler argument that is based on the small-ball method (see, e.g., [10, 11, 12]). As an added value, the method presented here gives more information than the assertion of Theorem 1.5, as is explained in what follows.
The starting point of the proof of Theorem 1.5 is straightforward: let
[TABLE]
and set
[TABLE]
By comparing the support functions of and of , one has to show that with the wanted probability, for every , . And, since , Theorem 1.5 can be established by showing that for suitable constants and ,
[TABLE]
What we actually show is a stronger statement than (1.5): not only is there a high probability event on which
[TABLE]
but in fact, on that “good event”, for each , has large coordinates, with each one of these coordinates satisfying that . Thus, the fact that is exhibited by many coordinates and not just by a single one.
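The counting statement can be sketched numerically. The code below is a minimal illustration with hypothetical names: it draws a matrix with independent Rademacher entries (one symmetric, variance-one choice), fixes a direction z, evaluates the support function of the absolute convex hull of the rows at z, which is the largest value of |<X_i, z>|, and counts how many rows satisfy |<X_i, z>| >= threshold. The threshold is a stand-in for the support function of the target body at z, whose exact form is not reproduced here.

```python
import random


def random_sign_matrix(num_rows, dim, rng):
    """Matrix with independent Rademacher (+1/-1) entries, stored as a list of rows."""
    return [[rng.choice((-1.0, 1.0)) for _ in range(dim)] for _ in range(num_rows)]


def inner(x, z):
    """Euclidean inner product of two vectors of equal length."""
    return sum(a * b for a, b in zip(x, z))


def support_of_absconv(rows, z):
    """Support function of the absolute convex hull of the rows at z."""
    return max(abs(inner(x, z)) for x in rows)


def count_large_coordinates(rows, z, threshold):
    """Number of rows X_i with |<X_i, z>| >= threshold."""
    return sum(1 for x in rows if abs(inner(x, z)) >= threshold)


rng = random.Random(1)
rows = random_sign_matrix(200, 10, rng)  # hypothetical sizes: N = 200, n = 10
z = [1.0] * 10
print(support_of_absconv(rows, z), count_large_coordinates(rows, z, 4.0))
```

The containment only needs `support_of_absconv` to dominate the threshold, whereas the stronger statement asks for `count_large_coordinates` to be large for every direction simultaneously.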
Proving that indeed, with high probability the smallest cardinality
[TABLE]
is large is carried out in two steps:
Controlling a single point. For and a well chosen one establishes an individual estimate: that for every fixed ,
[TABLE]
In particular, if are independent copies of then with probability at least ,
[TABLE]
From a single function to uniform control. Thanks to the high probability estimate with which (1.6) holds, it is possible to control uniformly any subset of whose cardinality is at most . Let be a minimal -cover of with respect to the norm of the allowed cardinality. For every , let that satisfies . The wanted uniform control is achieved by showing that
[TABLE]
with probability at least .
Indeed, combining the two estimates it follows that with probability at least
[TABLE]
for every , one has that
[TABLE]
and
[TABLE]
Hence, on that event, for every there is of cardinality at least , and for every ,
[TABLE]
implying that
[TABLE]
in particular, as required.
In the next section this line of reasoning is used to prove Theorem 1.5.
2 Proof of Theorem 1.5
Before we begin the proof, let us introduce some notation. Throughout, absolute constants are denoted by , etc. Unless specified otherwise, the value of these constants may change from line to line. Constants that depend on some parameter are denoted by . We write if there is an absolute constant such that ; implies that ; and if both and .
The required estimate for a single point follows very closely ideas from [13], which had been developed for obtaining lower estimates on the tails of marginals of the Rademacher vector , that is, on
[TABLE]
as a function of the ‘location’ in of .
Fix and consider the interpolation body and its dual . The key estimate one needs to establish the wanted individual control is:
Theorem 2.1.
There exist constants and that depend only on the small-ball constants of ( and ) such that if then
[TABLE]
The proof of Theorem 2.1 is based on some well-known facts on the interpolation norm .
Lemma 2.2.
There exists an absolute constant such that for every ,
[TABLE]
where is the nonincreasing rearrangement of .
Moreover, for every there is a partition of into disjoint blocks such that
[TABLE]
The first part of Lemma 2.2 is due to Holmstedt (see Theorem 4.1 in [6]) and it gives useful intuition on the nature of the norm . The second part is Lemma 2 from [13] and it plays an essential role in what follows.
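To make the first part of Lemma 2.2 more concrete, the sketch below evaluates a Holmstedt-type expression: the sum of the r largest coordinates of |z| plus sqrt(r) times the Euclidean norm of the remaining coordinates. The parametrisation by an integer r and the function name are assumptions made for illustration, since the displayed formula is not reproduced here. Splitting z into its r largest coordinates and the rest shows that this quantity is an upper bound for the infimum of ||u||_1 + sqrt(r) ||v||_2 over all decompositions z = u + v.

```python
import math


def holmstedt_expression(z, r):
    """Sum of the r largest |z_i| plus sqrt(r) times the l2 norm of the rest.

    Uses the nonincreasing rearrangement of (|z_i|); r is assumed to be
    an integer between 1 and len(z).
    """
    rearranged = sorted((abs(t) for t in z), reverse=True)
    head = sum(rearranged[:r])
    tail = math.sqrt(sum(t * t for t in rearranged[r:]))
    return head + math.sqrt(r) * tail


z = [3.0, -1.0, 0.5, 2.0, 0.0]
for r in (1, 2, 5):
    print(r, holmstedt_expression(z, r))
```

When r equals the dimension, the expression reduces to the l1 norm of z, and for small r the Euclidean term dominates, which matches the interpolation picture between the two norms.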
Before proving Theorem 2.1, we require an additional observation that is based on the small-ball condition satisfied by .
Lemma 2.3.
Let and set . Then
[TABLE]
where is a constant that depends only on ’s small-ball constants and .
Proof. Let be independent, symmetric, -valued random variables that are also independent of . Recall that is symmetric and therefore has the same distribution as . By Khintchine’s inequality it is straightforward to verify that
[TABLE]
Let ; thus, the ’s are iid -valued random variables whose mean is at least , and point-wise
[TABLE]
Hence, and all that is left to complete the proof is to show that
[TABLE]
Let and in particular, . Assume without loss of generality that and that the ’s are non-increasing, let be a parameter to be specified in what follows, and set .
Consider two cases:
If then with probability at least , . In that case
[TABLE]
Alternatively, , implying that
[TABLE]
because .
By Bernstein’s inequality,
[TABLE]
provided that is a small-enough absolute constant. Using, once again, that it is evident that with probability , and therefore
[TABLE]
Thus, setting one has that
[TABLE]
as claimed.
Proof of Theorem 2.1. Fix and recall that by Lemma 2.2 there is a decomposition of into disjoint blocks such that
[TABLE]
Let ; observe that are independent random variables and that by Lemma 2.3,
[TABLE]
for a constant .
At the same time,
[TABLE]
Therefore, by the Paley-Zygmund inequality (see, e.g., [2]), for any ,
[TABLE]
Setting ,
[TABLE]
and since is a symmetric random variable (because the ’s are symmetric), it follows that
[TABLE]
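The Paley-Zygmund step can be checked numerically. For a nonnegative random variable Z with a finite second moment and for 0 < theta < 1, the inequality reads P(Z > theta E Z) >= (1 - theta)^2 (E Z)^2 / E Z^2. The sketch below evaluates both sides for the empirical distribution of a sample. The variable used here (the square of a weighted Rademacher sum), the weights and all names are illustrative assumptions, not the exact quantities from the proof.

```python
import random


def paley_zygmund_sides(values, theta):
    """Both sides of the Paley-Zygmund inequality for the empirical law of `values`.

    Returns (P(Z > theta * E Z), (1 - theta)^2 * (E Z)^2 / E Z^2).
    The values are assumed to be nonnegative and not all zero.
    """
    n = len(values)
    mean = sum(values) / n
    second_moment = sum(v * v for v in values) / n
    lhs = sum(1 for v in values if v > theta * mean) / n
    rhs = (1 - theta) ** 2 * mean ** 2 / second_moment
    return lhs, rhs


rng = random.Random(2)
coeffs = [1.0, 0.5, 0.25, 0.25]  # hypothetical weights of one block
values = []
for _ in range(5000):
    s = sum(c * rng.choice((-1.0, 1.0)) for c in coeffs)
    values.append(s * s)

lhs, rhs = paley_zygmund_sides(values, theta=0.5)
print(lhs, rhs, lhs >= rhs)
```

Because the inequality holds for every distribution of a nonnegative variable, it holds for the empirical distribution as well, so the comparison does not depend on the sample size.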
For let
[TABLE]
which are independent events. Hence,
[TABLE]
Thus, by (2.1), if and , one has
[TABLE]
From here on, the constants and denote the constants from Theorem 2.1.
Corollary 2.4.
For , and there are constants and that depend on , and , and an absolute constant for which the following holds. If , and then with probability at least ,
[TABLE]
Proof. Let , and invoking Theorem 2.1,
[TABLE]
where and depend only on and .
Let such that ; thus, . If , are independent copies of and , then . Hence, by a standard concentration argument, with probability at least ,
[TABLE]
where is an absolute constant.
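One standard way to carry out a concentration step of this kind is a multiplicative Chernoff bound for a sum of independent indicators: if S is such a sum and mu is its mean, then P(S <= mu / 2) <= exp(-mu / 8). Whether the paper uses exactly this bound is an assumption; the sketch below evaluates the bound and compares it with a simulation, using hypothetical parameters and names.

```python
import math
import random


def chernoff_lower_tail_bound(num_trials, success_prob):
    """Chernoff bound exp(-mu / 8) on P(S <= mu / 2), where mu = num_trials * success_prob."""
    mu = num_trials * success_prob
    return math.exp(-mu / 8.0)


def empirical_lower_tail(num_trials, success_prob, repetitions, rng):
    """Fraction of simulated runs in which at most half the expected successes occur."""
    mu = num_trials * success_prob
    bad_runs = 0
    for _ in range(repetitions):
        successes = sum(1 for _ in range(num_trials) if rng.random() < success_prob)
        if successes <= mu / 2:
            bad_runs += 1
    return bad_runs / repetitions


rng = random.Random(3)
print(chernoff_lower_tail_bound(200, 0.3))
print(empirical_lower_tail(200, 0.3, 1000, rng))
```

The failure probability decays exponentially in the expected number of successes, and that exponential decay is what leaves room for a union bound over a large net.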
Thanks to the high probability estimate with which Corollary 2.4 holds, one can control uniformly all the elements of a set as long as for a suitable absolute constant , and as long as . In that case, there is an event of probability at least such that for every ,
[TABLE]
The natural choice of a set is a minimal -cover of with respect to the norm. Note that , and so there is a -cover of the allowed cardinality for
[TABLE]
where is an absolute constant.
Clearly, , and to complete the proof of Theorem 1.5 it suffices to show that with probability at least
[TABLE]
To prove (2.3), observe that is the supremum of an empirical process indexed by a class of binary valued functions
[TABLE]
in particular, for every ,
[TABLE]
By Talagrand’s concentration inequality for bounded empirical processes ([14], see also [1]), with probability at least ,
[TABLE]
Let us show that for the right choice of and large enough, .
The required estimate on and clearly holds as long as
[TABLE]
As for , note that point-wise
[TABLE]
Let be independent, symmetric, -valued random variables that are independent of . By the Giné-Zinn symmetrization theorem [3] and the contraction inequality for Bernoulli processes [8],
[TABLE]
which is sufficiently small as long as .
3 Concluding Remarks
This proof of Theorem 1.5 is based on the small-ball method and follows an almost identical path to previous results that use the method: first, one obtains an individual estimate that implies that for each in a fine-enough net, many of the values are in the ‘right range’; and then, that the ‘oscillation vector’ does not spoil too many coordinates when is ‘close enough’ to . Thus, with high probability and uniformly in , many of the values are in the right range.
Having said that, there is one substantial difference between this proof and other instances in which the small-ball method had been used. Previously, individual estimates had been obtained in the small-ball regime; here the necessary regime is different: one requires a lower estimate on the tails of marginals of . And indeed, the core of the proof is the individual estimate from Theorem 2.1, where one shows that if satisfies a small-ball condition and has iid coordinates distributed as then its marginals exhibit a ‘super-gaussian’ behaviour at the right level.
References
[1] S. Boucheron, G. Lugosi, and P. Massart. Concentration inequalities: A nonasymptotic theory of independence. Oxford University Press, Oxford, 2013.
[2] V.H. de la Peña and E. Giné. Decoupling: from Dependence to Independence. Springer, New York, 1999.
[3] E. Giné and J. Zinn. Some limit theorems for empirical processes. Ann. Probab., 12(4):929–998, 1984.
[4] E. D. Gluskin. Extremal properties of orthogonal parallelepipeds and their applications to the geometry of Banach spaces. Mat. Sb. (N.S.), 136(178)(1):85–96, 1988.
[5] O. Guédon, A.E. Litvak, and K. Tatarko. Random polytopes obtained by matrices with heavy tailed entries. Manuscript, available at arXiv:1811.12007, 2018.
[6] T. Holmstedt. Interpolation of quasi-normed spaces. Math. Scand., 26:177–199, 1970.
[7] F. Krahmer, C. Kummerle, and H. Rauhut. A quotient property for matrices with heavy-tailed entries and its application to noise-blind compressed sensing. Manuscript, available at arXiv:1806.04261, 2018.
[8] M. Ledoux and M. Talagrand. Probability in Banach spaces. Classics in Mathematics. Springer-Verlag, Berlin, 2011.
