TL;DR
This paper proves that the probability that an $n\times n$ random Bernoulli ($\pm 1$) matrix is singular equals $(1/2+o_n(1))^n$, resolving a longstanding open problem in random matrix theory.
Contribution
It determines the exponential rate of the singularity probability of random Bernoulli matrices, namely $(1/2+o_n(1))^n$, a problem that remained open for decades.
Findings
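The probability of singularity is $(1/2+o_n(1))^n$ as $n$ grows.
Provides a rigorous proof settling the old conjecture.
Extends the analysis to matrices with independent Bernoulli($p$) entries and gives quantitative small ball probability estimates for the smallest singular value.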
Abstract
For each $n$, let $M_n$ be an $n\times n$ random matrix with independent $\pm 1$ entries. We show that $\mathbb{P}\{M_n\text{ is singular}\}=(1/2+o_n(1))^n$, which settles an old problem. Some generalizations are considered.
Singularity of random Bernoulli matrices
Konstantin Tikhomirov
School of Mathematics, Georgia Institute of Technology
1. Introduction
Let $X_1,\dots,X_n$ be independent vectors, each uniformly distributed on the vertices of the discrete cube $\{-1,1\}^n$. What is the probability that $X_1,\dots,X_n$ are linearly independent?
The question has attracted considerable attention in the literature. It can be equivalently restated as a question about singularity of an $n\times n$ matrix $M_n$ with independent $\pm 1$ entries. J. Komlós [8] showed that $\mathbb{P}\{M_n\text{ is singular}\}=o_n(1)$. Much later, the bound $\mathbb{P}\{M_n\text{ is singular}\}\leq 0.999^{n}$ was obtained by J. Kahn, J. Komlós and E. Szemerédi in [6]. The upper bound was successively improved to $0.958^n$ in [17] and $(3/4+o_n(1))^n$ in [18] by T. Tao and V. Vu, and to $(1/\sqrt{2}+o_n(1))^n$ by J. Bourgain, V. Vu and P. Wood in [3].
It has been conjectured that
$$\mathbb{P}\{M_n\text{ is singular}\}=\big(1/2+o_n(1)\big)^{n}\tag{1}$$
(see, for example, [3, Conjecture 1.1], [22, Conjecture 7.1], [23, Conjecture 2.1] as well as some stronger conjectures in [2]). In this paper, we confirm the conjecture and, moreover, provide quantitative small ball probability estimates for the smallest singular value of $M_n$. We extend our analysis to random matrices with independent Bernoulli($p$) entries. Let $1_n$ denote the $n$-dimensional vector of all ones. The main result of this paper can be formulated as follows.
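Theorem A.
For every $p\in(0,1/2]$ and $\varepsilon>0$ there are $n_{p,\varepsilon},C_{p,\varepsilon}>0$ depending only on $p$ and $\varepsilon$ with the following property. Let $n\geq n_{p,\varepsilon}$, and let $B_n(p)$ be an $n\times n$ random matrix with independent entries $b_{ij}$ such that $\mathbb{P}\{b_{ij}=1\}=p$ and $\mathbb{P}\{b_{ij}=0\}=1-p$. Then for any $s\in\mathbb{R}$ and $t>0$,
$$\mathbb{P}\big\{s_{\min}\big(B_n(p)+s\,1_n 1_n^{\top}\big)\leq t/\sqrt{n}\big\}\leq(1-p+\varepsilon)^{n}+C_{p,\varepsilon}\,t.$$
It is easy to see that the probability that the first column of $B_n(p)$ is equal to zero is $(1-p)^n$. Thus, the theorem implies that, for a fixed $p\in(0,1/2]$,
$$\mathbb{P}\{B_n(p)\text{ is singular}\}=\big(1-p+o_n(1)\big)^{n},$$
and further, when applied with $p=1/2$ and $s=-1/2$, gives (1).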
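For very small $n$, the singularity probability and the trivial lower bound $(1-p)^n$ (a zero first column) can be compared by brute-force enumeration. The following is a minimal sketch and not part of the paper: the names `int_det` and `singular_probability` and the `signed` switch are illustrative assumptions. It enumerates all 0/1 matrices, weights each by its Bernoulli($p$) probability, and optionally maps the entries to $\pm 1$.

```python
from itertools import product


def int_det(rows):
    """Exact determinant of a small integer matrix (fraction-free Bareiss elimination)."""
    a = [list(r) for r in rows]
    n = len(a)
    sign, prev = 1, 1
    for k in range(n - 1):
        if a[k][k] == 0:
            # Look for a row below with a non-zero entry in column k and swap it up.
            pivot = next((i for i in range(k + 1, n) if a[i][k] != 0), None)
            if pivot is None:
                return 0  # no usable pivot in column k: the determinant is zero
            a[k], a[pivot] = a[pivot], a[k]
            sign = -sign
        for i in range(k + 1, n):
            for j in range(k + 1, n):
                # Bareiss update; the division by the previous pivot is exact.
                a[i][j] = (a[i][j] * a[k][k] - a[i][k] * a[k][j]) // prev
        prev = a[k][k]
    return sign * a[n - 1][n - 1]


def singular_probability(n, p, signed=False):
    """Exact P{matrix is singular} for an n x n matrix with i.i.d. Bernoulli(p) entries.

    With signed=True each 0/1 entry b is replaced by 2*b - 1 (entries +1/-1).
    All 2**(n*n) matrices are enumerated, so this is only feasible for tiny n.
    """
    total = 0.0
    for bits in product((0, 1), repeat=n * n):
        ones = sum(bits)
        weight = p ** ones * (1 - p) ** (n * n - ones)
        entries = [2 * b - 1 for b in bits] if signed else bits
        rows = [entries[i * n:(i + 1) * n] for i in range(n)]
        if int_det(rows) == 0:
            total += weight
    return total


if __name__ == "__main__":
    for n in (1, 2, 3):
        print(n, singular_probability(n, 0.5), "lower bound:", 0.5 ** n)
```

Such small cases say nothing about the asymptotic rate; they only illustrate that the zero-column event is one of several contributions to singularity at small $n$.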
2. Proof strategy
The proof of upper bounds on the probability of singularity of random discrete matrices (i.e. matrices with entries taking a finite number of values) developed in work [6] and later in [17, 18, 3], uses, as a starting point, the relation
[TABLE]
which holds under rather broad assumptions on the distributions of the discrete random vectors [3]. Here, the summation is taken over (finitely many) hyperplanes such that the probability of — the event that span — is non-zero. The set of the hyperplanes is then partitioned according to the value of the combinatorial dimension which is defined as the number such that \max\limits_{i}{\mathbb{P}}\{X_{i}\in V\}\in\big{(}C^{-d(V)-1/n},C^{-d(V)}\big{]}, where is some constant depending on the distribution of ’s. The sum of probabilities corresponding to a given combinatorial dimension is estimated in terms of probabilities for specially constructed random vectors . For some discrete distributions, in particular, for matrices with i.i.d. entries with the probability mass function
[TABLE]
upper bounds for the singularity obtained using the strategy are asymptotically sharp as was shown in [3].
Methods providing strong quantitative information on the smallest singular value of a random matrix were proposed in papers [14, 20]. As a further development, the work [15] established small ball probability estimates on of any matrix with i.i.d normalized subgaussian entries of the form , , where and depend only on the subgaussian moment. Thus, [15] recovered the result of [6], possibly with a worse constant. The key notion of [15] is the essential least common denominator (LCD) which measures “unstructuredness” of a fixed vector and is defined as the smallest such that the distance from to the integer lattice does not exceed . LCD can be used to characterize anticoncentration properties of random sums (and in that respect the approach of [15] is related to the earlier paper [20] where the anticoncentration properties of discrete random sums were connected with existence of generalized arithmetic progressions containing almost all of ). It was proved in [15] that for any unit vector , {\mathbb{P}}\big{\{}\big{|}\sum_{i}a_{ij}x_{i}\big{|}\leq t\big{\}}\leq Ct+\frac{C}{{\rm LCD}(x)}+e^{-cn} for any (see also [16]). This relation, combined with the assertion that the LCD of a random unit vector normal to the linear span of the first columns of is exponential in , already implies that is singular with probability at most . Moreover, an efficient averaging procedure (which we recall below) used in [15] allows to obtain strong quantitative bounds on . The LCD of the random unit normal is estimated with help of an elaborate –net argument.
The approach that we use in this paper is partially based on the methods used in [15] (and in [10]), while the principal difference lies in estimating anticoncentration properties of random sums. The starting point is the relation (taken from [15])
[TABLE]
valid for any random matrix with the distribution invariant under permutations of columns. Here, is a random unit vector orthogonal to the linear span of , ; is the set of compressible unit vectors defined as those with the Euclidean distance at most to the set of –sparse vectors; is the set of incompressible vectors. In the above formula, can be arbitrary, although for our proof we take both parameters small (depending on the choice of in the statement of our main result).
The first summand in the rightmost expression, the small ball probability for compressible vectors, can be bounded with the help of an argument which is completely standard by now. For the reader's convenience, we provide the estimate together with a complete proof in Preliminaries.
The second term — {\mathbb{P}}\big{\{}|\langle{\rm col}_{n}(A_{n}),Y_{n}\rangle|\leq t/\nu\big{\}} — crucially depends on the structure of the random normal . In [15], the authors provided an explicit characterization of “unstructured” vectors in terms of the LCD. In contrast, in our approach we make no attempt to obtain a geometric description of vectors with good anticoncentration properties. For each unit vector and a parameter , we introduce the threshold which is defined as the supremum of all such that {\mathcal{L}}\big{(}\sum_{i=1}^{n}b_{i}x_{i},t\big{)}>Lt, where, are independent Bernoulli() random variables. Here, denotes the Lévy concentration function, defined as , , for any real valued random variable . The threshold can be viewed as a lower bound of the range of ’s for which corresponding random linear combination admits “good” anticoncentration estimates. Thus, to show that is singular with probability , it is sufficient to check that the threshold of the random normal is at most with probability at least . Note that this approach can be related to the inverse Littlewood–Offord theory started in [20], although here we are only interested in estimating from above the “size” of the set of potential normal vectors with large thresholds, rather than giving an explicit description of this set (in that respect, our strategy can be related to theorems in [19, Section 3], however, the actual proofs are very different).
To estimate the threshold, we apply a procedure which can be called “inversion of randomness”, and which we briefly describe below. We would like to make the description as non-technical as possible, and for this reason omit any discussion of the choice of parameters and other issues of secondary importance. Take any with , and let be the set of all –incompressible unit vectors with the threshold falling into the interval . In order to show that the probability of the event is close to zero, we construct a discrete approximation of , which is a subset of elements of an –dimensional lattice having the threshold of order , and coordinates in a certain range. We then show that the event is contained in
[TABLE]
where “almost orthogonal” should be understood in a specific sense which we prefer not to discuss here. This implies
[TABLE]
and the proof is reduced to efficiently bounding from above the cardinality of the discretization . The “inversion of randomness” is used to solve the problem. We consider a random vector uniformly distributed on a subset of the lattice (whose cardinality is much easier to compute) containing , and show that with probability superexponentially close to one, the threshold of is much less than , so that . This allows to bound in terms of the cardinality of the range of , times the factor . Thus, instead of studying anticoncentration of random sums with fixed coefficients satisfying certain structural assumptions, we consider typical anticoncentration properties of sums with random coefficients . It will be convenient to work with the expression
[TABLE]
which is interpreted as the Lévy concentration function with respect to the randomness of the vector of independent Bernoulli() components.
Let us state, as an illustration, a corollary of the main technical result of this paper, Theorem 4.2, which deals with rescaled vectors distributed on the integer lattice :
Theorem B.
Let , , , . There exist depending on and depending only on (and not on ) with the following property. Take , , and let
[TABLE]
Further, assume that a random vector is uniform on . Then
[TABLE]
Here, denotes the Lévy concentration function with respect to , a random vector with independent Bernoulli() components.
The crucial point of this theorem is that does not depend on . Essentially, this means that the probability can be made superexponentially small in as grows, while stays constant. Because of the “inversion of randomness”, a statement of this kind is translated into bounds for the cardinality of the discretization of the sets of vectors with large thresholds considered above.
3. Preliminaries
Denote by the standard –norm, so that
[TABLE]
In particular, by we denote the space of all functions with . We will say that a mapping is –Lipschitz for some if for all .
The unit Euclidean sphere in will be denoted by . The support of a vector is . The –dimensional vector of all ones is denoted by . For an matrix , and are its columns and rows, respectively, and is the spectral norm of . The smallest singular value of is denoted by . We will rely on the standard representation .
The indicator of a subset of or an event is denoted by . For any positive integer , denotes the integer interval . Further, for any two subsets , we write if for all and . The Minkowski sum of two subsets of is defined as the set of all vectors of the form , where and . For a real number , by we denote the largest integer less than or equal to , and by , the smallest integer greater than or equal to .
Everywhere in this paper, $B_n(p)$ is the $n\times n$ matrix with i.i.d. Bernoulli($p$) entries, i.e. random variables taking value $1$ with probability $p$ and $0$ with probability $1-p$. Further, by $B_n^{1}(p)$ we denote the $(n-1)\times n$ matrix obtained from $B_n(p)$ by removing the last row.
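The Lévy concentration function of a random variable $\xi$ is defined by
$$\mathcal{L}(\xi,t):=\sup_{\lambda\in\mathbb{R}}\mathbb{P}\big\{|\xi-\lambda|\leq t\big\},\quad t\geq 0.$$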
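To make the notion concrete, the Lévy concentration function of a random sum $\sum_i b_i x_i$ with independent Bernoulli($p$) coefficients $b_i$ can be evaluated exactly for short coefficient vectors. The sketch below assumes the standard definition, the supremum over $\lambda$ of $\mathbb{P}\{|\xi-\lambda|\leq t\}$; the helper names `sum_distribution` and `levy_concentration` are hypothetical and do not come from the paper.

```python
from itertools import product


def sum_distribution(x, p):
    """Distribution of S = sum_i b_i * x_i, b_i i.i.d. Bernoulli(p), as {value: probability}."""
    dist = {}
    for bits in product((0, 1), repeat=len(x)):
        value = sum(b * xi for b, xi in zip(bits, x))
        ones = sum(bits)
        prob = p ** ones * (1 - p) ** (len(x) - ones)
        dist[value] = dist.get(value, 0.0) + prob
    return dist


def levy_concentration(dist, t, tol=1e-12):
    """sup over lambda of P{|S - lambda| <= t} for a finitely supported S.

    Any window [lambda - t, lambda + t] can be slid to the right until its
    left end hits an atom without losing mass, so it is enough to scan the
    windows [a, a + 2t] over the atoms a of S.
    """
    atoms = sorted(dist)
    best = 0.0
    for a in atoms:
        mass = sum(dist[v] for v in atoms if a <= v <= a + 2 * t + tol)
        best = max(best, mass)
    return best


if __name__ == "__main__":
    structured = sum_distribution((1, 1, 1, 1), 0.5)   # many coinciding sums
    spread = sum_distribution((1, 2, 4, 8), 0.5)       # all 16 sums distinct
    print(levy_concentration(structured, 0), levy_concentration(spread, 0))
```

In this toy comparison the coefficient vector with repeated entries has a much larger concentration function at $t=0$ than the one whose subset sums are all distinct, which is the kind of structure-versus-anticoncentration contrast the paper is concerned with.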
We will need the following classical inequality:
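Lemma 3.1 (Lévy–Kolmogorov–Rogozin, [13]).
Let $\xi_1,\dots,\xi_m$ be independent real valued random variables. Then for any real numbers $r_1,\dots,r_m>0$ and $r\geq\max_{i\leq m}r_i$,
$$\mathcal{L}\Big(\sum_{i=1}^{m}\xi_i,\,r\Big)\leq\frac{C\,r}{\sqrt{\sum_{i=1}^{m}\big(1-\mathcal{L}(\xi_i,r_i)\big)\,r_i^{2}}}.$$
Here, $C>0$ is a universal constant.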
We recall some definitions from [15]. Given and , denote by the set of all unit vectors such that there is with and (in [15], such vectors are called compressible). Further, we define the complementary set of incompressible vectors . We note that a similar partition of the unit vectors was used earlier in [10].
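The compressible/incompressible dichotomy is easy to test numerically. The sketch below assumes the description given in Section 2: a unit vector is compressible with parameters $(\delta,\nu)$ if its Euclidean distance to the set of $\delta n$-sparse vectors is at most $\nu$. The function names are hypothetical, and using $\lfloor\delta n\rfloor$ as the sparsity level is an assumption.

```python
import math


def distance_to_sparse(x, m):
    """Euclidean distance from x to the set of vectors with at most m non-zero coordinates."""
    # The nearest m-sparse vector keeps the m largest-magnitude coordinates of x,
    # so the distance is the norm of the remaining coordinates.
    tail = sorted((abs(v) for v in x), reverse=True)[m:]
    return math.sqrt(sum(v * v for v in tail))


def is_compressible(x, delta, nu):
    """Check compressibility of x / |x| with parameters (delta, nu)."""
    norm = math.sqrt(sum(v * v for v in x))
    unit = [v / norm for v in x]
    m = math.floor(delta * len(x))  # sparsity level (assumed to be floor(delta * n))
    return distance_to_sparse(unit, m) <= nu


if __name__ == "__main__":
    print(is_compressible([1, 0, 0, 0], 0.25, 0.1))   # a coordinate vector
    print(is_compressible([1, 1, 1, 1], 0.25, 0.1))   # a flat vector
```

A coordinate vector is itself sparse and hence compressible, while a flat vector stays far from every sparse vector and is incompressible for small $\nu$.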
Following an approach developed in [15], we can write for any random matrix with the distribution invariant under permutations of columns
[TABLE]
where are arbitrary numbers in (see [15, formula (3.2) and Lemma 3.5]), and is a random unit vector orthogonal to the first columns of . A satisfactory estimate for the first term for sufficiently small and can be obtained as a simple compilation of known results (see Proposition 3.6 below). The following is a version of the tensorization lemma from [15].
Lemma 3.2.
Let be independent random variables.
- (1)
Assume that for some , and all and we have
[TABLE]
Then for each ,
[TABLE]
where is a universal constant.
- (2)
Assume that for some , and all we have {\mathbb{P}}\big{\{}|\xi_{k}|\leq\eta\}\leq\tau. Then for every ,
[TABLE]
Remark 3.3.
The second assertion of the lemma follows immediately by noting that the condition implies that . For a proof of the first assertion, see [15].
Further, we recall a standard estimate for the spectral norm of random matrices with i.i.d. subgaussian entries (for a proof, see, for example, [21, Theorem 5.39]).
Lemma 3.4.
For any there is depending only on and with the following property. Let and let be an random matrix with i.i.d. entries of zero mean, and such that for all . Then with probability at least we have .
The following is an easy consequence of Lemma 3.2:
Lemma 3.5.
For any there is which may only depend on , such that for every , and arbitrary and ,
[TABLE]
Proof.
Let be i.i.d. Bernoulli() random variables. It is not difficult to check that
[TABLE]
for some which may only depend on . For a proof of this fact, one may consider two possibilities: first when the vector has a “large” –norm, in which case the assertion follows by conditioning on all ’s except the one corresponding to the largest component of , and, second, when the vector has a “small” –norm in which case, by the Central Limit Theorem, the random linear combination is approximately normally distributed, see, for example, [4, Lemma 2.1].
Applying the second assertion of the Tensorization Lemma to (3), we get the statement. ∎
By combining Lemma 3.5 with an -net argument, we obtain a small ball probability estimate for compressible vectors. The only difference from a standard argument here is due to the fact that for , the matrix has typical spectral norm of order rather than in the simplest setting of a centered random matrix with normalized independent entries. The net therefore has to be made “denser” in the direction .
Proposition 3.6.
For any and there are , and depending only on and such that for and arbitrary ,
[TABLE]
Proof.
Choose any and , and fix . It will be convenient to work with parameter . Without loss of generality, we can assume that . By Lemma 3.4, there is which may only depend on such that for every the event
[TABLE]
has probability at least .
Given an (which will be chosen later), define
[TABLE]
We shall partition the set into subsets of the form
[TABLE]
First, we observe that a standard volumetric argument, together with the definition of compressible vectors, implies that for any the set admits a Euclidean \big{(}\frac{\gamma}{16L}+2\nu\big{)}–net of cardinality at most {n\choose{\lfloor\delta n\rfloor}}\big{(}\frac{C^{\prime}L}{\gamma}\big{)}^{\lfloor\delta n\rfloor}, for some universal constant . By the definition of and , for any there is such that \|x-y\|_{2}\leq\big{(}\frac{\gamma}{16L}+2\nu\big{)}=\frac{\gamma}{8L} and \big{|}\sum_{i=1}^{n}(x_{i}-y_{i})\big{|}\leq\frac{\gamma}{4|\widetilde{s}|}, implying that
[TABLE]
everywhere on . Hence,
[TABLE]
Observe further that for all vectors with \big{|}\sum_{i=1}^{n}x_{i}\big{|}\geq\frac{2L+2\gamma}{|\widetilde{s}|}, everywhere on the event we have
[TABLE]
Thus, everywhere on , \big{\|}(B_{n}^{1}(p)-p\,1_{n-1}1_{n}^{\top}+\widetilde{s}\,1_{n-1}1_{n}^{\top})x\big{\|}_{2}\geq\gamma\sqrt{n} for all with or . Combining all the above estimates, we obtain for some universal constant :
[TABLE]
It remains to note that by choosing sufficiently small, we can guarantee that the right hand side of the above inequality is less than
[TABLE]
for every . Then the desired estimate will follow for all sufficiently large $n$ satisfying $\frac{C(L+\gamma)}{\gamma}\big(1-p+\frac{\varepsilon}{2}\big)^{n-1}+2^{-n}\leq\big(1-p+\varepsilon\big)^{n}$. ∎
4. Random averaging in
The main goal of this section is to provide upper bounds on the cardinalities of discretizations of sets of vectors with a given threshold , discussed in the second part of Section 2. According to our “inversion of randomness”, we consider a random vector uniformly distributed on a subset of the integer lattice , and want to show that with probability the scalar product of this vector with a vector of independent Bernoulli() variables has a small threshold value (with respect to the randomness of the Bernoulli vector). First, we define the range of the random vector on the lattice.
Let be some integers and let and be some real numbers. We say that a subset is –admissible if
- •
, where every () is an origin-symmetric subset of ;
- •
is an integer interval of cardinality at least for every ;
- •
is a union of two integer intervals of total cardinality at least and for all ;
- •
;
- •
for all .
Remark 4.1.
The condition for , subject to appropriate rescaling, is equivalent to the fact that the “potential” normal vectors we consider are –incompressible, hence at least components of those vectors are separated from zero by .
Let be an –admissible set, and let be any real valued function on . Fix any , and assume that are independent integer random variables, where each is uniform in . For every , we define a random function by
[TABLE]
, where denotes the expectation with respect to the randomness of the vector with independent Bernoulli() components. The central statement of the section is the following theorem.
Theorem 4.2.
For any , , , there are , depending on and depending only on (and not on ) with the following property. Take , , let be an –admissible set and be a non-negative function in with and such that is –Lipschitz. Then, with defined above, we have
[TABLE]
The crucial feature of the theorem and the most important technical element of this paper, is that the bound on the –norm of the averaged function does not depend on the parameter which controls the probability estimate. In other words, for a given choice of , which determine the value of , the probability bound can be made superexponentially small in .
It is not difficult to check that with the only assumption on the function the above statement is false. For example, take to be the indicator of , assume that . It can be shown that for any natural , on the one hand, the event has probability at least , and, on the other hand, everywhere on we have , because is supported on and (by standard concentration results) has most of its mass located within a (random) integer interval of length . Thus, the probability cannot be made superexponentially small in without taking , hence the lower bound for , to infinity. The condition that the logarithm of the function is –Lipschitz, employed in the theorem, is designed to rule out such situations.
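The averaging operation behind this discussion can be explored on toy inputs. The sketch below rests on an assumption, suggested by the one-step recursion used later in the proof of Lemma 4.6: the averaged function is taken to be $t\mapsto\mathbb{E}_b\,f\big(t+\sum_{i\leq\ell}b_iX_i\big)$ with independent Bernoulli($p$) variables $b_i$, and here the shifts $X_i$ are fixed integers rather than random. Functions on the integers are stored as dictionaries, and all names are hypothetical.

```python
def average_step(f, shift, p):
    """One averaging step g(t) = (1 - p) * f(t) + p * f(t + shift) for f given as {int: value}."""
    g = {}
    for t, value in f.items():
        # b = 0 branch: f(t) contributes to g(t).
        g[t] = g.get(t, 0.0) + (1 - p) * value
        # b = 1 branch: f(t) contributes to g(t - shift), since g(t - shift) reads f(t).
        g[t - shift] = g.get(t - shift, 0.0) + p * value
    return g


def averaged_function(f, shifts, p):
    """Apply the averaging step once for every shift in the list."""
    for shift in shifts:
        f = average_step(f, shift, p)
    return f


def sup_norm(f):
    return max(f.values(), default=0.0)


if __name__ == "__main__":
    spike = {0: 1.0}  # indicator of {0}, with l1-norm equal to 1
    print(sup_norm(averaged_function(spike, [1, 2], 0.5)))  # shifts spread the mass
    print(sup_norm(averaged_function(spike, [1, 1], 0.5)))  # repeated shift recombines mass
```

Each step preserves the $\ell_1$-norm of a non-negative function, so the sup-norm measures how well a given sequence of shifts spreads the mass; repeated shifts spread it less than distinct ones.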
Before proving the theorem, we shall consider the corollary which was (in a somewhat different form) stated in the introduction as Theorem B and which will be used in our net-argument in the next section:
Corollary 4.3.
Let , , . There exist depending on and depending only on (and not on ) with the following property. Take , , and let be an –admissible set. Further, assume that are i.i.d Bernoulli() random variables. Then
[TABLE]
Proof.
Take $n\geq\max\big(n_{4.2},1/\eta_{4.2}^{2}\big)$, and let , and be an –admissible set. Define the function as
[TABLE]
where . Obviously, , and is –Lipschitz, hence, by the assumptions on , is –Lipschitz.
Applying Theorem 4.2 to , we get
[TABLE]
The definition of allows to rewrite the above inequality as
[TABLE]
On the other hand, since
[TABLE]
for some universal constant , the last relation implies
[TABLE]
For every and , the expression
[TABLE]
is the probability that the random sum falls into the interval . Thus, together with elementary relation , valid for any and any random variable , the previous inequality gives
[TABLE]
The statement follows. ∎
In our proof of Theorem 4.2, we will gradually improve delocalization estimates for the functions . Our first (simple) step — Lemma 4.4 — is to obtain estimates on the –norm of the truncated function (with of order ) for an arbitrary integer interval of length at most . Upper bounds of the order will follow from the Lévy–Kolmogorov–Rogozin inequality stated in the preliminaries as Lemma 3.1. At the second step, Proposition 4.5 below, we prove a weaker version of Theorem 4.2 where the parameter is allowed to depend on . At the third step, we remove the dependence of on by using the Lipschitzness of . A discussion of that part of the proof is given after Proposition 4.5.
Lemma 4.4.
There is a universal constant with the following property. Let , , let be a non-negative function with , and let be an –admissible set for some parameters , , and . Further, let . Then deterministically for any integer interval with . In turn, this implies
[TABLE]
for any integer interval of cardinality at least .
Proof.
Let be the random variables from (4). Fix any realization of (so that for all , by the definition of an admissible set and since ), and any integer interval of cardinality at most . Since
[TABLE]
we obtain
[TABLE]
For any ,
[TABLE]
where are Bernoulli() random variables jointly independent with . It remains to note that the Lévy–Kolmogorov–Rogozin inequality (Lemma 3.1), together with the condition for all , implies that for every ,
[TABLE]
for some universal constant . The result follows. ∎
Proposition 4.5.
For any , , and there are and (depending on , , and ) with the following property. Let be a non-negative function with , let , , and let be an –admissible set for some parameters and . Then
[TABLE]
where is defined by (4).
The crucial difference between the above statement and Theorem 4.2 is that in the proposition is allowed to depend on . The proof essentially follows by estimating probabilities that $f_{\mathcal{A},p,\ell}(t)>\max\big(L_{4.5}(N\sqrt{n})^{-1},(1-p+\varepsilon)^{\ell}\,\|f\|_{\infty}\big)$ for a fixed $t$ and taking the union bound over $t$, although the actual argument is more involved. We will need the following definitions.
Let be a parameter, let , , , and be as in the above proposition, and let . We say that a point decays at time if
[TABLE]
Further, given any and a sequence , the descendant sequence for with respect to is a random sequence , where , (and where we set ). The connection of the above statement with these definitions is provided by the following fact: the event that the –norm of is “large” is contained within the event that there exists a descendant sequence such that a proportional number of its elements do not decay. More precisely, we have
Lemma 4.6.
Let , , , , and be as in Proposition 4.5, let , and set . Define the event $\mathcal{E}$ as the subset of the probability space such that there exists a sequence and a point so that the descendant sequence for with respect to satisfies
[TABLE]
Then $\mathcal{E}\supset\big\{\|f_{\mathcal{A},p,\ell}\|_{\infty}>\max\big(L(N\sqrt{n})^{-1},(1-p+\varepsilon)^{\ell}\,\|f\|_{\infty}\big)\big\}$.
Proof.
Fix a realization of such that
[TABLE]
(if such a realization does not exist then there is nothing to prove). We will construct a sequence of integers inductively in inverse order as follows. Take to be any integer such that f_{\mathcal{A},p,\ell}(t_{\ell})>\max\big{(}L(N\sqrt{n})^{-1},(1-p+\varepsilon)^{\ell}\,\|f\|_{\infty}\big{)}. At –st step () we assume that has been defined, and satisfies f_{\mathcal{A},p,\ell}(t_{i})>\max\big{(}L(N\sqrt{n})^{-1},(1-p+\varepsilon)^{\ell}\,\|f\|_{\infty}\big{)}. In view of the relation
[TABLE]
which follows immediately from the definition of , we get that for some . Then we set .
Clearly, the sequence constructed this way, is the descendant sequence for with respect to , which satisfies the conditions
- (a)
for all ;
- (b)
$f_{\mathcal{A},p,\ell}(t_{\ell})>\max\big(L(N\sqrt{n})^{-1},(1-p+\varepsilon)^{\ell}\,\|f\|_{\infty}\big)$.
We will show that these conditions imply (5). Assume that is such that decays at time . According to (6) and the relation between and , we have
[TABLE]
By our definition of decay at time , both and are less than , hence less than , by the relation between and and conditions (a), (b). Thus, one of the values or f_{\mathcal{A},p,i-1}(t_{i-1}+(1-v_{i})X_{i}\big{)} is at most while the other is equal to . This gives
[TABLE]
Applying the last relation for all $i$ where there is a decay and using the monotonicity of the sequence $\big(f_{\mathcal{A},p,j}(t_{j})\big)_{j=0}^{\ell}$, we get for $u=|\{1\leq i\leq\ell:\;t_{i-1}\text{ decays at time }i\}|$:
[TABLE]
whence
[TABLE]
This implies the required lower bound for $\ell-u=|\{1\leq i\leq\ell:\;t_{i-1}\text{ does not decay at time }i\}|$. ∎
Proof of Proposition 4.5.
Let be a parameter to be chosen later. Set
[TABLE]
We will assume that . Let be independent random variables, each uniform on , where .
The proposition follows by applying Lemma 4.6 and a union bound. Observe that for any point such that the last element of a descendant sequence (with respect to some sequence in and with ) satisfies , we have
[TABLE]
Indeed, the definition of the descendant sequence implies that for some ,
[TABLE]
while at the same time the condition and the definition of implies that for some , , hence
[TABLE]
Set
[TABLE]
and observe that, in view of the upper bound on ’s from the definition of an admissible set, and the assumption ,
[TABLE]
Set . Then, with the event defined in Lemma 4.6, we can write
[TABLE]
Finally, fix any with , and . Let be the (random) descendant sequence for with respect to (note that is measurable w.r.t. ). Take any with . Conditioned on any realization of , the variable is uniform on , and
[TABLE]
where at the last step we applied Lemma 4.4 with and used that is either an integer interval or a union of two integer intervals. The same estimate is valid for
[TABLE]
Hence, by Markov’s inequality,
[TABLE]
Applying this estimate for all , we obtain
[TABLE]
whence
[TABLE]
where, we recall, . Finally, we observe that by choosing large enough, we can make the last expression less than for all sufficiently large . This completes the proof of the proposition. ∎
The above result is too weak to be useful for our purposes. The rest of the section is devoted to “refining” the proposition by removing the dependence on from the lower bound on the –norm of the averaged function.
Let us informally describe the idea behind the argument and provide some simple examples. The magnitude of the –norm of essentially depends on how efficient in removing spikes is the averaging step given by the relation . One may hope that if at every step , the number of spikes (coordinates with large magnitudes) is decreased significantly with a probability close to one then the resulting function would have a small –norm with a very large probability (superexponentially close to one).
For a moment, it will be convenient to drop the assumption of a bounded –norm. Consider a family of functions on , indexed by natural numbers , an integer interval , and , and defined as
[TABLE]
where we impose the following restrictions on parameters:
- •
;
- •
The function is “essentially non-constant” in the sense that for any integer interval of length at least .
Note that is –Lipschitz and that the second assumption implies . Assume that a random variable is uniformly distributed on , and define the random average
[TABLE]
We are interested in estimating the proportion of spikes preserved by the averaging; with
[TABLE]
A simple computation taking into account the condition , gives
[TABLE]
and, for ,
[TABLE]
Thus, the efficiency of the averaging, i.e. the small ball probability estimate for , is influenced by the magnitude of or, equivalently, the length of the “valleys” separating the clusters of spikes in . Now, let us discuss how this is related to the Lipschitzness of the logarithm. It is not difficult to check that, in order to satisfy the condition of being “essentially non-constant”, we must choose at least of order . Thus, the smaller is, the wider the valleys between the clusters of spikes, and the stronger the small ball probability estimates for must be. In a sense, the Lipschitzness of the logarithm of , together with the essential non-constantness, affects the averaging indirectly, by influencing the structure of spikes and valleys.
In our actual model, a similar phenomenon holds, although the argument is more complicated, first, because the pattern of spikes does not have to be as regular as in the above example, second, because the spikes are defined as points where the function exceeds a certain threshold rather than points where it takes a specific value. Our measurement of the efficiency of the averaging is more complicated compared to the above example. For a function with relatively many spikes, we compare the –norms of the original function and the average. A crucial step towards proving Theorem 4.2 is the following proposition.
Proposition 4.7.
Let , , and . Further, assume that are non-negative functions in , and satisfies the following conditions:
- •
* is –Lipschitz;*
- •
* for any integer interval of cardinality ;*
- •
There is interval with , such that .
Let be a random variable uniformly distributed on an integer interval of cardinality at least . Then
[TABLE]
Here, are universal constants.
Before proving the proposition, we consider two lemmas.
Lemma 4.8.
Let , and assume that and are such that
[TABLE]
Let . Then \big{\|}pf+(1-p)g\big{\|}_{2}^{2}\leq\big{(}p\|f\|_{2}^{2}+(1-p)\|g\|_{2}^{2}\big{)}-p(1-p)\kappa^{2}k.
Proof.
For any we have
[TABLE]
which implies the estimate. ∎
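The estimate of Lemma 4.8 reflects the pointwise identity $(pa+(1-p)b)^2=pa^2+(1-p)b^2-p(1-p)(a-b)^2$, valid for all real $a,b$ and $p\in[0,1]$. The following sketch checks the identity and the resulting bound numerically on finitely supported functions. It assumes that the hypothesis of the lemma says that $|f(t)-g(t)|\geq\kappa$ for at least $k$ points $t$; that reading, and the function names, are assumptions made for illustration.

```python
def squared_norm(values):
    return sum(v * v for v in values)


def convex_combination_gap(f, g, p):
    """Return (lhs, rhs) of the identity
    |p f + (1-p) g|_2^2 = p |f|_2^2 + (1-p) |g|_2^2 - p (1-p) |f - g|_2^2
    for two equal-length lists f and g."""
    lhs = squared_norm(p * a + (1 - p) * b for a, b in zip(f, g))
    rhs = (p * squared_norm(f) + (1 - p) * squared_norm(g)
           - p * (1 - p) * squared_norm(a - b for a, b in zip(f, g)))
    return lhs, rhs


def lemma_bound(f, g, p, kappa):
    """Upper bound p|f|^2 + (1-p)|g|^2 - p(1-p) kappa^2 k, where k counts
    the points at which f and g differ by at least kappa."""
    k = sum(1 for a, b in zip(f, g) if abs(a - b) >= kappa)
    return p * squared_norm(f) + (1 - p) * squared_norm(g) - p * (1 - p) * kappa ** 2 * k


if __name__ == "__main__":
    f = [1.0, 0.0, 2.0, 0.5]
    g = [0.0, 0.0, 0.5, 0.5]
    lhs, rhs = convex_combination_gap(f, g, 0.3)
    print(lhs, rhs, lemma_bound(f, g, 0.3, 1.0))
```

Since $\|f-g\|_2^2\geq\kappa^2k$ under the assumed hypothesis, the identity immediately gives the inequality stated in the lemma.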
Lemma 4.9.
Let , and . Further, assume that is an integer interval and is a partition of into three subsets (not necessarily subintervals) such that |I_{3}|\in\big{[}\delta|I|/2,\delta|I|\big{]}, , and for all and . Further, assume that is an integer random variable uniformly distributed on an interval of cardinality at least . Then
[TABLE]
Proof.
Without loss of generality, . Fix any subinterval of cardinality at least and at most . We will prove the probability estimate under the condition that belongs to . Then the required result will easily follow by partitioning into subintervals and combining estimates for corresponding conditional probabilities.
Set
[TABLE]
and define
[TABLE]
Observe that, in view of the assumption , for any point we have
[TABLE]
Thus, if then, conditioned on , \big{|}\big{\{}t\in I:\;|f(t)-g(t+X)|\geq\kappa/2\big{\}}\big{|}<\delta|I|/4 holds with probability zero, and the statement follows. Below, we assume that .
Set . Since , we have , whence
[TABLE]
The above estimate immediately gives
[TABLE]
Hence, the number of points such that
[TABLE]
is at most . On the other hand, for every such that (7) does not hold, we clearly have
[TABLE]
Summarizing, we obtain
[TABLE]
whence
[TABLE]
The result follows. ∎
Proof of Proposition 4.7.
Let , and , so that . It is not difficult to see that there is a real interval of the form , where and such that
[TABLE]
We will inductively construct a finite sequence of integer intervals as follows.
At the first step, let ,
[TABLE]
and define (note that by the definition of , exists). In words, we choose to be the largest integer in such that the number of the elements corresponding to “small” values , is at most . If or if for all then we set and complete the process. Otherwise, we go to the second step.
At the -th step, , we define to be the smallest integer in such that (the previous step of the construction guarantees that such exists and belongs to ). We set $t_{k}^{r}:=\max\big\{t\in\widetilde{I}:\;t\geq t_{k}^{\ell};\;|\{s\in\{t_{k}^{\ell},\dots,t\}:\;g_{1}(s)\leq a\}|\leq\delta(t-t_{k}^{\ell}+1)\big\}$, and . If or if for all then set and complete the process; otherwise go to the next step.
Next, we observe some important properties of the constructed sequence.
- (a) The left endpoints of all intervals are contained in , and the union contains the set ; in particular, the cardinality of the union is at least .
- (b) The cardinality of any interval cannot exceed since our assumption on the function , together with the definition of , gives
[TABLE]
In particular, this implies that is strictly less than .
- (c) The condition that is –Lipschitz implies that for any , . Indeed, since for all , we have whenever . On the other hand, the last conclusion in property (b) implies that , as .
- (d) Property (c), in turn, implies that for any we have , whence .
Our goal is to apply Lemma 4.9 to the constructed intervals. For each , we define the partition , where
[TABLE]
Additionally, set $\kappa:=\big(2^{\mu^{2}}-1\big)\cdot 4R$. We define the subset of good indices as
[TABLE]
Note that (8), together with property (a) of the intervals, implies that
[TABLE]
By Lemma 4.9, for every the event
[TABLE]
has probability at most . Hence, the expectation of the sum
[TABLE]
is at most , and in view of Markov’s inequality and the lower bound for ,
[TABLE]
As a final remark, for any realization of such that , we have $\big|\big\{t\in\widetilde{I}:\;|g_{1}(t)-g_{2}(t+Y)|\geq\kappa/2\big\}\big|\geq\frac{\delta}{4}\frac{\mu N}{4}$, whence, in view of Lemma 4.8,
[TABLE]
The result follows. ∎
The estimate on the –norm of the average in Proposition 4.7 involves the parameter which, roughly speaking, determines the cardinality of the largest cluster of spikes in . If the cardinality is small, the estimate given by the proposition becomes weaker. Even assuming the best possible values for , applications of the averaging to obtain from would not provide a bound on which could be translated into a meaningful estimate for the –norm of the average.
Returning to the example that we discussed on page 4, if the function is such that is much less than , i.e. the spikes are rare, then with probability the averaged function will not have any spikes left. When the spikes are located in an irregular fashion, such a strong property does not hold, but the following phenomenon can still be observed: if the spikes are rare, then with probability close to one the averaged function will have far fewer (by a large factor) spikes. In other words, in the regime when there are few points where the function is large, rather than measuring the –norm of the average, it is more useful to consider how the cardinality of the set of spikes shrinks under averaging. Combining this idea with Proposition 4.7, we can derive the following statement:
Proposition 4.10.
For any , , , and there are and with the following property. Let , let , , let be a non-negative function satisfying
- ;
- is –Lipschitz;
- for any integer interval of cardinality ;
- .
For each , let be a random variable uniform on some disjoint union of integer intervals of cardinality at least each; and assume that are independent. Define a random function as
[TABLE]
where is the vector of independent Bernoulli() components. Then
[TABLE]
In words, the above proposition tells us that, given a “preprocessed” function with , after averagings the –norm of the function drops at least by the factor with a probability superexponentially close to one. By applying the proposition several times to a “preprocessed” function given by Proposition 4.5, we will be able to complete the proof of the theorem.
Before proving the proposition, let us consider a simple lemma.
Lemma 4.11.
Let be a non-negative function, let , , , and assume that and that for any integer interval of cardinality we have
[TABLE]
Choose any integers and set
[TABLE]
where is the vector of independent Bernoulli() random variables. Then for any integer interval of cardinality we have
[TABLE]
Proof.
Take any point such that . We have
[TABLE]
so that
[TABLE]
On the other hand, for any interval of cardinality and any choice of , we have, by the assumptions of the lemma,
[TABLE]
whence
[TABLE]
Combining the last inequality with the condition (9), we get the statement. ∎
Proof of Proposition 4.10.
Fix any admissible parameters , , , , and , and set
[TABLE]
We will assume that is sufficiently large so that and, moreover,
[TABLE]
Set
[TABLE]
We fix any function satisfying conditions of the proposition with parameters , , , , . Note that . Define ,
[TABLE]
so that either (if is even) or (if is odd). It is easy to see that is –Lipschitz (because the log-Lipschitzness is preserved under taking convex combinations) and for all admissible .
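The parenthetical claim about convex combinations can be checked in one line. The following is a sketch assuming that log-Lipschitzness with constant $K$ means $h(s)\leq 2^{K|s-t|}h(t)$ for a positive function $h$ and all integers $s,t$; the symbols $h_{1},h_{2},K$ and the base of the logarithm are assumptions made for illustration, and the base plays no role in the argument:

```latex
% Assume h_1, h_2 > 0 satisfy h_j(s) <= 2^{K|s-t|} h_j(t) for j = 1, 2.
% Then for any p in [0,1] and any integers s, t:
\[
  p\,h_{1}(s)+(1-p)\,h_{2}(s)
  \leq p\,2^{K|s-t|}h_{1}(t)+(1-p)\,2^{K|s-t|}h_{2}(t)
  = 2^{K|s-t|}\big(p\,h_{1}(t)+(1-p)\,h_{2}(t)\big).
\]
% Taking logarithms: log_2 (p h_1 + (1-p) h_2) is again K-Lipschitz.
```

The same computation applies to shifted copies of a function, since shifting the argument does not change the log-Lipschitz constant.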
For each , define events
[TABLE]
and
[TABLE]
(we can formally extend the first definition to ). Clearly, for each , and are measurable with respect to the sigma-algebra generated by . Condition for a moment on any realization of , and observe that one of the following two assertions is true:
- holds;
- $\big|\big\{t\in I:\;g_{i}(t)\geq 8R\big\}\big|\geq\mu N$ for some integer interval of cardinality , where we set . Then, applying Proposition 4.7, we get .
Hence,
[TABLE]
This implies that for any , the probability that $\big(\mathcal{E}_{i-1}\cup\widetilde{\mathcal{E}}_{i}\big)^{c}$ holds for at least indices can be estimated as
[TABLE]
Note that the definition of ’s and the triangle inequality imply that the sequence $\big(\|g_{k}\|_{2}\big)_{k\geq 0}$ is non-increasing. Hence, taking in the above formula and in view of our choice of , we get that with probability at least at least one of the following two conditions is satisfied:
- (a) There is such that $\big|\big\{t\in I:\;g_{i}(t)\geq H\big\}\big|\leq\mu N$ for any integer interval of cardinality ; or
- (b) .
It can be checked, however, that condition (b) is improbable. Indeed, in view of the restrictions on the – and –norms of , and Hölder’s inequality,
[TABLE]
whence, applying (10), we get .
Thus, only (a) may hold, so the event
[TABLE]
has probability at least . Applying Lemma 4.11 we get that everywhere on the event
[TABLE]
The second part of our proof resembles the proof of Proposition 4.5, although the argument here is simpler. We observe that there exists a random sequence of integers satisfying
- The sequence $\big(g_{i}(t_{i})\big)_{i=m}^{2m}$ is non-increasing;
- ;
- for all .
On the event
[TABLE]
we necessarily have , , hence, in view of the recursive relation and the deterministic upper bound , we have and for all . Thus,
[TABLE]
We will show that the probability of the latter event is small by considering a union bound over non-random sequences.
Fix any realizations of such that the event defined above holds. Take any non-random sequence and any fixed such that (if such exists). Further, we define random numbers , . Then for any we have
[TABLE]
in view of (11) and our assumption about the distribution of ’s. Hence,
[TABLE]
is at most . This, together with the obvious observation , allows us to estimate the probability of as
[TABLE]
By our definition of the parameters , the rightmost quantity is less than for all sufficiently large . The proof is complete. ∎
Proof of Theorem 4.2.
Fix any admissible parameters , , , . The proof of the theorem is essentially a combination of Proposition 4.5 which provides a rough bound on the –norm which depends on , and subsequent application of Proposition 4.10 to get a refined bound.
We define
[TABLE]
and let be the smallest positive integer such that $\big(p/\sqrt{2}+1-p\big)^{q}\leq L^{-1}$. Further, define as the smallest number in which satisfies
[TABLE]
and set . Now, we fix any satisfying
[TABLE]
fix , and define . It can be checked that with the above assumptions on parameters, we have .
Further, we fix any non-negative function with and such that is –Lipschitz for . Note that, by the above, , and, by Proposition 4.5, the event
[TABLE]
has probability at least .
Further, we split the integer interval into subintervals, each of cardinality at least . Let be the right endpoints of the corresponding subintervals. Observe that by Lemma 4.4, for any and any integer interval of cardinality we have the deterministic relation
[TABLE]
by our definition of . This enables us to apply Proposition 4.10. Applying Proposition 4.10 to the first subinterval, we get that, conditioned on the event , the event
[TABLE]
has probability at least . More generally, for the -th subinterval, the application of Proposition 4.10 gives
[TABLE]
where for each ,
[TABLE]
Taking into account our definition of ,
[TABLE]
In view of the above, the probability of this event can be estimated from below by , which is greater than for all sufficiently large . It remains to choose
[TABLE]
∎
5. Proof of Theorem A
Let us recall the definition of a threshold which we considered in Section 2. For any , any vector and any parameter we define the threshold as the supremum of all such that $\mathcal{L}\big(\sum_{i=1}^{n}b_{i}x_{i},t\big)>Lt$, where are independent Bernoulli() random variables. Note that . On the other hand, as a consequence of the Lévy–Kolmogorov–Rogozin inequality (Lemma 3.1), we obtain
Lemma 5.1.
For every , there are and with the following property. Let , , and let . Then .
Proof.
Take any vector , and let be a subset of cardinality corresponding to the largest (by absolute value) coordinates of , i.e. such that for all and . Since is –incompressible, we have , whence there is such that . Thus, for all . Applying Lemma 3.1, we get
[TABLE]
for all for some depending only on . It remains to choose and $K_{5.1}:=\max\big(\delta^{-1/2},\nu\big)$. The result follows by the definition of the threshold. ∎
Remark 5.2.
The above lemma can also be obtained by applying results of [15], namely, the property that the least common denominator of an incompressible vector is of order at least .
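For very small $n$, the Lévy concentration function that enters the definition of the threshold can be computed by brute force. The sketch below assumes the standard definition $\mathcal{L}(\xi,t)=\sup_{\lambda}\mathbb{P}\{|\xi-\lambda|\leq t\}$; the function names `bernoulli_sum_law` and `levy_concentration` are illustrative and do not come from the paper.

```python
from collections import defaultdict
from itertools import product


def bernoulli_sum_law(x, p):
    """Exact law of sum_i b_i * x_i, with b_i independent Bernoulli(p).

    Enumerates all 2^n outcomes, so it is only meant for small n.
    """
    law = defaultdict(float)
    n = len(x)
    for bits in product((0, 1), repeat=n):
        value = sum(b * xi for b, xi in zip(bits, x))
        ones = sum(bits)
        law[value] += p ** ones * (1 - p) ** (n - ones)
    return dict(law)


def levy_concentration(law, t):
    """sup over lambda of P{|xi - lambda| <= t} for a finitely supported law.

    A closed window of length 2t can always be slid to the right until its
    left endpoint sits on an atom without losing mass, so it suffices to
    try windows [a, a + 2t] with a ranging over the atoms.
    """
    atoms = sorted(law)
    best = 0.0
    for a in atoms:
        mass = sum(law[v] for v in atoms if a <= v <= a + 2 * t)
        best = max(best, mass)
    return best
```

For the vector $(1,1,1,1)$ with $p=1/2$ the sum is binomial, and the sketch returns $6/16$ at $t=0$ and $10/16$ at $t=1/2$; for $(1,2,4,8)$ all $16$ sums are distinct and the value at $t=0$ is $1/16$. The contrast illustrates why arithmetic structure in the coefficients governs anticoncentration.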
Let us discuss what is left in order to complete the proof of Theorem A. The standard decomposition of into sets of compressible and incompressible vectors and the reduction of invertibility over the incompressible vectors to the distance problem for the random normal (see description in Section 2), leave the following question: given a number , show that the probability of the event is close to zero. Here, is a unit normal vector to the first columns of the matrix . Assuming that is a discrete approximation of the set of incompressible vectors with the threshold in , we can write
[TABLE]
(we prefer not to specify at this stage what “almost orthogonal” means quantitatively). Most of the work related to estimating the cardinality of was done in Section 4. Here, we combine Corollary 4.3 with a simple counting argument giving an estimate of the cardinality of a part of the integer lattice with prescribed bounds on the vector coordinates (see Corollary 5.5 in this section). The probability estimate for the event
[TABLE]
would follow as a simple consequence of the Tensorization Lemma 3.2 and individual small ball probability bounds for . Note that if the threshold of the vector was contained in the range , such estimates would immediately follow from the definition of the threshold. However, the vector is only an approximation of another vector with a small threshold. Thus, to make the conclusion, we will need a statement which asserts that for a given vector one can find its lattice approximation which preserves (to some extent) the anticoncentration properties of the corresponding random linear combination:
Lemma 5.3.
Let , let be a vector and , be numbers such that for mutually independent Bernoulli() random variables we have $\mathbb{P}\big\{\big|\sum_{i=1}^{n}b_{i}y_{i}-\lambda\big|\leq t\big\}\leq Lt$ for all . Then there exists a vector having the following properties:
- ;
- $\mathbb{P}\big\{\big|\sum_{i=1}^{n}b_{i}y_{i}^{\prime}-\lambda\big|\leq t\big\}\leq C_{5.3}\,Lt$ for all ;
- $\mathcal{L}\big(\sum_{i=1}^{n}b_{i}y_{i}^{\prime},\sqrt{n}\big)\geq c_{5.3}\,\mathcal{L}\big(\sum_{i=1}^{n}b_{i}y_{i},\sqrt{n}\big)$;
- $\big|\sum_{i=1}^{n}y_{i}-\sum_{i=1}^{n}y_{i}^{\prime}\big|\leq C_{5.3}\sqrt{n}$.
Here, are universal constants.
The first and the last property of will be used to estimate the Euclidean norm of : the bound on provides control of while the relation $\big|\sum_{i=1}^{n}y_{i}-\sum_{i=1}^{n}y_{i}^{\prime}\big|\leq C_{5.3}\sqrt{n}$ implies $\big\|(s+p)\,1_{n}1_{n}^{\top}(y-y^{\prime})\big\|_{2}\leq C_{5.3}|s+p|n$.
The proof of Lemma 5.3 is based on the well-known concept of randomized rounding [12] (see also [1, 7, 11] for some recent applications). The first use of this method in the context of matrix invertibility is, to the best of the author’s knowledge, due to G. Livshyts [11]. In [11], randomized rounding is used to choose a best lattice approximation for a vector, which in turn is applied to the construction of –nets; our work follows the same principle. We note that, unlike [11], in the present paper we need to explicitly control the Lévy concentration function and the small ball probability estimates for the approximating vector (the second and the third property in the statement).
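As an illustration of the rounding step only, here is a minimal sketch of randomized rounding in its textbook form: each coordinate is rounded down or up independently, with the probability of rounding up equal to the fractional part, so that the rounded coordinate has mean equal to the original one. The function name and the use of Python's `random.Random` are assumptions for illustration; the proof uses the existence of a good realization, not a sampling procedure.

```python
import math
import random


def randomized_rounding(y, rng):
    """Round each y_i to floor(y_i) or floor(y_i) + 1 independently.

    The probability of rounding up is the fractional part of y_i, so the
    rounded value has expectation y_i, variance at most 1/4, and differs
    from y_i by less than 1 in absolute value.
    """
    rounded = []
    for yi in y:
        lo = math.floor(yi)
        frac = yi - lo  # probability of rounding up
        rounded.append(lo + (1 if rng.random() < frac else 0))
    return rounded


if __name__ == "__main__":
    rng = random.Random(0)
    y = [0.25, -1.5, 3.0, 2.75]
    print(randomized_rounding(y, rng))
```

Because the coordinatewise errors are independent, centered and of variance at most $1/4$, the error of any linear combination with $0/1$ coefficients has variance at most $n/4$, which is the kind of bound that the Markov-type estimates in the proof rely on.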
Proof of Lemma 5.3.
Fix a vector , and let be independent Bernoulli() random variables. Further, let be random variables jointly independent with , such that for each , takes values and with probabilities and , respectively (so that ). Define random vector , and observe that with probability one .
Fix for a moment any and denote by the collection of all such that $\big|\sum_{i=1}^{n}v_{i}y_{i}-\lambda\big|>2w$. Take any . Note that is the sum of independent variables, each of mean zero and variance at most . Hence, by Markov’s inequality,
[TABLE]
Thus, if is the (random) collection of all vectors such that $\big|\sum_{i=1}^{n}v_{i}\widetilde{y}_{i}-\lambda\big|>w$ then the above estimate immediately implies for an arbitrary subset :
[TABLE]
We take in the above relation and apply it for , , so that
[TABLE]
for any , where we have used that, by the assumption on ,
[TABLE]
The relation implies that for all ,
[TABLE]
An application of Markov’s inequality, with , gives
[TABLE]
Together with the condition on the small ball probability of random sums , this implies that there is an event measurable with respect to and with such that for any realization of from ,
[TABLE]
for some universal constant .
Further, we will derive lower bounds on the anticoncentration function of the sum . The argument is very similar to the one above, and we will skip some details. Let be a number such that
[TABLE]
where
[TABLE]
Further, denote
[TABLE]
Take any . Since the variance of the random sum is at most , we get
[TABLE]
Hence,
[TABLE]
so that with probability at least we have
[TABLE]
Denote by the event that (12) holds (observe that the event is measurable with respect to ). Note that for any realization of from the event , we have
[TABLE]
This immediately implies
[TABLE]
As the last step of the proof, we note that since the variance of the sum is at most , there is an event measurable with respect to and of probability at least such that everywhere on , $\big|\sum_{i=1}^{n}(y_{i}-\widetilde{y}_{i})\big|\leq\sqrt{12n/11}$.
Finally, since , there exists a realization of the random vector from the intersection . It is straightforward to check that satisfies all conditions of the lemma. ∎
Given any , , any and , we construct an integer vector as follows: take and observe that, by the definition of the threshold,
[TABLE]
Hence, by Lemma 5.3, there is a vector satisfying
- $\big\|\frac{\sqrt{n}}{\mathcal{T}_{p}(x,L)}\,x-\mathbf{Y}(p,x,L,s)\big\|_{\infty}\leq 1$;
- $\mathbb{P}\big\{\big|\sum_{i=1}^{n}b_{i}\mathbf{Y}_{i}(p,x,L,s)+\frac{s\sqrt{n}}{\mathcal{T}_{p}(x,L)}\sum_{i=1}^{n}x_{i}\big|\leq t\big\}$ for all ;
- $\mathcal{L}\big(\sum_{i=1}^{n}b_{i}\mathbf{Y}_{i}(p,x,L,s),\sqrt{n}\big)\geq c_{5.3}\,L\,\mathcal{T}_{p}(x,L)$;
- $\big|\frac{\sqrt{n}}{\mathcal{T}_{p}(x,L)}\sum_{i=1}^{n}x_{i}-\sum_{i=1}^{n}\mathbf{Y}_{i}(p,x,L,s)\big|\leq C_{5.3}\sqrt{n}$.
The vector with the above properties does not have to be unique, however, from now on we fix a single admissible vector for each –tuple .
Lemma 5.4.
For any there is a subset of permutations on with , having the following property. Let , , , , , and let . Then there is such that the vector $\widetilde{y}=\big(\mathbf{Y}_{\sigma(i)}(p,x,L,s)\big)_{i=1}^{n}$ satisfies
[TABLE]
and
[TABLE]
Here, is a universal constant.
Proof.
If then the statement is empty, and can be chosen arbitrarily. We will therefore assume that . We start by defining the collection of permutations . Let be the largest integer such that . For every collection of subsets with , , take any permutation such that $\sigma\big(\big[\lfloor 2^{-j}\delta n\rfloor\big]\big)=I_{j}$, . We then compose of all such permutations (where we pick a single admissible permutation for every collection of subsets). It is not difficult to check that the total number of admissible collections , hence the cardinality of , is bounded above by for a universal constant .
It remains to check the properties of . Take any vector , and let be sets of indices corresponding to largest (by absolute value) coordinates of . Namely, is a subset of cardinality such that for all and . Let be a permutation such that
[TABLE]
Set $\widetilde{y}:=\big(\mathbf{Y}_{\sigma(i)}(p,x,L,s)\big)_{i=1}^{n}$.
By our construction, for all . Since is incompressible,
[TABLE]
whence there exists an index such that . Thus, for all , whence, in view of the definition of vector ,
[TABLE]
The upper bounds on coordinates are obtained in a similar fashion. Take any . Since for all , and has Euclidean norm one, we get
[TABLE]
Hence,
[TABLE]
∎
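The proofs of Lemma 5.1 and Lemma 5.4 both order the coordinates by absolute value and use incompressibility to bound the large coordinates from below. The sketch below illustrates that step, assuming the usual definition from [15] in which a unit vector is compressible when it lies within Euclidean distance $\nu$ of a vector with at most $\delta n$ nonzero coordinates; the function names are hypothetical.

```python
import math


def distance_to_sparse(x, m):
    """Euclidean distance from x to the set of vectors with at most m nonzeros.

    The nearest m-sparse vector keeps the m largest coordinates (by absolute
    value) and zeroes the rest, so the distance is the norm of the tail.
    """
    tail = sorted((abs(v) for v in x), reverse=True)[m:]
    return math.sqrt(sum(v * v for v in tail))


def is_incompressible(x, delta, nu):
    """True if x is farther than nu from every floor(delta * n)-sparse vector."""
    m = math.floor(delta * len(x))
    return distance_to_sparse(x, m) > nu


def kth_largest_abs(x, k):
    """The k-th largest absolute value among the coordinates of x (k >= 1)."""
    return sorted((abs(v) for v in x), reverse=True)[k - 1]
```

If the tail beyond the $m$ largest coordinates has norm greater than $\nu$, then some tail coordinate has absolute value at least $\nu/\sqrt{n}$, and therefore so does each of the $m$ largest coordinates. For the flat unit vector with $n=16$, $\delta=1/4$ and $\nu=1/2$, the tail norm is $\sqrt{3}/2$ and every coordinate equals $1/4\geq\nu/\sqrt{n}=1/8$.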
Let , and . Further, let be a number such that
[TABLE]
Define a subset as follows: we take , where
- For all and , we have
[TABLE]
- For , we have
[TABLE]
- $A_{1}:=\mathbb{Z}\cap\Big[-\Big\lceil\frac{2\sqrt{n}}{T}\Big\rceil-1,\Big\lceil\frac{2\sqrt{n}}{T}\Big\rceil+1\Big]\setminus\Big[1-\Big\lfloor\frac{\nu}{T}\Big\rfloor,\Big\lfloor\frac{\nu}{T}\Big\rfloor-1\Big]$.
Lemma 5.4 immediately implies
Corollary 5.5.
For any there is a subset of permutations on with , having the following property. Let , , , , , , and let be such that . Then there is such that the vector $\big(\mathbf{Y}_{\sigma(i)}(p,x,L,s)\big)_{i=1}^{n}$ belongs to .
The next crucial observation, which will enable us to apply results from Section 4, is
Lemma 5.6.
For any , there are and with the following property. Take any , and set $N:=\big\lfloor\frac{\nu}{T}\big\rfloor-1$. Then the subset defined above is –admissible (with the notion taken from Section 4).
Now, everything is ready to prove the main result of the paper.
Proof of Theorem A.
Fix any , , and assume that and (we will impose additional restrictions on as the proof goes on). Fix any . Our goal is to estimate from above
[TABLE]
for any . Set
[TABLE]
Applying formula (2) and Proposition 3.6, we get for any :
[TABLE]
where is a unit random vector measurable with respect to and orthogonal to . Applying Proposition 3.6 a second time, we obtain that the event $\big\{Y_{n}\in{\rm Comp}_{n}(\delta,\nu)\big\}$ has probability at most $\big(1-p+\varepsilon\big)^{n}$. Further, for every vector , according to Lemma 5.1, whenever . Set
[TABLE]
Then, in view of the above, we have
[TABLE]
Further, for any , using the independence of and and the definition of the threshold, we can write
[TABLE]
Hence, for every ,
[TABLE]
Fix any and set and
[TABLE]
where denotes the constant such that
[TABLE]
(which exists, according to Lemma 3.4). Further, let be the set of permutations from Corollary 5.5. Take any such that . Then the vector satisfies (see page 5)
- (a) $\big\|\frac{\sqrt{n}}{\mathcal{T}_{p}(x,L)}\,x-\mathbf{Y}(p,x,L,s)\big\|_{\infty}\leq 1$;
- (b) $\mathbb{P}\big\{\big|\sum_{i=1}^{n}b_{i}\,\mathbf{Y}_{i}(p,x,L,s)+s\,\frac{\sqrt{n}}{\mathcal{T}_{p}(x,L)}\sum_{i=1}^{n}x_{i}\big|\leq\tau\big\}\leq\frac{C_{5.3}\,L\,T}{\sqrt{n}}\,\tau$ for all ;
- (c) $\mathcal{L}\big(\sum_{i=1}^{n}b_{i}\,\mathbf{Y}_{i}(p,x,L,s),\sqrt{n}\big)\geq c_{5.3}\,L\,\mathcal{T}_{p}(x,L)\geq\frac{c_{5.3}}{2}LT\geq\frac{c_{5.3}L\nu}{4N}$;
- (d) $\big|\sum_{i=1}^{n}\frac{\sqrt{n}}{\mathcal{T}_{p}(x,L)}\,x_{i}-\sum_{i=1}^{n}\mathbf{Y}_{i}(p,x,L,s)\big|\leq C_{5.3}\sqrt{n}$.
Note that a combination of (b) and (d) gives
[TABLE]
Define the subset as
[TABLE]
and let be defined as
[TABLE]
Then, by Corollary 5.5 and the above remarks, for every with . Set $Q:=\big\{z\in\mathbb{R}^{n}:\;\big|\sum_{i=1}^{n}z_{i}\big|\leq C_{5.3}\sqrt{n}\big\}$. Then the last assertion, together with properties (a) and (d) above, implies
[TABLE]
Thus, we obtain the relation
[TABLE]
Now, let us estimate the probability that is small for a fixed . By our definition of the set , we have
[TABLE]
Hence, applying Lemma 3.2, we get
[TABLE]
Observe that for any we have
[TABLE]
where we have used that . Then the above relations, together with a net argument, imply
[TABLE]
The last — and the most important — step of the proof is to bound from above the cardinality of . In view of Corollary 5.5 and the definition of and , we have
[TABLE]
Further, observe that by Lemma 5.6, the set is –admissible. Hence, Corollary 4.3 is applicable, and the definition of gives for all large enough:
[TABLE]
Combining this with the above relations and recalling that $N=\big\lfloor\frac{\nu}{T}\big\rfloor-1$, we obtain
[TABLE]
for all sufficiently large , where the last relation follows from the choice of .
Returning to the small ball probability for , we get
[TABLE]
for all sufficiently large . Since was chosen arbitrarily, the result follows. ∎
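Theorem A is an asymptotic statement, but the quantity it describes can be computed exactly in tiny dimensions. The following sketch counts singular $n\times n$ matrices with entries $\pm 1$ by exhaustive enumeration, using an exact integer determinant; it is an illustration for the symmetric $\pm 1$ model only, the function names are hypothetical, and the running time grows like $2^{n^{2}}$, so it is only usable for $n\leq 4$ or so.

```python
from itertools import product


def det_int(rows):
    """Exact integer determinant by Laplace expansion along the first row."""
    n = len(rows)
    if n == 1:
        return rows[0][0]
    total = 0
    for j in range(n):
        # Minor obtained by deleting the first row and the j-th column.
        minor = [r[:j] + r[j + 1:] for r in rows[1:]]
        total += (-1) ** j * rows[0][j] * det_int(minor)
    return total


def singular_count(n):
    """Number of singular n x n matrices with entries in {-1, +1}."""
    count = 0
    for entries in product((-1, 1), repeat=n * n):
        rows = [entries[i * n:(i + 1) * n] for i in range(n)]
        if det_int(rows) == 0:
            count += 1
    return count


if __name__ == "__main__":
    for n in (1, 2, 3):
        print(n, singular_count(n), 2 ** (n * n))
```

For $n=2$ the count is $8$ out of $16$, and for $n=3$ it is $320$ out of $512$. These small cases are dominated by low-dimensional coincidences and say nothing about the rate $(1/2+o_{n}(1))^{n}$, which only emerges as $n$ grows.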
Acknowledgement. I would like to thank the Department of Mathematical and Statistical Sciences, University of Alberta, which I visited in December 2018 and where the first draft of this work was completed. I would also like to thank Prof. Terence Tao and the anonymous Referees for valuable remarks.
References
- [1] N. Alon and B. Klartag, Optimal compression of approximate inner products and dimension reduction, in 58th Annual IEEE Symposium on Foundations of Computer Science—FOCS 2017, 639–650, IEEE Computer Soc., Los Alamitos, CA. MR 3734268
- [2] R. Arratia and S. De Salvo, On the singularity of random Bernoulli matrices—novel integer partitions and lower bound expansions, Ann. Comb. 17 (2013), no. 2, 251–274. MR 3056767
- [3] J. Bourgain, V. H. Vu and P. M. Wood, On the singularity probability of discrete random matrices, J. Funct. Anal. 258 (2010), no. 2, 559–603. MR 2557947
- [4] D. Chafaï and K. Tikhomirov, On the convergence of the extremal eigenvalues of empirical covariance matrices with dependence, Probab. Theory Related Fields 170 (2018), no. 3-4, 847–889. MR 3773802
- [5] P. Erdös, On a lemma of Littlewood and Offord, Bull. Amer. Math. Soc. 51 (1945), 898–902. MR 0014608
- [6] J. Kahn, J. Komlós and E. Szemerédi, On the probability that a random $\pm 1$-matrix is singular, J. Amer. Math. Soc. 8 (1995), no. 1, 223–240. MR 1260107
- [7] B. Klartag and G. Livshyts, The lower bound for Koldobsky’s slicing inequality via random rounding, arXiv:1810.06189
- [8] J. Komlós, On the determinant of $(0,1)$ matrices, Studia Sci. Math. Hungar 2 (1967), 7–21. MR 0221962
