Singularity of random Bernoulli matrices

Konstantin Tikhomirov

arXiv:1812.09016·math.PR·August 27, 2019

TL;DR

This paper proves that an $n\times n$ random matrix with independent $\pm 1$ entries is singular with probability $(1/2+o_n(1))^n$, resolving a longstanding open problem in random matrix theory.

Contribution

It determines the exponential rate of the singularity probability for random Bernoulli matrices, a problem that had remained open for decades.

Findings

01

The probability of singularity is $(1/2+o_n(1))^n$ as $n$ grows.

02

Provides a rigorous proof settling the old conjecture.

03

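Extends the result to matrices with independent Bernoulli($p$) entries, $p\in(0,1/2]$, with small ball probability estimates for the smallest singular value.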

Abstract

For each $n$, let $M_{n}$ be an $n \times n$ random matrix with independent $\pm 1$ entries. We show that ${\mathbb P}\{\mbox{$M_n$ is singular}\}=(1/2+o_n(1))^n$, which settles an old problem. Some generalizations are considered.

Equations

{\mathbb{P}}\{\mbox{$M_{n}$ is singular}\}=\bigg{(}\frac{1}{2}+o_{n}(1)\bigg{)}^{n}

{\mathbb{P}}\big{\{}s_{\min}(B_{n}(p)+s\,1_{n}1_{n}^{\top})\leq t/\sqrt{n}\big{\}}\leq\big{(}1-p+\varepsilon\big{)}^{n}+C_{\text{\tiny{p,$\varepsilon$}}}\,t,\quad t>0.

{\mathbb{P}}\big{\{}\mbox{$B_{n}(p)$ is singular}\big{\}}=\big{(}1-p+o_{n}(1)\big{)}^{n},

{\mathbb{P}}\big{\{}\mbox{the matrix with columns $X_{1},\dots,X_{n}$ is singular}\big{\}}=e^{o_{n}(n)}\sum\limits_{V}{\mathbb{P}}(A_{V}),

f(m)=\begin{cases}\frac{1}{4},&\mbox{ if $m=\pm 1$};\\ \frac{1}{2},&\mbox{ if $m=0$,}\end{cases}

\begin{split}{\mathbb{P}}\big{\{}s_{\min}(A_{n})\leq t/\sqrt{n}\big{\}}&\leq{\mathbb{P}}\big{\{}\|A_{n}x\|_{2}\leq t/\sqrt{n}\mbox{ for some }x\in{\rm Comp}_{n}(\delta,\nu)\big{\}}\\ &\hskip 28.45274pt+{\mathbb{P}}\big{\{}\|A_{n}x\|_{2}\leq t/\sqrt{n}\mbox{ for some }x\in{\rm Incomp}_{n}(\delta,\nu)\big{\}}\\ &\leq{\mathbb{P}}\big{\{}\|A_{n}x\|_{2}\leq t/\sqrt{n}\mbox{ for some }x\in{\rm Comp}_{n}(\delta,\nu)\big{\}}\\ &\hskip 28.45274pt+\frac{1}{\delta}{\mathbb{P}}\big{\{}|\langle{\rm col}_{n}(A_{n}),Y_{n}\rangle|\leq t/\nu\big{\}},\end{split}

\mathcal{E}_{{\mathcal{N}}_{T}}:=\big{\{}\mbox{There is a vector }x\in{\mathcal{N}}_{T}\mbox{ ``almost orthogonal'' to ${\rm col}_{1},\dots,{\rm col}_{n-1}$}\big{\}},

{\mathbb{P}}\{Y_{n}\in D_{T}\}\leq{\mathbb{P}}(\mathcal{E}_{{\mathcal{N}}_{T}})\leq|{\mathcal{N}}_{T}|\,\max\limits_{x\in{\mathcal{N}}_{T}}{\mathbb{P}}\big{\{}\mbox{$x$ is ``almost orthogonal'' to ${\rm col}_{1},\dots,{\rm col}_{n-1}$}\big{\}},

{\mathcal{L}}_{b}\Big{(}\sum\limits_{i=1}^{n}b_{i}\xi_{i},t\Big{)}:=\sup\limits_{\lambda\in\mathbb{R}}\sum\limits_{(v_{j})_{j=1}^{n}\in\{0,1\}^{n}}p^{\sum_{j}v_{j}}(1-p)^{n-\sum_{j}v_{j}}{\bf 1}_{[-t,t]}\big{(}\lambda+v_{1}\xi_{1}+\dots+v_{n}\xi_{n}\big{)},

\mathcal{A}:=\{-2N,\dots,-N-1,N+1,\dots,2N\}^{\lfloor\delta n\rfloor}\times\{-N,-N+1,\dots,N\}^{n-\lfloor\delta n\rfloor}.

{\mathbb{P}}_{\xi}\big{\{}{\mathcal{L}}_{b}\big{(}b_{1}\xi_{1}+\dots+b_{n}\xi_{n},\sqrt{n}\big{)}>L_{\text{\tiny B}}N^{-1}\big{\}}\leq e^{-M\,n}.

\big{\|}(x_{1},x_{2},\dots)\big{\|}_{q}=\bigg{(}\sum\limits_{i}|x_{i}|^{q}\bigg{)}^{1/q},\quad 1\leq q<\infty;\quad\mbox{and}\quad\big{\|}(x_{1},x_{2},\dots)\big{\|}_{\infty}=\max\limits_{i}|x_{i}|.

{\mathcal{L}}\big{(}\xi,t\big{)}:=\sup\limits_{\lambda\in\mathbb{R}}{\mathbb{P}}\big{\{}|\xi-\lambda|\leq t\big{\}},\quad t\geq 0.

{\mathcal{L}}\Big{(}\sum_{i=1}^{m}\xi_{i},r\Big{)}\leq\frac{C_{\text{\tiny 3.1}}\,r}{\sqrt{\sum_{i=1}^{m}(1-{\mathcal{L}}(\xi_{i},r_{i}))r_{i}^{2}}}.

{\mathbb{P}}\big{\{}|\xi_{k}|\leq\varepsilon\}\leq K\varepsilon.

{\mathbb{P}}\big{\{}\|(\xi_{1},\xi_{2},\dots,\xi_{m})\|_{2}\leq\varepsilon\sqrt{m}\big{\}}\leq(C_{\text{\tiny 3.2}}\,K\,\varepsilon)^{m},

{\mathbb{P}}\big{\{}\|(\xi_{1},\xi_{2},\dots,\xi_{m})\|_{2}\leq\eta\sqrt{\varepsilon m}\big{\}}\leq\bigg{(}\frac{e}{\varepsilon}\bigg{)}^{\varepsilon m}\tau^{m-\varepsilon m}.

{\mathbb{P}}\big{\{}\big{\|}(B_{n}^{1}(p)+s\,1_{n-1}1_{n}^{\top})x\big{\|}_{2}\leq\gamma_{\text{\tiny 3.5}}\sqrt{\varepsilon n}\big{\}}\leq\bigg{(}\frac{e}{\varepsilon}\bigg{)}^{\varepsilon(n-1)}(1-p)^{(n-1)(1-\varepsilon)}.

{\mathcal{L}}\Big{(}\sum_{i=1}^{n}b_{i}x_{i},r\Big{)}\leq 1-p

{\mathbb{P}}\big{\{}\big{\|}(B_{n}^{1}(p)+s\,1_{n-1}1_{n}^{\top})x\big{\|}_{2}\leq\gamma_{\text{\tiny 3.6}}\sqrt{n}\mbox{ for some }x\in{\rm Comp}_{n}(\delta_{\text{\tiny 3.6}},\nu_{\text{\tiny 3.6}})\big{\}}\leq\big{(}1-p+\varepsilon\big{)}^{n}.

\mathcal{E}:=\big{\{}\|B_{n}^{1}(p)-p\,1_{n-1}1_{n}^{\top}\|\leq L\sqrt{n}\big{\}}

\delta:=\widetilde{\varepsilon};\quad\gamma:=\gamma_{\text{\tiny 3.5}}\sqrt{\widetilde{\varepsilon}};\quad\nu:=\frac{\gamma}{32L}.

S_{\ell}:={\rm Comp}_{n}(\delta,\nu)\cap\Big{\{}x\in\mathbb{R}^{n}:\;\sum\nolimits_{i=1}^{n}x_{i}\in\Big{[}\frac{\gamma\ell}{4|\widetilde{s}|},\frac{\gamma(\ell+1)}{4|\widetilde{s}|}\Big{)}\Big{\}},\quad\ell\in\mathbb{Z}.

\big{\|}(B_{n}^{1}(p)-p\,1_{n-1}1_{n}^{\top}+\widetilde{s}\,1_{n-1}1_{n}^{\top})(x-y)\big{\|}_{2}\leq\|B_{n}^{1}(p)-p\,1_{n-1}1_{n}^{\top}\|\,\frac{\gamma}{8L}+|\widetilde{s}|\,\sqrt{n-1}\frac{\gamma}{4|\widetilde{s}|}<\frac{\gamma\sqrt{n}}{2}

\begin{split}&{\mathbb{P}}\big{(}\big{\{}\big{\|}(B_{n}^{1}(p)-p\,1_{n-1}1_{n}^{\top}+\widetilde{s}\,1_{n-1}1_{n}^{\top})x\big{\|}_{2}\leq\gamma\sqrt{n}/2\mbox{ for some }x\in S_{\ell}\big{\}}\cap\mathcal{E}\big{)}\\ &\hskip 85.35826pt\leq|{\mathcal{N}}_{\ell}|\,\max\limits_{x\in{\mathcal{N}}_{\ell}}{\mathbb{P}}\big{\{}\big{\|}(B_{n}^{1}(p)-p\,1_{n-1}1_{n}^{\top}+\widetilde{s}\,1_{n-1}1_{n}^{\top})x\big{\|}_{2}\leq\gamma\sqrt{n}\big{\}}\\ &\hskip 85.35826pt\leq{n\choose{\lfloor\delta n\rfloor}}\bigg{(}\frac{C^{\prime}L}{\gamma}\bigg{)}^{\lfloor\delta n\rfloor}\bigg{(}\frac{e}{\widetilde{\varepsilon}}\bigg{)}^{\widetilde{\varepsilon}(n-1)}(1-p)^{(n-1)(1-\widetilde{\varepsilon})}.\end{split}

\big{\|}(B_{n}^{1}(p)-p\,1_{n-1}1_{n}^{\top}+\widetilde{s}\,1_{n-1}1_{n}^{\top})x\big{\|}_{2}\geq|\widetilde{s}|\,\sqrt{n-1}\,\Big{|}\sum_{i=1}^{n}x_{i}\Big{|}-L\sqrt{n}>\gamma\sqrt{n}.

\begin{split}&{\mathbb{P}}\big{\{}\big{\|}(B_{n}^{1}(p)+s\,1_{n-1}1_{n}^{\top})x\big{\|}_{2}\leq\gamma\sqrt{n}/2\mbox{ for some }x\in{\rm Comp}_{n}(\delta,\nu)\big{\}}\\ &\hskip 28.45274pt\leq\frac{C(L+\gamma)}{\gamma}{n\choose{\lfloor\delta n\rfloor}}\bigg{(}\frac{C^{\prime}L}{\gamma}\bigg{)}^{\lfloor\delta n\rfloor}\bigg{(}\frac{e}{\widetilde{\varepsilon}}\bigg{)}^{\widetilde{\varepsilon}(n-1)}(1-p)^{(n-1)(1-\widetilde{\varepsilon})}+2^{-n}.\end{split}

Full text

Singularity of random Bernoulli matrices

Konstantin Tikhomirov

School of Mathematics, Georgia Institute of Technology

Abstract.

For each $n$, let $M_{n}$ be an $n\times n$ random matrix with independent $\pm 1$ entries. We show that ${\mathbb{P}}\{\mbox{$M_{n}$ is singular}\}=(1/2+o_{n}(1))^{n}$, which settles an old problem. Some generalizations are considered.

1. Introduction

Let $X_{1},X_{2},\dots,X_{n}$ be independent vectors, each $X_{i}$ uniformly distributed on vertices of the discrete cube $\{-1,1\}^{n}$ . What is the probability that $X_{1},\dots,X_{n}$ are linearly independent?

The question has attracted considerable attention in the literature. It can be equivalently restated as a question about singularity of an $n\times n$ matrix $M_{n}$ with independent $\pm 1$ entries. J. Komlós [8] showed that ${\mathbb{P}}\{\mbox{$M_{n}$ is singular}\}=o_{n}(1)$. Much later, the bound ${\mathbb{P}}\{\mbox{$M_{n}$ is singular}\}\leq 0.999^{n}$ was obtained by J. Kahn, J. Komlós and E. Szemerédi in [6]. The upper bound was successively improved to $0.939^{n}$ in [17] and $(3/4+o_{n}(1))^{n}$ in [18] by T. Tao and V. Vu, and to $(1/\sqrt{2}+o_{n}(1))^{n}$ by J. Bourgain, V. Vu and P. Wood in [3].

It has been conjectured that

$${\mathbb{P}}\{\mbox{$M_{n}$ is singular}\}=\bigg{(}\frac{1}{2}+o_{n}(1)\bigg{)}^{n}\tag{1}$$

(see, for example, [3, Conjecture 1.1], [22, Conjecture 7.1], [23, Conjecture 2.1] as well as some stronger conjectures in [2]). In this paper, we confirm the conjecture and, moreover, provide quantitative small ball probability estimates for the smallest singular value of $M_{n}$. We extend our analysis to random matrices with independent Bernoulli($p$) entries. Let $1_{n}$ denote the $n$-dimensional vector of all ones. The main result of this paper can be formulated as follows.

Theorem A.

For every $p\in(0,1/2]$ and $\varepsilon>0$ there are $n_{p,\varepsilon},C_{p,\varepsilon}>0$ depending only on $p$ and $\varepsilon$ with the following property. Let $n\geq n_{p,\varepsilon}$, and let $B_{n}(p)$ be an $n\times n$ random matrix with independent entries $b_{ij}$ such that ${\mathbb{P}}\{b_{ij}=1\}=p$ and ${\mathbb{P}}\{b_{ij}=0\}=1-p$. Then for any $s\in[-1,0]$,

$${\mathbb{P}}\big{\{}s_{\min}(B_{n}(p)+s\,1_{n}1_{n}^{\top})\leq t/\sqrt{n}\big{\}}\leq\big{(}1-p+\varepsilon\big{)}^{n}+C_{p,\varepsilon}\,t,\quad t>0.$$

It is easy to see that the probability that the first column of $B_{n}(p)$ is zero equals $(1-p)^{n}$. Thus, the theorem implies that, for a fixed $p\in(0,1/2]$,

$${\mathbb{P}}\big{\{}\mbox{$B_{n}(p)$ is singular}\big{\}}=\big{(}1-p+o_{n}(1)\big{)}^{n},$$

and further, when applied with $p=1/2$ and $s=-1/2$ , gives (1).
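For very small $n$ the quantities above can be computed exactly by brute force. The following Python sketch is an illustration only and is not part of the paper's argument: the function names `det` and `singular_probability` are made up for this example, and the enumeration over all $2^{n^2}$ matrices is feasible only for $n\leq 4$ or so. It evaluates ${\mathbb{P}}\{B_{n}(p)+s\,1_{n}1_{n}^{\top}\mbox{ is singular}\}$ in exact rational arithmetic, so the value can be compared with the lower bound $(1-p)^{n}$ coming from a zero column.

```python
from fractions import Fraction
from itertools import product


def det(rows):
    """Exact determinant by Laplace expansion along the first row."""
    n = len(rows)
    if n == 1:
        return rows[0][0]
    total = 0
    for j in range(n):
        # Minor obtained by deleting row 0 and column j.
        minor = [r[:j] + r[j + 1:] for r in rows[1:]]
        total += (-1) ** j * rows[0][j] * det(minor)
    return total


def singular_probability(n, p, s=Fraction(0)):
    """Exact P{B_n(p) + s * 1 1^T is singular}; p and s should be Fractions.

    Hypothetical helper for illustration: enumerates all 0/1 matrices.
    """
    prob = Fraction(0)
    for bits in product((0, 1), repeat=n * n):
        rows = [tuple(bits[i * n + j] + s for j in range(n)) for i in range(n)]
        if det(rows) == 0:
            ones = sum(bits)
            prob += p ** ones * (1 - p) ** (n * n - ones)
    return prob


if __name__ == "__main__":
    half = Fraction(1, 2)
    for n in (2, 3):
        # s = -1/2 rescales the +-1 matrix M_n by 1/2, so singularity is unchanged.
        print(n, singular_probability(n, half), singular_probability(n, half, -half),
              (1 - half) ** n)
```

With $p=1/2$ and $s=-1/2$ the shifted matrix is $\frac{1}{2}M_{n}$, so the second printed value is the singularity probability of the $\pm 1$ matrix. At such small sizes the values are far from the asymptotic regime of Theorem A; the sketch only makes the objects concrete.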

2. Proof strategy

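The proof of upper bounds on the probability of singularity of random discrete matrices (i.e. matrices with entries taking a finite number of values), developed in [6] and later in [17, 18, 3], uses, as a starting point, the relation

$${\mathbb{P}}\big{\{}\mbox{the matrix with columns $X_{1},\dots,X_{n}$ is singular}\big{\}}=e^{o_{n}(n)}\sum\limits_{V}{\mathbb{P}}(A_{V}),$$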

which holds under rather broad assumptions on the distributions of the discrete random vectors $X_{1},\dots,X_{n}$ [3]. Here, the summation is taken over (finitely many) hyperplanes $V$ such that the probability of $A_{V}$ — the event that $X_{1},\dots,X_{n}$ span $V$ — is non-zero. The set of the hyperplanes $V$ is then partitioned according to the value of the combinatorial dimension which is defined as the number $d(V)\in\frac{1}{n}\mathbb{Z}$ such that $\max\limits_{i}{\mathbb{P}}\{X_{i}\in V\}\in\big{(}C^{-d(V)-1/n},C^{-d(V)}\big{]}$ , where $C$ is some constant depending on the distribution of $X_{i}$ ’s. The sum of probabilities corresponding to a given combinatorial dimension is estimated in terms of probabilities ${\mathbb{P}}\{Y_{i}\in V\}$ for specially constructed random vectors $Y_{i}$ . For some discrete distributions, in particular, for matrices with i.i.d. entries with the probability mass function

$$f(m)=\begin{cases}\frac{1}{4},&\mbox{ if $m=\pm 1$};\\ \frac{1}{2},&\mbox{ if $m=0$,}\end{cases}$$

upper bounds for the singularity obtained using the strategy are asymptotically sharp as was shown in [3].

Methods providing strong quantitative information on the smallest singular value of a random matrix were proposed in papers [14, 20]. As a further development, the work [15] established small ball probability estimates on $s_{\min}$ of any $n\times n$ matrix $A_{n}$ with i.i.d. normalized subgaussian entries of the form ${\mathbb{P}}\{s_{\min}(A_{n})\leq t/\sqrt{n}\}\leq c^{n}+Ct$, $t>0$, where $C>0$ and $c\in(0,1)$ depend only on the subgaussian moment. Thus, [15] recovered the result of [6], possibly with a worse constant. The key notion of [15] is the essential least common denominator (LCD) which measures “unstructuredness” of a fixed vector $(x_{1},\dots,x_{n})$ and is defined as the smallest $\lambda>0$ such that the distance from $\lambda x$ to the integer lattice $\mathbb{Z}^{n}$ does not exceed $\min(c^{\prime}\lambda\|x\|_{2},c\sqrt{n})$. The LCD can be used to characterize anticoncentration properties of random sums $\sum_{i}a_{ij}x_{i}$ (and in that respect the approach of [15] is related to the earlier paper [20] where the anticoncentration properties of discrete random sums were connected with the existence of generalized arithmetic progressions containing almost all of $\{x_{1},\dots,x_{n}\}$). It was proved in [15] that for any unit vector $x$, ${\mathbb{P}}\big{\{}\big{|}\sum_{i}a_{ij}x_{i}\big{|}\leq t\big{\}}\leq Ct+\frac{C}{{\rm LCD}(x)}+e^{-cn}$ for any $t>0$ (see also [16]). This relation, combined with the assertion that the LCD of a random unit vector normal to the linear span of the first $n-1$ columns of $A_{n}$ is exponential in $n$, already implies that $A_{n}$ is singular with probability at most $e^{-cn}$. Moreover, an efficient averaging procedure (which we recall below) used in [15] makes it possible to obtain strong quantitative bounds on $s_{\min}(A_{n})$. The LCD of the random unit normal is estimated with the help of an elaborate $\varepsilon$-net argument.

The approach that we use in this paper is partially based on the methods used in [15] (and in [10]), while the principal difference lies in estimating anticoncentration properties of random sums. The starting point is the relation (taken from [15])

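$$\begin{aligned}{\mathbb{P}}\big{\{}s_{\min}(A_{n})\leq t/\sqrt{n}\big{\}}&\leq{\mathbb{P}}\big{\{}\|A_{n}x\|_{2}\leq t/\sqrt{n}\mbox{ for some }x\in{\rm Comp}_{n}(\delta,\nu)\big{\}}\\ &\quad+{\mathbb{P}}\big{\{}\|A_{n}x\|_{2}\leq t/\sqrt{n}\mbox{ for some }x\in{\rm Incomp}_{n}(\delta,\nu)\big{\}}\\ &\leq{\mathbb{P}}\big{\{}\|A_{n}x\|_{2}\leq t/\sqrt{n}\mbox{ for some }x\in{\rm Comp}_{n}(\delta,\nu)\big{\}}\\ &\quad+\frac{1}{\delta}{\mathbb{P}}\big{\{}|\langle{\rm col}_{n}(A_{n}),Y_{n}\rangle|\leq t/\nu\big{\}},\end{aligned}$$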

valid for any $n\times n$ random matrix $A_{n}$ with the distribution invariant under permutations of columns. Here, $Y_{n}$ is a random unit vector orthogonal to the linear span of ${\rm col}_{1}(A_{n}),\dots$ , ${\rm col}_{n-1}(A_{n})$ ; ${\rm Comp}_{n}(\delta,\nu)$ is the set of compressible unit vectors defined as those with the Euclidean distance at most $\nu$ to the set of $\delta n$ –sparse vectors; ${\rm Incomp}_{n}(\delta,\nu)=S^{n-1}\setminus{\rm Comp}_{n}(\delta,\nu)$ is the set of incompressible vectors. In the above formula, $\delta,\nu\in(0,1]$ can be arbitrary, although for our proof we take both parameters small (depending on the choice of $\varepsilon$ in the statement of our main result).

The first summand in the rightmost expression, the small ball probability for $\inf\limits_{x\in{\rm Comp}_{n}}\|Ax\|_{2}$, can be bounded with the help of an argument which is by now completely standard. For the reader's convenience, we provide the estimate together with a complete proof in the Preliminaries.

The second term — ${\mathbb{P}}\big{\{}|\langle{\rm col}_{n}(A_{n}),Y_{n}\rangle|\leq t/\nu\big{\}}$ — crucially depends on the structure of the random normal $Y_{n}$ . In [15], the authors provided an explicit characterization of “unstructured” vectors in terms of the LCD. In contrast, in our approach we make no attempt to obtain a geometric description of vectors with good anticoncentration properties. For each unit vector $x$ and a parameter $L$ , we introduce the threshold ${\mathcal{T}}_{p}(x,L)$ which is defined as the supremum of all $t\in(0,1]$ such that ${\mathcal{L}}\big{(}\sum_{i=1}^{n}b_{i}x_{i},t\big{)}>Lt$ , where, $b_{1},\dots,b_{n}$ are independent Bernoulli( $p$ ) random variables. Here, ${\mathcal{L}}(\cdot,\cdot)$ denotes the Lévy concentration function, defined as ${\mathcal{L}}(Z,t):=\sup_{\lambda\in\mathbb{R}}{\mathbb{P}}\{|Z-\lambda|\leq t\}$ , $t\geq 0$ , for any real valued random variable $Z$ . The threshold can be viewed as a lower bound of the range of $t$ ’s for which corresponding random linear combination admits “good” anticoncentration estimates. Thus, to show that $B_{n}(p)+s1_{n}1_{n}^{\top}$ is singular with probability $(1-p+o_{n}(1))^{n}$ , it is sufficient to check that the threshold of the random normal $Y_{n}$ is at most $(1-p+o_{n}(1))^{n}$ with probability at least $1-(1-p+o_{n}(1))^{n}$ . Note that this approach can be related to the inverse Littlewood–Offord theory started in [20], although here we are only interested in estimating from above the “size” of the set of potential normal vectors with large thresholds, rather than giving an explicit description of this set (in that respect, our strategy can be related to theorems in [19, Section 3], however, the actual proofs are very different).
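To make the Lévy concentration function and the threshold concrete, here is a small Python sketch. It is an illustration under stated assumptions rather than anything used in the proof: the names `levy_concentration` and `threshold_on_grid` are hypothetical, the enumeration over $\{0,1\}^{n}$ is exponential in $n$, and the supremum defining ${\mathcal{T}}_{p}(x,L)$ is only approximated from below by a maximum over a user-supplied grid. The sketch uses the fact that, for a discrete variable, the supremum over $\lambda$ is attained by a closed window of length $2t$ whose left endpoint is an atom.

```python
from itertools import product


def levy_concentration(x, p, t, tol=1e-12):
    """L(sum_i b_i x_i, t) for independent Bernoulli(p) b_i, by brute force."""
    n = len(x)
    atoms = {}
    for v in product((0, 1), repeat=n):
        value = sum(vi * xi for vi, xi in zip(v, x))
        k = sum(v)
        atoms[value] = atoms.get(value, 0.0) + p ** k * (1 - p) ** (n - k)
    points = sorted(atoms)
    best = 0.0
    for a in points:
        # Mass of the closed window [a, a + 2t]; tol absorbs rounding errors.
        mass = sum(atoms[z] for z in points if a - tol <= z <= a + 2 * t + tol)
        best = max(best, mass)
    return best


def threshold_on_grid(x, p, L, grid):
    """Largest grid point t in (0, 1] with L(sum_i b_i x_i, t) > L * t.

    Grid-based lower approximation of the threshold; returns 0.0 if no
    grid point qualifies.
    """
    good = [t for t in grid
            if 0 < t <= 1 and levy_concentration(x, p, t) > L * t]
    return max(good, default=0.0)


if __name__ == "__main__":
    flat = [0.5, 0.5, 0.5, 0.5]  # unit vector with equal coordinates
    grid = [k / 100 for k in range(1, 101)]
    print(levy_concentration(flat, 0.5, 0.0))
    print(threshold_on_grid(flat, 0.5, 2.0, grid))
```

A highly structured vector such as the one with equal coordinates keeps a large atom (the central binomial mass), which is the kind of poor anticoncentration that a large threshold records.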

To estimate the threshold, we apply a procedure which can be called “inversion of randomness”, and which we briefly describe below. We would like to make the description as non-technical as possible, and for this reason omit any discussion of the choice of parameters and other issues of secondary importance. Take any $T$ with $T^{-1}\ll(1-p+o_{n}(1))^{-n}$ , and let $D_{T}$ be the set of all $(\delta,\nu)$ –incompressible unit vectors with the threshold falling into the interval $[T,2T)$ . In order to show that the probability of the event $\{Y_{n}\in D_{T}\}$ is close to zero, we construct a discrete approximation ${\mathcal{N}}_{T}$ of $D_{T}$ , which is a subset of elements of an $n$ –dimensional lattice having the threshold of order $T$ , and coordinates in a certain range. We then show that the event $\{Y_{n}\in D_{T}\}$ is contained in

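$$\mathcal{E}_{{\mathcal{N}}_{T}}:=\big{\{}\mbox{There is a vector }x\in{\mathcal{N}}_{T}\mbox{ ``almost orthogonal'' to ${\rm col}_{1},\dots,{\rm col}_{n-1}$}\big{\}},$$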

where “almost orthogonal” should be understood in a specific sense which we prefer not to discuss here. This implies

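$${\mathbb{P}}\{Y_{n}\in D_{T}\}\leq{\mathbb{P}}(\mathcal{E}_{{\mathcal{N}}_{T}})\leq|{\mathcal{N}}_{T}|\,\max\limits_{x\in{\mathcal{N}}_{T}}{\mathbb{P}}\big{\{}\mbox{$x$ is ``almost orthogonal'' to ${\rm col}_{1},\dots,{\rm col}_{n-1}$}\big{\}},$$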

and the proof is reduced to efficiently bounding from above the cardinality of the discretization ${\mathcal{N}}_{T}$. The “inversion of randomness” is used to solve the problem. We consider a random vector $\xi$ uniformly distributed on a subset of the lattice (whose cardinality is much easier to compute) containing ${\mathcal{N}}_{T}$, and show that with probability superexponentially close to one, the threshold of $\xi$ is much less than $T$, so that $\xi\notin{\mathcal{N}}_{T}$. This allows us to bound $|{\mathcal{N}}_{T}|$ in terms of the cardinality of the range of $\xi$, times the factor $e^{-\omega(n)}$. Thus, instead of studying anticoncentration of random sums with fixed coefficients satisfying certain structural assumptions, we consider typical anticoncentration properties of sums with random coefficients $\xi_{i}$. It will be convenient to work with the expression

$${\mathcal{L}}_{b}\Big{(}\sum\limits_{i=1}^{n}b_{i}\xi_{i},t\Big{)}:=\sup\limits_{\lambda\in\mathbb{R}}\sum\limits_{(v_{j})_{j=1}^{n}\in\{0,1\}^{n}}p^{\sum_{j}v_{j}}(1-p)^{n-\sum_{j}v_{j}}{\bf 1}_{[-t,t]}\big{(}\lambda+v_{1}\xi_{1}+\dots+v_{n}\xi_{n}\big{)},$$

which is interpreted as the Lévy concentration function with respect to the randomness of the vector $b=(b_{1},\dots,b_{n})$ of independent Bernoulli( $p$ ) components.

Let us state, as an illustration, a corollary of the main technical result of this paper, Theorem 4.2, which deals with rescaled vectors distributed on the integer lattice $\mathbb{Z}^{n}$ :

Theorem B.

Let $\delta\in(0,1]$ , $p\in(0,1/2]$ , $\varepsilon\in(0,p)$ , $M\geq 1$ . There exist $n_{\text{\tiny B}}=n_{\text{\tiny B}}(\delta,\varepsilon,p,M)\geq 1$ depending on $\delta,\varepsilon,p,M$ and $L_{\text{\tiny B}}=L_{\text{\tiny B}}(\delta,\varepsilon,p)>0$ depending only on $\delta,\varepsilon,p$ (and not on $M$ ) with the following property. Take $n\geq n_{\text{\tiny B}}$ , $1\leq N\leq(1-p+\varepsilon)^{-n}$ , and let

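$$\mathcal{A}:=\{-2N,\dots,-N-1,N+1,\dots,2N\}^{\lfloor\delta n\rfloor}\times\{-N,-N+1,\dots,N\}^{n-\lfloor\delta n\rfloor}.$$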

Further, assume that a random vector $\xi=(\xi_{1},\dots,\xi_{n})$ is uniform on $\mathcal{A}$ . Then

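$${\mathbb{P}}_{\xi}\big{\{}{\mathcal{L}}_{b}\big{(}b_{1}\xi_{1}+\dots+b_{n}\xi_{n},\sqrt{n}\big{)}>L_{\text{\tiny B}}N^{-1}\big{\}}\leq e^{-M\,n}.$$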

Here, ${\mathcal{L}}_{b}(\cdot,\cdot)$ denotes the Lévy concentration function with respect to $b=(b_{1},\dots,b_{n})$ , a random vector with independent Bernoulli( $p$ ) components.

The crucial point of this theorem is that $L_{\text{\tiny B}}$ does not depend on $M$ . Essentially, this means that the probability can be made superexponentially small in $n$ as $n$ grows, while $L_{\text{\tiny B}}$ stays constant. Because of the “inversion of randomness”, a statement of this kind is translated into bounds for the cardinality of the discretization of the sets of vectors $D_{T}$ with large thresholds considered above.
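The objects in Theorem B can be simulated at toy sizes. The Python sketch below is an assumption-laden illustration, not the paper's method: `sample_from_A` and `levy_b` are hypothetical names, the sampler draws $\xi$ uniformly from the product set $\mathcal{A}$ of the theorem, and `levy_b` computes ${\mathcal{L}}_{b}$ for an integer vector by convolving the atom masses one coordinate at a time. Theorem B is an asymptotic statement with an unspecified constant $L_{\text{\tiny B}}$, so a small experiment of this kind can only show the typical order of magnitude of ${\mathcal{L}}_{b}$ next to $N^{-1}$; it cannot confirm the theorem.

```python
import math
import random


def sample_from_A(n, N, delta, rng):
    """Uniform sample from A: floor(delta*n) coordinates with N < |xi_i| <= 2N,
    the remaining coordinates with |xi_i| <= N."""
    k = math.floor(delta * n)
    large = [rng.choice((-1, 1)) * rng.randint(N + 1, 2 * N) for _ in range(k)]
    small = [rng.randint(-N, N) for _ in range(n - k)]
    return large + small


def levy_b(xi, p, t):
    """L_b(sum_i b_i xi_i, t) for an integer vector xi and Bernoulli(p) b_i."""
    dist = {0: 1.0}
    for c in xi:
        new = {}
        for value, mass in dist.items():
            new[value] = new.get(value, 0.0) + (1 - p) * mass      # b_i = 0
            new[value + c] = new.get(value + c, 0.0) + p * mass    # b_i = 1
        dist = new
    points = sorted(dist)
    width = 2 * t
    best, lo, window = 0.0, 0, 0.0
    # Sliding closed window [z - 2t, z] over the sorted atoms.
    for z in points:
        window += dist[z]
        while points[lo] < z - width:
            window -= dist[points[lo]]
            lo += 1
        best = max(best, window)
    return best


if __name__ == "__main__":
    rng = random.Random(0)
    n, N, delta, p = 16, 200, 0.5, 0.5
    values = [levy_b(sample_from_A(n, N, delta, rng), p, math.sqrt(n))
              for _ in range(20)]
    print(max(values), 1 / N)
```

The convolution keeps one dictionary entry per attainable sum, so it is practical only while the number of distinct sums (at most $2^{n}$) stays moderate.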

3. Preliminaries

Denote by $\|\cdot\|_{q}$ the standard $\ell_{q}$ –norm, so that

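$$\big{\|}(x_{1},x_{2},\dots)\big{\|}_{q}=\bigg{(}\sum\limits_{i}|x_{i}|^{q}\bigg{)}^{1/q},\quad 1\leq q<\infty;\quad\mbox{and}\quad\big{\|}(x_{1},x_{2},\dots)\big{\|}_{\infty}=\max\limits_{i}|x_{i}|.$$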

In particular, by $\ell_{1}(\mathbb{Z})$ we denote the space of all functions $g:\mathbb{Z}\to\mathbb{R}$ with $\sum_{i}|g(i)|<\infty$ . We will say that a mapping $g:\mathbb{Z}\to\mathbb{R}$ is $L$ –Lipschitz for some $L>0$ if $|g(t)-g(t+1)|\leq L$ for all $t\in\mathbb{Z}$ .

The unit Euclidean sphere in $\mathbb{R}^{n}$ will be denoted by $S^{n-1}$. The support of a vector $y=(y_{1},\dots,y_{n})\in\mathbb{R}^{n}$ is ${\rm supp\,}y:=\{i\leq n:\;y_{i}\neq 0\}$. The $n$-dimensional vector of all ones is denoted by $1_{n}$. For an $n\times n$ matrix $A$, ${\rm col}_{i}(A)$ and ${\rm row}_{i}(A)$ are its columns and rows, respectively, and $\|A\|$ is the spectral norm of $A$. The smallest singular value of $A$ is denoted by $s_{\min}(A)$. We will rely on the standard representation $s_{\min}(A)=\min\limits_{x\in S^{n-1}}\|Ax\|_{2}$.

The indicator of a subset of $\mathbb{R}$ or an event $S$ is denoted by ${\bf 1}_{S}$ . For any positive integer $m$ , $[m]$ denotes the integer interval $\{1,2,\dots,m\}$ . Further, for any two subsets $I,J\subset\mathbb{Z}$ , we write $I<J$ if $i<j$ for all $i\in I$ and $j\in J$ . The Minkowski sum of two subsets $A,B$ of $\mathbb{R}^{m}$ is defined as the set of all vectors of the form $a+b$ , where $a\in A$ and $b\in B$ . For a real number $r$ , by $\lfloor r\rfloor$ we denote the largest integer less than or equal to $r$ , and by $\lceil r\rceil$ , the smallest integer greater than or equal to $r$ .

Everywhere in this paper, $B_{n}(p)$ is the matrix with i.i.d. Bernoulli($p$) entries, i.e. random variables taking value $1$ with probability $p$ and $0$ with probability $1-p$. Further, by $B_{n}^{1}(p)$ we denote the $(n-1)\times n$ matrix obtained from $B_{n}(p)$ by removing the last row.

The Lévy concentration function ${\mathcal{L}}(\xi,\cdot)$ of a random variable $\xi$ is defined by

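$${\mathcal{L}}\big{(}\xi,t\big{)}:=\sup\limits_{\lambda\in\mathbb{R}}{\mathbb{P}}\big{\{}|\xi-\lambda|\leq t\big{\}},\quad t\geq 0.$$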

We will need the following classical inequality:

Lemma 3.1 (Lévy–Kolmogorov–Rogozin, [13]).

Let $\xi_{1},\dots,\xi_{m}$ be independent real valued random variables. Then for any real numbers $r_{1},\dots,r_{m}>0$ and $r\geq\max_{i\leq m}r_{i}$ ,

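$${\mathcal{L}}\Big{(}\sum_{i=1}^{m}\xi_{i},r\Big{)}\leq\frac{C_{\text{\tiny 3.1}}\,r}{\sqrt{\sum_{i=1}^{m}(1-{\mathcal{L}}(\xi_{i},r_{i}))r_{i}^{2}}}.$$

Here, $C_{\text{\tiny 3.1}}>0$ is a universal constant.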

We recall some definitions from [15]. Given $\delta\in(0,1]$ and $\nu\in(0,1]$ , denote by ${\rm Comp}_{n}(\delta,\nu)$ the set of all unit vectors $x\in\mathbb{R}^{n}$ such that there is $y=y(x)\in\mathbb{R}^{n}$ with $|{\rm supp\,}y|\leq\delta n$ and $\|x-y\|_{2}\leq\nu$ (in [15], such vectors are called compressible). Further, we define the complementary set of incompressible vectors ${\rm Incomp}_{n}(\delta,\nu):=S^{n-1}\setminus{\rm Comp}_{n}(\delta,\nu)$ . We note that a similar partition of the unit vectors was used earlier in [10].
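Membership in ${\rm Comp}_{n}(\delta,\nu)$ is easy to test numerically, because the nearest vector with at most $\delta n$ nonzero coordinates is obtained by keeping the $\lfloor\delta n\rfloor$ largest coordinates of $x$ in absolute value. The Python sketch below illustrates this; the function names are hypothetical and the input is assumed to be a unit vector.

```python
import math


def distance_to_sparse(x, delta):
    """Euclidean distance from x to the vectors with at most delta*n nonzeros."""
    n = len(x)
    k = math.floor(delta * n)
    # Drop the k largest coordinates in absolute value; the rest is the residual.
    tail = sorted((abs(xi) for xi in x), reverse=True)[k:]
    return math.sqrt(sum(a * a for a in tail))


def is_compressible(x, delta, nu):
    """True if the unit vector x lies in Comp_n(delta, nu)."""
    return distance_to_sparse(x, delta) <= nu


if __name__ == "__main__":
    e1 = [1.0, 0.0, 0.0, 0.0]
    flat = [0.5, 0.5, 0.5, 0.5]
    print(is_compressible(e1, 0.25, 0.1))    # a coordinate vector is sparse
    print(is_compressible(flat, 0.5, 0.1))   # a flat vector is far from sparse
```

Coordinate vectors are the extreme compressible case, while vectors with all coordinates of comparable size are incompressible for small $\delta$ and $\nu$.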

Following an approach developed in [15], we can write for any random matrix $A_{n}$ with the distribution invariant under permutations of columns

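$$\begin{aligned}{\mathbb{P}}\big{\{}s_{\min}(A_{n})\leq t/\sqrt{n}\big{\}}&\leq{\mathbb{P}}\big{\{}\|A_{n}x\|_{2}\leq t/\sqrt{n}\mbox{ for some }x\in{\rm Comp}_{n}(\delta,\nu)\big{\}}\\ &\quad+{\mathbb{P}}\big{\{}\|A_{n}x\|_{2}\leq t/\sqrt{n}\mbox{ for some }x\in{\rm Incomp}_{n}(\delta,\nu)\big{\}}\\ &\leq{\mathbb{P}}\big{\{}\|A_{n}x\|_{2}\leq t/\sqrt{n}\mbox{ for some }x\in{\rm Comp}_{n}(\delta,\nu)\big{\}}\\ &\quad+\frac{1}{\delta}{\mathbb{P}}\big{\{}|\langle{\rm col}_{n}(A_{n}),Y_{n}\rangle|\leq t/\nu\big{\}},\end{aligned}$$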

where $\delta,\nu$ are arbitrary numbers in $(0,1)$ (see [15, formula (3.2) and Lemma 3.5]), and $Y_{n}$ is a random unit vector orthogonal to the first $n-1$ columns of $A_{n}$ . A satisfactory estimate for the first term for sufficiently small $\delta$ and $\nu$ can be obtained as a simple compilation of known results (see Proposition 3.6 below). The following is a version of the tensorization lemma from [15].

Lemma 3.2.

Let $\xi_{1},\dots,\xi_{m}$ be independent random variables.

(1)

Assume that for some $\varepsilon_{0}>0$ , $K>0$ and all $\varepsilon\geq\varepsilon_{0}$ and $k\leq m$ we have

$${\mathbb{P}}\big{\{}|\xi_{k}|\leq\varepsilon\big{\}}\leq K\varepsilon.$$

Then for each $\varepsilon\geq\varepsilon_{0}$ ,

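$${\mathbb{P}}\big{\{}\|(\xi_{1},\xi_{2},\dots,\xi_{m})\|_{2}\leq\varepsilon\sqrt{m}\big{\}}\leq(C_{\text{\tiny 3.2}}\,K\,\varepsilon)^{m},$$

where $C_{\text{\tiny 3.2}}>0$ is a universal constant.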

(2)

Assume that for some $\eta>0$ , $\tau>0$ and all $k\leq m$ we have ${\mathbb{P}}\big{\{}|\xi_{k}|\leq\eta\}\leq\tau$ . Then for every $\varepsilon\in(0,1]$ ,

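$${\mathbb{P}}\big{\{}\|(\xi_{1},\xi_{2},\dots,\xi_{m})\|_{2}\leq\eta\sqrt{\varepsilon m}\big{\}}\leq\bigg{(}\frac{e}{\varepsilon}\bigg{)}^{\varepsilon m}\tau^{m-\varepsilon m}.$$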

Remark 3.3.

The second assertion of the lemma follows immediately by noting that the condition $\|(\xi_{1},\xi_{2},\dots,\xi_{m})\|_{2}\leq\eta\sqrt{\varepsilon m}$ implies that $|\{i\leq m:\,|\xi_{i}|>\eta\}|\leq\varepsilon m$ . For a proof of the first assertion, see [15].
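The counting step behind the second assertion can be spelled out as follows. This is a sketch of the standard argument, not a quotation from [15]: the shorthand $k:=\lfloor\varepsilon m\rfloor$ and the index set $I$ are introduced here for illustration, we assume $\tau\leq 1$ (for $\tau>1$ the right hand side exceeds one and there is nothing to prove), and for $k=0$ the factor $(em/k)^{k}$ is read as $1$.

```latex
% Sketch only; k and I are notation introduced for this illustration.
\begin{aligned}
&\text{If more than } \varepsilon m \text{ indices had } |\xi_i|>\eta, \text{ then }
  \textstyle\sum_{i}\xi_i^2>\eta^2\varepsilon m, \text{ so on the event }
  \|(\xi_1,\dots,\xi_m)\|_2\le\eta\sqrt{\varepsilon m}\\
&\text{there is } I\subset[m] \text{ with } |I|=m-k \text{ and } |\xi_i|\le\eta
  \text{ for all } i\in I. \text{ By the union bound and independence,}\\
&\mathbb{P}\big\{\|(\xi_1,\dots,\xi_m)\|_2\le\eta\sqrt{\varepsilon m}\big\}
  \;\le\; \binom{m}{k}\,\tau^{\,m-k}
  \;\le\; \Big(\frac{em}{k}\Big)^{k}\tau^{\,m-\varepsilon m}
  \;\le\; \Big(\frac{e}{\varepsilon}\Big)^{\varepsilon m}\tau^{\,m-\varepsilon m}.
\end{aligned}
```

The middle inequality uses $\binom{m}{k}\leq(em/k)^{k}$ together with $\tau^{m-k}\leq\tau^{m-\varepsilon m}$ for $\tau\leq 1$; the last one uses that $u\mapsto(em/u)^{u}$ is nondecreasing on $(0,m]$ and $k\leq\varepsilon m\leq m$.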

Further, we recall a standard estimate for the spectral norm of random matrices with i.i.d. subgaussian entries (for a proof, see, for example, [21, Theorem 5.39]).

Lemma 3.4.

For any $M,L\geq 1$ there is $C_{\text{\tiny M,L}}>0$ depending only on $M$ and $L$ with the following property. Let $n\geq 1$ and let $A$ be an $n\times n$ random matrix with i.i.d. entries $a_{ij}$ of zero mean, and such that $(\mathbb{E}|a_{ij}|^{q})^{1/q}\leq M\sqrt{q}$ for all $q\geq 1$ . Then with probability at least $1-\exp(-Ln)$ we have $\|A\|\leq C_{\text{\tiny M,L}}\sqrt{n}$ .

The following is an easy consequence of Lemma 3.2:

Lemma 3.5.

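For any $p\in(0,1/2]$ there is $\gamma_{\text{\tiny 3.5}}>0$ which may only depend on $p$, such that for every $\varepsilon\in(0,1]$, $n\geq 2$ and arbitrary $s\in\mathbb{R}$ and $x\in S^{n-1}$,

$${\mathbb{P}}\big{\{}\big{\|}(B_{n}^{1}(p)+s\,1_{n-1}1_{n}^{\top})x\big{\|}_{2}\leq\gamma_{\text{\tiny 3.5}}\sqrt{\varepsilon n}\big{\}}\leq\bigg{(}\frac{e}{\varepsilon}\bigg{)}^{\varepsilon(n-1)}(1-p)^{(n-1)(1-\varepsilon)}.$$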

Proof.

Let $b_{1},\dots,b_{n}$ be i.i.d. Bernoulli( $p$ ) random variables. It is not difficult to check that

$${\mathcal{L}}\Big{(}\sum_{i=1}^{n}b_{i}x_{i},r\Big{)}\leq 1-p\tag{3}$$

for some $r>0$ which may only depend on $p$ . For a proof of this fact, one may consider two possibilities: first when the vector $x$ has a “large” $\ell_{\infty}$ –norm, in which case the assertion follows by conditioning on all $b_{i}$ ’s except the one corresponding to the largest component of $x$ , and, second, when the vector $x$ has a “small” $\ell_{\infty}$ –norm in which case, by the Central Limit Theorem, the random linear combination is approximately normally distributed, see, for example, [4, Lemma 2.1].

Applying the second assertion of the Tensorization Lemma to (3), we get the statement. ∎

By combining Lemma 3.5 with an $\varepsilon$ -net argument, we obtain a small ball probability estimate for compressible vectors. The only difference from a standard argument here is due to the fact that for $s\neq-p$ , the matrix $B_{n}^{1}(p)+s\,1_{n-1}1_{n}^{\top}$ has typical spectral norm of order $\Theta((s+p)n)$ rather than $\Theta(\sqrt{n})$ in the simplest setting of a centered random matrix with normalized independent entries. The net therefore has to be made “denser” in the direction $1_{n}$ .

Proposition 3.6.

For any $\varepsilon\in(0,1]$ and $p\in(0,1/2]$ there are $n_{\text{\tiny 3.6}}\in\mathbb{N}$, $\gamma_{\text{\tiny 3.6}}>0$ and $\delta_{\text{\tiny 3.6}},\nu_{\text{\tiny 3.6}}\in(0,1)$ depending only on $\varepsilon$ and $p$ such that for $n\geq n_{\text{\tiny 3.6}}$ and arbitrary $s\in\mathbb{R}$,

$${\mathbb{P}}\big{\{}\big{\|}(B_{n}^{1}(p)+s\,1_{n-1}1_{n}^{\top})x\big{\|}_{2}\leq\gamma_{\text{\tiny 3.6}}\sqrt{n}\mbox{ for some }x\in{\rm Comp}_{n}(\delta_{\text{\tiny 3.6}},\nu_{\text{\tiny 3.6}})\big{\}}\leq\big{(}1-p+\varepsilon\big{)}^{n}.$$

Proof.

Choose any $\varepsilon\in(0,1]$ and $p\in(0,1/2]$ , and fix $s\in\mathbb{R}$ . It will be convenient to work with parameter $\widetilde{s}:=s+p$ . Without loss of generality, we can assume that $\widetilde{s}\neq 0$ . By Lemma 3.4, there is $L>0$ which may only depend on $p$ such that for every $n\geq 2$ the event

$$\mathcal{E}:=\big{\{}\|B_{n}^{1}(p)-p\,1_{n-1}1_{n}^{\top}\|\leq L\sqrt{n}\big{\}}$$

has probability at least $1-2^{-n}$ .

Given an $\widetilde{\varepsilon}\in(0,1]$ (which will be chosen later), define

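$$\delta:=\widetilde{\varepsilon};\quad\gamma:=\gamma_{\text{\tiny 3.5}}\sqrt{\widetilde{\varepsilon}};\quad\nu:=\frac{\gamma}{32L}.$$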

We shall partition the set ${\rm Comp}_{n}(\delta,\nu)$ into subsets $S_{\ell}$ of the form

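$$S_{\ell}:={\rm Comp}_{n}(\delta,\nu)\cap\Big{\{}x\in\mathbb{R}^{n}:\;\sum\nolimits_{i=1}^{n}x_{i}\in\Big{[}\frac{\gamma\ell}{4|\widetilde{s}|},\frac{\gamma(\ell+1)}{4|\widetilde{s}|}\Big{)}\Big{\}},\quad\ell\in\mathbb{Z}.$$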

First, we observe that a standard volumetric argument, together with the definition of compressible vectors, implies that for any $\ell\in\mathbb{Z}$ the set $S_{\ell}$ admits a Euclidean $\big{(}\frac{\gamma}{16L}+2\nu\big{)}$ –net ${\mathcal{N}}_{\ell}\subset S_{\ell}$ of cardinality at most ${n\choose{\lfloor\delta n\rfloor}}\big{(}\frac{C^{\prime}L}{\gamma}\big{)}^{\lfloor\delta n\rfloor}$ , for some universal constant $C^{\prime}>0$ . By the definition of ${\mathcal{N}}_{\ell}$ and $S_{\ell}$ , for any $x\in S_{\ell}$ there is $y\in{\mathcal{N}}_{\ell}$ such that $\|x-y\|_{2}\leq\big{(}\frac{\gamma}{16L}+2\nu\big{)}=\frac{\gamma}{8L}$ and $\big{|}\sum_{i=1}^{n}(x_{i}-y_{i})\big{|}\leq\frac{\gamma}{4|\widetilde{s}|}$ , implying that

[TABLE]

everywhere on $\mathcal{E}$ . Hence,

[TABLE]

Observe further that for all vectors $x\in S^{n-1}$ with $\big{|}\sum_{i=1}^{n}x_{i}\big{|}\geq\frac{2L+2\gamma}{|\widetilde{s}|}$ , everywhere on the event $\mathcal{E}$ we have

[TABLE]

Thus, everywhere on $\mathcal{E}$ , $\big{\|}(B_{n}^{1}(p)-p\,1_{n-1}1_{n}^{\top}+\widetilde{s}\,1_{n-1}1_{n}^{\top})x\big{\|}_{2}\geq\gamma\sqrt{n}$ for all $x\in S_{\ell}$ with $\ell\geq\frac{8(L+\gamma)}{\gamma}$ or $\ell\leq-\frac{8(L+\gamma)}{\gamma}-1$ . Combining all the above estimates, we obtain for some universal constant $C>0$ :

[TABLE]

It remains to note that by choosing $\widetilde{\varepsilon}=\widetilde{\varepsilon}(\varepsilon)$ sufficiently small, we can guarantee that the right hand side of the above inequality is less than

[TABLE]

for every $n\geq 2$ . Then the desired estimate will follow for all sufficiently large $n$ satisfying $\frac{C(L+\gamma)}{\gamma}\big{(}1-p+\frac{\varepsilon}{2}\big{)}^{n-1}+2^{-n}\leq\big{(}1-p+\varepsilon\big{)}^{n}$ . ∎

4. Random averaging in $\ell_{1}(\mathbb{Z})$

The main goal of this section is to provide upper bounds on the cardinalities of discretizations of sets of vectors with a given threshold ${\mathcal{T}}_{p}(\cdot,L)$ , discussed in the second part of Section 2. According to our “inversion of randomness”, we consider a random vector uniformly distributed on a subset of the integer lattice $\mathbb{Z}^{n}$ , and want to show that with probability $1-e^{-\omega(n)}$ the scalar product of this vector with a vector of independent Bernoulli( $p$ ) variables has a small threshold value (with respect to the randomness of the Bernoulli vector). First, we define the range of the random vector on the lattice.

Let $N,n\geq 1$ be some integers and let $\delta\in(0,1]$ and $K\geq 1$ be some real numbers. We say that a subset $\mathcal{A}\subset\mathbb{Z}^{n}$ is $(N,n,K,\delta)$ –admissible if

• $\mathcal{A}=A_{1}\times A_{2}\times\dots\times A_{n}$, where every $A_{i}$ ($i=1,2,\dots,n$) is an origin-symmetric subset of $\mathbb{Z}$;

• $A_{i}$ is an integer interval of cardinality at least $2N+1$ for every $i>\delta n$;

• $A_{i}$ is a union of two integer intervals of total cardinality at least $2N$ and $A_{i}\cap[-N,N]=\emptyset$ for all $i\leq\delta n$;

• $|A_{1}|\cdot|A_{2}|\cdot\dots\cdot|A_{n}|\leq(KN)^{n}$;

• $\max A_{i}<n\,N$ for all $1\leq i\leq n$.
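The five conditions above can be checked mechanically for small examples. The following is a minimal sketch, assuming each $A_{i}$ is given as a finite Python set of integers; the function names are hypothetical and not part of the paper.

```python
def is_integer_interval(points):
    """True if the non-empty set consists of consecutive integers."""
    pts = sorted(points)
    return len(pts) > 0 and pts[-1] - pts[0] + 1 == len(pts)


def is_union_of_two_intervals(points):
    """True if the non-empty set has at most one gap (at most two integer intervals)."""
    pts = sorted(points)
    gaps = sum(1 for a, b in zip(pts, pts[1:]) if b - a > 1)
    return len(pts) > 0 and gaps <= 1


def is_admissible(A, N, K, delta):
    """Check the (N, n, K, delta)-admissibility conditions for A = [A_1, ..., A_n]."""
    n = len(A)
    product = 1
    for i, Ai in enumerate(A, start=1):      # i is 1-based, as in the text
        Ai = set(Ai)
        if not Ai:
            return False
        if any(-a not in Ai for a in Ai):    # origin-symmetric
            return False
        if max(Ai) >= n * N:                 # max A_i < n N
            return False
        if i > delta * n:
            # a single interval of cardinality at least 2N + 1
            if not (is_integer_interval(Ai) and len(Ai) >= 2 * N + 1):
                return False
        else:
            # two intervals, total cardinality at least 2N, avoiding [-N, N]
            if not is_union_of_two_intervals(Ai):
                return False
            if len(Ai) < 2 * N or any(-N <= a <= N for a in Ai):
                return False
        product *= len(Ai)
    return product <= (K * N) ** n           # |A_1| ... |A_n| <= (KN)^n
```

For instance, with $N=2$, $n=4$, $\delta=1/2$, the sets $A_{1}=A_{2}=\{-4,-3,3,4\}$ and $A_{3}=A_{4}=\{-2,\dots,2\}$ pass the check for $K=3$ and fail the product condition for $K=1$.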

Remark 4.1.

The condition $A_{i}\cap[-N,N]=\emptyset$ for $i\leq\delta n$ , subject to appropriate rescaling, is equivalent to the fact that the “potential” normal vectors we consider are $(\delta,\nu)$ –incompressible, hence at least $\lfloor\delta n\rfloor$ components of those vectors are separated from zero by $\nu/\sqrt{n}$ .

Let $\mathcal{A}=A_{1}\times A_{2}\times\dots\times A_{n}\subset\mathbb{Z}^{n}$ be an $(N,n,K,\delta)$ –admissible set, and let $f(t)$ be any real valued function on $\mathbb{Z}$ . Fix any $p\in(0,1)$ , and assume that $X_{1},X_{2},\dots,X_{n}$ are independent integer random variables, where each $X_{i}$ is uniform in $A_{i}$ . For every $\ell\leq n$ , we define a random function $f_{\mathcal{A},p,\ell}$ by

[TABLE]

$t\in\mathbb{Z}$ , where $\mathbb{E}_{b}$ denotes the expectation with respect to the randomness of the vector $b=(b_{1},\dots,b_{n})$ with independent Bernoulli( $p$ ) components. The central statement of the section is the following theorem.

Theorem 4.2.

For any $\delta\in(0,1]$, $p\in(0,1/2]$, $\varepsilon\in(0,p)$, $K,M\geq 1$ there are $n_{\text{\tiny 4.2}}=n_{\text{\tiny 4.2}}(\delta,\varepsilon,p,K,M)\geq 1$ and $\eta_{\text{\tiny 4.2}}=\eta_{\text{\tiny 4.2}}(\delta,\varepsilon,p,K,M)\in(0,1]$ depending on $\delta,\varepsilon,p,K,M$, and $L_{\text{\tiny 4.2}}=L_{\text{\tiny 4.2}}(\delta,\varepsilon,p,K)>0$ depending only on $\delta,\varepsilon,p,K$ (and not on $M$), with the following property. Take $n\geq n_{\text{\tiny 4.2}}$, $1\leq N\leq(1-p+\varepsilon)^{-n}$, let $\mathcal{A}$ be an $(N,n,K,\delta)$–admissible set and $f(t)$ be a non-negative function in $\ell_{1}(\mathbb{Z})$ with $\|f\|_{1}=1$ and such that $\log_{2}f$ is $\eta_{\text{\tiny 4.2}}$–Lipschitz. Then, with $f_{\mathcal{A},p,n}$ defined above, we have

[TABLE]

The crucial feature of the theorem, and the most important technical element of this paper, is that the bound $L_{\text{\tiny 4.2}}(N\sqrt{n})^{-1}$ on the $\ell_{\infty}$–norm of the averaged function does not depend on the parameter $M$, which controls the probability estimate. In other words, for a given choice of $\delta,\varepsilon,p,K$, which determine the value of $L_{\text{\tiny 4.2}}$, the probability bound can be made superexponentially small in $n$.

It is not difficult to check that under the sole assumption $\|f\|_{1}=1$ on the function $f$ the above statement is false. For example, take $f$ to be the indicator of $\{0\}$, and assume that $\mathcal{A}=\{-2N,-2N+1,\dots,-N-1,N+1,\dots,2N\}^{\lfloor\delta n\rfloor}\times\{-N,-N+1,\dots,N\}^{n-\lfloor\delta n\rfloor}$. It can be shown that for any natural $q<N$, on the one hand, the event $\mathcal{E}_{q}:=\{X_{i}\in q\,\mathbb{Z},\;i=1,2,\dots,n\}$ has probability at least $(2q)^{-n}$, and, on the other hand, everywhere on $\mathcal{E}_{q}$ we have $\|f_{\mathcal{A},p,n}\|_{\infty}\geq c_{p}q\,(N\sqrt{n})^{-1}$, because $f_{\mathcal{A},p,n}$ is supported on $q\,\mathbb{Z}$ and (by standard concentration results) has most of its mass located within a (random) integer interval of length $O_{p}(N\sqrt{n})$. Thus, the probability cannot be made superexponentially small in $n$ without taking $q$, and hence the lower bound for $\|f_{\mathcal{A},p,n}\|_{\infty}\cdot(N\sqrt{n})$, to infinity. The condition that the logarithm of the function is $\eta_{\text{\tiny 4.2}}$–Lipschitz, employed in the theorem, is designed to rule out such situations.
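The lattice obstruction in this example can be observed numerically. The sketch below assumes that $f_{\mathcal{A},p,n}(t)=\mathbb{E}_{b}\,f\big(t+\sum_{j}b_{j}X_{j}\big)$, computed through the one-step relation $g\mapsto(1-p)\,g(\cdot)+p\,g(\cdot+X_{i})$; the realization `xs` and the helper names are hypothetical.

```python
def average_step(f, x, p):
    """Return g with g(t) = (1 - p) f(t) + p f(t + x); f is a dict t -> value."""
    g = {}
    for t, val in f.items():
        g[t] = g.get(t, 0.0) + (1 - p) * val      # f(t) feeds g(t)
        g[t - x] = g.get(t - x, 0.0) + p * val    # f(t) = f((t - x) + x) feeds g(t - x)
    return g


def averaged_indicator(xs, p):
    """Average the indicator of {0} along the shifts xs."""
    g = {0: 1.0}
    for x in xs:
        g = average_step(g, x, p)
    return g


q = 3
xs = [6, -9, 3, 12, -3, 9]          # hypothetical realization with every X_i in q*Z
g = averaged_indicator(xs, 0.5)
support = [t for t, val in g.items() if val > 0]

# The averaged function lives on q*Z, so its unit mass is spread over few points.
print(all(t % q == 0 for t in support), max(g.values()), 1.0 / len(support))
```

Since the total mass stays equal to one and the support is confined to $q\,\mathbb{Z}$, the maximum is at least the reciprocal of the support size, which is the mechanism behind the factor $q$ in the lower bound.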

Before proving the theorem, we shall consider the corollary which was (in a somewhat different form) stated in the introduction as Theorem B and which will be used in our net-argument in the next section:

Corollary 4.3.

Let $\delta,\varepsilon\in(0,1]$, $p\in(0,1/2]$, $K,M\geq 1$. There exist $n_{\text{\tiny 4.3}}=n_{\text{\tiny 4.3}}(\delta,\varepsilon,p,K,M)\geq 1$ depending on $\delta,\varepsilon,p,K,M$ and $L_{\text{\tiny 4.3}}=L_{\text{\tiny 4.3}}(\delta,\varepsilon,p,K)>0$ depending only on $\delta,\varepsilon,p,K$ (and not on $M$) with the following property. Take $n\geq n_{\text{\tiny 4.3}}$, $1\leq N\leq(1-p+\varepsilon)^{-n}$, and let $\mathcal{A}$ be an $(N,n,K,\delta)$–admissible set. Further, assume that $b_{1},b_{2},\dots,b_{n}$ are i.i.d. Bernoulli($p$) random variables. Then

[TABLE]

Proof.

Take $n\geq\max\big{(}n_{\text{\tiny 4.2}},1/\eta_{\text{\tiny 4.2}}^{2}\big{)}$, let $1\leq N\leq(1-p+\varepsilon)^{-n}$, and let $\mathcal{A}$ be an $(N,n,K,\delta)$–admissible set. Define the function $f\in\ell_{1}(\mathbb{Z})$ as

[TABLE]

where $m_{0}=\sum_{t\in\mathbb{Z}}2^{-|t|/\sqrt{n}}$. Obviously, $\|f\|_{1}=1$, and $\log_{2}f$ is $n^{-1/2}$–Lipschitz; hence, by the assumption on $n$, $\log_{2}f$ is $\eta_{\text{\tiny 4.2}}$–Lipschitz.
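Both properties of $f$ are easy to confirm numerically. The sketch below assumes that the function in the display is $f(t)=2^{-|t|/\sqrt{n}}/m_{0}$ (the form consistent with the stated normalizer $m_{0}$) and uses the geometric-series value $m_{0}=(1+r)/(1-r)$ with $r=2^{-1/\sqrt{n}}$; the name `make_f` is hypothetical.

```python
import math


def make_f(n):
    """Return (f, m0) for the assumed test function f(t) = 2^{-|t|/sqrt(n)} / m0."""
    r = 2.0 ** (-1.0 / math.sqrt(n))
    m0 = (1 + r) / (1 - r)          # sum over all integers t of r^{|t|}

    def f(t):
        return 2.0 ** (-abs(t) / math.sqrt(n)) / m0

    return f, m0


n = 16
f, m0 = make_f(n)

# Truncated l_1 norm; the tail beyond |t| = 2000 is negligible for n = 16.
l1_norm = sum(f(t) for t in range(-2000, 2001))

# Largest increment of log2(f) between neighbouring integers.
lipschitz = max(abs(math.log2(f(t + 1)) - math.log2(f(t))) for t in range(-50, 50))

print(l1_norm, lipschitz, 1 / math.sqrt(n))
```

The increment of $\log_{2}f$ equals $-(|t+1|-|t|)/\sqrt{n}$, so it never exceeds $n^{-1/2}$ in absolute value, and the requirement $n\geq 1/\eta^{2}$ is exactly what makes $n^{-1/2}\leq\eta$.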

Applying Theorem 4.2 to $f$ , we get

[TABLE]

The definition of $f_{\mathcal{A},p,n}$ allows us to rewrite the above inequality as

[TABLE]

On the other hand, since

[TABLE]

for some universal constant $c>0$ , the last relation implies

[TABLE]

For every $t$ and $x=(x_{1},x_{2},\dots,x_{n})$ , the expression

[TABLE]

is the probability that the random sum $t+\sum_{j=1}^{n}b_{j}x_{j}$ falls into the interval $[-\sqrt{n}-1,\sqrt{n}+1]$. Thus, together with the elementary relation $\sup\limits_{t\in\mathbb{Z}}{\mathbb{P}}\{|t+Y|\leq H+1\}\geq{\mathcal{L}}(Y,H)$, valid for any $H\geq 0$ and any random variable $Y$, the previous inequality gives

[TABLE]

The statement follows. ∎
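For completeness, the elementary relation between integer shifts and the concentration function used in the proof above can be verified in two lines. The sketch assumes the standard definition of the Lévy concentration function, ${\mathcal{L}}(Y,H)=\sup_{\tau\in\mathbb{R}}{\mathbb{P}}\{|Y-\tau|\leq H\}$, which is presumably the one fixed in the preliminaries.

```latex
% Assumption: \mathcal{L}(Y,H) := \sup_{\tau\in\mathbb{R}} \mathbb{P}\{|Y-\tau|\le H\}.
% Fix \tau\in\mathbb{R} and choose an integer t with |t+\tau|\le 1/2.
\[
  |t+Y| \;\le\; |Y-\tau| + |t+\tau| \;\le\; |Y-\tau| + \tfrac12 ,
  \qquad\text{so}\qquad
  \{|Y-\tau|\le H\} \subset \{|t+Y|\le H+1\}.
\]
% Hence, for every real \tau,
\[
  \mathbb{P}\{|Y-\tau|\le H\}
  \;\le\; \sup_{t\in\mathbb{Z}} \mathbb{P}\{|t+Y|\le H+1\},
\]
% and taking the supremum over \tau gives the stated relation.
```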

In our proof of Theorem 4.2, we will gradually improve delocalization estimates for the functions $f_{\mathcal{A},p,\ell}$ . Our first (simple) step — Lemma 4.4 — is to obtain estimates on the $\ell_{1}$ –norm of the truncated function $f_{\mathcal{A},p,\ell}\,{\bf 1}_{I}$ (with $\ell$ of order $n$ ) for an arbitrary integer interval $I$ of length at most $N$ . Upper bounds of the order $O_{p,\delta}(\|f\|_{1}\,/\sqrt{n})$ will follow from the Lévy–Kolmogorov–Rogozin inequality stated in the preliminaries as Lemma 3.1. At the second step, Proposition 4.5 below, we prove a weaker version of Theorem 4.2 where the parameter $L$ is allowed to depend on $M$ . At the third step, we remove the dependence of $L$ on $M$ by using the Lipschitzness of $f$ . A discussion of that part of the proof is given after Proposition 4.5.

Lemma 4.4.

There is a universal constant $C_{\text{\tiny 4.4}}>0$ with the following property. Let $p\in(0,1)$, $\delta_{0}\in(0,1)$, let $f\in\ell_{1}(\mathbb{Z})$ be a non-negative function with $\|f\|_{1}=1$, and let $\mathcal{A}$ be an $(N,n,K,\delta)$–admissible set for some parameters $N$, $\delta\in[\delta_{0},1)$, $n\geq 1/\delta_{0}$ and $K$. Further, let $\ell>\delta_{0}n$. Then deterministically $\sum\limits_{t\in I}f_{\mathcal{A},p,\ell}(t)\leq\frac{C_{\text{\tiny 4.4}}}{\sqrt{\delta_{0}n\,\min(p,1-p)}}$ for any integer interval $I\subset\mathbb{Z}$ with $|I|\leq N$. In turn, this implies

[TABLE]

for any integer interval $J$ of cardinality at least $N$ .

Proof.

Let $X_{1},\dots,X_{\ell}$ be the random variables from (4). Fix any realization of $X_{1},\dots,X_{\ell}$ (so that $|X_{i}|>N$ for all $i\leq\delta_{0}n$ , by the definition of an admissible set and since $\delta\geq\delta_{0}$ ), and any integer interval $I$ of cardinality at most $N$ . Since

[TABLE]

we obtain

[TABLE]

For any $t\in\mathbb{Z}$ ,

[TABLE]

where $b_{1},\dots,b_{\ell}$ are Bernoulli( $p$ ) random variables jointly independent with $X_{1},\dots,X_{\ell}$ . It remains to note that the Lévy–Kolmogorov–Rogozin inequality (Lemma 3.1), together with the condition $|X_{i}|>N$ for all $i\leq\delta_{0}n$ , implies that for every $t\in\mathbb{Z}$ ,

[TABLE]

for some universal constant $C>0$ . The result follows. ∎

Proposition 4.5.

For any $M>0$, $p\in(0,1/2]$, $\delta\in(0,1)$ and $\varepsilon\in(0,p)$ there are $L_{\text{\tiny 4.5}}=L_{\text{\tiny 4.5}}(M,p,\delta,\varepsilon)>0$ and $n_{\text{\tiny 4.5}}=n_{\text{\tiny 4.5}}(M,p,\delta,\varepsilon)\in\mathbb{N}$ (depending on $M$, $p$, $\delta$ and $\varepsilon$) with the following property. Let $f\in\ell_{1}(\mathbb{Z})$ be a non-negative function with $\|f\|_{1}=1$, let $n\geq n_{\text{\tiny 4.5}}$, $n/2\leq\ell\leq n$, and let $\mathcal{A}$ be an $(N,n,K,\delta)$–admissible set for some parameters $N\leq 2^{n}$ and $K>0$. Then

[TABLE]

where $f_{\mathcal{A},p,\ell}$ is defined by (4).

The crucial difference between the above statement and Theorem 4.2 is that $L_{\text{\tiny 4.5}}$ in the proposition is allowed to depend on $M$. The proof essentially follows by estimating probabilities that $f_{\mathcal{A},p,\ell}(t)>\max\big{(}L_{\text{\tiny 4.5}}(N\sqrt{n})^{-1},(1-p+\varepsilon)^{\ell}\,\|f\|_{\infty}\big{)}$ for a fixed $t\in\mathbb{Z}$ and taking the union bound over $t$, although the actual argument is more involved. We will need the following definitions.

Let $R>0$ be a parameter, let $N$ , $\mathcal{A}$ , $f$ , $\ell$ and $p$ be as in the above proposition, and let $m\in\{1,2,\dots,\ell\}$ . We say that a point $t\in\mathbb{Z}$ decays at time $m$ if

[TABLE]

Further, given any $t\in\mathbb{Z}$ and a sequence $(v_{i})_{i=1}^{\ell}\in\{0,1\}^{\ell}$ , the descendant sequence for $t$ with respect to $(v_{i})_{i=1}^{\ell}$ is a random sequence $(t_{i})_{i=0}^{\ell}$ , where $t_{i}=t-\sum_{j=1}^{i}v_{j}X_{j}$ , $1\leq i\leq\ell$ (and where we set $t_{0}:=t$ ). The connection of the above statement with these definitions is provided by the following fact: the event that the $\ell_{\infty}$ –norm of $f_{\mathcal{A},p,\ell}$ is “large” is contained within the event that there exists a descendant sequence such that a proportional number of its elements do not decay. More precisely, we have

Lemma 4.6.

Let $N$, $\mathcal{A}$, $f$, $\ell$, $\varepsilon$ and $p$ be as in Proposition 4.5, let $L>0$, and set $R:=\frac{\varepsilon L}{2p}$. Define the event $\mathcal{E}$ as the subset of the probability space on which there exist a sequence $(v_{i})_{i=1}^{\ell}\in\{0,1\}^{\ell}$ and a point $t\in\mathbb{Z}$ such that the descendant sequence $(t_{i})_{i=0}^{\ell}$ for $t$ with respect to $(v_{i})_{i=1}^{\ell}$ satisfies

[TABLE]

Then $\mathcal{E}\supset\big{\{}\|f_{\mathcal{A},p,\ell}\|_{\infty}>\max\big{(}L(N\sqrt{n})^{-1},(1-p+\varepsilon)^{\ell}\,\|f\|_{\infty}\big{)}\big{\}}$ .

Proof.

Fix a realization of $X_{1},\dots,X_{\ell}$ such that

[TABLE]

(if such a realization does not exist then there is nothing to prove). We will construct a sequence of integers $(t_{i})_{i=0}^{\ell}$ inductively in inverse order as follows. Take $t_{\ell}$ to be any integer such that $f_{\mathcal{A},p,\ell}(t_{\ell})>\max\big{(}L(N\sqrt{n})^{-1},(1-p+\varepsilon)^{\ell}\,\|f\|_{\infty}\big{)}$ . At $(\ell-i+1)$ –st step ( $1\leq i\leq\ell$ ) we assume that $t_{i}$ has been defined, and satisfies $f_{\mathcal{A},p,\ell}(t_{i})>\max\big{(}L(N\sqrt{n})^{-1},(1-p+\varepsilon)^{\ell}\,\|f\|_{\infty}\big{)}$ . In view of the relation

[TABLE]

which follows immediately from the definition of $f_{\mathcal{A},p,i}$ , we get that $f_{\mathcal{A},p,i-1}(t_{i}+v_{i}X_{i})\geq f_{\mathcal{A},p,i}(t_{i})$ for some $v_{i}\in\{0,1\}$ . Then we set $t_{i-1}:=t_{i}+v_{i}X_{i}$ .

Clearly, the sequence $(t_{i})_{i=0}^{\ell}$ constructed this way, is the descendant sequence for $t_{0}$ with respect to $(v_{i})_{i=1}^{\ell}$ , which satisfies the conditions

(a) $f_{\mathcal{A},p,i-1}(t_{i-1})\geq f_{\mathcal{A},p,i}(t_{i})$ for all $1\leq i\leq\ell$;

(b) $f_{\mathcal{A},p,\ell}(t_{\ell})>\max\big{(}L(N\sqrt{n})^{-1},(1-p+\varepsilon)^{\ell}\,\|f\|_{\infty}\big{)}$.

We will show that these conditions imply (5). Assume that $1\leq i\leq\ell$ is such that $t_{i-1}$ decays at time $i$ . According to (6) and the relation between $t_{i}$ and $t_{i-1}$ , we have

[TABLE]

By our definition of decay at time $i$ , both $f_{\mathcal{A},p,i-1}(t_{i-1}+X_{i})$ and $f_{\mathcal{A},p,i-1}(t_{i-1}-X_{i})$ are less than $\frac{R}{N\sqrt{n}}$ , hence less than $\frac{\varepsilon}{2p}\,f_{\mathcal{A},p,i-1}(t_{i-1})$ , by the relation between $L$ and $R$ and conditions (a), (b). Thus, one of the values $f_{\mathcal{A},p,i-1}(t_{i-1}-v_{i}X_{i})$ or $f_{\mathcal{A},p,i-1}(t_{i-1}+(1-v_{i})X_{i}\big{)}$ is at most $\frac{\varepsilon}{2p}\,f_{\mathcal{A},p,i-1}(t_{i-1})$ while the other is equal to $f_{\mathcal{A},p,i-1}(t_{i-1})$ . This gives

[TABLE]

Applying the last relation for all $i$ where there is a decay and using the monotonicity of the sequence $\big{(}f_{\mathcal{A},p,j}(t_{j})\big{)}_{j=0}^{\ell}$ , we get for $u=|\{1\leq i\leq\ell:\;t_{i-1}\mbox{ decays at time$ i $}\}|$ :

[TABLE]

whence

[TABLE]

This implies the required lower bound for $\ell-u=|\{1\leq i\leq\ell:\;t_{i-1}\mbox{ does not decay at time$ i $}\}|$ . ∎
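The backward construction of the descendant sequence in the proof above can be sketched as follows. This is an illustration only: it assumes $f_{\mathcal{A},p,0}=f$ and the one-step relation $f_{\mathcal{A},p,i}(t)=(1-p)\,f_{\mathcal{A},p,i-1}(t)+p\,f_{\mathcal{A},p,i-1}(t+X_{i})$, represents functions as dictionaries with finite support, and uses hypothetical names and toy inputs.

```python
def average_step(f, x, p):
    """Return g with g(t) = (1 - p) f(t) + p f(t + x); f is a dict t -> value."""
    g = {}
    for t, val in f.items():
        g[t] = g.get(t, 0.0) + (1 - p) * val
        g[t - x] = g.get(t - x, 0.0) + p * val
    return g


def build_layers(f, xs, p):
    """Return [f_0, f_1, ..., f_l], where f_i is obtained from f_{i-1} via x_i."""
    layers = [dict(f)]
    for x in xs:
        layers.append(average_step(layers[-1], x, p))
    return layers


def backward_descendants(layers, xs, t_last):
    """Build (t_0, ..., t_l) and (v_1, ..., v_l) in inverse order, starting at t_l."""
    l = len(xs)
    t = [None] * (l + 1)
    v = [None] * (l + 1)
    t[l] = t_last
    for i in range(l, 0, -1):
        prev = layers[i - 1]
        stay = prev.get(t[i], 0.0)               # f_{i-1}(t_i)
        move = prev.get(t[i] + xs[i - 1], 0.0)   # f_{i-1}(t_i + X_i)
        # f_i(t_i) is a convex combination of the two values, so the larger
        # of them is at least f_i(t_i); choose v_i accordingly.
        v[i] = 1 if move > stay else 0
        t[i - 1] = t[i] + v[i] * xs[i - 1]
    return t, v[1:]


f0 = {-1: 0.3, 0: 0.4, 1: 0.3}
xs = [5, -3, 7, 2]                               # toy realization of X_1, ..., X_4
layers = build_layers(f0, xs, 0.3)
t_last = max(layers[-1], key=layers[-1].get)     # a maximizer of f_l
t_seq, v_seq = backward_descendants(layers, xs, t_last)
```

By construction, the values $f_{\mathcal{A},p,i}(t_{i})$ are non-increasing in $i$, and $t_{i}=t_{0}-\sum_{j\leq i}v_{j}X_{j}$, so the sequence is indeed a descendant sequence for $t_{0}$.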

Proof of Proposition 4.5.

Let $L>0$ be a parameter to be chosen later. Set

[TABLE]

We will assume that $\eta n/2\geq 1$ . Let $X_{1},X_{2},\dots,X_{\ell}$ be independent random variables, each $X_{i}$ uniform on $A_{i}$ , where $\mathcal{A}=A_{1}\times A_{2}\times\dots\times A_{n}$ .

The proposition follows by applying Lemma 4.6 and a union bound. Observe that for any point $t\in\mathbb{Z}$ such that the last element of a descendant sequence $(t_{i})_{i=0}^{\ell}$ (with respect to some sequence in $\{0,1\}^{\ell}$ and with $t_{0}=t$ ) satisfies $f_{\mathcal{A},p,\ell}(t_{\ell})>(N\sqrt{n})^{-1}$ , we have

[TABLE]

Indeed, the definition of the descendant sequence implies that for some $(\widetilde{v}_{i})_{i=1}^{\ell}\in\{0,1\}^{\ell}$ ,

[TABLE]

while at the same time the condition $f_{\mathcal{A},p,\ell}(t_{\ell})>(N\sqrt{n})^{-1}$ and the definition of $f_{\mathcal{A},p,\ell}$ implies that $f(t_{\ell}+x_{1}+x_{2}\dots+x_{\ell})>(N\sqrt{n})^{-1}$ for some $x_{i}\in A_{i}\cup\{0\}$ , $i=1,\dots,\ell$ , hence

[TABLE]

Set

[TABLE]

and observe that, in view of the upper bound on $\max A_{i}$ ’s from the definition of an admissible set, and the assumption $\|f\|_{1}=1$ ,

[TABLE]

Set $H:=\eta n$ . Then, with the event $\mathcal{E}$ defined in Lemma 4.6, we can write

[TABLE]

Finally, fix any $I\subset[\ell]$ with $|I|=\lceil H\rceil$ , $t\in D$ and $(v_{i})_{i=1}^{\ell}\in\{0,1\}^{\ell}$ . Let $(t_{i})_{i=0}^{\ell}$ be the (random) descendant sequence for $t$ with respect to $(v_{i})$ (note that $t_{i}$ is measurable w.r.t. $X_{1},\dots,X_{i}$ ). Take any $i\in I$ with $i-1>H/2$ . Conditioned on any realization of $X_{1},\dots,X_{i-1}$ , the variable $t_{i-1}+X_{i}$ is uniform on $t_{i-1}+A_{i}$ , and

[TABLE]

where at the last step we applied Lemma 4.4 with $\delta_{0}:=\eta/2$ and used that $A_{i}$ is either an integer interval or a union of two integer intervals. The same estimate is valid for

[TABLE]

Hence, by Markov’s inequality,

[TABLE]

Applying this estimate for all $i\in I\setminus[1,H/2+1]$ , we obtain

[TABLE]

whence

[TABLE]

where, we recall, $H=\eta n$ . Finally, we observe that by choosing $L=L(M,p,\delta,\varepsilon)$ large enough, we can make the last expression less than $\exp(-Mn)$ for all sufficiently large $n$ . This completes the proof of the proposition. ∎

The above result is too weak to be useful for our purposes. The rest of the section is devoted to “refining” the proposition by removing the dependence on $M$ from the lower bound on the $\ell_{\infty}$ –norm of the averaged function.

Let us informally describe the idea behind the argument and provide some simple examples. The magnitude of the $\ell_{\infty}$–norm of $f_{\mathcal{A},p,n}$ essentially depends on how efficiently the averaging step given by the relation $f_{\mathcal{A},p,i}(t)=(1-p)\,f_{\mathcal{A},p,i-1}(t)+p\,f_{\mathcal{A},p,i-1}(t+X_{i})$ removes spikes. One may hope that if at every step $i$ the number of spikes (coordinates with large magnitudes) is decreased significantly with probability close to one, then the resulting function $f_{\mathcal{A},p,n}$ would have a small $\ell_{\infty}$–norm with very large probability (superexponentially close to one).
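The averaging step is simple to simulate. The sketch below is a toy illustration under two assumptions: that $f_{\mathcal{A},p,0}=f$, and that iterating the step is the same as taking $\mathbb{E}_{b}\,f\big(t+\sum_{j}b_{j}X_{j}\big)$ over independent Bernoulli($p$) variables $b_{j}$. Functions are stored as dictionaries with finite support, and all names are hypothetical.

```python
from itertools import product


def average_step(f, x, p):
    """One averaging step: g(t) = (1 - p) f(t) + p f(t + x)."""
    g = {}
    for t, val in f.items():
        g[t] = g.get(t, 0.0) + (1 - p) * val      # f(t) feeds g(t)
        g[t - x] = g.get(t - x, 0.0) + p * val    # f(t) = f((t - x) + x) feeds g(t - x)
    return g


def averaged_function(f, xs, p):
    """Apply the averaging step successively for x_1, ..., x_l."""
    g = dict(f)
    for x in xs:
        g = average_step(g, x, p)
    return g


def direct_expectation(f, xs, p, t):
    """E_b f(t + sum_j b_j x_j), by enumerating all b in {0, 1}^l."""
    total = 0.0
    for b in product((0, 1), repeat=len(xs)):
        weight, shift = 1.0, 0
        for bj, xj in zip(b, xs):
            weight *= p if bj else 1 - p
            shift += bj * xj
        total += weight * f.get(t + shift, 0.0)
    return total
```

Each step preserves the $\ell_{1}$–norm and cannot increase the $\ell_{\infty}$–norm, since the new value at every point is a convex combination of two old values; the question studied in this section is how much the maximum actually drops.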

For a moment, it will be convenient to drop the assumption of a bounded $\ell_{1}$ –norm. Consider a family of functions $g_{N,d,I,\eta}$ on $\mathbb{Z}$ , indexed by natural numbers $N,d$ , an integer interval $I$ , and $\eta>0$ , and defined as

[TABLE]

where we impose the following restrictions on parameters:

• $N\geq d$;

• The function $g_{N,d,I,\eta}$ is “essentially non-constant” in the sense that $\|g_{N,d,I,\eta}{\bf 1}_{J}\|_{1}\leq\frac{1}{2}|J|$ for any integer interval $J$ of length at least $N$.

Note that $\log g_{N,d,I,\eta}$ is $\eta$ –Lipschitz and that the second assumption implies $|I|\leq d/2$ . Assume that a random variable $X$ is uniformly distributed on $\{0,1,\dots,N\}$ , and define the random average

[TABLE]

We are interested in estimating the proportion $\mathcal{R}_{N,d,I,\eta}$ of spikes preserved by the averaging; with

[TABLE]

A simple computation taking into account the condition $|I|\leq d/2$ , gives

[TABLE]

and, for $\varepsilon=0$ ,

[TABLE]

Thus, the efficiency of the averaging, i.e. the small ball probability estimate for $1-\mathcal{R}_{N,d,I,\eta}$ , is influenced by the magnitude of $d$ or, equivalently, the length $d-|I|$ of the “valleys” separating the clusters of spikes in $g_{N,d,I,\eta}$ . Now, let us discuss how this is related to the Lipschitzness of the logarithm. It is not difficult to check that, in order to satisfy the condition of being “essentially non-constant”, we must choose $d$ at least of order $1/\eta$ . Thus, the smaller $\eta$ is, the wider the valleys between the clusters of spikes, and the stronger the small ball probability estimates for $1-\mathcal{R}_{N,d,I,\eta}$ must be. In a sense, the Lipschitzness of the logarithm of $g_{N,d,I,\eta}$ , together with the essential non-constantness, affects the averaging indirectly, by influencing the structure of spikes and valleys.

In our actual model, a similar phenomenon holds, although the argument is more complicated: first, because the pattern of spikes does not have to be as regular as in the above example; second, because the spikes are defined as points where the function exceeds a certain threshold rather than points where it takes a specific value. Our measurement of the efficiency of the averaging is also more complicated than in the above example. For a function with relatively many spikes, we compare the $\ell_{2}$–norms of the original function and the average. A crucial step towards proving Theorem 4.2 is the following proposition.

Proposition 4.7.

Let $R>0$ , $p\in(0,1)$ , $\mu\in(0,1/64]$ and $N\in\mathbb{N}$ . Further, assume that $g_{1},g_{2}$ are non-negative functions in $\ell_{1}(\mathbb{Z})$ , and $g_{1}$ satisfies the following conditions:

• $\log_{2}g_{1}$ is $\mu^{4}$–Lipschitz;

• $\sum\limits_{t\in I}g_{1}(t)\leq RN$ for any integer interval $I$ of cardinality $N$;

• There is an interval $I_{0}\subset\mathbb{Z}$ with $|I_{0}|=N$, such that $|\{t\in I_{0}:\;g_{1}(t)\geq 8R\}|\geq\mu N$.

Let $Y$ be a random variable uniformly distributed on an integer interval $J$ of cardinality at least $N$ . Then

[TABLE]

Here, $C_{\text{\tiny 4.7}},c_{\text{\tiny 4.7}}>0$ are universal constants.

Before proving the proposition, we consider two lemmas.

Lemma 4.8.

Let $f,g\in\ell_{2}(\mathbb{Z})$ , and assume that $\kappa>0$ and $k\in\mathbb{N}$ are such that

[TABLE]

Let $p\in(0,1)$ . Then $\big{\|}pf+(1-p)g\big{\|}_{2}^{2}\leq\big{(}p\|f\|_{2}^{2}+(1-p)\|g\|_{2}^{2}\big{)}-p(1-p)\kappa^{2}k$ .

Proof.

For any $t\in\mathbb{Z}$ we have

[TABLE]

which implies the estimate. ∎
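The pointwise relation behind Lemma 4.8 is presumably the standard identity for the square of a convex combination, spelled out below. The hypothesis of the lemma is assumed here to say that $|f(t)-g(t)|\geq\kappa$ for at least $k$ points $t$; this reading is an assumption, chosen because it is consistent with the term $p(1-p)\kappa^{2}k$ in the conclusion.

```latex
% For real numbers a, b and p \in (0,1):
\[
  \big(pa+(1-p)b\big)^{2}
  \;=\; p\,a^{2} + (1-p)\,b^{2} - p(1-p)\,(a-b)^{2}.
\]
% Check: the right-hand side equals
%   p a^2 + (1-p) b^2 - p(1-p) a^2 + 2p(1-p) ab - p(1-p) b^2
%   = p^2 a^2 + 2p(1-p) ab + (1-p)^2 b^2.
% Summing over t \in \mathbb{Z} with a = f(t), b = g(t):
\[
  \big\|pf+(1-p)g\big\|_{2}^{2}
  \;=\; p\|f\|_{2}^{2} + (1-p)\|g\|_{2}^{2} - p(1-p)\,\|f-g\|_{2}^{2},
\]
% and, under the assumed hypothesis, \|f-g\|_2^2 \ge \kappa^2 k.
```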

Lemma 4.9.

Let $f,g\in\ell_{1}(\mathbb{Z})$, and $\delta,\kappa>0$. Further, assume that $I\subset\mathbb{Z}$ is an integer interval and $I_{1}\cup I_{2}\cup I_{3}=I$ is a partition of $I$ into three subsets (not necessarily subintervals) such that $|I_{3}|\in\big{[}\delta|I|/2,\delta|I|\big{]}$, $|I_{2}|\leq\delta|I|$, and $f(t_{1})\geq\kappa+f(t_{3})$ for all $t_{1}\in I_{1}$ and $t_{3}\in I_{3}$. Further, assume that $X$ is an integer random variable uniformly distributed on an interval $J\subset\mathbb{Z}$ of cardinality at least $|I|$. Then

[TABLE]

Proof.

Without loss of generality, $\delta\leq 1/64$ . Fix any subinterval $\widetilde{J}\subset J$ of cardinality at least $|I|/2$ and at most $|I|$ . We will prove the probability estimate under the condition that $X$ belongs to $\widetilde{J}$ . Then the required result will easily follow by partitioning $J$ into subintervals and combining estimates for corresponding conditional probabilities.

Set

[TABLE]

and define

[TABLE]

Observe that, in view of the assumption $w_{1}\geq w_{3}+\kappa$ , for any point $i\in\widetilde{J}\setminus Q$ we have

[TABLE]

Thus, if $Q=\emptyset$ then, conditioned on $X\in\widetilde{J}$ , $\big{|}\big{\{}t\in I:\;|f(t)-g(t+X)|\geq\kappa/2\big{\}}\big{|}<\delta|I|/4$ holds with probability zero, and the statement follows. Below, we assume that $Q\neq\emptyset$ .

Set $S:=\{\min Q,\min Q+1,\dots,\max Q\}$ . Since $|\widetilde{J}|\leq|I|$ , we have $S+I=(\min Q+I)\cup(\max Q+I)$ , whence

[TABLE]

The above estimate immediately gives

[TABLE]

Hence, the number of points $i\in S$ such that

[TABLE]

is at most $32\delta|I|$ . On the other hand, for every $i\in S$ such that (7) does not hold, we clearly have

[TABLE]

Summarizing, we obtain

[TABLE]

whence

[TABLE]

The result follows. ∎

Proof of Proposition 4.7.

Let $\delta:=8\mu$ , $\varepsilon:=\mu^{4}$ and $\widetilde{I}:=I_{0}+\{0,1,\dots,N\}$ , so that $|\widetilde{I}|=2N$ . It is not difficult to see that there is a real interval of the form $(a,2^{\mu^{2}}a]$ , where $4R\leq a\leq 2^{-\mu^{2}}\cdot 8R$ and such that

[TABLE]

We will inductively construct a finite sequence of integer intervals $I^{\prime}_{1},I^{\prime}_{2},\dots,I^{\prime}_{h}$ as follows.

At the first step, let $t_{1}^{\ell}:=\min\{t\in\widetilde{I}:\;g_{1}(t)\geq 2^{\mu^{2}}a\}$ ,

[TABLE]

and define $I^{\prime}_{1}:=\{t_{1}^{\ell},t_{1}^{\ell}+1,\dots,t_{1}^{r}\}$ (note that by the definition of $I_{0}$ , $t_{1}^{\ell}$ exists). In words, we choose $t_{1}^{r}$ to be the largest integer in $\widetilde{I}$ such that the number of the elements $s\in I^{\prime}_{1}$ corresponding to “small” values $g_{1}(s)\leq a$ , is at most $\delta|I^{\prime}_{1}|$ . If $\max I^{\prime}_{1}\geq\max I_{0}$ or if $g_{1}(t)<2^{\mu^{2}}a$ for all $t^{r}_{1}=\max I^{\prime}_{1}<t\leq\max I_{0}$ then we set $h:=1$ and complete the process. Otherwise, we go to the second step.

At the $k$-th step, $k>1$, we define $t_{k}^{\ell}>\max I^{\prime}_{k-1}$ to be the smallest integer in $(\max I^{\prime}_{k-1},\infty)$ such that $g_{1}(t_{k}^{\ell})\geq 2^{\mu^{2}}a$ (the previous step of the construction guarantees that such $t_{k}^{\ell}$ exists and belongs to $I_{0}$). We set $t_{k}^{r}:=\max\big{\{}t\in\widetilde{I}:\;t\geq t_{k}^{\ell};\;|\{s\in\{t_{k}^{\ell},\dots,t\}:\;g_{1}(s)\leq a\}|\leq\delta(t-t_{k}^{\ell}+1)\big{\}}$, and $I^{\prime}_{k}:=\{t_{k}^{\ell},t_{k}^{\ell}+1,\dots,t_{k}^{r}\}$. If $\max I^{\prime}_{k}\geq\max I_{0}$ or if $g_{1}(t)<2^{\mu^{2}}a$ for all $t^{r}_{k}=\max I^{\prime}_{k}<t\leq\max I_{0}$, then we set $h:=k$ and complete the process; otherwise, we go to the next step.
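The inductive construction of the intervals $I^{\prime}_{1},\dots,I^{\prime}_{h}$ can be sketched as a greedy scan. The code below is an illustration under stated assumptions: $g_{1}$ is passed as a callable on the integers, $I_{0}=\{\texttt{I0\_min},\dots,\texttt{I0\_min}+N-1\}$, the first left endpoint is searched for inside $I_{0}$ (where it lies whenever $I_{0}$ contains a point with $g_{1}\geq 2^{\mu^{2}}a$), and the first step uses the same rule for the right endpoint as the $k$-th step. The function name and arguments are hypothetical.

```python
def build_intervals(g1, I0_min, N, a, mu, delta):
    """Greedy construction of the intervals I'_1, ..., I'_h as (left, right) pairs."""
    I0_max = I0_min + N - 1
    Itilde_max = I0_max + N                 # I~ = I_0 + {0, 1, ..., N}
    high = 2.0 ** (mu ** 2) * a             # threshold for "large" values
    intervals = []
    start = I0_min
    while True:
        # Left endpoint: smallest t >= start in I_0 with g1(t) >= 2^{mu^2} a.
        left = next((t for t in range(start, I0_max + 1) if g1(t) >= high), None)
        if left is None:
            break                           # no large values remain in I_0
        # Right endpoint: largest t in I~ such that {left, ..., t} contains
        # at most delta * (t - left + 1) points with g1 <= a.
        right, small = left, 0
        for t in range(left, Itilde_max + 1):
            if g1(t) <= a:
                small += 1
            if small <= delta * (t - left + 1):
                right = t
        intervals.append((left, right))
        if right >= I0_max:
            break                           # the interval reaches the end of I_0
        start = right + 1
    return intervals
```

For example, with $N=20$, $I_{0}=\{0,\dots,19\}$, $\delta=1/8$, and a function equal to $10$ on $\{0,\dots,3\}\cup\{12,\dots,15\}$ and $0$ elsewhere (with $a=4$), the scan returns two intervals, one for each cluster of large values.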

Next, we observe some important properties of the constructed sequence.

(a) The left endpoints of all intervals are contained in $I_{0}$, and the union $\bigcup_{k=1}^{h}I^{\prime}_{k}$ contains the set $\{t\in I_{0}:\;g_{1}(t)\geq 2^{\mu^{2}}a\}$; in particular, the cardinality of the union is at least $\mu N$.

(b) The cardinality of any interval $I^{\prime}_{k}$ cannot exceed $N$ since our assumption on the function $g_{1}$, together with the definition of $I^{\prime}_{k}$, gives

[TABLE]

In particular, this implies that $\max I^{\prime}_{h}$ is strictly less than $\max\widetilde{I}$.

(c) The condition that $\log_{2}g_{1}$ is $\varepsilon$–Lipschitz implies that for any $k\leq h$, $|I^{\prime}_{k}|\geq\lfloor\mu^{2}/\varepsilon\rfloor>\frac{1}{4\mu}$. Indeed, since $g_{1}(t+1)\geq 2^{-\varepsilon}g_{1}(t)$ for all $t\in\mathbb{Z}$, we have $g_{1}(t)>2^{-\mu^{2}}g_{1}(t_{k}^{\ell})\geq a$ whenever $0\leq t-t_{k}^{\ell}<\mu^{2}/\varepsilon$. On the other hand, the last conclusion in property (b) implies that $|\{s\in\{t_{k}^{\ell},\dots,t_{k}^{r}+1\}:\;g_{1}(s)\leq a\}|>\delta(t_{k}^{r}+1-t_{k}^{\ell}+1)>\delta|I_{k}^{\prime}|$, as $t_{k}^{r}+1\in\widetilde{I}$.

(d) Property (c), in its turn, implies that for any $k\leq h$ we have $1\leq\delta|I_{k}^{\prime}|/2$, whence $|\{t\in I^{\prime}_{k}:\;g_{1}(t)\leq a\}|\geq\delta|I^{\prime}_{k}|/2$.

Our goal is to apply Lemma 4.9 to the constructed intervals. For each $k\leq h$ , we define the partition $I^{\prime}_{k}=I^{\prime}_{k,1}\cup I^{\prime}_{k,2}\cup I^{\prime}_{k,3}$ , where

[TABLE]

Additionally, set $\kappa:=\big{(}2^{\mu^{2}}-1\big{)}\cdot 4R$. We define the subset of good indices $G\subset[h]$ as

[TABLE]

Note that (8), together with property (a) of the intervals, implies that

[TABLE]

By Lemma 4.9, for every $k\in G$ the event

[TABLE]

has probability at most $64\delta$ . Hence, the expectation of the sum

[TABLE]

is at most $64\delta\cdot\sum\limits_{k\in G}|I^{\prime}_{k}|$ , and in view of Markov’s inequality and the lower bound for $\sum\limits_{k\in G}|I^{\prime}_{k}|$ ,

[TABLE]

As the final remark, for any realization of $Y$ such that $\sum\limits_{k\in G}|I^{\prime}_{k}|{\bf 1}_{\mathcal{E}_{k}^{c}}\geq\frac{\mu N}{4}$ , we have $\big{|}\big{\{}t\in\widetilde{I}:\;|g_{1}(t)-g_{2}(t+Y)|\geq\kappa/2\big{\}}\big{|}\geq\frac{\delta}{4}\frac{\mu N}{4}$ , whence, in view of Lemma 4.8

[TABLE]

The result follows. ∎

The estimate on the $\ell_{2}$ –norm of the average in Proposition 4.7 involves the parameter $\mu$ which, roughly speaking, determines the cardinality of the largest cluster of spikes in $g_{1}$ . If the cardinality is small, the estimate given by the proposition becomes weaker. Even assuming best possible values for $\mu$ , $n$ applications of the averaging to obtain $f_{\mathcal{A},p,n}$ from $f$ would not provide a bound on $\|f_{\mathcal{A},p,n}\|_{2}$ which could be translated into a meaningful estimate for the $\ell_{\infty}$ –norm of the average.

Returning to the example discussed earlier in this section, if the function $g_{N,d,I,\eta}$ is such that $|I|$ is much less than $d$, i.e. the spikes are rare, then with probability $1-\Theta(\frac{|I|}{d})\approx 1$ the averaged function $g^{av}_{N,d,I,\eta}$ will not have any spikes left. When the spikes are located in an irregular fashion, such a strong property does not hold, but the following phenomenon can still be observed: if the spikes are rare, then with probability close to one the averaged function will have far fewer spikes (by a large factor). In other words, in the regime when there are few points where the function is large, rather than measuring the $\ell_{2}$–norm of the average, it is more useful to consider how the cardinality of the set of spikes shrinks under averaging. Combining this idea with Proposition 4.7, we can derive the following statement:

Proposition 4.10.

For any $p\in(0,1/2]$, $\varepsilon\in(0,1)$, $\widetilde{R}\geq 1$, $L_{0}\geq 16\widetilde{R}$ and $M\geq 1$ there are $n_{\text{\tiny 4.10}}=n_{\text{\tiny 4.10}}(p,\varepsilon,L_{0},\widetilde{R},M)>0$ and $\eta_{\text{\tiny 4.10}}=\eta_{\text{\tiny 4.10}}(p,\varepsilon,L_{0},\widetilde{R},M)\in(0,1)$ with the following property. Let $L_{0}\geq L\geq 16\widetilde{R}$, let $n\geq n_{\text{\tiny 4.10}}$, $N\leq 2^{n}$, and let $g\in\ell_{1}(\mathbb{Z})$ be a non-negative function satisfying

• $\|g\|_{1}=1$;

• $\log_{2}g$ is $\eta_{\text{\tiny 4.10}}$–Lipschitz;

• $\sum\limits_{t\in I}g(t)\leq\frac{\widetilde{R}}{\sqrt{n}}$ for any integer interval $I$ of cardinality $N$;

• $\|g\|_{\infty}\leq\frac{L}{N\sqrt{n}}$.

For each $i\leq\lfloor\varepsilon n\rfloor$ , let $X_{i}$ be a random variable uniform on some disjoint union of integer intervals of cardinality at least $N$ each; and assume that $X_{1},\dots,X_{\lfloor\varepsilon n\rfloor}$ are independent. Define a random function $\widetilde{g}\in\ell_{1}(\mathbb{Z})$ as

[TABLE]

where $b=(b_{1},\dots,b_{n})$ is the vector of independent Bernoulli( $p$ ) components. Then

[TABLE]

In words, the above proposition tells us that, given a “preprocessed” function $g$ with $\|g\|_{\infty}\leq\frac{L}{N\sqrt{n}}$ , after $\varepsilon n$ averagings the $\ell_{\infty}$ –norm of the function drops at least by the factor $p/\sqrt{2}+1-p$ with a probability superexponentially close to one. By applying the proposition several times to a “preprocessed” function given by Proposition 4.5, we will be able to complete the proof of the theorem.

Before proving the proposition, let us consider a simple lemma.

Lemma 4.11.

Let $f\in\ell_{1}(\mathbb{Z})$ be a non-negative function, let $m,N\in\mathbb{N}$ , $p\in(0,1)$ , $H,\mu>0$ , and assume that $\|f\|_{\infty}\leq 2H$ and that for any integer interval $I$ of cardinality $N$ we have

[TABLE]

Choose any integers $x_{1},x_{2},\dots,x_{m}$ and set

[TABLE]

where $b=(b_{1},\dots,b_{m})$ is the vector of independent Bernoulli( $p$ ) random variables. Then for any integer interval $J$ of cardinality $N$ we have

[TABLE]

Proof.

Take any point $t\in\mathbb{Z}$ such that $\widetilde{f}(t)\geq\sqrt{2}H$ . We have

[TABLE]

so that

[TABLE]

On the other hand, for any interval $J$ of cardinality $N$ and any choice of $(v_{i})_{i=1}^{m}\in\{0,1\}^{m}$ , we have, by the assumptions of the lemma,

[TABLE]

whence

[TABLE]

Combining the last inequality with the condition (9), we get the statement. ∎

Proof of Proposition 4.10.

Fix any admissible parameters $\varepsilon$ , $p$ , $\widetilde{R}$ , $L$ , $N$ and $M$ , and set

[TABLE]

We will assume that $n$ is sufficiently large so that $\varepsilon n/4\geq 1$ and, moreover,

[TABLE]

Set

[TABLE]

We fix any function $g\in\ell_{1}(\mathbb{Z})$ satisfying conditions of the proposition with parameters $\eta$ , $\widetilde{R}$ , $N$ , $L$ , $n$ . Note that $\|g\|_{\infty}\leq 2H$ . Define $g_{0}:=g$ ,

[TABLE]

so that either $\widetilde{g}=g_{2m}$ (if $\lfloor\varepsilon n\rfloor$ is even) or $\widetilde{g}=g_{2m+1}$ (if $\lfloor\varepsilon n\rfloor$ is odd). It is easy to see that $\log_{2}g_{k}$ is $\eta$ –Lipschitz (because the log-Lipschitzness is preserved under taking convex combinations) and $\|g_{k}\|_{1}=1$ for all admissible $k$ .
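As an illustrative sketch (not part of the argument), the averaging step can be simulated on a finitely supported function. The code assumes the one-step recursion $g_{i}(t)=p\,g_{i-1}(t+X_{i})+(1-p)\,g_{i-1}(t)$ , which is the relation used later in this proof. The names `average_step`, `norms` and `run_averaging`, the dictionary representation, and the choice of shifts uniform on a single interval are hypothetical conveniences.

```python
import random

def average_step(g, x, p):
    """One averaging step: returns t -> p*g(t + x) + (1 - p)*g(t).

    g is a finitely supported non-negative function on the integers,
    stored as a dict {t: g(t)} (an assumption made for this sketch).
    """
    out = {}
    for t, v in g.items():
        out[t] = out.get(t, 0.0) + (1 - p) * v      # the (1-p)*g(t) term
        out[t - x] = out.get(t - x, 0.0) + p * v    # g(t) feeds the new value at t - x
    return out

def norms(g):
    """Return the (l1, l2, l_infinity) norms of g."""
    vals = list(g.values())
    return sum(vals), sum(v * v for v in vals) ** 0.5, max(vals)

def run_averaging(width, steps, p, shift_range, seed=0):
    """Start from the uniform density on `width` points and apply `steps`
    averagings with independent shifts uniform on {1, ..., shift_range}."""
    rng = random.Random(seed)
    g = {t: 1.0 / width for t in range(width)}
    history = [norms(g)]
    for _ in range(steps):
        g = average_step(g, rng.randint(1, shift_range), p)
        history.append(norms(g))
    return history

# Hypothetical parameters, chosen only to keep the run small.
history = run_averaging(width=50, steps=40, p=0.5, shift_range=200)
```

In such a run the $\ell_{1}$ –norm stays equal to one while the $\ell_{2}$ – and $\ell_{\infty}$ –norms never increase, matching the deterministic facts used in the proof. The quantitative decay rate asserted by the proposition is not something this toy simulation verifies.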

For each $i\leq m$ , define events

[TABLE]

and

[TABLE]

(we can formally extend the first definition to $i=0$ ). Clearly, for each $i$ , $\mathcal{E}_{i}$ and $\widetilde{\mathcal{E}}_{i}$ are measurable with respect to the $\sigma$-algebra generated by $X_{1},\dots,X_{i}$ . Condition for a moment on any realization of $X_{1},\dots,X_{i-1}$ , and observe that one of the following two assertions is true:

•

$\mathcal{E}_{i-1}$ holds;

•

$\big{|}\big{\{}t\in I:\;g_{i}(t)\geq 8R\big{\}}\big{|}\geq\mu N$ for some integer interval $I$ of cardinality $N$ , where we set $R:=\frac{\widetilde{R}}{N\sqrt{n}}$ . Then, applying Proposition 4.7, we get ${\mathbb{P}}_{X_{i}}(\widetilde{\mathcal{E}}_{i})\geq 1-C_{\text{\tiny\ref{prop: ell 2 update}}}\mu$ .

Hence,

[TABLE]

This implies that for any $r\in[m]$ , the probability that $\big{(}\mathcal{E}_{i-1}\cup\widetilde{\mathcal{E}}_{i}\big{)}^{c}$ holds for at least $r$ indices $i$ can be estimated as

[TABLE]

Note that the definition of $g_{k}$ ’s and the triangle inequality imply that the sequence $\big{(}\|g_{k}\|_{2}\big{)}_{k\geq 0}$ is non-increasing. Hence, taking $r:=\lceil m/2\rceil$ in the above formula and in view of our choice of $\mu$ , we get that with probability at least $1-\exp(-2Mn)$ at least one of the following two conditions is satisfied:

(a)

There is $i\leq m$ such that $\big{|}\big{\{}t\in I:\;g_{i}(t)\geq H\big{\}}\big{|}\leq\mu N$ for any integer interval $I$ of cardinality $N$ ; or

(b)

$\|g_{m}\|_{2}^{2}\leq\|g\|_{2}^{2}-c_{\text{\tiny\ref{prop: ell 2 update}}}p(1-p)m\mu^{6}\widetilde{R}^{2}n^{-1}N^{-1}/2$ .

It can be checked, however, that condition (b) is improbable. Indeed, in view of the restrictions on the $\ell_{1}$ – and $\ell_{\infty}$ –norms of $g$ , and Hölder’s inequality,

[TABLE]

whence, applying (10), we get $\|g\|_{2}^{2}-c_{\text{\tiny\ref{prop: ell 2 update}}}p(1-p)m\mu^{6}\widetilde{R}^{2}n^{-1}N^{-1}/2<0$ .

Thus, only (a) may hold, so the event

[TABLE]

has probability at least $1-\exp(-2Mn)$ . Applying Lemma 4.11 we get that everywhere on the event

[TABLE]

The second part of our proof resembles the proof of Proposition 4.5, although the argument here is simpler. We observe that there exists a random sequence of integers $(t_{i})_{i=m}^{2m}$ satisfying

•

The sequence $\big{(}g_{i}(t_{i})\big{)}_{i=m}^{2m}$ is non-increasing;

•

$g_{2m}(t_{2m})=\|g_{2m}\|_{\infty}$ ;

•

$t_{i}\in\{t_{i-1},t_{i-1}-X_{i}\}$ for all $m<i\leq 2m$ .

On the event

[TABLE]

we necessarily have $\|g_{i}\|_{\infty}\geq(\sqrt{2}p+2(1-p))H$ , $i\leq 2m$ , hence, in view of the recursive relation $g_{i}(t_{i})=p\,g_{i-1}(t_{i}+X_{i})+(1-p)g_{i-1}(t_{i})$ and the deterministic upper bound $\|g_{i-1}\|_{\infty}\leq 2H$ , we have $g_{i-1}(t_{i}+X_{i})\geq\sqrt{2}H$ and $g_{i-1}(t_{i})\geq\sqrt{2}H$ for all $m<i\leq 2m$ . Thus,

[TABLE]

We will show that the probability of the latter event is small by considering a union bound over non-random sequences.

Fix any realizations $X_{1}^{0},\dots,X_{m}^{0}$ of $X_{1},\dots,X_{m}$ such that the event $\mathcal{E}$ defined above holds. Take any non-random sequence $(v_{i})_{i=m+1}^{2m}\in\{0,1\}^{m}$ and any fixed $s_{m}\in\mathbb{Z}$ such that $g_{m}(s_{m})\geq\sqrt{2}H$ (if such $s_{m}$ exists). Further, we define random numbers $s_{i}:=s_{i-1}-v_{i}X_{i}$ , $i=m+1,\dots,2m$ . Then for any $i\geq m+1$ we have

[TABLE]

in view of (11) and our assumption about the distribution of $X_{i}$ ’s. Hence,

[TABLE]

is at most $(12\mu)^{m}$ . This, together with the obvious observation $|\{s\in\mathbb{Z}:\;g_{m}(s)\geq\sqrt{2}H\}|\leq(\sqrt{2}H)^{-1}$ , allows us to estimate the probability of $\hat{\mathcal{E}}$ as

[TABLE]

By our definition of the parameters $\mu,H,m$ , the rightmost quantity is less than $\exp(-Mn)$ for all sufficiently large $n$ . The proof is complete. ∎

Proof of Theorem 4.2.

Fix any admissible parameters $\delta\in(0,1]$ , $p\in(0,1/2]$ , $\varepsilon\in(0,p)$ , $K,M\geq 1$ . The proof of the theorem is essentially a combination of Proposition 4.5 which provides a rough bound on the $\ell_{\infty}$ –norm which depends on $M$ , and subsequent application of Proposition 4.10 to get a refined bound.

We define

[TABLE]

and let $q$ be the smallest positive integer such that $\big{(}p/\sqrt{2}+1-p\big{)}^{q}\leq L^{-1}$ . Further, define $\alpha=\alpha(p,\varepsilon)$ as the smallest number in $[1/2,1)$ which satisfies

[TABLE]

and set $\widetilde{\varepsilon}:=(1-\alpha)/(2q)$ . Now, we fix any $n$ satisfying

[TABLE]

fix $1\leq N\leq(1-p+\varepsilon)^{-n}$ , and define $\ell:=\lceil\alpha n\rceil$ . It can be checked that with the above assumptions on parameters, we have $(1-p+\varepsilon/2)^{\ell}\leq(1-p+\varepsilon)^{n}/\sqrt{n}$ .

Further, we fix any non-negative function $f\in\ell_{1}(\mathbb{Z})$ with $\|f\|_{1}=1$ and such that $\log_{2}f$ is $\eta$ –Lipschitz for $\eta=\eta_{\text{\tiny\ref{prop: refinement}}}(p,\widetilde{\varepsilon},\max(16\widetilde{R},L),\widetilde{R},2M)$ . Note that, by the above, $(1-p+\varepsilon/2)^{\ell}\,\|f\|_{\infty}\leq L(N\sqrt{n})^{-1}$ , and, by Proposition 4.5, the event

[TABLE]

has probability at least $1-\exp(-2Mn)$ .

Further, we split the integer interval $\{\ell,\ell+1,\dots,n\}$ into $q$ subintervals, each of cardinality at least $\frac{n-\alpha n}{2q}=\widetilde{\varepsilon}n$ . Let $\ell\leq i_{1}<i_{2}<\dots<i_{q}=n$ be the right endpoints of the corresponding subintervals. Observe that by Lemma 4.4, for any $k\geq\ell$ and any integer interval $I$ of cardinality $N$ we have the deterministic relation

[TABLE]

by our definition of $R$ . This enables us to apply Proposition 4.10. Applying Proposition 4.10 to the first subinterval, we get that, conditioned on the event $\mathcal{E}_{0}:=\mathcal{E}_{\text{\tiny\ref{p: rough decay}}}$ , the event

[TABLE]

has probability at least $1-\exp(-2Mn)$ . More generally, for the $j$ -th subinterval, the application of Proposition 4.10 gives

[TABLE]

where for each $1\leq j\leq q$ ,

[TABLE]

Taking into account our definition of $q$ ,

[TABLE]

In view of the above, the probability of this event can be estimated from below by $1-(q+1)\exp(-2Mn)$ , which is greater than $1-\exp(-Mn)$ for all sufficiently large $n$ . It remains to choose

[TABLE]

∎
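The parameter bookkeeping in the above proof can be sketched numerically. The code below computes the smallest $q$ with $\big(p/\sqrt{2}+1-p\big)^{q}\leq L^{-1}$ and splits $\{\ell,\dots,n\}$ into $q$ consecutive blocks; the numerical values of $p$, $L$, $\alpha$ and $n$ are hypothetical (in the proof, $L$ and $\alpha$ are fixed by the relations stated there), and the helper names are ours.

```python
import math

def smallest_q(p, L):
    """Smallest positive integer q with (p/sqrt(2) + 1 - p)**q <= 1/L."""
    rho = p / math.sqrt(2) + 1 - p   # decay factor per application, in (0, 1)
    q = 1
    while rho ** q > 1.0 / L:
        q += 1
    return q

def right_endpoints(ell, n, q):
    """Split {ell, ..., n} into q consecutive blocks of near-equal size
    and return their right endpoints i_1 < ... < i_q = n."""
    total = n - ell + 1
    base, extra = divmod(total, q)
    ends, pos = [], ell - 1
    for j in range(q):
        pos += base + (1 if j < extra else 0)
        ends.append(pos)
    return ends

p, L, alpha, n = 0.5, 100.0, 0.5, 10_000   # hypothetical values
q = smallest_q(p, L)
ell = math.ceil(alpha * n)
ends = right_endpoints(ell, n, q)
sizes = [b - a for a, b in zip([ell - 1] + ends, ends)]
eps_tilde = (1 - alpha) / (2 * q)          # the quantity called \widetilde{\varepsilon}
```

With these values every block has at least $\widetilde{\varepsilon}n$ elements, which is the requirement for applying Proposition 4.10 once per block; $q$ applications then reduce the $\ell_{\infty}$ –bound by the factor $L^{-1}$.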

5. Proof of Theorem A

Let us recall the definition of a threshold which we considered in Section 2. For any $p\in(0,1/2]$ , any vector $x\in S^{n-1}$ and any parameter $L>0$ we define the threshold ${\mathcal{T}}_{p}(x,L)$ as the supremum of all $t\in(0,1]$ such that ${\mathcal{L}}\big{(}\sum_{i=1}^{n}b_{i}x_{i},t\big{)}>Lt$ , where $b_{1},\dots,b_{n}$ are independent Bernoulli( $p$ ) random variables. Note that ${\mathcal{T}}_{p}(x,L)\geq\frac{1}{L}(1-p)^{n}$ . On the other hand, as a consequence of the Lévy–Kolmogorov–Rogozin inequality (Lemma 3.1), we obtain

Lemma 5.1.

For every $p\in(0,1/2]$ , $\delta,\nu\in(0,1]$ there are $K_{\text{\tiny\ref{l: threshold}}}=K_{\text{\tiny\ref{l: threshold}}}(p,\delta,\nu)>0$ and $L_{\text{\tiny\ref{l: threshold}}}=L_{\text{\tiny\ref{l: threshold}}}(p,\delta,\nu)\geq 1$ with the following property. Let $n\geq 2$ , $L\geq L_{\text{\tiny\ref{l: threshold}}}$ , and let $x\in{\rm Incomp}_{n}(\delta,\nu)$ . Then ${\mathcal{T}}_{p}(x,L)\leq\frac{K_{\text{\tiny\ref{l: threshold}}}}{\sqrt{n}}$ .

Proof.

Take any vector $x\in{\rm Incomp}_{n}(\delta,\nu)$ , and let $I\subset[n]$ be a subset of cardinality $\lfloor\delta n\rfloor$ corresponding to the largest (by absolute value) coordinates of $x$ , i.e. such that $|x_{i}|\geq|x_{\ell}|$ for all $i\in I$ and $\ell\in[n]\setminus I$ . Since $x$ is $(\delta,\nu)$ –incompressible, we have $\|x\,{\bf 1}_{[n]\setminus I}\|_{2}\geq\nu$ , whence there is $\ell\in[n]\setminus I$ such that $|x_{\ell}|\geq\nu/\sqrt{n}$ . Thus, $|x_{i}|\geq\nu/\sqrt{n}$ for all $i\in I$ . Applying Lemma 3.1, we get

[TABLE]

for all $t\geq 1$ for some $C\geq 1$ depending only on $p$ . It remains to choose $L_{\text{\tiny\ref{l: threshold}}}:=\frac{C}{\nu\sqrt{\delta/2}}$ and $K_{\text{\tiny\ref{l: threshold}}}:=\max\big{(}\delta^{-1/2},\nu\big{)}$ . The result follows by the definition of the threshold. ∎

Remark 5.2.

The above lemma can also be obtained by applying results of [15], namely, the property that the least common denominator of an incompressible vector is of order at least $\sqrt{n}$ .
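For very small $n$ the threshold defined at the beginning of this section can be evaluated by brute force, which may help to build intuition. The sketch below enumerates all $2^{n}$ outcomes of $\sum_{i}b_{i}x_{i}$ , computes the Lévy concentration function with a sliding window, and approximates the supremum in the definition of ${\mathcal{T}}_{p}(x,L)$ on a finite grid. The function names, the grid approximation and the numerical tolerance are assumptions of this sketch, not part of the paper.

```python
import itertools

def atoms(x, p):
    """Distribution of sum_i b_i x_i for independent Bernoulli(p) b_i,
    as a sorted list of (value, probability) pairs (2**n outcomes)."""
    out = []
    for bits in itertools.product((0, 1), repeat=len(x)):
        k = sum(bits)
        prob = p ** k * (1 - p) ** (len(x) - k)
        out.append((sum(b * xi for b, xi in zip(bits, x)), prob))
    return sorted(out)

def levy_concentration(dist, t, tol=1e-12):
    """sup over lambda of P(|Z - lambda| <= t): the largest mass carried by
    a closed window of length 2t; it suffices to start windows at atoms."""
    best, right, mass = 0.0, 0, 0.0
    for left in range(len(dist)):
        while right < len(dist) and dist[right][0] <= dist[left][0] + 2 * t + tol:
            mass += dist[right][1]
            right += 1
        best = max(best, mass)
        mass -= dist[left][1]
    return best

def threshold_on_grid(x, p, L, grid_size=2000):
    """Largest grid point t in (0, 1] with levy_concentration > L*t
    (a grid approximation of the supremum; None if no grid point qualifies)."""
    dist = atoms(x, p)
    for k in range(grid_size, 0, -1):
        t = k / grid_size
        if levy_concentration(dist, t) > L * t:
            return t
    return None
```

Since the outcome $b_{1}=\dots=b_{n}=0$ has probability $(1-p)^{n}$ , the concentration function is at least $(1-p)^{n}$ at every scale, which is the source of the lower bound ${\mathcal{T}}_{p}(x,L)\geq\frac{1}{L}(1-p)^{n}$ noted above.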

Let us discuss what is left in order to complete the proof of Theorem A. The standard decomposition of $S^{n-1}$ into sets of compressible and incompressible vectors and the reduction of invertibility over the incompressible vectors to the distance problem for the random normal (see description in Section 2), leave the following question: given a number $T\gg(1-p+\varepsilon)^{n}$ , show that the probability of the event $\{{\mathcal{T}}_{p}(Y_{n},L)\in[T,2T)\}$ is close to zero. Here, $Y_{n}$ is a unit normal vector to the first $n-1$ columns of the matrix $B_{n}(p)+s\,1_{n}1_{n}^{\top}$ . Assuming that ${\mathcal{N}}_{T}$ is a discrete approximation of the set of incompressible vectors with the threshold in $[T,2T)$ , we can write

[TABLE]

(we prefer not to specify at this stage what “almost orthogonal” means quantitatively). Most of the work related to estimating the cardinality of ${\mathcal{N}}_{T}$ was done in Section 4. Here, we combine Corollary 4.3 with a simple counting argument giving an estimate of the cardinality of a part of the integer lattice $\mathbb{Z}^{n}$ with prescribed bounds on the vector coordinates (see Corollary 5.5 in this section). The probability estimate for the event

[TABLE]

would follow as a simple consequence of the Tensorization Lemma 3.2 and individual small ball probability bounds for $\langle x,{\rm col}_{i}\rangle$ . Note that if the threshold of the vector $x$ was contained in the range $[0,C\,T)$ , such estimates would immediately follow from the definition of the threshold. However, the vector $x\in{\mathcal{N}}_{T}$ is only an approximation of another vector with a small threshold. Thus, to make the conclusion, we will need a statement which asserts that for a given vector one can find its lattice approximation which preserves (to some extent) the anticoncentration properties of the corresponding random linear combination:

Lemma 5.3.

Let $p\in(0,1/2]$ , let $y=(y_{1},\dots,y_{n})\in\mathbb{R}^{n}$ be a vector and $L>0$ , $\lambda\in\mathbb{R}$ be numbers such that for mutually independent Bernoulli( $p$ ) random variables $b_{1},\dots,b_{n}$ we have ${\mathbb{P}}\{\big{|}\sum_{i=1}^{n}b_{i}y_{i}-\lambda\big{|}\leq t\}\leq Lt$ for all $t\geq\sqrt{n}$ . Then there exists a vector $y^{\prime}=(y_{1}^{\prime},\dots,y_{n}^{\prime})\in\mathbb{Z}^{n}$ having the following properties

•

$\|y-y^{\prime}\|_{\infty}\leq 1$ ;

•

${\mathbb{P}}\big{\{}\big{|}\sum_{i=1}^{n}b_{i}y_{i}^{\prime}-\lambda\big{|}\leq t\big{\}}\leq C_{\text{\tiny\ref{l: magic vector}}}\,Lt$ for all $t\geq\sqrt{n}$ ;

•

${\mathcal{L}}\big{(}\sum_{i=1}^{n}b_{i}y_{i}^{\prime},\sqrt{n}\big{)}\geq c_{\text{\tiny\ref{l: magic vector}}}\,{\mathcal{L}}\big{(}\sum_{i=1}^{n}b_{i}y_{i},\sqrt{n}\big{)}$ ;

•

$\big{|}\sum_{i=1}^{n}y_{i}-\sum_{i=1}^{n}y_{i}^{\prime}\big{|}\leq C_{\text{\tiny\ref{l: magic vector}}}\sqrt{n}$ .

Here, $C_{\text{\tiny\ref{l: magic vector}}},c_{\text{\tiny\ref{l: magic vector}}}>0$ are universal constants.

The first and the last property of $y^{\prime}$ will be used to estimate the Euclidean norm of $(B_{n}(p)+s\,1_{n}1_{n}^{\top})(y-y^{\prime})$ : the bound on $\|y-y^{\prime}\|_{\infty}$ provides control of $\|(B_{n}(p)-p\,1_{n}1_{n}^{\top})(y-y^{\prime})\|_{2}$ while the relation $\big{|}\sum_{i=1}^{n}y_{i}-\sum_{i=1}^{n}y_{i}^{\prime}\big{|}\leq C_{\text{\tiny\ref{l: magic vector}}}\sqrt{n}$ implies $\big{\|}(s+p)\,1_{n}1_{n}^{\top}(y-y^{\prime})\big{\|}_{2}\leq C_{\text{\tiny\ref{l: magic vector}}}|s+p|n$ .

The proof of Lemma 5.3 is based on the well-known concept of randomized rounding [12] (see also [1, 7, 11] for some recent applications). The first use of this method in the context of matrix invertibility is, to the best of the author’s knowledge, due to G. Livshyts [11]. In [11], randomized rounding is used to choose a best lattice approximation for a vector, which in turn is applied to the construction of $\varepsilon$ –nets; our work follows the same principle. We note that, unlike [11], in the present paper we need to explicitly control the Lévy concentration function and the small ball probability estimates for the approximating vector (the second and the third property in the statement).

Proof of Lemma 5.3.

Fix a vector $y\in\mathbb{R}^{n}$ , and let $b_{1},\dots,b_{n}$ be independent Bernoulli( $p$ ) random variables. Further, let $\xi_{1},\dots,\xi_{n}$ be random variables jointly independent with $b_{1},\dots,b_{n}$ , such that for each $i\leq n$ , $\xi_{i}$ takes values $\lfloor y_{i}\rfloor$ and $\lfloor y_{i}\rfloor+1$ with probabilities $\lfloor y_{i}\rfloor+1-y_{i}$ and $y_{i}-\lfloor y_{i}\rfloor$ , respectively (so that $\mathbb{E}\,\xi_{i}=y_{i}$ ). Define the random vector $\widetilde{y}:=(\xi_{1},\dots,\xi_{n})$ , and observe that with probability one $\|y-\widetilde{y}\|_{\infty}\leq 1$ .
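The rounding scheme just described can be sketched as follows. The function name `randomized_rounding` and the sample vector are hypothetical; the rule itself (round up with probability equal to the fractional part) is the one stated above.

```python
import math
import random

def randomized_rounding(y, rng):
    """Round each y_i to floor(y_i) or floor(y_i) + 1 independently,
    choosing floor(y_i) + 1 with probability y_i - floor(y_i),
    so that the expectation of each rounded coordinate is y_i."""
    rounded = []
    for yi in y:
        lo = math.floor(yi)
        frac = yi - lo                      # fractional part, in [0, 1)
        rounded.append(lo + 1 if rng.random() < frac else lo)
    return rounded

rng = random.Random(1)
y = [0.25, -1.5, 3.0, 2.75]                 # hypothetical input vector
samples = [randomized_rounding(y, rng) for _ in range(20000)]
means = [sum(s[i] for s in samples) / len(samples) for i in range(len(y))]
```

Each coordinate error $y_{i}-\xi_{i}$ has mean zero and, writing $\theta_{i}$ for the fractional part of $y_{i}$ , variance $\theta_{i}(1-\theta_{i})\leq 1/4$ ; integer coordinates are left unchanged.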

Fix for a moment any $w>0$ and denote by $S(2w)$ the collection of all $(v_{i})_{i=1}^{n}\in\{0,1\}^{n}$ such that $\big{|}\sum_{i=1}^{n}v_{i}y_{i}-\lambda\big{|}>2w$ . Take any $(v_{i})_{i=1}^{n}\in S(2w)$ . Note that $\sum_{i=1}^{n}v_{i}(y_{i}-\widetilde{y}_{i})$ is the sum of independent variables, each of mean zero and variance at most $1/4$ . Hence, by Markov’s inequality,

[TABLE]

Thus, if $\widetilde{S}(w)$ is the (random) collection of all vectors $(v_{i})_{i=1}^{n}\in\{0,1\}^{n}$ such that $\big{|}\sum_{i=1}^{n}v_{i}\widetilde{y}_{i}-\lambda\big{|}>w$ then the above estimate immediately implies for an arbitrary subset $E\subset\{0,1\}^{n}$ :

[TABLE]

We take $E=S(4w)$ in the above relation and apply it for $w=2^{j-1}t$ , $j\geq 1$ , so that

[TABLE]

for any $t\geq\sqrt{n}$ , where we have used that, by the assumption on $y$ ,

[TABLE]

The relation implies that for all $t\geq\sqrt{n}$ ,

[TABLE]

An application of Markov’s inequality, with $t=\sqrt{n},2\sqrt{n},4\sqrt{n},\dots$ , gives

[TABLE]

Together with the condition on the small ball probability of random sums $\sum_{i=1}^{n}b_{i}y_{i}-\lambda$ , this implies that there is an event $\mathcal{E}_{1}$ measurable with respect to $\widetilde{y}$ and with ${\mathbb{P}}(\mathcal{E}_{1})>9/16$ such that for any realization $\widetilde{y}^{0}$ of $\widetilde{y}$ from $\mathcal{E}_{1}$ ,

[TABLE]

for some universal constant $C>0$ .

Further, we will derive lower bounds on the anticoncentration function of the sum $\sum_{i=1}^{n}b_{i}\widetilde{y}_{i}$ . The argument is very similar to the one above, and we will skip some details. Let $\lambda^{\prime}\in\mathbb{R}$ be a number such that

[TABLE]

where

[TABLE]

Further, denote

[TABLE]

Take any $(v_{i})_{i=1}^{n}\in\{0,1\}^{n}\setminus S_{\lambda^{\prime}}(\sqrt{n})$ . Since the variance of the random sum $\sum_{i=1}^{n}v_{i}(y_{i}-\widetilde{y}_{i})$ is at most $n/4$ , we get

[TABLE]

Hence,

[TABLE]

so that with probability at least $2/3$ we have

[TABLE]

Denote by $\mathcal{E}_{2}$ the event that (12) holds (observe that the event is measurable with respect to $\widetilde{y}$ ). Note that for any realization $\widetilde{y}^{0}$ of $\widetilde{y}$ from the event $\mathcal{E}_{2}$ , we have

[TABLE]

This immediately implies

[TABLE]

As the last step of the proof, we note that since the variance of the sum $\sum_{i=1}^{n}(y_{i}-\widetilde{y}_{i})$ is at most $n/4$ , there is an event $\mathcal{E}_{3}$ measurable with respect to $\widetilde{y}$ and of probability at least $37/48$ such that everywhere on $\mathcal{E}_{3}$ , $\big{|}\sum_{i=1}^{n}(y_{i}-\widetilde{y}_{i})\big{|}\leq\sqrt{12n/11}$ .

Finally, since $3-{\mathbb{P}}(\mathcal{E}_{1})-{\mathbb{P}}(\mathcal{E}_{2})-{\mathbb{P}}(\mathcal{E}_{3})<1$ , there exists a realization $y^{\prime}$ of the random vector $\widetilde{y}$ from the intersection $\mathcal{E}_{1}\cap\mathcal{E}_{2}\cap\mathcal{E}_{3}$ . It is straightforward to check that $y^{\prime}$ satisfies all conditions of the lemma. ∎
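The arithmetic behind the last two steps can be checked exactly. The sketch below uses only the constants appearing in the proof; the variable names are ours.

```python
from fractions import Fraction

# Chebyshev: variance at most n/4 and deviation level sqrt(12n/11) give a
# failure probability of at most (n/4) / (12n/11) = 11/48, independent of n.
cheb_failure = Fraction(1, 4) / Fraction(12, 11)
p3_lower = 1 - cheb_failure            # lower bound on P(E_3)

p1_lower = Fraction(9, 16)             # P(E_1) is strictly larger than this
p2_lower = Fraction(2, 3)              # lower bound on P(E_2)

# Union bound on complements: P(E_1^c or E_2^c or E_3^c) <= 3 - sum of P(E_k).
complement_bound = 3 - (p1_lower + p2_lower + p3_lower)
```

The three lower bounds add up to exactly $2$ , so `complement_bound` equals $1$ ; the strict inequality needed for a non-empty intersection comes from the strict bound ${\mathbb{P}}(\mathcal{E}_{1})>9/16$ .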

Given any $p\in(0,1/2]$ , $s\in[-1,0]$ , any $x\in S^{n-1}$ and $L\geq 1$ , we construct an integer vector ${\bf Y}(p,x,L,s)\in\mathbb{Z}^{n}$ as follows: take $y=(y_{1},\dots,y_{n}):=\frac{\sqrt{n}}{{\mathcal{T}}_{p}(x,L)}\,x$ and observe that, by the definition of the threshold,

[TABLE]

Hence, by Lemma 5.3, there is a vector ${\bf Y}(p,x,L,s)\in\mathbb{Z}^{n}$ satisfying

•

$\big{\|}\frac{\sqrt{n}}{{\mathcal{T}}_{p}(x,L)}\,x-{\bf Y}(p,x,L,s)\big{\|}_{\infty}\leq 1$ ;

•

${\mathbb{P}}\big{\{}\big{|}\sum_{i=1}^{n}b_{i}{\bf Y}_{i}(p,x,L,s)+\frac{s\sqrt{n}}{{\mathcal{T}}_{p}(x,L)}\sum_{i=1}^{n}x_{i}\big{|}\leq t\big{\}}$ $\leq\frac{C_{\text{\tiny\ref{l: magic vector}}}\,L\,{\mathcal{T}}_{p}(x,L)}{\sqrt{n}}\,t$ for all $t\geq\sqrt{n}$ ;

•

${\mathcal{L}}\big{(}\sum_{i=1}^{n}b_{i}{\bf Y}_{i}(p,x,L,s),\sqrt{n}\big{)}\geq c_{\text{\tiny\ref{l: magic vector}}}\,L\,{\mathcal{T}}_{p}(x,L)$ ;

•

$\big{|}\frac{\sqrt{n}}{{\mathcal{T}}_{p}(x,L)}\sum_{i=1}^{n}x_{i}-\sum_{i=1}^{n}{\bf Y}_{i}(p,x,L,s)\big{|}\leq C_{\text{\tiny\ref{l: magic vector}}}\sqrt{n}$ .

A vector with the above properties need not be unique; from now on, however, we fix a single admissible vector for each $4$ –tuple $(p,x,L,s)$ .

Lemma 5.4.

For any $n\geq 2$ there is a subset $\bf\Pi$ of permutations on $[n]$ with $|{\bf\Pi}|\leq C_{\text{\tiny\ref{l: special permutations}}}^{n}$ , having the following property. Let $p\in(0,1/2]$ , $\delta\in(0,1/2]$ , $s\in[-1,0]$ , $\nu\in(0,1]$ , $L\geq 1$ , and let $x\in{\rm Incomp}_{n}(\delta,\nu)$ . Then there is $\sigma=\sigma(x)\in\bf\Pi$ such that the vector $\widetilde{y}=\big{(}{\bf Y}_{\sigma(i)}(p,x,L,s)\big{)}_{i=1}^{n}$ satisfies

[TABLE]

and

[TABLE]

Here, $C_{\text{\tiny\ref{l: special permutations}}}>0$ is a universal constant.

Proof.

If $\delta n<1$ then the statement is empty, and $\bf\Pi$ can be chosen arbitrarily. We will therefore assume that $\delta n\geq 1$ . We start by defining the collection of permutations $\bf\Pi$ . Let $j_{0}\geq 0$ be the largest integer such that $\delta n\geq 2^{j_{0}}$ . For every collection of subsets $[n]\supset I_{0}\supset\dots\supset I_{j_{0}}$ with $|I_{j}|=\lfloor 2^{-j}\delta n\rfloor$ , $j=0,\dots,j_{0}$ , take any permutation $\sigma$ such that $\sigma\big{(}\big{[}\lfloor 2^{-j}\delta n\rfloor\big{]}\big{)}=I_{j}$ , $j=0,\dots,j_{0}$ . We then compose $\bf\Pi$ of all such permutations (where we pick a single admissible permutation for every collection of subsets). It is not difficult to check that the total number of admissible collections $[n]\supset I_{0}\supset\dots\supset I_{j_{0}}$ , hence the cardinality of $\bf\Pi$ , is bounded above by $C^{n}$ for a universal constant $C>0$ .
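One way to check the cardinality claim is to count the nested collections directly: choosing $I_{0}\subset[n]$ and then each $I_{j}\subset I_{j-1}$ gives a product of binomial coefficients. Each factor $\binom{a}{k}$ is at most $2^{a}$ , so the product is at most $2^{n+2\delta n}\leq 4^{n}$ when $\delta\leq 1/2$ . The constant $4$ comes from this crude bound and is not claimed to be the constant of the lemma; the function name `chain_count` is hypothetical.

```python
import math

def chain_count(n, delta):
    """Number of nested collections [n] > I_0 > ... > I_{j0} with
    |I_j| = floor(2**-j * delta * n), where j0 is the largest integer
    such that delta * n >= 2**j0."""
    if delta * n < 1:
        raise ValueError("the construction assumes delta * n >= 1")
    j0 = 0
    while 2 ** (j0 + 1) <= delta * n:
        j0 += 1
    sizes = [math.floor(delta * n / 2 ** j) for j in range(j0 + 1)]
    count, ambient = 1, n
    for k in sizes:
        count *= math.comb(ambient, k)   # choose I_j inside the previous set
        ambient = k
    return count
```

For example, with $n=4$ and $\delta=1/2$ the set sizes are $2$ and $1$ , giving $\binom{4}{2}\binom{2}{1}=12$ collections.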

It remains to check the properties of $\bf\Pi$ . Take any vector $x\in{\rm Incomp}_{n}(\delta,\nu)$ , and let $[n]\supset I_{0}(x)\supset\dots\supset I_{j_{0}}(x)$ be sets of indices corresponding to largest (by absolute value) coordinates of $x$ . Namely, $I_{j}(x)$ is a subset of cardinality $\lfloor 2^{-j}\delta n\rfloor$ such that $|x_{i}|\geq|x_{\ell}|$ for all $i\in I_{j}(x)$ and $\ell\in[n]\setminus I_{j}(x)$ . Let $\sigma\in\bf\Pi$ be a permutation such that

[TABLE]

Set $\widetilde{y}:=\big{(}{\bf Y}_{\sigma(i)}(p,x,L,s)\big{)}_{i=1}^{n}$ .

By our construction, $|x_{\sigma(i)}|\geq|x_{\sigma(\ell)}|$ for all $i\leq\delta n<\ell$ . Since $x$ is incompressible,

[TABLE]

whence there exists an index $\ell>\delta n$ such that $|x_{\sigma(\ell)}|>\nu/\sqrt{n}$ . Thus, $|x_{\sigma(i)}|>\nu/\sqrt{n}$ for all $i\leq\delta n$ , whence, in view of the definition of vector $\widetilde{y}$ ,

[TABLE]

The upper bounds on coordinates $\widetilde{y}_{i}$ are obtained in a similar fashion. Take any $j\in\{0,\dots,j_{0}\}$ . Since $|x_{\sigma(i)}|\leq|x_{\sigma(\ell)}|$ for all $\ell\leq 2^{-j}\delta n<i$ , and $x$ has Euclidean norm one, we get

[TABLE]

Hence,

[TABLE]

∎

Let $n\geq 2$ , $\delta\in[1/n,1/2]$ and $\nu\in(0,1]$ . Further, let $T\in(0,1]$ be a number such that

[TABLE]

Define a subset $\mathcal{A}(n,\delta,\nu,T)\subset\mathbb{Z}^{n}$ as follows: we take $\mathcal{A}(n,\delta,\nu,T)=A_{1}\times A_{2}\times\dots\times A_{n}$ , where

•

For all $1\leq j\leq\log_{2}(\delta n)$ and $2^{-j}\delta n<i\leq 2^{-j+1}\delta n$ , we have

[TABLE]

•

For $i>\delta n$ , we have

[TABLE]

•

$A_{1}:=\mathbb{Z}\cap\,\Big{[}-\Big{\lceil}\frac{2\sqrt{n}}{T}\Big{\rceil}-1,\Big{\lceil}\frac{2\sqrt{n}}{T}\Big{\rceil}+1\Big{]}\setminus\Big{[}1-\Big{\lfloor}\frac{\nu}{T}\Big{\rfloor},\Big{\lfloor}\frac{\nu}{T}\Big{\rfloor}-1\Big{]}$ .
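As a small illustration of the last item, the set $A_{1}$ can be listed explicitly. The helper name `a_one` and the sample parameters are hypothetical.

```python
import math

def a_one(n, nu, T):
    """The set A_1: integers k with |k| <= ceil(2*sqrt(n)/T) + 1 that lie
    outside the interval [1 - floor(nu/T), floor(nu/T) - 1]."""
    outer = math.ceil(2 * math.sqrt(n) / T) + 1
    inner = math.floor(nu / T) - 1
    return [k for k in range(-outer, outer + 1) if not (-inner <= k <= inner)]

A1 = a_one(16, 1.0, 0.5)   # hypothetical values: n = 16, nu = 1, T = 1/2
```

Our reading of the definition is that it matches the coordinate bounds for the largest coordinates of ${\bf Y}(p,x,L,s)$ when $T/2\leq{\mathcal{T}}_{p}(x,L)\leq T$ : a coordinate of $\frac{\sqrt{n}}{{\mathcal{T}}_{p}(x,L)}\,x$ with $|x_{i}|>\nu/\sqrt{n}$ exceeds $\nu/T$ in absolute value and is at most $2\sqrt{n}/T$ , and rounding moves it by at most one.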

Lemma 5.4 immediately implies

Corollary 5.5.

For any $n\geq 2$ there is a subset $\bf\Pi$ of permutations on $[n]$ with $|{\bf\Pi}|\leq C_{\text{\tiny\ref{l: special permutations}}}^{n}$ , having the following property. Let $p\in(0,1/2]$ , $\delta\in[1/n,1/2]$ , $s\in[-1,0]$ , $\nu\in(0,1]$ , $L\geq 1$ , $T>0$ , and let $x\in{\rm Incomp}_{n}(\delta,\nu)$ be such that $T/2\leq{\mathcal{T}}_{p}(x,L)\leq T$ . Then there is $\sigma=\sigma(x)\in\bf\Pi$ such that the vector $\big{(}{\bf Y}_{\sigma(i)}(p,x,L,s)\big{)}_{i=1}^{n}$ belongs to $\mathcal{A}(n,\delta,\nu,T)$ .

The next crucial observation, which will enable us to apply results from Section 4, is

Lemma 5.6.

For any $\delta\in(0,1/2]$ , $\nu\in(0,1]$ there are $n_{\text{\tiny\ref{l: admissibility of A}}}=n_{\text{\tiny\ref{l: admissibility of A}}}(\delta,\nu)\geq 1$ and $K_{\text{\tiny\ref{l: admissibility of A}}}=K_{\text{\tiny\ref{l: admissibility of A}}}(\delta,\nu)\geq 1$ with the following property. Take any $n\geq n_{\text{\tiny\ref{l: admissibility of A}}}$ , $T\in(0,\nu/2]$ and set $N:=\big{\lfloor}\frac{\nu}{T}\big{\rfloor}-1$ . Then the subset $\mathcal{A}(n,\delta,\nu,T)$ defined above is $(N,n,K_{\text{\tiny\ref{l: admissibility of A}}},\delta)$ –admissible (with the notion taken from Section 4).

Now, everything is ready to prove the main result of the paper.

Proof of Theorem A.

Fix any $p\in(0,1/2]$ , $\varepsilon\in(0,p/2]$ , and assume that $n\geq n_{\text{\tiny\ref{l: compress}}}(\varepsilon,p)$ and $\sqrt{n}\geq 2K_{\text{\tiny\ref{l: threshold}}}/\nu_{\text{\tiny\ref{l: compress}}}(\varepsilon,p)$ (we will impose additional restrictions on $n$ as the proof goes on). Fix any $s\in[-1,0]$ . Our goal is to estimate from above

[TABLE]

for any $t>0$ . Set

[TABLE]

Applying formula (2) and Proposition 3.6, we get for any $t\leq\gamma n$ :

[TABLE]

where $Y_{n}$ is a unit random vector measurable with respect to ${\rm col}_{1}(B_{n}(p)),\dots,{\rm col}_{n-1}(B_{n}(p))$ and orthogonal to ${\rm span\,}\{{\rm col}_{1}(B_{n}(p)+s\,1_{n}1_{n}^{\top}),\dots,{\rm col}_{n-1}(B_{n}(p)+s\,1_{n}1_{n}^{\top})\}$ . Applying Proposition 3.6 the second time, we obtain that the event $\big{\{}Y_{n}\in{\rm Comp}_{n}(\delta,\nu)\big{\}}$ has probability at most $\big{(}1-p+\varepsilon\big{)}^{n}$ . Further, for every vector $x\in{\rm Incomp}_{n}(\delta,\nu)$ , according to Lemma 5.1, ${\mathcal{T}}_{p}(x,L)\leq\frac{K_{\text{\tiny\ref{l: threshold}}}}{\sqrt{n}}$ whenever $L\geq L_{\text{\tiny\ref{l: threshold}}}$ . Set

[TABLE]

Then, in view of the above, we have

[TABLE]

Further, for any $j\geq 0$ , using the independence of $Y_{n}$ and ${\rm col}_{n}(B_{n}(p)+s\,1_{n}1_{n}^{\top})$ and the definition of the threshold, we can write

[TABLE]

Hence, for every $t\leq\gamma n$ ,

[TABLE]

Fix any $j\in\{0,1,\dots,\lfloor-n\,\log_{2}(1-p+\varepsilon)\rfloor\}$ and set $T:=\frac{2^{-j}K_{\text{\tiny\ref{l: threshold}}}}{\sqrt{n}}$ and

[TABLE]

where $C>0$ denotes the constant such that

[TABLE]

(which exists, according to Lemma 3.4). Further, let ${\bf\Pi}$ be the set of permutations from Corollary 5.5. Take any $x\in{\rm Incomp}_{n}(\delta,\nu)$ such that $T/2<{\mathcal{T}}_{p}(x,L)\leq T$ . Then the vector ${\bf Y}(p,x,L,s)$ satisfies (see the construction of ${\bf Y}(p,x,L,s)$ following the proof of Lemma 5.3)

(a)

$\big{\|}\frac{\sqrt{n}}{{\mathcal{T}}_{p}(x,L)}\,x-{\bf Y}(p,x,L,s)\big{\|}_{\infty}\leq 1$ ;

(b)

${\mathbb{P}}\big{\{}\big{|}\sum_{i=1}^{n}b_{i}\,{\bf Y}_{i}(p,x,L,s)+s\,\frac{\sqrt{n}}{{\mathcal{T}}_{p}(x,L)}\sum_{i=1}^{n}x_{i}\big{|}\leq\tau\big{\}}\leq\frac{C_{\text{\tiny\ref{l: magic vector}}}\,L\,T}{\sqrt{n}}\,\tau$ for all $\tau\geq\sqrt{n}$ ;

(c)

${\mathcal{L}}\big{(}\sum_{i=1}^{n}b_{i}\,{\bf Y}_{i}(p,x,L,s),\sqrt{n}\big{)}\geq c_{\text{\tiny\ref{l: magic vector}}}\,L\,{\mathcal{T}}_{p}(x,L)\geq\frac{c_{\text{\tiny\ref{l: magic vector}}}}{2}LT\geq\frac{c_{\text{\tiny\ref{l: magic vector}}}L\nu}{4N}$ ;

(d)

$\big{|}\sum_{i=1}^{n}\frac{\sqrt{n}}{{\mathcal{T}}_{p}(x,L)}\,x_{i}-\sum_{i=1}^{n}{\bf Y}_{i}(p,x,L,s)\big{|}\leq C_{\text{\tiny\ref{l: magic vector}}}\sqrt{n}$ .

Note that a combination of (b) and (d) gives

[TABLE]

Define the subset $D\subset\mathcal{A}$ as

[TABLE]

and let ${\mathcal{N}}_{T}$ be defined as

[TABLE]

Then, by Corollary 5.5 and the above remarks, ${\bf Y}(p,x,L,s)\in{\mathcal{N}}_{T}$ for every $x\in{\rm Incomp}_{n}(\delta,\nu)$ with $T/2<{\mathcal{T}}_{p}(x,L)\leq T$ . Set $Q:=\big{\{}z\in\mathbb{R}^{n}:\;\big{|}\sum_{i=1}^{n}z_{i}\big{|}\leq C_{\text{\tiny\ref{l: magic vector}}}\sqrt{n}\big{\}}$ . Then the last assertion, together with properties (a) and (d) above, implies

[TABLE]

Thus, we obtain the relation

[TABLE]

Now, let us estimate the probability that $\|(B_{n}^{1}(p)+s\,1_{n-1}1_{n}^{\top})y\|_{2}$ is small for a fixed $y\in{\mathcal{N}}_{T}$ . By our definition of the set ${\mathcal{N}}_{T}$ , we have

[TABLE]

Hence, applying Lemma 3.2, we get

[TABLE]

Observe that for any $z\in[-1,1]^{n}\cap Q$ we have

[TABLE]

where we have used that $s\in[-1,0]$ . Then the above relations, together with a net argument, imply

[TABLE]

The last — and the most important — step of the proof is to bound from above the cardinality of ${\mathcal{N}}_{T}$ . In view of Corollary 5.5 and the definition of $D$ and ${\mathcal{N}}_{T}$ , we have

[TABLE]

Further, observe that by Lemma 5.6, the set $\mathcal{A}$ is $(N,n,K_{\text{\tiny\ref{l: admissibility of A}}},\delta)$ –admissible. Hence, Corollary 4.3 is applicable, and the definition of $D$ gives for all $n$ large enough:

[TABLE]

Combining this with the above relations and recalling that $N=\big{\lfloor}\frac{\nu}{T}\big{\rfloor}-1$ , we obtain

[TABLE]

for all sufficiently large $n$ , where the last relation follows from the choice of $M$ .

Returning to the small ball probability for $s_{\min}(B_{n}(p)+s\,1_{n}1_{n}^{\top})$ , we get

[TABLE]

for all sufficiently large $n$ . Since $\varepsilon\in(0,p/2]$ was chosen arbitrarily, the result follows. ∎
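The dyadic decomposition of the threshold range used in the proof can be sketched as follows. The numerical values are hypothetical, and `K` stands in for the constant $K_{\text{\tiny\ref{l: threshold}}}$ of Lemma 5.1.

```python
import math

def dyadic_thresholds(n, p, eps, K):
    """Scales T_j = 2**-j * K / sqrt(n) for j = 0, ..., floor(-n*log2(1 - p + eps))."""
    j_max = math.floor(-n * math.log2(1 - p + eps))
    return [2.0 ** -j * K / math.sqrt(n) for j in range(j_max + 1)]

n, p, eps, K = 200, 0.5, 0.1, 3.0    # hypothetical values
scales = dyadic_thresholds(n, p, eps, K)
```

The scales halve from $K/\sqrt{n}$ down to roughly $(1-p+\varepsilon)^{n}K/\sqrt{n}$ , and there are at most $n+1$ of them because $1-p+\varepsilon\geq 1/2$ . A union bound over the scales therefore costs at most a factor linear in $n$ , which is negligible against the exponential bounds obtained for each fixed $T$ .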

Acknowledgement. I would like to thank the Department of Mathematical and Statistical Sciences, University of Alberta, which I visited in December 2018 and where the first draft of this work was completed. I would also like to thank Prof. Terence Tao and the anonymous Referees for valuable remarks.

Bibliography

[1] N. Alon and B. Klartag, Optimal compression of approximate inner products and dimension reduction, in 58th Annual IEEE Symposium on Foundations of Computer Science—FOCS 2017, 639–650, IEEE Computer Soc., Los Alamitos, CA. MR 3734268
[2] R. Arratia and S. De Salvo, On the singularity of random Bernoulli matrices—novel integer partitions and lower bound expansions, Ann. Comb. 17 (2013), no. 2, 251–274. MR 3056767
[3] J. Bourgain, V. H. Vu and P. M. Wood, On the singularity probability of discrete random matrices, J. Funct. Anal. 258 (2010), no. 2, 559–603. MR 2557947
[4] D. Chafaï and K. Tikhomirov, On the convergence of the extremal eigenvalues of empirical covariance matrices with dependence, Probab. Theory Related Fields 170 (2018), no. 3-4, 847–889. MR 3773802
[5] P. Erdös, On a lemma of Littlewood and Offord, Bull. Amer. Math. Soc. 51 (1945), 898–902. MR 0014608
[6] J. Kahn, J. Komlós and E. Szemerédi, On the probability that a random $\pm 1$-matrix is singular, J. Amer. Math. Soc. 8 (1995), no. 1, 223–240. MR 1260107
[7] B. Klartag and G. Livshyts, The lower bound for Koldobsky’s slicing inequality via random rounding, arXiv:1810.06189
[8] J. Komlós, On the determinant of $(0,1)$ matrices, Studia Sci. Math. Hungar. 2 (1967), 7–21. MR 0221962
