Noise sensitivity of the top eigenvector of a Wigner matrix
Charles Bordenave, G\'abor Lugosi, Nikita Zhivotovskiy

TL;DR
This paper studies how the top eigenvector of a Wigner matrix changes when a small or large number of entries are randomly resampled, revealing a phase transition in its sensitivity.
Contribution
It establishes a precise threshold at which the top eigenvector shifts from being nearly aligned to nearly orthogonal due to entry resampling.
Findings
For k much less than N^{5/3}, eigenvectors remain almost collinear.
For k much greater than N^{5/3}, eigenvectors become almost orthogonal.
Identifies a phase transition in eigenvector sensitivity at k ~ N^{5/3}.
Abstract
We investigate the noise sensitivity of the top eigenvector of a Wigner matrix in the following sense. Let $v$ be the top eigenvector of an $N\times N$ Wigner matrix. Suppose that $k$ randomly chosen entries of the matrix are resampled, resulting in another realization of the Wigner matrix with top eigenvector $v^{[k]}$. We prove that, with high probability, when $k \ll N^{5/3-o(1)}$, then $v$ and $v^{[k]}$ are almost collinear and when $k \gg N^{5/3}$, then $v^{[k]}$ is almost orthogonal to $v$.
Gábor Lugosi was supported by the Spanish Ministry of Economy and Competitiveness, Grant PGC2018-101643-B-I00; “High-dimensional problems in structured probabilistic models - Ayudas Fundación BBVA a Equipos de Investigación Cientifica 2017”; and Google Focused Award “Algorithms and Learning for AI”. Charles Bordenave was supported by the research grants ANR-14-CE25-0014 and ANR-16-CE40-0024-01. Nikita Zhivotovskiy was supported by RSF grant No. 18-11-00132.
Charles Bordenave Institut de Mathématiques de Marseille, CNRS & Aix-Marseille University, Marseille, France.
Gábor Lugosi Department of Economics and Business, Pompeu Fabra University, Barcelona, Spain, [email protected], Pg. Lluís Companys 23, 08010 Barcelona, Spain; Barcelona Graduate School of Economics
Nikita Zhivotovskiy This work was prepared while Nikita Zhivotovskiy was a postdoctoral fellow at the department of Mathematics, Technion I.I.T. and researcher at National University Higher School of Economics. Now at Google Research, Brain Team.
1 Introduction
In this paper we study the noise sensitivity of top eigenvectors of Wigner matrices. For a positive integer $N$, let $X = (X_{ij})_{1\le i,j\le N}$ be a symmetric matrix such that, for $i \le j$, the $X_{ij}$ are independent real random variables, such that for some constant $\delta > 0$ and for all $i \le j$, $\mathbb{E}\exp(|X_{ij}|^{\delta}) \le 1/\delta$ and $\mathbb{E}X_{ij} = 0$. Note that this assumption is satisfied for a wide class of distributions with a sufficiently light tail. Uniformly bounded, sub-gaussian, and sub-exponential distributions fall in this class. To guarantee that $X$ is a symmetric matrix, we set $X_{ji} = X_{ij}$. Finally, we assume that the off-diagonal entries have unit variance: for all $i \neq j$, $\mathbb{E}X_{ij}^2 = 1$, and for all $i$, $\mathbb{E}X_{ii}^2 = \sigma_0^2$, for some $\sigma_0 \ge 0$. Throughout this text, we call such a matrix a Wigner matrix. In this paper we are concerned with large matrices and the main results are asymptotic, concerning $N \to \infty$. The distribution of the entries may change with $N$ though we suppress this dependence in the notation. However, the values of $\delta$ and $\sigma_0$ are assumed to be the same for all $N$.
Let $\lambda$ be the top eigenvalue of $X$ and let $v$ denote the corresponding unit eigenvector. In this paper we study the noise sensitivity of $v$. In particular, we are interested in the behavior of the top eigenvector $v^{[k]}$ of the symmetric matrix $X^{[k]}$ obtained by resampling $k$ random entries of $X$. The main finding of the paper is that, with high probability, when $k \ll N^{5/3-o(1)}$, then $v$ and $v^{[k]}$ are almost collinear and when $k \gg N^{5/3}$, then $v^{[k]}$ is almost orthogonal to $v$.
Related work and proof technique
Noise sensitivity is an important notion in probability that has been extensively studied since the pioneering work of Benjamini, Kalai, and Schramm [2]. Noise sensitivity has mostly been studied in the context of Boolean functions and it has been shown to have deep connections with threshold phenomena, measure concentration, and isoperimetric inequalities, see Talagrand [22], Friedgut and Kalai [11], Kahn, Kalai, and Linial [16], Bourgain, Kahn, Kalai, Katznelson, and Linial [4] for some of the key early work and Garban [12], Garban and Steif [14], Kalai and Safra [17], O’Donnell [20] for surveys. The key techniques for studying noise sensitivity typically use elements of harmonic analysis, in particular, hypercontractivity ([22], [16]) but also the “randomized algorithm” approach of Schramm and Steif [21] and other techniques, see Garban, Pete, and Schramm [13].
Our approach is inspired by Chatterjee’s work [7] who shows that, for functions of independent standard Gaussian random variables, the notion of noise sensitivity (or “chaos” as Chatterjee calls it) is deeply related to the notion of “superconcentration”.
In fact, a result in a similar spirit to ours for the Gaussian Unitary Ensemble was proved by Chatterjee [7, Section 3.6]. However, instead of resampling random entries of the matrix, the perturbations considered in [7] are different. In Chatterjee’s model, every entry $X_{ij}$ of the matrix is perturbed by replacing $X_{ij}$ by $e^{-t}X_{ij} + \sqrt{1-e^{-2t}}\,X'_{ij}$, where $X'$ is an independent copy of $X$ and $t > 0$. It is proved in [7] that the top eigenvectors of $X$ and of the perturbed matrix are approximately orthogonal (in the sense that the expectation of their inner product goes to zero as $N \to \infty$) as soon as $t \gg N^{-1/3}$.
Chatterjee uses this example to illustrate how “superconcentration” implies “chaos”. His techniques crucially depend on the Gaussian assumption as in that case explicit formulas may be exploited. Our techniques are similar in the sense that our starting point is also “superconcentration” (i.e., the fact that the variance of the largest eigenvalue of a Wigner matrix is small). However, outside of the Gaussian realm, the notions of superconcentration and chaos are murkier. Starting from a general formula for the variance of a function of independent random variables, due to Chatterjee [5], we establish a monotonicity lemma that allows us to make the connection between the variance of the top eigenvalue and the inner product of interest. Then we use the fact that the top eigenvalue has a small variance (i.e., in a sense, it is “superconcentrated”). The monotonicity lemma may be of independent interest and it may have further uses when one tries to prove that “superconcentration implies chaos” for functions of independent (not necessarily Gaussian) random variables.
Result
To formally describe the setup, let $X$ be a symmetric $N\times N$ Wigner matrix as defined above. For a positive integer $k \le N(N+1)/2$, let the random matrix $X^{[k]}$ be defined as follows. Let $S_k$ be a set of $k$ pairs chosen uniformly at random (without replacement) from the set of all ordered pairs $(i,j)$ of indices with $1 \le i \le j \le N$. We also assume that $S_k$ is independent of the entries of $X$. The entries of $X^{[k]}$ above the diagonal are
$$X^{[k]}_{ij} = \begin{cases} X'_{ij} & \text{if } (i,j) \in S_k, \\ X_{ij} & \text{otherwise,} \end{cases}$$
where $(X'_{ij})_{1\le i\le j\le N}$ are independent random variables, independent of $X$, and $X'_{ij}$ has the same distribution as $X_{ij}$, for all $i \le j$. In words, $X^{[k]}$ is obtained from $X$ by resampling $k$ random entries of the matrix above and including the diagonal and also the corresponding terms below the diagonal. Clearly, $X^{[k]}$ has the same distribution as $X$. Denote unit eigenvectors corresponding to the largest eigenvalues of $X$ and $X^{[k]}$ by $v$ and $v^{[k]}$, respectively. Note that with overwhelming probability, the spectrum of a Wigner matrix is simple and, in particular, the top unit eigenvector is unique (up to changing the sign), see [1].
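The resampling procedure can be explored numerically. The following is a minimal sketch, not the authors' code: it assumes standard Gaussian entries (one admissible entry distribution, with diagonal variance 1), and the helper names `sample_wigner`, `resample_entries`, `top_eigenvector`, and `overlap` are hypothetical.

```python
import numpy as np

def sample_wigner(n, rng):
    """Symmetric n x n matrix with independent N(0, 1) entries on and above the diagonal.
    Gaussian entries are an assumption made for this illustration only."""
    upper = np.triu(rng.standard_normal((n, n)))
    return upper + np.triu(upper, 1).T

def resample_entries(x, k, rng):
    """Resample k positions (i <= j), chosen uniformly without replacement,
    and mirror them below the diagonal so the result stays symmetric."""
    n = x.shape[0]
    rows, cols = np.triu_indices(n)            # all pairs with i <= j
    chosen = rng.choice(rows.size, size=k, replace=False)
    i, j = rows[chosen], cols[chosen]
    fresh = rng.standard_normal(k)             # independent copies of the entries
    y = x.copy()
    y[i, j] = fresh
    y[j, i] = fresh
    return y

def top_eigenvector(x):
    """Unit eigenvector of the largest eigenvalue (eigh returns ascending eigenvalues)."""
    _, vecs = np.linalg.eigh(x)
    return vecs[:, -1]

def overlap(n, k, seed=0):
    """Absolute inner product between the top eigenvectors before and after resampling."""
    rng = np.random.default_rng(seed)
    x = sample_wigner(n, rng)
    y = resample_entries(x, k, rng)
    return abs(top_eigenvector(x) @ top_eigenvector(y))
```

Calling `overlap(n, k)` for a fixed `n` and increasing `k` gives a rough picture of how the alignment degrades; at simulation-sized `n` the transition is only loosely visible.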
Our main results are the following.
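Theorem 1.
Assume that $X$ is a Wigner matrix as above. If $k/N^{5/3} \to \infty$, then
$$\mathbb{E}\,\bigl|\langle v, v^{[k]}\rangle\bigr| = o(1).$$
Conversely, our second result asserts that when $k \ll N^{5/3-o(1)}$ then $v$ and $v^{[k]}$ are almost aligned.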
Theorem 2.
Assume that is a Wigner matrix as above. There exists a constant such that, with ,
[TABLE]
The proof of Theorem 2 actually establishes that goes to $0$ in probability.
The following heuristic argument may provide an intuition of why the threshold in the lower bound of Theorem 2 is at $N^{5/3}$. Since the seminal work of Erdős, Schlein, and Yau [10], it is well known that unit eigenvectors of random matrices are delocalized in the sense that $\|v\|_\infty = N^{-1/2+o(1)}$ with high probability. Denoting the top eigenvalue of $X^{[k]}$ by $\lambda^{[k]}$, we might infer from the derivative of a simple eigenvalue as a function of the matrix entries that
$$\lambda^{[k]} - \lambda \approx \sum_{(i,j)\in S_k} \bigl(1 + \mathbb{1}(i \neq j)\bigr)\, v_i v_j \,(X'_{ij} - X_{ij}),$$
where $v_i$ is the $i$-th component of $v$. Assuming that $v$ is nearly independent of any matrix entry $X_{ij}$, since $X_{ij}$ is centered with unit variance, we would get from the central limit theorem that
$$|\lambda^{[k]} - \lambda| \approx \sqrt{k}\, N^{-1+o(1)}.$$
On the other hand, the known behavior of random matrices at the edge of the spectrum implies that the second largest eigenvalue of $X$ is at distance of order $N^{-1/6}$ from $\lambda$. The above heuristic should thus break down when $\sqrt{k}\,N^{-1}$ is of order $N^{-1/6}$. It gives the threshold at $k = N^{5/3}$.
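The first-order approximation behind this heuristic can be checked numerically. The sketch below is an illustration under stated assumptions (Gaussian entries, hypothetical names `top_pair` and `eigenvalue_shift`). Writing $E$ for the symmetric difference between the resampled and the original matrix, it compares the actual shift of the top eigenvalue with the first-order term $\langle v, Ev\rangle$. By the variational characterization of the largest eigenvalue, the shift always lies between $\langle v, Ev\rangle$ and $\langle v_k, E v_k\rangle$, where $v_k$ is the top eigenvector after resampling.

```python
import numpy as np

def top_pair(m):
    """Largest eigenvalue and a corresponding unit eigenvector of a symmetric matrix."""
    vals, vecs = np.linalg.eigh(m)
    return vals[-1], vecs[:, -1]

def eigenvalue_shift(n=200, k=50, seed=0):
    """Return (actual shift, first-order term <v, E v>, upper bound <v_k, E v_k>)."""
    rng = np.random.default_rng(seed)
    upper = np.triu(rng.standard_normal((n, n)))   # Gaussian entries: an assumption
    x = upper + np.triu(upper, 1).T
    # Symmetric perturbation supported on k resampled positions with i <= j.
    rows, cols = np.triu_indices(n)
    chosen = rng.choice(rows.size, size=k, replace=False)
    i, j = rows[chosen], cols[chosen]
    diff = rng.standard_normal(k) - x[i, j]
    e = np.zeros((n, n))
    e[i, j] = diff
    e[j, i] = diff
    lam, v = top_pair(x)
    lam_k, v_k = top_pair(x + e)
    return lam_k - lam, v @ e @ v, v_k @ e @ v_k
```

For small `k` the three returned numbers are typically close; as `k` grows, the gap between the lower and upper bounds widens, which is the regime where the heuristic is expected to fail.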
To get an idea of how Theorem 1 is proved, consider the variance of the largest eigenvalue $\lambda$ of $X$. The key inequality we prove is that
[TABLE]
By the Tracy-Widom law [24, 25] for the largest eigenvalue, we expect that $\mathrm{Var}(\lambda)$ is of order $N^{-1/3}$, which implies the desired asymptotic orthogonality whenever $k \gg N^{5/3}$. The proof of the inequality above is based on a variance formula for general functions of independent random variables due to Chatterjee [5], see Lemma 1 below. The variance formula suggests that small variance implies noise sensitivity of the top eigenvalue in a certain sense. This is made precise by Lemmas 2 and 3. Finally, noise sensitivity of the top eigenvalue translates to the inequality above.
Remark. We expect that the arguments of Theorem 1 for the noise sensitivity of the top eigenvalue may be modified to prove analogous results for the eigenvector corresponding to the -th largest eigenvalue, . However, the threshold is expected to occur at values different from . In particular, a simple heuristic argument suggests that for the -th eigenvector the threshold occurs around . However, to keep the presentation transparent, in this paper we focus on the top eigenvalue.
Interestingly, the proof that the top eigenvalue is very sensitive to resampling more than $N^{5/3}$ entries involves proving that it is insensitive to resampling just a single entry. As a consequence, the proofs of Theorems 1 and 2 share common techniques.
The rest of the paper is dedicated to proving Theorems 1 and 2. In Section 2 we introduce a general tool for proving noise sensitivity that generalizes Chatterjee’s ideas based on “superconcentration” to functions of independent, not necessarily standard normal random variables. In Section 3 we summarize some of the tools from random matrix theory that are crucial for our arguments. In Sections 4 and 5 we give the proofs of Theorems 1 and 2.
2 Variance and noise sensitivity
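The first building block in the proof of Theorem 1 is a formula for the variance of an arbitrary function of independent random variables, due to Chatterjee [5]. For any positive integer $i$, denote $[i] = \{1,\dots,i\}$.
Lemma 1.
[5] Let $X_1,\dots,X_n$ be independent random variables taking values in some set $\mathcal{X}$ and let $f:\mathcal{X}^n \to \mathbb{R}$ be a measurable function. Denote $X = (X_1,\dots,X_n)$. Let $X' = (X'_1,\dots,X'_n)$ be an independent copy of $X$. Under the notation
$$X^{(i)} = (X_1,\dots,X_{i-1},X'_i,X_{i+1},\dots,X_n) \quad\text{and}\quad X^{[i]} = (X'_1,\dots,X'_i,X_{i+1},\dots,X_n),$$
and, in particular, $X^{[0]} = X$ and $X^{[n]} = X'$, we have
$$\mathrm{Var}(f(X)) = \frac{1}{2}\sum_{i=1}^{n} \mathbb{E}\left[\bigl(f(X) - f(X^{(i)})\bigr)\bigl(f(X^{[i-1]}) - f(X^{[i]})\bigr)\right].$$
In general, for $A \subset [n]$ let $X^{A}$ denote the random vector obtained from $X$ by replacing the components indexed by $A$ by the corresponding components of $X'$.
In the variance formula above, the order of the variables does not matter and the formula remains valid after permuting the indices arbitrarily. In particular, one may take the variables in random order. Thus, if $\sigma = (\sigma(1),\dots,\sigma(n))$ is a random permutation sampled uniformly from the symmetric group $\mathcal{S}_n$ and $\sigma([i])$ denotes $\{\sigma(1),\dots,\sigma(i)\}$, then
$$\mathrm{Var}(f(X)) = \frac{1}{2}\sum_{i=1}^{n} \mathbb{E}\left[\bigl(f(X) - f(X^{(\sigma(i))})\bigr)\bigl(f(X^{\sigma([i-1])}) - f(X^{\sigma([i])})\bigr)\right]. \tag{2.1}$$
Note that on the right-hand side of (2.1) the expectation is taken with respect to both $X$, $X'$ and the random permutation $\sigma$.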
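Chatterjee's variance formula can be verified exactly on a toy example. The sketch below is an illustration only: it assumes $n$ independent fair coin flips, uses exact rational arithmetic, and checks that half the sum of the expected products equals the variance for every ordering of the coordinates. The function names are hypothetical.

```python
from fractions import Fraction
from itertools import permutations, product

def chatterjee_rhs(f, n, order):
    """Half the sum over i of E[(f(X) - f(X^(j))) (f(X^A) - f(X^(A + j)))],
    where j = order[i] and A = {order[0], ..., order[i-1]},
    computed exactly for n independent fair coin flips."""
    total = Fraction(0)
    weight = Fraction(1, 4 ** n)          # every pair (x, x') is equally likely
    for x in product((0, 1), repeat=n):
        for xp in product((0, 1), repeat=n):
            def swap(idx):
                # replace the coordinates in idx by those of the independent copy
                return tuple(xp[t] if t in idx else x[t] for t in range(n))
            for i in range(n):
                j = order[i]
                before, after = set(order[:i]), set(order[:i + 1])
                total += weight * (f(x) - f(swap({j}))) * (f(swap(before)) - f(swap(after)))
    return total / 2

def variance(f, n):
    """Exact variance of f(X) for n independent fair coin flips."""
    pts = list(product((0, 1), repeat=n))
    mean = Fraction(sum(f(x) for x in pts), len(pts))
    return sum((f(x) - mean) ** 2 for x in pts) / len(pts)

def check_all_orders(f, n):
    """True if the formula reproduces the variance for every ordering of the coordinates."""
    target = variance(f, n)
    return all(chatterjee_rhs(f, n, order) == target for order in permutations(range(n)))
```

For instance, `check_all_orders(lambda x: max(x) + x[0] * x[1], 3)` evaluates the identity for all six orderings of three coordinates.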
One would intuitively expect that the terms on the right-hand side of (2.1) decrease with $i$, as the differences $f(X) - f(X^{(\sigma(i))})$ and $f(X^{\sigma([i-1])}) - f(X^{\sigma([i])})$ become less correlated as more randomly chosen components get resampled. This is indeed the case and this fact is one of our main tools in proving noise sensitivity. We believe that the following lemma can be useful in diverse situations. The proof is given in Section 4.1 below.
Lemma 2.
Consider the setup of Lemma 1 and the notation above. For $i \in [n]$, denote
$$B_i = \mathbb{E}\left[\bigl(f(X) - f(X^{(\sigma(i))})\bigr)\bigl(f(X^{\sigma([i-1])}) - f(X^{\sigma([i])})\bigr)\right],$$
where the expectation is taken with respect to components of vectors and random permutations. Then $B_i \ge B_{i+1}$ for all $i \in [n-1]$ and $B_n \ge 0$. In particular, for any $k \in [n]$,
$$B_k \le \frac{2\,\mathrm{Var}(f(X))}{k}.$$
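The monotonicity statement of Lemma 2 can be observed on a small exact computation. The sketch below is illustrative and rests on assumptions: fair coin flips as the independent variables, and $B_i$ taken to be the expected product of $f(X) - f(X^{(\sigma(i))})$ and $f(X^{\sigma([i-1])}) - f(X^{\sigma([i])})$, averaged over a uniform random permutation $\sigma$. The name `b_sequence` is hypothetical.

```python
from fractions import Fraction
from itertools import permutations, product

def b_sequence(f, n):
    """Exact values B_1, ..., B_n for n independent fair coin flips,
    averaging over the pair (X, X') and over all n! permutations."""
    perms = list(permutations(range(n)))
    weight = Fraction(1, 4 ** n * len(perms))
    b = [Fraction(0)] * n
    for x in product((0, 1), repeat=n):
        for xp in product((0, 1), repeat=n):
            def swap(idx):
                # replace the coordinates in idx by those of the independent copy
                return tuple(xp[t] if t in idx else x[t] for t in range(n))
            for sigma in perms:
                for i in range(n):
                    first = f(x) - f(swap({sigma[i]}))
                    second = f(swap(set(sigma[:i]))) - f(swap(set(sigma[:i + 1])))
                    b[i] += weight * first * second
    return b
```

The returned list should be non-increasing with a non-negative last entry, and half of its sum should equal the variance of `f`, so that the `k`-th entry is at most twice the variance divided by `k`.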
We also introduce a modification of Lemma 2 that will be more convenient for our purposes. To do so, we introduce the following notation. Let have uniform distribution on . Let denote the vector obtained from by replacing its -th component by an independent copy of the random variable , denoted by . Observe that may belong to and in this case is independent of appearing in . With this notation in mind we may prove the following version of Lemma 2.
Lemma 3.
Using the notation of Lemma 2, assuming that is chosen uniformly at random from the set and independently of other random variables involved, we have for any ,
[TABLE]
where for any ,
[TABLE]
3 Random matrix results
In the proof of Theorem 1 we apply Lemma 3 with $f$ being the top eigenvalue of a Wigner matrix. The usefulness of this bound crucially hinges on the fact that the variance of the top eigenvalue is small, that is, in a sense, the top eigenvalue is “superconcentrated”. This fact is quantified in this section.
Our first lemma on the variance of $\lambda$ is obtained as a combination of a result of Ledoux and Rider [19] on Gaussian ensembles and the universality of fluctuations for Wigner matrices as stated in Erdős, Yau and Yin [9].
Lemma 4.
Assume that is a Wigner matrix as in Theorem 1. Let denote the largest eigenvalue of . Then,
[TABLE]
where is an absolute constant.
Remark. The result of Lemma 4 implies an improved version of the variance bound
[TABLE]
following from [9, Theorem 2.2].
We also need the following delocalization result of the top eigenvector of a Wigner matrix which can be found in Tao and Vu [23, Proposition 1.12].
Lemma 5.
[23]. Assume that is a Wigner matrix as in Theorem 1. For any real , there exists a constant , such that, with probability at least , any eigenvector of with satisfies
[TABLE]
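Delocalization can be illustrated numerically. The sketch below is a rough illustration under assumptions: it takes Gaussian entries and assumes that the bound in Lemma 5 has the usual form, namely that the sup-norm of a unit eigenvector is at most $N^{-1/2}$ times a polylogarithmic factor. It computes the rescaled sup-norm $\sqrt{N}\,\|v\|_\infty$ of the top eigenvector, which always lies between $1$ and $\sqrt{N}$ for a unit vector. The name `sup_norm_ratio` is hypothetical.

```python
import numpy as np

def sup_norm_ratio(n, seed=0):
    """Return sqrt(n) * ||v||_inf for the top unit eigenvector v
    of an n x n symmetric matrix with Gaussian entries (an assumption)."""
    rng = np.random.default_rng(seed)
    upper = np.triu(rng.standard_normal((n, n)))
    x = upper + np.triu(upper, 1).T
    _, vecs = np.linalg.eigh(x)        # eigenvalues in ascending order
    v = vecs[:, -1]                    # eigenvector of the largest eigenvalue
    return np.sqrt(n) * np.max(np.abs(v))
```

A fully localized vector would give a ratio of $\sqrt{N}$, so a ratio that grows only slowly with `n` is consistent with delocalization.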
Our final lemma is a perturbation inequality in -norm of the top eigenvector of a Wigner matrix when a single entry is re-sampled. The proof uses precise estimates on the eigenvalue spacings in Wigner matrices proved in Tao and Vu [23] and Erdős, Yau, and Yin [9].
Lemma 6.
Let be a Wigner matrix as in Theorem 1 and be an independent copy of . For any with . Denote by the symmetric matrix obtained from by replacing the entry by and by . For any , there exists such that, for all large enough, with probability at least ,
[TABLE]
where and are any unit eigenvectors corresponding to the largest eigenvalues of and .
4 Proof of Theorem 1
Now we are ready for the proof of the main results of the paper.
We start by fixing some notation. Let denote the largest eigenvalue of the Wigner matrix of Theorem 1 and let be a corresponding normalized eigenvector. Let to be specified later and let be the random symmetric matrix obtained by resampling random entries above the diagonal and including the diagonal, as defined in the introduction. We denote by the set of random positions of the resampled entries. Let denote the top eigenvalue of and a corresponding normalized eigenvector.
For , we denote by the symmetric matrix obtained from by replacing the entry by where is an independent copy of . We obtain from by the same operation. We denote by , and the top eigenvalue/eigenvector pairs of and , respectively. Let be a pair of indices chosen uniformly at random from and satisfying . For ease of notation, we set , and . We define similarly , and .
By applying Lemma 3 to the function of independent random variables , we obtain that, for any ,
[TABLE]
In what follows, we show that the right-hand side of (4.1) satisfies
[TABLE]
This relation, combined with Lemma 4 and (4.1), implies
[TABLE]
which is sufficient for Theorem 1. We proceed with the formal argument.
Using the notation of the previous section we have
[TABLE]
Using the fact that maximizes and maximizes we have
[TABLE]
Observe that the elements of are all zeros except at most two that correspond to resampled values. If the element of was resampled to get , we have, for any vector ,
[TABLE]
with . Similarly, if we set , we have . Therefore, it is straightforward to see that
[TABLE]
where we have set,
[TABLE]
and for ,
[TABLE]
In order to have some extra independence, we introduce yet another independent copy of our random variables. For , let be the symmetric matrix obtained from by replacing the entry by where is an independent copy of , independent of and . We obtain from by the same operation. As above, we denote by , and the top unit eigenvector of and , respectively. For ease of notation, with as above, we define and . The key observation is that is independent of and .
Fix and let be as in Lemma 5 for . We define to be the intersection of the following two events:
- •
: for all : .
- •
: for all .
By Lemmas 5, 6, and the union bound, we have, for all large enough, and for some , (provided that we choose properly the -phase for the eigenvectors , , and ). Observe that when holds, for all
[TABLE]
we have, for all large enough,
[TABLE]
We show this, for brevity, only for . Denoting and , we write
[TABLE]
Then open the brackets and use that, on ,
[TABLE]
If holds, we thus have
[TABLE]
On the other hand, if holds, we get
[TABLE]
Finally, if does not hold, using that all the vectors are of unit norm (and therefore, ), we have
[TABLE]
The same bounds hold for on and . Note also that for some constant depending on . Combining altogether the last three bounds, by the Cauchy-Schwarz inequality, we arrive at
[TABLE]
Recalling (4.1), we find
[TABLE]
Integrating over the random choice of , we have
[TABLE]
Now, using (4.3) and using , we get
[TABLE]
where if , and
[TABLE]
Note that for , and . We have
[TABLE]
Hence, using that the variable is independent of the vectors , we deduce that
[TABLE]
where
[TABLE]
We now argue that in (4.5), we may replace the vectors and by and respectively. We repeat the above argument. Recall the event defined above. As already pointed, on the event , we have
[TABLE]
If holds, we have
[TABLE]
Finally, there is the deterministic bound
[TABLE]
Combining the last three bounds we obtain that
[TABLE]
The right-hand side is upper bounded by . We thus have proved that
[TABLE]
with . As already pointed, by Lemmas 5, 6, and the union bound, we have, for all large enough, and . It follows that with .
Now, combining Jensen’s inequality and (4.6),
[TABLE]
From Lemma 4, the claim follows.
4.1 Proof of Lemma 2 and Lemma 3
We start with the following technical lemma.
Lemma 7.
Let be a measurable function and let be any fixed permutation. Fix and such that . Let be independent random variables taking values in . Then
[TABLE]
Proof. Without loss of generality, we may consider one particular permutation , defined as follows: set for , , , and we may also assume that . The proof is identical for any other and . In our case,
[TABLE]
Moreover, we have
[TABLE]
We introduce a simplifying notation. Denote , and . Therefore, we may rewrite
[TABLE]
and
[TABLE]
Denote h(X_{1},X_{1}^{\prime},X_{i+1},C)=\mathbb{E}[\left(f(X_{1},B,X_{i+1},C)-f(X_{1}^{\prime},B,X_{i+1},C)\right)\big{|}X_{1},X_{1}^{\prime},X_{i+1},C]. Using the independence of and their independence of the remaining random variables, we have
[TABLE]
At the same time, using the same notation for we have, by the Cauchy-Schwarz inequality and the fact that and have the same distribution,
[TABLE]
Now to prove that , it is sufficient to show that . Denoting , we have
[TABLE]
where we used Jensen’s inequality and that .
We proceed with the proof of Lemma 2.
Proof. In this proof by writing we mean . For each permutation and fixed we construct a corresponding permutation by defining and for .
It is straightforward to see that for any fixed there is a one-to-one correspondence between and . By observing that and we have, conditionally on ,
[TABLE]
where in the last step we used Lemma 7. Using the one to one correspondence between all and , we have
[TABLE]
The proof that follows from Lemma 7 as well.
Finally, we prove Lemma 3.
Proof. To prove this Lemma we show an upper bound for . We have,
[TABLE]
Observe that and the second summand is equal to . We proceed with the first summand. For , we have
[TABLE]
Finally, we prove that
[TABLE]
Without loss of generality, we consider a particular choice of and such that , for and . Therefore, (4.7) will follow from
[TABLE]
Since , we have . This implies that (4.7) is valid whenever
[TABLE]
As in the proof of Lemma 7, this relation holds due to Jensen’s inequality. These lines together imply that
[TABLE]
which, using Lemma 2, proves the claim.
4.2 Proof of Lemma 4
We start with a special case. Let us say that a Wigner matrix as in Theorem 1 is standard if for all , . In this case, the variance of the entries of is equal to the variance of the entries of a random matrix sampled from the Gaussian Orthogonal Ensemble (GOE). If is the largest eigenvalue of , it follows from [19, Corollary 3] that for some absolute constant ,
[TABLE]
On the other hand, it follows from [9, Theorem 2.4] (see also [18, Theorem 1.6] for a statement which can be used directly) that,
[TABLE]
We obtain the first claim of the lemma for standard Wigner matrices. To conclude the proof of the lemma for Wigner matrices, it suffices to prove that for any Wigner matrix , for some , we have for all large enough,
[TABLE]
where is the largest eigenvalue of a matrix obtained from by setting all diagonal entries to $0$. We will prove it for any (an improvement of the forthcoming Lemma 11 would give (4.8) for any ). The proof requires some care since the operator norm of may be much larger than and the rank of could be .
There is an easy inequality which is half of (4.8). Let be a unit eigenvector of with eigenvalue . We have
[TABLE]
where is the -th coordinate of . We observe that is independent of for all and for . Denoting , by the Cauchy-Schwarz inequality, we deduce that
[TABLE]
We write, . From Lemma 5 applied to , we deduce that for some constant ,
[TABLE]
It implies the easy half of (4.8) for any .
The proof of the converse inequality is more involved. For ease of notation, we introduce the number for ,
[TABLE]
We say that a sequence of events holds with overwhelming probability if for any , there exists a constant such that . We repeatedly use the fact that a polynomial intersection of events of overwhelming probability is an event of overwhelming probability. We start with a small deviation lemma which can be found, for example, in [8, Appendix B].
Lemma 8.
Assume that are independent centered complex variables such that for some , for all , . Then, for any with overwhelming probability,
[TABLE]
For with and , we introduce the resolvent matrices
[TABLE]
where denotes the identity matrix. The following lemma asserts that the resolvent can be used to estimate the largest eigenvalue of and .
Lemma 9.
Let be a Wigner matrix as in Theorem 1 and let be its eigenvalues. For any , there exists an integer such that for all and
[TABLE]
Moreover, let . There exists such that with overwhelming probability, we have and for all integers , and all such that ,
[TABLE]
Proof. From the spectral theorem, we have
[TABLE]
where is an orthonormal basis of eigenvectors of and is the -th coordinate of . In particular,
[TABLE]
From the pigeonhole principle, for some , and the first statement of the lemma follows.
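The spectral representation of the resolvent used in this proof can be checked numerically. The sketch below is generic and makes no claim about the normalization used in the paper: for an arbitrary real symmetric matrix `h` and a complex number `z` with positive imaginary part, it compares the matrix inverse of `h - z I` with the sum of the rank-one projections onto the eigenvectors, each divided by the corresponding eigenvalue minus `z`. The function names are hypothetical.

```python
import numpy as np

def resolvent(h, z):
    """Resolvent (h - z I)^{-1} computed by direct inversion."""
    return np.linalg.inv(h - z * np.eye(h.shape[0]))

def resolvent_from_spectrum(h, z):
    """Same matrix via the spectral theorem: sum_k v_k v_k^T / (lambda_k - z)."""
    vals, vecs = np.linalg.eigh(h)
    # Dividing column k of vecs by (lambda_k - z) before multiplying by vecs.T
    # produces exactly the sum of the weighted rank-one projections.
    return (vecs / (vals - z)) @ vecs.T
```

In particular, each diagonal entry of the resolvent is a weighted sum of the terms $1/(\lambda_k - z)$ with weights given by the squared coordinates of the eigenvectors, so its imaginary part is positive when $\operatorname{Im} z > 0$.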
Fix an integer . From [9, Theorem 2.2] and Lemma 5, for some constants , we have, with overwhelming probability, that the following event holds: , for all integers ,
[TABLE]
and . We set for some . Let be such that . On the event , if is large enough, we have, for all , and
[TABLE]
On the other hand, on the same event , we have
[TABLE]
It remains to adjust the value of the constant to conclude the proof.
The next step in the proof of (4.8) is a comparison between the resolvent of and for close to . The following result is a corollary of [9, Theorem 2.1 (ii)].
Lemma 10.
Let be a Wigner matrix as in Theorem 1. There exists such that, with overwhelming probability, the following event holds: for all such that and , all , we have
[TABLE]
where .
Proof. Let and for , , . We have . Theorem 2.1 (ii) in [9] asserts that with overwhelming probability for all such that and , all , we have
[TABLE]
where and is the Cauchy-Stieltjes transform of the semi-circular law (for its precise definition see [9]). Then [9, Lemma 3.4] implies that, for some , for all , and , we have and . We apply the above result for and . We obtain the claimed statement for .
We use Lemma 10 to estimate the difference between and .
Lemma 11.
Let be a Wigner matrix as in Theorem 1, let be obtained from by setting all diagonal entries to $0$, and let be as in Lemma 9. With overwhelming probability, the following event holds: for all such that and , all ,
[TABLE]
Proof. The resolvent identity states that if $A$ and $B$ are invertible matrices then
$$A^{-1} - B^{-1} = A^{-1}(B - A)B^{-1}.$$
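As a quick sanity check, the resolvent identity, in the standard form $A^{-1} - B^{-1} = A^{-1}(B - A)B^{-1}$ for invertible $A$ and $B$, can be confirmed numerically. The sketch below is illustrative; the helper name `resolvent_identity_gap` is hypothetical.

```python
import numpy as np

def resolvent_identity_gap(a, b):
    """Largest entrywise difference between A^{-1} - B^{-1} and A^{-1} (B - A) B^{-1}."""
    a_inv, b_inv = np.linalg.inv(a), np.linalg.inv(b)
    return np.max(np.abs((a_inv - b_inv) - a_inv @ (b - a) @ b_inv))
```

Taking `a` and `b` to be two symmetric matrices shifted by the same complex number `z` with non-zero imaginary part guarantees invertibility, and the returned gap should be at the level of floating-point rounding.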
Applying twice this identity, it implies that
[TABLE]
(where we omit to write the parameter for ease of notation). For any integer , we thus have
[TABLE]
Note that is independent of . By Lemma 8 and Lemma 10 we find that, with overwhelming probability,
[TABLE]
For a given such that and , it is straightforward to check that, for some , and .
Similarly, we have
[TABLE]
For a given , by Lemma 8 and Lemma 10, we have with overwhelming probability, for all , and .
For a given , let be the event that and the event that . We have proved so far that for a given such that and , with overwhelming probability, holds. By a net argument, it implies that with overwhelming probability, the events hold jointly for all with and . Indeed, from the resolvent identity (4.10), we have . It follows that if then . Let be a finite subset of the interval such that for all , . We may assume that has at most elements. From what precedes we have the inclusion, with ,
[TABLE]
From the union bound, the right-hand side holds with overwhelming probability. It concludes the proof of the lemma.
Now we have all ingredients necessary to conclude the proof of (4.8). Let . We prove that for some , with overwhelming probability,
[TABLE]
By Lemma 9, with overwhelming probability, and for some ,
[TABLE]
and if ,
[TABLE]
By Lemma 11, we deduce that with overwhelming probability, if ,
[TABLE]
Hence, , concluding the proof of (4.8).
4.3 Proof of Lemma 6
Let be the eigenvalues of . For any , let be the largest eigenvalue of . We start by proving that and are close compared to their fluctuations. We have
[TABLE]
where is as in Lemma 6. Since and have the same distribution, we deduce from Lemma 5 that, for any , there exists such that with probability at least , and . For all large enough, we have , where is defined in (4.9). Hence for any , for some new constant , with probability at least , and . Since can be taken arbitrarily large, we deduce that with overwhelming probability, , and . On this event, we get
[TABLE]
Reversing the role and and using the union bound, we deduce that, with overwhelming probability,
[TABLE]
It follows from [23, Theorem 1.14] that, for any , there exists such that, for all large enough,
[TABLE]
Let be an orthonormal basis of eigenvectors of associated to the eigenvalues with . We set and . For some constant to be defined and , we introduce the event such that
- •
and ;
- •
and ;
- •
.
From what precedes, Lemma 5 and [9, Theorem 2.2], for some small enough, for any there exists such that for all large enough, . Note also that we have checked that if holds then .
On the event , we now prove that and are close in -norm. For a fixed , we write, , where with non-negative real numbers, is a unit vector in the vector space spanned by , and is a unit vector in the vector space spanned by . Set
[TABLE]
We have
[TABLE]
Taking the scalar product with , we find
[TABLE]
Hence,
[TABLE]
Similarly, taking the scalar product with , we find
[TABLE]
Since where is the number of non-zeros entries of , we have . By construction, where . If holds, using the Cauchy-Schwarz inequality and , we deduce that
[TABLE]
So finally,
[TABLE]
We deduce that is positive for all large enough. We set . We find, since ,
[TABLE]
For our choice of , this last expression is . Indeed, we have
[TABLE]
Since , we have . Hence, finally, if we set , we get that . This concludes the proof of the lemma.
5 Proof of Theorem 2
The proof of Theorem 2 relies on the rigorous justification of the heuristic argument sketched below the statement of Theorem 2, see the forthcoming Lemma 12. This is performed by a careful perturbation argument on the resolvent in Lemma 13. Indeed, the resolvent has nice analytical properties and it is intimately connected to the spectrum, as illustrated in Lemma 14.
Recall that is the set of pairs chosen uniformly at random (without replacement) from the set of all ordered pairs of indices with which is used in the definition of . We denote by and the largest eigenvalues of and . Recall the definition of in (4.9) and the notion of overwhelming probability immediately below (4.9). The main technical lemma is the following:
Lemma 12.
Let be a Wigner matrix as in Theorem 2 and let be its eigenvalues. For any there exists a constant such that for all , for all large enough, with probability at least ,
[TABLE]
We postpone the proof of Lemma 12 to the next subsection. We denote by and the resolvent of and . The proof of Lemma 12 is based on this comparison lemma on the resolvents.
Lemma 13.
Let be a Wigner matrix as in Theorem 1. Let be as in Lemma 9 and let . There exists such that, with overwhelming probability, the following event holds: for all , for all such that and ,
[TABLE]
We postpone the proof of Lemma 13 to the next subsection. Our next lemma connects the resolvent with eigenvectors.
Lemma 14.
Let be a Wigner matrix as in Theorem 1 and let . There exist such that the following event holds for all large enough with probability at least : for all , we have, with , ,
[TABLE]
Proof. Let be the eigenvalues of . Let be an eigenvector basis of . Recall that
[TABLE]
As in the proof of Lemma 9, from [9, Theorem 2.2] and Lemma 5, for some constants , we have with overwhelming probability that the following event holds: , for all integers , and for all with and such that we have
[TABLE]
On the other hand, let be the event that . Fix . From [3, Theorem 2.7] and, e.g., [1, Chapter 3], there exists such that
[TABLE]
On the event , if , we have
[TABLE]
Finally, if , on the event , we find easily, if is i-th coordinate of ,
[TABLE]
For some , we thus find, that if then on the event , for all such that we have
[TABLE]
We apply this last estimate and . For each , let be the event corresponding to for instead of . We apply the above estimate on the event to and . By Lemma 12 and the union bound has probability at least . It concludes the proof.
We may now conclude the proof of Theorem 2. Let be as in Lemma 14, and . Up to increasing the value of , we may also assume that the conclusion of Lemma 13 holds. By Lemma 5, Lemma 13 and Lemma 14, for any , for all large enough, with probability at least , it holds that for some : , and
[TABLE]
Applied to , we get that for some ,
[TABLE]
Notably, we find
[TABLE]
Let . It follows from the above inequality that for , . Let be this common value. We have for all ,
[TABLE]
Moreover, for all , by definition,
[TABLE]
It concludes the proof of Theorem 2.
5.1 Proof of Lemma 13
The proof of Lemma 13 is based on a technical martingale argument. Using the resolvent identity (4.10), we will write as a sum of martingale differences up to small error terms; this is carried out in Equation (5.5). These martingales will allow us to use concentration inequalities. Each term of the martingale differences will be estimated using the upper bound on resolvent entries given in Lemma 10.
We apply the resolvent identity many times and, for technical convenience, it is easier to have a uniform bound on our random variables. We thus start by truncating our random variables . Set and with . The matrix has independent entries above the diagonal. Moreover, since , with overwhelming probability, and . It is also straightforward to check that . This implies that and for . We define the matrix with for ,
[TABLE]
The matrix is a Wigner matrix as in Theorem 2 with entries in . Moreover, from Gershgorin’s circle theorem [15, Theorem 6.6.1], with overwhelming probability, the operator norm of satisfies . Observe that from the spectral theorem, for any Hermitian matrix , . In particular, from the resolvent identity (4.10), we get if . The same truncation procedure applies for . In the proof of Lemma 13, we may thus assume without loss of generality that the random variables have support in .
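The three standard facts invoked in this paragraph can be verified numerically: the Gershgorin bound on the operator norm, the resolvent norm bound for a Hermitian matrix, and the resolvent identity with its norm consequence. This is a sketch under assumptions: the convention R(z) = (A - zI)^{-1}, Gaussian test matrices, and an arbitrary small symmetric perturbation `b` that is hypothetical and unrelated to the paper's truncation. The exact form of identity (4.10) is not reproduced in this section, so the version checked here is the textbook one.

```python
import numpy as np

def wigner(n, rng):
    """Symmetric n x n matrix with independent N(0, 1/n) entries on and above the diagonal."""
    g = rng.normal(size=(n, n)) / np.sqrt(n)
    return np.triu(g) + np.triu(g, 1).T

def resolvent(a, z):
    """Assumed convention: R(z) = (A - z I)^{-1}."""
    return np.linalg.inv(a - z * np.eye(a.shape[0]))

rng = np.random.default_rng(2)
n = 150
a = wigner(n, rng)
b = a + 0.01 * wigner(n, rng)   # hypothetical small symmetric perturbation
z = 0.3 + 0.05j

# Gershgorin: each eigenvalue lies in a disc centred at a_ii with radius
# sum_{j != i} |a_ij|, so the operator norm is at most the largest absolute row sum.
row_sum_bound = np.max(np.sum(np.abs(a), axis=1))
op_norm = np.linalg.norm(a, 2)

# Spectral theorem: for Hermitian A, ||R(z)|| <= 1 / Im z.
ra, rb = resolvent(a, z), resolvent(b, z)
res_norm = np.linalg.norm(ra, 2)

# Resolvent identity: R_A - R_B = R_A (B - A) R_B,
# which gives ||R_A - R_B|| <= ||A - B|| / (Im z)^2.
identity_gap = np.max(np.abs((ra - rb) - ra @ (b - a) @ rb))
diff_norm = np.linalg.norm(ra - rb, 2)
diff_bound = np.linalg.norm(a - b, 2) / z.imag ** 2

print(op_norm, row_sum_bound, res_norm, 1 / z.imag, diff_norm, diff_bound)
```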
It will also be convenient to assume that the random subset does not contain too many points on a given row or column. To that end, for , let be the -algebra generated by the random variable , and . For , we set
[TABLE]
Note that is -measurable. We have
[TABLE]
Besides, from [6, Proposition 1.1], for any ,
[TABLE]
If , it follows that with overwhelming probability, the following event, say , holds: where for ease of notation we have set
[TABLE]
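The claim that the random subset does not put too many points on any single row or column can be illustrated by simulation. The sketch below assumes that the subset consists of k distinct positions (i, j) with i <= j chosen uniformly at random, which is an assumption about the sampling model; the helper names `sample_positions` and `row_counts` are hypothetical. Under this model each index is touched by 2k/(n+1) positions on average.

```python
import random

def sample_positions(n, k, rng):
    """k distinct positions (i, j) with i <= j, drawn uniformly without replacement."""
    positions = [(i, j) for i in range(n) for j in range(i, n)]
    return rng.sample(positions, k)

def row_counts(n, chosen):
    """Number of chosen positions touching each index, as a row or as a column."""
    counts = [0] * n
    for i, j in chosen:
        counts[i] += 1
        if j != i:            # a diagonal position touches its index only once
            counts[j] += 1
    return counts

rng = random.Random(0)
n, k = 200, 10000
chosen = sample_positions(n, k, rng)
counts = row_counts(n, chosen)

# Each index appears in n of the n(n+1)/2 positions, so the mean count is 2k/(n+1).
mean = 2 * k / (n + 1)
print(f"mean = {mean:.1f}, min = {min(counts)}, max = {max(counts)}")
```

In this run the per-index counts stay within a moderate factor of their mean, which is the qualitative behaviour that the concentration bound quantifies.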
Now, let be as in Lemma 10 and, for , we denote by the event that holds and that the conclusion of Lemma 10 holds for and (with the convention ). If holds, then for all with and , we have,
[TABLE]
where .
After these preliminaries, we may now write the resolvent expansion. Our goal is to write as a sum of martingale differences up to error terms. The outcome will be Equation (5.5) below. We define as the symmetric matrix obtained from by setting to 0 the entries and . By construction is -measurable. We denote by the resolvent of . The resolvent identity (4.10) implies that
[TABLE]
(we omit the parameter for ease of notation). Now, we set for , and , where denotes the canonical vector of with all entries equal to 0 except the -th entry, which equals 1. We have
[TABLE]
We use that and . If holds, we deduce that for all with and , we have
[TABLE]
Similarly, the resolvent identity (4.10) with and implies that, if holds, for all with and , we have
[TABLE]
Finally, the resolvent identity with and gives
[TABLE]
Note that, . We use , from (5.1)-(5.2)-(5.3), we deduce that
[TABLE]
where and, if holds,
[TABLE]
We rewrite the resolvent identity one last time, with and :
[TABLE]
If holds, we arrive at,
[TABLE]
where . We have thus found that
[TABLE]
where we have set, with , ,
[TABLE]
In this final step of the proof, we use concentration inequalities to estimate the terms in (5.5). We set . We write, for any ,
[TABLE]
By Lemma 10, we have for any , . Since , we have that . Also, from (5.2)-(5.4), . On the event , we have
[TABLE]
The Azuma-Hoeffding martingale inequality implies that, for ,
[TABLE]
We apply the latter inequality to . We deduce that, with overwhelming probability,
[TABLE]
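For reference, the Azuma-Hoeffding inequality in its standard form states that a sum of martingale differences D_t with |D_t| <= c_t exceeds u in absolute value with probability at most 2 exp(-u^2 / (2 sum_t c_t^2)). The sketch below checks this bound empirically on a toy martingale. The increments, the bounds `c` and the threshold `u` are illustrative assumptions and are unrelated to the specific quantities of the proof.

```python
import math
import random

def martingale_endpoint(c, rng):
    """Sum of differences D_t = c_t * s_t * eps_t, where eps_t is a fair sign and the
    scale s_t in {0.5, 1} depends only on the past: E[D_t | past] = 0 and |D_t| <= c_t."""
    total = 0.0
    for ct in c:
        scale = 1.0 if total >= 0 else 0.5   # predictable: uses past values only
        eps = 1.0 if rng.random() < 0.5 else -1.0
        total += ct * scale * eps
    return total

def azuma_bound(c, u):
    """Two-sided Azuma-Hoeffding bound 2 * exp(-u^2 / (2 * sum c_t^2))."""
    return 2.0 * math.exp(-u * u / (2.0 * sum(ct * ct for ct in c)))

rng = random.Random(0)
c = [1.0 / math.sqrt(t + 1) for t in range(200)]   # illustrative increment bounds
u = 2.0 * math.sqrt(sum(ct * ct for ct in c))      # threshold at twice the l2 norm of c
trials = 2000
hits = sum(abs(martingale_endpoint(c, rng)) >= u for _ in range(trials))
empirical = hits / trials
bound = azuma_bound(c, u)
print(f"empirical tail = {empirical:.4f}, Azuma-Hoeffding bound = {bound:.4f}")
```

The increments here are deliberately dependent on the past, since the inequality requires only the martingale property and almost sure bounds on the differences, not independence.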
We may treat similarly the random variable in (5.5). We set . Note that is -measurable and . Thus . Moreover, since , from (5.2), we find . If holds, we get
[TABLE]
We write, for ,
[TABLE]
From the Azuma-Hoeffding martingale inequality, we deduce that, with overwhelming probability,
[TABLE]
We now estimate the random variable in (5.5). We will again use the Azuma-Hoeffding inequality, but we need to introduce a backward filtration (because we have to deal with the random variables instead of as in ). We define as the -algebra generated by the random variables, , and . By construction and are -measurable random variables. Let be the event that holds and that the conclusion of Lemma 10 holds for . If holds, then for all with and , we have,
[TABLE]
Arguing as in (5.2), if holds then
[TABLE]
The variable is -measurable and . We write, for ,
[TABLE]
where . We have and
[TABLE]
Arguing as above, from the Azuma-Hoeffding martingale inequality, we deduce that with overwhelming probability,
[TABLE]
Similarly, repeating the argument leading to (5.7) with and the filtration gives with overwhelming probability,
[TABLE]
We note also that if holds then
[TABLE]
where the last inequality holds provided that . So finally, from (5.5)-(5.6)-(5.7)-(5.8)-(5.9), we have proved that for a given such that and , with overwhelming probability
[TABLE]
where the inequality holds provided that . Recall that . By a net argument (as in the proof of Lemma 11), we deduce that with overwhelming probability for all such that , . This concludes the proof of Lemma 13.
5.2 Proof of Lemma 12
Let be as in Lemma 9 and . We set and let . Let . We start by bounding and . Since and have the same distribution, we only prove that with overwhelming probability
[TABLE]
By Lemma 9, with overwhelming probability, and for some integer ,
[TABLE]
and,
[TABLE]
By Lemma 13, we deduce that if , with overwhelming probability,
[TABLE]
This proves (5.10).
We may now conclude the proof of Lemma 12. Fix . As already noticed, from [3, Theorem 2.7], there exists such that, with probability at least , . From what precedes, with probability at least , holds and for all , we have
[TABLE]
with . On this event, we readily find and for some , . Assume that this last inequality is false for . Since , if , then and we deduce that . We note that, on our event, for some , we have . In particular, . So necessarily, and, from the triangle inequality, . This is a contradiction since . This concludes the proof of Lemma 12.
Acknowledgments. We would like to thank Jaehun Lee for pointing out a mistake in the proof of Lemma 3 in an early version of this paper. We would also like to thank the referees for their valuable reports.
References
- [1] Greg W. Anderson, Alice Guionnet, and Ofer Zeitouni. An introduction to random matrices, volume 118 of Cambridge Studies in Advanced Mathematics. Cambridge University Press, Cambridge, 2010.
- [2] I. Benjamini, G. Kalai, and O. Schramm. Noise sensitivity of Boolean functions and applications to percolation. Publications Mathématiques de l’Institut des Hautes Etudes Scientifiques, 90(1):5–43, 1999.
- [3] Paul Bourgade, László Erdős, and Horng-Tzer Yau. Edge universality of beta ensembles. Comm. Math. Phys., 332(1):261–353, 2014.
- [4] J. Bourgain, J. Kahn, G. Kalai, Y. Katznelson, and N. Linial. The influence of variables in product spaces. Israel Journal of Mathematics, 77(1-2):55–64, 1992.
- [5] Sourav Chatterjee. Concentration inequalities with exchangeable pairs. PhD thesis, Stanford University, 2005.
- [6] Sourav Chatterjee. Stein’s method for concentration inequalities. Probab. Theory Related Fields, 138(1-2):305–321, 2007.
- [7] Sourav Chatterjee. Superconcentration and related topics. Springer, 2016.
- [8] László Erdős, Horng-Tzer Yau, and Jun Yin. Bulk universality for generalized Wigner matrices. Probab. Theory Related Fields, 154(1-2):341–407, 2012.
