An Exposition on Wigner's Semicircular Law
Wooyoung Chin

TL;DR
This paper refines the semicircular law for Hermitian random matrices using the moment method, accommodating broader conditions including infinite variance cases, and provides detailed expositions suitable for newcomers.
Contribution
It introduces a strengthened semicircular law under weaker assumptions and extends results to cases with row sums converging to normal distribution, including infinite variance scenarios.
Findings
Strengthened semicircular law under minimal assumptions
Extension to matrices with infinite variance entries
Detailed exposition suitable for beginners
Abstract
We revisit the moment method to obtain a slightly strengthened version of the usual semicircular law. Our version assumes only that the upper triangular entries of Hermitian random matrices are independent, have mean zero and variances close to $1/n$ in a certain sense, and satisfy a Lindeberg-type condition. As an application, we derive another semicircular law for the case when the sum of a row converges in distribution to the standard normal distribution, including the case where all matrix entries may have infinite variance. The appendix, which makes up the majority of the paper, provides, for those new to the subject, a rigorous exposition of most of the details involved, including a proof of a semicircular law that uses the Stieltjes transform method.
1 Introduction
If $A$ is an $n \times n$ Hermitian matrix, then $A$ is diagonalizable and all eigenvalues of $A$ are real. We denote the eigenvalues of $A$, counted with multiplicities, as
[TABLE]
(For details, see Subsection B.1.) We define the spectral distribution of $A$ as the Borel probability measure
[TABLE]
on $\mathbb{R}$. If $A$ is a random Hermitian matrix, then its spectral distribution is a random Borel probability measure on $\mathbb{R}$. (For details, see Subsection B.4.)
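Definition 1.1 (The semicircle distribution).
The Borel probability measure $\mu_{\mathrm{sc}}$ on $\mathbb{R}$ given by
$$\mu_{\mathrm{sc}}(dx) := \frac{1}{2\pi}\sqrt{(4-x^{2})_{+}}\,dx$$
is called the semicircle distribution. Here, $a_{+} := \max\{a, 0\}$.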
Since the seminal work [Wig55] by Wigner, there have been many theorems that assume to be random , Hermitian matrices satisfying certain conditions, and show that converges in some sense to . Let us call such theorems semicircular laws.
In the main part of this paper, we study semicircular laws assuming joint independence of the upper triangular entries (we include the diagonal in both the upper and the lower triangles). We first prove a semicircular law (Theorem 1.2) with rather weak assumptions. In particular, we don’t require the entries to be identically distributed, and we allow the entries to deviate from unit variance. It is notable that other than the mostly standard reduction steps, our proof is just a simple application of the moment method.
After proving the main theorem, we apply the theorem to obtain another semicircular law (Theorem 5.2) which more or less assumes that the sum of a row converges in distribution to the standard normal distribution. This theorem allows the entries to have infinite variances.
The appendices make up about two thirds of this paper. There we provide a self-contained and rigorous account of the details (including the measure-theoretic ones) involved in the main part of the paper. In the main part of the paper, we refer to the appendix whenever we need a fact given there. After that, for completeness, we provide a proof of a semicircular law (a little weaker than the one proved in the main part, but still stronger than the laws that appear in many textbooks) which uses the Stieltjes transform method.
We assume no prior knowledge more advanced than one-semester courses in probability theory and combinatorics. A total newcomer to the field might want to read Appendices A–C first, then read the main part, and then go to Appendices D–E.
Now we state our main theorem.
Theorem 1.2 (A general semicircular law).
For each , let be a random Hermitian matrix (see Definition B.13) whose upper triangular entries are jointly independent, have mean zero, and have finite variances. We assume that are defined on the same probability space. If
[TABLE]
[TABLE]
and
[TABLE]
then as a.s.
Remark 1.3.
One sufficient condition for (1.1) and (1.2) to hold is
[TABLE]
Note that we can take to show (1.2). An even more special case is when we have $\operatorname{\mathbf{Var}}\bigl[w_{ij}^{(n)}\bigr] = 1/n$ for all and . This case is Theorem 2.9 in [BS10]. If there is some finite such that
[TABLE]
then (1.2) holds for the same . This case is more or less equivalent to Corollary 1 in [GNT15], which is proved first for matrices with Gaussian entries, and then generalized to arbitrary matrices by proving an analogue of the Lindeberg universality principle for random matrices. In this paper, we will prove Theorem 1.2 directly by the moment method without appealing to the universality principle.
Remark 1.4.
Theorem 1.2 assumes no dependence between $W_1$, $W_2$, …, yet it asserts an a.s. convergence. This is in contrast to some versions of the semicircular law where only convergence in probability is asserted (e.g. [AGZ10, Theorem 2.1.1]), or $W_n$ is assumed to be the top left $n \times n$ minor of a fixed infinite random Hermitian matrix (e.g. [Tao12, Theorem 2.4.2]). If $\mu_1, \mu_2, \ldots$ are Borel probability distributions on a separable metric space $(S, d)$, and $c \in S$, then the following two statements are equivalent:
- (i) $X_n \to c$ a.s. whenever $X_1, X_2, \ldots$ are random elements of $S$ defined on a common probability space such that each $X_n$ has distribution $\mu_n$;
- (ii) $\sum_{n=1}^{\infty} \mu_n\bigl(\{\,x \in S \mid d(x,c) > \epsilon\,\}\bigr) < \infty$ for all $\epsilon > 0$.
This can be shown using the Borel-Cantelli lemmas [Bil12, Theorems 4.3 and 4.4]. This type of strong convergence is possible in Theorem 1.2 because of a strong concentration of measure result we will use.
The rest of the paper is organized as follows. In Section 2, we will first reduce Theorem 1.2 to a form with stronger assumptions. Then we will see that the reduced semicircular law follows from some moment computations. In Section 3, we will develop a tool needed for the moment computation, and in Section 4, we will perform the actual moment computation. In Section 5, we will derive the aforementioned semicircular law which assumes Gaussian convergence of the sum of a row.
2 Preliminary reductions
Assume that satisfies the conditions of Theorem 1.2.
2.1 Convergence in expectation is enough
If we have (for the meaning of , see Theorem A.5), then
[TABLE]
for all continuous and bounded . By the concentration inequality Theorem C.3 for spectral measures and the Borel-Cantelli lemma, we have
[TABLE]
for all with , where is 1 on , 0 on , and linear on . This implies that a.s. by Theorem A.3. Therefore, it is enough to show .
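The test functions used in this step are piecewise-linear ramps. A minimal Python sketch follows, assuming the parametrization suggested by Theorem A.3 (value 1 up to a point, value 0 from that point plus a gap onward, linear in between). The names `ramp`, `x`, `eps`, and `integrate_against_atoms` are hypothetical and chosen only for illustration.

```python
def ramp(x, eps):
    """Return f with f = 1 on (-inf, x], f = 0 on [x + eps, inf), linear in between."""
    if eps <= 0:
        raise ValueError("eps must be positive")

    def f(t):
        if t <= x:
            return 1.0
        if t >= x + eps:
            return 0.0
        return (x + eps - t) / eps  # decreases linearly from 1 to 0

    return f


def integrate_against_atoms(f, atoms):
    """Integral of f against the uniform probability measure on the given points."""
    return sum(f(a) for a in atoms) / len(atoms)


f = ramp(0.0, 0.5)
print(integrate_against_atoms(f, [-1.0, 0.25, 2.0]))  # (1 + 0.5 + 0) / 3
```

Integrating such a ramp against a spectral distribution gives a quantity squeezed between two values of the distribution function, which is why countably many ramps suffice to detect weak convergence.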
2.2 Truncation
Since (1.3) holds, we have positive integers such that
[TABLE]
for all for each . If we let for all , and for all for each , then and
[TABLE]
Let $W_n' := \bigl(w_{ij}^{(n)} \operatorname{\mathbf{1}}(|w_{ij}^{(n)}| \le \eta_n)\bigr)_{i,j=1}^{n}$. Since
[TABLE]
it is enough to show by Corollary B.15 and Theorem A.3.
2.3 Centralization
For each and , let
[TABLE]
and let . Since
[TABLE]
it is enough to show by Corollary B.15 and Theorem A.3.
We claim that satisfies all conditions is supposed to satisfy in Theorem 1.2. The fact that (1.1) and (1.2) still hold even if we replace by follows from the following:
[TABLE]
The condition (1.3) for easily follows from the bound . Since for all and , by doubling , , …, we have and . Thus, from now on, we can assume that for some satisfying .
2.4 Rescaling
Fix . We will choose a number for each so that always holds. Start by letting for all . We start with the first row and the first column. If $\sum_{j=1}^{n} \operatorname{\mathbf{Var}}\bigl[w_{1j}^{(n)}\bigr] \le C$, then do nothing. Otherwise, lower (not below 0) so that
[TABLE]
and let for all . We note that at this point we have
[TABLE]
Assume that , and that we’ve examined up to -th row. If
[TABLE]
then do nothing. Otherwise, lower (not below 0) so that
[TABLE]
and let for all . At this point we have
[TABLE]
This can be shown by an induction on . After completing the whole process, we are left with numbers such that
[TABLE]
for all , and
[TABLE]
Let $\widehat{W}_n = \bigl(c_{ij}^{(n)} w_{ij}^{(n)}\bigr)_{i,j=1}^{n}$. Since holds for any , we have
[TABLE]
by (2.2). Thus, by Corollary B.15 and Theorem A.3, it is enough to show .
The altered matrix $\widehat{W}_n$ has the advantage over $W_n$ that (2.1) holds. Also, the modulus of each entry of $\widehat{W}_n$ is still bounded by $\eta_n$. We claim that $\widehat{W}_n$ also satisfies all conditions that $W_n$ is assumed to satisfy in Theorem 1.2. First, each entry of $\widehat{W}_n$ obviously has mean zero. Also, since each entry of $\widehat{W}_n$ has modulus less than or equal to that of the corresponding entry of $W_n$, the condition (1.3) is satisfied by $\widehat{W}_n$. The condition (1.2) for $\widehat{W}_n$ obviously holds as we have an even stronger property (2.1). Finally, (1.1) for $\widehat{W}_n$ follows from (2.2) and the fact that (1.2) is satisfied by $W_n$. This proves our claim, and so from now on, we can also assume that (1.4) is true.
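The row-by-row rescaling procedure of this subsection can be sketched in a few lines of Python. The sketch makes one concrete choice that the text leaves open: when a row sum exceeds the bound, all coefficients in that row are shrunk by a common factor, and the change is mirrored to the corresponding column. The function name `rescale_coefficients`, its argument names, and the toy variance matrix are hypothetical.

```python
import math
import random


def rescale_coefficients(var, cap):
    """Row-by-row choice of symmetric coefficients c[i][j] in [0, 1].

    var: symmetric n x n list of entry variances (var[i][j] == var[j][i]).
    cap: the bound C on each row sum of c[i][j]**2 * var[i][j].
    """
    n = len(var)
    c = [[1.0] * n for _ in range(n)]
    for i in range(n):
        row_sum = sum(c[i][j] ** 2 * var[i][j] for j in range(n))
        if row_sum <= cap:
            continue  # this row already satisfies the bound
        factor = math.sqrt(cap / row_sum)  # common shrink factor, below 1
        for j in range(n):
            c[i][j] *= factor
            c[j][i] = c[i][j]  # mirror so the rescaled matrix stays Hermitian
    return c


random.seed(0)
n, cap = 6, 2.0
var = [[0.0] * n for _ in range(n)]
for i in range(n):
    for j in range(i, n):
        var[i][j] = var[j][i] = random.uniform(0.0, 1.0)

c = rescale_coefficients(var, cap)
row_sums = [sum(c[i][j] ** 2 * var[i][j] for j in range(n)) for i in range(n)]
print(max(row_sums) <= cap + 1e-9)
```

Because coefficients are only ever lowered, a row that has been brought under the bound stays under it when later rows are processed, which is the invariant the induction in the text relies on.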
2.5 Reduction to moment convergence
On top of the assumptions of Theorem 1.2, we now also have the following.
- (i) There are with such that for all and .
- (ii) There is some finite such that (1.4) holds.
Since , every eigenvalue of has absolute value at most . So, is supported on , and in particular has moments of all orders. As
[TABLE]
for any by the ratio test, is determined by its moments by [Bil12, Theorem 30.1]. Thus, by the moment convergence theorem [Bil12, Theorem 30.2], it is enough to show
[TABLE]
For each , since there are continuous bounded with for all , we have
[TABLE]
On the other hand, we can directly compute the moments of as follows.
Lemma 2.1.
For any , we have
[TABLE]
Proof.
A trigonometric substitution gives
[TABLE]
As
[TABLE]
for any , we have
[TABLE]
∎
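Lemma 2.1 can be sanity-checked numerically. The sketch below assumes the standard semicircle density $\frac{1}{2\pi}\sqrt{4-x^2}$ on $[-2,2]$ and compares a midpoint-rule approximation of the $k$-th moment with the Catalan number $\frac{1}{k/2+1}\binom{k}{k/2}$ for even $k$ and with $0$ for odd $k$. The function names and the number of integration steps are arbitrary choices made for illustration.

```python
import math


def semicircle_moment(k, steps=200_000):
    """Midpoint-rule approximation of the k-th moment of the semicircle law."""
    h = 4.0 / steps
    total = 0.0
    for m in range(steps):
        x = -2.0 + (m + 0.5) * h
        total += x ** k * math.sqrt(4.0 - x * x)
    return total * h / (2.0 * math.pi)


def catalan(m):
    """m-th Catalan number, binom(2m, m) / (m + 1)."""
    return math.comb(2 * m, m) // (m + 1)


for k in range(0, 9):
    expected = catalan(k // 2) if k % 2 == 0 else 0
    print(k, round(semicircle_moment(k), 4), expected)
```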
Note that whenever is odd. Thus, it is enough to show that
[TABLE]
and that
[TABLE]
These will be proved in Section 4 by using the content of Section 3.
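To see the reduction to moments at work, the following sketch computes the normalized traces $\frac{1}{n}\operatorname{tr}(W^k)$, which are the $k$-th moments of the spectral distribution of $W$, for a single real symmetric matrix with independent entries equal to $\pm 1/\sqrt{n}$. This entry distribution, the size 80, and all function names are assumptions made for illustration, and matrix products are done in pure Python to keep the block dependency-free. One would expect values near 0, 1, 0, 2, 0, 5, with fluctuations that shrink as $n$ grows.

```python
import math
import random


def random_sign_matrix(n, rng):
    """Real symmetric matrix with independent +-1/sqrt(n) entries on and above the diagonal."""
    s = 1.0 / math.sqrt(n)
    w = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i, n):
            w[i][j] = w[j][i] = rng.choice((-s, s))
    return w


def matmul(a, b):
    """Product of two square matrices given as lists of rows."""
    bt = list(zip(*b))  # columns of b
    return [[sum(x * y for x, y in zip(row, col)) for col in bt] for row in a]


def normalized_trace_moments(w, kmax):
    """Return [(1/n) tr(W^k) for k = 1, ..., kmax]."""
    n = len(w)
    power = w
    moments = []
    for k in range(kmax):
        moments.append(sum(power[i][i] for i in range(n)) / n)
        if k + 1 < kmax:
            power = matmul(power, w)
    return moments


rng = random.Random(1)
moments = normalized_trace_moments(random_sign_matrix(80, rng), 6)
for k, m in enumerate(moments, start=1):
    print(k, round(m, 3))
```

With this particular entry distribution the second moment equals 1 up to rounding, since every squared entry is $1/n$; the higher even moments only approach the Catalan numbers in the limit.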
3 Trees and products of variances
Our graphs will be undirected. We allow graphs to have loops, but don’t allow them to have multiple edges. Let be a finite graph. For any , denote by the collection of all injections from into . Given any and with ends , we let
[TABLE]
It is well-defined since each is Hermitian. Then we let
[TABLE]
Here stands for “product.” Also, the notation will no longer appear.
Lemma 3.1.
If is a finite tree with edges, , , and , then
[TABLE]
Proof.
If , then (3.1) obviously holds. (We define the product of zero terms as 1.) To proceed by induction, assume that (3.1) holds for , and let be a tree with edges. Choose any leaf of different from , and let be the only vertex of adjacent to . Since
[TABLE]
for all , we have
[TABLE]
by the induction hypothesis. ∎
Lemma 3.2.
For any finite tree ,
[TABLE]
Proof.
Let . If , then (3.2) obviously holds. To proceed by induction, assume that the result holds for trees with edges, and let be a tree with edges. Let be a leaf of , and be the only vertex of adjacent to in . Note that
[TABLE]
by the induction hypothesis. By Lemma 3.1, we have
[TABLE]
Combining (3.3), (3.4), and the fact that
[TABLE]
we can conclude that (3.2) holds. ∎
4 Computation of moments
Fix a . Let us call any -tuple with a closed walk of length . If is a closed walk, we let be the graph (possibly having loops but having no multiple edges) with the vertex set and the edge set
[TABLE]
Two closed walks and are said to be isomorphic if for any we have if and only if . If , then a canonical closed walk of length on vertices is a closed walk with such that
- (i) and
- (ii) for each .
Let denote the set of such walks. It is straightforward to show that any closed walk is isomorphic to exactly one canonical closed walk. For any , let denote the set of all closed walks with which are isomorphic to .
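The passage from a closed walk to its canonical representative can be made concrete. The sketch below assumes the usual convention that a canonical walk starts at vertex 1 and that each newly visited vertex receives the smallest unused label; under that assumption, relabeling vertices in order of first appearance produces the canonical walk isomorphic to a given one. The function names are hypothetical.

```python
def canonical_form(walk):
    """Relabel the vertices of a closed walk by order of first appearance (1, 2, 3, ...)."""
    if walk[0] != walk[-1]:
        raise ValueError("a closed walk must start and end at the same vertex")
    labels = {}
    for v in walk:
        if v not in labels:
            labels[v] = len(labels) + 1
    return tuple(labels[v] for v in walk)


def edges(walk):
    """Edge set of the graph of the walk (loops allowed, no multiple edges)."""
    return {frozenset((a, b)) for a, b in zip(walk, walk[1:])}


walk = (5, 2, 5, 7, 5)
print(canonical_form(walk))  # (1, 2, 1, 3, 1)
print(len(edges(walk)))      # 2
print(canonical_form((9, 4, 9, 1, 9)) == canonical_form(walk))  # True
```

Two walks have the same canonical form exactly when their positions of repeated vertices agree, which is the isomorphism relation defined above.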
Note that
[TABLE]
Here the upper bound of is (rather arbitrarily) set to since is empty for any . We will compute
[TABLE]
for each and .
Lemma 4.1.
Let and . If walks on some edge exactly once, i.e. for exactly one , then
[TABLE]
for any and .
Proof.
Since the upper triangular entries of are jointly independent, can be broken into or , and a random variable independent from . Since , the desired conclusion follows. ∎
Lemma 4.2.
Let and . Assume that doesn’t walk on any edge exactly once, i.e. for each there is a with such that . Then we have , and the following hold.
- (i) If , then
[TABLE]
- (ii) If , then
[TABLE]
Proof.
As each edge of is walked on at least twice by , the graph has at most edges. Since is a connected graph with vertices, we have , and has a spanning tree with edges. If , then there is an injection with for all .
(i) Assume . Using the bound , and the fact that walks on any edge of at least twice, we can derive
[TABLE]
Note that since . Since the map given by is a bijection, we have
[TABLE]
by Lemma 3.2.
(ii) Assume . Since has edges and each edge of is walked on twice by , we see that . As each edge of is traversed once in each direction, i.e. for each there is an with such that and , we have
[TABLE]
by Lemma 3.2. ∎
Proof of (2.3) and (2.4).
Lemmas 4.1 and 4.2 tell us that we have (4.3) if and only if doesn’t walk on any edge exactly once and . Otherwise, we have (4.2). If is odd, then is not an integer, and so we cannot have . So, for any odd , we have
[TABLE]
by (4.1).
Assume that is even. Let be the set of all which traverse each edge of twice. Then by Lemma 4.2 (ii) and (4.1), we have
[TABLE]
A Dyck path of length is a finite sequence satisfying the following:
- (i) ;
- (ii) for all ;
- (iii) for all .
Given an , let where is the distance between () and in . Then it is clear that is indeed a Dyck path, and it is not difficult to see that is a bijection from to the set of all Dyck paths of length . It is well-known that there are exactly Dyck paths of length ; see [vLW01, Example 14.8]. Thus, we indeed have
[TABLE]
This finishes the proof of the semicircular law Theorem 1.2. ∎
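The count of Dyck paths used in the proof can be checked by brute force for small lengths. The sketch below assumes the standard description of a Dyck path of length $k$ as a sequence of $\pm 1$ steps that starts and ends at height 0 and never goes below 0, and compares the count with the Catalan number $\frac{1}{k/2+1}\binom{k}{k/2}$. The function names are hypothetical.

```python
from itertools import product
from math import comb


def is_dyck(steps):
    """steps is a tuple of +1/-1 increments; the path starts at height 0."""
    height = 0
    for s in steps:
        height += s
        if height < 0:
            return False
    return height == 0


def count_dyck_paths(k):
    """Number of Dyck paths of length k, found by enumerating all 2**k step sequences."""
    return sum(1 for steps in product((1, -1), repeat=k) if is_dyck(steps))


for k in (2, 4, 6, 8, 10):
    print(k, count_dyck_paths(k), comb(k, k // 2) // (k // 2 + 1))
```

In the bijection described above, the height after $j$ steps plays the role of the distance from the starting vertex after $j$ steps of the walk.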
5 Gaussian convergence
The paper [Jun18] considers real symmetric random matrices with size , , … whose upper triangular entries are i.i.d. In that paper, it is shown that if the sum of a row of converges in distribution to the standard normal distribution as , then as a.s. We prove this fact generalized to random matrices with non-i.i.d. entries in this section. By doing so, we will demonstrate how one can apply Theorem 1.2 to obtain a semicircular law for random matrices whose entries might have infinite variances.
The type of convergence described in the following fact will appear many times in this section.
Proposition 5.1 (Uniform convergence of triangular arrays).
Let be a topological space, , and be a finite sequence in for each . For any , the following two conditions are equivalent:
- (i) as for any choice of for each ;
- (ii) for any neighborhood of , there exists some such that for all and .
Proof.
We omit the straightforward proof. ∎
The following is the main result of this section.
Theorem 5.2 (Gaussian convergence semicircular law).
For each , let be a random real symmetric matrix whose upper triangular entries are jointly independent and have symmetric distributions. We assume that are defined on the same probability space. Assume that is a null array in the sense that as for any choice of for each . If also for any choice of for each , then as a.s.
The following two facts will be used in the proof of Theorem 5.2.
Theorem 5.3 (Gaussian convergence).
For each , let be jointly independent real-valued random variables. Assume that as regardless of how we choose for each . Then as if and only if the following conditions hold:
- (i) for all ;
- (ii) ;
- (iii) .
Proof.
See [Kal02, Theorem 5.15]. ∎
Theorem 5.4 (Bernstein’s inequality).
Suppose that are independent real-valued random variables, each with mean 0, and each bounded by . If , then
[TABLE]
Proof.
The proof of [Bil99, M20] with a slight change works. ∎
Proof of Theorem 5.2.
By Theorem 5.3, we have
[TABLE]
for any choice of for each , for any . Then Proposition 5.1 implies
[TABLE]
Let and . Since
[TABLE]
by bounding from above we would be able to apply Theorem B.11. For any given , we have some such that
[TABLE]
by (5.1). Since , , are jointly independent, Bernstein’s inequality (Theorem 5.4) implies
[TABLE]
As for each , we have
[TABLE]
and therefore
[TABLE]
by (5.2). By Theorem B.11, it now suffices to show as a.s.
We claim that satisfies all conditions of Theorem 1.2. Since each is symmetric, each entry of has mean zero. By Theorem 5.3, we have
[TABLE]
for any choice of for each . So, by using Proposition 5.1, we can see that the conditions (1.1) and (1.2) with replaced by hold. Finally, the condition (1.3) with replaced by follows from
[TABLE]
and (5.1). ∎
Appendix A Probability measures on
A.1 Weak convergence
Definition A.1 (The space ).
Let denote the set of all Borel probability measures on . We equip with the smallest topology that makes continuous for all continuous bounded . Then we equip with the Borel σ-algebra.
Note that if , then we have if and only if under the topology of .
Definition A.2 (Lévy metric).
If and are distribution functions, then the Lévy distance between and is defined by
[TABLE]
It is not difficult to show that is indeed a metric on . For any given , let denote the distribution function of .
Theorem A.3 (Characterizations of weak convergence).
If , , , , then the following are equivalent:
- (i) ;
- (ii) for all with , where is the function which has value 1 on , has value 0 on , and is linear on ;
- (iii) .
Proof.
(i) implies (ii): Directly follows from the definition of convergence in distribution.
(ii) implies (i): Assume that for all with , and let be the distribution functions of . Let be any continuity point of , and let be given. Since is right continuous, we have for some . If we choose any with , then
[TABLE]
As is also left continuous at , a similar reasoning yields . Since is arbitrary, we have .
(i) implies (iii): Let be given. Choose continuity points of such that , , , and
[TABLE]
Let be such that implies
[TABLE]
Let be arbitrarily given. If , then
[TABLE]
for any . If where , then
[TABLE]
for any . If , then
[TABLE]
for any . Similarly we can show that for all and . So, for all . As was arbitrary, we have .
(iii) implies (i): Let be a continuity point of , and let be given. Since is continuous at , there is a such that for all . Let be such that for all . Then,
[TABLE]
for all . Now observe that
[TABLE]
and
[TABLE]
for all . Since was arbitrary, . ∎
A.2 Expected probability measures
Lemma A.4.
For any Borel , the map defined by is measurable.
Proof.
For any and , let be the map which is 1 on , 0 on , and linear on . Then the map is continuous, and so measurable. Since as by bounded convergence, the map is measurable for any . Let be the collection of all Borel such that is measurable. If are disjoint, then
[TABLE]
is measurable, and so . If , then is measurable, and so . These show that is a λ-system containing for all . As the rays form a π-system that generates the Borel σ-algebra of , the π-λ theorem concludes the proof. ∎
Theorem A.5.
Let be a random element of . Then there exists a unique satisfying
[TABLE]
for all . The probability measure satisfies
[TABLE]
for all continuous and bounded .
Proof.
As uniqueness is easy, we only need to show the existence. Define by . Since is surely nondecreasing, is nondecreasing. Since and as surely, and as by bounded convergence. If and , then surely by the right continuity of distribution functions, and so by bounded convergence. This shows that is right continuous, and so the proof that is a distribution function is finished.
Let denote the Borel probability measure on with distribution . For any , we have
[TABLE]
If are disjoint, then
[TABLE]
by monotone convergence. Since any open subset of is a countable union of disjoint open intervals, and any open interval is a countable union of disjoint bounded intervals of the form , we see that
[TABLE]
holds for any open .
Now we show (A.1). By linearity of integral and expectation, we may assume that is nonnegative. For each , let . We want to apply Tonelli’s theorem to the map given by , so we first show that this map is jointly measurable. For each , let by
[TABLE]
Since is measurable for any open , each is measurable. As increases to , we can conclude that is measurable. Now we can use Tonelli’s theorem to conclude that
[TABLE]
∎
Appendix B Spectra of Hermitian matrices
B.1 Basic facts
Recall the following version of the spectral theorem from linear algebra.
Theorem B.1 (Spectral Theorem).
Let be an -dimensional complex inner product space. For any self-adjoint linear operator , there exists an orthonormal basis of consisting of eigenvectors of .
Proof.
See [Tao12, Theorem 1.3.1] or [HK71, Theorem 9 in Section 9.5]. ∎
Also recall the following.
Proposition B.2.
Any eigenvalue of a self-adjoint linear operator on a complex inner product space is real.
Proof.
Let be an eigenvalue of a self-adjoint linear operator on a complex inner product space . If and , then
[TABLE]
and so . ∎
The following naturally follows from Theorem B.1 and Proposition B.2.
Corollary B.3.
For any Hermitian matrix , there exists an diagonal matrix with real entries and an unitary matrix such that .
Definition B.4 (Ordered eigenvalues).
If is an Hermitian matrix, then we denote the eigenvalues of counted with multiplicities as
[TABLE]
Definition B.5 (Spectral distributions).
If is an Hermitian matrix, then the spectral distribution of is the Borel probability measure on defined by
[TABLE]
We write as a shorthand for .
Theorem B.6 (Courant-Fischer minimax theorem).
Let be an Hermitian matrix. For each , we have
[TABLE]
and
[TABLE]
where ranges over the subspaces of .
Proof.
We only need to show the first equality, since the second follows by applying the first to . By the spectral theorem, we may assume that is diagonal with at the -entry. Note that if , then we have . If is the subspace spanned by , then . To show the other direction, let be any -dimensional subspace of . If is the subspace spanned by , then follows from
[TABLE]
If we choose any with , then
[TABLE]
∎
Theorem B.7 (Cauchy interlacing law).
If is an Hermitian matrix and is the top left minor of , then
[TABLE]
for any .
Proof.
For any , we have
[TABLE]
So, by Theorem B.6, we have
[TABLE]
By applying this result to , we have
[TABLE]
∎
B.2 Perturbations by small Frobenius norms
We will show that spectral distributions are stable under two types of perturbations. The first can be described using the following norm.
Definition B.8.
If is an complex matrix, then the Frobenius norm of is given by
[TABLE]
Note that the Frobenius norm is just the $\ell^2$-norm on . If is a Hermitian matrix, then . The following inequality tells us that the ordered tuple of eigenvalues is stable under perturbations with small Frobenius norms.
Theorem B.9 (Hoffman-Wielandt inequality).
If and are Hermitian matrices, then
[TABLE]
Proof.
Recall that eigenvalues and traces are similarity invariant. So, by the spectral theorem, we have
[TABLE]
Thus, it is enough to show
[TABLE]
(Recall that the right side of the desired inequality is equal to .) Again by the spectral theorem, we may assume that is diagonal with at its -th entry, and write for some unitary where is the diagonal matrix with as its -th entry. If denotes the -entry of , we have
[TABLE]
It is enough to show that if and , then the maximum of where and is obtained when ; that is, and whenever . Let where and is given. If , we have for some . Since , we have for some . Let , , , and for all other ’s. Then we have and . Also,
[TABLE]
Note that has more ’s on the diagonal than . If we repeat this procedure, we will arrive at , and this shows our claim. ∎
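As a small sanity check of the Hoffman-Wielandt inequality, one can test it on real symmetric $2 \times 2$ matrices, whose eigenvalues have a closed form. The sketch below assumes the usual form of the inequality, $\sum_i |\lambda_i(A) - \lambda_i(B)|^2 \le \lVert A - B \rVert_F^2$ with the eigenvalues of both matrices listed in the same order; the representation of a matrix $\begin{pmatrix} a & b \\ b & d \end{pmatrix}$ as a triple `(a, b, d)` and the function names are hypothetical.

```python
import math
import random


def eigenvalues_2x2(a, b, d):
    """Eigenvalues (largest first) of the real symmetric matrix [[a, b], [b, d]]."""
    mean = (a + d) / 2.0
    radius = math.hypot((a - d) / 2.0, b)
    return mean + radius, mean - radius


def check_hoffman_wielandt(m1, m2):
    """m1, m2 are triples (a, b, d); returns both sides (lhs, rhs) of the inequality."""
    lam, mu = eigenvalues_2x2(*m1), eigenvalues_2x2(*m2)
    lhs = sum((x - y) ** 2 for x, y in zip(lam, mu))
    da, db, dd = (p - q for p, q in zip(m1, m2))
    rhs = da * da + 2.0 * db * db + dd * dd  # squared Frobenius norm of the difference
    return lhs, rhs


rng = random.Random(0)
for _ in range(5):
    m1 = tuple(rng.uniform(-1, 1) for _ in range(3))
    m2 = tuple(rng.uniform(-1, 1) for _ in range(3))
    lhs, rhs = check_hoffman_wielandt(m1, m2)
    print(round(lhs, 4), "<=", round(rhs, 4))
```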
From the Hoffman-Wielandt inequality (Theorem B.9), it follows that the spectral distribution is also stable under perturbations of small Frobenius norms.
Corollary B.10.
If and are Hermitian matrices, then
[TABLE]
(For the definition of , see Definition B.8.)
Proof.
For any and , we will show
[TABLE]
and
[TABLE]
Let $i := \bigl|\{\,\ell \mid \lambda_{\ell}(A) > x\,\}\bigr|$ and $j := \bigl|\{\,\ell \mid \lambda_{\ell}(B) > x+\epsilon\,\}\bigr|$. Since and for each , we have
[TABLE]
As , (B.1) follows. Now let $i' := \bigl|\{\,\ell \mid \lambda_{\ell}(A) > x\,\}\bigr|$ and $j' := \bigl|\{\,\ell \mid \lambda_{\ell}(B) > x-\epsilon\,\}\bigr|$. Then,
[TABLE]
and so (B.2) follows.
Now let be such that . Then, . Since
[TABLE]
for all , we have , and thus the desired claim follows. ∎
B.3 Perturbations by small ranks
The second type of perturbation is the low-rank perturbation.
Theorem B.11 (Rank inequality).
If and are Hermitian matrices, then
[TABLE]
where .
Note that .
Proof.
Let . Note that replacing and with and for some unitary doesn’t change each side of the desired inequality. So, using Corollary B.3, we may assume that is diagonal. By swapping rows and columns, we can further assume that
[TABLE]
where is a matrix. If , then and by the Cauchy interlacing law (Theorem B.7), and so
[TABLE]
By the same reasoning, we also have
[TABLE]
and so
[TABLE]
Even if or , this inequality can be proved by a similar argument. Now the desired inequality follows since is arbitrary. ∎
The following is a generalization of Theorem B.11.
Corollary B.12.
If and are Hermitian matrices and satisfies , then
[TABLE]
Proof.
Let be such that
[TABLE]
Define by letting for each , extending linearly between and for each , and setting to constants on and . Note that exists as an integrable function, and we have
[TABLE]
for any . Also, as , we have
[TABLE]
Observe that
[TABLE]
Similarly we have
[TABLE]
and so
[TABLE]
by Theorem B.11. ∎
B.4 Random Hermitian matrices
Definition B.13 (Random Hermitian matrices).
Let denote the space of all Hermitian matrices. We equip with the standard Euclidean metric (and so the metric topology and the Borel σ-algebra) by identifying with , which is thought to represent the lower triangle of an Hermitian matrix. A random element of is called a random Hermitian matrix.
From Hoffman-Wielandt inequality (Theorem B.9), it follows that the map given by is continuous. This fact combined with the following lemma shows that the spectral distribution of a random Hermitian matrix is measurable.
Lemma B.14.
The map given by
[TABLE]
is continuous (and so is measurable).
Proof.
For any continuous bounded , the map is continuous. Thus the given map is continuous by the definition of the topology of weak convergence. ∎
The following is a “random version” of Corollary B.10. If is a random Hermitian matrix, we let be the distribution function of , i.e. .
Corollary B.15.
If and are random Hermitian matrices, then
[TABLE]
(For the definition of , see Definition B.8.)
Proof.
If $\operatorname{\mathbf{E}}\bigl[\lVert X-Y\rVert_{F}^{2}\bigr] = \infty$, there is nothing to prove; so we may assume $\operatorname{\mathbf{E}}\bigl[\lVert X-Y\rVert_{F}^{2}\bigr] < \infty$. By applying (B.1) and (B.2) to and , and taking the expectation, we have
[TABLE]
for all and . As in the proof of Corollary B.10, let be such that $\epsilon^{3} := \frac{1}{n}\operatorname{\mathbf{E}}\bigl[\lVert X-Y\rVert_{F}^{2}\bigr]$. Since
[TABLE]
for all , we have , and thus the desired inequality holds. ∎
Appendix C Concentration of measure
Lemma C.1 (Hoeffding’s lemma).
Let . If is a -valued random variable, then
[TABLE]
Proof.
We may assume . Note that . For any ,
[TABLE]
and so
[TABLE]
Since and , we have
[TABLE]
If , then
[TABLE]
If , then
[TABLE]
In any case, we have $\operatorname{\mathbf{E}}[e^{X}] \le \exp\bigl(2(b-a)^{2}\bigr)$. ∎
Theorem C.2 (McDiarmid’s inequality).
Let be measurable spaces, and be a bounded measurable function. Assume that
[TABLE]
for any , for each , and , where doesn’t depend on . If are independent random elements of , then
[TABLE]
for any where .
Proof.
Let us first show
[TABLE]
for any by induction on . If , in which case is a singleton, is essentially a constant, and , there is nothing to prove. We now proceed by induction on .
Note that we may assume that each is the projection . Let and be the distributions of and . Let be defined by
[TABLE]
For any , for each , and , we have
[TABLE]
Since
[TABLE]
the induction hypothesis implies
[TABLE]
Define by
[TABLE]
Whenever is fixed for each , we have
[TABLE]
by Hoeffding’s lemma (Lemma C.1). Thus, (C.3) and (C.2) yield
[TABLE]
finishing the proof of (C.1).
We can now finish the proof. Observe that
[TABLE]
By some calculus one can find that minimizes the right side, yielding
[TABLE]
Applying this result to , we obtain
[TABLE]
∎
The following inequality was found independently by Guntuboyina and Leeb [GL09], and Bordenave, Caputo, and Chafaï [BCC11].
Theorem C.3 (Concentration for spectral measures).
Let be a random Hermitian matrix whose rows of the lower triangle are jointly independent. If satisfies , and , then
[TABLE]
Proof.
Let and for each . Given , let be the Hermitian matrix whose th row of the lower triangle is for each . Let and . If we change a row or a column of a matrix, then the change in rank is at most . Since can be obtained from by changing a row and then changing a column, the rank of
[TABLE]
is at most . Thus, Corollary B.12 tells us that
[TABLE]
Let be the th row of the lower triangle of . Then, are independent, and . By applying Theorem C.2 to given by and , we obtain
[TABLE]
for any . Our desired result follows by letting . ∎
Appendix D Reduction to unit variance case
The Stieltjes transform method, which is the topic of the next section, is able to prove a semicircular law (Theorem E.1) which assumes that every entry of has variance exactly . However, it does not seem easy to reduce Theorem 1.2 itself to the case the Stieltjes transform can handle. This section provides an alternative semicircular law, somewhat weaker than Theorem 1.2, that can still be reduced to what the Stieltjes transform can handle. If you’re satisfied by the reduced version (Theorem E.1), feel free to skip to Section E. Otherwise, the following is the alternative semicircular law. It was pointed out in the Remark following Theorem 1.2 as a special case of Theorem 1.2.
Theorem D.1 (A semicircular law).
For each , let be a random Hermitian matrix whose upper triangular entries are jointly independent, have mean zero, and have finite variances. We assume that are defined on the same probability space. If
[TABLE]
and
[TABLE]
then as a.s.
D.1 Extension of the underlying probability space
Let be the probability space on which are defined. If is another probability space, and is a measurable map such that for all , then the random matrices satisfy all conditions of Theorem D.1. Assume that we proved -a.s. Since
[TABLE]
we will have -a.s. if we can show that
[TABLE]
For any with , let be defined as in Theorem A.3. Since is a real-valued random variable for any random Hermitian matrix , the event
[TABLE]
is measurable. So, (D.2) follows from Theorem A.3, and thus -a.s. follows. This shows that we may regard ’s and as the given random matrices and the underlying probability space. By considering , we may assume that we have i.i.d. random variables ’s, where and , independent of , which satisfy
[TABLE]
D.2 Repeating what we already know
The first three steps of Section 2 (that is, up to centralization) work in our case with a slight change. Applying those steps, we can now assume the following, and need to prove .
- (i) The upper triangular entries of are jointly independent and have mean zero.
- (ii) We have (D.1).
- (iii) There are such that and as .
D.3 Replacing and rescaling
Let
[TABLE]
and define by
[TABLE]
Note that
[TABLE]
as . Since for any , we also have
[TABLE]
Combining the previous two displays, we obtain
[TABLE]
and so it is enough to show by Corollary B.15 and Theorem A.3.
If , then $\operatorname{\mathbf{Var}}\bigl[v_{ij}^{(n)}\bigr] \geq \tfrac{1}{2n}$, and so
[TABLE]
As for any , by letting we have and . Since the upper triangular entries of are jointly independent random variables with mean zero and variance , we can now assume that the upper triangular entries of have variance , and there are such that and .
Appendix E The Stieltjes transform method
Theorem E.1 (A unit-variance semicircular law).
For each , let be a random Hermitian matrix whose upper triangular entries are jointly independent random variables with mean zero and variance . We assume that are defined on the same probability space. If there are with and as , then as a.s.
For the readers who skipped Section D: note that the first step in Section 2 lets us upgrade the conclusion of Theorem E.1 to as a.s.
E.1 Stieltjes transform
Let . Weak convergence of probability measures on can be coded in terms of Stieltjes transforms.
Definition E.2 (Stieltjes transform).
Let be a positive, finite Borel measure on . The Stieltjes transform of is given by
[TABLE]
Note that for all . So is a continuous and bounded function, and it also follows that .
Theorem E.3 (Stieltjes inversion formula).
If is a positive, finite Borel measure on , then the following hold:
- (i) for any , is a nonnegative function with ;
- (ii) as ;
- (iii) as .
Proof.
If , then there is nothing to prove. By renormalization, we may, and will, assume . Note that
[TABLE]
Let be a real-valued random variable with distribution , and be a standard Cauchy random variable (i.e. the law of has density ) independent of . Then as a.s., and thus in distribution. Since the right side of (E.1) is the density of the law of , both (i) and (ii) are proved. As
[TABLE]
by dominated convergence, (iii) is also proved. ∎
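As a numerical illustration of parts (i) and (ii), the following sketch evaluates $\frac{1}{\pi}\Im S_\mu(x+iy)$ for a hypothetical two-atom measure and compares it with the density of $X + yC$ computed directly from the Cauchy kernel. It assumes the sign convention $S_\mu(z) = \int (t-z)^{-1}\,d\mu(t)$, which is consistent with the nonnegativity in (i); the atoms, weights, and function names are illustrative choices and not part of the text.

```python
import math

# Hypothetical discrete measure: atoms t_j with weights w_j (total mass 1).
atoms = [-1.0, 0.5]
weights = [0.3, 0.7]

def stieltjes(z):
    """S_mu(z) = sum_j w_j / (t_j - z); the sign convention is an assumption."""
    return sum(w / (t - z) for t, w in zip(atoms, weights))

def smoothed_density(x, y):
    """(1/pi) * Im S_mu(x + iy)."""
    return stieltjes(complex(x, y)).imag / math.pi

def cauchy_mixture(x, y):
    """Density of X + yC, with X ~ mu and C standard Cauchy, written out directly."""
    return sum(w * y / (math.pi * ((x - t) ** 2 + y ** 2))
               for t, w in zip(atoms, weights))

def mass_in(a, b, y, steps=20_000):
    """Midpoint Riemann sum of the smoothed density over [a, b]."""
    h = (b - a) / steps
    return h * sum(smoothed_density(a + (k + 0.5) * h, y) for k in range(steps))

y = 0.01
# The two expressions for the density should agree up to rounding.
pointwise_gap = max(abs(smoothed_density(x, y) - cauchy_mixture(x, y))
                    for x in [-2.0, -1.0, 0.0, 0.5, 3.0])
# For small y the smoothed mass near each atom should be near that atom's weight.
mass_near_left_atom = mass_in(-1.5, -0.5, y)
mass_near_right_atom = mass_in(0.0, 1.0, y)
```

The smoothed mass near each atom falls slightly short of the atom's weight because the Cauchy kernel has heavy tails; the shortfall shrinks as `y` decreases.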
Theorem E.4 (Stieltjes continuity theorem).
If are Borel probability measures on , then if and only if for all .
Proof.
The “only if” direction follows immediately from the definition of weak convergence. To show the “if” direction, assume that for all . Whenever and vaguely for some finite , we have for all since vanishes at infinity. This implies for all , and thus by Theorem E.3. As any subsequence of has a vaguely convergent further subsequence, it follows that any subsequence of has a further subsequence converging vaguely to . This shows vaguely, and so . ∎
E.2 Predecessor comparison
In the remainder of this section, we will show that for all . To do so, we will first express in terms of the resolvent . By the spectral theorem, for any Hermitian matrix the matrix is invertible for any . Let for any Hermitian and . Using the spectral theorem, we can also see that
[TABLE]
Thus, we have
[TABLE]
and so it is enough to show that
[TABLE]
for all .
To understand the limiting behavior of , we relate it with the minors of using the Schur complement formula, which will be presented below. For each , let be the matrix obtained by removing the -th row and column from . Also, let denote the -th column of with removed. (So, is an -dimensional column vector.) Let us denote the -entry of a matrix by . Recall that if is an invertible matrix, then
[TABLE]
where is the -cofactor of . So we have
[TABLE]
where is the identity matrix.
Proposition E.5 (Schur complement formula).
Consider a matrix
[TABLE]
with complex entries, where and are square matrices and is invertible. Then we have
[TABLE]
Proof.
Note that
[TABLE]
Since
[TABLE]
and
[TABLE]
we have
[TABLE]
by the multiplicativity of determinant. ∎
By Proposition E.5, we have
[TABLE]
and so
[TABLE]
Summing over and taking the expectation, we obtain
[TABLE]
(The fact that the expectation on the right side is well-defined follows from Lemma E.6 below.) We will show that the right side of (E.2) gets close to
[TABLE]
as grows, and obtain a recursive relation involving the limit of .
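The relation between a diagonal resolvent entry and the corresponding minor can be checked by hand on a small example. The sketch below assumes the identity in question is the standard one, $\bigl((H-zI)^{-1}\bigr)_{11} = 1/\bigl(h_{11} - z - a^*(H_1 - zI)^{-1}a\bigr)$, where $H_1$ is the minor and $a$ is the first column of $H$ with $h_{11}$ removed. The matrix $H$ and the point $z$ are hypothetical, and any normalization used in the text is not reproduced.

```python
# A small Hermitian matrix H (hypothetical values) and a point z with Im z > 0.
H = [[1.0, 2 - 1j, 0.5j],
     [2 + 1j, -0.5, 1.0],
     [-0.5j, 1.0, 2.0]]
z = 0.3 + 0.7j

def det2(m):
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]

def det3(m):
    """Cofactor expansion along the first row."""
    return (m[0][0] * det2([[m[1][1], m[1][2]], [m[2][1], m[2][2]]])
            - m[0][1] * det2([[m[1][0], m[1][2]], [m[2][0], m[2][2]]])
            + m[0][2] * det2([[m[1][0], m[1][1]], [m[2][0], m[2][1]]]))

def shifted(m, z):
    """Return m - z*I."""
    size = len(m)
    return [[m[i][j] - (z if i == j else 0) for j in range(size)]
            for i in range(size)]

# Left side: (1,1)-entry of (H - zI)^{-1} via the cofactor formula.
M = shifted(H, z)
minor = [[M[1][1], M[1][2]], [M[2][1], M[2][2]]]   # H_1 - zI
lhs = det2(minor) / det3(M)

# Right side: 1 / (h_11 - z - a^* (H_1 - zI)^{-1} a).
a = [H[1][0], H[2][0]]
d = det2(minor)
inv_minor = [[minor[1][1] / d, -minor[0][1] / d],
             [-minor[1][0] / d, minor[0][0] / d]]
quad = sum(a[i].conjugate() * inv_minor[i][j] * a[j]
           for i in range(2) for j in range(2))
rhs = 1 / (H[0][0] - z - quad)
```

Both sides agree up to rounding, and the common value has positive imaginary part, as expected for a diagonal resolvent entry at a point in the upper half-plane.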
E.3 Derivation of a recurrence relation
The following fact will be used repeatedly. In particular, it will guarantee that many denominators we face in the computation below are nonzero.
Lemma E.6.
If is an Hermitian matrix and , then the following hold:
- (i) ;
- (ii) $\operatorname{tr}\bigl((A-zI)(A-\bar{z}I)\bigr)^{-1} \leq n/(\Im z)^{2}$;
- (iii) for any .
Proof.
Let where is unitary and is real diagonal with diagonal entries . (i) Since trace is similarity invariant, we have
[TABLE]
(ii) Also,
[TABLE]
(iii) Finally, if we let , then
[TABLE]
∎
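Part (ii) can be sanity-checked on a $2\times 2$ example, where everything is available in closed form. The sketch below uses a hypothetical Hermitian matrix $A$ and point $z$. It computes $\operatorname{tr}\bigl((A-zI)(A-\bar{z}I)\bigr)^{-1}$ directly and compares it with the sum of $1/|\lambda_i - z|^2$ over the eigenvalues, which is what diagonalizing $A$ as in the proof yields.

```python
import math

# Hypothetical 2x2 Hermitian matrix A = [[p, q], [conj(q), r]] and z with Im z > 0.
p, r = 1.5, -0.5
q = 0.8 - 0.6j
z = 0.2 + 0.25j
n = 2

# Eigenvalues of a 2x2 Hermitian matrix in closed form.
mean = (p + r) / 2
radius = math.sqrt(((p - r) / 2) ** 2 + abs(q) ** 2)
eigenvalues = [mean - radius, mean + radius]

# B = (A - zI)(A - conj(z) I), multiplied out entry by entry.
A1 = [[p - z, q], [q.conjugate(), r - z]]
A2 = [[p - z.conjugate(), q], [q.conjugate(), r - z.conjugate()]]
B = [[sum(A1[i][k] * A2[k][j] for k in range(2)) for j in range(2)]
     for i in range(2)]

# For a 2x2 matrix, tr(B^{-1}) = tr(B) / det(B).
det_B = B[0][0] * B[1][1] - B[0][1] * B[1][0]
trace_inverse = (B[0][0] + B[1][1]) / det_B

# The same quantity after diagonalization: sum of 1 / |lambda_i - z|^2.
spectral_sum = sum(1 / abs(lam - z) ** 2 for lam in eigenvalues)

# Each term is at most 1 / (Im z)^2, which gives the bound in (ii).
bound = n / z.imag ** 2
```

Each eigenvalue is real, so $|\lambda_i - z| \geq \Im z$, and the bound follows term by term.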
In the rest of this subsection, we fix , and transform
[TABLE]
step-by-step to obtain
[TABLE]
(asymptotically) in the end.
E.3.1 From to
Instead of numbering the rows and columns of using , let us use as if still lies in . For , let denote the -entry of . Since
[TABLE]
we have
[TABLE]
by Lemma E.6 (ii). The fact that ’s are in will guarantee that all terms in the computation below are well-defined and finite. Using the fact that and are independent, and each entry of is of mean zero and variance , we have
[TABLE]
Note that the last term in the last line must be real. Since
[TABLE]
by the Cauchy-Schwarz inequality, we have
[TABLE]
by (E.3). It follows that
[TABLE]
(We are fixing .) Therefore, by Lemma E.6 (i) and (iii), we have
[TABLE]
as .
E.3.2 From to
Since the maps and have bounded variations, Theorem C.3 implies that there are (depending on , but we are fixing ) such that
[TABLE]
for any , , and a random Hermitian matrix . So, we have
[TABLE]
for any and . It follows that
[TABLE]
by dominated convergence. Therefore,
[TABLE]
as .
E.3.3 From to
Let be the matrix obtained by replacing all the entries in the -th row and column of by $0$. Since has the same (multi)set of eigenvalues as except that it has one more zero eigenvalue, we have
[TABLE]
By the Hoffman-Wielandt inequality (Theorem B.9), we have
[TABLE]
Combining the results of the previous two displays, we obtain
[TABLE]
for all and , and so
[TABLE]
as . Therefore,
[TABLE]
as .
E.3.4 The result
Combining (E.2) and the final results of the previous three subsubsections, we obtain
[TABLE]
for any .
E.4 Convergence of the Stieltjes transform
Let . Fix for now, and let us write . If there are such that as , then we would have
[TABLE]
which contradicts (E.4). Thus is bounded, and therefore any subsequence of has a convergent subsequence. If we show that any convergent subsequence of should converge to a number independent of the subsequence we choose, then we will have the convergence of to that number.
Assume as . Since the left side of (E.4) converges to $\bigl|s+1/(z+s)\bigr|$ along , we have
[TABLE]
Solving the quadratic equation, we obtain
[TABLE]
We need to decide which branch of we use. For simplicity, we will define only for , and it will suffice. Choose the branch of and defined on which are continuous and have nonnegative imaginary part for all . Then let . This will make continuous and have nonnegative imaginary part on .
Since for all , we have . On the other hand, has a negative imaginary part. Thus we have
[TABLE]
Since this is true for any subsequence of converging to , we have
[TABLE]
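The root selection can be illustrated numerically. The sketch below assumes that the limiting equation is $s + 1/(z+s) = 0$, that is, $s^2 + zs + 1 = 0$, and that $\sqrt{z^2-4}$ is realized as the product of the principal square roots of $z-2$ and $z+2$, which have positive imaginary part on the upper half-plane. The test points and function names are illustrative.

```python
import cmath

def sqrt_branch(z):
    """sqrt(z^2 - 4) as sqrt(z - 2) * sqrt(z + 2), for Im z > 0.

    With the principal square root, each factor has positive real and
    imaginary part there, so the product has positive imaginary part.
    """
    return cmath.sqrt(z - 2) * cmath.sqrt(z + 2)

def roots(z):
    """Both solutions of s^2 + z*s + 1 = 0."""
    w = sqrt_branch(z)
    return (-z + w) / 2, (-z - w) / 2

test_points = [0.1j, 1 + 0.5j, -3 + 0.01j, 2 + 2j, 50j]
residuals = []
upper_root_ok = []
lower_root_negative = []
for z in test_points:
    s_plus, s_minus = roots(z)
    # Both roots satisfy s + 1/(z + s) = 0 up to rounding.
    residuals.append(max(abs(s + 1 / (z + s)) for s in (s_plus, s_minus)))
    # Only the "+" root has nonnegative imaginary part.
    upper_root_ok.append(s_plus.imag >= 0)
    lower_root_negative.append(s_minus.imag < 0)

# Far from the real axis the admissible root behaves like -1/z,
# as the Stieltjes transform of a probability measure should.
z_far = 50j
far_gap = abs(roots(z_far)[0] + 1 / z_far)
```

At each test point both roots solve the equation, but only one is compatible with a nonnegative imaginary part, which is the constraint used in the argument above.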
E.5 Computation of the limiting distribution
Lemma E.7.
If is a positive, finite Borel measure on satisfying for all , then .
Proof.
As
[TABLE]
we have by Theorem E.3 (iii). Since is continuous on , we have
[TABLE]
for each fixed . Since is a probability density (by Theorem E.3 (i)) converging pointwise to the probability density as , we have as by Scheffé’s theorem ([Bil12, Theorem 16.12]). Now follows from Theorem E.3 (ii). ∎
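The pointwise convergence used in this proof can be observed numerically. The sketch below assumes that the limit is $s(z) = \bigl(-z + \sqrt{z^2-4}\bigr)/2$ with the branch described in Section E.4, and that the target density is $\frac{1}{2\pi}\sqrt{4-x^2}$ on $[-2,2]$. It also checks the standard fact that $y\,\Im s(iy)$ approaches the total mass as $y \to \infty$, which is presumably the content of Theorem E.3 (iii). The sample points are arbitrary.

```python
import cmath
import math

def s(z):
    """Candidate limit (-z + sqrt(z^2 - 4)) / 2 on Im z > 0, with the square
    root taken as sqrt(z - 2) * sqrt(z + 2) using principal branches."""
    return (-z + cmath.sqrt(z - 2) * cmath.sqrt(z + 2)) / 2

def semicircle_density(x):
    """Density of the semicircle law on [-2, 2]."""
    return math.sqrt(4 - x * x) / (2 * math.pi) if abs(x) <= 2 else 0.0

def smoothed(x, y):
    """(1/pi) * Im s(x + iy)."""
    return s(complex(x, y)).imag / math.pi

# Largest deviation from the semicircle density over a few sample points,
# for decreasing values of y.
xs = [-3.0, -1.5, 0.0, 1.0, 1.9, 2.5]
errors = {y: max(abs(smoothed(x, y) - semicircle_density(x)) for x in xs)
          for y in (1e-1, 1e-3, 1e-5)}

# y * Im s(iy) for large y, an estimate of the total mass.
mass_estimate = 1e4 * s(1e4j).imag
```

The deviation shrinks as `y` decreases, both inside and outside the interval $[-2,2]$.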
The proof of Lemma E.7 shows how one can figure out what the limiting spectral distribution should be in the first place. Now we finish our proof of the semicircular law. Since are probability measures, there are integers such that vaguely as for some positive, finite measure . Since as for all , (E.5) implies for all . So, we have by Lemma E.7. Now follows from (E.5) and Theorem E.4. Interestingly, we were able to avoid an actual computation of , in which we might have used something like the residue theorem or the Cauchy integral formula.
References
- [AGZ10] Greg W. Anderson, Alice Guionnet, and Ofer Zeitouni. An introduction to random matrices, volume 118 of Cambridge Studies in Advanced Mathematics. Cambridge University Press, Cambridge, 2010.
- [BCC11] Charles Bordenave, Pietro Caputo, and Djalil Chafaï. Spectrum of non-Hermitian heavy tailed random matrices. Comm. Math. Phys., 307(2):513–560, 2011.
- [Bil99] Patrick Billingsley. Convergence of probability measures. Wiley Series in Probability and Statistics: Probability and Statistics. John Wiley & Sons, Inc., New York, second edition, 1999. A Wiley-Interscience Publication.
- [Bil12] Patrick Billingsley. Probability and measure. Wiley Series in Probability and Statistics. John Wiley & Sons, Inc., Hoboken, NJ, 2012. Anniversary edition [of MR 1324786], with a foreword by Steve Lalley and a brief biography of Billingsley by Steve Koppes.
- [BS10] Zhidong Bai and Jack W. Silverstein. Spectral analysis of large dimensional random matrices. Springer Series in Statistics. Springer, New York, second edition, 2010.
- [GL09] Adityanand Guntuboyina and Hannes Leeb. Concentration of the spectral measure of large Wishart matrices with dependent entries. Electron. Commun. Probab., 14:334–342, 2009.
- [GNT15] F. Götze, A. A. Naumov, and A. N. Tikhomirov. Limit theorems for two classes of random matrices with dependent entries. Theory Probab. Appl., 59(1):23–39, 2015.
- [HK71] Kenneth Hoffman and Ray Kunze. Linear algebra. Second edition. Prentice-Hall, Inc., Englewood Cliffs, N.J., 1971.
