Asymptotic Analysis for Extreme Eigenvalues of Principal Minors of Random Matrices
T. Tony Cai, Tiefeng Jiang, Xiaoou Li

T. Tony Cai
Department of Statistics, The Wharton School, University of Pennsylvania
Tiefeng Jiang
School of Statistics, University of Minnesota
Xiaoou Li
School of Statistics, University of Minnesota
Abstract
Consider a standard white Wishart matrix with parameters $n$ and $p$. Motivated by applications in high-dimensional statistics and signal processing, we perform asymptotic analysis on the maxima and minima of the eigenvalues of all the $m \times m$ principal minors, under the asymptotic regime that $n, p, m$ go to infinity. Asymptotic results concerning extreme eigenvalues of principal minors of real Wigner matrices are also obtained. In addition, we discuss an application of the theoretical results to the construction of compressed sensing matrices, which provides insights into compressed sensing in signal processing and high-dimensional linear regression in statistics.
Keywords: random matrix, extremal eigenvalue, maximum of random variables, minimum of random variables.
1 Introduction
Random matrix theory is traditionally focused on the spectral analysis of eigenvalues and eigenvectors of a single random matrix. See, for example, Wigner (1955, 1958); Dyson (1962a, 1962b, 1962c); Mehta (2004); Tracy and Widom (1994, 1996, 2000); Diaconis and Evans (2001); Johnstone (2001, 2008); Jiang (2004a, 2004b); Bryc et al. (2006); Bai and Silverstein (2010). It is important in its own right and has proved to be a powerful tool in a wide range of fields including high-dimensional statistics, quantum physics, electrical engineering, and number theory.
The laws of large numbers and the limiting distributions for the extreme eigenvalues of the Wishart matrices are now well known; see, e.g., Bai (1999) and Johnstone (2001, 2008). Let $X$ be an $n \times p$ random matrix with i.i.d. $N(0,1)$ entries and let $W = X^{\top}X$. Let $\lambda_1 \geq \cdots \geq \lambda_p$ be the eigenvalues of $W$. The limiting distribution of the largest eigenvalue $\lambda_1$ satisfies, for $n, p \to \infty$ with $n/p \to \gamma$,
$$\mathbb{P}\Big(\frac{\lambda_1 - \mu_n}{\sigma_n} \leq x\Big) \to F_1(x), \tag{1.1}$$
where $\mu_n = (\sqrt{n-1} + \sqrt{p})^2$ and $\sigma_n = (\sqrt{n-1} + \sqrt{p})\big(\frac{1}{\sqrt{n-1}} + \frac{1}{\sqrt{p}}\big)^{1/3}$ and $F_1$ is the distribution function of the Tracy-Widom law of type I. The results for the smallest eigenvalue can be found in, e.g., Edelman (1988) and Bai and Yin (1993). These results have also been extended to generalized Wishart matrices, i.e., the entries of $X$ are i.i.d. but not necessarily normally distributed, in, e.g., Bai and Silverstein (2010); Péché (2009); Tao and Vu (2010).
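Motivated by applications in high-dimensional statistics and signal processing, we study in this paper the extreme eigenvalues of the principal minors of a Wishart matrix $W$. Write $W = (w_{ij})_{p \times p}$. Let $S \subset \{1, \dots, p\}$ with the size of $S$ being $m$ and $W_S = (w_{ij})_{i,j \in S}$. Then $W_S$ is an $m \times m$ principal minor of $W$. Denote by $\lambda_1(W_S) \geq \cdots \geq \lambda_m(W_S)$ the eigenvalues of $W_S$ in descending order. We are interested in the largest and the smallest eigenvalues of all the $m \times m$ principal minors of $W$ in the setting that $n$, $p$, and $m$ are large but $m$ is relatively smaller than $\min\{n, p\}$. More specifically, we are interested in the properties of the maximum of the eigenvalues of all $m \times m$ minors:
$$\lambda_{\max}(m) = \max_{S \subset \{1,\dots,p\},\, |S| = m} \lambda_1(W_S), \tag{1.2}$$
and the minimum of the eigenvalues of all $m \times m$ minors:
$$\lambda_{\min}(m) = \min_{S \subset \{1,\dots,p\},\, |S| = m} \lambda_m(W_S), \tag{1.3}$$
where $|S|$ denotes the cardinality of the set $S$.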
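As a concrete illustration of the two extreme-eigenvalue quantities just defined, the following Python sketch enumerates every $m \times m$ principal minor of a small simulated Wishart matrix and records the largest and the smallest eigenvalue found. It is a brute-force check that is feasible only for small $p$; the function name `extreme_minor_eigenvalues`, the seed, and the sizes are illustrative assumptions and are not taken from the paper.

```python
from itertools import combinations

import numpy as np


def extreme_minor_eigenvalues(W, m):
    """Return (max, min) eigenvalue over all m x m principal minors of W.

    Brute force over all index sets S with |S| = m (hypothetical helper,
    only practical for small matrices).
    """
    p = W.shape[0]
    lam_max, lam_min = -np.inf, np.inf
    for S in combinations(range(p), m):
        idx = list(S)
        # eigvalsh returns the eigenvalues of a symmetric matrix in ascending order
        eig = np.linalg.eigvalsh(W[np.ix_(idx, idx)])
        lam_max = max(lam_max, eig[-1])
        lam_min = min(lam_min, eig[0])
    return float(lam_max), float(lam_min)


# Small white Wishart example: W = X^T X with i.i.d. N(0, 1) entries in X.
rng = np.random.default_rng(0)
n, p, m = 200, 8, 3
X = rng.standard_normal((n, p))
W = X.T @ X
lam_max, lam_min = extreme_minor_eigenvalues(W, m)
print(lam_max, lam_min)
```

The number of index sets grows like $\binom{p}{m}$, so this enumeration is meant only to make the definitions tangible, not to be used at the scales studied in the paper.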
This is a problem of significant interest in its own right, and it has important applications in statistics and engineering. Before we establish the properties of the extreme eigenvalues $\lambda_{\max}(m)$ and $\lambda_{\min}(m)$ of the principal minors of a Wishart matrix $W$, we first discuss an application in signal processing and statistics, namely the construction of the compressed sensing matrix, as the motivation for our study. The properties of the extreme eigenvalues $\lambda_{\max}(m)$ and $\lambda_{\min}(m)$ can also be used in other applications, including testing for the covariance structure of a high-dimensional Gaussian distribution, which is an important problem in statistics.
1.1 Construction of Compressed Sensing Matrices
Compressed sensing, which aims to develop efficient data acquisition techniques that allow accurate reconstruction of highly undersampled sparse signals, has received much attention recently in several fields, including signal processing, applied mathematics and statistics. The development of the compressed sensing theory also provides crucial insights into inference for high-dimensional linear regression in statistics. It is now well understood that the constrained $\ell_1$ minimization method provides an effective way for recovering sparse signals. See, e.g., Candes and Tao (2005, 2007), Donoho (2006), and Donoho et al. (2006). More specifically, in compressed sensing, one observes $(X, y)$ with
$$y = X\beta + z, \tag{1.4}$$
where $y \in \mathbb{R}^n$, $X \in \mathbb{R}^{n \times p}$ with $n$ being much smaller than $p$, $\beta \in \mathbb{R}^p$ is a sparse signal of interest, and $z \in \mathbb{R}^n$ is a vector of measurement errors. One wishes to recover the unknown sparse signal $\beta$ based on $(X, y)$ using an efficient algorithm.
Since the number of measurements $n$ is much smaller than the dimension $p$, without structural assumptions, the signal $\beta$ is under-determined, even in the noiseless case. A usual assumption in compressed sensing is that $\beta$ is sparse, and one of the most commonly used frameworks for sparse signal recovery is the Restricted Isometry Property (RIP); see Candes and Tao (2005). A vector $\beta \in \mathbb{R}^p$ is said to be $k$-sparse if $|\mathrm{supp}(\beta)| \leq k$, where $\mathrm{supp}(\beta) = \{j : \beta_j \neq 0\}$ is the support of $\beta$. In compressed sensing, the RIP requires subsets of certain cardinality of the columns of $X$ to be close to an orthonormal system. For an integer $1 \leq k \leq p$, define the restricted isometry constant $\delta_k$ to be the smallest non-negative number such that for all $k$-sparse vectors $\beta$,
$$(1 - \delta_k)\|\beta\|_2^2 \leq \|X\beta\|_2^2 \leq (1 + \delta_k)\|\beta\|_2^2. \tag{1.5}$$
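There are a variety of sufficient conditions on the RIP for the exact/stable recovery of $k$-sparse signals. A sharp condition was established in Cai and Zhang (2014) and a conjecture was proved in Zhang and Li (2018). Let
$$t^* = \begin{cases} \dfrac{t}{4 - t}, & 0 < t < 4/3, \\[2mm] \sqrt{\dfrac{t - 1}{t}}, & t \geq 4/3. \end{cases} \tag{1.6}$$
For any given $t > 0$, the condition $\delta_{tk} < t^*$ guarantees the exact recovery of all $k$-sparse signals in the noiseless case through the constrained $\ell_1$ minimization
$$\hat{\beta} = \arg\min\{\|\gamma\|_1 : y = X\gamma,\ \gamma \in \mathbb{R}^p\}.$$
Moreover, for any $\varepsilon > 0$, $\delta_{tk} < t^* + \varepsilon$ is not sufficient to guarantee the exact recovery of all $k$-sparse signals for large $k$. In addition, the condition $\delta_{tk} < t^*$ is also shown to be sufficient for stable recovery of approximately sparse signals in the noisy case.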
One of the major goals of compressed sensing is the construction of the measurement matrix , with the number of measurements as small as possible relative to , such that all -sparse signals can be accurately recovered. Deterministic construction of large measurement matrices that satisfy the RIP is known to be difficult. Instead, random matrices are commonly used. Certain random matrices have been shown to satisfy the RIP conditions with high probability. See, e.g., Baraniuk et al., (2008). When the measurement matrix is a Gaussian matrix with i.i.d. entries, for any given , the condition is equivalent to that the extreme eigenvalues, and , of the principal minors of the Wishart matrix satisfy
[TABLE]
Hence the condition (1.5) can be viewed as a condition on $\lambda_{\max}(m)$ and $\lambda_{\min}(m)$ as defined in (1.2) and (1.3), respectively.
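The link between the restricted isometry constant and the eigenvalues of principal minors can be made concrete with a short Python sketch. It computes the constant by brute force from the $k \times k$ principal minors of the Gram matrix of a measurement matrix. The helper name `restricted_isometry_constant` is hypothetical, and the column scaling (entries divided by $\sqrt{n}$ so that the Gram matrix has expectation equal to the identity) is an assumption made for this illustration.

```python
from itertools import combinations

import numpy as np


def restricted_isometry_constant(A, k):
    """Brute-force restricted isometry constant of order k for the matrix A.

    For every support S with |S| = k, the squared norm |A b|^2 of a vector b
    supported on S lies between the smallest and largest eigenvalue of the
    principal minor G_S of G = A^T A, times |b|^2. Supports smaller than k
    are covered as well, by eigenvalue interlacing.
    """
    G = A.T @ A
    p = G.shape[0]
    delta = 0.0
    for S in combinations(range(p), k):
        idx = list(S)
        eig = np.linalg.eigvalsh(G[np.ix_(idx, idx)])  # ascending order
        delta = max(delta, eig[-1] - 1.0, 1.0 - eig[0])
    return float(delta)


rng = np.random.default_rng(0)
n, p, k = 400, 10, 2
# Assumed scaling for this sketch: i.i.d. N(0, 1/n) entries.
A = rng.standard_normal((n, p)) / np.sqrt(n)
delta_k = restricted_isometry_constant(A, k)
print(delta_k)
```

With this scaling the Gram matrix equals the Wishart matrix divided by $n$, so the constant is small exactly when the extreme eigenvalues of the $k \times k$ principal minors of the Wishart matrix stay close to $n$.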
1.2 Main results and organization of the paper
In this paper, we investigate the asymptotic behavior of the extreme eigenvalues $\lambda_{\max}(m)$ and $\lambda_{\min}(m)$ defined in (1.2) and (1.3). We also consider the extreme eigenvalues of a related Wigner matrix. We then discuss the application of the results in the construction of compressed sensing matrices.
The rest of the paper is organized as follows. Section 2 describes the precise setting of the problem. The main results are stated in Section 3. The proofs of the main theorems are given in Section 4, with the proof strategy outlined in Section 4.1. The proofs of all the supporting lemmas are given in the Appendix.
2 Problem settings
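In this paper, we consider a white Wishart matrix $W = (w_{ij})_{p \times p} = X^{\top}X$, where $X = (x_{ij})_{n \times p}$ and $x_{ij}$ are independent $N(0,1)$-distributed random variables. For $S \subset \{1, \dots, p\}$, set the principal minor $W_S = (w_{ij})_{i,j \in S}$. For an $m \times m$ symmetric matrix $A$, let $\lambda_1(A)$ and $\lambda_m(A)$ denote the largest and the smallest eigenvalues of $A$, respectively. Let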
[TABLE]
and denotes the cardinality of the set . We also define
[TABLE]
Of interest is the asymptotic behavior of and when both and grow large.
Notice is the sum of independent and identically distributed (i.i.d.) random variables. By the standard central limit theorem, for given and , we have
[TABLE]
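as $n \to \infty$, where we use "$\overset{d}{\to}$" to indicate convergence in distribution. Motivated by this limiting distribution, we also consider the Wigner matrix $\tilde{W} = (\tilde{w}_{ij})_{p \times p}$, which is a symmetric matrix whose upper triangular entries are independent Gaussian variables with the following distribution
$$\tilde{w}_{ij} \sim \begin{cases} N(0, 2), & \text{if } i = j; \\ N(0, 1), & \text{if } i < j. \end{cases} \tag{2.4}$$
For $S \subset \{1, \dots, p\}$, set $\tilde{W}_S = (\tilde{w}_{ij})_{i,j \in S}$. We will work on the corresponding statistics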
[TABLE]
and
[TABLE]
In this paper, we study asymptotic results regarding the four statistics , , and .
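The central limit heuristic that links the Wishart and Wigner settings can be checked numerically. The sketch below centers an $m \times m$ Wishart minor at $n$ times the identity, scales it by $\sqrt{n}$, and estimates the entry variances, which should be close to 2 on the diagonal and 1 off the diagonal. The function names, the seed, and the simulation sizes are assumptions chosen for illustration only.

```python
import numpy as np


def normalized_wishart_minor(n, m, rng):
    """Return (X^T X - n I) / sqrt(n) for an n x m matrix X of i.i.d. N(0, 1)."""
    X = rng.standard_normal((n, m))
    return (X.T @ X - n * np.eye(m)) / np.sqrt(n)


def sample_wigner(m, rng, diag_var=2.0):
    """Symmetric Gaussian matrix: N(0, 1) off the diagonal, N(0, diag_var) on it."""
    G = rng.standard_normal((m, m))
    W = np.triu(G, k=1)          # strictly upper triangular part
    W = W + W.T                  # symmetrize, diagonal is still zero
    W[np.diag_indices(m)] = np.sqrt(diag_var) * rng.standard_normal(m)
    return W


rng = np.random.default_rng(1)
reps, n, m = 2000, 400, 3
Z = np.stack([normalized_wishart_minor(n, m, rng) for _ in range(reps)])
diag_var_hat = Z[:, 0, 0].var()      # expected to be near 2
offdiag_var_hat = Z[:, 0, 1].var()   # expected to be near 1
print(diag_var_hat, offdiag_var_hat)
```

The variance check reflects that a diagonal entry of the Wishart matrix is a chi-square variable with $n$ degrees of freedom (variance $2n$), while an off-diagonal entry is a sum of $n$ products of independent standard normals (variance $n$).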
3 Main results
Throughout the paper, we will let and let with a speed depending on . The following technical assumptions will be used in our main results.
Assumption 1. The integer is fixed and ; or with
[TABLE]
Notice the second part of Assumption 1 implies that . It says the population dimension can be very large and it can be as large as . This assumption is used in the analysis of and . The requirement is used in the last step in (4.90). The second part of the condition is needed in a few places including (4.86). The key scales and in condition (3.1) are tight, the terms of lower order and can be improved to be relatively smaller.
The next assumption is needed for studying the properties of and .
Assumption 2. The integer satisfies that
[TABLE]
This condition is the same as the first part of (3.1). We start with asymptotic results for in (2.1) and in (2.2).
Theorem 1.
Suppose Assumption 1 in (3.1) holds. Recall defined as in (2.1). Then,
[TABLE]
in probability as Furthermore,
[TABLE]
for all and .
Remark 1.
Suppose Assumption 1 in (3.1) holds. Recall defined as in (2.2). Similar to the proof of Theorem 1 it can be shown that
[TABLE]
in probability as , and furthermore,
[TABLE]
for all and . For reasons of space, we omit the details here.
We now turn to the asymptotic analysis for and .
Theorem 2.
Suppose Assumption 2 in (3.2) is satisfied. Recall defined as in (2.5). Then,
[TABLE]
in probability as Furthermore,
[TABLE]
for all and .
Remark 2.
Suppose Assumption 2 in (3.2) is satisfied. Recalling above (2.4), we know and have the same distribution. Let be defined as in (2.6). It follows that and have the same distribution. Then, by Theorem 2,
[TABLE]
in probability as Furthermore,
[TABLE]
for all and .
To better explain the convergence results in (3.4)-(3.10), we give the following comments.
Remark 3.
Equation (3.4) has the following implications, whose rigorous justification is given in Section 4.
1. for all ;
2. for all ;
3. .
We now elaborate on the above results. First, the moment generating function of exists and is close to when is large. As a result, has a sub-exponential tail probability for large . Second, converges to [math] in for all . Third, the variance of vanishes for large , indicating that as . Overall, we can see (3.4) is stronger than the typical convergence in probability. This provides information on the behavior of the tail probability. Similar interpretations can also be made for (3.6), (3.8) and (3.10), respectively.
3.1 Extensions
In this section, we discuss extensions of Theorems 1 and 2. Similar extensions can also be made to Remarks 1 and 2. They are omitted for the clarity of presentation.
First, we point out that, by the eigenvalue interlacing theorem [see, e.g., Horn and Johnson (2012)], Theorems 1 and 2 still hold if the size-$m$ principal minors in the definitions of the statistics in (2.1) and (2.5) are replaced by the principal minors of size no larger than $m$. We then have the following corollary.
Corollary 1.
Define and . Then, Theorems 1 and 2 still hold if “” and “” are replaced by “” and “”, respectively.
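The interlacing argument behind this extension can be verified on a small example. The following sketch, in which the helper name `extreme_eigs_over_sizes` and the test matrix are assumptions, compares the extreme eigenvalues over principal minors of size exactly $m$ with those over all principal minors of size at most $m$. By Cauchy interlacing, a smaller principal minor cannot have a larger top eigenvalue or a smaller bottom eigenvalue than a minor containing it, so the two pairs should coincide up to rounding.

```python
from itertools import combinations

import numpy as np


def extreme_eigs_over_sizes(W, sizes):
    """Max of the largest and min of the smallest eigenvalue over all
    principal minors of W whose size belongs to `sizes`."""
    p = W.shape[0]
    top, bottom = -np.inf, np.inf
    for k in sizes:
        for S in combinations(range(p), k):
            idx = list(S)
            eig = np.linalg.eigvalsh(W[np.ix_(idx, idx)])  # ascending order
            top = max(top, eig[-1])
            bottom = min(bottom, eig[0])
    return float(top), float(bottom)


rng = np.random.default_rng(3)
G = rng.standard_normal((6, 6))
W = (G + G.T) / 2.0   # arbitrary symmetric test matrix
m = 3
exact_top, exact_bottom = extreme_eigs_over_sizes(W, [m])
upto_top, upto_bottom = extreme_eigs_over_sizes(W, range(1, m + 1))
print(exact_top, upto_top, exact_bottom, upto_bottom)
```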
Next, we extend Theorem 2 to allow other values of variance for the Wigner matrix. Here, we assume that the matrix to have the following distribution, instead of that in (2.4). For some ,
[TABLE]
In addition, assume that is symmetric and are independent for . Note that if , then the above distribution is the same as that defined in (2.4). For defined in (3.11), we consider the statistic . The following law of large numbers is obtained.
Theorem 3.
Suppose and that Assumption 2 in (3.2) is satisfied. In addition, assume has the distribution as in (3.11) with . Then,
[TABLE]
in probability as .
Remark 4.
A related open question is whether Theorem 1 can be extended to other distribution of for the Wishart distribution. We conjecture that with certain assumptions on the moments of and under the asymptotic regime that is sufficiently large compared to and , and , the asymptotic behavior of will be similar to that of as is discussed in Theorem 3. We leave this question for future research, because it requires development of some technical tools that are beyond the scope of the current paper.
Some special cases for this question have been answered in the literature for Wishart matrices with non-Gaussian entries. For example, if , and follows an asymmetric Rademacher distribution and , then it is easy to check
[TABLE]
and . As a result, . Similar quantities have been studied extensively in the literature, including Jiang (2004a); Cai and Jiang (2012); Zhou (2007); Shao and Zhou (2014); Li et al. (2012); Li and Rosalsky (2006); Li et al. (2010); Fan et al. (2018); Cai et al. (2013). The limiting distribution of is the Gumbel distribution.
3.2 Application to Construction of Compressed Sensing Matrices
The main results given above have direct implications for the construction of a compressed sensing matrix whose entries are i.i.d. . As discussed in the introduction, the goal is to construct the measurement matrix with the number of measurements as small as possible relative to , such that -sparse signals can be accurately recovered. For any given , the RIP framework guarantees accurate recovery of all -sparse signals if the extreme eigenvalues, and , of the principal minors of the Wishart matrix satisfy
[TABLE]
where is given in (1.6).
By setting , , and , it follows from Theorem 1 and Remark 1 that, under Assumption 1 in (3.1),
[TABLE]
and
[TABLE]
On the other hand, Assumption 1 implies that . So the above asymptotic approximation gives and , and hence (3.12) is satisfied. That is, Assumption 1 guarantees the exact recovery of all sparse signals in the noiseless case through the constrained minimization as explained in (1.5) and (1.6).
4 Technical Proofs
Throughout the proof, as mentioned earlier, we will let and ; the integer is either fixed or . The following notation will be adopted. We write if there is a constant independent of and (unless otherwise indicated) such that . Moreover, we write , if there is a sequence independent of and such that and . Define . This is a sequence growing to infinity with a very slow speed compared to and .
This section is organized as follows. We first introduce the main steps in proving Theorems 1 and 2 in Section 4.1. In Section 4.2, we present the proofs of Theorems 1-3, Corollary 1, and Remark 3. The proofs of all technical lemmas are given in the Appendix. For the reader's convenience, we list the contents of each section below.
Section 4.1. The Strategy of the Proofs for Theorems 1 and 2.
Section 4.2. Proof of the results in Section 3.
Section 4.2.1. Proof of Theorem 2.
Section 4.2.2. Proof of Theorem 1.
Section 4.2.3. Proofs of Theorem 3 and Remark 3.
4.1 The Strategy of the Proofs for Theorems 1 and 2
We first explain the proof strategy for Theorem 2 and then explain that for Theorem 1, since Wigner matrices have simpler structure than Wishart matrices. The proof of Theorem 2 consists of three steps. The first step is to find an upper bound on the right tail probability for . Our method here is to first develop a moderate deviation bound of for each and , and then use the union bound to control . The second step is to find an upper bound on the left tail probability for . Our approach is to construct a sequence of events with high probability, such that when occurs, there exists satisfying and . The third step is to combine the left and right tail bounds obtained from the previous two steps to show (3.8).
The proof of Theorem 1 is based on a similar strategy to that of Theorem 2. A new and key ingredient is to control the approximation speed of the Wishart matrix to the Wigner matrix (after normalization). Change-of-measure arguments are used to quantify the approximation speed in the moderate deviation domain.
We point out that the proof for the asymptotic lower bound of in this paper is different from the standard technique for analyzing the maximum/minimum statistic for a large random matrix (see, e.g., Jiang (2004a)). In particular, the proof in Jiang (2004a) employs the Chen-Stein Poisson approximation method [see, e.g., Arratia et al. (1990)] and the asymptotic independence. However, this method does not fit our problem. For this reason, new techniques are developed and, in particular, we construct an event on which achieves the asymptotic lower bound.
4.2 Proof of the results in Section 3
As mentioned earlier, Wigner matrices have simpler structure than Wishart matrices. Thus, we first present the proof of Theorem 2, followed by the proof of Theorem 1. At the end of the section, the proofs of Corollary 1, Theorem 3 and Remark 3 are presented.
In each proof we will need auxiliary results. To make the proof clearer, we place the proofs of the auxiliary results in the Appendix. Sometimes a statement or a formula holds as is sufficiently large. We will not say “as is sufficiently large” if the context is apparent.
4.2.1 Proof of Theorem 2
To prove Theorem 2, we need the following two key results.
Proposition 1.
Suppose Assumption 2 in (3.2) is satisfied. Recall defined as in (2.5). Then,
[TABLE]
for every and every .
Proposition 2.
Suppose Assumption 2 in (3.2) is satisfied. Recall defined as in (2.5). Then,
[TABLE]
for every and every .
Another auxiliary lemma is needed; its proof is given in the Appendix.
Lemma 1.
Let be a random variable with for all . Then
[TABLE]
for every and every
Proof of Theorem 2.
By Propositions 1 and 2, we have
[TABLE]
for any and . Consequently, for given , there exists a sequence of positive numbers such that
[TABLE]
for all as is sufficiently large. Now we estimate
[TABLE]
By applying Lemma 1 to , we see
[TABLE]
According to (4.4), the above display can be bounded from above by
[TABLE]
which tends to [math] as . The proof is then complete. ∎
Now we proceed to prove Propositions 1 and 2.
Proof of Proposition 1.
For any , we have from the definition of that
[TABLE]
where in the last inequality we use the fact that are identically distributed for all different with . The following result enables us to bound the last probability.
Lemma 2.
Let be defined as above (2.5) with . Then there is a constant such that
[TABLE]
for all and all .
Taking in the above lemma, we know as is large enough, and hence
[TABLE]
Note that , , and as is sufficiently large. Thus, the above inequality further implies
[TABLE]
uniformly for all as sufficiently large, where is fixed. With the above inequality, we complete the proof.
∎
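The union bound used in the proof above multiplies a single-minor tail probability by the number $\binom{p}{m}$ of index sets of size $m$, since the minors are identically distributed. The following sketch is only an illustration and not part of the paper's argument: it computes $\log\binom{p}{m}$ stably and compares it with the cruder bound $m \log p$. The helper names `log_binom` and `union_bound` are assumptions.

```python
import math


def log_binom(p, m):
    """Natural log of the binomial coefficient C(p, m), via log-gamma."""
    return math.lgamma(p + 1) - math.lgamma(m + 1) - math.lgamma(p - m + 1)


def union_bound(single_tail, p, m):
    """Union bound C(p, m) * single_tail, capped at 1.

    `single_tail` is the (positive) tail probability for one fixed minor.
    """
    return min(1.0, math.exp(log_binom(p, m) + math.log(single_tail)))


for p, m in [(100, 2), (1000, 5), (10**6, 20)]:
    # log C(p, m) never exceeds m * log(p)
    print(p, m, log_binom(p, m), m * math.log(p))
```

Working on the log scale avoids overflow when $p$ and $m$ are large, which is the regime of interest here.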
Proof of Proposition 2.
Recall . The proof will be evidently finished if the following two limits hold. For each and each ,
[TABLE]
and
[TABLE]
We now verify the above two limits.
The proof of (4.12). Recall
[TABLE]
for any square and symmetric matrix , where is the largest eigenvalue of
For each such that and , set
[TABLE]
where
[TABLE]
If then . According to (4.14), if there exists such that and occurs, then
[TABLE]
Define
[TABLE]
where is the indicator function of . Then,
[TABLE]
For any random variable with and , we have
[TABLE]
Applying this inequality to , we obtain
[TABLE]
We proceed to find a lower bound on and an upper bound on in two steps.
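The lower-tail argument rests on a second-moment bound for a counting variable. A standard inequality of this type, assumed here for illustration (the proof may use a variant), is $P(Y > 0) \geq (EY)^2 / E(Y^2)$ for a non-negative random variable $Y$ with $0 < E(Y^2) < \infty$. The sketch below checks it exactly for a binomial count, a toy stand-in in which the indicators are independent, unlike the dependent indicators in the proof. The function names are hypothetical.

```python
def second_moment_lower_bound(mean, second_moment):
    """Lower bound (E Y)^2 / E(Y^2) on P(Y > 0) for a non-negative Y."""
    return mean * mean / second_moment


def binomial_check(N, q):
    """Exact P(Y > 0) and the second-moment bound for Y ~ Binomial(N, q)."""
    mean = N * q
    second_moment = N * q * (1.0 - q) + mean ** 2   # Var(Y) + (E Y)^2
    exact = 1.0 - (1.0 - q) ** N
    return exact, second_moment_lower_bound(mean, second_moment)


for N, q in [(1, 0.3), (10, 0.05), (100, 0.01), (1000, 0.01)]:
    exact, bound = binomial_check(N, q)
    print(N, q, exact, bound)
```

The bound is informative when the variance is small relative to the squared mean, which is why the proof needs both a lower bound on the mean and an upper bound on the second-order term.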
Step 1: the estimate of . Note that are identically (not independently) distributed Bernoulli variables for different with success rate . Thus, we have
[TABLE]
where we choose with a slight abuse of notation. For convenience, write
[TABLE]
Since the upper triangular entries of are independent Gaussian variables, we have from (4.15) that
[TABLE]
Recall that and for . Hence
[TABLE]
where . It is well known that
[TABLE]
as . Recall the assumption that , so . Thus, by (4.25) and (4.26),
[TABLE]
Note that since . It follows that . Also, . As a result, from (4.23) we have
[TABLE]
It follows that
[TABLE]
Combining this with (4.22), we see
[TABLE]
To control , we need the next result, which will be proved in the Appendix.
Lemma 3.
For all , we have
[TABLE]
Using the above lemma and (4.29), and noting that , we have
[TABLE]
Step 2: the estimate of . Reviewing in (4.18), we have
[TABLE]
Note that is determined by and . By (4.15),
[TABLE]
Singling out the terms where and , we further have
[TABLE]
On the other hand, $\mathbb{E}\tilde{Q}_{m,p}=\binom{p}{m}P\big(\tilde{A}_{\{1,\dots,m\}}\big)$ and hence
[TABLE]
Combining (4.32), (4.34) and (4.35), we arrive at
[TABLE]
Observe that and . It follows that
[TABLE]
Similar to (4.25) we have
[TABLE]
Again, we find an approximation for the above display by using (4.26) and simplifying it. We arrive at
[TABLE]
Therefore, for the last term in (4.37), we see
[TABLE]
The following lemma enables us to evaluate the coefficient of
Lemma 4.
For any and , we have
[TABLE]
Applying the above lemma to (4.40), we get
[TABLE]
This inequality together with (4.31) implies that
[TABLE]
Combining the above display with (4.37), we arrive at
[TABLE]
Lemma 5.
For all integers satisfying , we have
[TABLE]
Therefore,
[TABLE]
We now study the last two terms one by one. For ,
[TABLE]
for sufficiently large under Assumption 2 in (3.2). Recalling , we see from (4.31) that
[TABLE]
Combining (4.46), (4.47) and (4.48), we arrive at
[TABLE]
This together with (4.19) and (4.21) yields
[TABLE]
uniformly for all . Consequently, we get (4.12).
The proof of (4.13). For any with , write . Note that . Thus,
[TABLE]
As a result,
[TABLE]
where the function for . To proceed, we discuss two scenarios: and . For , we have
[TABLE]
where for any and the inequality for any is used in the last step. Note and since . Thus,
[TABLE]
for sufficiently large . This further implies
[TABLE]
for any . Note that for any . Then, for the other scenario where , we have
[TABLE]
as is large enough. Thus,
[TABLE]
for any Joining (4.54) and (4.56), we see (4.13). This completes the whole proof. ∎
4.2.2 Proof of Theorem 1
To prove Theorem 1, we need the following two propositions.
Proposition 3.
Suppose Assumption 1 in (3.1) holds. Recall defined as in (2.1). Then,
[TABLE]
for any and .
Proposition 4.
Suppose Assumption 1 in (3.1) holds. Recall defined as in (2.1). Then,
[TABLE]
for any and .
Proof of Theorem 1.
Similar to the proof of Theorem 2, it is sufficient to prove (3.4). By the same argument as in the proof of Theorem 2, with the upper bound for given in Proposition 3 and the upper bound for for given in Proposition 4, we get (3.4). ∎
In the following we start to prove Propositions 3 and 4.
Proof of Proposition 3.
Without loss of generality, we assume since the expectation in (3.4) is monotonically decreasing in .
Let be as above (2.1) with . Analogous to (4.8), we have
[TABLE]
We now bound the last probability. Since the above tail probability involves moderate deviation and large deviation bounds for different ranges of , we discuss three cases and use different proof strategies. Recall . Set
[TABLE]
The three cases are: (1) , (2) , and (3) . They cover all situations for . For the first two cases, the upper bound is based on the next lemma, which gives a moderate deviation bound for the spectrum of from the identity matrix .
Lemma 6.
There exists a constant such that for all , , and , we have
[TABLE]
and
[TABLE]
where for and for .
Case 1: . Let be given. Choose , , and in Lemma 6. The choice of and satisfies that and hence . Set . Notice that from Lemma 6 is increasing for . Then, by the lemma,
[TABLE]
The following lemma says that both of the last two terms go to zero.
Lemma 7.
Suppose Assumption 1 in (3.1) holds. Let and be given. For , and , we have
[TABLE]
and
[TABLE]
Combining (4.57), (4.61)-(4.63), we conclude
[TABLE]
Case 2: . Review in (4.58). Now we choose , and . Then . By (4.59),
[TABLE]
where . The last two terms are analyzed in the next lemma.
Lemma 8.
Suppose Assumption 1 in (3.1) holds. Let be as in (4.58). For , and , we have
[TABLE]
as is sufficiently large. In addition,
[TABLE]
as .
Joining (4.65)-(4.67), we obtain
[TABLE]
which together with (4.57) implies that
[TABLE]
This completes our analysis for Case 2. By the same argument used to obtain (4.68), we have the following limit, which will be used later.
[TABLE]
We next study Case 3.
Case 3: . Note that this case is only possible if . We point out that Lemma 6 is not a suitable approach for bounding the tail probability in this case because the term , which cannot be easily controlled, will dominate the other terms in the error bound for very large . Instead, we will use another approach to obtain an upper bound of . The main step here is to quantify the approximation of the extreme eigenvalue of a Wishart matrix to that of a Wigner matrix. We will analyze their density functions and leverage them with the results in the proof of Theorem 2.
Let be the order statistics of the eigenvalues of such that . Write with . Let where ’s are as in (2.4). Let the eigenvalues of be . Set . Intuitively, the law of is close to that of when is large. The next lemma quantifies the approximation speed. Review for any .
Lemma 9.
Let be the density function of , and let be the density function of . Assume . Then,
[TABLE]
for all with .
Let , where is as in (4.58). Then for such that ,
[TABLE]
There are three probabilities above, denote the second one by . For , we use the change-of-measure argument. In fact,
[TABLE]
Now
[TABLE]
since and . By the definition of in (4.58),
[TABLE]
where Assumption 1 from (3.1) is used. Therefore,
[TABLE]
Note that for any and . It follows from (4.11) that
[TABLE]
by the fact and Assumption 1. Combining this with (4.71), we have
[TABLE]
We next analyze $\mathbb{P}\big(\max_{1\leq i\leq m}|\nu_{i}|\geq r_{m,n}\big)$. Recall , where is as in (4.58). Recall that we only discuss Case 3 when , and this is only meaningful when . Thus, . Hence, from (4.68) we have
[TABLE]
By (4.2.2),
[TABLE]
Since , by combining (4.76) and (4.77), we see that
[TABLE]
Combining this with (4.75), we further have
[TABLE]
This completes our analysis for Case 3.
Now, we combine (4.64), (4.68) and (4.79), and arrive at
[TABLE]
This and (4.57) conclude
[TABLE]
∎
Proof of Proposition 4.
Notice that the expectation in (3.4) is non-increasing in . Thus, without loss of generality, we assume .
Here we discuss two scenarios that are similar to those in the proof of Theorem 2. They are 1) and 2) , where
Scenario 1: . Similar to the proof of Theorem 2, we define the event as follows. For each with , set
[TABLE]
where and . We also define
[TABLE]
Similar to the discussion between (4.14) and (4.21) in the proof of Theorem 2, we have
[TABLE]
In the rest of the discussion under Scenario 1, we will develop a lower bound for and an upper bound for in two steps.
Step 1: the estimate of . For a symmetric matrix , we use to denote its spectral norm. Set . Review in (4.58). Since are identically distributed, we have
[TABLE]
where
[TABLE]
It is easy to check that Assumption 1 in (3.1) implies
[TABLE]
Similar to Lemma 9, we need the following lemma, which quantifies the speed that a Wishart matrix converges to a Wigner matrix. The difference is that the spectral norm is used here instead of in Lemma 9.
Write for above (2.1) with . Review that the Wigner matrix , where ’s are as in (2.4).
Lemma 10.
Let be the density function of and be the density function of . If , then
[TABLE]
for all symmetric matrices with .
Below, we combine the above lemma and some change of measure arguments to obtain a lower bound of . Define a non-random set . By the first limit from (4.86), . Therefore, from Lemma 10 we have
[TABLE]
where is as in (4.15) with and Under Assumption 1 in (3.1), evidently and . This implies that
[TABLE]
Thus, we have
[TABLE]
Obviously, . Recalling and as in (4.15) and (4.18), respectively, we see that . Thus, we further have from (4.85) and (4.88) that
[TABLE]
To further obtain a lower bound of the above expression, we analyze each term on the right-hand side. Recalling the definition of below (4.15), we know . By (4.31),
[TABLE]
where the condition from Assumption 1 in (3.1) is essentially used in the last step. Now,
[TABLE]
where the fact that and have the same distribution is used in the last step. The following lemma helps us estimate the last probability.
Lemma 11.
[Lemma 4.1 from Jiang and Li, (2015)] Let be defined by above (2.5) with . Then there is a constant such that
[TABLE]
for all and all .
By letting in Lemma 11, we have
[TABLE]
Combining the above inequality with (4.91), we arrive at
[TABLE]
Since , we know Moreover, . Consequently,
[TABLE]
Comparing the above inequality with (4.90), we arrive at
[TABLE]
This result, combined with (4.89), gives
[TABLE]
which, together with (4.31), yields
[TABLE]
This completes our analysis for .
Step 2: the estimate of . Replacing “” in (4.15) with “” in (4.83), and using the same argument as obtaining (4.37), we have from Lemma 5 that
[TABLE]
Now we bound the last term above. Review below (4.85). Trivially,
[TABLE]
By (4.95),
[TABLE]
Let be the density function of and be the density function of . Review (4.82). Define (non-random) set
[TABLE]
Then,
[TABLE]
where By Lemma 10 and by a change-of-measure argument similar to the one used to obtain (4.88), we see
[TABLE]
The benefit of the above step is transferring the probability on the Wishart matrix to that on the Wigner matrix up to a certain error. Combining (4.100)-(4.102), we have
[TABLE]
Combining this with (4.99), we have
[TABLE]
where
[TABLE]
Thus,
[TABLE]
According to (4.43) and (4.97), the first term on the right-hand side of the above inequality is no more than
[TABLE]
Notice that and and . Thus, the above display further implies
[TABLE]
We next study the last two terms from (4.105).
By the condition from Assumption 1 in (3.1) and the second limit in (4.86),
[TABLE]
Recall (4.16). It is readily seen that . Consequently, it is known from (4.98) that
[TABLE]
uniformly over . Therefore, we conclude from (4.98) and (4.108) that
[TABLE]
Combining (4.105)-(4.109), we see
[TABLE]
By (4.84) and the above inequality,
[TABLE]
Finally, from the inequality we have that
[TABLE]
for any and .
Scenario 2: . Review (2.1). By the fact that for any non-negative definite matrix , we have
[TABLE]
where and are i.i.d. random variables. Thus, by independence,
[TABLE]
Note that is a sum of i.i.d. random variables with and . We discuss two situations: and .
Assuming for now. Recalling , we get from the Berry-Esseen theorem that
[TABLE]
for some constant . Combine the above inequalities with (4.53) to see
[TABLE]
By (3.1), . It is easy to check
[TABLE]
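For reference, the Berry-Esseen theorem invoked in the first situation can be stated in generic form as follows. This is a sketch of the classical i.i.d. statement, not a reconstruction of the display above: the symbols $X_i$, $\sigma$, $\rho$ and $C$ are assumptions of this sketch and need not match the notation of the proof.

```latex
% Classical Berry-Esseen bound for i.i.d. summands (generic notation, assumed here):
% X_1, ..., X_n i.i.d. with E X_1 = 0, E X_1^2 = \sigma^2 > 0, \rho = E|X_1|^3 < \infty,
% \Phi the standard normal distribution function, C an absolute constant.
\[
\sup_{x\in\mathbb{R}}
\left|\,\mathbb{P}\!\left(\frac{X_1+\cdots+X_n}{\sigma\sqrt{n}}\le x\right)-\Phi(x)\right|
\;\le\; \frac{C\,\rho}{\sigma^{3}\sqrt{n}} .
\]
```

The rate $n^{-1/2}$ on the right-hand side is the reason a normal tail estimate can replace the exact tail of the normalized sum at the cost of an error of that order.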
We proceed to the second situation: . In this case, . By Lemma 1 from Laurent and Massart (2000),
[TABLE]
for any . Thus,
[TABLE]
This inequality and (4.114) yield
[TABLE]
Consequently,
[TABLE]
Hence,
[TABLE]
By collecting (4.112), (4.117) and (4.121) together, we arrive at
[TABLE]
The proof is completed. ∎
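The chi-square tail bound of Laurent and Massart (2000) used in the second situation above can be checked numerically. The sketch below assumes the standard form of that lemma for a chi-square variable $X$ with $k$ degrees of freedom, namely $\mathbb{P}(X-k\ge 2\sqrt{kx}+2x)\le e^{-x}$ and $\mathbb{P}(k-X\ge 2\sqrt{kx})\le e^{-x}$ for $x>0$. It restricts to even $k$, where the tail has a closed form, and the function names are hypothetical.

```python
import math


def chi2_sf_even(t, k):
    """Exact P(chi2_k > t) for even k, using the Poisson-sum identity
    P(chi2_{2m} > t) = exp(-t/2) * sum_{j<m} (t/2)^j / j!."""
    if k <= 0 or k % 2 != 0:
        raise ValueError("k must be a positive even integer")
    if t <= 0:
        return 1.0
    half = t / 2.0
    term, total = 1.0, 1.0
    for j in range(1, k // 2):
        term *= half / j  # term now equals (t/2)^j / j!
        total += term
    return math.exp(-half) * total


def laurent_massart_gaps(k, x):
    """Return (e^{-x} - upper tail, e^{-x} - lower tail).
    Both entries are non-negative when the two bounds hold."""
    upper = chi2_sf_even(k + 2.0 * math.sqrt(k * x) + 2.0 * x, k)
    lower = 1.0 - chi2_sf_even(k - 2.0 * math.sqrt(k * x), k)
    bound = math.exp(-x)
    return bound - upper, bound - lower


for k in (2, 10, 50):
    for x in (0.5, 1.0, 3.0):
        print(k, x, laurent_massart_gaps(k, x))
```

Both gaps stay non-negative on this grid, which is consistent with the lemma; the check is an illustration only and plays no role in the proof.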
4.2.3 Proofs of Theorem 3 and Remark 3
The following lemma serves the proof of Theorem 3. Its proof is given in Appendix B.
Lemma 12.
Let be as defined in (3.11) with . Then
[TABLE]
for all , and .
Proof of Theorem 3.
For any , we first show that
[TABLE]
by using Lemma 12. To do so, set ,
and . Rewrite such that
[TABLE]
It is easy to check that the coefficient of always lies in for any and . This, the fact that and the definition of lead to
[TABLE]
We can see that
[TABLE]
It follows that
[TABLE]
This and (4.125) imply (4.124). Consequently,
[TABLE]
To complete the proof, it is enough to check that
[TABLE]
for each . For notational simplicity, let and . Similar to the proof of Theorem 2, define
[TABLE]
for each with . We next compute and , respectively, where and . By independence,
[TABLE]
Since and for all , we further have
[TABLE]
where for Similar to (4.38),
[TABLE]
From (4.26), as . Then,
[TABLE]
where
[TABLE]
Notice
[TABLE]
Similar to (4.28), we obtain that . Thus,
[TABLE]
By the same argument as obtaining (4.31), we see
[TABLE]
In particular, the above goes to infinity as . By (4.26) and (4.130),
[TABLE]
The right-hand side above without the term “” is identical to
[TABLE]
The above two assertions yield
[TABLE]
Let us take a closer look at the above display. For ,
[TABLE]
Thus, for and ,
[TABLE]
Combining this with (4.138), we obtain that
[TABLE]
uniformly for . Define . From (4.135),
[TABLE]
as . Moreover, we see from (4.135) and (4.138) that
[TABLE]
By Lemma 5 and a similar argument to (4.37), we get
[TABLE]
This and the above two limits imply . As a result, by (4.20). According to (4.14), if there exists such that and occurs, then
[TABLE]
Therefore,
[TABLE]
This implies (4.128). The proof is finished.
∎
Proof of Remark 3.
These results are direct consequences of the following lemma, whose proof is given in Appendix B. ∎
Lemma 13.
Let be a sequence of non-negative random variables. Consider the following statements.
- (i) for all and .
- (ii) for all .
- (iii) for all .
- (iv) for all .
- (v) for all .
Then, (i) ⇔ (ii) ⇒ (iii) ⇒ (iv) and (v). Here, “A ⇔ B” means that statements A and B are equivalent, and “A ⇒ B” means that statement A implies statement B.
Acknowledgment
The research of Tony Cai was supported in part by NSF Grant DMS-1712735 and NIH grants R01-GM129781 and R01-GM123056. Tiefeng Jiang is partially supported by NSF Grant DMS-1406279. Xiaoou Li is partially supported by NSF Grant DMS-1712657.
Appendix A Auxiliary results on Gamma functions
Recall the Gamma function $\Gamma(x)=\int_{0}^{\infty}t^{x-1}e^{-t}\,dt$ for $x>0$.
Lemma A.1.
Let
[TABLE]
If , then as .
Proof of Lemma A.1.
Easily,
[TABLE]
where
[TABLE]
Then
[TABLE]
Write
[TABLE]
By Lemma 5.1 from Jiang and Qi (2015), there exists a constant free of and such that
[TABLE]
for all and . It is easy to see that
[TABLE]
where is a constant free of and . This implies that
[TABLE]
as . Write
[TABLE]
Easily, as uniformly for all Hence,
[TABLE]
as . In summary,
[TABLE]
as On the other hand, by the Stirling formula,
[TABLE]
as From (A.1) and the above two assertions we see
[TABLE]
as , which together with (A.2) proves the lemma. ∎
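The Stirling formula used in the proof above can be illustrated numerically. The sketch below assumes the leading-order form $\log\Gamma(x)\approx(x-\tfrac12)\log x-x+\tfrac12\log(2\pi)$ and compares it with `math.lgamma`; the classical remainder lies strictly between $0$ and $1/(12x)$ for $x>0$. The function names are hypothetical and the check is independent of the specific quantities in Lemma A.1.

```python
import math


def stirling_log_gamma(x):
    """Leading-order Stirling approximation to log Gamma(x), x > 0."""
    return (x - 0.5) * math.log(x) - x + 0.5 * math.log(2.0 * math.pi)


def stirling_error(x):
    """log Gamma(x) minus its leading-order Stirling approximation."""
    return math.lgamma(x) - stirling_log_gamma(x)


for x in (5.0, 50.0, 500.0):
    # The remainder should be positive and below 1 / (12 x).
    print(x, stirling_error(x), 1.0 / (12.0 * x))
```

The remainder decays like $1/(12x)$, which is the type of error absorbed into the $o(1)$ and $O(\cdot)$ terms when ratios of Gamma functions are expanded.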
Lemma A.2.
Let
[TABLE]
If , then as .
Proof of Lemma A.2.
Observe
[TABLE]
By definition,
[TABLE]
From (A.1) and (A.3), we see that
[TABLE]
By comparing this identity with (A.4), we conclude . ∎
Appendix B Proofs of lemmas
The following result is based on a slight modification of the second inequality of (4.8) from Jiang and Li (2015), taking care to note that the version of the Wigner matrix here is times the version there. It will enable us to bound the last probability.
Proof of Lemma 2.
Review the proof of Lemma 4.1 from Jiang and Li (2015). Notice that the version of the Wigner matrix here is times the version there. From the second inequality in (4.8) of that paper, there is a positive constant not depending on such that
[TABLE]
for all and all . Since the right-hand side above is increasing in , without loss of generality, we assume . It is easy to see under the assumption that . By taking we get the desired conclusion. ∎
Proof of Lemma 3.
Note that
[TABLE]
and for . Thus,
[TABLE]
On the other hand, by the Stirling formula,
[TABLE]
Therefore,
[TABLE]
Combining the two inequalities, we complete the proof. ∎
Proof of Lemma 4.
Write
[TABLE]
for . Obviously, is a convex function. It follows that . It is trivial to check that . The first identity is thus obtained. The second identity follows from the first one. ∎
Proof of Lemma 5.
By rearranging the terms, we have
[TABLE]
∎
Proof of Lemma 1.
Note that
[TABLE]
Use the Fubini Theorem to see
[TABLE]
∎
Proof of Lemma 6.
The technique used here is similar to that of Fey et al. (2008), where large deviations for the extreme eigenvalues of Wishart matrices are developed. We therefore omit the repetitive details and state only the main steps. To ease notation, we write , and we use to denote its smallest eigenvalue. The event $\big\{\frac{\lambda_{1}(W_{\{1,\ldots,m\}})-n}{\sqrt{n}}\geq\sqrt{n}\,y\big\}$ is equal to , and $\big\{\frac{\lambda_{m}(W_{\{1,\ldots,m\}})-n}{\sqrt{n}}\leq-\sqrt{n}\,y\big\}$ is equal to . We begin by bounding . Since , we assume without loss of generality.
Note that and the sphere can be covered by for some . Here, we use to denote an open ball centered around with radius . It is straightforward to verify that for any , there always exists such that
[TABLE]
Therefore, by considering occurs or not, we have
[TABLE]
for all . We next analyze , and separately.
We start with , which is the minimum number of balls with the radius required to cover . By a result from Rogers (1963), we see
[TABLE]
for all and . As a result,
[TABLE]
We proceed to an upper bound for . Recall that , where we use the notation
[TABLE]
Thus,
[TABLE]
where we define . Review . Since ’s are standard normals, so are . By the large deviation bound for the sum of i.i.d. random variables [see, e.g., page 27 from Dembo and Zeitouni (1998)],
[TABLE]
where is any Borel set and . Since for it is easy to check that
[TABLE]
Observe that is decreasing for . This together with (B.13) and (B.14) implies that
[TABLE]
for all
Now we estimate appearing in (B.9). Noting that is positive semidefinite, we have , and hence
[TABLE]
for by (B.14). Combining (B.11), (B.15), and (B.16), we obtain from (B.9) that
[TABLE]
for and . This confirms (4.60).
To get (4.59), just notice . From (B.8) and (B.9) we see that
[TABLE]
Then (4.59) follows from similar arguments to (B.15)-(B.17). ∎
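The covering argument in the proof above can be illustrated in the simplest possible case. The following minimal sketch is not the inequality (B.8) itself: it assumes a fixed $2\times 2$ positive semidefinite matrix $A$ and checks the standard $\epsilon$-net bound $\lambda_{\max}(A)\le(1-2\epsilon)^{-1}\max_{y\in\mathcal{N}}y^{\top}Ay$, where $\mathcal{N}$ is an $\epsilon$-net of the unit circle and $0<\epsilon<1/2$. The function names and the test matrix are hypothetical.

```python
import math


def lambda_max_2x2(a, b, c):
    """Largest eigenvalue of the symmetric matrix [[a, b], [b, c]]."""
    return 0.5 * (a + c) + math.sqrt((0.5 * (a - c)) ** 2 + b * b)


def net_maximum(a, b, c, eps):
    """Maximum of y^T A y over an eps-net of the unit circle.

    The net consists of n equally spaced points with n >= pi / eps, so every
    unit vector is within angle pi / n <= eps (hence Euclidean distance at
    most eps) of some net point."""
    if not 0.0 < eps < 0.5:
        raise ValueError("eps must lie in (0, 1/2)")
    n_points = math.ceil(math.pi / eps)
    best = -math.inf
    for j in range(n_points):
        theta = 2.0 * math.pi * j / n_points
        u, v = math.cos(theta), math.sin(theta)
        best = max(best, a * u * u + 2.0 * b * u * v + c * v * v)
    return best


eps = 0.1
lam = lambda_max_2x2(2.0, 1.0, 3.0)
net_max = net_maximum(2.0, 1.0, 3.0, eps)
# The net maximum never exceeds lambda_max, and lambda_max is at most
# net_max / (1 - 2 * eps) for a positive semidefinite matrix.
print(net_max, lam, net_max / (1.0 - 2.0 * eps))
```

The same reduction, from a supremum over the sphere to a maximum over finitely many directions, is what allows a union bound over the net to control the extreme eigenvalue, with the covering number entering as a multiplicative factor.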
Proof of Lemma 7.
Review Assumption 1 in (3.1). We start with the analysis of (4.63). Here, we consider two sub-cases: and . For , we have and
[TABLE]
Trivially . Note that and under Assumption 1 in (3.1). It follows that
[TABLE]
This implies
[TABLE]
Now we consider another sub-case where . For this case, . It is not hard to see
[TABLE]
for . Clearly, for when is sufficiently large. It follows that
[TABLE]
This implies
[TABLE]
Combining (B.20) and (B.22), we obtain
[TABLE]
This completes the proof of (4.63). We next show (4.62).
Recall . Obviously, as . It is elementary to check there exists such that for all . Hence,
[TABLE]
for all . Reviewing and , we have
[TABLE]
since for all . Furthermore, as is sufficiently large, and by Assumption 1 in (3.1). Consequently,
[TABLE]
We obtain (4.62) and the proof is completed. ∎
Proof of Lemma 8.
It is trivial to show that
[TABLE]
for . Recall the assumption that . Then as is sufficiently large. Now,
[TABLE]
By (B.27), we see
[TABLE]
Now, reviewing and , we have . This, together with (B.29), implies that
[TABLE]
Since , we know uniformly in . Thus,
[TABLE]
as is sufficiently large. We then get (4.66). Evidently,
[TABLE]
as This implies that
[TABLE]
The assertion (4.67) is verified. ∎
Proof of Lemma 9.
Review the notation above (2.1) with . Let be the eigenvalues of . According to James (1964) or Muirhead (2009), has density function
[TABLE]
where . In addition, has density
[TABLE]
see, for example, Chapter 17 from Mehta (2004). Note that , so we can write down the expression of as follows.
[TABLE]
for and , otherwise. Denote
[TABLE]
Then,
[TABLE]
for . By Lemma A.1 in Appendix A,
[TABLE]
for . By the Taylor expansion,
[TABLE]
for all . Therefore,
[TABLE]
for Writing , it is easy to check
[TABLE]
Combining (B.38)-(B.40), and noting that for all , we get
[TABLE]
provided , where is the error term and it is controlled by
[TABLE]
By using the trivial bound that for each , we obtain the desired conclusion from the above two assertions. ∎
Proof of Lemma 10.
According to the density function of the Wishart distribution [see, e.g., Anderson (1962) or Muirhead (2009)], the density function for is
[TABLE]
for every positive definite matrix , where is the multivariate gamma function defined by
[TABLE]
and we write for the determinant of a matrix . It is easy to see that the density function for is given by
[TABLE]
for every matrix such that is positive definite. Simplifying the above display, we further have
[TABLE]
where . On the other hand,
[TABLE]
where ; see, for instance, Mehta (2004). Now we consider
[TABLE]
for every and by Lemma A.2 in Appendix A, where are the eigenvalues of . From (B.39) and (B.40),
[TABLE]
if , where is the error term satisfying
[TABLE]
In addition, and The above three assertions lead to
[TABLE]
provided . The proof is finished. ∎
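The Wishart density in the proof above is normalized by the multivariate gamma function. The sketch below assumes the standard definition $\Gamma_m(a)=\pi^{m(m-1)/4}\prod_{i=1}^{m}\Gamma\big(a-\tfrac{i-1}{2}\big)$ for $a>(m-1)/2$, which may differ in notation from the display in the proof, and evaluates it on the log scale. The function name is hypothetical.

```python
import math


def log_multivariate_gamma(m, a):
    """Logarithm of the multivariate gamma function Gamma_m(a),
    assuming the standard product formula; requires a > (m - 1) / 2."""
    if m < 1:
        raise ValueError("m must be a positive integer")
    if a <= (m - 1) / 2.0:
        raise ValueError("a must exceed (m - 1) / 2")
    value = 0.25 * m * (m - 1) * math.log(math.pi)
    for i in range(1, m + 1):
        value += math.lgamma(a - 0.5 * (i - 1))
    return value


# For m = 1 the function reduces to the ordinary Gamma function.
print(log_multivariate_gamma(1, 2.5), math.lgamma(2.5))
# Gamma_2(3/2) = sqrt(pi) * Gamma(3/2) * Gamma(1) = pi / 2.
print(math.exp(log_multivariate_gamma(2, 1.5)), math.pi / 2.0)
```

Working on the log scale avoids overflow when the dimension or the argument is large, which is the regime of interest when comparing the normalizing constants of the Wishart and Wigner densities.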
Proof of Lemma 12.
Let be balls centered around , respectively, such that covers the unit sphere . Then, for any , by (B.8),
[TABLE]
According to the distribution of ,
[TABLE]
where for any . In fact, for any ,
[TABLE]
such that is equal to
[TABLE]
by independence. Recall for all . Thus, for , the first term on the right side of (B.50) is bounded by
[TABLE]
since . Observe that
[TABLE]
Thus,
[TABLE]
if . We now turn to estimating the last probability in (B.50). Note that
[TABLE]
Note that and have the same distribution, where , and and are independent. Also Thus, the last probability in (B.50) is dominated by
[TABLE]
Notice under the given condition . Let for . It is easy to check that and that as By (B.14), the last probability above is no more than
[TABLE]
Hence,
[TABLE]
Combining the above display with (B.50) and (B.54), we have
[TABLE]
The desired conclusion follows since (Rogers, 1963). ∎
Proof of Lemma 13.
(i) ⇒ (ii): Easily,
[TABLE]
Taking on both sides and then letting , we obtain
[TABLE]
On the other hand, $\liminf_{p\to\infty}\mathbb{E}\big[e^{\alpha Z_{p}}\big]\geq 1$ since . Hence, $\lim_{p\to\infty}\mathbb{E}\big[e^{\alpha Z_{p}}\big]=1$.
(ii) ⇒ (i): For each , we know . Thus,
[TABLE]
Taking on both sides and then letting , we have
[TABLE]
Thus, .
(ii) ⇒ (iii): First, . By the Markov inequality, for all and . It follows that
[TABLE]
for all . The conclusion then follows by first letting and then sending
(iii) ⇒ (iv): This is a direct consequence of the Chebyshev inequality and the equality .
(iii) ⇒ (v): Let in (iii), then
∎
References
- Anderson, T. W. (1962). An introduction to multivariate statistical analysis. Wiley, New York.
- Arratia, R., Goldstein, L., and Gordon, L. (1990). Poisson approximation and the Chen-Stein method. Statistical Science, pages 403–424.
- Bai, Z. D. (1999). Methodologies in spectral analysis of large dimensional random matrices, a review. Statistica Sinica, 9(3):611–662.
- Bai, Z. D. and Silverstein, J. W. (2010). Spectral analysis of large dimensional random matrices, volume 20. Springer.
- Bai, Z. D. and Yin, Y. Q. (1993). Limit of the smallest eigenvalue of a large dimensional sample covariance matrix. The Annals of Probability, 21(3):1275–1294.
- Baraniuk, R., Davenport, M., DeVore, R., and Wakin, M. (2008). A simple proof of the restricted isometry property for random matrices. Constructive Approximation, 28(3):253–263.
- Bryc, W., Dembo, A., and Jiang, T. (2006). Spectral measure of large random Hankel, Markov and Toeplitz matrices. The Annals of Probability, pages 1–38.
- Cai, T. T., Fan, J., and Jiang, T. (2013). Distributions of angles in random packing on spheres. The Journal of Machine Learning Research, 14(1):1837–1864.
