Asymptotic Analysis for Extreme Eigenvalues of Principal Minors of Random Matrices

T. Tony Cai; Tiefeng Jiang; Xiaoou Li

arXiv:1905.08757·math.ST·May 22, 2019

TL;DR

This paper analyzes the asymptotic behavior of extreme eigenvalues of principal minors of Wishart and Wigner matrices, with applications to high-dimensional statistics, signal processing, and compressed sensing.

Contribution

It provides new asymptotic results for extreme eigenvalues of principal minors in large random matrices, extending to Wishart and Wigner types, with practical applications.

Findings

1. Laws of large numbers, with exponential-moment control, derived for the maxima and minima of the extreme eigenvalues of principal minors

2. Results applicable to high-dimensional statistics and signal processing

3. Insights into constructing compressed sensing matrices

Abstract

Consider a standard white Wishart matrix with parameters $n$ and $p$. Motivated by applications in high-dimensional statistics and signal processing, we perform asymptotic analysis on the maxima and minima of the eigenvalues of all the $m \times m$ principal minors, under the asymptotic regime that $n, p, m$ go to infinity. Asymptotic results concerning extreme eigenvalues of principal minors of real Wigner matrices are also obtained. In addition, we discuss an application of the theoretical results to the construction of compressed sensing matrices, which provides insights into compressed sensing in signal processing and high-dimensional linear regression in statistics.

Equations

$$\mathbb{P}\Big(\frac{\lambda_{1}(W)-\mu_{n}}{\sigma_{n}}\leq x\Big)\to F_{1}(x)$$

$$\lambda_{\max}(k)=\max_{1\leq i\leq k,\ S\subset\{1,\ldots,p\},\ |S|=k}\lambda_{i}(W_{S})$$

$$\lambda_{\min}(k)=\min_{1\leq i\leq k,\ S\subset\{1,\ldots,p\},\ |S|=k}\lambda_{i}(W_{S}),$$

$$y=X\beta+z$$

$$(1-\delta_{k})\|\beta\|_{2}^{2}\leq\|X\beta\|_{2}^{2}\leq(1+\delta_{k})\|\beta\|_{2}^{2}.$$

$$b_{*}(t)=\begin{cases}\dfrac{t}{4-t}, & 0<t<\frac{4}{3},\\[4pt] \sqrt{\dfrac{t-1}{t}}, & t\geq\frac{4}{3}.\end{cases}$$

$$\hat{\beta}=\arg\min\{\|\gamma\|_{1}:\ y=X\gamma,\ \gamma\in\mathbb{R}^{p}\}.$$

$$1-b_{*}(t)<\lambda_{\min}(tk)\leq\lambda_{\max}(tk)<1+b_{*}(t).$$

$$T_{m,n,p}=\max_{S\subset\{1,\ldots,p\},\ |S|=m}\lambda_{1}(W_{S}),$$

$$V_{m,n,p}=\min_{S\subset\{1,\ldots,p\},\ |S|=m}\lambda_{m}(W_{S}).$$

$$\frac{w_{ij}-n}{\sqrt{n}}\Longrightarrow N(0,2)\ \text{ if } i=j,\quad\text{and}\quad\frac{w_{ij}}{\sqrt{n}}\Longrightarrow N(0,1)\ \text{ if } i\neq j,$$

$$\tilde{w}_{ij}\sim\begin{cases}N(0,2) & \text{if } i=j;\\ N(0,1) & \text{if } i<j.\end{cases}$$

$$\tilde{T}_{m,p}=\max_{S\subset\{1,\ldots,p\},\ |S|=m}\lambda_{1}(\tilde{W}_{S})$$

$$\tilde{V}_{m,p}=\min_{S\subset\{1,\ldots,p\},\ |S|=m}\lambda_{m}(\tilde{W}_{S}).$$

$$m=o\bigg(\min\bigg\{\frac{(\log p)^{1/3}}{\log\log p},\ \frac{n^{1/4}}{(\log n)^{3/2}(\log p)^{1/2}}\bigg\}\bigg).$$

$$m\geq 2\ \text{is fixed, or}\ m\to\infty\ \text{with}\ m=o\Big(\frac{(\log p)^{1/3}}{\log\log p}\Big).$$

$$Z_{n}:=\frac{T_{m,n,p}-n}{\sqrt{n}}-2\sqrt{m\log p}\to 0$$

$$\lim_{n\to\infty}\mathbb{E}\big[e^{\alpha|Z_{n}|}\mathbf{1}_{\{|Z_{n}|\geq\delta\}}\big]=0$$

$$Z_{n}^{\prime}:=\frac{V_{m,n,p}-n}{\sqrt{n}}+2\sqrt{m\log p}\to 0$$

$$\lim_{n\to\infty}\mathbb{E}\big[e^{\alpha|Z_{n}^{\prime}|}\mathbf{1}_{\{|Z_{n}^{\prime}|\geq\delta\}}\big]=0$$

$$\tilde{Z}_{p}:=\tilde{T}_{m,p}-2\sqrt{m\log p}\to 0$$

$$\lim_{p\to\infty}\mathbb{E}\big[e^{\alpha|\tilde{Z}_{p}|}\mathbf{1}_{\{|\tilde{Z}_{p}|\geq\delta\}}\big]=0$$

$$\tilde{Z}_{p}^{\prime}:=\tilde{V}_{m,p}+2\sqrt{m\log p}\to 0$$

$$\lim_{p\to\infty}\mathbb{E}\big[e^{\alpha|\tilde{Z}_{p}^{\prime}|}\mathbf{1}_{\{|\tilde{Z}_{p}^{\prime}|\geq\delta\}}\big]=0$$

$$\tilde{w}_{ij}\sim\begin{cases}N(0,\eta) & \text{if } i=j;\\ N(0,1) & \text{if } i<j.\end{cases}$$

$$\frac{\tilde{T}_{m,p}}{\sqrt{[4(m-1)+2\eta]\log p}}\to 1$$

$$W_{\{i,j\}}=\begin{pmatrix} n & \sum_{k=1}^{n}x_{ki}x_{kj}\\ \sum_{k=1}^{n}x_{ki}x_{kj} & n\end{pmatrix}$$

$$1-b_{*}(t)<\lambda_{\min}(tk)\leq\lambda_{\max}(tk)<1+b_{*}(t)$$

$$\lambda_{\max}(tk)=1+2\sqrt{\frac{tk\log p}{n}}\,(1+o_{p}(1))$$

$$\lambda_{\min}(tk)=1-2\sqrt{\frac{tk\log p}{n}}\,(1+o_{p}(1)).$$

Taxonomy

Topics: Random Matrices and Applications · Blind Source Separation Techniques · Mathematical Analysis and Transform Methods

Full text

Asymptotic Analysis for Extreme Eigenvalues of Principal Minors of Random Matrices

T. Tony Cai

Department of Statistics, The Wharton School, University of Pennsylvania

Tiefeng Jiang

School of Statistics, University of Minnesota

Xiaoou Li

School of Statistics, University of Minnesota

Keywords: random matrix, extremal eigenvalue, maximum of random variables, minimum of random variables.

1 Introduction

Random matrix theory has traditionally focused on the spectral analysis of eigenvalues and eigenvectors of a single random matrix. See, for example, Wigner (1955, 1958); Dyson (1962a, 1962b, 1962c); Mehta (2004); Tracy and Widom (1994, 1996, 2000); Diaconis and Evans (2001); Johnstone (2001, 2008); Jiang (2004a, 2004b); Bryc et al. (2006); Bai and Silverstein (2010). It is important in its own right and has proved to be a powerful tool in a wide range of fields including high-dimensional statistics, quantum physics, electrical engineering, and number theory.

The laws of large numbers and the limiting distributions for the extreme eigenvalues of Wishart matrices are now well known; see, e.g., Bai (1999) and Johnstone (2001, 2008). Let ${X}={X}_{n\times p}$ be a random matrix with i.i.d. $N(0,1)$ entries and let $W={X}^{\intercal}{X}$. Let $\lambda_{1}(W)\geq\cdots\geq\lambda_{p}(W)$ be the eigenvalues of $W$. The limiting distribution of the largest eigenvalue $\lambda_{1}(W)$ satisfies, for $n,p\rightarrow\infty$ with $n/p\to\gamma$,

$$\mathbb{P}\Big(\frac{\lambda_{1}(W)-\mu_{n}}{\sigma_{n}}\leq x\Big)\to F_{1}(x), \tag{1.1}$$

where $\mu_{n}=(\sqrt{n-1}+\sqrt{p})^{2}$, $\sigma_{n}=(\sqrt{n-1}+\sqrt{p})(\frac{1}{\sqrt{n-1}}+\frac{1}{\sqrt{p}})^{1/3}$, and $F_{1}(x)$ is the distribution function of the Tracy-Widom law of type I. The results for the smallest eigenvalue $\lambda_{p}(W)$ can be found in, e.g., Edelman (1988) and Bai and Yin (1993). These results have also been extended to generalized Wishart matrices, i.e., matrices for which the entries of ${X}$ are i.i.d. but not necessarily normally distributed; see, e.g., Bai and Silverstein (2010); Péché (2009); Tao and Vu (2010).

Motivated by applications in high-dimensional statistics and signal processing, we study in this paper the extreme eigenvalues of the principal minors of a Wishart matrix $W$. Write ${X}=(x_{ij})_{n\times p}=({x}_{1},\cdots,{x}_{p})$. Let $S=\{i_{1},\cdots,i_{k}\}\subset\{1,2,\cdots,p\}$ with the size of $S$ being $k$, and let ${X}_{S}=({x}_{i_{1}},\cdots,{x}_{i_{k}})$. Then $W_{S}={X}_{S}^{\intercal}{X}_{S}$ is a $k\times k$ principal minor of $W$. Denote by $\lambda_{1}(W_{S})\geq\cdots\geq\lambda_{k}(W_{S})$ the eigenvalues of $W_{S}$ in descending order. We are interested in the largest and the smallest eigenvalues of all the $k\times k$ principal minors of $W$ in the setting where $n$, $p$, and $k$ are large but $k$ is small relative to $\min\{n,p\}$. More specifically, we are interested in the properties of the maximum of the eigenvalues of all $k\times k$ minors:

$$\lambda_{\max}(k)=\max_{1\leq i\leq k,\ S\subset\{1,\ldots,p\},\ |S|=k}\lambda_{i}(W_{S}) \tag{1.2}$$

and the minimum of the eigenvalues of all $k\times k$ minors:

$$\lambda_{\min}(k)=\min_{1\leq i\leq k,\ S\subset\{1,\ldots,p\},\ |S|=k}\lambda_{i}(W_{S}), \tag{1.3}$$

where $|S|$ denotes the cardinality of the set $S$ .
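For very small $p$ and $k$, the two quantities in (1.2) and (1.3) can be computed by direct enumeration. The following is a minimal sketch under that assumption; the function name `extreme_minor_eigenvalues`, the seed, and the toy sizes are illustrative choices and do not come from the paper.

```python
import itertools
import numpy as np

def extreme_minor_eigenvalues(W, k):
    """Brute-force (lambda_max(k), lambda_min(k)) over all k x k principal minors of W."""
    p = W.shape[0]
    lam_max, lam_min = -np.inf, np.inf
    for S in itertools.combinations(range(p), k):
        idx = list(S)
        eig = np.linalg.eigvalsh(W[np.ix_(idx, idx)])  # eigenvalues in ascending order
        lam_max = max(lam_max, eig[-1])  # largest eigenvalue of this minor
        lam_min = min(lam_min, eig[0])   # smallest eigenvalue of this minor
    return float(lam_max), float(lam_min)

rng = np.random.default_rng(0)      # arbitrary seed
n, p, k = 50, 8, 3                  # toy sizes; the loop visits C(8, 3) = 56 subsets
X = rng.standard_normal((n, p))     # i.i.d. N(0, 1) entries
W = X.T @ X                         # white Wishart matrix
lam_max_k, lam_min_k = extreme_minor_eigenvalues(W, k)
```

The enumeration costs $\binom{p}{k}$ eigen-decompositions, which is why the regime of interest in the paper has to be handled by asymptotic analysis rather than computation.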

This is a problem of significant interest in its own right, and it has important applications in statistics and engineering. Before we establish the properties of the extreme eigenvalues $\lambda_{\max}(k)$ and $\lambda_{\min}(k)$ of the $k\times k$ principal minors of a Wishart matrix $W$, we first discuss an application in signal processing and statistics, namely the construction of compressed sensing matrices, as the motivation for our study. The properties of $\lambda_{\max}(k)$ and $\lambda_{\min}(k)$ can also be used in other applications, including testing for the covariance structure of a high-dimensional Gaussian distribution, which is an important problem in statistics.

1.1 Construction of Compressed Sensing Matrices

Compressed sensing, which aims to develop efficient data acquisition techniques that allow accurate reconstruction of highly undersampled sparse signals, has received much attention recently in several fields, including signal processing, applied mathematics and statistics. The development of compressed sensing theory also provides crucial insights into inference for high-dimensional linear regression in statistics. It is now well understood that the constrained $\ell_{1}$ minimization method provides an effective way of recovering sparse signals. See, e.g., Candes and Tao (2005, 2007), Donoho (2006), and Donoho et al. (2006). More specifically, in compressed sensing, one observes $(X,y)$ with

$$y=X\beta+z,$$

where $y\in\mathbb{R}^{n}$ , $X\in\mathbb{R}^{n\times p}$ with $n$ being much smaller than $p$ , $\beta\in\mathbb{R}^{p}$ is a sparse signal of interest, and $z\in\mathbb{R}^{n}$ is a vector of measurement errors. One wishes to recover the unknown sparse signal $\beta\in\mathbb{R}^{p}$ based on $(X,y)$ using an efficient algorithm.

Since the number of measurements $n$ is much smaller than the dimension $p$, without structural assumptions the signal $\beta$ is under-determined, even in the noiseless case. A usual assumption in compressed sensing is that $\beta$ is sparse, and one of the most commonly used frameworks for sparse signal recovery is the Restricted Isometry Property (RIP); see Candes and Tao (2005). A vector $v$ is said to be $k$-sparse if $|{\rm supp}(v)|\leq k$, where ${\rm supp}(v)=\{i:v_{i}\neq 0\}$ is the support of $v$. In compressed sensing, the RIP requires subsets of certain cardinality of the columns of $X$ to be close to an orthonormal system. For an integer $1\leq k\leq p$, define the restricted isometry constant $\delta_{k}$ to be the smallest non-negative number such that for all $k$-sparse vectors $\beta$,

$$(1-\delta_{k})\|\beta\|_{2}^{2}\leq\|X\beta\|_{2}^{2}\leq(1+\delta_{k})\|\beta\|_{2}^{2}.$$

There are a variety of sufficient conditions on the RIP for the exact/stable recovery of $k$-sparse signals. A sharp condition was established in Cai and Zhang (2014) and a conjecture was proved in Zhang and Li (2018). Let

$$b_{*}(t)=\begin{cases}\dfrac{t}{4-t}, & 0<t<\frac{4}{3},\\[4pt] \sqrt{\dfrac{t-1}{t}}, & t\geq\frac{4}{3}.\end{cases} \tag{1.6}$$

For any given $t>0$, the condition $\delta_{tk}<b_{*}(t)$ guarantees the exact recovery of all $k$-sparse signals in the noiseless case through the constrained $\ell_{1}$ minimization

$$\hat{\beta}=\arg\min\{\|\gamma\|_{1}:\ y=X\gamma,\ \gamma\in\mathbb{R}^{p}\}.$$

Moreover, for any $\varepsilon>0$, $\delta_{tk}<b_{*}(t)+\varepsilon$ is not sufficient to guarantee the exact recovery of all $k$-sparse signals for large $k$. In addition, the condition $\delta_{tk}<b_{*}(t)$ is also shown to be sufficient for stable recovery of approximately sparse signals in the noisy case.
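The threshold function $b_{*}(t)$ is simple to evaluate numerically. The sketch below, with the hypothetical helper name `b_star`, implements the two branches and can be used to confirm that they agree at $t=4/3$, where both equal $1/2$.

```python
import math

def b_star(t: float) -> float:
    """Sharp RIP threshold b_*(t): t/(4-t) for 0 < t < 4/3, sqrt((t-1)/t) for t >= 4/3."""
    if t <= 0:
        raise ValueError("t must be positive")
    if t < 4.0 / 3.0:
        return t / (4.0 - t)
    return math.sqrt((t - 1.0) / t)

# Familiar special cases: b_*(1) = 1/3 and b_*(2) = sqrt(1/2).
values = {t: b_star(t) for t in (1.0, 4.0 / 3.0, 2.0)}
```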

One of the major goals of compressed sensing is the construction of the measurement matrix ${X}_{n\times p}$, with the number of measurements $n$ as small as possible relative to $p$, such that all $k$-sparse signals can be accurately recovered. Deterministic construction of large measurement matrices that satisfy the RIP is known to be difficult. Instead, random matrices are commonly used. Certain random matrices have been shown to satisfy the RIP conditions with high probability; see, e.g., Baraniuk et al. (2008). When the measurement matrix $X$ is a Gaussian matrix with i.i.d. $N(0,{1\over n})$ entries, for any given $t$, the condition $\delta_{tk}<b_{*}(t)$ is equivalent to the requirement that the extreme eigenvalues $\lambda_{\max}(tk)$ and $\lambda_{\min}(tk)$ of the $tk\times tk$ principal minors of the Wishart matrix $W={X}^{\intercal}{X}$ satisfy

$$1-b_{*}(t)<\lambda_{\min}(tk)\leq\lambda_{\max}(tk)<1+b_{*}(t).$$

Hence the condition (1.5) can be viewed as a condition on $\lambda_{\min}(tk)$ and $\lambda_{\max}(tk)$ as defined in (1.2) and (1.3), respectively.
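The link between the restricted isometry constant and the extreme eigenvalues of principal minors can be made concrete on a toy example. The sketch below assumes the identity $\delta_{k}=\max\{\lambda_{\max}(k)-1,\,1-\lambda_{\min}(k)\}$, which follows from the Rayleigh quotient of $X_{S}^{\intercal}X_{S}$ on each support $S$; the helper name `rip_constant` and the sizes are illustrative only.

```python
import itertools
import numpy as np

def rip_constant(X, k):
    """Brute-force delta_k from the extreme eigenvalues of all k x k minors of X^T X."""
    p = X.shape[1]
    delta = 0.0
    for S in itertools.combinations(range(p), k):
        XS = X[:, list(S)]
        eig = np.linalg.eigvalsh(XS.T @ XS)          # ascending eigenvalues of W_S
        delta = max(delta, eig[-1] - 1.0, 1.0 - eig[0])
    return float(delta)

rng = np.random.default_rng(1)                       # arbitrary seed
n, p, k = 200, 10, 2                                 # toy sizes
X = rng.standard_normal((n, p)) / np.sqrt(n)         # i.i.d. N(0, 1/n) entries
delta_k = rip_constant(X, k)

# Any k-sparse vector must satisfy the RIP inequality with this delta_k.
beta = np.zeros(p)
beta[[2, 7]] = [1.5, -0.4]
ratio = float(np.sum((X @ beta) ** 2) / np.sum(beta ** 2))
```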

1.2 Main results and organization of the paper

In this paper, we investigate the asymptotic behavior of the extreme eigenvalues $\lambda_{\max}(m)$ and $\lambda_{\min}(m)$ defined in (1.2) and (1.3). We also consider the extreme eigenvalues of a related Wigner matrix. We then discuss the application of the results in the construction of compressed sensing matrices.

The rest of the paper is organized as follows. Section 2 describes the precise setting of the problem. The main results are stated in Section 3. The proofs of the main theorems are given in Section 4, with the proof strategy outlined in Section 4.1. The proofs of all the supporting lemmas are given in the Appendix.

2 Problem settings

In this paper, we consider a white Wishart matrix $W=(w_{ij})_{1\leq i,j\leq p}=X^{\intercal}X$ , where $X=(x_{ij})_{1\leq i\leq n,1\leq j\leq p}$ and $x_{ij}$ are independent $N(0,1)$ -distributed random variables. For $S\subset\{1,...,p\}$ , set the principal minor $W_{S}=(w_{ij})_{i,j\in S}$ . For an $m\times m$ symmetric matrix $A$ , let $\lambda_{1}(A)$ and $\lambda_{m}(A)$ denote the largest and the smallest eigenvalues of $A$ , respectively. Let

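$$T_{m,n,p}=\max_{S\subset\{1,\ldots,p\},\ |S|=m}\lambda_{1}(W_{S}), \tag{2.1}$$

where $|S|$ denotes the cardinality of the set $S$. We also define

$$V_{m,n,p}=\min_{S\subset\{1,\ldots,p\},\ |S|=m}\lambda_{m}(W_{S}). \tag{2.2}$$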

Of interest is the asymptotic behavior of $T_{m,n,p}$ and ${V}_{m,n,p}$ when both $n$ and $p$ grow large.

Notice that $w_{ij}$ is the sum of $n$ independent and identically distributed (i.i.d.) random variables. By the standard central limit theorem, for given $i\geq 1$ and $j\geq 1$, we have

$$\frac{w_{ij}-n}{\sqrt{n}}\Longrightarrow N(0,2)\ \text{ if } i=j,\quad\text{and}\quad\frac{w_{ij}}{\sqrt{n}}\Longrightarrow N(0,1)\ \text{ if } i\neq j, \tag{2.3}$$

as $n\to\infty$, where we use “$\Longrightarrow$” to indicate convergence in distribution. Motivated by this limiting distribution, we also consider the Wigner matrix $\tilde{W}=(\tilde{w}_{ij})_{1\leq i,j\leq p}$, which is a symmetric matrix whose upper triangular entries are independent Gaussian variables with the following distribution:

$$\tilde{w}_{ij}\sim\begin{cases}N(0,2) & \text{if } i=j;\\ N(0,1) & \text{if } i<j.\end{cases} \tag{2.4}$$
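The variances $2$ and $1$ in the Gaussian model come from the central limit scaling of the Wishart entries, and a quick simulation makes this visible. The sketch below is only a sanity check under the assumption of $\sqrt{n}$ normalization; the variable names, seed, and sample sizes are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(2)          # arbitrary seed
n, reps = 200, 4000                     # n summands per entry, reps Monte Carlo replicates
x1 = rng.standard_normal((reps, n))     # replicates of column i of X
x2 = rng.standard_normal((reps, n))     # replicates of an independent column j

diag_stat = ((x1 ** 2).sum(axis=1) - n) / np.sqrt(n)   # (w_ii - n) / sqrt(n)
off_stat = (x1 * x2).sum(axis=1) / np.sqrt(n)          # w_ij / sqrt(n), i != j

var_diag = float(diag_stat.var())       # should be near 2
var_off = float(off_stat.var())         # should be near 1
```

The exact variances are $2$ and $1$ for every $n$ here, since ${\rm Var}(x^{2})=2$ and ${\rm Var}(x_{1}x_{2})=1$ for independent standard normals; only the Gaussian shape is asymptotic.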

For $S\subset\{1,...,p\}$ , set $\tilde{W}_{S}=(\tilde{w}_{ij})_{i,j\in S}$ . We will work on the corresponding statistics

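$$\tilde{T}_{m,p}=\max_{S\subset\{1,\ldots,p\},\ |S|=m}\lambda_{1}(\tilde{W}_{S}) \tag{2.5}$$

and

$$\tilde{V}_{m,p}=\min_{S\subset\{1,\ldots,p\},\ |S|=m}\lambda_{m}(\tilde{W}_{S}). \tag{2.6}$$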

In this paper, we study asymptotic results regarding the four statistics $T_{m,n,p}$ , ${V}_{m,n,p}$ , $\tilde{V}_{m,p}$ and $\tilde{T}_{m,p}$ .

3 Main results

Throughout the paper, we will let $n\to\infty$ and let $p=p_{n}\to\infty$ with a speed depending on $n$ . The following technical assumptions will be used in our main results.

Assumption 1. The integer $m\geq 2$ is fixed and $\log p=o(n^{1/2})$ ; or $m\to\infty$ with

$$m=o\bigg(\min\bigg\{\frac{(\log p)^{1/3}}{\log\log p},\ \frac{n^{1/4}}{(\log n)^{3/2}(\log p)^{1/2}}\bigg\}\bigg). \tag{3.1}$$

Notice that the second part of Assumption 1 implies that $\log p=o(n^{1/2}(\log n)^{-3})$. In other words, the population dimension $p$ can be very large, as large as $\exp\{o(n^{1/2}/(\log n)^{3})\}$. This assumption is used in the analysis of $T_{m,n,p}$ and ${V}_{m,n,p}$. The requirement $m=o((\log p)^{1/3}/\log\log p)$ is used in the last step in (4.90). The second part of the condition, $m=o(n^{1/4}(\log n)^{-3/2}(\log p)^{-1/2})$, is needed in a few places including (4.86). The key scales $(\log p)^{1/3}$ and $n^{1/4}$ in condition (3.1) are tight; the lower-order terms $\log\log p$ and $(\log n)^{3/2}$ can be improved to be relatively smaller.
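The implied growth rate of $p$ can be checked in one line from the second bound in Assumption 1. The derivation below is a sketch that uses only $m\geq 1$ and the definition of $o(\cdot)$; the vanishing sequence $c_{n}$ is notation assumed here for illustration.

```latex
% Second bound in Assumption 1: for some sequence c_n -> 0,
1 \;\le\; m \;\le\; c_n\,\frac{n^{1/4}}{(\log n)^{3/2}(\log p)^{1/2}}
\;\Longrightarrow\;
(\log p)^{1/2} \;\le\; c_n\,\frac{n^{1/4}}{(\log n)^{3/2}}
\;\Longrightarrow\;
\log p \;\le\; c_n^{2}\,\frac{n^{1/2}}{(\log n)^{3}}
\;=\; o\!\Big(\frac{n^{1/2}}{(\log n)^{3}}\Big).
```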

The next assumption is needed for studying the properties of $\tilde{V}_{m,p}$ and $\tilde{T}_{m,p}$ .

Assumption 2. The integer $m$ satisfies that

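$$m\geq 2\ \text{is fixed, or}\ m\to\infty\ \text{with}\ m=o\Big(\frac{(\log p)^{1/3}}{\log\log p}\Big). \tag{3.2}$$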

This condition is the same as the first part of (3.1). We start with asymptotic results for $T_{m,n,p}$ in (2.1) and ${V}_{m,n,p}$ in (2.2).

Theorem 1.

Suppose Assumption 1 in (3.1) holds. Recall $T_{m,n,p}$ defined as in (2.1). Then,

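$$Z_{n}:=\frac{T_{m,n,p}-n}{\sqrt{n}}-2\sqrt{m\log p}\to 0 \tag{3.3}$$

in probability as $n\to\infty$. Furthermore,

$$\lim_{n\to\infty}\mathbb{E}\big[e^{\alpha|Z_{n}|}\mathbf{1}_{\{|Z_{n}|\geq\delta\}}\big]=0 \tag{3.4}$$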

for all $\alpha>0$ and $\delta>0$ .

Remark 1.

Suppose Assumption 1 in (3.1) holds. Recall ${V}_{m,n,p}$ defined as in (2.2). Similar to the proof of Theorem 1 it can be shown that

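$$Z_{n}^{\prime}:=\frac{V_{m,n,p}-n}{\sqrt{n}}+2\sqrt{m\log p}\to 0 \tag{3.5}$$

in probability as $n\to\infty$, and furthermore,

$$\lim_{n\to\infty}\mathbb{E}\big[e^{\alpha|Z_{n}^{\prime}|}\mathbf{1}_{\{|Z_{n}^{\prime}|\geq\delta\}}\big]=0 \tag{3.6}$$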

for all $\alpha>0$ and $\delta>0$ . For reasons of space, we omit the details here.

We now turn to the asymptotic analysis for $\tilde{T}_{m,p}$ and $\tilde{V}_{m,p}$ .

Theorem 2.

Suppose Assumption 2 in (3.2) is satisfied. Recall $\tilde{T}_{m,p}$ defined as in (2.5). Then,

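$$\tilde{Z}_{p}:=\tilde{T}_{m,p}-2\sqrt{m\log p}\to 0 \tag{3.7}$$

in probability as $n\to\infty$. Furthermore,

$$\lim_{p\to\infty}\mathbb{E}\big[e^{\alpha|\tilde{Z}_{p}|}\mathbf{1}_{\{|\tilde{Z}_{p}|\geq\delta\}}\big]=0 \tag{3.8}$$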

for all $\alpha>0$ and $\delta>0$ .
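To get a feel for the statistics $\tilde{T}_{m,p}$ and $\tilde{V}_{m,p}$, one can sample a small Wigner matrix and enumerate its minors. The sketch below does this for toy sizes; `sample_wigner` and `extreme_stats` are hypothetical helper names, and because the centering $2\sqrt{m\log p}$ only becomes accurate as $\log p$ grows, no close numerical agreement should be expected at this scale.

```python
import itertools
import math
import numpy as np

def sample_wigner(p, rng, eta=2.0):
    """Symmetric matrix with N(0, eta) diagonal and independent N(0, 1) off-diagonal entries."""
    upper = np.triu(rng.standard_normal((p, p)), k=1)
    W = upper + upper.T
    W[np.diag_indices(p)] = math.sqrt(eta) * rng.standard_normal(p)
    return W

def extreme_stats(W, m):
    """Return (T, V): max of lambda_1 and min of lambda_m over all m x m principal minors."""
    p = W.shape[0]
    T, V = -np.inf, np.inf
    for S in itertools.combinations(range(p), m):
        idx = list(S)
        eig = np.linalg.eigvalsh(W[np.ix_(idx, idx)])  # ascending order
        T = max(T, eig[-1])
        V = min(V, eig[0])
    return float(T), float(V)

rng = np.random.default_rng(3)                 # arbitrary seed
p, m = 40, 2                                   # toy sizes: C(40, 2) = 780 minors
W_tilde = sample_wigner(p, rng)
T_tilde, V_tilde = extreme_stats(W_tilde, m)
centering = 2.0 * math.sqrt(m * math.log(p))   # first-order location from the theorem
```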

Remark 2.

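Suppose Assumption 2 in (3.2) is satisfied. Recalling the definition of $\tilde{W}=(\tilde{w}_{ij})_{1\leq i,j\leq p}$ above (2.4), we see that $\tilde{W}$ and $-\tilde{W}$ have the same distribution. Let $\tilde{V}_{m,p}$ be defined as in (2.6). It follows that $-\tilde{T}_{m,p}$ and $\tilde{V}_{m,p}$ have the same distribution. Then, by Theorem 2,

$$\tilde{Z}_{p}^{\prime}:=\tilde{V}_{m,p}+2\sqrt{m\log p}\to 0 \tag{3.9}$$

in probability as $n\to\infty$. Furthermore,

$$\lim_{p\to\infty}\mathbb{E}\big[e^{\alpha|\tilde{Z}_{p}^{\prime}|}\mathbf{1}_{\{|\tilde{Z}_{p}^{\prime}|\geq\delta\}}\big]=0 \tag{3.10}$$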

for all $\alpha>0$ and $\delta>0$ .

To better explain the convergence results in (3.4)–(3.10), we give the following comments.

Remark 3.

Equation (3.4) has the following implications, whose rigorous justification is given in Section 4.

1. $\lim\limits_{n\to\infty}\mathbb{E}\left[e^{\alpha|Z_{n}|}\right]=1$ for all $\alpha>0$;

2. $\lim\limits_{n\to\infty}\mathbb{E}(|Z_{n}|^{\alpha})=0$ for all $\alpha>0$;

3. $\lim\limits_{n\to\infty}{\rm Var}(Z_{n})=0$.

We now elaborate on the above results. First, the moment generating function of $|Z_{n}|$ exists and is close to $1$ when $n$ is large. As a result, $|Z_{n}|$ has a sub-exponential tail probability for large $n$. Second, $Z_{n}$ converges to $0$ in $L_{q}$ for all $q>0$. Third, the variance of $Z_{n}$ vanishes for large $n$, indicating that ${\rm Var}(T_{m,n,p})=o(n)$ as $n\to\infty$. Overall, (3.4) is stronger than the usual convergence in probability, and it provides information on the behavior of the tail probability. Similar interpretations can also be made for (3.6), (3.8) and (3.10), respectively.
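One way to see the first two implications is to split the expectation on the event $\{|Z_{n}|<\delta\}$. The following is a sketch of that argument and is not necessarily the route taken in Section 4; the constant $C_{q,\alpha}$ is notation introduced here.

```latex
% Fix \alpha > 0 and \delta > 0. Since e^{\alpha |Z_n|} \ge 1,
1 \;\le\; \mathbb{E}\big[e^{\alpha |Z_n|}\big]
  \;\le\; e^{\alpha\delta}
  + \mathbb{E}\big[e^{\alpha |Z_n|}\mathbf{1}_{\{|Z_n|\ge\delta\}}\big].
% By (3.4) the last term tends to 0, hence
\limsup_{n\to\infty}\mathbb{E}\big[e^{\alpha |Z_n|}\big] \;\le\; e^{\alpha\delta},
% and letting \delta \downarrow 0 gives the limit 1.
% For moments, with C_{q,\alpha} := \sup_{x\ge 0} x^{q} e^{-\alpha x} < \infty,
\mathbb{E}|Z_n|^{q}
  \;\le\; \delta^{q}
  + C_{q,\alpha}\,\mathbb{E}\big[e^{\alpha |Z_n|}\mathbf{1}_{\{|Z_n|\ge\delta\}}\big]
  \;\longrightarrow\; \delta^{q},
% so \mathbb{E}|Z_n|^{q} \to 0 after letting \delta \downarrow 0; the case q = 2 gives Var(Z_n) \to 0.
```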

3.1 Extensions

In this section, we discuss extensions of Theorems 1 and 2. Similar extensions can also be made to Remarks 1 and 2. They are omitted for the clarity of presentation.

First, we point out that Theorems 1 and 2 still hold if we replace the size- $m$ principal minors by the principal minors with the size no larger than $m$ in the definition of $\tilde{T}_{m,p}$ and $T_{m,n,p}$ , by the eigenvalue interlacing theorem [see, e.g., Horn and Johnson, (2012)]. We then have the following corollary.

Corollary 1.

Define $\hat{T}_{m,n,p}=\max_{S\subset\{1,...,p\},|S|\leq m}\lambda_{1}(W_{S})$ and $\hat{T}_{m,p}=\max_{S\subset\{1,...,p\},|S|\leq m}\lambda_{1}(\tilde{W}_{S})$ . Then, Theorems 1 and 2 still hold if “ $T_{m,n,p}$ ” and “ $\tilde{T}_{m,p}$ ” are replaced by “ $\hat{T}_{m,n,p}$ ” and “ $\hat{T}_{m,p}$ ”, respectively.
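The interlacing argument says that the largest eigenvalue of a principal minor cannot decrease when the index set is enlarged, so maximizing over $|S|\leq m$ gives the same value as maximizing over $|S|=m$. The sketch below checks this numerically on a toy Wishart matrix; `max_top_eigenvalue` is a hypothetical helper name.

```python
import itertools
import numpy as np

def max_top_eigenvalue(W, sizes):
    """Max of lambda_1(W_S) over all index sets S whose size lies in `sizes`."""
    p = W.shape[0]
    best = -np.inf
    for r in sizes:
        for S in itertools.combinations(range(p), r):
            idx = list(S)
            best = max(best, np.linalg.eigvalsh(W[np.ix_(idx, idx)])[-1])
    return float(best)

rng = np.random.default_rng(4)                    # arbitrary seed
n, p, m = 30, 7, 3                                # toy sizes
X = rng.standard_normal((n, p))
W = X.T @ X
T_exact = max_top_eigenvalue(W, [m])              # sets of size exactly m
T_hat = max_top_eigenvalue(W, range(1, m + 1))    # sets of size at most m
```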

Next, we extend Theorem 2 to allow other values of the variance for the Wigner matrix. Here, we assume that the matrix $\tilde{W}$ has the following distribution, instead of that in (2.4). For some $\eta\geq 0$,

$$\tilde{w}_{ij}\sim\begin{cases}N(0,\eta) & \text{if } i=j;\\ N(0,1) & \text{if } i<j.\end{cases} \tag{3.11}$$

In addition, assume that $\tilde{W}$ is symmetric and $\tilde{w}_{ij}$ are independent for $i\leq j$ . Note that if $\eta=2$ , then the above distribution is the same as that defined in (2.4). For $\tilde{W}$ defined in (3.11), we consider the statistic $\tilde{T}_{m,p}$ . The following law of large numbers is obtained.

Theorem 3.

Suppose $p\to\infty$ and that Assumption 2 in (3.2) is satisfied. In addition, assume $\tilde{W}$ has the distribution as in (3.11) with $0\leq\eta\leq 2$ . Then,

$$\frac{\tilde{T}_{m,p}}{\sqrt{[4(m-1)+2\eta]\log p}}\to 1$$

in probability as $n\to\infty$ .

Remark 4.

A related open question is whether Theorem 1 can be extended to other distributions of $x_{ij}$ for the Wishart matrix. We conjecture that, with certain assumptions on the moments of $x_{ij}$, under the asymptotic regime where $n$ is sufficiently large compared to $\log p$ and $m$, and with $\frac{{\rm Var}(x_{11}^{2})}{{\rm Var}(x_{11}x_{12})}\leq 2$, the asymptotic behavior of $\frac{T_{m,n,p}-n}{\sqrt{n}}$ will be similar to that of $\tilde{T}_{m,p}$ as discussed in Theorem 3. We leave this question for future research, because it requires the development of technical tools that are beyond the scope of the current paper.

Some special cases of this question have been answered in the literature for Wishart matrices with non-Gaussian entries. For example, if $m=2$ and $x_{ij}$ follows an asymmetric Rademacher distribution with $\mathbb{P}(x_{ij}=1)=p$ and $\mathbb{P}(x_{ij}=-1)=1-p$, then it is easy to check that

$$W_{\{i,j\}}=\begin{pmatrix} n & \sum_{k=1}^{n}x_{ki}x_{kj}\\ \sum_{k=1}^{n}x_{ki}x_{kj} & n\end{pmatrix}$$

and $\lambda_{1}(W_{\{i,j\}})=n+|\sum_{k=1}^{n}x_{ki}x_{kj}|$. As a result, $T_{m,n,p}=\max_{1\leq i<j\leq p}\lambda_{1}(W_{\{i,j\}})=n+\max_{1\leq i<j\leq p}|\sum_{k=1}^{n}x_{ki}x_{kj}|$. Similar quantities have been studied extensively in the literature, including Jiang (2004a); Cai and Jiang (2012); Zhou (2007); Shao and Zhou (2014); Li et al. (2012); Li and Rosalsky (2006); Li et al. (2010); Fan et al. (2018); Cai et al. (2013). The limiting distribution of $T_{m,n,p}$ is the Gumbel distribution.
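The closed form for the $2\times 2$ case is easy to confirm numerically: with $\pm 1$ entries the diagonal of the minor equals $n$, so its eigenvalues are $n\pm s$ with $s=\sum_{k}x_{ki}x_{kj}$. In the sketch below the success probability is called `q` (an illustrative name) to avoid clashing with the dimension `p`, and the indices `i`, `j` are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(5)                   # arbitrary seed
n, p, q = 25, 6, 0.3                             # q = P(x = 1), an illustrative value
X = rng.choice([-1.0, 1.0], size=(n, p), p=[1.0 - q, q])
W = X.T @ X

i, j = 1, 4                                      # any pair of distinct columns
s = float(X[:, i] @ X[:, j])                     # off-diagonal entry of the 2 x 2 minor
minor = W[np.ix_([i, j], [i, j])]                # [[n, s], [s, n]]
lam1 = float(np.linalg.eigvalsh(minor)[-1])      # largest eigenvalue of the minor
closed_form = n + abs(s)
```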

3.2 Application to Construction of Compressed Sensing Matrices

The main results given above have direct implications for the construction of a compressed sensing matrix $X_{n\times p}$ whose entries are i.i.d. $N(0,{1\over n})$. As discussed in the introduction, the goal is to construct the measurement matrix $X$ with the number of measurements $n$ as small as possible relative to $p$, such that $k$-sparse signals $\beta$ can be accurately recovered. For any given $t$, the RIP framework guarantees accurate recovery of all $k$-sparse signals $\beta$ if the extreme eigenvalues $\lambda_{\max}(tk)$ and $\lambda_{\min}(tk)$ of the $tk\times tk$ principal minors of the Wishart matrix $W={X}^{\intercal}{X}$ satisfy

$$1-b_{*}(t)<\lambda_{\min}(tk)\leq\lambda_{\max}(tk)<1+b_{*}(t), \tag{3.12}$$

where $b_{*}(t)$ is given in (1.6).

By setting $m=tk$, $\lambda_{\max}(tk)=T_{m,n,p}/n$, and $\lambda_{\min}(tk)={V}_{m,n,p}/n$, it follows from Theorem 1 and Remark 1 that, under Assumption 1 in (3.1),

$$\lambda_{\max}(tk)=1+2\sqrt{\frac{tk\log p}{n}}\,(1+o_{p}(1))$$

and

$$\lambda_{\min}(tk)=1-2\sqrt{\frac{tk\log p}{n}}\,(1+o_{p}(1)).$$

On the other hand, Assumption 1 implies that $\sqrt{\frac{m\log p}{n}}=\sqrt{\frac{tk\log p}{n}}=o(1)$. So the above asymptotic approximation gives $\lambda_{\max}(tk)=1+o_{p}(1)$ and $\lambda_{\min}(tk)=1+o_{p}(1)$, and hence (3.12) is satisfied with probability tending to one. That is, Assumption 1 guarantees the exact recovery of all $k$-sparse signals in the noiseless case through the constrained $\ell_{1}$ minimization as explained in (1.5) and (1.6).
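As a rough planning heuristic, the leading-order term $2\sqrt{tk\log p/n}$ can be compared with $b_{*}(t)$ for given $(n,p,k,t)$. The sketch below drops the $1+o_{p}(1)$ factor and does not verify Assumption 1, so it is only indicative; the function names are hypothetical.

```python
import math

def b_star(t: float) -> float:
    """Sharp RIP threshold: t/(4-t) for 0 < t < 4/3, sqrt((t-1)/t) for t >= 4/3."""
    return t / (4.0 - t) if t < 4.0 / 3.0 else math.sqrt((t - 1.0) / t)

def leading_order_extremes(n, p, t, k):
    """Leading-order approximations of (lambda_max(tk), lambda_min(tk)); o_p(1) terms dropped."""
    r = 2.0 * math.sqrt(t * k * math.log(p) / n)
    return 1.0 + r, 1.0 - r

def leading_order_rip_check(n, p, t, k):
    """True if the leading-order extremes lie strictly inside (1 - b_*(t), 1 + b_*(t))."""
    lam_max, lam_min = leading_order_extremes(n, p, t, k)
    return 1.0 - b_star(t) < lam_min and lam_max < 1.0 + b_star(t)

ok_many_rows = leading_order_rip_check(n=10**6, p=10**4, t=1.0, k=5)   # n large vs. k log p
ok_few_rows = leading_order_rip_check(n=100, p=10**4, t=1.0, k=5)      # n too small
```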

4 Technical Proofs

Throughout the proof, as mentioned earlier, we will let $n\to\infty$ and $p=p_{n}\to\infty$ ; the integer $m\geq 2$ is either fixed or $m=m_{n}\to\infty$ . The following notation will be adopted. We write $a_{n}=O(b_{n})$ if there is a constant $\kappa$ independent of $n,p$ and $m$ (unless otherwise indicated) such that $|a_{n}|\leq\kappa b_{n}$ . Moreover, we write $a_{n}=o(b_{n})$ , if there is a sequence $c_{n}$ independent of $n,p$ and $m$ such that $c_{n}\to 0$ and $|a_{n}|\leq c_{n}b_{n}$ . Define ${\xi_{p}}=\log\log\log p$ . This is a sequence growing to infinity with a very slow speed compared to $n$ and $p$ .

This section is organized as follows. We first introduce the main steps in proving Theorems 1 and 2 in Section 4.1. In Section 4.2, we present the proofs of Theorems 1-3, Corollary 1, and Remark 3. The proofs of all technical lemmas are given in the Appendix. For the reader's convenience, we list the content of each section below.

Section 4.1. The Strategy of the Proofs for Theorems 1 and 2.

Section 4.2. Proof of the results in Section 3.

Section 4.2.1. Proof of Theorem 2.

Section 4.2.2. Proof of Theorem 1.

Section 4.2.3. Proofs of Theorem 3 and Remark 3.

4.1 The Strategy of the Proofs for Theorems 1 and 2

We first explain the proof strategy for Theorem 2 and then explain that for Theorem 1, since Wigner matrices have simpler structure than Wishart matrices. The proof of Theorem 2 consists of three steps. The first step is to find an upper bound on the right tail probability $\mathbb{P}(\tilde{T}_{m,p}\geq 2\sqrt{m\log p}+t)$ for $t\geq\delta$ . Our method here is to first develop a moderate deviation bound of $\mathbb{P}(\lambda_{1}(\tilde{W}_{S})\geq 2\sqrt{m\log p}+t)$ for each $S\subset\{1,...,p\}$ and $|S|=m$ , and then use the union bound to control $\mathbb{P}(\tilde{T}_{m,p}\geq 2\sqrt{m\log p}+t)$ . The second step is to find an upper bound on the left tail probability $\mathbb{P}(\tilde{T}_{m,p}\leq 2\sqrt{m\log p}-t)$ for $t\geq\delta$ . Our approach is to construct a sequence of events $E_{p,m}$ with high probability, such that when $E_{p,m}$ occurs, there exists $S\subset\{1,...,p\}$ satisfying $|S|=m$ and $\lambda_{1}(\tilde{W}_{S})\geq 2\sqrt{m\log p}-t$ . The third step is to combine the left and right tail bounds obtained from the previous two steps to show (3.8).
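The first two steps can be written schematically as follows. This is a sketch of the two reductions described above, using exchangeability of the index sets for the upper tail and the inclusion $E_{p,m}\subset\{\tilde{T}_{m,p}\geq 2\sqrt{m\log p}-t\}$ for the lower tail; the symbols $x$ and $x'$ are shorthand introduced here.

```latex
% Upper tail (union bound over the \binom{p}{m} index sets), with x = 2\sqrt{m\log p} + t:
\mathbb{P}\big(\tilde T_{m,p} \ge x\big)
  = \mathbb{P}\Big(\bigcup_{|S|=m}\big\{\lambda_1(\tilde W_S)\ge x\big\}\Big)
  \le \sum_{|S|=m}\mathbb{P}\big(\lambda_1(\tilde W_S)\ge x\big)
  = \binom{p}{m}\,\mathbb{P}\big(\lambda_1(\tilde W_{\{1,\ldots,m\}})\ge x\big).
% Lower tail, with x' = 2\sqrt{m\log p} - t: on E_{p,m} some S has \lambda_1(\tilde W_S) \ge x', so
\mathbb{P}\big(\tilde T_{m,p} < x'\big) \le \mathbb{P}\big(E_{p,m}^{c}\big) = 1 - \mathbb{P}\big(E_{p,m}\big).
```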

The proof of Theorem 1 is based on a similar strategy to that of Theorem 2. A new and key ingredient is to control the approximation speed of the Wishart matrix to the Wigner matrix (after normalization). Change-of-measure arguments are used to quantify the approximation speed in the moderate deviation domain.

We point out that the proof of the asymptotic lower bound of $\tilde{T}_{m,p}$ in this paper differs from the standard technique for analyzing the maximum/minimum statistic of a large random matrix (see, e.g., Jiang (2004a)). In particular, the proof in Jiang (2004a) employs the Chen-Stein Poisson approximation method [see, e.g., Arratia et al. (1990)] and asymptotic independence. However, this method does not fit our problem. For this reason, new techniques are developed and, in particular, we construct an event on which $\tilde{T}_{m,p}$ achieves the asymptotic lower bound.

4.2 Proof of the results in Section 3

As mentioned earlier, Wigner matrices have simpler structure than Wishart matrices. Thus, we first present the proof of Theorem 2, followed by the proof of Theorem 1. At the end of the section, the proofs of Corollary 1, Theorem 3 and Remark 3 are presented.

In each proof we will need auxiliary results. To make the proof clearer, we place the proofs of the auxiliary results in the Appendix. Sometimes a statement or a formula holds as $n$ is sufficiently large. We will not say “as $n$ is sufficiently large” if the context is apparent.

4.2.1 Proof of Theorem 2

To prove Theorem 2, we need the following two key results.

Proposition 1.

Suppose Assumption 2 in (3.2) is satisfied. Recall $\tilde{T}_{m,p}$ defined as in (2.5). Then,

[TABLE]

for every $\alpha>0$ and every $\delta>0$ .

Proposition 2.

Suppose Assumption 2 in (3.2) is satisfied. Recall $\tilde{T}_{m,p}$ defined as in (2.5). Then,

[TABLE]

for every $\alpha>0$ and every $\delta>0$ .

Another auxiliary lemma is needed; its proof is given in the Appendix.

Lemma 1.

Let $Z\geq 0$ be a random variable with $\mathbb{E}[e^{\alpha Z}]<\infty$ for all $\alpha>0$ . Then

[TABLE]

for every $\alpha>0$ and every $\delta>0.$

Proof of Theorem 2.

By Propositions 1 and 2, we have

[TABLE]

for any $\alpha>0$ and $\delta>0$ . Consequently, for given $\alpha>0$ , there exists a sequence of positive numbers $a_{p}\to 0$ such that

[TABLE]

for all $t\geq\delta$ as $p$ is sufficiently large. Now we estimate

[TABLE]

By applying Lemma 1 to $Z_{p,m}=|\tilde{T}_{m,p}-2\sqrt{m\log p}|$, we see

[TABLE]

According to (4.4), the above display can be bounded from above by

[TABLE]

which tends to $0$ as $p\to\infty$. The proof is then complete. ∎

Now we proceed to prove Propositions 1 and 2.

Proof of Proposition 1.

For any $t>0$ , we have from the definition of $\tilde{T}_{m,p}$ that

[TABLE]

where in the last inequality we use the fact that the matrices $\tilde{W}_{S}$ are identically distributed for all $S$ with $|S|=m$. The following result enables us to bound the last probability.

Lemma 2.

Let $\tilde{W}_{\{1,...,m\}}$ be defined as above (2.5) with $S=\{1,...,m\}$ . Then there is a constant $\kappa>0$ such that

[TABLE]

for all $x>4\sqrt{m}$ and all $m\geq 2$ .

Taking $x:=2\sqrt{m\log p}+t$ in the above lemma, we know $x>4\sqrt{m}$ as $n$ is large enough, and hence

[TABLE]

Note that $-\frac{1}{4}t^{2}\leq 0$ , $\kappa m\log(2\sqrt{m\log p})=O({m}\log\log p)$ , and $\kappa m\log(1+\frac{t}{2\sqrt{m\log p}})=O(\frac{\sqrt{m}}{\sqrt{\log p}}t)<t$ as $p$ is sufficiently large. Thus, the above inequality further implies

[TABLE]

uniformly for all $t\geq 0$ as $p$ is sufficiently large, where $\alpha>0$ is fixed. With the above inequality, we complete the proof.

∎

Proof of Proposition 2.

Recall ${\xi_{p}}=\log\log\log p$ . The proof will be evidently finished if the following two limits hold. For each $\alpha>0$ and each $\delta>0$ ,

[TABLE]

and

[TABLE]

We now verify the above two limits.

The proof of (4.12). Recall

[TABLE]

for any $k\times k$ square and symmetric matrix $A=(a_{ij})_{1\leq i,j\leq k}$ , where $\lambda_{1}(A)$ is the largest eigenvalue of $A.$
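A standard lower bound of this kind comes from the Rayleigh quotient: for a symmetric $k\times k$ matrix $A$ and the unit vector $u=k^{-1/2}(1,\ldots,1)^{\intercal}$, one has $\lambda_{1}(A)\geq u^{\intercal}Au=\frac{1}{k}\sum_{i,j}a_{ij}$. The sketch below checks this numerically; it assumes this particular test vector for illustration, which may differ from the exact form used in the proof.

```python
import numpy as np

rng = np.random.default_rng(6)          # arbitrary seed
k = 5
G = rng.standard_normal((k, k))
A = (G + G.T) / 2.0                     # a generic symmetric matrix

lam1 = float(np.linalg.eigvalsh(A)[-1]) # largest eigenvalue of A
u = np.ones(k) / np.sqrt(k)             # unit vector with equal coordinates
rayleigh = float(u @ A @ u)             # equals (1/k) * sum of all entries of A
```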

For each $S\subset\{1,...,p\}$ such that $|S|=m$ and $\tilde{W}_{S}=(\tilde{W}_{ij})$ , set

[TABLE]

where

[TABLE]

If $0<t\leq 2\sqrt{m\log p}-m{\xi_{p}}$ then $0<\varepsilon_{m,p,t}<1$ . According to (4.14), if there exists $S_{0}\subset\{1,...,p\}$ such that $|S_{0}|=m$ and $\tilde{A}_{S_{0}}$ occurs, then

[TABLE]

Define

[TABLE]

where $\mathbf{1}_{\tilde{A}_{S}}$ is the indicator function of $\tilde{A}_{S}$ . Then,

[TABLE]

For any random variable $Y$ with $\mathbb{E}Y>0$ and $\mathbb{E}(Y^{2})<\infty$ , we have

[TABLE]

Applying this inequality to $\tilde{Q}_{m,p}$ , we obtain

[TABLE]
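The second-moment bound invoked here can be illustrated numerically. The sketch below assumes the standard form $\mathbb{P}(Y>0)\geq(\mathbb{E}Y)^{2}/\mathbb{E}(Y^{2})$, a consequence of the Cauchy-Schwarz inequality, which may differ in presentation from the display above. The function name `second_moment_check` and the binomial example are illustrative choices, not part of the paper.

```python
from math import comb


def second_moment_check(n, q):
    """Return (P(Y > 0), (E Y)^2 / E(Y^2)) for Y ~ Binomial(n, q), computed exactly."""
    # Exact probability mass function of the binomial distribution.
    pmf = [comb(n, k) * q**k * (1 - q) ** (n - k) for k in range(n + 1)]
    mean = sum(k * w for k, w in enumerate(pmf))
    second_moment = sum(k * k * w for k, w in enumerate(pmf))
    p_positive = 1.0 - pmf[0]
    return p_positive, mean**2 / second_moment


# Hypothetical example: a count of rare events with mean one.
lhs, rhs = second_moment_check(50, 0.02)
print(f"P(Y > 0) = {lhs:.4f}, lower bound = {rhs:.4f}")
```

In the proof the same mechanism is applied to the count $\tilde{Q}_{m,p}$: once the variance is small relative to the squared mean, the count is positive with high probability.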

We proceed to find a lower bound on $\mathbb{E}\tilde{Q}_{m,p}$ and an upper bound on $Var(\tilde{Q}_{m,p})$ in two steps.

Step 1: the estimate of $\mathbb{E}\tilde{Q}_{m,p}$ . Note that $\mathbf{1}_{\tilde{A}_{S}}$ are identically (not independently) distributed Bernoulli variables for different $S$ with success rate $\mathbb{P}(\tilde{A}_{\{1,...,m\}})$ . Thus, we have

[TABLE]

where we choose $S_{0}=\{1,...,m\}$ with a slight abuse of notation. For convenience, write

[TABLE]

Since the upper triangular entries of $\tilde{W}$ are independent Gaussian variables, we have from (4.15) that

[TABLE]

Recall that $\tilde{W}_{kk}\sim N(0,2)$ and $\tilde{W}_{ij}\sim N(0,1)$ for $i\neq j$ . Hence

[TABLE]

where $\bar{\Phi}(z)=\int_{z}^{\infty}\frac{1}{\sqrt{2\pi}}e^{-\frac{w^{2}}{2}}dw$ . It is well known that

$$\bar{\Phi}(x)=\frac{1+o(1)}{\sqrt{2\pi}\,x}\,e^{-\frac{x^{2}}{2}}\tag{4.26}$$

as $x\to\infty$ . Recall the assumption that $t\leq 2\sqrt{m\log p}-m{\xi_{p}}$ , so $\tau_{m,p,t}=\sqrt{\frac{4\log p}{m}}-\frac{t}{m}\geq{\xi_{p}}\to\infty$ . Thus, by (4.25) and (4.26),

[TABLE]
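The Gaussian tail asymptotics used in this step, $\bar{\Phi}(x)\sim\frac{1}{\sqrt{2\pi}\,x}e^{-x^{2}/2}$ as $x\to\infty$, can be checked numerically. The following is a minimal sketch using only the standard library; the helper names `gaussian_tail` and `mills_approx` are illustrative and do not appear in the paper.

```python
from math import erfc, exp, pi, sqrt


def gaussian_tail(x):
    """Upper tail P(N(0,1) >= x), computed through the complementary error function."""
    return 0.5 * erfc(x / sqrt(2.0))


def mills_approx(x):
    """Leading-order approximation exp(-x^2/2) / (sqrt(2*pi) * x), valid for large x."""
    return exp(-0.5 * x * x) / (sqrt(2.0 * pi) * x)


points = (2.0, 5.0, 10.0, 20.0)
# The ratio of the exact tail to the approximation should approach 1 from below.
ratios = [gaussian_tail(x) / mills_approx(x) for x in points]
for x, ratio in zip(points, ratios):
    print(f"x = {x:5.1f}  ratio = {ratio:.6f}")
```

The ratio stays between $x^{2}/(1+x^{2})$ and $1$, which is why replacing $\bar{\Phi}$ by its leading term costs only lower-order terms on the logarithmic scale.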

Note that $1>1-\varepsilon_{m,p,t}\geq\xi_{p}\sqrt{\frac{m}{4\log p}}$ since $0<t\leq 2\sqrt{m\log p}-m{\xi_{p}}$ . It follows that $|\log(1-\varepsilon_{m,p,t})|=O\left(\log\sqrt{\frac{\log p}{m\xi_{p}^{2}}}\right)=O(\log\log p)$ . Also, $\log\sqrt{\frac{4\log p}{m}}=O(\log\log p)$ . As a result, from (4.23) we have

[TABLE]

It follows that

[TABLE]

Combining this with (4.22), we see

[TABLE]

To control $\binom{p}{m}$ , we need the next result, which will be proved in the Appendix.

Lemma 3.

For all $p\geq m\geq 1$ , we have

[TABLE]
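Bounds of this kind on $\log\binom{p}{m}$ are easy to check numerically. The sketch below assumes the standard sandwich $(p/m)^{m}\leq\binom{p}{m}\leq(ep/m)^{m}$, which matches the two steps of the proof in the Appendix (the ratio bound $\frac{p-l}{m-l}\geq\frac{p}{m}$ and the Stirling formula) but is not necessarily the exact form of the lemma's display. The function name `log_binom_bounds` is hypothetical.

```python
from math import comb, log


def log_binom_bounds(p, m):
    """Return (lower, exact, upper) for log C(p, m), assuming integers p >= m >= 1."""
    lower = m * log(p / m)          # from C(p, m) >= (p / m)^m
    exact = log(comb(p, m))         # exact value; math.log accepts large integers
    upper = m * log(p / m) + m      # from C(p, m) <= (e * p / m)^m
    return lower, exact, upper


for p, m in [(100, 3), (1000, 10), (1000, 500)]:
    lo, ex, up = log_binom_bounds(p, m)
    print(f"p={p:5d} m={m:4d}  {lo:10.3f} <= {ex:10.3f} <= {up:10.3f}")
```

When $m$ is much smaller than $p$, both bounds are $m\log p$ up to terms of order $m\log m$, which is how the binomial coefficient enters the estimate of $\mathbb{E}\tilde{Q}_{m,p}$.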

Using the above lemma and (4.29), and noting that $m\log m=O(m^{2}\log\log p)$ , we have

[TABLE]

Step 2: the estimate of $Var(\tilde{Q}_{m,p})$ . Reviewing $\tilde{Q}_{m,p}$ in (4.18), we have

[TABLE]

Note that $\mathbb{P}(\tilde{A}_{S_{1}}\cap\tilde{A}_{S_{2}})$ is determined by $|S_{1}\cap S_{2}|$ and $m$ . By (4.15),

[TABLE]

Singling out the terms where $l=0$ and $l=m$ , we further have

[TABLE]

On the other hand, $\mathbb{E}\tilde{Q}_{m,p}=\binom{p}{m}P\big{(}\tilde{A}_{\{1,...,m\}}\big{)}$ and hence

[TABLE]

Combining (4.32), (4.34) and (4.35), we arrive at

[TABLE]

Observe that $\frac{p!}{(p-2m+l)!}=p(p-1)\cdots(p-2m+l+1)\leq p^{2m-l}$ , since the product has $2m-l$ factors, and $\frac{1}{l!(m-l)!(m-l)!}\leq 1$ . It follows that

[TABLE]
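The combinatorial factor behind this bound counts ordered pairs $(S_{1},S_{2})$ of $m$-subsets with $|S_{1}\cap S_{2}|=l$. The sketch below verifies, for small hypothetical values of $p$ and $m$, that this count equals $\frac{p!}{l!\,(m-l)!\,(m-l)!\,(p-2m+l)!}$ and is at most $p^{2m-l}$. The function names are illustrative only.

```python
from math import comb, factorial


def pairs_with_overlap(p, m, l):
    """Ordered pairs (S1, S2) of m-subsets of {1..p} with |S1 & S2| = l.

    Choose S1, then the l shared elements, then the m - l elements of S2 outside S1.
    """
    return comb(p, m) * comb(m, l) * comb(p - m, m - l)


def closed_form(p, m, l):
    """The same count written as p! / (l! (m-l)!^2 (p-2m+l)!); needs p >= 2m - l."""
    return factorial(p) // (
        factorial(l) * factorial(m - l) ** 2 * factorial(p - 2 * m + l)
    )


p, m = 12, 4
counts = [pairs_with_overlap(p, m, l) for l in range(m + 1)]
print(counts, sum(counts) == comb(p, m) ** 2)
```

Summing the counts over $l$ recovers $\binom{p}{m}^{2}$, the total number of ordered pairs, which is the decomposition used for the variance.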

Similar to (4.25) we have

[TABLE]

Again, we find an approximation for the above display by using (4.26) and simplifying it. We arrive at

[TABLE]

Therefore, for the last term in (4.37), we see

[TABLE]

The following lemma enables us to evaluate the coefficient of $\log p.$

Lemma 4.

For any $0<\varepsilon<1$ and $m\geq 2$ , we have

[TABLE]

Applying the above lemma to (4.40), we get

[TABLE]

This inequality together with (4.31) implies that

[TABLE]

Combining the above display with (4.37), we arrive at

[TABLE]

Lemma 5.

For all integers $p\geq m\geq 1$ satisfying $2m<p$ , we have

[TABLE]

Therefore,

[TABLE]

We now study the last two terms one by one. For $m\geq 2$ ,

[TABLE]

for $n$ sufficiently large under Assumption 2 in (3.2). Recalling $\varepsilon_{m,p,t}=(4m\log p)^{-1/2}t$ , we see from (4.31) that

[TABLE]

Combining (4.46), (4.47) and (4.48), we arrive at

[TABLE]

This together with (4.19) and (4.21) yields

[TABLE]

uniformly for all $\delta\leq t\leq 2\sqrt{m\log p}-m{\xi_{p}}$ . Consequently, we get (4.12).

The proof of (4.13). For any $S\subset\{1,...,p\}$ with $|S|=m$ , write $\tilde{W}_{S}=(\tilde{W}_{ij})_{i,j\in S}$ . Note that $\lambda_{1}(\tilde{W}_{S})\geq\max_{i\in S}\tilde{W}_{ii}$ . Thus,

[TABLE]

As a result,

[TABLE]

where the function $\Phi(z)=\int_{-\infty}^{z}\frac{1}{\sqrt{2\pi}}e^{-\frac{s^{2}}{2}}ds$ for $z\in\mathbb{R}$ . To proceed, we discuss two scenarios: $2\sqrt{m\log p}-m{\xi_{p}}\leq t\leq 4\sqrt{m\log p}$ and $t>4\sqrt{m\log p}$ . For $2\sqrt{m\log p}-m{\xi_{p}}\leq t\leq 4\sqrt{m\log p}$ , we have

[TABLE]

where $\bar{\Phi}(z)=1-\Phi(z)$ for any $z\in\mathbb{R}$ and the inequality $\log(1-x)\leq-x$ for any $x<1$ is used in the last step. By (4.26), $\bar{\Phi}\left(\frac{1}{\sqrt{2}}m{\xi_{p}}\right)=(1+o(1))\frac{1}{\sqrt{\pi}m{\xi_{p}}}e^{-\frac{m^{2}{\xi_{p}}^{2}}{4}}$ and $p^{0.1}({\xi_{p}})^{-1}e^{-\frac{m^{2}{\xi_{p}}^{2}}{4}}\to\infty$ since ${\xi_{p}}=\log\log\log p$ . Thus,

[TABLE]

for sufficiently large $p$ . This further implies

[TABLE]

for any $\alpha>0$ . Note that $\Phi(-x)=\bar{\Phi}(x)\leq\frac{1}{\sqrt{2\pi}\,x}e^{-x^{2}/2}\leq e^{-x^{2}/2}$ for any $x\geq 1$ . Then, for the other scenario where $t\geq 4\sqrt{m\log p}$ , we have

[TABLE]

as $n$ is large enough. Thus,

[TABLE]

for any $\alpha>0.$ Joining (4.54) and (4.56), we see (4.13). This completes the whole proof. ∎

4.2.2 Proof of Theorem 1

To prove Theorem 1, we need the following two propositions.

Proposition 3.

Suppose Assumption 1 in (3.1) holds. Recall $T_{m,n,p}$ defined as in (2.1). Then,

[TABLE]

for any $\alpha>0$ and $\delta>0$ .

Proposition 4.

Suppose Assumption 1 in (3.1) holds. Recall $T_{m,n,p}$ defined as in (2.1). Then,

[TABLE]

for any $\alpha>0$ and $\delta>0$ .

Proof of Theorem 1.

Similar to the proof of Theorem 2, it is sufficient to prove (3.4). By the same argument as in the proof of Theorem 2, with the upper bound for $\mathbb{P}(\frac{T_{m,n,p}-n}{\sqrt{n}}\geq 2\sqrt{m\log p}+t)$ given in Proposition 3 and the upper bound for $\mathbb{P}(\frac{T_{m,n,p}-n}{\sqrt{n}}\leq 2\sqrt{m\log p}-t)$ for $t>\delta$ given in Proposition 4, we get (3.4). ∎

In the following we start to prove Propositions 3 and 4.

Proof of Proposition 3.

Without loss of generality, we assume $\delta<1$ since the expectation in (3.4) is monotonically decreasing in $\delta$ .

Let $W_{\{1,...,m\}}$ be as $W_{S}$ above (2.1) with $S=\{1,2,\cdots,m\}$ . Analogous to (4.8), we have

[TABLE]

We now bound the last probability. Since bounding this tail probability requires moderate deviation and large deviation bounds for different ranges of $t$ , we discuss three cases and use a different proof strategy for each. Recall $\xi_{p}=\log\log\log p$ . Set

[TABLE]

The three cases are: (1) $t>\frac{\delta\sqrt{n}}{100}$ , (2) $\delta\vee\omega_{n}\leq t\leq\frac{\delta\sqrt{n}}{100}$ , and (3) $\delta\leq t<\delta\vee\omega_{n}$ . They cover all situations for $t\geq\delta$ . For the first two cases, the upper bound is based on the next lemma, which gives a moderate deviation bound for the spectrum of $\frac{1}{\sqrt{n}}W_{\{1,...,m\}}$ from the identity matrix $I_{m}$ .

Lemma 6.

There exists a constant $\kappa>0$ such that for all $n,p,m$ , $r\geq 1$ , $0<d<1/2$ and $y>2dmr$ , we have

[TABLE]

and

[TABLE]

where $I(s)=\frac{1}{2}(s-1-\log s)$ for $s>0$ and $I(s)=\infty$ for $s\leq 0$ .
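The rate function $I(s)=\frac{1}{2}(s-1-\log s)$ is the Cramér rate function of a squared standard normal, so it controls chi-square tails through the Chernoff bound $\mathbb{P}(\chi^{2}_{n}\geq ns)\leq e^{-nI(s)}$ for $s\geq 1$. The sketch below checks this standard bound exactly for even $n$, using the closed-form chi-square survival function; it illustrates the rate function only and is not the bound of Lemma 6 itself. The helper names are hypothetical.

```python
from math import exp, log


def rate(s):
    """I(s) = (s - 1 - log s) / 2 for s > 0, and +infinity otherwise."""
    return 0.5 * (s - 1.0 - log(s)) if s > 0 else float("inf")


def chi2_sf_even(n, x):
    """P(chi^2_n >= x) for even n: exp(-x/2) * sum_{j < n/2} (x/2)^j / j!."""
    half, term, total = x / 2.0, 1.0, 0.0
    for j in range(n // 2):
        total += term
        term *= half / (j + 1)
    return exp(-half) * total


for n in (2, 10, 40):
    for s in (1.0, 1.5, 2.0, 4.0):
        tail, bound = chi2_sf_even(n, n * s), exp(-n * rate(s))
        print(f"n={n:3d} s={s:3.1f}  tail={tail:.3e}  bound={bound:.3e}")
```

Since $I(1)=0$ and $I$ is increasing on $[1,\infty)$, the bound is trivial at $s=1$ and decays exponentially in $n$ for every fixed $s>1$.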

Case 1: $t>\frac{\delta\sqrt{n}}{100}$ . Let $\alpha>0$ be given. Choose $r=\max(2,1+\frac{80\alpha t}{mn})$ , $d=\min(\frac{1}{2},\frac{t}{4m\sqrt{n}r})$ , and $y=\frac{2\sqrt{m\log p}+t}{\sqrt{n}}$ in Lemma 6. The choice of $r,d,$ and $y$ satisfies that $2dmr\leq\frac{t}{2\sqrt{n}}$ and hence $y-2dmr\geq\frac{2\sqrt{m\log p}}{\sqrt{n}}+\frac{t}{2\sqrt{n}}$ . Set $z=\frac{2\sqrt{m\log p}}{\sqrt{n}}+\frac{t}{2\sqrt{n}}$ . Notice that $I(s)$ from Lemma 6 is increasing for $s\geq 1$ . Then, by the lemma,

[TABLE]

The following lemma says that both of the last two terms go to zero.

Lemma 7.

Suppose Assumption 1 in (3.1) holds. Let $\alpha>0$ and $\delta>0$ be given. For $r=\max(2,1+\frac{80\alpha t}{mn})$ , $d=\min(\frac{1}{2},\frac{t}{4m\sqrt{n}r})$ and $z=\frac{2\sqrt{m\log p}}{\sqrt{n}}+\frac{t}{2\sqrt{n}}$ , we have

[TABLE]

and

[TABLE]

Combining (4.57), (4.61)-(4.63), we conclude

[TABLE]

Case 2: $\delta\vee\omega_{n}\leq t\leq\frac{\delta\sqrt{n}}{100}$ . Review $\omega_{n}$ in (4.58). Now we choose $r=2$ , $d=\frac{t}{8m\sqrt{n}}<\frac{1}{2}$ and $y=\frac{2\sqrt{m\log p}+t}{\sqrt{n}}$ . Then $y>\frac{t}{2\sqrt{n}}=2dmr$ . By (4.59),

[TABLE]

where $z:=y-2dmr=\frac{2\sqrt{m\log p}}{\sqrt{n}}+\frac{t}{2\sqrt{n}}$ . The last two terms are analyzed in the next lemma.

Lemma 8.

Suppose Assumption 1 in (3.1) holds. Let $\omega_{n}$ be as in (4.58). For $\delta\vee\omega_{n}\leq t\leq\frac{\delta\sqrt{n}}{100}$ , $z=\frac{2\sqrt{m\log p}}{\sqrt{n}}+\frac{t}{2\sqrt{n}}$ and $d=\frac{t}{8m\sqrt{n}}$ , we have

[TABLE]

as $n$ is sufficiently large. In addition,

[TABLE]

as $n\to\infty$ .

Joining (4.65)-(4.67), we obtain

[TABLE]

which together with (4.57) implies that

[TABLE]

This completes our analysis for Case 2. By using the same argument as obtaining (4.68), we have the following limit, which will be used later on.

[TABLE]

We next study Case 3.

Case 3: $\delta\leq t<\delta\vee\omega_{n}$ . Note that this case is only possible if $n\geq\exp\{((\log p)/m)^{1/2}{\xi_{p}}^{-1}\delta\}$ . We point out that Lemma 6 is not a suitable approach for bounding the tail probability in this case because the term $m\log(1/d)$ , which cannot be easily controlled, will dominate the other terms in the error bound for very large $n$ . Instead, we will use another approach to obtain an upper bound of $\mathbb{P}\left(\lambda_{1}(W_{\{1,...,m\}})\geq 2\sqrt{m\log p}+t\right)$ . The main step here is to quantify the approximation of the extreme eigenvalue of a Wishart matrix to that of a Wigner matrix. We will analyze their density functions and leverage them with the results in the proof of Theorem 2.

Let $\mu=(\mu_{1},...,\mu_{m})$ be the order statistics of the eigenvalues of $W_{\{1,...,m\}}$ such that $\mu_{1}>\mu_{2}>...>\mu_{m}$ . Write $\nu=(\nu_{1},...,\nu_{m})$ with $\nu_{i}=(\mu_{i}-n)/\sqrt{n}$ . Let $\tilde{W}_{\{1,...,m\}}=(\tilde{w}_{ij})_{1\leq i,j\leq m}$ where $\tilde{w}_{ij}$ ’s are as in (2.4). Let the eigenvalues of $\tilde{W}_{\{1,...,m\}}$ be $\lambda_{1}>...>\lambda_{m}$ . Set $\lambda=(\lambda_{1},...,\lambda_{m})$ . Intuitively, the law of $\nu$ is close to that of $\lambda$ when $n$ is large. The next lemma quantifies the approximation speed. Review $\|x\|_{\infty}=\max_{1\leq i\leq m}|x_{i}|$ for any $x=(x_{1},\cdots,x_{m})\in\mathbb{R}^{m}$ .

Lemma 9.

Let $g_{n,m}(\cdot)$ be the density function of $\nu$ , and let $h_{m}(\cdot)$ be the density function of $\lambda$ . Assume $m^{3}=o(n)$ . Then,

[TABLE]

for all $v\in\mathbb{R}^{m}$ with $\|v\|_{\infty}\leq{\frac{2}{3}}\sqrt{n}$ .

Let $r_{m,n}=2\sqrt{m\log p}+\omega_{n}$ , where $\omega_{n}$ is as in (4.58). Then for $t$ such that $\delta\leq t\leq\omega_{n}$ ,

[TABLE]

There are three probabilities above; denote the second one by $H_{n}$ . For $H_{n}$ , we use a change-of-measure argument. In fact,

[TABLE]

Now

[TABLE]

since $r_{m,n}>\sqrt{m\log p}$ and $m=o(n)$ . By the definition of $\omega_{n}$ in (4.58),

[TABLE]

where Assumption 1 from (3.1) is used. Therefore,

[TABLE]

Note that $t\leq\frac{1}{\beta}e^{\beta t}$ for any $\beta>0$ and $t>0$ . It follows from (4.11) that

[TABLE]

by the fact $t\geq\delta$ and Assumption 1. Combining this with (4.71), we have

[TABLE]

We next analyze $\mathbb{P}\big{(}\max_{1\leq i\leq m}|\nu_{i}|\geq r_{m,n}\big{)}$ . Recall $r_{m,n}=2\sqrt{m\log p}+\omega_{n}$ , where $\omega_{n}$ is as in (4.58). Recall that we only discuss Case 3 when $\delta\leq t<\delta\vee\omega_{n}$ , and this is only meaningful when $\omega_{n}>\delta$ . Thus, $\delta\vee\omega_{n}=\omega_{n}\leq\frac{\sqrt{n}\delta}{100}$ . Thus, from (4.68) we have

[TABLE]

By (4.2.2),

[TABLE]

Since $\max_{1\leq i\leq m}|\nu_{i}|=\max(\nu_{1},-\nu_{m})$ , by combining (4.76) and (4.77), we see that

[TABLE]

Combining this with (4.75), we further have

[TABLE]

This completes our analysis for Case 3.

Now, we combine (4.64), (4.68) and (4.79), and arrive at

[TABLE]

This and (4.57) conclude

[TABLE]

∎

Proof of Proposition 4.

Notice that the expectation in (3.4) is non-increasing in $\delta$ , so without loss of generality we assume $\delta<1$ .

Here we discuss two scenarios that are similar to those in the proof of Theorem 2. They are 1) $\delta\leq t\leq 2\sqrt{m\log p}-m{\xi_{p}}$ and 2) $t>2\sqrt{m\log p}-m{\xi_{p}}$ , where ${\xi_{p}}=\log\log\log p.$

Scenario 1: $\delta\leq t\leq 2\sqrt{m\log p}-m{\xi_{p}}$ . Similar to the proof of Theorem 2, we define the event $A_{S}$ as follows. For each $S\subset\{1,...,p\}$ with $|S|=m$ , set

[TABLE]

where $\tau_{m,p,t}=(1-\varepsilon_{m,p,t})\sqrt{\frac{4\log p}{m}}$ and $\varepsilon_{m,p,t}=(4m\log p)^{-1/2}t$ . We also define

[TABLE]

Similar to the discussion between (4.14) and (4.21) in the proof of Theorem 2, we have

[TABLE]

In the rest of the discussion under Scenario 1, we will develop a lower bound for $\mathbb{E}(Q_{m,n,p})$ and an upper bound for $Var(Q_{m,n,p})$ in two steps.

Step 1: the estimate of $\mathbb{E}(Q_{m,n,p})$ . For an $m\times m$ symmetric matrix $M$ , we use $\|M\|$ to denote its spectral norm. Set $S_{0}=\{1,2,\cdots,m\}$ . Review $\omega_{n}$ in (4.58). Since $\{\mathbf{1}_{A_{S}};\,S\subset\{1,...,p\}\ \mbox{with}\ |S|=m\}$ are identically distributed, we have

[TABLE]

where

[TABLE]

It is easy to check that Assumption 1 in (3.1) implies

[TABLE]

Similar to Lemma 9, we need the following lemma, which quantifies the speed that a Wishart matrix converges to a Wigner matrix. The difference is that the spectral norm $\|\cdot\|$ is used here instead of $\|\cdot\|_{\infty}$ in Lemma 9.

Write $W_{\{1,...,m\}}$ for $W_{S}$ above (2.1) with $S=\{1,2,\cdots,m\}$ . Review that the Wigner matrix $\tilde{W}_{\{1,...,m\}}=(\tilde{w}_{ij})_{m\times m}$ , where $\tilde{w}_{ij}$ ’s are as in (2.4).

Lemma 10.

Let $f_{m,n}(w)$ be the density function of $\frac{1}{\sqrt{n}}(W_{\{1,...,m\}}-nI_{m})$ and $\tilde{f}_{m}(w)$ be the density function of $\tilde{W}_{\{1,...,m\}}$ . If $m^{3}=o(n)$ , then

[TABLE]

for all $m\times m$ symmetric matrices $w$ with $\|w\|\leq\frac{2}{3}\sqrt{n}$ .

Below, we combine the above lemma and some change of measure arguments to obtain a lower bound of $\mathbb{P}(A_{S_{0}}\cap\mathcal{L}_{m,n,p})$ . Define a non-random set $B_{m,p}=\{w_{ij}:w_{ij}\geq\tau_{m,p,t},1\leq i\leq j\leq m\}$ . By the first limit from (4.86), $s_{m,n,p}\leq\frac{2}{3}\sqrt{n}$ . Therefore, from Lemma 10 we have

[TABLE]

where $\tilde{A}_{\{1,...,m\}}$ is as in (4.15) with $S=\{1,\cdots,m\}$ and $\tilde{\mathcal{L}}_{m,n,p}=\{\|\tilde{W}_{\{1,...,m\}}\|\leq s_{m,n,p}\}.$ Under Assumption 1 in (3.1), evidently $\frac{m}{s_{m,n,p}^{2}}\to 0$ and $\frac{m}{\sqrt{n}\,s_{m,n,p}}\to 0$ . This implies that

[TABLE]

Thus, we have

[TABLE]

Obviously, $\mathbb{E}(Q_{m,n,p})=\binom{p}{m}\mathbb{P}(A_{\{1,...,m\}})$ . Recalling $\tilde{A}_{\{1,...,m\}}$ and $\tilde{Q}_{m,p}$ as in (4.15) and (4.18), respectively, we see that $\mathbb{E}(\tilde{Q}_{m,p})=\binom{p}{m}\mathbb{P}(\tilde{A}_{\{1,...,m\}})$ . Thus, we further have from (4.85) and (4.88) that

[TABLE]

To further obtain a lower bound of the above expression, we analyze each term on the right-hand side. Recalling the definition of $\varepsilon_{m,p,t}$ below (4.15), we know $\varepsilon_{m,p,t}\in(0,1)$ . By (4.31),

[TABLE]

where the condition $m=o((\log p)^{1/3}/\log\log p)$ from Assumption 1 in (3.1) is essentially used in the last step. Now,

[TABLE]

where the fact that $\tilde{W}_{\{1,...,m\}}$ and $-\tilde{W}_{\{1,...,m\}}$ have the same distribution is used in the last step. The following lemma helps us estimate the last probability.

Lemma 11.

[Lemma 4.1 from Jiang and Li, (2015)] Let $\tilde{W}_{\{1,...,m\}}$ be defined by $\tilde{W}_{S}$ above (2.5) with $S=\{1,...,m\}$ . Then there is a constant $\kappa>0$ such that

[TABLE]

for all $x>0$ and all $m\geq 2$ .

By letting $x=s_{m,n,p}$ in Lemma 11, we have

[TABLE]

Combining the above inequality with (4.91), we arrive at

[TABLE]

Since $s_{m,n,p}\geq 10\sqrt{m\log p}$ , we know $m\log p-\frac{1}{4}s_{m,n,p}^{2}\leq-\frac{6}{25}s_{m,n,p}^{2}.$ Moreover, $\sqrt{m}s_{m,n,p}=o(s_{m,n,p}^{2})$ . Consequently,

[TABLE]

Comparing the above inequality with (4.90), we arrive at

[TABLE]

This result, combined with (4.89), gives

[TABLE]

which together with (4.31) yields

[TABLE]

This completes our analysis for $\mathbb{E}(Q_{m,n,p})$ .

Step 2: the estimate of $Var(Q_{m,n,p})$ . Replacing “ $\tilde{A}_{S}$ ” in (4.15) with “ $A_{S}$ ” in (4.83), and using the same argument as obtaining (4.37), we have from Lemma 5 that

[TABLE]

Now we bound the last term above. Review $\mathcal{L}_{2m,n,p}$ below (4.85). Trivially,

[TABLE]

By (4.95),

[TABLE]

Let $f_{2m,n}(w)$ be the density function of $\frac{1}{\sqrt{n}}(W_{\{1,...,2m\}}-nI_{2m})$ and $\tilde{f}_{2m}(w)$ be the density function of $\tilde{W}_{\{1,...,2m\}}$ . Review (4.82). Define (non-random) set

[TABLE]

Then,

[TABLE]

where $B:=B_{\{1,\cdots,m\}}\cap B_{\{1,...,l,m+1,...,2m-l\}}\cap\{\|w\|\leq s_{2m,n,p}\}.$ By Lemma 10 and a change-of-measure argument similar to the one used to obtain (4.88), we see

[TABLE]

The benefit of the above step is transferring the probability on the Wishart matrix to that on the Wigner matrix up to a certain error. Combining (4.100)-(4.102), we have

[TABLE]

Combining this with (4.99), we have

[TABLE]

where

[TABLE]

Thus,

[TABLE]

According to (4.43) and (4.97), the first term on the right-hand side of the above inequality is no more than

[TABLE]

Notice that $1-\varepsilon_{m,p,t}\leq 1$ and $m\geq 2$ and $O(m^{2}\log\log p+mn^{-1/2}s_{m,n,p}^{3})=o(\log p)$ . Thus, the above display further implies

[TABLE]

We next study the last two terms from (4.105).

By the condition $m=o((\log p)^{1/3}/\log\log p)$ from Assumption 1 in (3.1) and the second limit in (4.86),

[TABLE]

Recall (4.16). It is readily seen that $[1-(1-\varepsilon_{m,p,t})^{2}]m\log p\geq\varepsilon_{m,p,t}m\log p\geq\frac{t}{2}\sqrt{m\log p}$ . Consequently, it is known from (4.98) that

[TABLE]

uniformly over $\delta\leq t\leq 2\sqrt{m\log p}-m{\xi_{p}}$ . Therefore, we conclude from (4.98) and (4.108) that

[TABLE]

Combining (4.105)-(4.109), we see

[TABLE]

By (4.84) and the above inequality,

[TABLE]

Finally, from the inequality $t^{2}\leq 2e^{t}$ we have that

[TABLE]

for any $\alpha>0$ and $\delta>0$ .

Scenario 2: $t>2\sqrt{m\log p}-m{\xi_{p}}$ . Review (2.1). By the fact that $\lambda_{1}(M)\geq\max_{1\leq i\leq m}M_{ii}$ for any non-negative definite matrix $M=(M_{ij})_{m\times m}$ , we have

[TABLE]

where $W_{ii}=\sum_{j=1}^{n}x_{ji}^{2}$ and $\{W_{ii};\,1\leq i\leq m\}$ are i.i.d. random variables. Thus, by independence,

[TABLE]

Note that $W_{11}=\sum_{j=1}^{n}x_{j1}^{2}$ is a sum of i.i.d. random variables with $Var(x_{11}^{2})=2$ and $\mathbb{E}(x_{11}^{6})<\infty$ . We discuss two situations: $2\sqrt{m\log p}-m{\xi_{p}}\leq t\leq 4\sqrt{m\log p}$ and $t\geq 4\sqrt{m\log p}$ .

Assume for now that $2\sqrt{m\log p}-m{\xi_{p}}\leq t\leq 4\sqrt{m\log p}$ . Recalling $\Phi(x)=(2\pi)^{-1/2}\int_{-\infty}^{x}e^{-t^{2}/2}\,dt$ , we get from the Berry-Esseen theorem that

[TABLE]

for some constant $\kappa>0$ . Combine the above inequalities with (4.53) to see

[TABLE]

By (3.1), $\sqrt{m\log p}\leq\log p$ . It is easy to check

[TABLE]
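The normal approximation used in this first situation can be illustrated numerically. Since $W_{11}\sim\chi^{2}_{n}$ has mean $n$ and variance $2n$, the Berry-Esseen theorem gives a Kolmogorov distance of order $n^{-1/2}$ between $(W_{11}-n)/\sqrt{2n}$ and $N(0,1)$. The sketch below estimates that distance on a grid for even $n$; the grid, the values of $n$ and the helper names are illustrative assumptions.

```python
from math import erf, exp, sqrt


def normal_cdf(x):
    """Standard normal distribution function Phi(x)."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))


def chi2_cdf_even(n, x):
    """P(chi^2_n <= x) for even n, from the closed-form survival function."""
    if x <= 0:
        return 0.0
    half, term, total = x / 2.0, 1.0, 0.0
    for j in range(n // 2):
        total += term
        term *= half / (j + 1)
    return 1.0 - exp(-half) * total


def kolmogorov_gap(n, grid=801):
    """Largest |P((W11 - n)/sqrt(2n) <= x) - Phi(x)| over a grid of x in [-4, 4]."""
    gap = 0.0
    for k in range(grid):
        x = -4.0 + 8.0 * k / (grid - 1)
        gap = max(gap, abs(chi2_cdf_even(n, n + x * sqrt(2.0 * n)) - normal_cdf(x)))
    return gap


sizes = (16, 64, 256)
gaps = [kolmogorov_gap(n) for n in sizes]
for n, g in zip(sizes, gaps):
    print(f"n = {n:4d}  gap = {g:.4f}  gap * sqrt(n) = {g * sqrt(n):.3f}")
```

The rescaled gap stays bounded as $n$ grows, consistent with an error term of order $n^{-1/2}$.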

We proceed to the second situation: $t\geq 4\sqrt{m\log p}$ . In this case, $2\sqrt{m\log p}-t\leq-2\sqrt{m\log p}$ . By Lemma 1 from Laurent and Massart, (2000),

[TABLE]

for any $x>0$ . Thus,

[TABLE]
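The lower-tail inequality from Lemma 1 of Laurent and Massart (2000) can be checked exactly for even degrees of freedom. The sketch below assumes its standard form $\mathbb{P}(\chi^{2}_{n}\leq n-2\sqrt{nx})\leq e^{-x}$ for $x>0$; the display in the text may state it in a different but equivalent way. The function names are hypothetical.

```python
from math import exp, sqrt


def chi2_cdf_even(n, x):
    """P(chi^2_n <= x) for even n, from the closed-form survival function."""
    if x <= 0:
        return 0.0
    half, term, total = x / 2.0, 1.0, 0.0
    for j in range(n // 2):
        total += term
        term *= half / (j + 1)
    return 1.0 - exp(-half) * total


def lower_tail_holds(n, x):
    """Check P(chi^2_n <= n - 2 sqrt(n x)) <= exp(-x), with a small float tolerance."""
    threshold = n - 2.0 * sqrt(n * x)
    return chi2_cdf_even(n, threshold) <= exp(-x) + 1e-12


results = {
    (n, x): lower_tail_holds(n, x)
    for n in (2, 10, 50)
    for x in (0.1, 0.5, 1.0, 2.0, 5.0)
}
print(all(results.values()))
```

Because the bound does not depend on the fourth moment or on $n$ beyond the centering, it handles the range $t\geq 4\sqrt{m\log p}$ where the normal approximation is too weak.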

This inequality and (4.114) yield

[TABLE]

Consequently,

[TABLE]

Hence,

[TABLE]

By collecting (4.112), (4.117) and (4.121) together, we arrive at

[TABLE]

The proof is completed. ∎

4.2.3 Proofs of Theorem 3 and Remark 3

The following lemma is used in the proof of Theorem 3; its proof is given in the Appendix.

Lemma 12.

Let $\tilde{W}=\tilde{W}_{m\times m}$ be as defined in (3.11) with $0\leq\eta\leq 2$ . Then

[TABLE]

for all $r\geq 4m$ , $\delta\in(0,1)$ and $x>2r\delta+1$ .

Proof of Theorem 3.

For any $0<\varepsilon<1$ , we first show that

[TABLE]

by using Lemma 12. To do so, set $x=(1+\varepsilon)\sqrt{[4m+2(\eta-2)]\log p}$ , $r=\sqrt{128m\log p}$ and $\delta=(8r)^{-1}\varepsilon\sqrt{[4m+2(\eta-2)]\log p}$ . Rewrite $\delta$ as

[TABLE]

It is easy to check that the coefficient of $\varepsilon$ always lies in $[1/64,2/64]$ for any $m\geq 2$ and $\eta\in[0,2]$ . This, the fact that $\sup_{k\geq 2}(k^{1.5}\log k)\cdot\delta^{k}<\infty$ and the definition of $r$ lead to

[TABLE]
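The range $[1/64,2/64]$ claimed for the coefficient of $\varepsilon$ in $\delta$ can be confirmed directly from the definitions $r=\sqrt{128m\log p}$ and $\delta=(8r)^{-1}\varepsilon\sqrt{[4m+2(\eta-2)]\log p}$, in which $\log p$ cancels. The sketch below evaluates the resulting coefficient $\frac{1}{8}\sqrt{\frac{4m+2(\eta-2)}{128m}}$ on a grid; the function name and grid are illustrative.

```python
from math import sqrt


def delta_coefficient(m, eta):
    """Coefficient c(m, eta) with delta = c * epsilon; log p cancels in the ratio."""
    return sqrt((4.0 * m + 2.0 * (eta - 2.0)) / (128.0 * m)) / 8.0


# Grid over m >= 2 and eta in [0, 2].
values = [
    delta_coefficient(m, k / 10.0)
    for m in range(2, 2001)
    for k in range(0, 21)
]
print(min(values), max(values), 1 / 64, 2 / 64)
```

The minimum $1/64$ is attained at $m=2$, $\eta=0$, and the coefficient never exceeds $\frac{1}{8}\sqrt{1/32}\approx 0.0221$, so $\delta\in(0,1)$ as Lemma 12 requires.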

We can see that

[TABLE]

It follows that

[TABLE]

This and (4.125) imply (4.124). Consequently,

[TABLE]

To complete the proof, it is enough to check that

[TABLE]

for each $\varepsilon\in(0,1).$ For notational simplicity, let $K_{m}=4m+2(\eta-2)$ and $\tau_{m,p}=\log p/K_{m}.$ Similar to the proof of Theorem 2, define

[TABLE]

for each $S\subset\{1,...,p\}$ with $|S|=m$ . We next compute $\mathbb{P}(\tilde{A}_{S_{0}})$ and $\mathbb{P}(\tilde{A}_{S_{0}}\cap\tilde{A}_{S_{1}})$ , respectively, where $S_{0}=\{1,...,m\}$ and $S_{1}=\{1,...,l,m+1,...,2m-l\}$ . By independence,

[TABLE]

Since $\tilde{W}_{ii}\sim N(0,\eta)$ and $\tilde{W}_{ij}\sim N(0,1)$ for all $i\neq j$ , we further have

[TABLE]

where $\bar{\Phi}(x)=(2\pi)^{-1/2}\int_{x}^{\infty}e^{-t^{2}/2}\,dt$ for $x\in\mathbb{R}.$ Similar to (4.38),

[TABLE]

From (4.26), $\log\bar{\Phi}(x)=-\frac{x^{2}}{2}-\log(x)-\log\sqrt{2\pi}+o(1)$ as $x\to\infty$ . Then,

[TABLE]

where

[TABLE]

Notice

[TABLE]

Similar to (4.28), we obtain that $R_{m,p}=O(m^{2}\log\log p)$ . Thus,

[TABLE]

By the same argument as obtaining (4.31), we see

[TABLE]

In particular the above goes to infinity as $n\to\infty$ . By (4.26) and (4.130),

[TABLE]

The right hand side above without the term “ $O(m^{2}\log\log p)$ ” is identical to

[TABLE]

The above two assertions yield

[TABLE]

Let us take a closer look at the above display. For $1\leq l\leq m-1$ ,

[TABLE]

Thus, for $1\leq l\leq m-1$ and $0<\varepsilon<1$ ,

[TABLE]

Combining this with (4.138), we obtain that

[TABLE]

uniformly for $1\leq l\leq m-1$ . Define $\tilde{Q}_{p}=\sum_{S\subset\{1,...,p\},|S|=m}{\mathbf{1}}_{\tilde{A}_{S}}$ . From (4.135),

[TABLE]

as $n\to\infty$ . Moreover, we see from (4.135) and (4.138) that

[TABLE]

By Lemma 5 and a similar argument to (4.37), we get

[TABLE]

This and the above two limits imply $\frac{Var(\tilde{Q}_{p})}{(\mathbb{E}\tilde{Q}_{p})^{2}}\to 0$ . As a result, $\lim_{p\to\infty}\mathbb{P}(\tilde{Q}_{p}=0)=0$ by (4.20). According to (4.14), if there exists $S_{0}\subset\{1,...,p\}$ such that $|S_{0}|=m$ and $\tilde{A}_{S_{0}}$ occurs, then

[TABLE]

Therefore,

[TABLE]

This implies (4.128). The proof is finished.

∎

Proof of Remark 3.

These results are direct consequences of the following lemma, whose proof is given in Appendix B. ∎

Lemma 13.

Let $\{Z_{p}\}_{p\geq 1}$ be a sequence of non-negative random variables. Consider the following statements.

(i) $\lim\limits_{p\to\infty}\mathbb{E}\left[e^{\alpha Z_{p}}\mathbf{1}_{\{Z_{p}\geq\delta\}}\right]=0$ for all $\alpha>0$ and $\delta>0$ .

(ii) $\lim\limits_{p\to\infty}\mathbb{E}(e^{\alpha Z_{p}})=1$ for all $\alpha>0$ .

(iii) $\lim\limits_{p\to\infty}\mathbb{E}(Z_{p}^{\alpha})=0$ for all $\alpha>0$ .

(iv) $\lim\limits_{p\to\infty}\mathbb{P}(Z_{p}\geq\delta)=0$ for all $\delta>0$ .

(v) $\lim\limits_{p\to\infty}{\rm Var}(Z_{p})=0$ .

Then, (i) $\Longleftrightarrow$ (ii) $\implies$ (iii) $\implies$ (iv) and (v). Here, “A $\Longleftrightarrow$ B” means two statements A and B are equivalent, and A $\implies$ B means statement A implies statement B.
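Some of these implications follow from one-line estimates. The block below is a sketch of standard arguments for three of them, using only that $Z_{p}\geq 0$; the proof given in Appendix B may proceed differently.

```latex
\begin{align*}
\text{(i)}\Rightarrow\text{(ii)}:\quad
  & 1\le \mathbb{E}\,e^{\alpha Z_p}
    \le e^{\alpha\delta}
      +\mathbb{E}\big[e^{\alpha Z_p}\mathbf{1}_{\{Z_p\ge\delta\}}\big]
    \longrightarrow e^{\alpha\delta}\quad (p\to\infty),
    \ \text{then let } \delta\downarrow 0;\\
\text{(iii)}\Rightarrow\text{(iv)}:\quad
  & \mathbb{P}(Z_p\ge\delta)\le \delta^{-1}\,\mathbb{E}(Z_p)
    \quad\text{(Markov's inequality)};\\
\text{(iii)}\Rightarrow\text{(v)}:\quad
  & \mathrm{Var}(Z_p)\le \mathbb{E}(Z_p^{2}).
\end{align*}
```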

Acknowledgment

The research of Tony Cai was supported in part by NSF Grant DMS-1712735 and NIH grants R01-GM129781 and R01-GM123056. Tiefeng Jiang is partially supported by NSF Grant DMS-1406279. Xiaoou Li is partially supported by NSF Grant DMS-1712657.

Appendix A Auxiliary results on Gamma functions

Recall the Gamma function $\Gamma(x)=\int_{0}^{\infty}t^{x-1}e^{-t}\,dt$ for $x>0.$

Lemma A.1.

Let

[TABLE]

If $m^{3}=o(n)$ , then $\log C({n,m})-\log c_{m}=o(1)$ as $n\to\infty$ .

Proof of Lemma A.1.

Easily,

[TABLE]

where

[TABLE]

Then

[TABLE]

Write

[TABLE]

By Lemma 5.1 from Jiang and Qi (2015), there exists a constant $C>0$ free of $x$ and $b$ such that

[TABLE]

for all $x\geq 10$ and $|b|\leq x/2$ . It is easy to see that

[TABLE]

where $C^{\prime}$ is a constant free of $m$ and $n$ . This implies that

[TABLE]

as $n\to\infty$ . Write

[TABLE]

Easily, $\log(1-\frac{j}{n})=-\frac{j}{n}+O(\frac{m^{2}}{n^{2}})$ as $n\to\infty$ uniformly for all $1\leq j\leq m.$ Hence,

[TABLE]

as $n\to\infty$ . In summary,

[TABLE]

as $n\to\infty.$ On the other hand, by the Stirling formula,

[TABLE]

as $n\to\infty.$ From (A.1) and the above two assertions we see

[TABLE]

as $n\to\infty$ , which together with (A.2) proves the lemma. ∎

Lemma A.2.

Let

[TABLE]

If $m^{3}=o(n)$ , then $\log A(m,n)-\log B(m)\to 0$ as $n\to\infty$ .

Proof of Lemma A.2.

Observe

[TABLE]

By definition,

[TABLE]

From (A.1) and (A.3), we see that

[TABLE]

By comparing this identity with (A.4), we conclude $\log\frac{A(m,n)}{B(m)}\to 0$ . ∎

Appendix B Proofs of lemmas

The following result is obtained by a slight modification of the second inequality of (4.8) from Jiang and Li, (2015), taking into account that the Wigner matrix here is $\sqrt{2}$ times the version there. It will enable us to bound the last probability.

Proof of Lemma 2.

Review the proof of Lemma 4.1 from Jiang and Li, (2015). Notice that the version of the Wigner matrix here is $\sqrt{2}$ times the version there. From the second inequality in (4.8) in that paper, there is a positive constant $C$ not depending on $m$ such that

[TABLE]

for all $x>4\sqrt{m}$ and all $m\geq 2$ . Since the right hand side above is increasing in $C$ , without loss of generality, we assume $C>1$ . It is easy to see $\log C\leq Cm\leq Cm\log x$ under the assumption that $x>4\sqrt{m}$ . By taking $\kappa=3C$ we get the desired conclusion. ∎

Proof of Lemma 3.

Note that

[TABLE]

and $\frac{p-l}{m-l}\geq\frac{p}{m}$ for $l\geq 0$ . Thus,

[TABLE]

On the other hand, by the Stirling formula,

[TABLE]

Therefore,

[TABLE]

Combining the two inequalities, we complete the proof. ∎

Proof of Lemma 4.

Write

[TABLE]

for $1\leq x\leq m-1.$ Obviously, $g(x)$ is a convex function. Hence $\max_{1\leq l\leq m-1}g(l)=g(1)\vee g(m-1).$ It is trivial to check that $g(1)\geq g(m-1)$ . The first identity is thus obtained. The second identity follows from the first one. ∎

Proof of Lemma 5.

By rearranging the terms, we have

[TABLE]

∎

Proof of Lemma 1.

Note that

[TABLE]

Use the Fubini Theorem to see

[TABLE]

∎

Proof of Lemma 6.

The technique to be used here is similar to that from Fey et al., (2008), where the large deviations for the extreme eigenvalues of Wishart matrices are developed. Thus we will omit the repetitive details and only state the main steps. To ease notation, we write $U_{n}=\frac{1}{n}W_{\{1,...,m\}}$ , and we use $\lambda_{m}(U_{n})$ to denote its smallest eigenvalue. The event $\big{\{}\frac{\lambda_{1}(W_{\{1,...,m\}})-n}{\sqrt{n}}\geq\sqrt{n}y\big{\}}$ is equal to $\{\lambda_{1}(U_{n})\geq 1+y\}$ and $\big{\{}\frac{\lambda_{m}(W_{\{1,...,m\}})-n}{\sqrt{n}}\leq-\sqrt{n}y\big{\}}$ is equal to $\{\lambda_{m}(U_{n})\leq 1-y\}$ . We start to bound $\mathbb{P}(\lambda_{m}(U_{n})\leq 1-y)$ . Since $\lambda_{m}(W_{\{1,...,m\}})\geq 0$ , we assume $y\in(0,1)$ without loss of generality.

Note that $\lambda_{m}(U_{n})=\min_{v:\|v\|=1}v^{\intercal}U_{n}v$ and the sphere $S^{m-1}=\{v\in\mathbb{R}^{m}:\,\|v\|=1\}$ can be covered by $\cup B(v^{(i)},d)$ for some $v^{(1)},...,v^{(N_{d})}\in S^{m-1}$ . Here, we use $B(v,d)$ to denote an open ball centered around $v$ with radius $d$ . It is straightforward to verify that for any $v\in S^{m-1}$ , there always exists $j\in\{1,...,N_{d}\}$ such that

[TABLE]
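The covering step can be made concrete in dimension two. For a positive semidefinite matrix $U$ and unit vectors $v,w$ with $\|v-w\|\leq d$, one has $|v^{\intercal}Uv-w^{\intercal}Uw|=|(v-w)^{\intercal}U(v+w)|\leq 2d\,\lambda_{1}(U)$, so the minimum of the quadratic form over a $d$-net exceeds $\lambda_{m}(U)$ by at most $2d\,\lambda_{1}(U)$. The sketch below checks this for a hypothetical $2\times 2$ matrix and an explicit net on the circle; all names and numerical values are illustrative assumptions.

```python
from math import ceil, cos, pi, sin, sqrt


def quad(u, v):
    """Quadratic form v^T U v for a symmetric 2x2 matrix U = ((a, b), (b, c))."""
    (a, b), (_, c) = u
    return a * v[0] ** 2 + 2.0 * b * v[0] * v[1] + c * v[1] ** 2


def eigs_2x2(u):
    """Smallest and largest eigenvalues of a symmetric 2x2 matrix, in closed form."""
    (a, b), (_, c) = u
    mid, rad = (a + c) / 2.0, sqrt(((a - c) / 2.0) ** 2 + b * b)
    return mid - rad, mid + rad


def circle_net(d):
    """Equally spaced points on the unit circle forming a d-net.

    With N points, every unit vector is within angle pi/N of a net point,
    hence within Euclidean distance 2*sin(pi/(2N)) <= pi/N < d.
    """
    count = ceil(pi / d) + 1
    return [(cos(2.0 * pi * k / count), sin(2.0 * pi * k / count)) for k in range(count)]


U = ((2.0, 0.5), (0.5, 1.0))  # hypothetical positive definite example
d = 0.1
lam_min, lam_max = eigs_2x2(U)
net_min = min(quad(U, v) for v in circle_net(d))
print(lam_min, net_min, lam_min + 2.0 * d * lam_max)
```

In the proof the same comparison is combined with a union bound over the $N_{d}$ net points and with the event $\{\lambda_{1}(U_{n})\geq mr\}$, which is where the term $2dmr$ originates.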

Therefore, by considering whether or not the event $\{\lambda_{1}(U_{n})\geq rm\}$ occurs, we have

[TABLE]

for all $r>0$ . We next analyze $N_{d}$ , $\mathbb{P}(v^{\intercal}U_{n}v\leq 1-y+2dmr)$ and $\mathbb{P}(\lambda_{1}(U_{n})\geq mr)$ separately.

We start with $N_{d}$ , which is the minimum number of balls with the radius $d$ required to cover $S^{m-1}$ . By a result from Rogers, (1963) we see

[TABLE]

for all $0<d<1/2$ and $m\geq 1$ . As a result,

[TABLE]

We proceed to an upper bound for $\mathbb{P}(v^{\intercal}U_{n}v\leq 1-y+2dmr)$ . Recall that $U_{n}=\frac{1}{n}X_{\cdot,[1,..,m]}^{\intercal}X_{\cdot,[1,..,m]}$ , where we use the notation

[TABLE]

Thus,

[TABLE]

where we define $S_{v,i}=\sum_{l=1}^{m}X_{il}v_{l}$ . Review $\|v\|=1$ . Since $x_{ij}$ ’s are standard normals, so are $\{S_{v,i};\,1\leq i\leq n\}$ . By the large deviation bound for the sum of i.i.d. random variables [see, e.g., page 27 from Dembo and Zeitouni (1998)],

[TABLE]

where $A\subset\mathbb{R}$ is any Borel set and $I(x)=\sup_{t\in\mathbb{R}}\{tx-\log\mathbb{E}e^{tN(0,1)^{2}}\}$ . Since $\log\mathbb{E}(e^{tN(0,1)^{2}})=-\frac{1}{2}\log(1-2t)$ for $t<1/2,$ it is easy to check that

$$I(x)=\begin{cases}\frac{1}{2}(x-1-\log x), & x>0,\\ \infty, & x\leq 0.\end{cases}\tag{B.14}$$
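The Legendre transform behind this rate function can be verified numerically. With cumulant $-\frac{1}{2}\log(1-2t)$ for $t<1/2$, the supremum of $tx+\frac{1}{2}\log(1-2t)$ is attained at $t^{*}=\frac{1}{2}(1-\frac{1}{x})$ and equals $\frac{1}{2}(x-1-\log x)$ for $x>0$. The sketch below compares a crude grid search with this closed form; the search range $(-50,1/2)$ and the step count are arbitrary illustrative choices.

```python
from math import log


def cumulant(t):
    """log E exp(t Z^2) for Z ~ N(0, 1); finite only for t < 1/2."""
    return -0.5 * log(1.0 - 2.0 * t)


def rate_closed(x):
    """Closed form (x - 1 - log x) / 2, valid for x > 0."""
    return 0.5 * (x - 1.0 - log(x))


def rate_numeric(x, steps=20000):
    """Grid search for sup_t { t x - cumulant(t) } over t in (-50, 1/2)."""
    lo, hi = -50.0, 0.5
    best = float("-inf")
    for k in range(1, steps):  # k < steps keeps t strictly below 1/2
        t = lo + (hi - lo) * k / steps
        best = max(best, t * x - cumulant(t))
    return best


for x in (0.1, 0.5, 1.0, 2.0, 3.0):
    print(f"x = {x:3.1f}  numeric = {rate_numeric(x):.5f}  closed = {rate_closed(x):.5f}")
```

The grid value can only underestimate the supremum, so agreement up to the grid resolution supports the closed form.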

Observe that $I(x)$ is decreasing for $x\leq 1$ . This together with (B.13) and (B.14) implies that

[TABLE]

for all $y>2dmr.$

Now we estimate $\mathbb{P}(\lambda_{1}(U_{n})\geq mr)$ appearing in (B.9). Noting that $U_{n}$ is positive semidefinite, we have $\lambda_{1}(U_{n})\leq\text{trace}(U_{n})\leq\frac{1}{n}\sum_{i=1}^{n}\sum_{l=1}^{m}x_{il}^{2}$ , and hence

[TABLE]

for $r\geq 1$ by (B.14). Combining (B.11), (B.15), and (B.16), we obtain from (B.9) that

[TABLE]

for $y>2dmr$ and $r\geq 1$ . This confirms (4.60).

To get (4.59), just notice $\lambda_{1}(U_{n})=\max_{v:\|v\|=1}v^{\intercal}U_{n}v$ . From (B.8) and (B.9) we see that

[TABLE]

Then (4.59) follows from similar arguments to (B.15)-(B.17). ∎

Proof of Lemma 7.

Review Assumption 1 in (3.1). We start with the analysis of (4.63). Here, we consider two sub-cases: $t\leq\frac{mn}{80\alpha}$ and $t>\frac{mn}{80\alpha}$ . For $t\leq\frac{mn}{80\alpha}$ , we have $r=\max(2,1+\frac{80\alpha t}{mn})=2$ and

[TABLE]

Trivially $\frac{1-\log 2}{2}-\frac{1}{80}=0.14\cdots>\frac{1}{10}$ . Note that $\log\frac{mn}{80\alpha}=o(mn)$ and $m\log p=o(mn)$ under Assumption 1 in (3.1). It follows that

[TABLE]

This implies

[TABLE]

Now we consider another sub-case where $t\geq\frac{mn}{80\alpha}$ . For this case, $r=1+\frac{80\alpha t}{mn}$ . It is not hard to see

[TABLE]

for $r\geq 2$ . Apparently, $m\log p\leq\alpha t$ for $t\geq\frac{mn}{80\alpha}$ as $n$ is sufficiently large. It follows that

[TABLE]

This implies

[TABLE]

Combining (B.20) and (B.22), we obtain

[TABLE]

This completes the proof of (4.63). We next show (4.62).

Recall $z=\frac{2\sqrt{m\log p}}{\sqrt{n}}+\frac{t}{2\sqrt{n}}\geq\frac{t}{2\sqrt{n}}$ . Obviously, $z>\frac{\delta}{200}$ as $t>\frac{\delta\sqrt{n}}{100}$ . It is elementary to check there exists $\varepsilon>0$ such that $x-\log(1+x)\geq\varepsilon x$ for all $x>\frac{\delta}{200}$ . Hence,

[TABLE]

for all $t>\frac{\delta\sqrt{n}}{100}$ . Reviewing $r=\max(2,1+\frac{80\alpha t}{mn})$ and $d=\min(\frac{1}{2},\frac{t}{4m\sqrt{n}r})$ , we have

[TABLE]

since $0<\log(1+x)<x$ for all $x>0$ . Furthermore, $\alpha t+2\log t\leq 2\alpha t$ as $t$ is sufficiently large, and $m\log p=o(mn)$ by Assumption 1 in (3.1). Consequently,

[TABLE]

We obtain (4.62) and the proof is completed. ∎

Proof of Lemma 8.

It is trivial to show that

[TABLE]

for $0\leq z\leq 1$ . Recall the assumption that $\delta\in(0,1)$ . Then $z=\frac{2\sqrt{m\log p}}{\sqrt{n}}+\frac{t}{2\sqrt{n}}\leq\frac{2\sqrt{m\log p}}{\sqrt{n}}+\frac{\delta}{200}\leq 1$ as $n$ is sufficiently large. Now,

[TABLE]

By (B.27), we see

[TABLE]

Now, reviewing $d=\frac{t}{8m\sqrt{n}}$ and $t\geq\delta$ , we have $m\log(1/d)=O(m\log m+m\log n)=O(m\log n)$ . This together with (B.29) implies that

[TABLE]

Since $t\geq(m/\log p)^{1/2}{\xi_{p}}\log n$ , we know $O(m\log n)=o(t\sqrt{m\log p})$ uniformly in $t$ . Thus,

[TABLE]

as $n$ is sufficiently large. We then get (4.66). Evidently,

[TABLE]

as $n\to\infty.$ This implies that

[TABLE]

The assertion (4.67) is verified. ∎

Proof of Lemma 9.

Review the notation $W_{\{1,...,m\}}$ above (2.1) with $S=\{1,\cdots,m\}.$ Let $\mu_{1}>...>\mu_{m}$ be the eigenvalues of $W_{\{1,...,m\}}$ . According to James, (1964) or Muirhead, (2009), $\mu=(\mu_{1},...,\mu_{m})$ has density function

[TABLE]

where $c_{m,n}=m!2^{-nm/2}\prod_{j=1}^{m}\frac{\Gamma(3/2)}{\Gamma(1+(j/2))\Gamma((n-m+j)/2)}$ . In addition, $\lambda=(\lambda_{1},...,\lambda_{m})$ has density

[TABLE]

see, for example, Chapter 17 of Mehta (2004). Note that $\nu_{i}=(\mu_{i}-n)/\sqrt{n}$, so we can write down the expression of $g_{n,m}$ as follows.

[TABLE]

for $v_{1}>v_{2}>...>v_{m}>-\sqrt{n}$, and $g_{n,m}(v)=0$ otherwise. Denote

[TABLE]

Then,

[TABLE]

for $v_{1}>v_{2}>...>v_{m}>-\sqrt{n}$ . By Lemma A.1 in Appendix A,

[TABLE]

for $v_{1}>v_{2}>...>v_{m}>-\sqrt{n}$ . By the Taylor expansion,

[TABLE]

for all $|x|<1$ . Therefore,

[TABLE]

for $|x|<\frac{2}{3}.$ Writing $\frac{n-m-1}{2}=\frac{n}{2}-\frac{m+1}{2}$ , it is easy to check

[TABLE]

Combining (B.38)-(B.40), and noting that $|v_{i}|\leq\|v\|_{\infty}$ for all $i$ , we get

[TABLE]

provided $\|v\|_{\infty}\leq\frac{2}{3}\sqrt{n}$ , where $\varpi_{m,n}$ is the error term and it is controlled by

[TABLE]

By using the trivial bound that $|v_{i}|\leq\|v\|_{\infty}$ for each $i$ , we obtain the desired conclusion from the above two assertions. ∎
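The restriction $\|v\|_{\infty}\leq\frac{2}{3}\sqrt{n}$ in the proof above comes from applying a second-order Taylor expansion of $\log(1+x)$ at $x=v_{i}/\sqrt{n}$ with $|x|\leq\frac{2}{3}$. The sketch below checks one standard remainder bound on that range, namely $|\log(1+x)-x+\frac{x^{2}}{2}|\leq|x|^{3}$, which follows from $\sum_{k\geq 3}|x|^{k}/k\leq\frac{|x|^{3}}{3(1-|x|)}$. Whether the paper's display uses exactly this constant is an assumption; the function names are hypothetical.

```python
import math

def taylor_remainder(x: float) -> float:
    """|log(1+x) - (x - x^2/2)|, the error of the second-order expansion."""
    return abs(math.log1p(x) - (x - 0.5 * x * x))

def cubic_bound_holds(x: float, tol: float = 1e-15) -> bool:
    """Check the remainder against |x|^3, up to rounding tolerance."""
    return taylor_remainder(x) <= abs(x) ** 3 + tol

# Evenly spaced grid on [-2/3, 2/3].
grid = [-2.0 / 3.0 + k * (4.0 / 3.0) / 2000 for k in range(2001)]
all_within = all(cubic_bound_holds(x) for x in grid)
```

The bound is tightest near $x=-\frac{2}{3}$, which is why the range cannot be extended much toward $-1$ without a larger constant.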

Proof of Lemma 10.

According to the density function of the Wishart distribution [see, e.g., Anderson (1962) or Muirhead (2009)], the density function for $W_{\{1,...,m\}}$ is

[TABLE]

for every $m\times m$ positive definite matrix $V$ , where $\Gamma_{m}(\cdot)$ is the multivariate gamma function defined by

[TABLE]

and we write $|V|$ for the determinant of a matrix $V$ . It is easy to see that the density function for $\frac{W_{\{1,...,m\}}-nI_{m}}{\sqrt{n}}$ is given by

[TABLE]

for every $m\times m$ matrix $w$ such that $w+\sqrt{n}I_{m}$ is positive definite. Simplifying the above display, we further have

[TABLE]

where $A(m,n)=n^{m(m+1)/4+m(n-m-1)/2}e^{-nm/2}2^{-\frac{nm}{2}}/\Gamma_{m}(n/2)$ . On the other hand,

[TABLE]

where $B(m)=(2\pi)^{-m(m+1)/4}2^{-m/2}$; see, for instance, Mehta (2004). Now we consider

[TABLE]

for every $\lambda_{i}>-\sqrt{n}$ and $i=1,\cdots,m$ by Lemma A.2 in Appendix A, where $\lambda_{1},\cdots,\lambda_{m}$ are the eigenvalues of $w$. From (B.39) and (B.40),

[TABLE]

if $\max_{1\leq i\leq m}|\lambda_{i}|\leq\frac{2}{3}\sqrt{n}$ , where $\varepsilon_{m,n}$ is the error term satisfying

[TABLE]

In addition, $|\sum_{i=1}^{m}\lambda_{i}|\leq m\max_{1\leq i\leq m}|\lambda_{i}|=m\|w\|$ and $\sum_{i=1}^{m}\lambda_{i}^{2}\leq m\max_{1\leq i\leq m}|\lambda_{i}|^{2}=m\|w\|^{2}$. The above three assertions lead to

[TABLE]

provided $\|w\|=\max_{1\leq i\leq m}|\lambda_{i}|\leq\frac{2}{3}\sqrt{n}$ . The proof is finished. ∎
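As a sanity check on the change of variables in the proof above, consider the scalar case $m=1$, where $W_{\{1\}}\sim\chi^{2}_{n}$ and $\Gamma_{1}=\Gamma$. Substituting $V=n+\sqrt{n}w$ into the $\chi^{2}_{n}$ density gives $A(1,n)\,(1+w/\sqrt{n})^{(n-2)/2}e^{-\sqrt{n}w/2}$ with $A(1,n)$ as defined in the text. The sketch below compares this with the $N(0,2)$ density, the usual central-limit approximation of $(\chi^{2}_{n}-n)/\sqrt{n}$. The function names and the choice $n=10^{4}$ are hypothetical, and the comparison is only an illustration of the $m=1$ case, not of the general lemma.

```python
import math

def log_A1(n: int) -> float:
    """log A(1, n) = ((n-1)/2) log n - n/2 - (n/2) log 2 - log Gamma(n/2)."""
    return (0.5 * (n - 1) * math.log(n) - 0.5 * n
            - 0.5 * n * math.log(2.0) - math.lgamma(0.5 * n))

def rescaled_chi2_density(w: float, n: int) -> float:
    """Density of (chi2_n - n)/sqrt(n) at w, computed on the log scale."""
    root = math.sqrt(n)
    if w <= -root:
        return 0.0
    log_density = (log_A1(n) + 0.5 * (n - 2) * math.log1p(w / root)
                   - 0.5 * root * w)
    return math.exp(log_density)

def normal_var2_density(w: float) -> float:
    """Density of N(0, 2) at w."""
    return math.exp(-0.25 * w * w) / math.sqrt(4.0 * math.pi)

n = 10_000
ratios = {w: rescaled_chi2_density(w, n) / normal_var2_density(w)
          for w in (-2.0, -1.0, 0.0, 1.0, 2.0)}
```

The ratios differ from 1 by terms of order $(|w|+|w|^{3})/\sqrt{n}$, in line with the form of the error term controlled in the proof.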

Proof of Lemma 12.

Let $B(v_{1},\delta),\ldots,B(v_{N},\delta)$ be $N$ balls centered around $v_{1},\ldots,v_{N}$, respectively, such that $\cup B(v_{i},\delta)$ covers the unit sphere $\{v\in\mathbb{R}^{m};\,\|v\|=1\}$. Then, for any $r>0$, by (B.8),

[TABLE]

According to the distribution of $\tilde{W}$ ,

[TABLE]

where $f(y):=2+(\eta-2)\sum_{i=1}^{m}y_{i}^{4}$ for any $y=(y_{1},\cdots,y_{m})^{\intercal}\in\mathbb{R}^{m}$. In fact, for any $y=(y_{1},\cdots,y_{m})^{\intercal}\in\mathbb{R}^{m}$,

[TABLE]

such that $\sigma_{y}^{2}$ is equal to

[TABLE]

by independence. Recall $\mathbb{P}(N(0,1)\geq x)\leq e^{-x^{2}/2}$ for all $x\geq 1$ . Thus, for $x-2r\delta>1$ , the first term on the right side of (B.50) is bounded by

[TABLE]

since $0\leq\eta\leq 2$ . Observe that

[TABLE]

Thus,

[TABLE]

if $x-2r\delta>1$. We now estimate the last probability in (B.50). Note that

[TABLE]

Note that $\sum_{i=1}^{m}\sum_{j=1}^{m}\tilde{W}^{2}_{ij}$ and $\eta Q_{1}+2Q_{2}$ have the same distribution, where $Q_{1}\sim\chi^{2}_{m}$, $Q_{2}\sim\chi^{2}_{m(m-1)/2}$, and $Q_{1}$ and $Q_{2}$ are independent. Also, $\eta Q_{1}+2Q_{2}\leq 2(Q_{1}+Q_{2})\sim 2\cdot\chi^{2}_{m(m+1)/2}$. Thus, the last probability in (B.50) is dominated by

[TABLE]
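Both the variance function $f(y)$ and the decomposition $\eta Q_{1}+2Q_{2}$ used above are consistent with a symmetric matrix $\tilde{W}$ whose entries on and above the diagonal are independent centered Gaussians with variance $\eta$ on the diagonal and variance $1$ off the diagonal. That entry structure is inferred here from the formulas and should be read as an assumption. Under it, $y^{\intercal}\tilde{W}y=\sum_{i}\tilde{W}_{ii}y_{i}^{2}+2\sum_{i<j}\tilde{W}_{ij}y_{i}y_{j}$, and the sketch below checks that its variance equals $f(y)$ for a unit vector $y$. The function names and the sample vector are hypothetical.

```python
import math

def quad_form_variance(y, eta: float) -> float:
    """Var(y^T W y) under the assumed entry variances (eta on, 1 off diagonal)."""
    m = len(y)
    diag = sum(eta * y[i] ** 4 for i in range(m))
    off = sum((2.0 * y[i] * y[j]) ** 2
              for i in range(m) for j in range(i + 1, m))
    return diag + off

def f(y, eta: float) -> float:
    """Closed form 2 + (eta - 2) * sum_i y_i^4 from the text."""
    return 2.0 + (eta - 2.0) * sum(v ** 4 for v in y)

def expected_frobenius(m: int, eta: float) -> float:
    """E[sum_ij W_ij^2] = eta * m + m (m - 1), the mean of eta*Q1 + 2*Q2."""
    return eta * m + m * (m - 1)

raw = [3.0, -1.0, 2.0, 0.5]
norm = math.sqrt(sum(v * v for v in raw))
y = [v / norm for v in raw]      # unit vector
gaps = [abs(quad_form_variance(y, eta) - f(y, eta)) for eta in (0.0, 1.0, 2.0)]
```

For $0\leq\eta\leq 2$ the closed form gives $f(y)\leq 2$, and the mean $\eta m+m(m-1)$ is at most $m(m+1)$, matching the comparison with $2\cdot\chi^{2}_{m(m+1)/2}$.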

Notice $\frac{r^{2}}{m(m+1)}\geq 8$ under the given condition $r\geq 4m$. Let $I(x)=\frac{1}{2}(x-1-\log x)$ for $x>0$. It is easy to check that $I(8)=(7-\log 8)/2>2.4$ and that $I(x)=\frac{1}{2}(x-1-\log x)\geq\frac{1}{4}x$ for $x\geq 8$. By (B.14), the last probability above is no more than

[TABLE]

Hence,

[TABLE]

Combining the above display with (B.50) and (B.54), we have

[TABLE]

The desired conclusion follows since $N\leq m^{1.5}(\log m)\delta^{-m}$ (Rogers, 1963). ∎
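The numerical facts quoted in the proof above are easy to confirm: $I(8)>2.4$, $I(x)\geq x/4$ for $x\geq 8$, $\frac{r^{2}}{m(m+1)}\geq 8$ when $r\geq 4m$, and the Gaussian tail bound $\mathbb{P}(N(0,1)\geq x)\leq e^{-x^{2}/2}$ for $x\geq 1$. A minimal sketch, with hypothetical function names and grids:

```python
import math

def rate(x: float) -> float:
    """I(x) = (x - 1 - log x) / 2 for x > 0."""
    return 0.5 * (x - 1.0 - math.log(x))

def normal_tail(x: float) -> float:
    """P(N(0,1) >= x), via the complementary error function."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))

rate_at_8 = rate(8.0)

# I(x) >= x/4 on a grid over [8, 1000].
rate_ok = all(rate(8.0 + 0.5 * k) >= 0.25 * (8.0 + 0.5 * k)
              for k in range(1985))

# r >= 4m implies r^2 / (m (m + 1)) >= 16 m / (m + 1) >= 8 for m >= 1.
ratio_ok = all((4.0 * m) ** 2 / (m * (m + 1)) >= 8.0 for m in range(1, 1001))

# Gaussian tail bound on a grid over [1, 10].
tail_ok = all(normal_tail(1.0 + 0.01 * k) <= math.exp(-0.5 * (1.0 + 0.01 * k) ** 2)
              for k in range(901))
```

The case $m=1$, $r=4$ gives equality in the third check, so the constant $8$ cannot be improved under the condition $r\geq 4m$ alone.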

Proof of Lemma 13.

(i) $\implies$ (ii): Easily,

[TABLE]

Taking $\limsup_{p\to\infty}$ on both sides and then letting $\delta\downarrow 0$ , we obtain

[TABLE]

On the other hand, $\liminf_{p\to\infty}\mathbb{E}\big{[}e^{\alpha Z_{p}}\big{]}\geq 1$ since $Z_{p}\geq 0$. Hence, $\lim_{p\to\infty}\mathbb{E}\big{[}e^{\alpha Z_{p}}\big{]}=1$.

(ii) $\implies$ (i): For each $\beta>0$ , we know $\mathbf{1}_{\{Z_{p}\geq\delta\}}\leq e^{\beta(Z_{p}-\delta)}$ . Thus,

[TABLE]

Taking $\limsup_{p\to\infty}$ on both sides and then letting $\beta\to\infty$ , we have

[TABLE]

Thus, $\lim_{p\to\infty}\mathbb{E}[e^{\alpha Z_{p}}\mathbf{1}_{\{Z_{p}\geq\delta\}}]=0$ .

(ii) $\implies$ (iii): First, $\mathbb{E}[Z_{p}^{\alpha}]=\alpha\int_{0}^{\infty}x^{\alpha-1}\mathbb{P}(Z_{p}\geq x)\,dx$. By the Markov inequality, $\mathbb{P}(Z_{p}\geq x)\leq e^{-\beta x}\mathbb{E}e^{\beta Z_{p}}$ for all $x>0$ and $\beta>0$. It follows that

[TABLE]

for all $\beta>0$ . The conclusion then follows by first letting $p\to\infty$ and then sending $\beta\to\infty.$
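Combining the two facts in the step (ii) $\implies$ (iii) gives $\mathbb{E}[Z^{\alpha}]\leq\alpha\,\mathbb{E}e^{\beta Z}\int_{0}^{\infty}x^{\alpha-1}e^{-\beta x}\,dx=\Gamma(\alpha+1)\beta^{-\alpha}\,\mathbb{E}e^{\beta Z}$. The sketch below illustrates this bound on a hypothetical example, $Z\sim\mathrm{Exp}(\lambda)$, for which $\mathbb{E}[Z^{\alpha}]=\Gamma(\alpha+1)\lambda^{-\alpha}$ and $\mathbb{E}e^{\beta Z}=\lambda/(\lambda-\beta)$ for $\beta<\lambda$. The exponential distribution and all names are assumptions made for illustration only.

```python
import math

def moment_exact(alpha: float, lam: float) -> float:
    """E[Z^alpha] for Z ~ Exp(rate lam)."""
    return math.gamma(alpha + 1.0) / lam ** alpha

def mgf(beta: float, lam: float) -> float:
    """E[exp(beta Z)] for Z ~ Exp(rate lam); finite only for beta < lam."""
    if beta >= lam:
        raise ValueError("moment generating function is infinite for beta >= lam")
    return lam / (lam - beta)

def moment_bound(alpha: float, beta: float, lam: float) -> float:
    """Gamma(alpha + 1) * beta^(-alpha) * E[exp(beta Z)]."""
    return math.gamma(alpha + 1.0) * beta ** (-alpha) * mgf(beta, lam)

checks = [(alpha, beta, lam,
           moment_exact(alpha, lam) <= moment_bound(alpha, beta, lam))
          for alpha in (0.5, 1.0, 2.0, 3.0)
          for lam in (10.0, 100.0)
          for beta in (1.0, 5.0)]
```

In this example, letting $\lambda\to\infty$ (so that $Z\to 0$) sends the moment generating function to $1$ and leaves the bound $\Gamma(\alpha+1)\beta^{-\alpha}$, which vanishes as $\beta\to\infty$, mirroring the two limits taken in the proof.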

(iii) $\implies$ (iv): This is a direct consequence of the Chebyshev inequality and the equality $\lim_{p\to\infty}\mathbb{E}(Z_{p})=0$ .

(iii) $\implies$ (v): Taking $\alpha=2$ in (iii), we get $\limsup_{p}{\rm Var}(Z_{p})\leq\lim_{p\to\infty}\mathbb{E}(Z_{p}^{2})=0.$

∎

Bibliography

Anderson, T. W. (1962). An introduction to multivariate statistical analysis. Wiley, New York.
Arratia, R., Goldstein, L., and Gordon, L. (1990). Poisson approximation and the Chen-Stein method. Statistical Science, pages 403–424.
Bai, Z. D. (1999). Methodologies in spectral analysis of large dimensional random matrices, a review. Statistica Sinica, 9(3):611–662.
Bai, Z. D. and Silverstein, J. W. (2010). Spectral analysis of large dimensional random matrices, volume 20. Springer.
Bai, Z. D. and Yin, Y. Q. (1993). Limit of the smallest eigenvalue of a large dimensional sample covariance matrix. The Annals of Probability, 21(3):1275–1294.
Baraniuk, R., Davenport, M., De Vore, R., and Wakin, M. (2008). A simple proof of the restricted isometry property for random matrices. Constructive Approximation, 28(3):253–263.
Bryc, W., Dembo, A., and Jiang, T. (2006). Spectral measure of large random Hankel, Markov and Toeplitz matrices. The Annals of Probability, pages 1–38.
Cai, T. T., Fan, J., and Jiang, T. (2013). Distributions of angles in random packing on spheres. The Journal of Machine Learning Research, 14(1):1837–1864.
