Convergence of eigenvector empirical spectral distribution of sample covariance matrices

Haokai Xi; Fan Yang; and Jun Yin

arXiv:1705.03954·math.PR·August 19, 2020


TL;DR

This paper establishes improved convergence rates for the eigenvector empirical spectral distribution of sample covariance matrices to the deformed Marčenko-Pastur law, under weaker moment conditions and more general matrix models.

Contribution

It provides sharper bounds on the convergence rate of the VESD to the deformed MP distribution, extending previous results to broader settings with weaker assumptions.

Findings

01

Expected VESD converges to deformed MP law at rate N^{-1+ε}.

02

Almost sure convergence rate of VESD is improved to N^{-1/2+ε}.

03

Results hold under finite 6th (expected VESD) and 8th (almost sure) moment conditions, for general diagonal population covariance matrices.





Haokai Xi, Fan Yang, and Jun Yin

University of Wisconsin-Madison and University of California, Los Angeles

Department of Mathematics, University of California, Los Angeles, Los Angeles, CA 90095, USA

Department of Mathematics, University of Wisconsin-Madison, Madison, WI 53706, USA

Abstract

The eigenvector empirical spectral distribution (VESD) is a useful tool in studying the limiting behavior of eigenvalues and eigenvectors of covariance matrices. In this paper, we study the convergence rate of the VESD of sample covariance matrices to the deformed Marčenko-Pastur (MP) distribution. Consider sample covariance matrices of the form $\Sigma^{1/2}XX^{*}\Sigma^{1/2}$, where $X=(x_{ij})$ is an $M\times N$ random matrix whose entries are independent random variables with mean zero and variance $N^{-1}$, and $\Sigma$ is a deterministic positive-definite matrix. We prove that the Kolmogorov distance between the expected VESD and the deformed MP distribution is bounded by $N^{-1+\epsilon}$ for any fixed $\epsilon>0$, provided that the entries $\sqrt{N}x_{ij}$ have uniformly bounded 6th moments and $|N/M-1|\geq\tau$ for some constant $\tau>0$. This result improves the previous one obtained in [52], which gave the convergence rate $O(N^{-1/2})$ assuming $i.i.d.$ $X$ entries, bounded 10th moment, $\Sigma=I$ and $M<N$. Moreover, we also prove that under the finite 8th moment assumption, the convergence rate of the VESD is $O(N^{-1/2+\epsilon})$ almost surely for any fixed $\epsilon>0$, which improves the previous bound $N^{-1/4+\epsilon}$ in [52].

MSC classifications: 15B52, 62E20, 62H99.

Keywords: Sample covariance matrix, Empirical spectral distribution, Eigenvector empirical spectral distribution, Marčenko-Pastur distribution.

Supported by NSF Career Grant DMS-1552192 and Sloan fellowship.

1 Introduction and main results

Sample covariance matrices are fundamental objects in multivariate statistics. The population covariance matrix of a centered random vector $\mathbf{y}\in\mathbb{R}^{M}$ is $\Sigma=\mathbb{E}\mathbf{y}\mathbf{y}^{*}$. Given $N$ independent samples $(\mathbf{y}_{1},\cdots,\mathbf{y}_{N})$ of $\mathbf{y}$, the sample covariance matrix $Q:=N^{-1}\sum_{i}\mathbf{y}_{i}\mathbf{y}_{i}^{*}$ is the simplest estimator for $\Sigma$. Indeed, if $M$ is fixed, then $Q$ converges almost surely to $\Sigma$ as $N\to\infty$. However, in many modern applications, advances in technology have led to high dimensional data where $M$ is comparable to or even larger than $N$. In this setting, $\Sigma$ cannot be estimated through $Q$ directly, but some properties of $\Sigma$ can be inferred from the eigenvalue and eigenvector statistics of $Q$. Large dimensional covariance matrices have found applications in many fields, such as statistics [13, 26, 27, 28], economics [40] and population genetics [41].

In this paper, we consider sample covariance matrices of the form $Q_{1}:=\Sigma^{1/2}XX^{*}\Sigma^{1/2}$ , where $X=(x_{ij})$ is an $M\times N$ real or complex data matrix whose entries are independent (but not necessarily identically distributed) random variables satisfying

$$\mathbb{E}x_{ij}=0,\qquad\mathbb{E}|x_{ij}|^{2}=N^{-1},\qquad 1\leq i\leq M,\ 1\leq j\leq N,\tag{1.1}$$

and the population covariance matrix $\Sigma:=\text{diag}(\sigma_{1},\sigma_{2},\ldots,\sigma_{M})$, $\sigma_{1}\geq\cdots\geq\sigma_{M}>0$, is a deterministic positive-definite matrix. If the entries of $X$ are complex, then we assume in addition that

$$\mathbb{E}x_{ij}^{2}=0,\qquad 1\leq i\leq M,\ 1\leq j\leq N.\tag{1.2}$$

Define the aspect ratio $d_{N}:=N/M$. We are interested in the high dimensional case with $\lim_{N\rightarrow\infty}d_{N}=d\in(0,\infty)$. We will also consider the $N\times N$ matrix $Q_{2}:=X^{*}\Sigma X$, which shares the same nonzero eigenvalues with $Q_{1}$.
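The shared spectrum holds because both matrices are Gram matrices of $B=\Sigma^{1/2}X$: $Q_{1}=BB^{*}$ and $Q_{2}=B^{*}B$. The NumPy sketch below checks this numerically. Real Gaussian entries, the sizes, and the particular diagonal $\Sigma$ are illustrative assumptions.

```python
import numpy as np

def leading_eigs(M, N, sigma, seed=0):
    """Descending eigenvalues of Q1 = Sigma^{1/2} X X^T Sigma^{1/2} (M x M) and
    Q2 = X^T Sigma X (N x N) for a diagonal Sigma = diag(sigma)."""
    rng = np.random.default_rng(seed)
    X = rng.standard_normal((M, N)) / np.sqrt(N)   # mean 0, variance 1/N as in (1.1)
    B = np.sqrt(sigma)[:, None] * X                # Sigma^{1/2} X for diagonal Sigma
    Q1 = B @ B.T                                   # M x M
    Q2 = B.T @ B                                   # N x N, equal to X^T Sigma X
    return np.linalg.eigvalsh(Q1)[::-1], np.linalg.eigvalsh(Q2)[::-1]

M, N = 60, 100                        # d_N = N / M > 1, so Q2 carries N - M extra zeros
sigma = np.linspace(2.0, 0.5, M)      # hypothetical population spectrum
ev1, ev2 = leading_eigs(M, N, sigma)
K = min(M, N)
print(np.max(np.abs(ev1[:K] - ev2[:K])))   # nonzero spectra agree up to rounding
print(np.max(np.abs(ev2[K:])))             # the remaining N - M eigenvalues of Q2 vanish
```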

A simple but important example is the sample covariance matrix with $\Sigma=\sigma^{2}I$ (i.e. the null case). In applications of spectral analysis of large dimensional random matrices, one important problem is the convergence rate of the empirical spectral distribution (ESD). It is well known that the ESD $F^{(M)}_{XX^{*}}$ of $XX^{*}$ converges weakly to the Marčenko-Pastur (MP) law $F_{MP}$ [36]. One way to measure the convergence rate of the ESD is to use the Kolmogorov distance

$$\|F^{(M)}_{XX^{*}}-F_{MP}\|:=\sup_{x}\big|F^{(M)}_{XX^{*}}(x)-F_{MP}(x)\big|.$$

The convergence rate for sample covariance matrices was first established in [2], and later improved in [23] to $O(N^{-1/2})$ in probability under the finite 8th moment condition. In [44], the authors proved an almost optimal bound that $\|F^{(M)}_{XX^{*}}-F_{MP}\|=O(N^{-1+\epsilon})$ with high probability for any fixed $\epsilon>0$ under the sub-exponential decay assumption.
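To make the Kolmogorov distance concrete, the NumPy sketch below computes $\|F^{(M)}_{XX^{*}}-F_{MP}\|$ for Gaussian $X$ with $M=N/2$. It makes three assumptions:

- entries are real Gaussian;
- $F_{MP}$ is obtained by numerically integrating the standard MP density with ratio $y=M/N\leq 1$;
- for a step function measured against a continuous CDF, the supremum is attained at one side of a jump.

For Gaussian data one would expect the distances to shrink roughly like $N^{-1}$ up to small factors, in line with [44]. A single realization per $N$ is only indicative.

```python
import numpy as np

def mp_cdf(x, y, n_grid=20001):
    """CDF of the MP law with ratio y = M/N <= 1 (limit ESD of X X^T for an
    M x N matrix X with entries of variance 1/N), by trapezoidal integration."""
    a, b = (1 - np.sqrt(y)) ** 2, (1 + np.sqrt(y)) ** 2
    grid = np.linspace(a, b, n_grid)
    dens = np.sqrt(np.clip((b - grid) * (grid - a), 0.0, None)) / (2 * np.pi * y * grid)
    cum = np.concatenate([[0.0], np.cumsum(0.5 * (dens[1:] + dens[:-1]) * np.diff(grid))])
    cum /= cum[-1]                                   # absorb the small quadrature error
    return np.interp(x, grid, cum, left=0.0, right=1.0)

def kolmogorov_esd(eigs, cdf):
    """sup_x |F_n(x) - F(x)| for the ESD F_n of eigs and a continuous CDF F."""
    x = np.sort(eigs)
    n = x.size
    F = cdf(x)
    right = np.arange(1, n + 1) / n                  # F_n at the i-th order statistic
    left = np.arange(0, n) / n                       # F_n just before it
    return max(np.max(np.abs(right - F)), np.max(np.abs(left - F)))

rng = np.random.default_rng(1)
dists = []
for N in (200, 800, 2000):
    M = N // 2                                       # y = M / N = 1/2
    X = rng.standard_normal((M, N)) / np.sqrt(N)
    eigs = np.linalg.eigvalsh(X @ X.T)
    dists.append(kolmogorov_esd(eigs, lambda t: mp_cdf(t, M / N)))
print(dists)
```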

Research on the asymptotic properties of eigenvectors of large dimensional random matrices is generally harder and much less developed. However, eigenvectors play an important role in high dimensional statistics. In particular, principal component analysis (PCA) is now widely recognized as a powerful technique for dimensionality reduction, and the eigenvectors corresponding to the largest eigenvalues are the directions of the principal components. Early work on the properties of eigenvectors goes back to Anderson [1], who proved that the eigenvectors of the Wishart matrix are asymptotically normal and isotropic when $M$ is fixed and $N\to\infty$. For the high dimensional case, Johnstone [27] proposed the spiked model to test the existence of principal components. Paul [42] then studied the directions of eigenvectors corresponding to spiked eigenvalues. In [35], Ma proposed an iterative thresholding approach to estimate sparse principal subspaces in the setting of a high-dimensional spiked covariance model. Using a reduction scheme that turns the sparse PCA problem into a high-dimensional multivariate regression problem, [11] established the optimal rates of convergence for estimating the principal subspace for a large class of spiked covariance matrices. See the references in [11, 35] for more literature on sparse PCA and spiked covariance matrices.

To test for the existence of spiked eigenvalues, we first need to study the properties of the eigenmatrices in the null case. If $\Sigma=\sigma^{2}I$, then the eigenmatrix is expected to be asymptotically Haar distributed (i.e. uniformly distributed over the unitary group). However, formulating the notion “asymptotically Haar distributed” is far from trivial since the dimension $M$ is increasing. Following the approach in [46, 47, 3, 51, 52], we will use the eigenvector empirical spectral distribution (VESD) to characterize the asymptotic Haar property. Suppose

$$\Sigma^{1/2}X=\sum_{k=1}^{N\wedge M}\sqrt{\lambda_{k}}\,\xi_{k}\zeta_{k}^{*}$$

is a singular value decomposition of $\Sigma^{1/2}X$ , where

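$$\lambda_{1}\geq\lambda_{2}\geq\cdots\geq\lambda_{N\wedge M}\geq 0=\lambda_{N\wedge M+1}=\cdots=\lambda_{N\vee M},$$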

$\{\xi_{k}\}_{k=1}^{M}$ are the left-singular vectors, and $\{\zeta_{k}\}_{k=1}^{N}$ are the right-singular vectors. Then for deterministic unit vectors $\mathbf{u}\in\mathbb{C}^{M}$ and $\mathbf{v}\in\mathbb{C}^{N}$ , we define the VESD of $Q_{1,2}$ as

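$$F^{(M)}_{Q_{1},\mathbf{u}}(x)=\sum_{k=1}^{M}|\langle\xi_{k},\mathbf{u}\rangle|^{2}\,\mathbf{1}_{\{\lambda_{k}\leq x\}},\qquad F^{(N)}_{Q_{2},\mathbf{v}}(x)=\sum_{k=1}^{N}|\langle\zeta_{k},\mathbf{v}\rangle|^{2}\,\mathbf{1}_{\{\lambda_{k}\leq x\}}.$$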
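The VESD is a weighted version of the ESD: the jump at $\lambda_{k}$ has height $|\langle\xi_{k},\mathbf{u}\rangle|^{2}$ instead of $1/M$. The NumPy sketch below computes it for coordinate test vectors. Assumptions:

- the two-level diagonal $\Sigma$ and the sizes are illustrative only;
- `numpy.linalg.eigh` returns eigenvalues in increasing order, which does not affect the VESD.

The sketch checks that the total mass is $\|\mathbf{u}\|^{2}=1$. It also checks that a direction with small $\sigma_{i}$ carries more VESD mass on small eigenvalues.

```python
import numpy as np

def vesd(Q, u):
    """Eigenvalues of Hermitian Q and the VESD jump sizes |<xi_k, u>|^2."""
    lam, vecs = np.linalg.eigh(Q)              # column k of vecs is xi_k
    w = np.abs(vecs.conj().T @ u) ** 2
    return lam, w

def vesd_cdf(x, lam, w):
    """F_{Q,u}(x) = sum_k |<xi_k, u>|^2 1{lambda_k <= x}."""
    return float(w[lam <= x].sum())

rng = np.random.default_rng(2)
M, N = 200, 400
sigma = np.where(np.arange(M) < M // 2, 4.0, 1.0)   # hypothetical Sigma, sigma_1 >= ... >= sigma_M
X = rng.standard_normal((M, N)) / np.sqrt(N)
B = np.sqrt(sigma)[:, None] * X
Q1 = B @ B.T
e1 = np.zeros(M); e1[0] = 1.0                       # direction with sigma = 4
eM = np.zeros(M); eM[-1] = 1.0                      # direction with sigma = 1
lam, w1 = vesd(Q1, e1)
_, wM = vesd(Q1, eM)
x0 = 2.0
print(vesd_cdf(x0, lam, w1), vesd_cdf(x0, lam, wM))
```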

Now we apply the above formulation to the null case. Adopting the ideas of [46, 47], we define the stochastic process

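$$X_{M,\mathbf{u}}(t):=\sqrt{\frac{M}{2}}\sum_{k=1}^{\lfloor Mt\rfloor}\Big(|\langle\xi_{k},\mathbf{u}\rangle|^{2}-M^{-1}\Big).$$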

If the eigenmatrix of $XX^{*}$ is Haar distributed, then the vector $\mathbf{y}:=(\langle\xi_{k},\mathbf{u}\rangle)_{k=1}^{M}$ is uniformly distributed over the unit sphere, and $X_{M,\mathbf{u}}(t)$ converges to a Brownian bridge by Donsker’s theorem. Thus the convergence of $X_{M,\mathbf{u}}$ to a Brownian bridge characterizes the asymptotic Haar property of the eigenmatrix. For convenience, we can consider the time transformation

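$$X_{M,\mathbf{u}}\big(F^{(M)}_{XX^{*}}(x)\big)=\sqrt{\frac{M}{2}}\Big(F^{(M)}_{XX^{*},\mathbf{u}}(x)-F^{(M)}_{XX^{*}}(x)\Big).$$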

Thus the problem is reduced to the study of the difference between the VESD and the ESD. It was proved in [3, 9] that $F^{(M)}_{XX^{*},\mathbf{u}}$ also converges weakly to the MP law for any sequence of unit vectors $\mathbf{u}\in\mathbb{R}^{M}$. On the other hand, compared with the ESD, much less is known about the convergence rate of the VESD. The best result so far was obtained in [52], where the authors proved that if $d_{N}>1$ and the entries of $X$ are $i.i.d.$ centered random variables, then $\|\mathbb{E}F^{(M)}_{XX^{*},\mathbf{u}}-F_{MP}\|=O(N^{-1/2})$ under the finite 10th moment assumption, and $\|F^{(M)}_{XX^{*},\mathbf{u}}-F_{MP}\|=O(N^{-1/4+\epsilon})$ almost surely under the finite 8th moment assumption. However, we find that both of these bounds are far from optimal and can be improved with a different method. This is one of the purposes of this paper.
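Two features of the null-case process $X_{M,\mathbf{u}}$ can be seen numerically:

- **Pinned endpoint.** $X_{M,\mathbf{u}}(1)=0$ exactly, since the weights $|\langle\xi_{k},\mathbf{u}\rangle|^{2}$ sum to $\|\mathbf{u}\|^{2}=1$.
- **Bridge variance.** When $\mathbf{y}$ is uniform on the sphere, the variance at $t=1/2$ is close to the Brownian bridge value $t(1-t)=1/4$. For real $\mathbf{y}$ and integer $Mt$, a Beta-distribution computation gives $t(1-t)M/(M+2)$.

The sketch below makes three assumptions:

- entries are real Gaussian;
- $\mathbf{u}$ is a random unit vector;
- a normalized Gaussian vector stands in for a Haar column.

```python
import numpy as np

def x_process(weights, t):
    """X_{M,u}(t) = sqrt(M/2) * sum_{k <= floor(Mt)} (w_k - 1/M), w_k = |<xi_k, u>|^2."""
    M = weights.size
    k = int(np.floor(M * t))
    return np.sqrt(M / 2) * (weights[:k].sum() - k / M)

rng = np.random.default_rng(3)
M, N = 200, 400

# Weights from one null-case sample covariance matrix X X^T.
X = rng.standard_normal((M, N)) / np.sqrt(N)
lam, vecs = np.linalg.eigh(X @ X.T)
u = rng.standard_normal(M)
u /= np.linalg.norm(u)
w = (vecs.T @ u) ** 2
end_value = x_process(w, 1.0)          # pinned at 0 because the weights sum to 1

# Haar surrogate: y uniform on the unit sphere; Var X(1/2) should be near 1/4.
reps = 4000
vals = np.empty(reps)
for r in range(reps):
    y = rng.standard_normal(M)
    y /= np.linalg.norm(y)
    vals[r] = x_process(y ** 2, 0.5)
mid_variance = vals.var()
print(end_value, mid_variance)
```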

We will also extend the above formulation to include sample covariance matrices with general population $\Sigma$. For a non-scalar $\Sigma$, the eigenmatrix of $Q_{1}$ is no longer asymptotically Haar distributed. Regarding its distribution, we conjecture that the eigenvectors of $Q_{1}$ are asymptotically independent, and that each $\xi_{k}$ is asymptotically normal with covariance matrix given by some $\mathbf{D}_{k}$. In fact, our results in this paper suggest that $\mathbf{D}_{k}$ takes the form $\mathbf{F}_{1c}(\gamma_{k})-\mathbf{F}_{1c}(\gamma_{k+1})$, where $\gamma_{k}$, defined in (1.15), denotes the classical location of $\lambda_{k}$, and $\mathbf{F}_{1c}$ is a matrix-valued function defined in (1.18) with the property that $\langle\mathbf{u},\mathbf{F}_{1c}\mathbf{u}\rangle$ is the asymptotic distribution of the VESD $F_{Q_{1},\mathbf{u}}$ for any $\mathbf{u}\in\mathbb{C}^{M}$. Again, since the dimension $M$ increases to infinity, this property is hard to formulate. One way is to consider finite-dimensional restrictions in the following sense: given $m\in\mathbb{N}$, for any fixed unit vector $\mathbf{u}\in\mathbb{C}^{M}$ and $\{i_{1},\cdots,i_{m}\}\subseteq\{1,\cdots,N\wedge M\}$, we should have asymptotically

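$$\big(\langle\xi_{i_{1}},\mathbf{u}\rangle,\ldots,\langle\xi_{i_{m}},\mathbf{u}\rangle\big)\sim\mathcal{N}_{m}\Big(0,\operatorname{diag}\big(\langle\mathbf{u},\mathbf{D}_{i_{1}}\mathbf{u}\rangle,\ldots,\langle\mathbf{u},\mathbf{D}_{i_{m}}\mathbf{u}\rangle\big)\Big).$$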

(In fact, for a nice choice of $\Sigma$ in the sense of Definition 1.2, $\langle\mathbf{u},\mathbf{D}_{k}\mathbf{u}\rangle$ is typically of order $N^{-1}$.) We can also adopt the approach above and investigate the stochastic process

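$$X^{\Sigma}_{M,\mathbf{u}}(t):=\sqrt{\frac{M}{2}}\sum_{k=1}^{\lfloor Mt\rfloor}\Big(|\langle\xi_{k},\mathbf{u}\rangle|^{2}-\langle\mathbf{u},\mathbf{D}_{k}\mathbf{u}\rangle\Big).\tag{1.6}$$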

If $M<N$ , we conjecture that $X^{\Sigma}_{M,\mathbf{u}}(t)$ converges to the following Gaussian process for $0\leq t\leq 1$ :

[TABLE]

where $B_{t}$ is a standard Brownian motion, $F_{1c}$ is the asymptotic ESD of $Q_{1}$ defined in (1.12), and $F_{1c}^{-1}$ denotes the quantile function. As before, we can study the process (1.6) through the time transformation $X^{\Sigma}_{M,\mathbf{u}}(F_{Q_{1}}(x))$, where $F_{Q_{1}}$ is the ESD of $Q_{1}$. Due to the rigidity of eigenvalues (see Theorem 3.7), we have for all $x$,

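$$\sqrt{\frac{2}{M}}\,X^{\Sigma}_{M,\mathbf{u}}\big(F_{Q_{1}}(x)\big)=F_{Q_{1},\mathbf{u}}(x)-\langle\mathbf{u},\mathbf{F}_{1c}(x)\mathbf{u}\rangle+O(N^{-1+\epsilon})$$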

with very high probability for any fixed $\epsilon>0$. Thus we need to study the convergence rate of $F_{Q_{1},\mathbf{u}}$ to $\langle\mathbf{u},\mathbf{F}_{1c}\mathbf{u}\rangle$, and this is our main goal. In fact, we will prove that the convergence rate of $\mathbb{E}F_{Q_{1},\mathbf{u}}$ is $O(N^{-1+\epsilon})$ for any fixed $\epsilon>0$, which shows that the limiting process is centered, and that the convergence rate of $F_{Q_{1},\mathbf{u}}$ is $O(N^{-1/2+\epsilon})$, which partially verifies the $\sqrt{M}$ scaling.

1.1 Main results

We consider sample covariance matrices with a general diagonal $\Sigma$ , whose empirical spectral distribution is denoted by

$$\pi\equiv\pi_{M}:=\frac{1}{M}\sum_{i=1}^{M}\delta_{\sigma_{i}}.\tag{1.8}$$

We assume that there exists a small constant $\tau>0$ such that

$$\sigma_{1}\leq\tau^{-1}\quad\text{and}\quad\pi_{M}([0,\tau])\leq 1-\tau\quad\text{for all }M.\tag{1.9}$$

The first condition means that the operator norm of $\Sigma$ is bounded, and the second condition means that the spectrum of $\Sigma$ cannot concentrate at zero. If $\pi_{M}$ converges weakly to some distribution $\hat{\pi}$ as $M\to\infty$ , then it was shown in [36] that the ESD of $Q_{2}$ converges in probability to some deterministic distribution, which is called the deformed Marčenko-Pastur law. For any $N$ , we describe the deformed MP law $F_{2c}^{(N)}$ through its Stieltjes transform

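$$m_{2c}(z)\equiv m^{(N)}_{2c}(z):=\int_{\mathbb{R}}\frac{dF^{(N)}_{2c}(x)}{x-z},\qquad z=E+\mathrm{i}\eta\in\mathbb{C}_{+}.$$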

We define $m_{2c}$ as the unique solution to the self-consistent equation

$$\frac{1}{m_{2c}(z)}=-z+d_{N}^{-1}\int\frac{t}{1+m_{2c}(z)t}\,\pi(dt),\tag{1.10}$$

subject to the conditions that ${\rm{Im}}\,m_{2c}(z)\geq 0$ and ${\rm{Im}}\,zm_{2c}(z)\geq 0$ for $z\in\mathbb{C}_{+}$ . It is well known that the functional equation (1.10) has a unique solution that is uniformly bounded on $\mathbb{C}_{+}$ under the assumption (1.9) [36]. Letting $\eta\downarrow 0$ , we can recover the asymptotic eigenvalue density $\rho_{2c}$ (which further gives $F_{2c}^{(N)}$ ) with the inverse formula

$$\rho_{2c}(E)=\pi^{-1}\lim_{\eta\downarrow 0}\operatorname{Im}m_{2c}(E+\mathrm{i}\eta).\tag{1.11}$$

Since $Q_{1}$ shares the same nonzero eigenvalues with $Q_{2}$ and has $M-N$ more (or $N-M$ fewer) zero eigenvalues, we obtain the asymptotic ESD of $Q_{1}$:

$$F^{(M)}_{1c}=d_{N}F^{(N)}_{2c}+(1-d_{N})\mathbf{1}_{[0,\infty)}.\tag{1.12}$$
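Equations (1.10)-(1.12) can be turned into a small numerical recipe. The NumPy sketch below is an illustration, not the procedure used in the paper. It proceeds in four steps:

1. Solve (1.10) by plain fixed-point iteration for a discrete $\pi$.
2. Check the result against the closed form available when $\Sigma=I$. In that case (1.10) reduces to the quadratic $zm^{2}+(z+1-1/d)m+1=0$, and one keeps the root in $\mathbb{C}_{+}$.
3. Recover $\rho_{2c}$ from (1.11) with a small $\eta$.
4. Form $F_{1c}$ via (1.12).

For $t>0$ and $m\in\mathbb{C}_{+}$ one has $\operatorname{Im}\,t/(1+mt)<0$, so the iteration map sends $\mathbb{C}_{+}$ into itself. This is why one may expect it to settle at the $\mathbb{C}_{+}$ solution; no convergence rate is claimed. The remaining choices are assumptions: $d_{N}=1/2$ (so $M=2N$, as in Section 2.1), $\eta=10^{-7}$ standing in for the limit, and the grid.

```python
import numpy as np

def m2c(z, sigmas, d, tol=1e-13, max_iter=100_000):
    """Solve (1.10) in C_+ by fixed-point iteration; pi is the empirical measure
    of `sigmas` and d = d_N = N / M."""
    m = -1.0 / z                                   # starting point in C_+
    for _ in range(max_iter):
        m_new = 1.0 / (-z + np.mean(sigmas / (1.0 + m * sigmas)) / d)
        if abs(m_new - m) < tol:
            return m_new
        m = m_new
    raise RuntimeError("fixed-point iteration did not converge")

def m2c_identity(z, d):
    """Sigma = I: roots of z m^2 + (z + 1 - 1/d) m + 1 = 0; keep the one with the
    larger imaginary part. Vectorised over an array of z."""
    b = z + 1 - 1 / d
    disc = np.sqrt(b * b - 4 * z)
    r1, r2 = (-b + disc) / (2 * z), (-b - disc) / (2 * z)
    return np.where(r1.imag >= r2.imag, r1, r2)

d = 0.5
z0 = 2.0 + 1.0j
m_iter = m2c(z0, np.ones(100), d)                  # Sigma = I written as a discrete pi
m_closed = m2c_identity(np.array([z0]), d)[0]

# (1.11): density from Im m_2c just above the real axis; then (1.12) for x >= 0.
eta = 1e-7
E = np.linspace(1e-3, 8.0, 40001)
rho2c = m2c_identity(E + 1j * eta, d).imag / np.pi
F2c = np.concatenate([[0.0], np.cumsum(0.5 * (rho2c[1:] + rho2c[:-1]) * np.diff(E))])
F1c = d * F2c + (1 - d)                            # the indicator 1_[0, inf) equals 1 here
support = E[rho2c > 1e-3]
print(m_iter, m_closed)
print(F2c[-1], F1c[0], support.min(), support.max())
```

With $d_{N}=1/2$, the atom $1-d_{N}=1/2$ of $F_{1c}$ at zero accounts for the $M-N=N$ zero eigenvalues of $Q_{1}$.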

In the rest of this paper, we will often omit the superscripts $N$ and $M$ from our notation. The properties of $m_{2c}$ and $\rho_{2c}$ have been studied extensively; see e.g. [4, 5, 7, 24, 31, 45, 48]. The following Lemma 1.1 describes the basic structure of $\rho_{2c}$. For its proof, one can refer to [31, Appendix A].

*Lemma 1.1 (Support of the deformed MP law).*

The support of $\rho_{2c}$ is a disjoint union of connected components:

$$\operatorname{supp}\rho_{2c}\cap(0,\infty)=\bigcup_{k=1}^{L}[a_{2k},a_{2k-1}]\cap(0,\infty),\tag{1.13}$$

where $L\in\mathbb{N}$ depends only on $\pi_{M}$. Moreover, $N\int_{a_{2k}}^{a_{2k-1}}\rho_{2c}(x)dx$ is an integer for any $k=1,\ldots,L$, which gives the classical number of eigenvalues in the bulk component $[a_{2k},a_{2k-1}]$.

We shall call $a_{k}$ the edges of $\rho_{2c}$ . For any $1\leq k\leq 2L$ , we define

$$N_{k}:=\sum_{l:\,2l\leq k}N\int_{a_{2l}}^{a_{2l-1}}\rho_{2c}(x)\,dx.\tag{1.14}$$

Then we define the classical locations $\gamma_{j}$ for the eigenvalues of $Q_{2}$ through

$$1-F_{2c}(\gamma_{j})=\frac{j-1/2}{N},\qquad 1\leq j\leq K,\tag{1.15}$$

where we abbreviate $K:=M\wedge N$ . Note that (1.15) is well-defined since the $N_{k}$ ’s are integers. For convenience, we also denote $\gamma_{0}:=+\infty$ and $\gamma_{K+1}:=0$ .
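Definition (1.15) and the rigidity statement can be illustrated together in the simplest case, $\Sigma=I$ with $M=2N$. There is one bulk component, so the $N_{k}$ play no role. In this case $X^{*}X=2\cdot M^{-1}Y^{*}Y$ for a standard Gaussian $Y$, so $F_{2c}(x)=F_{MP}(x/2)$, where $F_{MP}$ is the MP law with ratio $N/M=1/2$. The NumPy sketch below relies on that scaling and on real Gaussian entries, both assumptions. It computes $\gamma_{j}$ by inverting a numerically integrated CDF and compares the result with the ordered eigenvalues of one sample. The largest discrepancies should sit near the spectral edges.

```python
import numpy as np

def mp_grid_cdf(y, n_grid=200001):
    """Grid and normalised CDF of the MP law with ratio y <= 1 (strictly increasing)."""
    a, b = (1 - np.sqrt(y)) ** 2, (1 + np.sqrt(y)) ** 2
    grid = np.linspace(a, b, n_grid)
    dens = np.sqrt(np.clip((b - grid) * (grid - a), 0.0, None)) / (2 * np.pi * y * grid)
    cum = np.concatenate([[0.0], np.cumsum(0.5 * (dens[1:] + dens[:-1]) * np.diff(grid))])
    return grid, cum / cum[-1]

N = 1000
M = 2 * N                                        # d_N = 1/2
K = min(M, N)
grid, cum = mp_grid_cdf(N / M)
levels = 1 - (np.arange(1, K + 1) - 0.5) / N     # (1.15): F_2c(gamma_j) = 1 - (j - 1/2)/N
gamma = 2 * np.interp(levels, cum, grid)         # F_2c(x) = F_MP(x / 2), so rescale by 2

rng = np.random.default_rng(7)
X = rng.standard_normal((M, N)) / np.sqrt(N)
lam = np.linalg.eigvalsh(X.T @ X)[::-1]          # lambda_1 >= ... >= lambda_N of Q2
print(np.max(np.abs(lam - gamma)), np.mean(np.abs(lam - gamma)))
```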

To establish our main result, we need to make some extra assumptions on $\Sigma$ and $\pi_{M}$, which take the form of the following regularity conditions.

*Definition 1.2 (Regularity).*

(i) Fix a (small) constant $\tau>0$ . We say that the edge $a_{k}$ , $k=1,\ldots,2L$ , is $\tau$ -regular if

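$$a_{k}\geq\tau,\qquad\min_{l\neq k}|a_{k}-a_{l}|\geq\tau,\qquad\min_{i}|1+m_{2c}(a_{k})\sigma_{i}|\geq\tau,\tag{1.16}$$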

where $m_{2c}(a_{k}):=m_{2c}(a_{k}+\mathrm{i}0_{+})$.

(ii) We say that the bulk component $[a_{2k},a_{2k-1}]$ is regular if for any fixed $\tau^{\prime}>0$ there exists a constant $c\equiv c_{\tau^{\prime}}>0$ such that the density $\rho_{2c}$ is bounded from below by $c$ on $[a_{2k}+\tau^{\prime},a_{2k-1}-\tau^{\prime}]$.

*Remark 1.3.*

The edge regularity condition (i) has previously appeared (in slightly different forms) in several works on sample covariance matrices [6, 15, 24, 31, 33, 39]. The condition (1.16) ensures a regular square-root behavior of $\rho_{2c}$ near $a_{k}$. The bulk regularity condition (ii) was introduced in [31], and it imposes a lower bound on the density of eigenvalues away from the edges. These conditions are satisfied by quite general classes of $\Sigma$; see e.g. [31, Examples 2.8 and 2.9].

For any $\mathbf{u}\in\mathbb{C}^{M}$ and $z\in\mathbb{C}_{+}$ , we define

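$$m_{1c,\mathbf{u}}(z):=-\Big\langle\mathbf{u},\,z^{-1}\big(1+m_{2c}(z)\Sigma\big)^{-1}\mathbf{u}\Big\rangle.\tag{1.17}$$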

Then $m_{1c,\mathbf{u}}$ is the Stieltjes transform of a distribution, which we shall denote by $F_{1c,\mathbf{u}}$ . From (1.17), it is easy to see that there exists a matrix-valued function $\mathbf{F}_{1c}$ depending on $\Sigma$ such that $F_{1c,\mathbf{u}}=\langle\mathbf{u},\mathbf{F}_{1c}\mathbf{u}\rangle$ , i.e., we have

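$$m_{1c,\mathbf{u}}(z)=\int_{\mathbb{R}}\frac{dF_{1c,\mathbf{u}}(x)}{x-z}=\Big\langle\mathbf{u},\int_{\mathbb{R}}\frac{d\mathbf{F}_{1c}(x)}{x-z}\,\mathbf{u}\Big\rangle.\tag{1.18}$$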

It was proved in [31] that for any sequences of unit vectors $\mathbf{u}\in\mathbb{C}^{M}$ and $\mathbf{v}\in\mathbb{C}^{N}$, $F^{(M)}_{Q_{1},\mathbf{u}}$ converges weakly to $F_{1c,\mathbf{u}}$ and $F^{(N)}_{Q_{2},\mathbf{v}}$ converges weakly to $F_{2c}$. We are now ready to state our main result, Theorem 1.5. We first list the main assumptions.
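For a diagonal $\Sigma$, (1.17) reads $m_{1c,\mathbf{u}}(z)=-z^{-1}\sum_{i}|u_{i}|^{2}/(1+m_{2c}(z)\sigma_{i})$. That is cheap to evaluate once $m_{2c}(z)$ is known. The NumPy sketch below compares it with the quadratic form $\langle\mathbf{u},(Q_{1}-z)^{-1}\mathbf{u}\rangle$ of one simulated matrix, away from the real axis. It reuses a fixed-point solver for (1.10). The following are assumptions chosen for illustration:

- a two-level $\Sigma$;
- $d_{N}=1/2$;
- $z=1+\mathrm{i}$;
- a random unit $\mathbf{u}$;
- real Gaussian entries.

```python
import numpy as np

def m2c(z, sigmas, d, tol=1e-13, max_iter=100_000):
    """Fixed-point solution of (1.10) in C_+ for pi = empirical measure of sigmas."""
    m = -1.0 / z
    for _ in range(max_iter):
        m_new = 1.0 / (-z + np.mean(sigmas / (1.0 + m * sigmas)) / d)
        if abs(m_new - m) < tol:
            return m_new
        m = m_new
    raise RuntimeError("fixed-point iteration did not converge")

def m1c_u(z, u, sigmas, d):
    """(1.17) for diagonal Sigma: -z^{-1} * sum_i |u_i|^2 / (1 + m_2c(z) sigma_i)."""
    m = m2c(z, sigmas, d)
    return -np.sum(np.abs(u) ** 2 / (1.0 + m * sigmas)) / z

rng = np.random.default_rng(8)
N = 500
M = 2 * N
d = N / M
sigmas = np.where(np.arange(M) < M // 2, 3.0, 1.0)      # hypothetical two-level Sigma
X = rng.standard_normal((M, N)) / np.sqrt(N)
B = np.sqrt(sigmas)[:, None] * X
Q1 = B @ B.T
u = rng.standard_normal(M)
u /= np.linalg.norm(u)
z = 1.0 + 1.0j
empirical = u @ np.linalg.solve(Q1 - z * np.eye(M), u)  # <u, (Q1 - z)^{-1} u> for real u
theory = m1c_u(z, u, sigmas, d)
print(empirical, theory, abs(empirical - theory))
```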

*Assumption 1.4.*

Fix a (small) constant $\tau>0$ .

(i) $X=(x_{ij})$ is an $M\times N$ real or complex matrix whose entries are independent random variables that satisfy the following moment conditions: there exist constants $C_{0},c_{0}>0$ such that for all $1\leq i\leq M$ , $1\leq j\leq N$ ,

[TABLE]

Note that (1.19)-(1.21) are slightly more general than (1.1) and (1.2).

(ii) $\tau\leq d_{N}\leq\tau^{-1}$ and $|d_{N}-1|\geq\tau$ .

(iii) $\Sigma=\text{diag}(\sigma_{1},\sigma_{2},\ldots,\sigma_{M})$ is a deterministic positive-definite matrix. We assume that (1.9) holds, all the edges of $\rho_{2c}$ are $\tau$ -regular, and all the bulk components of $\rho_{2c}$ are regular in the sense of Definition 1.2.

Theorem 1.5.

Suppose $d_{N}$, $X$ and $\Sigma$ satisfy Assumption 1.4. Suppose there exist constants $C_{1},\phi>0$ such that

$$\max_{1\leq i\leq M,\,1\leq j\leq N}|x_{ij}|\leq C_{1}N^{-\phi}.\tag{1.23}$$

Let $\mathbf{u}\equiv\mathbf{u}_{M}\in\mathbb{C}^{M}$ and $\mathbf{v}\equiv\mathbf{v}_{N}\in\mathbb{C}^{N}$ denote sequences of deterministic unit vectors. Then for any fixed (small) $\epsilon>0$ and (large) $D>0$ , we have

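$$\big\|\mathbb{E}F^{(M)}_{Q_{1},\mathbf{u}}-F^{(M)}_{1c,\mathbf{u}}\big\|+\big\|\mathbb{E}F^{(N)}_{Q_{2},\mathbf{v}}-F^{(N)}_{2c}\big\|\leq N^{-1+\epsilon}\tag{1.24}$$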

for sufficiently large $N$ , and for $\mathfrak{a}:=\min(2\phi,1/2)$ ,

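$$\mathbb{P}\Big(\big\|F^{(M)}_{Q_{1},\mathbf{u}}-F^{(M)}_{1c,\mathbf{u}}\big\|+\big\|F^{(N)}_{Q_{2},\mathbf{v}}-F^{(N)}_{2c}\big\|\geq N^{-\mathfrak{a}+\epsilon}\Big)\leq N^{-D}.\tag{1.25}$$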

As an immediate corollary of Theorem 1.5, we have the following result.

Corollary 1.6.

Suppose $d_{N}$ and $\Sigma$ satisfy Assumption 1.4. Let $X=(x_{ij})$ be an $M\times N$ random matrix whose entries are independent and satisfy (1.1) and (1.2). Suppose there exist constants $a,A>0$ such that

$$\limsup_{s\to\infty}\,s^{a}\max_{i,j}\mathbb{P}\big(|\sqrt{N}x_{ij}|\geq s\big)\leq A\tag{1.26}$$

for all $N$ . Let $\mathbf{u}\equiv\mathbf{u}_{M}\in\mathbb{C}^{M}$ and $\mathbf{v}\equiv\mathbf{v}_{N}\in\mathbb{C}^{N}$ denote sequences of deterministic unit vectors. Then for any fixed $\epsilon>0$ , if $a\geq 6$ , we have

[TABLE]

for sufficiently large $N$ ; if $a\geq 8$ , we have

[TABLE]

Proof of Corollary 1.6.

We use a standard cutoff argument. We fix $a>4$ and choose a constant $\phi>0$ small enough such that $\left(N^{1/2-\phi}\right)^{a}\geq N^{2+\omega}$ for some constant $\omega>0$ . Then we introduce the following truncation

[TABLE]

By the tail condition (1.26), we have

[TABLE]

Moreover, we have

[TABLE]

i.e. $\tilde{X}=X$ almost surely as $N\to\infty$ . Here in the above derivation, we regard $M=N/d_{N}$ as a function depending on $N$ .

Using (1.26) and integration by parts, it is easy to verify that

[TABLE]

which imply that

[TABLE]

Moreover, we trivially have

[TABLE]

Hence $\tilde{X}$ is a random matrix satisfying Assumption 1.4. Then using (1.24) and (1.29) with $a=6$ and $\phi=\epsilon/6$ , we conclude (1.27); using (1.25) and (1.30) with $\phi=(1-\epsilon)/4$ and $a=8$ , we conclude (1.28). ∎
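The exponent bookkeeping behind the two parameter choices can be spelled out as follows. This is a sketch resting on two assumptions. First, the truncation is at level $|x_{ij}|\leq N^{-\phi}$, consistent with (1.23). Second, (1.26) yields $\mathbb{P}(|\sqrt{N}x_{ij}|\geq s)\leq C s^{-a}$ for some constant $C$ and all large $s$. Since $MN=N^{2}/d_{N}\leq\tau^{-1}N^{2}$ by Assumption 1.4(ii), a union bound gives

$$\mathbb{P}(\tilde X\neq X)\leq\sum_{i,j}\mathbb{P}\big(|\sqrt{N}x_{ij}|>N^{1/2-\phi}\big)\leq \tau^{-1}N^{2}\cdot C\,N^{-(1/2-\phi)a}\leq C' N^{-\omega}\quad\text{whenever}\quad\Big(\frac12-\phi\Big)a\geq 2+\omega .$$

For the two choices used above:

$$a=6,\ \phi=\frac{\epsilon}{6}:\qquad \Big(\frac12-\frac{\epsilon}{6}\Big)\cdot 6=3-\epsilon\geq 2+\omega\ \text{ with }\ \omega=1-\epsilon>0 ,$$

$$a=8,\ \phi=\frac{1-\epsilon}{4}:\qquad \Big(\frac12-\frac{1-\epsilon}{4}\Big)\cdot 8=2+2\epsilon,\quad \mathfrak{a}=\min\Big(2\phi,\frac12\Big)=\frac{1-\epsilon}{2},\quad N^{-\mathfrak{a}+\epsilon}=N^{-1/2+3\epsilon/2}.$$

Since $\epsilon>0$ is arbitrary, the last rate is of the form $N^{-1/2+\epsilon'}$ after renaming, which matches the rate in (1.28).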

*Remark 1.7.*

The estimates (1.27) and (1.28) improve the bounds obtained in [52], and relax the assumptions on moments and $\Sigma$ as well. The convergence rates in (1.27) and (1.28) are optimal up to an $N^{\epsilon}$ factor. In fact, it was proved in [3] that for an analytic function $f$ ,

[TABLE]

where $\mathcal{N}(0,\sigma_{f,\mathbf{u}})$ denotes the Gaussian distribution with mean zero and variance $\sigma_{f,\mathbf{u}}$ . This shows that the fluctuation of $F_{Q_{1},\mathbf{u}}(x)$ is of order $N^{-1/2}$ and suggests the bound in (1.28). Taking expectation of (1.31), one can see that the order of $|\mathbb{E}F_{Q_{1},\mathbf{u}}(x)-F_{1c,\mathbf{u}}(x)|$ should be even smaller. Moreover, the fluctuation of eigenvalues on the microscopic scale will lead to an error of order at least $N^{-1}$ by the universality of eigenvalues [6, 33, 44]. This shows that the bound (1.27) should be close to being optimal. We check the bounds (1.27) and (1.28) below with some numerical simulations; see Fig. 1.

*Remark 1.8.*

In [52], the authors only handled the $M<N$ (i.e. $d_{N}>1$) case for $Q_{1}$, while our proof works for both the $d_{N}>1$ and $d_{N}<1$ cases. However, in the case with $d_{N}\to 1$, we encounter some difficulties near the leftmost edge $a_{2L}$, which converges to $0$ as $N\to\infty$ and violates the regularity condition (1.16). We will try to relax this assumption in the future.

*Remark 1.9.*

In Theorem 1.5, we have assumed that $\Sigma$ is diagonal. But our results can be extended immediately to the case with a general non-diagonal population covariance matrix $\mathbf{C}$ for multivariate normal data. More precisely, let $X$ be a random matrix with $i.i.d.$ Gaussian entries and suppose $\mathbf{C}$ has eigendecomposition $\mathbf{C}=U^{*}\Sigma U$ . Then we have

[TABLE]

Hence for any unit test vector $\mathbf{u}$ , our results can be applied to the VESD of $\Sigma^{1/2}XX^{*}\Sigma^{1/2}$ with test vector $U\mathbf{u}$ .
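The reduction in Remark 1.9 can be checked numerically. If $\mathbf{C}=U^{*}\Sigma U$, then $\mathbf{C}^{1/2}=U^{*}\Sigma^{1/2}U$. With $Y:=UX$ this gives $\mathbf{C}^{1/2}XX^{*}\mathbf{C}^{1/2}=U^{*}(\Sigma^{1/2}YY^{*}\Sigma^{1/2})U$. Hence, realization by realization, the VESD of the left-hand side along $\mathbf{u}$ equals the VESD of $\Sigma^{1/2}YY^{*}\Sigma^{1/2}$ along $U\mathbf{u}$. For Gaussian $X$, $Y$ has the same law as $X$. The NumPy sketch below takes a real orthogonal $U$ from a QR factorization; this choice, the sizes, and the spectrum of $\Sigma$ are illustrative assumptions.

```python
import numpy as np

def vesd_on_grid(Q, u, grid):
    """F_{Q,u}(x) = sum_k |<xi_k, u>|^2 1{lambda_k <= x} on a grid, plus eigenvalues."""
    lam, vecs = np.linalg.eigh(Q)
    w = np.abs(vecs.T @ u) ** 2
    return np.array([w[lam <= x].sum() for x in grid]), lam

rng = np.random.default_rng(5)
M, N = 50, 120
sigma = np.linspace(3.0, 1.0, M)                     # hypothetical diagonal Sigma
U, _ = np.linalg.qr(rng.standard_normal((M, M)))     # random orthogonal U
C_half = U.T @ np.diag(np.sqrt(sigma)) @ U           # C^{1/2} for C = U^T Sigma U
X = rng.standard_normal((M, N)) / np.sqrt(N)
Y = U @ X                                            # same Gaussian law as X
u = rng.standard_normal(M)
u /= np.linalg.norm(u)

grid = np.linspace(0.0, 12.0, 241)
A = C_half @ X
F_C, lam_C = vesd_on_grid(A @ A.T, u, grid)          # C^{1/2} X X^T C^{1/2} along u
B = np.sqrt(sigma)[:, None] * Y
F_S, lam_S = vesd_on_grid(B @ B.T, U @ u, grid)      # Sigma^{1/2} Y Y^T Sigma^{1/2} along U u
print(np.max(np.abs(F_C - F_S)), np.max(np.abs(lam_C - lam_S)))
```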

For generally distributed data, under sufficiently strong moment assumptions, it is possible to prove the same results for the case with a non-diagonal population covariance matrix $\mathbf{C}$. In particular, if the entries of $\sqrt{N}X$ have arbitrarily high moments, it can be proved that (1.27) and (1.28) hold for the VESD of $\mathbf{C}^{1/2}XX^{*}\mathbf{C}^{1/2}$. The main inputs for the proof will include: (a) the local law in [31, Theorem 3.6] (which generalizes the one in Theorem 3.4 to the non-diagonal $\mathbf{C}$ case with generally distributed data), (b) Theorem 1.5 (proved for the diagonal $\mathbf{C}$ case), (c) a comparison argument in [31, Section 7] (which extends Theorem 1.5 to the non-diagonal case through comparison with the diagonal case), and (d) the Helffer-Sjöstrand arguments in Section 3.2. However, under weaker moment assumptions as in Corollary 1.6, the proof will be much harder. For step (a), we need to use the local law proved in [53], which further generalizes the one in [31] to the heavy-tailed case. The main issue is that the error bounds in steps (a) and (c) are not sharp enough to give the optimal convergence rates in (1.27) and (1.28). We plan to deal with this problem in the future; in this article, we focus on proving a sharp bound for the convergence rate of the VESD in the diagonal $\mathbf{C}$ case.

*Remark 1.10.*

As discussed above, the convergence of the stochastic process $X^{\Sigma}_{M,\mathbf{u}}$ defined in (1.6) to the Gaussian process $\mathbf{B}^{\Sigma}_{\mathbf{u}}$ in (1.7) is also a very important question, which is complementary to the results in Corollary 1.6. The convergence of $X^{I}_{M,\mathbf{u}}$ to the Brownian bridge was first proved in the null case $\Sigma=I$ , for some special vectors of the form $\mathbf{u}=M^{-1/2}(\pm 1,\cdots,\pm 1)$ in [47]. The result was later extended to the case with a general fixed vector $\mathbf{u}$ in [3]. More precisely, it was proved in [3] that for any fixed vector $\mathbf{u}$ and analytic functions $g_{1},\cdots,g_{k}$ , the random vector

[TABLE]

converges to a Gaussian vector with mean zero and certain covariance function. We expect that combining the method in [3] and the new tools in this paper, one can prove a similar convergence result for $X^{\Sigma}_{M,\mathbf{u}}$ in the case with a non-scalar $\Sigma$ . This will be studied in a future paper.

The rest of this paper is organized as follows. In Section 2, we check the results in Corollary 1.6 with some numerical simulations, and then introduce some applications of our results in high-dimensional statistical inference. We prove Theorem 1.5 in Section 3 using Stieltjes transforms. In the proof, we mainly use Theorems 3.4-3.6, which give the desired anisotropic local laws for the resolvents of $Q_{1}$ and $Q_{2}$ . Theorem 3.5 constitutes the main novelty of this paper, and its proof will be given in Section 4. The proofs of Theorem 3.4 and Theorem 3.6 will be given in the supplementary material.

2 Simulations and applications

In this section, we first check the convergence rate of the (expected) VESD to the deformed MP law with some numerical simulations. Then we briefly discuss the applications of our results in high-dimensional statistical inference procedures.

2.1 Simulations

The simulations are performed under the following setting: $M=2N$ , i.e. $d_{N}=0.5$ ; the entries $\sqrt{N}x_{ij}$ are drawn from a distribution $\xi$ with mean zero, variance 1 and tail $\mathbb{P}(|\xi|\geq s)\sim s^{-6}$ for large $s$ ; the unit vector $\mathbf{v}$ is randomly chosen for each $N$ . In Fig. 1, we plot the Kolmogorov distances $\|F_{Q_{2},\mathbf{v}}-F_{2c}\|$ and $\|\mathbb{E}F_{Q_{2},\mathbf{v}}-F_{2c}\|$ for the following two choices of $\Sigma$ : $\Sigma=I$ with ESD $\pi=\delta_{1}$ , and

[TABLE]

For each $N$ , we take an average over 10 repetitions to represent $F^{(N)}_{Q_{2},\mathbf{v}}$ and an average over $4N^{2}$ repetitions to approximate $\mathbb{E}F^{(N)}_{Q_{2},\mathbf{v}}$ . Under each setting, we choose an appropriate function $f(x)$ to fit the simulation data. It is easy to observe that the convergence rate of the VESD is bounded by $O(N^{-1/2})$ , while the convergence rate of the expected VESD has order $N^{-1}$ . This verifies the results in Corollary 1.6.
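A minimal sketch of this kind of experiment, for $\Sigma=I$ only, is given below. Several details are not fixed by the text above and are assumptions here:

- the entry law is a Student $t$ distribution with 6 degrees of freedom, rescaled to unit variance (its tail decays like $s^{-6}$);
- a single realization per $N$ replaces the averages used for Fig. 1;
- $F_{2c}(x)=F_{MP}(x/2)$, which holds for $\Sigma=I$ and $M=2N$ because $X^{*}X$ is then twice an MP-normalized Wishart matrix with ratio $1/2$.

The sketch is meant to show the mechanics, not to reproduce Fig. 1.

```python
import numpy as np

def mp_cdf(x, y, n_grid=20001):
    """CDF of the MP law with ratio y <= 1, by trapezoidal integration of the density."""
    a, b = (1 - np.sqrt(y)) ** 2, (1 + np.sqrt(y)) ** 2
    grid = np.linspace(a, b, n_grid)
    dens = np.sqrt(np.clip((b - grid) * (grid - a), 0.0, None)) / (2 * np.pi * y * grid)
    cum = np.concatenate([[0.0], np.cumsum(0.5 * (dens[1:] + dens[:-1]) * np.diff(grid))])
    return np.interp(x, grid, cum / cum[-1], left=0.0, right=1.0)

def kolmogorov_vesd(lam, w, cdf):
    """sup_x |F_{Q,v}(x) - F(x)| for the step function with jumps w at eigenvalues lam."""
    order = np.argsort(lam)
    lam, w = lam[order], w[order]
    F = cdf(lam)
    right = np.cumsum(w)            # value at each eigenvalue
    left = right - w                # value just before it
    return max(np.max(np.abs(right - F)), np.max(np.abs(left - F)))

rng = np.random.default_rng(11)
dists = {}
for N in (250, 500, 1000):
    M = 2 * N                                        # d_N = 1/2
    xi = rng.standard_t(6, size=(M, N)) / np.sqrt(1.5)   # t_6 has variance 6/4 = 1.5
    X = xi / np.sqrt(N)
    lam, vecs = np.linalg.eigh(X.T @ X)              # Q2 = X^T X for Sigma = I
    v = rng.standard_normal(N)
    v /= np.linalg.norm(v)                           # random unit test vector
    w = (vecs.T @ v) ** 2
    dists[N] = kolmogorov_vesd(lam, w, lambda t: mp_cdf(t / 2, N / M))
print(dists)
```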

As discussed before, the convergence of $F_{Q_{2},\mathbf{v}}$ to $F_{2c}$ for any sequence of deterministic unit vectors $\mathbf{v}$ can be used to characterize the asymptotic Haar property of the eigenmatrix of $Q_{2}=X^{*}\Sigma X$ (which also implies the asymptotic Haar property of the eigenmatrix of $Q_{1}$ when $\Sigma=\sigma^{2}I$). On the other hand, for a general $\Sigma$, the eigenmatrix of $Q_{1}$ is no longer asymptotically Haar distributed and the VESD of $Q_{1}$ will depend on $\mathbf{v}$. Moreover, (1.17) gives an explicit dependence of $\mathbf{F}_{1c}$ on $\Sigma$, which should be of interest in statistical applications. (For more details on the application of this principle, the reader can refer to the discussions in Section 2.2.3.) In Fig. 2(a), we plot $F_{Q_{1},\mathbf{v}}$ for $\Sigma$ in (2.1) and different choices of $\mathbf{v}_{i}$, $i=1,2,3$. One can observe a transition of $F_{Q_{1},\mathbf{v}}$ when $\mathbf{v}$ changes from the direction corresponding to the smaller eigenvalues of $\Sigma$ to the direction corresponding to the larger eigenvalues of $\Sigma$. In Fig. 2(b), we take $\Sigma=UDU^{*}$, where $D$ is as in (2.1), $U$ is a randomly chosen unitary matrix, and $\mathbf{w}_{i}=U\mathbf{v}_{i}$. One can see that even if $\Sigma$ is non-diagonal, the convergence of the VESD of $Q_{1}$ still holds (see Remark 1.9).

2.2 Statistical applications

2.2.1 Detection of signals in noise

Consider the following model:

[TABLE]

where $A$ is an $M\times k$ deterministic matrix, $\mathbf{s}$ is a $k$-dimensional mean-zero signal vector, and $\mathbf{z}$ is an $M$-dimensional noise vector with $i.i.d.$ centered entries. The signal vector and the noise vector are also assumed to be independent. In practice, we observe $N$ such $i.i.d.$ samples and collect them into the matrix $X=(\mathbf{x}_{1},\cdots,\mathbf{x}_{N})$. This signal-plus-noise model is a standard model in classical signal processing [29]. A fundamental task is to detect the signals from the observed samples, and the very first step is to determine whether any signal is present at all, i.e.,

[TABLE]

The model (2.2) is also widely used in other fields, such as multivariate statistics, wireless communications, bioinformatics, and finance. For example, in multivariate statistics one may want to determine whether there is any relation between two sets of variables. To test for independence, we can adopt the multivariate multiple regression model (2.2), where $\mathbf{x}$ and $\mathbf{s}$ are the two sets of variables being tested [25]. We then test the null hypothesis that the regression coefficients are all zero:

[TABLE]

Another example comes from finance [19, 20, 21]. In empirical finance, (2.2) is the factor model, where $\mathbf{s}$ is the common factor, $A$ is the factor loading matrix and $\mathbf{z}$ is the idiosyncratic component. To analyze the stock return $\mathbf{x}$, we first need to know whether the factor $\mathbf{s}$ is significant for prediction. A statistical test of the form (2.4) can then be constructed.

For the hypothesis testing problems (2.3) and (2.4) above, our results imply that under the null hypothesis $\mathbf{H}_{0}$, $F_{Q_{1},\mathbf{u}}^{(M)}=F_{MP}+O(M^{-1/2+\epsilon})$ for any unit vector $\mathbf{u}$ independent of $X$. As an example, we perform a simulation under the following setting: $M=2000$, $N=2M$; the entries $\sqrt{N}z_{ij}$ are $i.i.d.$ Gaussian with mean 0 and variance 1; the entries $\sqrt{N}s_{ij}$ are $i.i.d.$ Bernoulli $\pm 1$ random variables. We choose $A=DV$, where $V$ is a randomly chosen unitary matrix and $D$ is an $M\times k$ matrix whose entries are all zero except $D_{n(i)i}$, each of which is sampled uniformly from $[0.4,0.8]$. Here $n(i)$, $1\leq i\leq k$, are $k$ values sampled uniformly at random from the integers 1 to $M$. In Fig. 3, we plot the Kolmogorov distances $\|F_{Q_{1},\mathbf{e}_{i}}-F_{MP}\|$ as a function of $i$, where $\mathbf{e}_{i}$ denotes the standard unit vector along the $i$-th axis. Comparing the $k=10$ case with the null case, we observe 10 obvious peaks. Moreover, the positions of the peaks correspond to the values of $n(i)$, and the heights of the peaks reflect the strengths of the signals. Note that if one uses the bound $M^{-1/4}$ from [52], then the estimated noise level would be of order $2000^{-1/4}\approx 0.15$, which does not allow one to detect the weakest few signals.
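The Fig. 3 procedure can be sketched as follows: compute $\|F_{Q_1,\mathbf{e}_i}-F_{MP}\|$ for every coordinate $i$ and look for peaks. This is a minimal illustration, not the authors' code, and it departs from the text in two ways. First, it uses $M=400$. Second, it uses spike strength $D_{n(i)i}=1.5$ rather than values drawn from $[0.4,0.8]$. These spikes are strong enough to create outliers, unlike Fig. 3, and they are chosen only so that the small-$M$ check is robust. The Marčenko-Pastur CDF is tabulated numerically for ratio $y=M/N$, and the helper names are ours.

```python
import numpy as np


def mp_cdf_table(y, n_grid=20001):
    """CDF of the Marchenko-Pastur law MP_y (ratio y < 1, unit variance), tabulated numerically."""
    a, b = (1 - np.sqrt(y)) ** 2, (1 + np.sqrt(y)) ** 2
    x = np.linspace(a, b, n_grid)
    dens = np.sqrt(np.clip((b - x) * (x - a), 0.0, None)) / (2 * np.pi * y * x)
    cdf = np.concatenate([[0.0], np.cumsum(0.5 * (dens[1:] + dens[:-1]) * np.diff(x))])
    return x, cdf / cdf[-1]


def coordinate_vesd_distances(Q, cdf):
    """Kolmogorov distance between F_{Q,e_i} and cdf for every standard basis vector e_i."""
    evals, evecs = np.linalg.eigh(Q)          # eigenvalues in ascending order
    W = evecs ** 2                            # W[i, k] = |<e_i, xi_k>|^2
    upper = np.cumsum(W, axis=1)              # F_{Q,e_i} at each eigenvalue
    lower = upper - W                         # left limits at each eigenvalue
    target = cdf(evals)[None, :]
    return np.maximum(np.abs(upper - target), np.abs(lower - target)).max(axis=1)


rng = np.random.default_rng(1)
M, k = 400, 3
N = 2 * M
Z = rng.standard_normal((M, N)) / np.sqrt(N)            # noise, entries of variance 1/N
S = rng.choice([-1.0, 1.0], size=(k, N)) / np.sqrt(N)   # Bernoulli +-1 signals, scaled
rows = rng.choice(M, size=k, replace=False)             # the positions n(i)
D = np.zeros((M, k))
D[rows, np.arange(k)] = 1.5        # stronger than the text's [0.4, 0.8] (assumption)
V, _ = np.linalg.qr(rng.standard_normal((k, k)))        # random orthogonal k x k matrix
A = D @ V

x_grid, cdf_vals = mp_cdf_table(y=M / N)
F_mp = lambda t: np.interp(t, x_grid, cdf_vals)

dist_null = coordinate_vesd_distances(Z @ Z.T, F_mp)    # H_0: no signal
X = A @ S + Z
dist_alt = coordinate_vesd_distances(X @ X.T, F_mp)     # H_1: k planted signals
print("largest null distance:", dist_null.max())
print("flagged:", np.sort(np.argsort(dist_alt)[-k:]), "planted:", np.sort(rows))
```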

For Gaussian noise, classical statistical procedures for testing the number of signals usually rely on the largest eigenvalue of the sample covariance matrix [8, 37, 38]. The key property is that the largest eigenvalue converges to a Gaussian distribution under the $N^{1/2}$ scaling if it is an outlier, and to the Tracy-Widom distribution under the $N^{2/3}$ scaling otherwise. Onatski proposed the test statistic $R=(\lambda_{1}-\lambda_{2})/(\lambda_{2}-\lambda_{3})$, which is asymptotically pivotal [40]. Our method is more general in the sense that it also applies in the case without outliers. For example, one can check numerically that the sample covariance matrices in Fig. 3 have no outliers.

2.2.2 Separable covariance matrices

Consider data matrices of the form

[TABLE]

where $X$ is an $M\times N$ random matrix as in Corollary 1.6, and $\Sigma_{1}$ and $\Sigma_{2}$ are $M\times M$ and $N\times N$ deterministic positive-definite matrices, respectively. Then $Q_{1}:=YY^{*}$ is called a separable covariance matrix, and it is widely used to model spatio-temporal sampling data [16, 43, 49]. Following this terminology, we call $\Sigma_{1}$ the spatial covariance matrix and $\Sigma_{2}$ the temporal covariance matrix. Suppose we want to determine whether the spatial identity holds, i.e.,

[TABLE]

For this hypothesis testing problem, under $\mathbf{H}_{0}$ we have $F_{Q_{1},\mathbf{u}_{1}}^{(M)}=F_{Q_{1},\mathbf{u}_{2}}^{(M)}+O(M^{-1/2+\epsilon})$ for any unit vectors $\mathbf{u}_{1},\mathbf{u}_{2}$ independent of $X$. More generally, we can test whether $\Sigma_{1}=\Sigma_{0}$ for a given positive-definite matrix $\Sigma_{0}$ by using $\Sigma_{0}^{-1/2}Y$. Similarly, the temporal identity can be tested using the VESD of $Q_{2}:=Y^{*}Y$. Note that our error bound allows us to detect very weak signals, down to order $M^{-1/2}$, from a single sample. The precision can be further improved by averaging over many samples.

We now illustrate this application with some numerical simulations. The simulations are performed under the following setting: $N=2M$ ; the entries $\sqrt{N}x_{ij}$ are $i.i.d.$ Gaussian with mean 0 and variance 1. We consider separable covariance matrices of the form (2.5) with

[TABLE]

where

[TABLE]

or, for an $i.i.d.$ sequence of random variables $b_{1},b_{2},\cdots\sim\text{unif}(-1,1)$,

[TABLE]

In Fig. 4, for $M=4000$, $a=0.1$ and the above two choices of $A$, we plot the Kolmogorov distances $\|F_{Q_{1},\mathbf{e}_{k}}-M^{-1}\sum_{l}F_{Q_{1},\mathbf{e}_{l}}\|$ as a function of $k$. Comparing them with the null case $a=0$, we observe very clear signals. Note that if one uses the bound $M^{-1/4}$ from [52], then the estimated noise level would be of order $4000^{-1/4}\approx 0.126$, which does not allow one to detect such “weak” signals.
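The statistic in Fig. 4 needs no limiting law. Every unit eigenvector of $Q_1$ satisfies $\sum_l|\langle\mathbf{e}_l,\cdot\rangle|^2=1$, so the average $M^{-1}\sum_{l}F_{Q_1,\mathbf{e}_l}$ is exactly the ESD of $Q_1$. The sketch below relies on hypothetical choices because the displays defining $A$ above are not reproduced here. It takes a diagonal $\Sigma_2$ with entries uniform on $[0.5,1.5]$ and a diagonal $\Sigma_1$ equal to $I$ except on five coordinates set to 4. That perturbation is much stronger than $a=0.1$, so the check stays robust at $M=400$.

```python
import numpy as np


def distances_to_average(Q):
    """||F_{Q,e_k} - M^{-1} sum_l F_{Q,e_l}|| for every k; the average over l is the ESD of Q."""
    evals, evecs = np.linalg.eigh(Q)
    W = evecs ** 2                    # W[k, j] = |<e_k, j-th eigenvector>|^2
    avg = W.mean(axis=0)              # equals 1/M for every j, since eigenvectors are unit
    # Both step functions jump only at the eigenvalues, so the sup is a max over partial sums.
    return np.abs(np.cumsum(W, axis=1) - np.cumsum(avg)).max(axis=1)


rng = np.random.default_rng(2)
M = 400
N = 2 * M
X = rng.standard_normal((M, N)) / np.sqrt(N)
sigma2 = rng.uniform(0.5, 1.5, size=N)      # hypothetical diagonal temporal covariance
spiked = np.arange(5)                        # hypothetical coordinates where Sigma_1 != I
sigma1 = np.ones(M)
sigma1[spiked] = 4.0                         # hypothetical spatial perturbation


def separable_Q1(s1):
    """Q1 = Y Y^* with Y = Sigma_1^{1/2} X Sigma_2^{1/2} (both Sigmas diagonal here)."""
    Y = np.sqrt(s1)[:, None] * X * np.sqrt(sigma2)[None, :]
    return Y @ Y.T


d_null = distances_to_average(separable_Q1(np.ones(M)))   # H_0: Sigma_1 = I
d_alt = distances_to_average(separable_Q1(sigma1))
print("max null distance:", d_null.max(), "flagged:", np.sort(np.argsort(d_alt)[-5:]))
```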

For this problem, [6] proposed to use the largest eigenvalue $\lambda_{1}(YY^{*})$ as a test statistic. However, the limiting distribution of $\lambda_{1}$ depends on the unknown matrices $\Sigma_{1}$ and $\Sigma_{2}$, and hence is not asymptotically pivotal. Moreover, it was proved in [53] that $\lambda_{1}$ behaves in the non-identity $\Sigma_{1}$ case much as it does in the identity case, which makes it poorly suited for testing. Our procedure, in contrast, tests the isotropy of $\Sigma_{1}$ directly.

2.2.3 Eigenvectors of population covariance matrices

Now we go back to consider the sample covariance matrices $Q_{1}=\Sigma^{1/2}XX^{*}\Sigma^{1/2}$ . By Corollary 1.6, we know that the VESD $F_{Q_{1},\mathbf{u}}^{(M)}$ converges to $F_{1c,\mathbf{u}}^{(M)}$ , which is defined through the Stieltjes transform (1.17). It is easy to observe that the matrix $\mathbf{F}_{1c}^{(M)}$ is diagonal in the eigenbasis of $\Sigma$ , and the diagonal entries depend on the eigenvalues of $\Sigma$ in an explicit way. This allows one to use the VESD of $Q_{1}$ to detect the leading eigenvectors (or eigenspaces) of $\Sigma$ . More precisely, if $\mathbf{u}_{i}$ is the eigenvector of $\Sigma$ with eigenvalue $\sigma_{i}$ , then with (1.17) and the inverse formula we can get that

[TABLE]

where $E\in\mathbb{R}$, $\rho^{(M)}_{1c,\mathbf{u}_{i}}$ is the density of $F_{1c,\mathbf{u}_{i}}^{(M)}$, and we abbreviate $m_{2c}(E)\equiv m_{2c}(E+\mathrm{i}0_{+})$. Near the right edge $\gamma_{1}$, we know that $-\sigma_{1}^{-1}<m_{2c}(E)<0$ (see [31, Appendix A]). Hence there exists a constant $c>0$ such that for $\gamma_{1}-c<E<\gamma_{1}$, $\rho^{(M)}_{1c,\mathbf{u}_{i}}(E)$ is monotone with respect to $\sigma_{i}$. In particular, $\rho^{(M)}_{1c,\mathbf{u}}(E)$ is maximized at $\mathbf{u}=\mathbf{u}_{1}$. Thus our results show that measuring the density (i.e. the slope) of $F_{Q_{1},\mathbf{u}}^{(M)}$ allows one to draw inferences about the overlaps between the test vectors and the population eigenvectors corresponding to the leading eigenvalues of $\Sigma$.
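The monotonicity claim can be checked directly, assuming that (1.17) takes the standard form $m_{1c,\mathbf{u}_i}(z)=-[z(1+\sigma_i m_{2c}(z))]^{-1}$ for an eigenvector $\mathbf{u}_i$ of $\Sigma$. This is our reading, since (1.17) is not reproduced in this section. For $E>0$:

```latex
\begin{align*}
\rho^{(M)}_{1c,\mathbf{u}_i}(E)
  &= \frac{1}{\pi}\,\mathrm{Im}\,\frac{-1}{E\,\bigl(1+\sigma_i m_{2c}(E)\bigr)}
   = \frac{1}{\pi E}\cdot\frac{\sigma_i\,\mathrm{Im}\,m_{2c}(E)}{|1+\sigma_i m_{2c}(E)|^{2}},\\
\frac{\partial}{\partial\sigma}\,\frac{\sigma}{|1+\sigma m|^{2}}
  &= \frac{1-\sigma^{2}|m|^{2}}{|1+\sigma m|^{4}} .
\end{align*}
```

The density therefore increases with $\sigma_i$ whenever $\sigma_i|m_{2c}(E)|<1$. Near the right edge, $\mathrm{Im}\,m_{2c}(E)$ is small and $-\sigma_1^{-1}<\mathrm{Re}\,m_{2c}(E)<0$. For $E$ close enough to $\gamma_1$ this gives $\sigma_i|m_{2c}(E)|<1$ for all $i$, which is consistent with $\rho^{(M)}_{1c,\mathbf{u}}(E)$ being maximized at $\mathbf{u}=\mathbf{u}_1$.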

In Fig. 5, we give two examples of VESD of spiked covariance matrices. In the simulations, we take $M=1000$ and the entries $\sqrt{N}x_{ij}$ to be $i.i.d.$ Gaussian with mean 0 and variance 1. One can take the population covariance matrix to be a general positive definite matrix, but for simplicity we assume that it is diagonal by properly rotating the test vectors; see Remark 1.9 and (1.32). In Fig. 5(a), we take $N=2M$ , and

[TABLE]

In Fig. 5(b), we take $N=10M$ , and

[TABLE]

Moreover, we take the following test vectors (up to normalization):

[TABLE]

For each choice of $\mathbf{v}_{i}$ , we take an average over 10 repetitions to get $F^{(M)}_{Q_{1},\mathbf{v}_{i}}$ .

Note that the flat parts of the curves in Fig. 5 correspond to the gaps between different components of the eigenvalue spectrum of $Q_{1}$. Hence the spectral densities in Fig. 5(a) and 5(b) have two and three components, respectively. The rightmost component can be formally regarded as the outlier component caused by the large eigenvalues of $\Sigma$. For $x$ near the right edge (e.g. the $x$ marked by the dashed line), the slope of the VESD $F_{Q_{1},\mathbf{v}_{i}}(x)$ increases as $i$ changes from 1 to 5. This confirms our previous conclusion that the density $\rho_{1c,\mathbf{v}}$ increases as $\mathbf{v}$ has more overlap with the leading eigenvectors of $\Sigma$. Since all the VESD curves reach 1 at the right edge $\gamma_{1}$, the curves that are lower near the edge must rise more steeply there, i.e. they have larger densities.

Here we have only considered examples with diagonal $\Sigma$. However, our results also apply to more general sample covariance matrices with nonzero correlations between rows, i.e. with a non-diagonal population covariance matrix $\mathbf{C}$ (see Remark 1.9). This suggests further applications of our results in high-dimensional statistical inference. We also remark that in [32], the overlaps between sample eigenvectors and population eigenvectors are studied through certain functionals that are closely related to the VESD, with the test vectors taken to be the population eigenvectors. Based on the results in [32], an estimator of the population covariance $\mathbf{C}$ was proposed [10]. However, this estimator does not provide much information about the population eigenvectors, since it uses the same eigenvectors as the sample covariance matrix $Q_{1}$.

3 Proof of Theorem 1.5

For definiteness, we focus on real sample covariance matrices throughout the proof. After minor changes, however, our proof also applies to the complex case if we include the extra assumption (1.2) or (1.21).

3.1 Anisotropic local Marčenko-Pastur law

A basic tool for the proof is the Stieltjes transform. For any $z=E+i\eta\in\mathbb{C}_{+}$ , we define the resolvents (the Green functions) of $Q_{1}$ and $Q_{2}$ as

$$\mathcal{G}_{1}(X,z):=\left(Q_{1}-z\right)^{-1},\qquad\mathcal{G}_{2}(X,z):=\left(Q_{2}-z\right)^{-1}.$$

Then the Stieltjes transforms of the ESD of $Q_{1,2}$ are equal to

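$$\frac{1}{M}\operatorname{Tr}\mathcal{G}_{1}(X,z)\quad\text{and}\quad\frac{1}{N}\operatorname{Tr}\mathcal{G}_{2}(X,z),$$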

and the Stieltjes transforms of $F^{(M)}_{Q_{1},\mathbf{u}}$ and $F^{(N)}_{Q_{2},\mathbf{v}}$ are equal to $\langle\mathbf{u},\mathcal{G}_{1}(X,z)\mathbf{u}\rangle$ and $\langle\mathbf{v},\mathcal{G}_{2}(X,z)\mathbf{v}\rangle$ , respectively. The main goal of this subsection is to establish the following asymptotic estimate for $z\in\mathbb{C}_{+}$ :

[TABLE]

Taking the imaginary part shows that a control of the Stieltjes transforms $\langle\mathbf{u},\mathcal{G}_{1}(X,z)\mathbf{u}\rangle$ and $\langle\mathbf{v},\mathcal{G}_{2}(X,z)\mathbf{v}\rangle$ yields a control of the VESD on scales of order ${\rm{Im}}\,z$ around $E$. An anisotropic local law is an estimate of the form (3.2) for all ${\rm{Im}}\,z\gg N^{-1}$. Such local laws were first established in [30, 9, 31] for sample covariance matrices, assuming that the matrix entries have arbitrarily high moments. In Section 3.2, we will finish the proof of Theorem 1.5 using the (almost) optimal anisotropic local laws for $\mathcal{G}_{1}$ and $\mathcal{G}_{2}$.
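The identity between $\langle\mathbf{v},\mathcal{G}_{2}(X,z)\mathbf{v}\rangle$ and the Stieltjes transform of the VESD can be checked numerically, along with the imaginary-part smoothing. This is a minimal sketch that assumes a hypothetical diagonal $\Sigma$, Gaussian entries and small sizes, and the helper names are ours. The last function shows why ${\rm Im}\,z$ sets the resolution: $\pi^{-1}{\rm Im}\,\langle\mathbf{v},\mathcal{G}_2(E+\mathrm{i}\eta)\mathbf{v}\rangle$ is the VESD convolved with a Cauchy kernel of width $\eta$.

```python
import numpy as np

rng = np.random.default_rng(3)
M, N = 300, 150
X = rng.standard_normal((M, N)) / np.sqrt(N)
sigma = np.linspace(0.5, 2.0, M)                 # hypothetical diagonal Sigma
Q2 = X.T @ (sigma[:, None] * X)                  # Q2 = X^* Sigma X
v = rng.standard_normal(N)
v /= np.linalg.norm(v)                           # real unit vector, so <v, G v> = v^T G v


def stieltjes_direct(z):
    """<v, G_2(z) v> with G_2(z) = (Q2 - z)^{-1}, computed by a linear solve."""
    return v @ np.linalg.solve(Q2 - z * np.eye(N), v)


evals, evecs = np.linalg.eigh(Q2)
weights = (evecs.T @ v) ** 2                     # VESD weights |<v, xi_k>|^2


def stieltjes_vesd(z):
    """Stieltjes transform of the VESD: sum_k w_k / (lambda_k - z)."""
    return np.sum(weights / (evals - z))


def smoothed_density(E, eta):
    """Im <v, G_2(E + i eta) v> / pi: the VESD convolved with a Cauchy kernel of width eta."""
    return stieltjes_vesd(E + 1j * eta).imag / np.pi
```

Evaluating `eta * stieltjes_vesd(E + 1j * eta).imag` on an increasing grid of `eta` also gives an increasing sequence, since each term $w_k\eta^2/((\lambda_k-E)^2+\eta^2)$ increases in $\eta$. This is the monotonicity used in the proof of (1.24) below.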

Our anisotropic local law can be stated in a simple and unified fashion using the following $(N+M)\times(N+M)$ self-adjoint matrix $H$ :

[TABLE]

We define the resolvent of $H$ as

[TABLE]

Using the Schur complement formula, it is easy to check that

[TABLE]

Thus a control of $G$ directly yields a control of the resolvents $\mathcal{G}_{1}$ and $\mathcal{G}_{2}$. For simplicity of notation, we define the index sets

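$$\mathcal{I}_{1}:=\{1,\ldots,M\},\qquad\mathcal{I}_{2}:=\{M+1,\ldots,M+N\},\qquad\mathcal{I}:=\mathcal{I}_{1}\cup\mathcal{I}_{2}.$$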

We shall consistently use the Latin letters $i,j\in\mathcal{I}_{1}$, the Greek letters $\mu,\nu\in\mathcal{I}_{2}$, and $a,b\in\mathcal{I}$. Accordingly, we label the indices of $X$ as $X=(X_{i\mu}:i\in\mathcal{I}_{1},\mu\in\mathcal{I}_{2}).$
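The relation between $G$ and $\mathcal{G}_{1,2}$ can be sanity-checked on one concrete linearization. For illustration only, we assume $H(z)=\begin{pmatrix}-I_M & Y\\ Y^{*} & -zI_N\end{pmatrix}$ with $Y=\Sigma^{1/2}X$, with rows and columns ordered as $\mathcal{I}_1$ then $\mathcal{I}_2$. This is not necessarily the normalization of $H$ used in the text. For this choice, the Schur complement formula and the identity $Y(Y^*Y-z)^{-1}Y^*=I+z\mathcal{G}_1$ give $H(z)^{-1}=\begin{pmatrix}z\mathcal{G}_1 & Y\mathcal{G}_2\\ \mathcal{G}_2Y^{*} & \mathcal{G}_2\end{pmatrix}$, which the sketch verifies on a small random example.

```python
import numpy as np

rng = np.random.default_rng(4)
M, N = 6, 4
X = rng.standard_normal((M, N)) / np.sqrt(N)
sigma = rng.uniform(0.5, 2.0, size=M)
Y = np.sqrt(sigma)[:, None] * X                  # Y = Sigma^{1/2} X for a diagonal Sigma
z = 0.7 + 0.3j

# Hypothetical linearization H(z) = [[-I_M, Y], [Y^*, -z I_N]] (an assumption).
H = np.block([[-np.eye(M), Y], [Y.T, -z * np.eye(N)]])
G = np.linalg.inv(H)

G1 = np.linalg.inv(Y @ Y.T - z * np.eye(M))      # resolvent of Q1 = Y Y^*
G2 = np.linalg.inv(Y.T @ Y - z * np.eye(N))      # resolvent of Q2 = Y^* Y
print(np.allclose(G[:M, :M], z * G1), np.allclose(G[M:, M:], G2))
```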

We will use the following notion of stochastic domination, which was first introduced in [17] and subsequently used in many works on random matrix theory, such as [9, 31]. It simplifies the presentation of the results and their proofs by systematizing statements of the form “ $\xi$ is bounded with high probability by $\zeta$ up to a small power of $N$ ”.

Definition 3.1 (Stochastic domination).

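(i) Let

$$\xi=\big(\xi^{(N)}(u):N\in\mathbb{N},\,u\in U^{(N)}\big),\qquad\zeta=\big(\zeta^{(N)}(u):N\in\mathbb{N},\,u\in U^{(N)}\big)$$

be two families of nonnegative random variables, where $U^{(N)}$ is a possibly $N$-dependent parameter set. We say $\xi$ is stochastically dominated by $\zeta$, uniformly in $u$, if for any (small) $\epsilon>0$ and (large) $D>0$,

$$\sup_{u\in U^{(N)}}\mathbb{P}\big[\xi^{(N)}(u)>N^{\epsilon}\zeta^{(N)}(u)\big]\leq N^{-D}$$

for large enough $N\geq N_{0}(\epsilon,D)$.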

(ii) If $\xi$ is stochastically dominated by $\zeta$ , uniformly in $u$ , we use the notation $\xi\prec\zeta$ . Moreover, if for some complex family $\xi$ we have $|\xi|\prec\zeta$ , we also write $\xi\prec\zeta$ or $\xi=O_{\prec}(\zeta)$ .

(iii) We say that an event $\Xi$ holds with high probability if for any constant $D>0$ , $\mathbb{P}(\Xi)\geq 1-N^{-D}$ for large enough $N$ .

The following lemma collects basic properties of stochastic domination, which will be used tacitly throughout the proof.

Lemma 3.2 (Lemma 3.2 in [9]).

(i) Let $\xi$ and $\zeta$ be families of nonnegative random variables. Suppose that $\xi(u,v)\prec\zeta(u,v)$ uniformly in $u\in U$ and $v\in V$ . If $|V|\leq N^{C}$ for some constant $C$ , then $\sum_{v\in V}\xi(u,v)\prec\sum_{v\in V}\zeta(u,v)$ uniformly in $u$ .

(ii) If $0\leq\xi_{1}(u)\prec\zeta_{1}(u)$ and $0\leq\xi_{2}(u)\prec\zeta_{2}(u)$ uniformly in $u\in U$ , then $\xi_{1}(u)\xi_{2}(u)\prec\zeta_{1}(u)\zeta_{2}(u)$ uniformly in $u\in U$ .

(iii) Suppose that $\Psi(u)\geq N^{-C}$ is deterministic and $\xi(u)$ satisfies $\mathbb{E}\xi(u)^{2}\leq N^{C}$ for all $u$ . Then if $\xi(u)\prec\Psi(u)$ uniformly in $u$ , we have $\mathbb{E}\xi(u)\prec\Psi(u)$ uniformly in $u$ .

Throughout the rest of this paper, we will consistently use the notation $z=E+i\eta$ for the spectral parameter $z$ . In the following proof, we always assume that $z$ lies in the spectral domain

[TABLE]

for some small constant $\omega>0$, unless otherwise indicated. Recalling condition (1.16), we can take $\omega$ sufficiently small so that $\omega\leq\gamma_{K}/2$. Define the distance to the spectral edges as $\kappa:=\min_{1\leq k\leq 2L}|E-a_{k}|.$ Then we have the following estimates for $m_{2c}$:

[TABLE]

and

[TABLE]

for $z\in\mathbf{D}$ . The reader can refer to [31, Appendix A] for the proof.

We define the deterministic limit

[TABLE]

and the control parameter

[TABLE]

Note that by (3.7) and (3.8), we have for $z\in\mathbf{D}$ ,

[TABLE]

Definition 3.3 (Bounded support condition).

We say a random matrix $X$ satisfies the bounded support condition with $q$ , if

[TABLE]

Here $q\equiv q(N)$ is a deterministic parameter that usually satisfies $N^{-{1}/{2}}\leq q\leq N^{-\phi}$ for some (small) constant $\phi>0$. Whenever (3.12) holds, we say that $X$ has support $q$. In particular, if the entries of $X$ satisfy (1.23), then $X$ satisfies the bounded support condition with $q=N^{-\phi}$.

Now we are ready to state the local laws for the resolvent $G(X,z)$ . Here and throughout the following, whenever we say “uniformly in any deterministic vectors”, we mean that “uniformly in any deterministic vectors belonging to some fixed set of cardinality $N^{O(1)}$ ”.

Theorem 3.4 (Local MP law).

Suppose $d_{N}$, $X$ and $\Sigma$ satisfy Assumption 1.4. Suppose $X$ is real and satisfies (3.12) with $q\leq N^{-\phi}$ for some constant $\phi>0$. Then the following estimates hold for $z\in\mathbf{D}$:

(1) the averaged local law:

[TABLE]

(2) the anisotropic local law: for deterministic unit vectors $\mathbf{u},\mathbf{v}\in\mathbb{C}^{\mathcal{I}}$ ,

[TABLE]

(3) for deterministic unit vectors $\mathbf{u},\mathbf{v}\in\mathbb{C}^{\mathcal{I}_{1}}$ or $\mathbf{u},\mathbf{v}\in\mathbb{C}^{\mathcal{I}_{2}}$ ,

[TABLE]

All of the above estimates are uniform in the spectral parameter $z$ and the deterministic vectors $\mathbf{u},\mathbf{v}$ .

The proof of Theorem 3.4 is given in Supplementary Material A, and here we make some brief comments on it. If we assume (1.1) (instead of (1.19) and (1.20)) and $q=N^{-1/2}$, then (3.13) and (3.14) were proved in [31]. If we have (1.1) and $q\leq N^{-\phi}$, then it was proved in Lemma 3.11 and Theorem 3.14 of [14] that the averaged local law (3.13) and the entrywise local law

[TABLE]

hold uniformly in $z\in\mathbf{D}$. With (3.16) and the moment assumption (1.22), one can repeat the arguments in [9, Section 5] or [50, Section 5] to get the anisotropic local law (3.14). The main novelty of this theorem is the bound (3.15), whose proof is the main focus of the supplementary material. Finally, if the variance assumption in (1.1) is relaxed to the one in (1.20), we can repeat the previous arguments to get the desired estimates (3.13)-(3.15). In fact, it is easy to check that the $O(N^{-2-c_{0}})$ term leads to a negligible error at each step, and the whole proof remains unchanged. The relaxation of the mean zero assumption in (1.1) to the assumption (1.19) can be handled with the centralization Lemma 4.4.

After taking expectation, we obtain the following crucial improvement from (3.15) to (3.17). This improvement is the main reason we can improve the bound in [52] to the almost optimal one in (1.24). In fact, the leading order terms of $(\langle\mathbf{u},\mathcal{G}_{1}\mathbf{u}\rangle-m_{1c,\mathbf{u}})$ and $(\langle\mathbf{v},\mathcal{G}_{2}\mathbf{v}\rangle-m_{2c})$ vanish after taking expectation, which leads to a bound one order smaller than the one in (3.15). The proof of Theorem 3.5 will be given in Section 4 and constitutes the main novelty of this paper.

Theorem 3.5.

Suppose the assumptions in Theorem 3.4 hold. Then we have

[TABLE]

uniformly in $z\in\mathbf{D}$ and deterministic unit vectors $\mathbf{u},\mathbf{v}\in\mathbb{C}^{\mathcal{I}_{1}}$ or $\mathbf{u},\mathbf{v}\in\mathbb{C}^{\mathcal{I}_{2}}$ .

If $q=N^{-1/4}$ , then (3.15) and (3.17) already give that

[TABLE]

which are sufficient to conclude Theorem 1.5. However, we find that the second bound on the expected VESD is still valid under a much weaker support assumption. More specifically, we have the following theorem, whose proof will be given in the supplementary material.

Theorem 3.6.

Suppose the assumptions in Theorem 3.4 hold. Then we have

[TABLE]

uniformly in $z\in\mathbf{D}$ and deterministic unit vectors $\mathbf{u},\mathbf{v}\in\mathbb{C}^{\mathcal{I}_{1}}$ or $\mathbf{u},\mathbf{v}\in\mathbb{C}^{\mathcal{I}_{2}}$ .

As a corollary of (3.13), we have the following rigidity result for the eigenvalues. The reader can refer to [31, Theorem 3.12] for the proof. Recall the notations in (1.14) and (1.15).

Theorem 3.7 (Rigidity of eigenvalues).

Suppose the assumptions of Theorem 3.4 and the regularity condition (1.16) hold. Then for $\gamma_{j}\in[a_{2k},a_{2k-1}]$, we have

[TABLE]

3.2 Convergence rate of the VESD

In this subsection, we finish the proof of Theorem 1.5 using Theorems 3.4-3.7. The following arguments have been used previously to control the Kolmogorov distance between the ESD of a random matrix and the limiting law. For example, the reader can refer to [22, Lemma 6.1] and [44, Lemma 8.1]. By the remark below (3.6), we can choose the constant $\omega>0$ such that $\gamma_{K}/2>\omega$ . Also for simplicity, we will only prove the bounds for $\|\mathbb{E}F_{Q_{2},\mathbf{v}}-F_{2c}\|$ and $\|F_{Q_{2},\mathbf{v}}-F_{2c}\|$ . The bounds for $\|\mathbb{E}F_{Q_{1},\mathbf{u}}-F_{1c,\mathbf{u}}\|$ and $\|F_{Q_{1},\mathbf{u}}-F_{1c,\mathbf{u}}\|$ can be proved in the same way.

Proof of (1.24).

The key inputs are the bounds (3.18) and (3.19). Let $\hat{\rho}_{\mathbf{v}}$ be the measure whose Stieltjes transform is $\langle\mathbf{v},\mathcal{G}_{2}(X,z)\mathbf{v}\rangle$. Then we define

[TABLE]

and $\rho_{\mathbf{v}}:=\mathbb{E}\hat{\rho}_{\mathbf{v}}$ , $n_{\mathbf{v}}:=\mathbb{E}\hat{n}_{\mathbf{v}}$ . Hence we would like to bound

[TABLE]

For simplicity, we denote $\Delta\rho:=\rho_{\mathbf{v}}-\rho_{2c}$ and its Stieltjes transform by

[TABLE]

Let $\chi(y)$ be a smooth cutoff function with support in $[-1,1]$ , with $\chi(y)=1$ for $|y|\leq 1/2$ and with bounded derivatives. Fix $\eta_{0}=N^{-1+\omega}$ and $3\gamma_{K}/4\leq E_{1}<E_{2}\leq 3\gamma_{1}/2$ . Let $f\equiv f_{E_{1},E_{2},\eta_{0}}$ be a smooth function supported in $[E_{1}-\eta_{0},E_{2}+\eta_{0}]$ such that $f(x)=1$ if $x\in[E_{1}+\eta_{0},E_{2}-\eta_{0}]$ , and $|f^{\prime}|\leq C\eta_{0}^{-1}$ , $|f^{\prime\prime}|\leq C\eta_{0}^{-2}$ if $|x-E_{i}|\leq\eta_{0}$ . Using the Helffer-Sjöstrand calculus (see e.g. [12]), we have

[TABLE]

Then we obtain that

[TABLE]
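The Helffer-Sjöstrand representation can be checked numerically. In its standard form, which we assume matches the display above, it reads $\int f\,d\rho=\frac{1}{2\pi}\iint \mathrm{i}\,[\,y f''(x)\chi(y)+(f(x)+\mathrm{i}yf'(x))\chi'(y)\,]\,m(x+\mathrm{i}y)\,dx\,dy$ with $m(z)=\int\frac{\rho(d\lambda)}{\lambda-z}$. The sketch below makes several simplifying substitutions. It uses a toy three-atom measure, a Gaussian $f$ in place of $f_{E_1,E_2,\eta_0}$, and a $C^1$ smoothstep cutoff instead of a smooth $\chi$, which is enough for the formula to hold. The double integral is computed with a midpoint rule.

```python
import numpy as np

# Toy discrete measure (hypothetical atoms and masses) and its Stieltjes transform
# m(z) = sum_k masses[k] / (atoms[k] - z), the same convention as <v, (Q - z)^{-1} v>.
atoms = np.array([-0.5, 0.3, 1.0])
masses = np.array([0.2, 0.5, 0.3])


def stieltjes(z):
    return sum(w / (a - z) for a, w in zip(atoms, masses))


def f(x):
    return np.exp(-x ** 2)


def f1(x):
    return -2 * x * np.exp(-x ** 2)


def f2(x):
    return (4 * x ** 2 - 2) * np.exp(-x ** 2)


def chi(y):
    """C^1 cutoff: equal to 1 for |y| <= 1/2, to 0 for |y| >= 1, smoothstep in between."""
    t = np.clip(2 * np.abs(y) - 1, 0.0, 1.0)
    return 1 - (3 * t ** 2 - 2 * t ** 3)


def dchi(y):
    t = np.clip(2 * np.abs(y) - 1, 0.0, 1.0)
    return -np.sign(y) * 2 * (6 * t - 6 * t ** 2)


h = 0.005                                   # midpoint grid: no node lies on the real axis
x = np.arange(-5 + h / 2, 5, h)
y = np.arange(-1 + h / 2, 1, h)
XX, YY = np.meshgrid(x, y, indexing="ij")
m = stieltjes(XX + 1j * YY)
integrand = 1j * (YY * f2(XX) * chi(YY) + (f(XX) + 1j * YY * f1(XX)) * dchi(YY)) * m
hs_value = (integrand.sum() * h * h).real / (2 * np.pi)
exact = np.sum(masses * f(atoms))
print(hs_value, exact)
```

The term with $y f''\chi$ stays bounded near the real axis because $|y\,m(x+\mathrm{i}y)|\le 1$ for a probability measure. This is the same mechanism that lets the proof split off the region $|y|\le\eta_0$ in (3.22).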

By (3.18) with $\eta=\eta_{0}$ , we have

[TABLE]

Since $\eta{\rm{Im}}\,\mathbb{E}\langle\mathbf{v},\mathcal{G}_{2}(X,E+i\eta)\mathbf{v}\rangle$ and $\eta{\rm{Im}}\,m_{2c}(E+i\eta)$ are increasing with $\eta$ , we obtain that

[TABLE]

Moreover, since $G(X,z)^{*}=G(X,\bar{z})$ , the estimates (3.18) and (3.25) also hold for $z\in\mathbb{C}_{-}$ .

Now we bound the terms (3.21), (3.22) and (3.23). Using (3.18) and the fact that $\chi^{\prime}$ is supported in $\{1/2\leq|y|\leq 1\}$, the term (3.21) can be bounded by

[TABLE]

Using $|f^{\prime\prime}|\leq C\eta_{0}^{-2}$ and (3.25), we can bound the terms in (3.22) by

[TABLE]

Finally, we integrate the term (3.23) by parts first in $x$ , and then in $y$ (and use the Cauchy-Riemann equation $\partial{\rm{Im}}(\Delta m)/\partial x=-\partial{\rm{Re}}(\Delta m)/\partial y$ ) to get

[TABLE]

We bound the term in (3.28) by $O_{\prec}(N^{-1})$ using (3.18) and $|f^{\prime}|\leq C\eta_{0}^{-1}$ . The first term in (3.29) can be estimated by $O_{\prec}(N^{-1})$ as in (3.26). For the second term in (3.29), we again use (3.18) and $|f^{\prime}|\leq C\eta_{0}^{-1}$ to get that

[TABLE]

Combining the above estimates, we obtain that

[TABLE]

Obviously, the same estimate also holds for the $y\leq-\eta_{0}$ part. Together with (3.26) and (3.27), we conclude that

[TABLE]

For any interval $I:=[E-\eta_{0},E+\eta_{0}]$ with $E\in[\gamma_{K}/2,2\gamma_{1}]$ , we have

[TABLE]

where in the last step we used the spectral decomposition

[TABLE]

which follows from (1.3). Then by (3.24) and Lemma 3.2, we get that

[TABLE]

On the other hand, since $\rho_{2c}$ is bounded, we trivially have

[TABLE]

Now we set $E_{2}=3\gamma_{1}/2$ . With (3.30), (3.32) and (3.33), we get that for any $E\in[3\gamma_{K}/4,E_{2}]$ ,

[TABLE]

Note that by (3.19), the eigenvalues of $Q_{2}$ are inside $\{0\}\cup[3\gamma_{K}/4,E_{2}]$ with high probability. Hence we have that with high probability,

[TABLE]

Together with (3.34), we get that

[TABLE]

This concludes (1.24) since $\omega$ can be arbitrarily small. ∎

Proof of (1.25).

The proof for (1.25) is similar except that we shall use the estimate (3.15) instead of (3.18). By (3.15), we have for any $\mathbf{v}\in\mathbb{C}^{\mathcal{I}_{2}}$ ,

[TABLE]

uniformly in $z\in\mathbf{D}$ . Then we would like to bound (recall (3.20))

[TABLE]

where $\hat{n}_{\mathbf{v}}$ is defined in (3.20). We denote

[TABLE]

Then for $f_{E_{1},E_{2},\eta_{0}}$ defined above, we can repeat the Helffer-Sjöstrand argument with the estimate (3.37) to get that

[TABLE]

which, together with (3.31) and (3.35), implies that

[TABLE]

This concludes (1.25) by Definition 3.1. ∎

4 Proof of Theorem 3.5

We first collect some useful identities from linear algebra and some simple resolvent estimates. For simplicity, we denote $Y:=\Sigma^{1/2}X$ .

Definition 4.1 (Minors).

For $\mathbb{T}\subseteq\mathcal{I}$ , we define the minor $H^{(\mathbb{T})}:=(H_{ab}:a,b\in\mathcal{I}\setminus\mathbb{T})$ obtained by removing all rows and columns of $H$ indexed by $a,b\in\mathbb{T}$ . Note that we keep the names of indices when defining $H^{(\mathbb{T})}$ , i.e. $(H^{(\mathbb{T})})_{ab}=\mathbf{1}_{\{a,b\notin\mathbb{{T}}\}}H_{ab}$ . Correspondingly, we define the Green function

[TABLE]

and the partial traces

[TABLE]

where we adopt the convention that $G^{(\mathbb{T})}_{ab}=0$ if $a\in\mathbb{T}$ or $b\in\mathbb{T}$ . For simplicity, we will abbreviate $(\{a\})\equiv(a)$ and $(\{a,b\})\equiv(ab)$ .

Lemma 4.2 (Resolvent identities).

(i)

For $a\in\mathcal{I}$ and $b,c\in\mathcal{I}\setminus\{a\}$ ,

[TABLE]

(ii)

For $i\in\mathcal{I}_{1}$ and $\mu\in\mathcal{I}_{2}$ , we have

[TABLE]

(iii)

For $i\neq j\in\mathcal{I}_{1}$ and $\mu\neq\nu\in\mathcal{I}_{2}$ , we have

[TABLE]

(iv)

All of the above identities hold for $G^{(\mathbb{T})}$ instead of $G$ for $\mathbb{T}\subset\mathcal{I}$ .

Proof.

These identities can be proved using the Schur complement formula. The reader can refer to e.g. [9, Lemmas 3.6 and 3.8] or [31, Lemma 4.4]. ∎
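Equation (4.1) is not reproduced in this section, but the proof of Lemma 4.3 refers to it as the first resolvent expansion. As an illustration, the sketch below checks the standard form of that expansion, $G_{bc}=G^{(a)}_{bc}+G_{ba}G_{ac}/G_{aa}$ for $b,c\neq a$, for the resolvent $G=(H-z)^{-1}$ of a random symmetric matrix. The identity holds for the inverse of any invertible matrix whose minor is invertible, and we assume (4.1) is of this type. The helper `minor_resolvent` is hypothetical and follows the index convention of Definition 4.1.

```python
import numpy as np

rng = np.random.default_rng(5)
n = 8
B = rng.standard_normal((n, n))
H = (B + B.T) / 2                    # any real symmetric matrix
z = 0.4 + 0.2j
G = np.linalg.inv(H - z * np.eye(n))


def minor_resolvent(H, z, removed):
    """G^{(T)}: resolvent of H with the rows/columns in T removed, kept in the original
    index labels; entries with an index in T are set to 0 (the convention of Definition 4.1)."""
    keep = [i for i in range(H.shape[0]) if i not in removed]
    Gt = np.zeros(H.shape, dtype=complex)
    Gt[np.ix_(keep, keep)] = np.linalg.inv(H[np.ix_(keep, keep)] - z * np.eye(len(keep)))
    return Gt


a = 2
Ga = minor_resolvent(H, z, {a})
rest = [b for b in range(n) if b != a]
# First resolvent expansion: G_bc = G^{(a)}_bc + G_ba G_ac / G_aa for b, c != a.
rhs = Ga + np.outer(G[:, a], G[a, :]) / G[a, a]
err = np.max(np.abs(G[np.ix_(rest, rest)] - rhs[np.ix_(rest, rest)]))
print(err)
```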

Lemma 4.3.

Suppose $\tilde{\Phi}(z)$ is a deterministic function on $\mathbf{D}$ satisfying $N^{-1/2}\leq\tilde{\Phi}(z)\leq N^{-c}$ for some constant $c>0$ . Suppose $\left|G_{ab}(z)-\Pi_{ab}(z)\right|\prec\tilde{\Phi}(z)$ uniformly in $a,b\in\mathcal{I}$ and $z\in\mathbf{D}$ . Then for any $\mathbb{T}\subseteq\mathcal{I}$ with $|\mathbb{T}|=O(1)$ , we have uniformly in $z\in\mathbf{D}$ ,

[TABLE]

Proof.

The bound (4.4) can be proved by repeatedly applying the first resolvent expansion in (4.1) with respect to the indices in $\mathbb{T}$ . ∎

For $X$ satisfying the assumptions in Theorem 3.4, we write $X=X_{1}+B,$ where $X_{1}:=X-\mathbb{E}X$ is a real random matrix satisfying (1.20), (1.22) and

[TABLE]

and $B:=\mathbb{E}X$ is a deterministic matrix such that

[TABLE]

The next lemma shows that $G(X,z)$ is very close to $G(X_{1},z)$ in the sense of anisotropic local law. Its proof will be given in the supplementary material.

Lemma 4.4.

If (3.14) holds for $G(X_{1},z)$ , then we have

[TABLE]

uniformly in $z\in\mathbf{D}$ and deterministic unit vectors $\mathbf{u},\mathbf{v}\in\mathbb{C}^{\mathcal{I}}$ .

4.1 Sketch of the proof for Theorem 3.5

In this subsection, we start proving our main resolvent estimate (3.17). For simplicity, we denote $\Phi:=q^{2}+(N\eta)^{-1/2}$ . By Lemma 4.4, we can assume that the entries of $X$ are centered without loss of generality. We will only prove (3.17) for $\mathbf{u},\mathbf{v}\in\mathbb{C}^{\mathcal{I}_{2}}$ , while the proof in the case of $\mathbf{u},\mathbf{v}\in\mathbb{C}^{\mathcal{I}_{1}}$ is exactly the same. Also by polarization, it suffices to prove the following estimate

[TABLE]

We can obtain the more general bound (3.17) by applying (4.8) to the vectors $\mathbf{u}+\mathbf{v}$ and $\mathbf{u}+i\mathbf{v}$ , respectively. Note that (3.15) gives the a priori bound

[TABLE]
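The polarization step can be made explicit. Let $G$ be any matrix, use the inner product that is conjugate-linear in the first slot, and write $q(\mathbf{w})=\langle\mathbf{w},G\mathbf{w}\rangle$. Then $2\langle\mathbf{u},G\mathbf{v}\rangle=[q(\mathbf{u}+\mathbf{v})-q(\mathbf{u})-q(\mathbf{v})]-\mathrm{i}\,[q(\mathbf{u}+\mathrm{i}\mathbf{v})-q(\mathbf{u})-q(\mathbf{v})]$. The identity is linear in $G$, so it applies equally to $\mathbb{E}\mathcal{G}_2-m_{2c}I$. The vectors $\mathbf{u}+\mathbf{v}$ and $\mathbf{u}+\mathrm{i}\mathbf{v}$ have norm at most 2, so a quadratic-form bound of the form (4.8) for unit vectors transfers to (3.17) up to a constant. A quick numerical check of the identity (our own sketch):

```python
import numpy as np

rng = np.random.default_rng(6)
n = 5
G = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))   # any matrix works
u = rng.standard_normal(n) + 1j * rng.standard_normal(n)
v = rng.standard_normal(n) + 1j * rng.standard_normal(n)


def q(w):
    """Quadratic form <w, G w>, conjugate-linear in the first slot (np.vdot conjugates it)."""
    return np.vdot(w, G @ w)


def bilinear_from_quadratic(u, v):
    """Recover <u, G v> from the four quadratic forms q(u), q(v), q(u+v), q(u+iv)."""
    p = q(u + v) - q(u) - q(v)          # = <u,Gv> + <v,Gu>
    r = q(u + 1j * v) - q(u) - q(v)     # = i<u,Gv> - i<v,Gu>
    return (p - 1j * r) / 2


print(bilinear_from_quadratic(u, v), np.vdot(u, G @ v))
```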

We will show that after taking expectation, the leading order term in $\left(\mathcal{G}_{2}\right)_{\mu\nu}-m_{2c}\delta_{\mu\nu}$ vanishes and leads to the better estimate (4.8). We deal with the diagonal and off-diagonal parts separately:

[TABLE]

For any $\mathbb{T}\subseteq\mathcal{I}$ , we define the $Z$ variables

[TABLE]

where $\mathbb{E}_{\mu}[\cdot]:=\mathbb{E}[\cdot|H^{(\mu)}],$ i.e. it is the partial expectation in the randomness of the $\mu$ -th row and column of $H$ , and we used (4.2) in the second step. If $\mathbb{T}=\emptyset$ , we shall abbreviate $Z_{i}\equiv Z^{(\emptyset)}_{i}$ . Note that by (3.15), (4.4) (with $\tilde{\Phi}=q+\Psi$ by (3.14)), and Lemma 3.2, we have

[TABLE]

for any $\mathbb{T}\subseteq\mathcal{I}$ with $|\mathbb{T}|=O(1)$ . Then using (4.2) we get that

[TABLE]

where in the second step we used (3.13), (4.4), (A.26), and

[TABLE]

which follows from (3.9) and (1.10). So we can bound the diagonal part by

[TABLE]

For the off-diagonal part, we claim that for $\mu\neq\nu\in\mathcal{I}_{2}$ ,

[TABLE]

Then using (4.13) and $\|\mathbf{v}\|_{1}\leq\sqrt{N}$ , we obtain that

[TABLE]

This concludes (4.8) together with (4.12).
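The off-diagonal step can be written out in one line. Here we assume that the claim (4.13) reads $|\mathbb{E}(\mathcal{G}_{2})_{\mu\nu}|\prec N^{-1}\Phi^{2}$, which is how it is used in the expansion argument below, and that the bound is uniform over the $N^2$ index pairs (Lemma 3.2):

```latex
\Big|\sum_{\mu\neq\nu}\bar v_{\mu}v_{\nu}\,\mathbb{E}(\mathcal{G}_{2})_{\mu\nu}\Big|
  \le \Big(\sum_{\mu}|v_{\mu}|\Big)^{2}\max_{\mu\neq\nu}\big|\mathbb{E}(\mathcal{G}_{2})_{\mu\nu}\big|
  \le \|\mathbf{v}\|_{1}^{2}\cdot O_{\prec}(N^{-1}\Phi^{2})
  \le N\cdot O_{\prec}(N^{-1}\Phi^{2}) = O_{\prec}(\Phi^{2}),
```

where $\|\mathbf{v}\|_1\le\sqrt{N}\,\|\mathbf{v}\|_2=\sqrt{N}$ by the Cauchy-Schwarz inequality. The $N^{-1}$ gain in (4.13) is exactly what compensates for the $N$ off-diagonal pairs that a delocalized unit vector $\mathbf{v}$ can weight.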

To prove (4.13), we extend the arguments in [9, Section 5] and [50, Section 5]. We illustrate the basic idea with some simplified calculations. Using the resolvent identities (4.3) and (4.1), we get

[TABLE]

We now focus on the first term. Applying (4.2) gives that

[TABLE]

where we have

[TABLE]

by (4.11), (3.13), (4.4) (with $\tilde{\Phi}=q+\Psi$) and (A.26). We now expand the fractions in (4.15) in order to take the expectation. Note that the $G^{(\mu\nu)}$ entries are independent of the $X$ entries in the $\mu$-th and $\nu$-th rows and columns. Thus, to attain a nonzero expectation, each $X$ entry must appear at least twice in the expression. For this reason, the leading and next-to-leading order terms in the expansion vanish. The “real” leading order term is

[TABLE]

where the constants $C_{i,j}$ depend on $\sigma_{i}$ , $\sigma_{j}$ and the 3rd moments of $X_{i\mu}$ and $X_{j\mu}$ (recall (1.22)). Here in the last step, we used $|G^{(\mu\nu)}_{ii}-\Pi_{ii}|\prec\Phi$ (by (3.15) and (4.4)) and $|\Pi_{ii}|=O(1)$ (by (3.8)), and bounded the $i=j$ terms by $O_{\prec}(N^{-2})=O_{\prec}(N^{-1}\Phi^{2})$ . Now applying (4.3) to $G^{(\mu\nu)}_{ij}$ , we get that

[TABLE]

where in the second step we used $|G_{ii}^{(\mu\nu)}-\Pi_{ii}|+|G^{(i\mu\nu)}_{jj}-\Pi_{jj}|\prec\Phi$ and

[TABLE]

which follow easily from (3.15) and (4.4), and in the last step the leading order term vanishes since the two $X$ entries are independent for $i\neq j$ . Then with (4.18), the terms in (4.17) can be bounded by $O_{\prec}(N^{-1}\Phi^{2})$ .

In general, after the expansion of the two fractions in (4.15), we get a summation of terms of the form

[TABLE]

up to some deterministic coefficients of order $O(1)$ . Since $|\epsilon_{\mu,\nu}|\prec\Phi\lesssim N^{-\omega/2}$ for $z\in\mathbf{D}$ (we can take $\omega$ small enough such that $N^{-\omega/2}\geq q^{2}$ ), we only need to include the terms with $m+n\leq 2+2/\omega$ and the tail terms will be smaller than $N^{-1}\Phi^{2}$ . Note that in $A_{m,n}$ , the $X_{*\mu}$ entries, $X_{*\nu}$ entries and $G^{(\mu\nu)}$ entries are mutually independent. Moreover, both the number of $X_{*\mu}$ entries and the number of $X_{*\nu}$ entries are odd. Thus to attain a nonzero expectation, we must pair the $X$ entries such that there are products of the forms $X_{i\mu}^{n_{1}}$ and $X_{j\nu}^{n_{2}}$ for some $n_{1},n_{2}\geq 3$ . As a result, we lose $(n_{1}-2)/2+(n_{2}-2)/2\geq 1$ free indices, and this contributes an $N^{-1}$ factor. On the other hand, for the product of $G$ entries, we have the following three cases: (1) if there are at least $2$ off-diagonal $G$ entries, then we bound them with $O_{\prec}(\Phi^{2})$ ; (2) if there is only $1$ off-diagonal $G$ entry, then we can use the trick in (4.17) and the bound (4.18); (3) if there is no off-diagonal $G$ entry, then we lose one more free index and get an extra $N^{-1}$ factor. This leads to the estimate (4.13) for the term in (4.15).

For the second term in (4.14), we again use Lemma 4.2 to expand the $G_{\mu\nu}$ , $G_{\nu\mu}$ and $G_{\nu\nu}^{-1}$ entries. Our goal is to expand all the $G$ entries into polynomials of the random variables

[TABLE]

so that the $X$ entries and $G^{(\mu\nu)}$ entries are independent in the resulting expression. In particular, the maximally expanded terms (see (4.20)) can be expanded into $S_{\alpha\beta}$ variables directly through (4.2) and (4.3). However, non-maximally expanded terms are also created along the expansions in (4.3) and (4.1). Then we need to further expand these newly appeared terms. In general, this process will not terminate. However, we will show in Lemma 4.8 that after sufficiently many expansions, the resulting expression either has enough off-diagonal terms, or is maximally expanded. In the former case, it suffices to bound each off-diagonal term by $O_{\prec}(\Phi)$ . In the latter case, the expression will only consist of $S_{\alpha\beta}$ variables. Following the argument in the previous paragraph, the expectation over the $X$ entries produces an $N^{-1}$ factor, while the expectation over the $G$ entries produces a $\Phi^{2}$ factor.

Next we give a rigorous proof based on the above arguments.

4.2 Resolvent expansion

To perform the resolvent expansion in a systematic way, we introduce the following notions of string and string operator.

*Definition 4.5 (Strings).*

Let $\mathfrak{A}$ be the alphabet containing all symbols that will appear during the expansion:

[TABLE]

We define a string $\mathbf{s}$ to be a concatenation of the symbols from $\mathfrak{A}$ , and we use $\left\llbracket\bf s\right\rrbracket$ to denote the random variable represented by $\mathbf{s}$ . We denote an empty string by $\emptyset$ with value $\left\llbracket\emptyset\right\rrbracket=0$ .

*Remark 4.6.*

It is important to distinguish a string $\mathbf{s}$ from its value $\left\llbracket\bf s\right\rrbracket$ . For example, $``G_{\mu\nu}"$ and $``G_{\mu\mu}G_{\nu\nu}^{(\mu)}S_{\mu\nu}"$ are different strings, but they represent the same random variable by (4.3).

We shall call the following symbols the maximally expanded symbols:

[TABLE]

A string $\mathbf{s}$ is said to be maximally expanded if all of its symbols are in $\mathfrak{A}_{\max}$ . We shall call $G_{\mu\nu},G_{\nu\mu},S_{\mu\nu},S_{\nu\mu}$ the off-diagonal symbols and all the other symbols diagonal. By (3.15) and (4.4), we have $\left\llbracket\mathbf{a}_{o}\right\rrbracket\prec\Phi$ if $\mathbf{a}_{o}$ is off-diagonal (we have $S_{\mu\nu}\prec\Phi$ using (4.3)) and $\left\llbracket\mathbf{a}_{d}\right\rrbracket\prec 1$ if $\mathbf{a}_{d}$ is diagonal. We use ${\cal F}_{n{\text{-}}max}(\mathbf{s})$ and ${\cal F}_{\rm{off}}(\mathbf{s})$ to denote the number of non-maximally expanded symbols and the number of off-diagonal symbols, respectively, in $\mathbf{s}$ .

*Definition 4.7 (String operators).*

Let $\alpha\neq\beta\in\{\mu,\nu\}$ .

(i)

We define an operator $\tau_{0}$ acting on a string $\bf s$ in the following sense. Find the first $G_{\alpha\alpha}$ or $G_{\alpha\alpha}^{-1}$ in $\bf s$ . If $G_{\alpha\alpha}$ is found, replace it with $G_{\alpha\alpha}^{(\beta)}$ ; if $G_{\alpha\alpha}^{-1}$ is found, replace it with $(G_{\alpha\alpha}^{(\beta)})^{-1}$ ; if neither is found, set $\tau_{0}(\bf s)=\bf s$ and we say that $\tau_{0}$ is trivial for $\bf s$ .

(ii)

We define an operator $\tau_{1}$ acting on a string $\bf s$ in the following sense. Find the first $G_{\alpha\alpha}$ or $G_{\alpha\alpha}^{-1}$ in $\bf s$ . If $G_{\alpha\alpha}$ is found, replace it with $\frac{G_{\alpha\beta}G_{\beta\alpha}}{G_{\beta\beta}}$ ; if $G_{\alpha\alpha}^{-1}$ is found, replace it with $-\frac{G_{\alpha\beta}G_{\beta\alpha}}{G_{\alpha\alpha}G_{\alpha\alpha}^{(\beta)}G_{\beta\beta}}$ ; if neither is found, set $\tau_{1}(\bf s)=\emptyset$ and we say that $\tau_{1}$ is null for $\bf s$ .

(iii)

The operator $\rho$ replaces each $G_{\alpha\beta}$ in the string $\bf s$ with $G_{\alpha\alpha}G_{\beta\beta}^{(\alpha)}S_{\alpha\beta}$ .
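The replacement rules in (ii) rest on the Schur-complement identity $G_{\alpha\alpha}=G_{\alpha\alpha}^{(\beta)}+G_{\alpha\beta}G_{\beta\alpha}/G_{\beta\beta}$ and its reciprocal form. As a sanity check (not part of the proof), the following Python sketch verifies both forms numerically. It assumes that $G^{(\mathbb{T})}$ is obtained by inverting the matrix with the rows and columns in $\mathbb{T}$ removed, and it uses a small random real symmetric matrix $H$ and the spectral parameter $z=0.3+\mathrm{i}$ as stand-ins for the paper's linearized matrix; the helper names `inverse` and `minor_resolvent` are hypothetical.

```python
import random


def inverse(A):
    """Invert a square matrix (list of lists) by Gauss-Jordan elimination with partial pivoting."""
    n = len(A)
    aug = [[complex(x) for x in row] + [complex(i == j) for j in range(n)]
           for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        p = aug[col][col]
        aug[col] = [x / p for x in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [x - f * y for x, y in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def minor_resolvent(H, z, removed=()):
    """Hypothetical helper: (H - z)^{-1} after deleting the rows/columns in `removed`,
    returned as a dict keyed by the ORIGINAL index pairs."""
    keep = [k for k in range(len(H)) if k not in removed]
    A = [[H[r][c] - (z if r == c else 0) for c in keep] for r in keep]
    inv = inverse(A)
    return {(keep[a], keep[b]): inv[a][b]
            for a in range(len(keep)) for b in range(len(keep))}


random.seed(1)
n = 6
H = [[0.0] * n for _ in range(n)]
for r in range(n):
    for c in range(r, n):
        H[r][c] = H[c][r] = random.gauss(0.0, 1.0)  # generic real symmetric stand-in
z = 0.3 + 1.0j          # Im z > 0, so H - z and all its principal submatrices are invertible
alpha, beta = 0, 1      # play the roles of the indices mu, nu

G = minor_resolvent(H, z)
Gb = minor_resolvent(H, z, removed=(beta,))   # G^{(beta)}
prod = G[alpha, beta] * G[beta, alpha]

# tau_1 on G_aa:       G_aa = G^{(b)}_aa + G_ab G_ba / G_bb
err_G = abs(G[alpha, alpha] - (Gb[alpha, alpha] + prod / G[beta, beta]))
# tau_1 on G_aa^{-1}:  1/G_aa = 1/G^{(b)}_aa - G_ab G_ba / (G_aa G^{(b)}_aa G_bb)
err_inv = abs(1 / G[alpha, alpha]
              - (1 / Gb[alpha, alpha]
                 - prod / (G[alpha, alpha] * Gb[alpha, alpha] * G[beta, beta])))
print(f"{err_G:.1e} {err_inv:.1e}")
```

Both printed errors should be at the level of floating-point round-off, since the identities hold for any invertible matrix whose relevant principal submatrix is invertible.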

By Lemma 4.2, it is clear that for any string $\bf s$ ,

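$$\left\llbracket\mathbf{s}\right\rrbracket=\left\llbracket\tau_{0}(\mathbf{s})\right\rrbracket+\left\llbracket\tau_{1}(\mathbf{s})\right\rrbracket,\qquad\left\llbracket\mathbf{s}\right\rrbracket=\left\llbracket\rho(\mathbf{s})\right\rrbracket.\tag{4.21}$$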

Moreover, a string $\mathbf{s}$ is trivial under $\tau_{0}$ and null under $\tau_{1}$ if and only if $\mathbf{s}$ is maximally expanded. Given a string $\bf s$ , we abbreviate ${\mathbf{s}}_{0}:=\tau_{0}(\mathbf{s})$ and ${\bf s}_{1}:=\rho(\tau_{1}(\bf s))$ . For any sequence $w=a_{1}a_{2}\ldots a_{m}$ with $a_{i}\in\{0,1\}$ , we denote

$$\mathbf{s}_{w}:=\Big(\cdots\big((\mathbf{s}_{a_{1}})_{a_{2}}\big)\cdots\Big)_{a_{m}}.$$

Then by (4.21) we have

$$\left\llbracket\mathbf{s}\right\rrbracket=\sum_{|w|=m}\left\llbracket\mathbf{s}_{w}\right\rrbracket,$$

where the summation is over all binary sequences $w$ with length $|w|=m$ .

*Lemma 4.8.*

Consider the string $\mathbf{s}=``G_{\mu\mu}G_{\nu\nu}^{(\mu)}S_{\mu\nu}"$ . Let $w$ be any binary sequence with $|w|=4l_{0}$ and such that $\mathbf{s}_{w}\neq\emptyset$ . Then either ${\cal F}_{\rm{off}}(\mathbf{s}_{w})\geq 2l_{0}$ or $\mathbf{s}_{w}$ is maximally expanded.

Proof.

It suffices to show that any nonempty string $\mathbf{s}_{w}$ with ${\cal F}_{\rm{off}}(\mathbf{s}_{w})<2l_{0}$ is maximally expanded. By Definition 4.7, a nontrivial $\tau_{0}$ reduces the number of non-maximally expanded symbols by $1$ and keeps the number of off-diagonal symbols unchanged, while each application of $\rho\tau_{1}$ increases the number of non-maximally expanded symbols by $2$ or $3$ and increases the number of off-diagonal symbols by $2$. Since ${\cal F}_{\rm{off}}(\mathbf{s})=1$ for the initial string, ${\cal F}_{\rm{off}}(\mathbf{s}_{w})<2l_{0}$ implies that there are at most $l_{0}-1$ $1$’s in $w$. These $\rho\tau_{1}$ operations increase ${\cal F}_{n{\text{-}}max}$ by at most $3(l_{0}-1)$ in total. On the other hand, there are at least $3l_{0}+1$ $0$’s in $w$, which is sufficient to eliminate all the non-maximally expanded symbols, whose number is at most $3(l_{0}-1)+1=3l_{0}-2$ in total since ${\cal F}_{n{\text{-}}max}(\mathbf{s})=1$ for the initial string. (If the number of non-maximally expanded symbols drops to zero before the end of $w$, the string is maximally expanded from then on; since $\mathbf{s}_{w}\neq\emptyset$, all remaining letters of $w$ are $0$’s, which act trivially.) ∎
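The counting in this proof can be checked mechanically for small $l_0$. The Python sketch below encodes strings as lists of symbols, implements $\tau_0$, $\tau_1$ and $\rho$ following Definition 4.7, and enumerates $\mathbf{s}_w$ for all binary words of length $4l_0$ with $l_0\le 3$. It tracks symbols only (not their values or signs) and assumes that the non-maximally expanded symbols arising along the way are exactly $G_{\alpha\alpha}$ and $G_{\alpha\alpha}^{-1}$, since every $G_{\alpha\beta}$ created by $\tau_1$ is immediately removed by $\rho$; the tuple encoding and all function names are illustrative.

```python
# A symbol is a tuple (kind, a, b, upper, inv):
#   kind  -- "G" or "S";  a, b -- lower indices in {"mu", "nu"}
#   upper -- excluded index of G^{(upper)}, or None;  inv -- True for an inverse entry.
# Assumption: the only non-maximally expanded symbols are G_aa and G_aa^{-1} (no upper index).


def other(a):
    return "nu" if a == "mu" else "mu"


def is_nonmax(sym):
    kind, a, b, upper, _ = sym
    return kind == "G" and a == b and upper is None


def is_off(sym):
    return sym[1] != sym[2]


def tau0(s):
    """Replace the first G_aa (or G_aa^{-1}) by G^{(b)}_aa (or its inverse)."""
    for k, sym in enumerate(s):
        if is_nonmax(sym):
            a, inv = sym[1], sym[4]
            return s[:k] + [("G", a, a, other(a), inv)] + s[k + 1:]
    return s  # tau0 is trivial


def tau1(s):
    """Replace the first G_aa (or G_aa^{-1}) by its correction term; None encodes the empty string."""
    for k, sym in enumerate(s):
        if is_nonmax(sym):
            a, inv = sym[1], sym[4]
            b = other(a)
            new = [("G", a, b, None, False), ("G", b, a, None, False)]
            if inv:  # -G_ab G_ba / (G_aa G^{(b)}_aa G_bb); the sign is not tracked
                new += [("G", a, a, None, True), ("G", a, a, b, True), ("G", b, b, None, True)]
            else:    # G_ab G_ba / G_bb
                new += [("G", b, b, None, True)]
            return s[:k] + new + s[k + 1:]
    return None  # tau1 is null


def rho(s):
    """Replace every G_ab with a != b by G_aa G^{(a)}_bb S_ab."""
    out = []
    for sym in s:
        if sym[0] == "G" and sym[1] != sym[2]:
            a, b = sym[1], sym[2]
            out += [("G", a, a, None, False), ("G", b, b, a, False), ("S", a, b, None, False)]
        else:
            out.append(sym)
    return out


# Initial string G_{mu mu} G^{(mu)}_{nu nu} S_{mu nu}
START = [("G", "mu", "mu", None, False), ("G", "nu", "nu", "mu", False),
         ("S", "mu", "nu", None, False)]


def leaves(s, depth):
    """Yield every nonempty s_w over binary words w of the given length."""
    if s is None:
        return  # s_w is the empty string for every extension of w
    if depth == 0:
        yield s
        return
    yield from leaves(tau0(s), depth - 1)            # letter 0
    s1 = tau1(s)
    yield from leaves(rho(s1) if s1 is not None else None, depth - 1)  # letter 1


def lemma_4_8_holds(l0):
    for sw in leaves(START, 4 * l0):
        if sum(map(is_off, sw)) < 2 * l0 and sum(map(is_nonmax, sw)) > 0:
            return False
    return True


print([lemma_4_8_holds(l0) for l0 in (1, 2, 3)])
```

The same model also confirms the increments used in the proof: $\rho\tau_1$ raises ${\cal F}_{n\text{-}max}$ by $2$ when it acts on $G_{\alpha\alpha}$ and by $3$ when it acts on $G_{\alpha\alpha}^{-1}$, and raises ${\cal F}_{\rm off}$ by $2$ in both cases.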

Now we choose $l_{0}=1+1/\omega$ . Then using $\Phi=O(N^{-\omega/2})$ , we have

[TABLE]
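The choice $l_{0}=1+1/\omega$ comes from a one-line exponent count. Assuming, as in the bound above, that a string with ${\cal F}_{\rm off}(\mathbf{s}_{w})\geq 2l_{0}$ is stochastically dominated by $\Phi^{{\cal F}_{\rm off}(\mathbf{s}_{w})}\leq\Phi^{2l_{0}}$ (using $\Phi\leq 1$), and that $\Phi\leq CN^{-\omega/2}$, one has

```latex
\Phi^{2l_{0}}=\Phi^{2}\,\Phi^{2/\omega}\leq\Phi^{2}\big(CN^{-\omega/2}\big)^{2/\omega}=C^{2/\omega}\,N^{-1}\Phi^{2},
```

so, for fixed $\omega$, such strings are already negligible at the scale $N^{-1}\Phi^{2}$.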

By Lemma 4.8, we see that to prove (4.13), it suffices to show that

[TABLE]

for any maximally expanded string $\mathbf{s}_{w}$ with $|w|=4l_{0}$ . Note that the maximally expanded string $\mathbf{s}_{w}$ thus obtained consists only of the symbols

[TABLE]

By (4.2), we can replace $(G_{\alpha\alpha}^{(\beta)})^{-1}$ with

[TABLE]

Note that $|S_{\alpha\alpha}-N^{-1}\sum_{i}\sigma_{i}\Pi_{ii}|\prec\Phi$ by (4.16). Then we can expand $G_{\alpha\alpha}^{(\beta)}$ as

[TABLE]

We apply the expansions (4.24) and (4.25) to the $G$ symbols in $\mathbf{s}_{w}$ , disregard the sufficiently small tails, and denote the resulting polynomial (in terms of the symbols $S_{\alpha\beta}$ ) by $P_{w}$ . Then $P_{w}$ can be written as a finite sum of maximally expanded strings (or monomials) consisting of the $S_{\alpha\beta}$ symbols. Moreover, the number of such monomials depends only on $l_{0}$ . Hence we only need to prove that for any such monomial $M_{w}$ ,

[TABLE]

For a string, let $N_{\mu}$ ( $N_{\nu}$ ) be the number of times that $\mu$ $(\nu)$ appears as a lower index of its symbols; for $M_{w}$, which consists only of $S$ symbols, this is the number of times that $\mu$ $(\nu)$ appears as a lower index of the $S$ symbols. We have $N_{\mu}=N_{\nu}=3$ for the initial string $\mathbf{s}=``G_{\mu\mu}G_{\nu\nu}^{(\mu)}S_{\mu\nu}"$ . From Definition 4.7, it is easy to see that the operators $\tau_{0},\tau_{1}$ and $\rho$ do not change the parity of $N_{\mu}$ and $N_{\nu}$ . The expansions (4.24) and (4.25) also do not change the parity of $N_{\mu}$ and $N_{\nu}$ . This leads to the following key observation:

$$N_{\mu}\ \text{and}\ N_{\nu}\ \text{are both odd for every monomial}\ M_{w}.\tag{4.27}$$
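The parity conservation can be seen by recording, for each rewriting step, only the lower indices of the symbols it removes and inserts. The Python sketch below does this for $\tau_0$, $\tau_1$ and $\rho$ (the expansions (4.24) and (4.25) are not modeled). It assumes that $N_\mu$ and $N_\nu$ count lower indices of all $G$ and $S$ symbols, which is the count that gives $N_\mu=N_\nu=3$ for the initial string; the rule table is our own transcription of Definition 4.7.

```python
from collections import Counter

MU, NU = "mu", "nu"


def index_count(pairs):
    """Count how often each index occurs as a lower index in a list of (a, b) pairs."""
    c = Counter()
    for a, b in pairs:
        c[a] += 1
        c[b] += 1
    return c


def replacement_rules(a, b):
    """(removed lower indices, inserted lower indices) for each step, with alpha=a, beta=b."""
    return {
        "tau0 on G_aa or G_aa^{-1}": ([(a, a)], [(a, a)]),
        "tau1 on G_aa": ([(a, a)], [(a, b), (b, a), (b, b)]),
        "tau1 on G_aa^{-1}": ([(a, a)], [(a, b), (b, a), (a, a), (a, a), (b, b)]),
        "rho on G_ab": ([(a, b)], [(a, a), (b, b), (a, b)]),
    }


def parity_preserved():
    """True if no rule changes the parity of the mu-count or the nu-count."""
    for a, b in ((MU, NU), (NU, MU)):
        for removed, inserted in replacement_rules(a, b).values():
            c_old, c_new = index_count(removed), index_count(inserted)
            if any((c_new[x] - c_old[x]) % 2 for x in (MU, NU)):
                return False
    return True


initial = [(MU, MU), (NU, NU), (MU, NU)]  # G_{mu mu}, G^{(mu)}_{nu nu}, S_{mu nu}
print(index_count(initial), parity_preserved())
```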

4.3 A graphical proof

In this subsection, we finish the proof of (4.26). Suppose $M_{w}=C(z)(S_{\mu\mu})^{m_{1}}(S_{\nu\nu})^{m_{2}}(S_{\mu\nu})^{m_{3}}(S_{\nu\mu})^{m_{4}}$ , where $C(z)$ denotes a deterministic function of order $1$ for all $z\in\mathbf{D}$ . Then we write

[TABLE]

To avoid heavy expressions, we introduce the following graphical notations. We use a connected graph $(V,E)$ to represent the string $M_{w}$ , where the vertex set $V$ consists of the indices in (4.28) and the edge set $E$ consists of the $X$ and $G$ variables. The indices $\mu,\nu$ are represented by the black vertices in the graph, while the $i,j$ indices are represented by the white vertices. The $X$ edges are represented by the zig-zag lines and the $G$ edges are represented by the straight lines. One can refer to Fig. 6 for an example of such a graph.

We organize the summation in (4.28) in the following way. We first partition the white vertices into blocks by requiring that any pair of white vertices take the same value if they are in the same block, and take different values otherwise. Then we take the summation over the white blocks, which take values in $\mathcal{I}_{1}$ . Finally, we sum over all possible partitions. Note that the number of different partitions depends only on the total number of $S$ variables in $M_{w}$ , which in turn depends only on $l_{0}$ .

Fix a partition $\Gamma$ of the white vertices. We denote its blocks by $b_{1},...,b_{k}$ , where $k$ gives the number of distinct blocks in $\Gamma$ . We denote by $n^{\mu}_{l}$ ( $n_{l}^{\nu}$ ) the number of white vertices in $b_{l}$ that are connected to the vertex $\mu$ ( $\nu$ ). Let $G(\Gamma)$ be the product of all the $G$ edges in the graph. Then we have

[TABLE]

where $\sum^{*}$ denotes the summation subject to the condition that $b_{1},...,b_{k}$ all take distinct values. Note that $k$ , $b_{l}$ , $n_{l}^{\mu}$ and $n_{l}^{\nu}$ all depend on $\Gamma$ , and we have omitted the $\Gamma$ dependence for notational simplicity.

From (4.28), it is easy to observe that the $X$ edges are independent of $G(\Gamma)$ . Thus taking expectation of (4.29) gives that

[TABLE]

Since the $X$ entries are independent and centered, the expectation of the product of $X$ edges in (4.30) vanishes unless each $n_{l}^{\mu}$ and each $n_{l}^{\nu}$ is either $0$ or at least $2$. As the blocks are nonempty, we may therefore assume that $n_{l}^{\mu}+n_{l}^{\nu}\geq 2$ for $1\leq l\leq k$ . On the other hand, if all $n_{l}^{\mu}$ are even, then $N_{\mu}=\sum_{l=1}^{k}n_{l}^{\mu}$ must be even, which contradicts (4.27). Hence we can find some $1\leq l_{1}\leq k$ such that $n^{\mu}_{l_{1}}$ is odd, and therefore $n^{\mu}_{l_{1}}\geq 3$ . Similarly, we can find some $1\leq l_{2}\leq k$ such that $n^{\nu}_{l_{2}}$ is odd and $n^{\nu}_{l_{2}}\geq 3$ . We abbreviate $\hat{n}_{l}^{\mu}:=n_{l}^{\mu}\wedge 3$ and $\hat{n}_{l}^{\nu}:=n_{l}^{\nu}\wedge 3$ . From the above discussion, we see that

[TABLE]

Now using the moment assumption (1.22), we can bound (4.30) by

[TABLE]

Next we deal with $|\mathbb{E}G(\Gamma)|$ . We consider the following $3$ cases separately: (i) there are at least $2$ off-diagonal $G$ -edges in $G(\Gamma)$ ; (ii) there is only $1$ off-diagonal $G$ -edge in $G(\Gamma)$ ; (iii) there is no off-diagonal $G$ -edge in $G(\Gamma)$ .

In case (i), we trivially have $|\mathbb{E}G(\Gamma)|\prec\Phi^{2}$ . In case (ii), we use the same trick as in (4.17). Let the off-diagonal $G$ -edge be $G_{ij}^{(\mu\nu)}$ . For each diagonal $G^{(\mu\nu)}_{kk}$ , we replace it with $(G^{(\mu\nu)}_{kk}-\Pi_{kk})+\Pi_{kk}=\Pi_{kk}+O_{\prec}(\Phi).$ Plugging these expansions into $\mathbb{E}G(\Gamma)$ , we obtain that $|\mathbb{E}G(\Gamma)|\prec\Phi^{2}+|\mathbb{E}G_{ij}^{(\mu\nu)}|\prec\Phi^{2},$ where we used (4.18) in the second step. Finally, in case (iii), we have $|\mathbb{E}G(\Gamma)|\prec 1$ . Moreover, $n_{l}^{\mu}+n_{l}^{\nu}$ is even for any $1\leq l\leq k$ . Take $1\leq l_{1},l_{2}\leq k$ such that $n^{\mu}_{l_{1}},n^{\nu}_{l_{2}}$ are odd and $n^{\mu}_{l_{1}},n^{\nu}_{l_{2}}\geq 3$ . If $l_{1}\neq l_{2}$ , then we must have $\hat{n}^{\mu}_{l_{1}}+\hat{n}^{\nu}_{l_{1}}\geq 4,\ \hat{n}^{\mu}_{l_{2}}+\hat{n}^{\nu}_{l_{2}}\geq 4$ , and hence

[TABLE]

Otherwise, if $l_{1}=l_{2}$ , then

[TABLE]

Now applying the above estimates and (4.31) to (4.32), we obtain that

[TABLE]

This concludes the proof of (4.26), and hence finishes the proof of (4.13).

Acknowledgements

The authors would like to thank Zongming Ma for valuable suggestions on statistical applications, which have significantly improved this paper. We are also grateful to the editors and referees for carefully reading our manuscript and suggesting several improvements.

Appendix A Supplementary Material

In the supplementary material, we give the proofs of Theorem 3.4, Theorem 3.6 and Lemma 4.4.

For $\mathbf{v},\mathbf{w}\in\mathbb{C}^{\mathcal{I}}$ , $a\in\mathcal{I}$ and an $\mathcal{I}\times\mathcal{I}$ matrix $A$ , we abbreviate

[TABLE]

where $\mathbf{e}_{a}$ denotes the standard unit vector in the coordinate direction $a$ . We shall call them the generalized matrix entries. We sometimes identify vectors $\mathbf{v}\in\mathbb{C}^{\mathcal{I}_{1}}$ and $\mathbf{w}\in\mathbb{C}^{\mathcal{I}_{2}}$ with their natural embeddings $\begin{pmatrix}\mathbf{v}\\ 0\end{pmatrix}$ and $\begin{pmatrix}0\\ \mathbf{w}\end{pmatrix}$ in $\mathbb{C}^{\mathcal{I}}$ . The exact meaning will be clear from the context.

*Lemma A.1.*

Given any $M\times N$ matrix $Y$ , the following estimates and identities hold for $G\equiv G(Y,z)$ :

[TABLE]

for some constant $C>0$ , and for $\mathbf{v}\in\mathbb{C}^{\mathcal{I}_{1}}$ and $\mathbf{w}\in\mathbb{C}^{\mathcal{I}_{2}}$ ,

[TABLE]

These estimates remain true for $G^{(\mathbb{T})}$ instead of $G$ for any $\mathbb{T}\subseteq\mathcal{I}$ .

Proof.

These estimates and identities can be proved through simple calculations using the spectral decomposition of $G$ . The reader can also refer to, for example, [31, Lemma 4.6], [50, Lemma 3.5] and [14, Lemma A.3]. ∎

A.1 Proof of Lemma 4.4

For $z\in\mathbf{D}$ , we have

[TABLE]

where we abbreviate $G_{1}(z):=G(X_{1},z)$ and $V:=\begin{pmatrix}0&B\\ B^{*}&0\end{pmatrix}$ . Then we expand $G$ using the resolvent expansion

[TABLE]

We need to estimate the last three terms on the right-hand side. First, note that by (A.2)-(A.5) and (3.14), we have for $z\in\mathbf{D}$ ,

[TABLE]

for any $\mathbf{v}\in\mathbb{C}^{\mathcal{I}}$ and $\mathbb{T}\subseteq\mathcal{I}$ with $|\mathbb{T}|=O(1)$ .

For any unit vectors $\mathbf{u},\mathbf{v}\in\mathbb{C}^{\mathcal{I}}$ , we have

[TABLE]

where in the second step we used (3.14) for $G_{1}$ , in the third step the Cauchy-Schwarz inequality and (4.6), and in the last step (A.7). With a similar argument, we obtain that

[TABLE]

Combining (A.9) with the rough bound (A.1) for $G$ , we get that

[TABLE]

where we used $\eta\geq N^{-1}$ for $z\in\mathbf{D}$ in the last step. Plugging the estimates (A.8)-(A.10) into (A.6), we conclude that

[TABLE]

for all deterministic unit vectors $\mathbf{u},\mathbf{v}\in\mathbb{C}^{\mathcal{I}}$ .

A.2 Proof of Theorem 3.4

By Lemma 4.4, we can assume that the entries of $X$ are centered without loss of generality. According to the comments below Theorem 3.4, we can repeat the proof in [14] to get the entrywise local law (3.16) and the averaged local law (3.13). Then combining (3.16), the moment assumption (1.22) for $X$ and the arguments in Section A.4 below, we can obtain the anisotropic local law (3.14) for $G(X,z)$ . Hence we focus on proving the bound (3.15). In fact, (3.15) clearly follows from Lemma 4.4 and the next two lemmas combined with the polarization identity.

*Lemma A.2.*

Let $X$ be an $M\times N$ real random matrix whose entries are independent random variables satisfying (4.5), (1.22), and the bounded support condition (3.12) with $q\leq N^{-\phi}$ for some constant $\phi>0$ . If (3.16) and (3.13) hold uniformly in $z\in\mathbf{D}$ , then the following local law also holds uniformly in $z\in\mathbf{D}$ :

[TABLE]

*Lemma A.3.*

Suppose the assumptions in Lemma A.2 hold. Let $\Phi(z)$ be a deterministic function on $\mathbf{D}$ satisfying $c_{1}(N^{-1/2}+q^{2})\leq\Phi(z)\leq N^{-c_{1}}$ for some constant $c_{1}>0$ . If we have

[TABLE]

uniformly in $z\in\mathbf{D}$ , then

[TABLE]

uniformly in $z\in\mathbf{D}$ and deterministic unit vectors $\mathbf{v}\in\mathbb{C}^{\mathcal{I}_{1}}$ or $\mathbf{v}\in\mathbb{C}^{\mathcal{I}_{2}}$ .

In the next two subsections, we give the proofs of Lemma A.2 and Lemma A.3. Note that if (3.16) holds, then using (A.2)-(A.5) and (4.4), it is easy to verify that for $z\in\mathbf{D}$ ,

[TABLE]

for any $a\in{\mathcal{I}}$ and $\mathbb{T}\subseteq\mathcal{I}$ with $|\mathbb{T}|=O(1)$ .

A.3 Proof of Lemma A.2

We only prove

[TABLE]

where $\Pi_{ij}=-(1+m_{2c}\sigma_{i})^{-1}\delta_{ij}$ . The proof for (A.12) with $a,b\in\mathcal{I}_{2}$ is exactly the same. First, we recall the following large deviation bounds proved in [18].

*Lemma A.4 (Lemma 3.8 of [18]).*

Let $(x_{i})$ , $(y_{i})$ be independent families of centered and independent random variables, and $(A_{i})$ , $(B_{ij})$ be families of deterministic complex numbers. Suppose the entries $x_{i}$ and $y_{j}$ have variance $O(N^{-1})$ and satisfy (3.12) with $N^{-1/2}\leq q\leq N^{-\phi}$ for some fixed $\phi>0$ . Then for $K=O(N)$ , we have the following bounds:

[TABLE]

where $B_{d}:=\max_{i}|B_{ii}|$ and $B_{o}:=\max_{i\neq j}|B_{ij}|.$

In fact, these bounds are stated in slightly stronger forms in [18] with a different notion of high-probability events. Here we present (A.17)-(A.19) in the form of stochastic domination, which is more convenient for our use. Moreover, if the fourth moment of $x_{i}$ is bounded for all $i$ as in (1.22), then we have a better bound for the LHS of (A.19).

*Lemma A.5.*

Suppose the assumptions in Lemma A.4 hold and $x_{i}$ , $1\leq i\leq K$ , satisfy (1.22). Then we have

[TABLE]

Proof.

We abbreviate $z_{i}:=\left(|x_{i}|^{2}-\mathbb{E}|x_{i}|^{2}\right)B_{ii}/B_{d}$ . By Markov’s inequality, it suffices to prove that for any fixed $p\in\mathbb{N}$ ,

[TABLE]

Note that by the assumption, we have

[TABLE]

Now we expand the LHS of (A.21) as

[TABLE]

where we denote $y_{i_{l}}:=z_{i_{l}}$ for $1\leq l\leq p$ and $y_{i_{l}}:=\bar{z}_{i_{l}}$ for $p+1\leq l\leq 2p$ . To organize the summation over the indices $i_{1},\ldots,i_{2p}$ , we look at the partitions $\Gamma$ of the set of the labels $\{1,...,2p\}$ according to the equivalence relation that $k,l$ are in the same class if and only if $i_{k}=i_{l}$ . We use $b_{l}$ , $1\leq l\leq k$ , to denote the equivalence classes of $\Gamma$ and $n_{l}$ to denote the size of $b_{l}$ . Obviously, $k$ , $b_{l}$ and $n_{l}$ all depend on $\Gamma$ , but we will omit this dependence in the following expressions. Moreover, since the random variables are centered, we must have $n_{l}\geq 2$ for all $l$ to attain a nonzero expectation. Hence we have

[TABLE]

where $\sum^{*}$ denotes the summation subject to the conditions that $b_{1},\ldots,b_{k}$ are all distinct, $n_{l}\geq 2$ for all $l$ , and $\sum_{l=1}^{k}n_{l}=2p$ . Note that under these conditions, we trivially have $k\leq p$ .

Using (A.22), we obtain that

[TABLE]

Since the number of partitions of $\{1,...,2p\}$ is finite and depends only on $p$ , (A.23) can then be bounded by

[TABLE]

where in the last step, $q^{4p}$ and $N^{-p}$ can be obtained from the extreme cases $k=0$ and $k=p$ , respectively. This concludes (A.21). ∎
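The combinatorics of this proof can be reproduced with exact arithmetic. The Python sketch below enumerates the partitions $\Gamma$ of $\{1,\ldots,2p\}$ whose blocks all have size at least $2$, confirms $k\le p$, and checks that each partition contributes at most $q^{4p}+N^{-p}$. It assumes the moment bound $\mathbb{E}|z_i|^{n}\le N^{-2}q^{2n-4}$ for $n\ge 2$ with constants dropped (which follows from $|z_i|\lesssim q^{2}$ and $\mathbb{E}|z_i|^{2}\lesssim N^{-2}$; it stands in for (A.22), which is not reproduced here). Under this assumption a partition with $k$ blocks contributes $N^{k}\prod_{l}N^{-2}q^{2n_l-4}=N^{-k}q^{4p-4k}$, which is monotone in $k$ and hence maximized at an endpoint of the range of $k$. The function names are illustrative.

```python
from fractions import Fraction


def set_partitions(elems):
    """Yield every set partition of the list `elems` as a list of blocks."""
    if not elems:
        yield []
        return
    first, rest = elems[0], elems[1:]
    for part in set_partitions(rest):
        yield [[first]] + part                      # `first` opens a new block
        for i in range(len(part)):                  # or joins an existing block
            yield part[:i] + [[first] + part[i]] + part[i + 1:]


def admissible_partitions(p):
    """Partitions Gamma of {1, ..., 2p} with all block sizes n_l >= 2."""
    return [P for P in set_partitions(list(range(1, 2 * p + 1)))
            if all(len(b) >= 2 for b in P)]


def partition_bound(P, N, q):
    """N^k * prod_l N^{-2} q^{2 n_l - 4} under the assumed bound E|z_i|^n <= N^{-2} q^{2n-4}."""
    val = Fraction(N) ** len(P)                     # at most N^k choices of distinct block values
    for block in P:
        val *= Fraction(1, N ** 2) * q ** (2 * len(block) - 4)
    return val


p, N, q = 3, 100, Fraction(1, 10)
parts = admissible_partitions(p)
print(len(parts), max(len(P) for P in parts))
print(max(partition_bound(P, N, q) for P in parts) <= q ** (4 * p) + Fraction(1, N ** p))
```

For $p=1,2,3$ the number of admissible partitions is $1$, $4$ and $41$ (set partitions without singletons), which illustrates that it depends only on $p$.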

Now using (4.3) and (A.17), we get that for $i\neq j\in\mathcal{I}_{1}$ ,

[TABLE]

where we used (3.16), (4.4) and the bound (A.15). For the diagonal estimate, we need to control the $Z$ variables

[TABLE]

Using (A.18) and (A.20), we get that

[TABLE]

where we used (3.16), (4.4) and (A.15) again. Then with (4.2), we get that

[TABLE]

where in the second step we used (A.26), (3.13) and (4.4) (with $\tilde{\Phi}=q+\Psi$ ). Together with (A.24), we conclude (A.16).

A.4 Proof of Lemma A.3

We only prove (A.14) for $\mathbf{v}\in\mathbb{C}^{\mathcal{I}_{1}}$ . The proof for the case with $\mathbf{v}\in\mathbb{C}^{\mathcal{I}_{2}}$ is exactly the same. Note that by (A.13), we immediately get $\sum_{i}|v_{i}|^{2}\left(G_{ii}-\Pi_{ii}\right)\prec\Phi$ . Hence it remains to prove that

[TABLE]

By Markov’s inequality, it suffices to show that

[TABLE]

for any fixed $p\in\mathbb{N}$ . The proof of (A.27) is similar to the ones in [9, Section 5] and [50, Section 5]. The main difference is that in [9, 50], the matrix entries are assumed to have arbitrarily high moments, while here we assume that the $X$ entries have finite third moment and support bounded by $q$ . In particular, for any fixed $n\geq 3$ , we have

[TABLE]

(Note that we have a stronger moment assumption in (1.22). However, the finite fourth moment condition will not be used in the proof below. We only need the weaker bound (A.28).) We remark that some of the basic ideas have been illustrated in Section 4 of the main article.

We first rewrite the product in (A.27) as

[TABLE]

where $\Gamma$ ranges over all partitions of the set of labels $\{i_{1},...,i_{2p},j_{1},...,j_{2p}\}$ with the restriction that $i_{k},j_{k}$ cannot be in the same equivalence class for all $k$ , $\{b_{1},...,b_{n}\}$ is the set of equivalence classes for a fixed $\Gamma$ , $\Gamma(\cdot)$ is regarded as a mapping from the set of labels to the set of equivalence classes, and $\sum^{*}$ denotes the summation subject to the condition that $b_{1},\ldots,b_{n}$ all take distinct values and $\Gamma(i_{k})\neq\Gamma(j_{k})$ for all $k$ . Since the number of such partitions $\Gamma$ is finite and depends only on $p$ , it suffices to prove that for any fixed $\Gamma$ ,

[TABLE]

We abbreviate

[TABLE]

For simplicity, we omit the overline for complex conjugation in the following proof; this avoids immaterial notational complications that do not affect the argument.

For $k=1,...,n$ , we denote by $\deg(b_{k},P)$ the number of times that $b_{k}$ appears as an index of the $G$ entries in $P$ , i.e. $\deg(b_{k},P):=|\Gamma^{-1}(b_{k})|$ . We define $h:=\#\{1\leq k\leq n:\deg(b_{k},P)=1\}$ , i.e. $h$ is the number of $b_{k}$ ’s that only appear once in the indices of $P$ . Without loss of generality, we assume these $b_{k}$ ’s are $b_{1},...,b_{h}$ . Then we have the following properties:

[TABLE]

Now we claim that

[TABLE]

Note that by $\|\mathbf{v}\|_{2}=1$ and the Cauchy-Schwarz inequality, we have $\sum_{i}|v_{i}|\leq\sqrt{M}$ ; moreover, since $|v_{i}|\leq 1$, we have $\sum_{i}|v_{i}|^{n}\leq\sum_{i}|v_{i}|^{2}=1$ for $n\geq 2$ . Then if (A.31) holds, we can bound the left-hand side of (A.29) by

[TABLE]

Hence it suffices to prove (A.31).

We define the $S$ variables as

[TABLE]

for $i,j\in\mathcal{I}_{1}$ and $L:=\{b_{1},...,b_{n}\}$ . As in (A.24) and (A.26), we can verify that $|S_{ij}-\sigma_{i}m_{2c}\delta_{ij}|\prec\Phi$ for $i,j\in\mathcal{I}_{1}$ using (A.13), (4.4) and Lemmas A.4-A.5. Then as in Section 4.2 of the main article, we keep expanding the $G$ entries in $P$ using the resolvent expansions in Lemma 4.2, until each monomial in the expression either consists of $S$ variables only or has sufficiently many off-diagonal terms. The following lemma has been proved in [9, Lemma 5.9] and [50, Lemma 5.9].

*Lemma A.6.*

After finitely many expansions, we can write $P$ as

[TABLE]

where $A\in\mathbb{N}$ depends only on $p$ and $c_{1}$ (recall that $\Phi(z)\leq N^{-c_{1}}$ by our assumption), $c_{\alpha}$ ’s are constants of order $O(1)$ , and $Q_{\alpha}$ are monomials of $S$ variables only, where the number of $S$ variables in each $Q_{\alpha}$ depends only on $p$ and $c_{1}$ . Moreover, we have that

[TABLE]

for $k=1,...,n$ and $\alpha=1,...,A$ , and the number of off-diagonal $S$ variables in $Q_{\alpha}$ is at least $2p$ . Here $\deg_{o}(b_{k},Q_{\alpha})$ denotes the number of times that $b_{k}$ appears as an index of the off-diagonal $S$ variables in $Q_{\alpha}$ , and $\deg_{o}(b_{k},P):=\deg(b_{k},P)$ (which is consistent with the previous definition since $P$ only contains off-diagonal entries).

Now given the expansion in (A.33), we see that to conclude (A.31), it suffices to show that for any $Q_{\alpha}$ ,

[TABLE]

In the following proof, we fix one such $Q\equiv Q_{\alpha}$ and write

[TABLE]

where $J$ is the number of $S$ -variables in $Q$ , $W$ ranges over all partitions of the set of indices $\{\mu_{1},...,\mu_{J},\nu_{1},...,\nu_{J}\}$ , $\{w_{1},...,w_{m}\}$ denotes the set of equivalence classes for a particular $W$ , $W(\cdot)$ is regarded as a symbolic mapping from the set of indices to the set of equivalence classes, and $\sum^{*}$ denotes the summation subject to the condition that $w_{1},\ldots,w_{m}$ all take distinct values. Note that the number of partitions $W$ depends only on $J$ . For a fixed partition $W$ , we denote

[TABLE]

Then to prove (A.35), it suffices to show that

[TABLE]

for any partition $W$ .

To facilitate the proof, we introduce the graphical notation as in Section 4.3 of the main article. We use a connected graph $(V,E)$ to represent $R$ , where the vertex set $V$ consists of black vertices $b_{1},\ldots,b_{n}$ and white vertices $w_{1},\ldots,w_{m}$ , and the edge set $E$ consists of $(k,\alpha)$ edges representing $X_{b_{k}w_{\alpha}}$ and $(\alpha,\beta)$ edges representing $G^{(L)}_{w_{\alpha}w_{\beta}}$ . We denote

[TABLE]

Note that to attain a nonzero expectation, we must have

[TABLE]

We also define

[TABLE]

Then we have

[TABLE]

By (A.30), (A.37) and the parity conservation due to (A.34), there exist edges $(1,\alpha_{1}),...,(h,\alpha_{h})$ such that $e_{k\alpha_{k}}$ is odd and $e_{k\alpha_{k}}\geq 3$ , $1\leq k\leq h$ . Let $H:=\{(1,\alpha_{1}),...,(h,\alpha_{h})\}$ be the set of these edges. Denote by $F$ the set of $(k,\alpha)$ edges such that $e_{k\alpha}\geq 2$ and $(k,\alpha)\notin H$ . Denote

[TABLE]

for all $k=1,...,n$ and $\alpha=1,...,m$ . By the above definitions, we have $s_{\alpha}\geq 2$ and $h_{\alpha}+f_{\alpha}>0$ (since the classes $w_{\alpha}$ are nonempty), $s_{\alpha}\geq 2d_{\alpha}$ , and

[TABLE]

Note that there are $\frac{1}{2}\sum_{k,\alpha}e_{k\alpha}-\sum_{\alpha}d_{\alpha}$ off-diagonal $G$ edges in $R$ . Hence by (A.13) and (A.28), we have

[TABLE]

Now we consider the following four cases for $R_{\alpha}$ .

(i)

$d_{\alpha}=0$ . In this case we have

[TABLE]

where in the third step we used $h_{\alpha}+f_{\alpha}>0$ , and in the fourth step we used

[TABLE]

where we used that $e_{k\alpha}^{(o)}\geq h_{k\alpha}$ for $1\leq k\leq h$ (recall that if $(k,\alpha)\in H$ , then $e_{k\alpha}$ is odd, and hence at least one of these edges must come from an off-diagonal $S$ variable).

(ii)

$d_{\alpha}\neq 0$ , $h_{\alpha}=1$ and $f_{\alpha}=0$ . Then there is only one $k$ such that $e_{k\alpha}>0$ and $s_{\alpha}=e_{k\alpha}$ is odd. Hence we have $s_{\alpha}/2\geq d_{\alpha}+1/2$ and we can bound $R_{\alpha}$ as

[TABLE]

where in the last step we used

[TABLE]

since all the summands except one $h_{k\alpha}$ are $0$.

(iii)

$d_{\alpha}\neq 0$ , $h_{\alpha}=0$ and $f_{\alpha}=1$ . Then there is only one $k$ such that $e_{k\alpha}>0$ and $s_{\alpha}=e_{k\alpha}$ . Thus the $(\alpha,\alpha)$ edges are expanded from the diagonal $S$ variables (otherwise $\alpha$ must connect to at least two different $k$ ’s), which implies $\frac{1}{2}s_{\alpha}-d_{\alpha}=\frac{1}{2}e_{k\alpha}^{(o)}$ . Then we can bound $R_{\alpha}$ by

[TABLE]

where, as in Case (i), we used $e_{k\alpha}^{(o)}\geq h_{k\alpha}$ for $1\leq k\leq h$ .

(iv)

$d_{\alpha}\neq 0$ and $h_{\alpha}+f_{\alpha}\geq 2$ . Then using $s_{\alpha}\geq 2d_{\alpha}$ , $q\prec\Phi^{1/2}$ and $N^{-1/2}\prec\Phi$ , we get that

[TABLE]

where in the last step we used the definitions of $s_{\alpha}$ and $h_{\alpha}$ , $e_{k\alpha}\geq 2h_{k\alpha}$ for $1\leq k\leq h$ (since $e_{k\alpha}\geq 3$ whenever $h_{k\alpha}=1$ ), and $h_{k\alpha}=0$ for $k\geq h+1$ .

Combining the above four cases, we obtain that

[TABLE]

Recall that $\sum_{\alpha}h_{\alpha}=h$ . Then to prove (A.36), it remains to show that

[TABLE]

For $k=1,...,h$ , using (A.39) and (A.30) we get that

[TABLE]

For $k=h+1,...,n$ , using (A.38) and (A.34) we get that

[TABLE]

With (A.30), we then conclude (A.40), which finishes our proof.

A.5 Proof of Theorem 3.6

In this subsection, we prove Theorem 3.6. By Lemma 4.4, we can assume that the entries of $X$ are centered without loss of generality.

Our main strategy for the proof is a resolvent comparison method that was developed in [34, Section 6]. Given $X$ satisfying the assumptions in Theorem 3.6, we first construct a random matrix $\tilde{X}$ whose entries have the same first four moments as those of $X$ but have size of order $N^{-1/2}$ .

*Lemma A.7 (Lemma 5.1 of [34]).*

Suppose $X$ satisfies the assumptions in Theorem 3.6. Then there exists another matrix $\tilde{X}=(\tilde{X}_{i\mu})$ such that $\mathbb{P}(\max_{i,\mu}|\tilde{X}_{i\mu}|\leq CN^{-1/2})=1$ for some constant $C>0$ and the first four moments of the entries of $X$ and $\tilde{X}$ match, i.e.

[TABLE]

Taking $q=N^{-1/2}$ in (3.17), we see that (3.18) holds for $G(\tilde{X},z)$ . Due to (A.41), we expect $G(X,z)$ to have properties similar to those of $G(\tilde{X},z)$ , so that (3.18) also holds for $G(X,z)$ . We prove this through the resolvent comparison approach developed in [34, Section 6] and [14, Section 6]. More specifically, we apply the Lindeberg replacement strategy: we change $\tilde{X}$ into $X$ entry by entry and show that the error (due to the resolvent expansion) incurred at each step is negligible. In this subsection, we introduce some notation that will simplify the presentation of the proof.

Fix a bijective ordering map $\Phi$ on the index set of $X$ ,

[TABLE]

For any $0\leq\gamma\leq\gamma_{\max}$ , we define the matrix $X^{\gamma}=(X^{\gamma}_{i\mu})$ such that $X_{i\mu}^{\gamma}=X_{i\mu}$ if $\Phi(i,\mu)\leq\gamma$ , and $X_{i\mu}^{\gamma}=\tilde{X}_{i\mu}$ otherwise. Note that $X^{0}=\tilde{X}$ , $X^{\gamma_{\max}}=X$ , and $X^{\gamma}$ has bounded support $q\leq N^{-\phi}$ for all $0\leq\gamma\leq\gamma_{\max}$ . Correspondingly, we define

[TABLE]

where $Y^{\gamma}:=\Sigma^{1/2}X^{\gamma}$ . Note that $H^{\gamma}$ and $H^{\gamma-1}$ differ only at $(i,\mu)$ and $(\mu,i)$ entries, where $\Phi(i,\mu)=\gamma$ . Then we define two $\mathcal{I}\times\mathcal{I}$ matrices $V$ and $W$ by

[TABLE]

such that $H^{\gamma}$ and $H^{\gamma-1}$ can be written as

[TABLE]

for some $\mathcal{I}\times\mathcal{I}$ matrix $Q$ satisfying $Q_{i\mu}=Q_{\mu i}=0$ .
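The interpolating sequence $X^{0}=\tilde{X},X^{1},\ldots,X^{\gamma_{\max}}=X$ is easy to set up concretely. The Python sketch below uses a row-major choice of the ordering map, a hypothetical linearization $\begin{pmatrix}0&X\\ X^{T}&0\end{pmatrix}$ in which only the off-diagonal blocks depend on $X$ (the paper's $H^{\gamma}$ is not reproduced here), and a placeholder $\tilde{X}$ with entries $\pm N^{-1/2}$, which matches only the first two moments of $X$, unlike the matrix of Lemma A.7. It checks the two endpoints and that consecutive matrices differ only at the $(i,\mu)$ and $(\mu,i)$ entries with $\Phi(i,\mu)=\gamma$.

```python
import random


def ordering_map(M, N):
    """A bijection Phi from the index pairs (i, mu) onto {1, ..., M*N} (row-major, for illustration)."""
    return {(i, mu): i * N + mu + 1 for i in range(M) for mu in range(N)}


def interpolate(X, Xt, phi, gamma):
    """X^gamma: entry (i, mu) is taken from X if Phi(i, mu) <= gamma and from Xt otherwise."""
    M, N = len(X), len(X[0])
    return [[X[i][mu] if phi[(i, mu)] <= gamma else Xt[i][mu] for mu in range(N)]
            for i in range(M)]


def linearize(X):
    """Hypothetical (M+N)x(M+N) block matrix [[0, X], [X^T, 0]]."""
    M, N = len(X), len(X[0])
    H = [[0.0] * (M + N) for _ in range(M + N)]
    for i in range(M):
        for mu in range(N):
            H[i][M + mu] = H[M + mu][i] = X[i][mu]
    return H


def swap_is_local(gamma):
    """Check that H^gamma - H^{gamma-1} is supported on the (i, mu), (mu, i) entries with Phi(i, mu) = gamma."""
    [(i, mu)] = [key for key, g in phi.items() if g == gamma]
    H1 = linearize(interpolate(X, Xt, phi, gamma))
    H0 = linearize(interpolate(X, Xt, phi, gamma - 1))
    diff = {(r, c) for r in range(M + N) for c in range(M + N) if H1[r][c] != H0[r][c]}
    return diff <= {(i, M + mu), (M + mu, i)}


random.seed(2)
M, N = 3, 4
X = [[random.gauss(0.0, N ** -0.5) for _ in range(N)] for _ in range(M)]
Xt = [[random.choice((-1, 1)) * N ** -0.5 for _ in range(N)] for _ in range(M)]  # placeholder for X-tilde
phi = ordering_map(M, N)
gamma_max = M * N
print(all(swap_is_local(g) for g in range(1, gamma_max + 1)))
```

In the actual comparison argument, each such single-entry swap is controlled by the expansions (A.48) and (A.49); the sketch only illustrates the bookkeeping of the replacement order.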

For simplicity, for any $1\leq\gamma\leq\gamma_{\max}$ , we denote the resolvents by

[TABLE]

We often omit the superscript if $\gamma$ is fixed. By (A.43), we can write

[TABLE]

Thus we can expand $S$ using the resolvent expansion

[TABLE]

On the other hand, we can also expand $R$ in terms of $S$ :

[TABLE]

We can get similar expansions for $T$ and $R$ by replacing $V$ , $S$ with $W$ , $T$ in (A.48) and (A.49).

By the bounded support conditions for $X$ and $\tilde{X}$ , we have

[TABLE]

Note that $S$ , $R$ and $T$ satisfy the following deterministic bounds by (A.1):

[TABLE]

Then using expansion (A.49) in terms of $T,W$ with $m=3$ , the isotropic local law (3.14) for $T$ , and the bound (A.51) for $R$ , we can get that for any fixed unit vectors $\mathbf{u},\mathbf{v}\in\mathbb{C}^{\mathcal{I}}$ , $|R_{\mathbf{u}\mathbf{v}}|=O(1)$ with high probability. Thus there exists a uniform constant $C_{1}>0$ such that with high probability,

[TABLE]

To simplify the expressions involving $V$ and $W$, we introduce the following notation.

*Definition A.8 (Matrix operators $\ast_{\gamma}$).*

For $\mathcal{I}\times\mathcal{I}$ matrices $A$ and $B$ , we define $A*_{\gamma}B$ as

[TABLE]

We denote the $m$ -th power of $A$ under $*_{\gamma}$ -product by $A^{*_{\gamma}m}$ , i.e.,

[TABLE]

*Definition A.9 ($\mathcal{P}_{\gamma,\mathbf{k}}$ and $\mathcal{P}_{\gamma,k}$).*

For $k\in\mathbb{N}$ , $\mathbf{k}=(k_{1},\cdots,k_{s})\in\mathbb{N}^{s}$ and $1\leq\gamma\leq\gamma_{\max}$ , we define

[TABLE]

where we abbreviate $G_{\mathbf{u}\mathbf{v}}^{*_{\gamma}(k+1)}\equiv(G^{*_{\gamma}(k+1)})_{\mathbf{u}\mathbf{v}}$ . If $\mathfrak{G}_{1}$ and $\mathfrak{G}_{2}$ are products of resolvent entries as above, then we define

[TABLE]

Note that $\mathcal{P}_{\gamma,k}$ and $\mathcal{P}_{\gamma,\mathbf{k}}$ are not linear operators, but merely notational shorthand.

Using Definition A.9, we may write, for example,

[TABLE]

For $k,s\in\mathbb{N}$ and $\mathbf{k}\in\mathbb{N}^{s+1}$ , it is easy to verify that

[TABLE]

where $|\mathbf{k}|=\sum_{t=1}^{s+1}k_{t}$ . For the second equality, note that $\mathcal{P}_{{\gamma},s}G_{\mathbf{u}\mathbf{v}}$ is a sum of products of $G$ entries, where each product contains $s+1$ entries.

Proof of Theorem 3.6.

Now we prove (3.18) with the resolvent comparison method. The basic idea is that we expand $S$ and $T$ in terms of $R$ by repeatedly applying the expansions (A.48) and (A.49), and then compare the resulting expressions. The main terms will cancel since $X_{i\mu}$ and $\tilde{X}_{i\mu}$ have the same first four moments, and the error terms are small since $X_{i\mu}$ and $\tilde{X}_{i\mu}$ have support bounded by $N^{-\phi}$ .

The proof of Lemma A.10 is almost the same as the one for [34, Lemma 6.5]. In fact, we can copy their arguments almost verbatim, except for some notational differences. Hence we omit the details. In the following expressions, for any $\mathbf{k}=(k_{1},\ldots,k_{p})\in\mathbb{N}^{p}$ , we use $|\mathbf{k}|=\sum k_{i}$ to denote its $l^{1}$ -norm.

**Lemma A.10.**

Suppose $z\in\mathbf{D}$ and $\gamma=\Phi(i,\mu)$ . Fix any $p\in\mathbb{N}$ and $r>0$ . Then for $S,R$ in (A.44), we have

[TABLE]

where the $A_{k}$, $0\leq k\leq 4$, depend only on $R$, the $\mathcal{A}_{\mathbf{k}}$ are independent of $(\mathbf{u}_{t},\mathbf{v}_{t})$, $1\leq t\leq s$, and we have the bound

[TABLE]

Clearly, a result analogous to Lemma A.10 also holds for products of $T$ entries. As in (A.58), we define the notation $\mathcal{A}^{\gamma,a}$, $a=0,1$, as follows:

[TABLE]

Since the $A_{k}$, $0\leq k\leq 4$, depend only on $R$, and since $X_{i\mu}$ and $\tilde{X}_{i\mu}$ have the same first four moments, we get from (A.60) and (A.61) that

[TABLE]

where we abbreviate $G:=G(X,z)$ and $\tilde{G}:=G(\tilde{X},z)$ .
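Schematically, presuming as in (A.44) that $R$ does not involve the $(i,\mu)$ entry, taking expectations turns $A_kX_{i\mu}^{k}$ into $\mathbb{E}[A_k]\,\mathbb{E}[X_{i\mu}^{k}]$, so only the moments of $X_{i\mu}$ and $\tilde{X}_{i\mu}$ enter. The Python sketch below (an illustration, not part of the proof) compares, in exact arithmetic, a standard Gaussian with a three-point law on $\{-\sqrt{3},0,\sqrt{3}\}$; both are centered, have variance $1$ and fourth moment $3$. Polynomials of degree at most $4$ therefore have equal expectations, and, since both laws are symmetric, the first discrepancy shows up in the sixth moment. The $N^{-1/2}$ scaling and the bounded support of the actual entries are not modeled.

```python
from fractions import Fraction
from math import prod


def gaussian_moment(k):
    """E g^k for a standard Gaussian g: 0 for odd k, (k-1)!! for even k."""
    if k % 2:
        return Fraction(0)
    return Fraction(prod(range(k - 1, 0, -2)))


def three_point_moment(k):
    """E y^k for y = +-sqrt(3) with probability 1/6 each and y = 0 with probability 2/3."""
    if k == 0:
        return Fraction(1)
    if k % 2:
        return Fraction(0)
    return Fraction(1, 3) * 3 ** (k // 2)  # 2 * (1/6) * (sqrt 3)^k


def expectation(coeffs, moment):
    """E p(x) for p(x) = sum_k coeffs[k] x^k, computed from the moments of x."""
    return sum(Fraction(c) * moment(k) for k, c in enumerate(coeffs))


# The first five moments agree (the odd ones vanish by symmetry) ...
assert all(gaussian_moment(k) == three_point_moment(k) for k in range(6))
# ... so every polynomial of degree <= 4 has the same expectation under both laws.
p4 = [2, -1, 5, 7, -3]
assert expectation(p4, gaussian_moment) == expectation(p4, three_point_moment)
# The sixth moment is the first one that differs: 15 versus 9.
x6 = [0, 0, 0, 0, 0, 0, 1]
print(expectation(x6, gaussian_moment) - expectation(x6, three_point_moment))  # 6
```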

Applying (A.62) with $p=1$, $r=3$ and a fixed unit vector $\mathbf{u}_{t}=\mathbf{v}_{t}=\mathbf{v}\in\mathbb{C}^{\mathcal{I}_{1}}\ \text{or}\ \mathbb{C}^{\mathcal{I}_{2}}$, we obtain that

[TABLE]

Using (A.52) and (A.59), we can bound the sum in (A.63) by

[TABLE]

(Here we apply Lemma 2.2 (iii) of the main article, which requires a second moment bound for $|\mathcal{P}_{\gamma,{k}}G^{\gamma-a}_{\mathbf{v}\mathbf{v}}|$; this follows easily from (A.51).) Recall that $\mathcal{P}_{\gamma,k}G^{\gamma-a}_{\mathbf{v}\mathbf{v}}$ is also a sum of products of $G$ entries. Applying (A.62) to $|\mathbb{E}\mathcal{P}_{\gamma,k}G^{\gamma-a}_{\mathbf{v}\mathbf{v}}|$ with $\gamma_{\max}$ replaced by $\gamma-a$, we obtain that

[TABLE]

Together with (A.63) and (A.59), we get that

[TABLE]

Again using (A.52) and (A.59), we obtain that

[TABLE]

where we used that $k+|\mathbf{k}^{\prime}|\geq 10$. Iterating this procedure reduces the remainder term further at each step; eventually we obtain that

[TABLE]

where

[TABLE]

Using (A.59), we obtain that

[TABLE]

We now complete the proof of (3.18) using the estimate (A.68) and the bound (3.18) for $G^{0}=\tilde{G}$. It suffices to control the terms

[TABLE]

for $\mathbf{k}_{1},\ldots,\mathbf{k}_{n}$ satisfying (A.67). By definition of ${\mathcal{P}}$ , (A.69) is a sum of at most $C^{\sum|\mathbf{k}_{i}|}$ products of $G_{\mathbf{v}b}$ , $G_{b\mathbf{v}}$ and $G_{ab}$ entries, where the total number of $G$ entries in each product is at most $\sum{|\mathbf{k}_{i}|}+1=O(\phi^{-2})$ . Due to the deterministic bound (A.51), (A.69) is always bounded by $N^{O(\phi^{-2})}$ , and hence Lemma 2.2 (iii) of the main article can be applied.

In each product in (A.69), the index $\mathbf{v}$ appears exactly twice among the indices of $G$, namely as $G_{\mathbf{v}a}G_{b\mathbf{v}}$, where $a,b$ arise via ${\mathcal{P}}$ from some $\gamma_{k}$ and $\gamma_{l}$ ($1\leq k,l\leq n$). Hence, after taking the averages $N^{-2}\sum_{\gamma_{k}}$ and $N^{-2}\sum_{\gamma_{l}}$, the factor $G_{\mathbf{v}a}G_{b\mathbf{v}}$ contributes $O_{\prec}((N\eta)^{-1})$ by (A.15) and the Cauchy-Schwarz inequality. All remaining $G$ factors in the product, which carry no $\mathbf{v}$ index, are bounded by $O_{\prec}(1)$ using (A.52). Thus for any fixed $\gamma_{1},\ldots,\gamma_{n}$, $\mathbf{k}_{1},\ldots,\mathbf{k}_{n}$, we have proved that

[TABLE]
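Assuming that (A.15), which is not reproduced here, is a Ward-type identity, the averaging step can be checked numerically. For a Hermitian matrix $H$ with resolvent $G=(H-z)^{-1}$, $z=E+i\eta$, the standard form is $\sum_{a}|G_{\mathbf{v}a}|^{2}=\sum_{a}|G_{a\mathbf{v}}|^{2}=\operatorname{Im}G_{\mathbf{v}\mathbf{v}}/\eta$, and Cauchy-Schwarz then bounds $N^{-1}\sum_a|G_{\mathbf{v}a}|$ by $(\operatorname{Im}G_{\mathbf{v}\mathbf{v}}/(N\eta))^{1/2}$. The Python sketch below checks both statements on a toy Wigner-type matrix rather than on the resolvent used in the paper; the matrix model, $\eta$ and the sampled unit vector are our own choices.

```python
import numpy as np

rng = np.random.default_rng(2)
n, eta = 200, 0.05
A = rng.standard_normal((n, n))
H = (A + A.T) / np.sqrt(2 * n)  # toy Wigner-type matrix, not the paper's model
G = np.linalg.inv(H - (0.5 + 1j * eta) * np.eye(n))

v = rng.standard_normal(n)
v /= np.linalg.norm(v)  # unit vector (sampled once, then held fixed)

row = v @ G  # entries G_{v a} = <v, G e_a>  (v is real)
col = G @ v  # entries G_{a v} = <e_a, G v>
im_Gvv = (v @ G @ v).imag

# Ward identity: sum_a |G_{va}|^2 = sum_a |G_{av}|^2 = Im G_vv / eta.
ward_row = np.sum(np.abs(row) ** 2)
ward_col = np.sum(np.abs(col) ** 2)
assert np.isclose(ward_row, im_Gvv / eta) and np.isclose(ward_col, im_Gvv / eta)

# Cauchy-Schwarz: each average is at most sqrt(Im G_vv / (N eta)), so the product of the
# two averages is at most Im G_vv / (N eta), i.e. O((N eta)^{-1}) when Im G_vv = O(1).
avg_product = np.mean(np.abs(row)) * np.mean(np.abs(col))
assert avg_product <= im_Gvv / (n * eta) * (1 + 1e-9)
print(avg_product, im_Gvv / (n * eta))
```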

Then using (A.68) and (3.18) for $\tilde{G}$ , we obtain that

[TABLE]

This then concludes the proof of Theorem 3.6 by polarization. ∎
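The final polarization step rests on an elementary identity. For any square matrix $D$ and vectors $\mathbf{x},\mathbf{y}$, $\mathbf{x}^{*}D\mathbf{y}=\frac{1}{4}\sum_{k=0}^{3}i^{-k}(\mathbf{x}+i^{k}\mathbf{y})^{*}D(\mathbf{x}+i^{k}\mathbf{y})$, so bounds on quadratic forms $\langle\mathbf{v},D\mathbf{v}\rangle$ transfer to general $\langle\mathbf{u},D\mathbf{v}\rangle$; presumably $D$ here is $G$ minus the deterministic matrix appearing in (3.18). The Python sketch verifies the identity numerically for a random matrix; the helper names are hypothetical.

```python
import numpy as np


def quadratic_form(G, w):
    """<w, G w> = w^* G w (np.vdot conjugates its first argument)."""
    return np.vdot(w, G @ w)


def polarize(G, x, y):
    """Recover <x, G y> from the quadratic forms at x + i^k y, k = 0, 1, 2, 3."""
    return sum((1j) ** (-k) * quadratic_form(G, x + (1j) ** k * y) for k in range(4)) / 4


rng = np.random.default_rng(3)
n = 5
G = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))  # any square matrix
x = rng.standard_normal(n) + 1j * rng.standard_normal(n)
y = rng.standard_normal(n) + 1j * rng.standard_normal(n)
assert np.isclose(polarize(G, x, y), np.vdot(x, G @ y))
```

If $|\mathbf{w}^{*}D\mathbf{w}|\leq\varepsilon\|\mathbf{w}\|^{2}$ for all $\mathbf{w}$, then for unit $\mathbf{x},\mathbf{y}$ the identity gives $|\mathbf{x}^{*}D\mathbf{y}|\leq\frac{\varepsilon}{4}\sum_{k}\|\mathbf{x}+i^{k}\mathbf{y}\|^{2}=2\varepsilon$, so polarization costs only a constant factor.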

Bibliography

[1] T. W. Anderson. Asymptotic theory for principal component analysis. The Annals of Mathematical Statistics, 34(1):122–148, 1963.
[2] Z. D. Bai. Convergence rate of expected spectral distributions of large random matrices. Part II. Sample covariance matrices. Ann. Probab., 21(2):649–672, 1993.
[3] Z. D. Bai, B. Q. Miao, and G. M. Pan. On asymptotics of eigenvectors of large sample covariance matrix. Ann. Probab., 35(4):1532–1572, 2007.
[4] Z. D. Bai and J. W. Silverstein. No eigenvalues outside the support of the limiting spectral distribution of large-dimensional sample covariance matrices. Ann. Probab., 26:316–345, 1998.
[5] Z. D. Bai and J. W. Silverstein. Spectral Analysis of Large Dimensional Random Matrices, volume 2 of Mathematics Monograph Series. Science Press, Beijing, 2006.
[6] Z. Bao, G. Pan, and W. Zhou. Universality for the largest eigenvalue of sample covariance matrices with general population. Ann. Statist., 43:382–421, 2015.
[7] Z. G. Bao, G. M. Pan, and W. Zhou. Local density of the spectrum on the edge for sample covariance matrices with general population. Preprint, 2013.
[8] P. Bianchi, M. Debbah, M. Maida, and J. Najim. Performance of statistical tests for single-source detection using random matrix theory. IEEE Trans. Inform. Theory, 57:2400–2419, 2011.


