Joint CLT for top eigenvalues of sample covariance matrices of separable high dimensional long memory processes
Peng Tian

TL;DR
This paper derives the joint central limit theorem for the top eigenvalues of sample covariance matrices from high-dimensional long memory processes with separable dependence, extending previous single-eigenvalue results.
Contribution
It extends prior work by establishing the joint CLT for multiple top eigenvalues in high-dimensional long memory settings with Toeplitz dependence structures.
Findings
Joint CLT for top eigenvalues of sample covariance matrices.
Spectral gap properties for largest eigenvalues.
Delocalization of eigenvectors associated with top eigenvalues.
Peng TIAN (Department of Statistics and Actuarial Science, The University of Hong Kong. Email: [email protected])
Abstract
For , consider the sample covariance matrix
[TABLE]
from a data set , where is a matrix having i.i.d. entries with mean zero and variance one, and are deterministic positive semi-definite Hermitian matrices of dimension and , respectively. We assume that is bounded in spectral norm, and is a Toeplitz matrix with its largest eigenvalues diverging to infinity. The matrix can be viewed as a data set of an -dimensional long memory stationary process having separable dependence structure.
As and , we establish the asymptotics and the joint CLT for where denotes the th largest eigenvalue of , and is a fixed integer. For the CLT, we first study the case where the entries of are Gaussian, and then we generalize the result to some more generic cases. This result substantially extends our previous result in [28], where we studied in the case where and with having Gaussian entries.
In order to establish this CLT, we are led to study the first order asymptotics of the largest eigenvalues and the associated eigenvectors of some deterministic Toeplitz matrices. We are especially interested in the autocovariance matrices of long memory stationary processes. We prove multiple spectral gap properties for the largest eigenvalues and a delocalization property for their associated eigenvectors.
1 Introduction
Background and related work.
For , we set , where is a matrix having i.i.d. entries with mean zero and variance one, is a deterministic positive semi-definite Hermitian matrix, and is a positive semi-definite Toeplitz matrix. Then models a sample data set of an -dimensional stationary process with a separable dependence structure. If the entries of satisfy the following power decay condition:
[TABLE]
or the spectral density of has a power singularity at [math]:
[TABLE]
where and are positive, locally bounded functions which are slowly varying at , then the process is long range dependent (LRD) or has long memory (LM) (see for example [31]). Note that the conditions (1.1) and (1.2) are not equivalent but tightly related, see Section 2.2.5 of [31].
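The divergence of the largest eigenvalue of a long memory autocovariance matrix can be checked numerically. The following is a minimal sketch, not the author's construction: the displayed condition (1.1) is not reproduced in this text, so the autocovariance is assumed to be (1 + |h|)^(-rho) with a constant slowly varying factor, and the helper name `long_memory_toeplitz` and the decay exponent `rho` are hypothetical.

```python
import numpy as np

def long_memory_toeplitz(n, rho):
    """n x n Toeplitz matrix with entries (1 + |i - j|)^(-rho).

    Assumption: this power-decay autocovariance stands in for condition
    (1.1), with the slowly varying factor taken to be constant.
    """
    gamma = (1.0 + np.arange(n)) ** (-rho)
    idx = np.arange(n)
    return gamma[np.abs(idx[:, None] - idx[None, :])]

rho = 0.5
# eigvalsh returns eigenvalues in ascending order; the last one is the largest.
lam_small = np.linalg.eigvalsh(long_memory_toeplitz(100, rho))[-1]
lam_large = np.linalg.eigvalsh(long_memory_toeplitz(400, rho))[-1]
growth = lam_large / lam_small
print(lam_small, lam_large, growth)
```

Under this assumed autocovariance, quadrupling the dimension roughly multiplies the largest eigenvalue by 4^(1 - rho), which is the kind of polynomial divergence the long memory setting is concerned with.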
From the sample data matrix , we construct the sample covariance matrix
[TABLE]
Let be a fixed integer. We will study the asymptotics and joint CLT of the largest eigenvalues of as and .
For convenience of further discussion, we define the notation by
[TABLE]
where is a deterministic positive semi-definite Hermitian matrix. Then in (1.3) will be denoted as , and if is identity, is the classical sample covariance matrix.
The classical sample covariance model has been intensively studied in the last decades. These studies are mostly concentrated on the global behaviors of the spectrum, including limiting spectral distribution (LSD) ([27, 37, 21, 40, 35, 34]) and CLT for linear spectral statistics ([3, 4, 18, 29]); and also the local behaviors of individual eigenvalues ([7, 19, 20, 15, 24, 8, 23, 5, 6]).
Recently several models of matrices with having a small number of divergent eigenvalues have been considered, in the context of principal component analysis (PCA) [22, 33, 38, 11] and long memory processes [28]. Although the assumptions in these various works differ, many results coincide with the degenerate case of Bai and Yao [6] after normalization (see for example [28]).
The model assumes that the columns of the data matrix are i.i.d. However, this is not always the case in practical applications. The separable model introduces a special type of correlation between columns, or different weights on columns, achieving a certain balance between generality and simplicity, and it has attracted increasing attention. A first result on this model is due to Zhang [43] on the LSD of . She proved that if the empirical spectral distribution (ESD) of and the ESD of converge weakly to and respectively, then as , the ESD of will converge weakly to a non-random probability measure for which if or , then ; otherwise for each , the Cauchy-Stieltjes transform of , together with another two functions, denoted by and , is the unique solution in the set
[TABLE]
to the following system of equations
[TABLE]
Later, Paul and Silverstein [30] proved in the case where is diagonal, that almost surely, for large enough , there is no eigenvalue of in any closed interval outside the support of the limiting spectral distribution (LSD). This is an extension of the results of [2] for . Couillet and Hachem [12] studied the analytical properties of the LSD when and , including the determination of the support of , extending the work of Choi and Silverstein [36] for to the separable model . The CLT for linear spectral statistics has also been studied by Bai et al. [1] and Li et al. [25].
Regarding the extreme eigenvalues of , far less is known compared to the classical model . In [39], Yang proved the edge universality under the condition that the densities of the LSD’s have a regular square-root behavior at the rightmost edge (soft edge). With this result, if we find the fluctuations of the largest eigenvalue at a soft edge in the case where the entries are Gaussian, then the fluctuations at a soft edge in general cases will be determined. However, even in the Gaussian case, these fluctuations are still unknown.
The general spiked eigenvalues of are also a new topic in recent studies. In [42], a very particular case of this problem was treated. See Remark 2.4 for the relation between the relevant results of [42] and ours. During the preparation of the present paper, we learned that a newly submitted paper [14] treated the general spiked separable model with general Hermitian matrices having a finite number of spikes. The authors studied the asymptotics and large deviations (instead of the joint CLT) of the spiked eigenvalues of , and the associated eigenvectors. The main restriction of [14] is that they assumed that and are all bounded in spectral norm, that the spectra of both and do not concentrate at zero, and that the number of spikes is finite. These assumptions exclude our model, so their results do not apply here.
Introduction to the results.
The present paper aims at studying the asymptotics and the joint CLT of (with an arbitrary fixed integer) largest eigenvalues of . The basic idea is analogous to the previous article [28] but we extend substantially the results of that paper.
In order to study the largest eigenvalues of , the asymptotics of largest eigenvalues and the associated eigenvectors of satisfying (1.1) or (1.2) will be studied. In [28] we proved that the ESD of converges weakly to a non-compactly supported measure, thus the largest eigenvalues of diverge to infinity. We also studied the asymptotic behavior of largest eigenvalues of if it satisfies (1.1), and proved that as and is fixed, where is a compact operator defined in (2.1) below. By proving the simplicity of , we proved also the spectral gap property for the largest eigenvalue of , that is, is bounded by a constant smaller than . However, it is well known that the two conditions (1.1) and (1.2) are not equivalent without the quasi-monotone conditions on or . For example, Gubner [17] gave counterexamples in both directions of implication. In this paper, we will prove that if satisfies (1.2), without assuming quasi-monotonicity of , we have
[TABLE]
as , where is the Gamma-function. We will also prove that all nonzero eigenvalues of are simple, using a method totally different from [28]. In consequence, the spectral gap property holds for any finite number of largest eigenvalues of . Furthermore, we will study the relation between the eigenvectors of and the eigenfunctions of , and prove that the eigenvectors associated with the largest eigenvalues of are delocalized. These results may be of independent interest.
Using the results on Toeplitz matrices, we study the asymptotics and fluctuations of largest eigenvalues of as with . We first prove the following asymptotic behavior of for any fixed . If the entries of have finite fourth moment and some other conditions hold, then for any , in probability,
[TABLE]
Moreover, if the entries ’s are Gaussian, (1.7) holds almost surely.
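A small simulation can make the first-order behavior of the top eigenvalues tangible. This is only a sketch under stated assumptions: the displayed limit (1.7) and the exact definition of the sample covariance matrix are not reproduced in this text, so the code assumes an identity row covariance, the form Z T Z^T / N with real Gaussian Z, and the autocovariance (1 + |h|)^(-rho). All names (`long_memory_toeplitz`, `Z`, `T`, `rho`) are hypothetical.

```python
import numpy as np

def long_memory_toeplitz(n, rho):
    """Toeplitz matrix with assumed entries (1 + |i - j|)^(-rho)."""
    gamma = (1.0 + np.arange(n)) ** (-rho)
    idx = np.arange(n)
    return gamma[np.abs(idx[:, None] - idx[None, :])]

rng = np.random.default_rng(0)
N, n, rho = 150, 300, 0.3

T = long_memory_toeplitz(n, rho)
Z = rng.standard_normal((N, n))   # i.i.d. entries, mean zero, variance one

# Assumed form of the separable sample covariance matrix with identity
# row covariance; the normalization 1/N is an assumption of this sketch.
S = Z @ T @ Z.T / N

lam_S = np.linalg.eigvalsh(S)[::-1][:3]   # three largest, decreasing order
lam_T = np.linalg.eigvalsh(T)[::-1][:3]
ratios = lam_S / lam_T
print(ratios)
```

With these choices the largest eigenvalue of S is of the same order as the largest eigenvalue of T; the printed ratios are meant for exploration and are not a statement of the limit proved in the paper.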
Then we will build the joint CLT of largest eigenvalues for the generic model which implies in particular the following results: suppose that satisfies (1.1) or (1.2), and has i.i.d. standard Gaussian entries, with satisfying some mild conditions, then as with ,
[TABLE]
where when the entries of are real Gaussian, and when the entries of are complex Gaussian, and are determined by some equations. We then generalize the result to the non-Gaussian case.
If the entries of are not Gaussian, the situation is more complicated. The CLT of eigenvalues of depends also on the eigenvectors of and . We will prove a general theorem implying that, when is diagonal, and the parameter of defined in (1.1) or (1.2) belongs to , that is, the decay of the correlation of the process is sufficiently slow, then (1.8) still holds with the same and , where is the entry of in the first row and first column. Note that if is real with variance one, or if is complex with , we will get the same CLT as the Gaussian case. This phenomenon is due to the delocalization of the eigenvectors associated with the largest eigenvalues of Toeplitz matrices .
Organization.
This paper is organized as follows. In Section 2 we state our main theorems. This section is divided into four parts. In 2.1, we state the results on Toeplitz matrices ; in 2.2, we state the asymptotics of the largest eigenvalues of ; in 2.3 we state the CLT for largest eigenvalues of in the case where are diagonal, or where are Gaussian; in 2.4, we present some generalizations of the CLT with non-diagonal . The other sections contain the proofs of these results.
Notations.
For a Hermitian operator or matrix , we denote its real eigenvalues by decreasing order as
[TABLE]
For a matrix or a vector , we use to denote the transpose of , and the conjugate transpose of . For a matrix , we denote the ESD of by , which is defined by , where is the Dirac measure at .
The kernel of a linear operator is denoted by . The spectrum of is denoted by . We denote the or norm by . For a matrix or a linear operator , the operator norm of induced by vector norm is denoted by , and we recall that . The or norm will be abbreviated as . We say that a function or a vector is "normalized" or "unit length" when or . When functions or vectors are said to be "orthonormal", they will be implicitly considered as elements of a Hilbert space.
For two probability measures and on , we denote their Lévy-Prokhorov distance by which is defined by
[TABLE]
where is defined by
[TABLE]
It is well known that this distance metrizes the weak convergence. For two random variables with distributions , respectively, we sometimes write , or which all mean .
Given , we denote by the integer satisfying . Given two sequences of non-negative numbers , means that . If are random variables, the notation means that in probability. The notations and denote convergence in distribution and in probability, respectively. If are measures, we denote with a slight abuse of notation for the weak convergence of to .
To estimate some quantities we sometimes need to split them into several parts. When we use to denote these parts, their definition is local to the proof, section or subsection in which they appear.
Definition 1.
We say that a sequence of events holds with high probability, if ; with low probability, if ; with overwhelming probability, if for any , ; with tiny probability, if for any , .
The cardinality of a set is denoted by . In the proofs we use to denote a constant that may take different values from one place to another. If the constant depends on some parameter , we denote the constant by .
Acknowledgement.
Part of this work was completed during my PhD study, and was supported financially by Université Paris-Est and Labex Bézout. I am grateful to my PhD advisers, Professor Florence MERLEVÈDE and Professor Jamal NAJIM. Thanks also to Professor Jianfeng YAO at the University of Hong Kong for fruitful discussions.
2 Main theorems
2.1 Spectral properties of Toeplitz matrices
We collect our results on Toeplitz matrices in this section. Let satisfy (1.1). Let be the operator defined on by
[TABLE]
In [28], we have established the relation between the eigenvalues of and the eigenvalues of . From the proof of Theorem 2.3 of [28] we know that the operator is compact and positive semi-definite. It has infinitely many positive eigenvalues. And if is defined by (1.1), then for any , we have
[TABLE]
Using the min-max formula for the largest eigenvalue and an argument by contradiction, we also proved in [28] that is simple, which gives the spectral gap property for the largest two eigenvalues of :
[TABLE]
In this paper, using a different method, we will prove that all non-zero eigenvalues of are simple. As a consequence we prove the multiple spectral gap property for any th largest eigenvalue of :
[TABLE]
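The multiple spectral gap property can be observed numerically: ratios of consecutive top eigenvalues stay away from 1. The sketch below is an illustration only. It assumes the autocovariance (1 + |h|)^(-rho) in place of the displayed condition (1.1), which is not reproduced in this text, and the helper `long_memory_toeplitz` is hypothetical.

```python
import numpy as np

def long_memory_toeplitz(n, rho):
    """Toeplitz matrix with assumed entries (1 + |i - j|)^(-rho)."""
    gamma = (1.0 + np.arange(n)) ** (-rho)
    idx = np.arange(n)
    return gamma[np.abs(idx[:, None] - idx[None, :])]

n, rho = 400, 0.3
T = long_memory_toeplitz(n, rho)

# Four largest eigenvalues in decreasing order.
lam = np.linalg.eigvalsh(T)[::-1][:4]

# Ratios lambda_{j+1} / lambda_j for j = 1, 2, 3.
gap_ratios = lam[1:] / lam[:-1]
print(gap_ratios)
```

Repeating the computation for larger n is a simple way to see that the ratios stabilize rather than drift toward 1.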
Proposition 2.1.
All non-zero eigenvalues of the operator defined by (2.1) with are simple, and the associated eigenfunctions are continuous in .
We note that is self-adjoint, so for any non-zero eigenvalue , its algebraic multiplicity equals its geometric multiplicity, which is defined as . For more information about algebraic multiplicity, see [26]. So here to say that a non-zero eigenvalue is simple means
[TABLE]
In the next proposition, we provide a quantitative description of the eigenvectors associated with for any fixed .
Proposition 2.2.
Let satisfy (1.1). For any , let be the normalized eigenfunction of associated with , and be a normalized eigenvector of associated with . Then, up to a change of sign, we have
[TABLE]
From this proposition we deduce the delocalization of the eigenvector associated with for any fixed . Indeed by (2.4), for large enough , we have
[TABLE]
and because is continuous on , we have . Thus we conclude that
[TABLE]
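Delocalization means that no coordinate of a top eigenvector carries a non-negligible share of its mass, so the sup norm of a unit eigenvector is of order n^(-1/2). The following sketch checks this on an example. It assumes the autocovariance (1 + |h|)^(-rho) as a stand-in for condition (1.1), which is not reproduced in this text, and the helper name is hypothetical.

```python
import numpy as np

def long_memory_toeplitz(n, rho):
    """Toeplitz matrix with assumed entries (1 + |i - j|)^(-rho)."""
    gamma = (1.0 + np.arange(n)) ** (-rho)
    idx = np.arange(n)
    return gamma[np.abs(idx[:, None] - idx[None, :])]

n, rho = 500, 0.5
T = long_memory_toeplitz(n, rho)

# eigh returns ascending eigenvalues with unit-norm eigenvectors as columns;
# reverse the column order to get the eigenvectors of the largest eigenvalues.
_, vecs = np.linalg.eigh(T)
top_vecs = vecs[:, ::-1][:, :3]

# sqrt(n) * sup norm: stays bounded for a delocalized unit vector,
# while a fully localized unit vector would give sqrt(n).
scaled_sup = np.sqrt(n) * np.abs(top_vecs).max(axis=0)
print(scaled_sup)
```

For comparison, a standard basis vector would give a value of sqrt(500), about 22.4, so values of order one indicate that the mass is spread over all coordinates.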
The above propositions also apply to some Toeplitz matrices satisfying (1.2). It is well known that if satisfies (1.2) with quasi-monotonic (see Section 2.2.5 of [31] for definition), then also satisfies (1.1) with
[TABLE]
where is the Gamma-function. In particular, if , then tends to a constant as , and (2.2), Proposition 2.2 hold.
Without the condition of quasi-monotonicity on or , the conditions (1.1) and (1.2) are not equivalent. See [17] for counterexamples in both directions. However, thanks to the following theorem, the above results can be extended to defined by (1.2) with general slowly varying function .
Theorem 2.3.
Let and be Toeplitz matrices both satisfying (1.2) with the same , with an arbitrary slowly varying function for , and with for . Then
[TABLE]
In consequence, for any fixed , we have
[TABLE]
Moreover, if (resp. ) is the eigenvector of (resp. ) associated with the th largest eigenvalue, then
[TABLE]
Let . The following proposition provides the decay of moments of the ESD for satisfying (1.1) or (1.2). This result shows that the with parameter satisfies A9 or A10 below, and is needed in the proof of Theorem 2.10.
Proposition 2.4.
Let be a Toeplitz matrix satisfying (1.1) or (1.2), and let .
1. If , then
[TABLE]
2. If , then
[TABLE]
2.2 Convergence of largest eigenvalues of separable sample covariance matrix
For , let
[TABLE]
where are and deterministic positive semi-definite Hermitian matrices, and is a matrix having i.i.d. entries . Let
[TABLE]
be the eigenvalues of and respectively. Let be a fixed integer. We assume that the following assumptions hold:
- A1 The entries satisfy
[TABLE]
- A2 The spectral norm is bounded in , and the ESD of converges weakly as , to a probability measure .
- A3 There exists a decreasing sequence of positive numbers
[TABLE]
converging to [math] such that for any , we have
[TABLE]
Note that under A3, we have , and for any ,
[TABLE]
For further use, we will prove a concentration inequality for the largest eigenvalues of with the following conditions.
- A4 The matrices are diagonal:
[TABLE]
- A5 The matrices are diagonal:
[TABLE]
- A6 (Bound condition) There exists a sequence of positive numbers such that almost surely for large enough ,
[TABLE]
Remark 2.1.
We take two examples for which the bound condition A6 holds. The first case is where for some . In this case, we have
[TABLE]
where we have assumed that the convergence rate of to [math] is slower than any preassigned rate. Then by Borel-Cantelli’s Lemma, the bound condition holds. The second case is where , and does not depend on for any fixed . In other words, are all from an infinite double array . In this case, by the truncation lemma 2.2 of [41], the bound condition holds.
Recall that we use to denote .
Proposition 2.5.
Let be defined as (2.11). Under A1, A2 and A3, for any , we have
[TABLE]
Moreover, if ’s are real or complex Gaussian, or if A4, A5 and A6 hold, then the above convergence holds almost surely.
Remark 2.2.
The almost sure convergence under A4, A5 and A6 is in fact a byproduct of Lemma 4.1 which is needed in the proof of Theorem 2.7. However this does not allow us to conclude the a.s. convergence when ’s are Gaussian. Indeed if the entries of are i.i.d. real Gaussian variables, and if or are complex and non-diagonal, then we cannot diagonalize or because the real Gaussian vectors are not unitarily invariant. Thus we give an independent proof for the Gaussian case with the help of a Gaussian concentration inequality.
Applying the above generic result to the special case of , we obtain the following result:
Corollary 2.6.
Let be a sequence of Toeplitz matrices satisfying (1.1) or (1.2). Let be defined as before. Then if A1, A2 hold, for any fixed we have
[TABLE]
Moreover, if ’s are standard real or complex Gaussian, then the above convergence holds almost surely.
2.3 CLT for largest eigenvalues: Diagonal & Gaussian case
In this section, we assume that , are diagonal, and study the CLT for largest eigenvalues of . As a corollary, we obtain the result for the Gaussian case.
- A7 The sixth moment of the entries is finite:
[TABLE]
- A8 The largest eigenvalues of satisfy the multiple spectral gap property:
[TABLE]
For we define
[TABLE]
For , let be the largest solution of the equation
[TABLE]
Remark 2.3.
Note that if not all ’s are [math], and if , then from the graph of the function , we see that the equation on admits real solutions.
Moreover, we prove that the largest solution of (2.13) tends to . Indeed, we know that under the assumptions A3 and A8, we have
[TABLE]
and the assumption A2 ensures that
[TABLE]
Also note that for every fixed ,
[TABLE]
Thus for any , let , , then we can see that asymptotically the largest solution of the equation (2.13) is between and .
We define
[TABLE]
Theorem 2.7.
Under A1, A2, A3, A4, A5 and A7, A8, we have
[TABLE]
For general non-diagonal and , note that if are standard complex Gaussian, or if are standard real Gaussian and are both real, then the eigenvalues of have the same joint distribution with the eigenvalues of
[TABLE]
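The reduction to the diagonal case rests on an exact algebraic identity: conjugating the data matrix by the eigenvector matrices of the two covariance factors leaves the spectrum unchanged, and a standard Gaussian matrix keeps its law under such a conjugation. The sketch below verifies the algebraic part in the real case. The symbols `A`, `T`, `X` and the unnormalized form A^(1/2) X T X^T A^(1/2) are assumptions of this illustration, since the displayed formulas are not reproduced in this text.

```python
import numpy as np

rng = np.random.default_rng(1)
N, n = 6, 9

def random_psd(k):
    """Random real symmetric positive semi-definite k x k matrix."""
    G = rng.standard_normal((k, k))
    return G @ G.T

A = random_psd(N)                  # assumed row covariance factor
T = random_psd(n)                  # assumed column covariance factor
X = rng.standard_normal((N, n))    # real Gaussian data matrix

dA, U = np.linalg.eigh(A)          # A = U diag(dA) U^T
dT, V = np.linalg.eigh(T)          # T = V diag(dT) V^T
dA = np.clip(dA, 0.0, None)        # guard against tiny negative rounding
sqrt_dA = np.diag(np.sqrt(dA))
A_half = U @ sqrt_dA @ U.T

# Matrix built from the full (non-diagonal) factors.
S_full = A_half @ X @ T @ X.T @ A_half

# Same construction with diagonal factors and the rotated data U^T X V,
# which is again a standard Gaussian matrix in distribution.
Y = U.T @ X @ V
S_diag = sqrt_dA @ Y @ np.diag(dT) @ Y.T @ sqrt_dA

eig_full = np.linalg.eigvalsh(S_full)
eig_diag = np.linalg.eigvalsh(S_diag)
print(np.max(np.abs(eig_full - eig_diag)))
```

The two spectra agree up to rounding because S_full equals U S_diag U^T. The distributional step needs the rotated matrix to have the same law as X, which is why the real Gaussian case requires real factors.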
Therefore Theorem 2.7 applies to the Gaussian case whether or not are diagonal. In particular, applying the above result to with the Toeplitz matrix defined as before, we get the following corollary.
Corollary 2.8.
Let be a sequence of Toeplitz matrices satisfying (1.1) or (1.2). Let be defined as before. Let be defined by replacing with in (2.14). Assume that A1 and A2 hold. Then, if are standard complex Gaussian, or if are standard real Gaussian and is real, we have
[TABLE]
where in real Gaussian case, and in complex Gaussian case.
If , then we can see that
[TABLE]
In general, does not have a closed-form expression. However, can be expressed as a power series in .
Proposition 2.9.
For a fixed , let , let the coefficients be defined by the recurrence formula
[TABLE]
Suppose that not all the eigenvalues of are zero. Then the power series
[TABLE]
is the solution of the equation
[TABLE]
Its radius of convergence satisfies
[TABLE]
where we make the convention that .
From this proposition, we have, under the conditions A2, A3 and A8, for large enough ,
[TABLE]
By the recurrence formula, we obtain
[TABLE]
So we have
[TABLE]
Remark 2.4.
In [28, Example 2.3], we have given the various orders of when , so in general cannot be replaced by a finite form. However in some particular cases, we can replace by a partial sum of its Taylor expansion. For example, when satisfies A9 below, we have . Thus we can replace by . One can see that the model in [42] is in this case when their (Theorem 3 of [42]), because their major spiked population eigenvalues are asymptotically as , where denotes the dimension (Lemmas 1 and 2 of [42]). That is, with our notations,
[TABLE]
And by calculating where is defined in (2.4) of [42], we have
[TABLE]
Similarly, when satisfies A10 below, can be replaced by
[TABLE]
2.4 CLT for largest eigenvalues: Some generalizations
In this section we generalize the CLT to non-diagonal . We continue to assume the other assumptions, and moreover, we assume that one of the two following assumptions holds:
- A9
[TABLE]
- A10
[TABLE]
Remark 2.5.
Under A3, because almost all the eigenvalues of (except for at most a finite number of them) are smaller than , the condition A9 is stronger than A10. Both conditions measure the degree of concentration of the eigenvalues near zero. If satisfies A9, then its eigenvalues are more concentrated near [math] than when it only satisfies A10.
From Proposition 2.4, we can see that for , the normalized Toeplitz matrix satisfies A10, and for , satisfies A9.
Theorem 2.10.
Under A1, A2, A3, A4, A8, and either A9 or A10, we have
[TABLE]
where with
[TABLE]
and is a normalized eigenvector associated with .
Remark 2.6.
In view of the expression (2.16), it is not clear that the covariance matrix converges. In order to avoid any cumbersome assumption enforcing this convergence, we express the CLT with the help of Lévy-Prokhorov’s distance. If however it happens that converges to some matrix , then we conclude the CLT in the following usual form
[TABLE]
From Proposition 2.2 and Theorem 2.3, if is a Toeplitz matrix satisfying (1.1) or (1.2), we can see that the eigenvectors of are delocalized, i.e. . So we have
[TABLE]
Also because are real, we have
[TABLE]
So we get the following result:
Corollary 2.11.
Let be a sequence of Toeplitz matrices satisfying (1.1) or (1.2). Let be defined as before. Suppose that A1, A2, A4 hold. If one of the following is satisfied:
1. The parameter belongs to and ;
2. The parameter belongs to and ,
then we have
[TABLE]
3 Proofs of the theorems on Toeplitz matrices
In Sections 3.1, 3.2 and 3.3 we focus on Toeplitz matrices satisfying (1.1). In Section 3.4 we treat the Toeplitz matrices satisfying (1.2). In Section 3.5 we prove Proposition 2.4 for Toeplitz matrices satisfying either (1.1) or (1.2).
3.1 Some preparation
Let satisfy (1.1). Note that by the definition of slowly varying function, is positive for sufficiently large.
For , and for sufficiently large such that , we define a finite-rank operator acting on by
[TABLE]
The operator in (2.1) is also well-defined for any by the integral formula:
[TABLE]
The operators and acting on are bounded, see [28, Lemma 5.4]. Moreover, from Lemma 5.4 of [28], we have the convergence
[TABLE]
The convergence (3.3) has many useful consequences in this proof. The first consequence is that the operator is compact on for any .
For each , (resp. ) has its spectrum as an operator acting on . The following proposition shows that its non-zero eigenvalues and the associated eigenfunctions are invariant as changes.
Proposition 3.1.
The non-zero eigenvalues and the associated eigenfunctions of and do not change when runs across .
Proof.
We only prove the result for . The same argument applies to .
We only need to prove that, for any , the operator has the same non-zero eigenvalues and associated eigenfunctions as . As we have already noticed that is compact on and on , the desired result is a direct application of Theorem 4.2.15 in [13].
Indeed, we recall that two Banach spaces and or their associated norms are said to be compatible if is dense in each of them, and the following condition is satisfied: if , and , then . The operators with and two Banach spaces, are said to be consistent if for all . Then we can verify that and are compatible, and defined by an integral formula is obviously consistent. Then Theorem 4.2.15 in [13] applies. ∎
According to the above proposition, when we talk about the non-zero eigenvalues and the associated eigenfunctions of these operators, we do not need to specify the space.
3.2 Proof of Proposition 2.1
Let be an eigenvalue of and be an associated eigenfunction. We now prove that is continuous on .
Note that satisfies the equation
[TABLE]
and from Proposition 3.1, also belongs to . So for any , one has
[TABLE]
and the integral on the RHS tends to [math] when .
We now prove that all non-zero eigenvalues of are simple. We need the following key lemma. It says that any normalized eigenfunction of associated with a non-zero eigenvalue, taken at , has the absolute value .
Lemma 3.2.
Let be a non-zero eigenvalue of , and let be a normalized eigenfunction associated with . Then satisfies
[TABLE]
A result similar to the above lemma first appeared in [32] for a general but square integrable kernel , see Theorem 3 of [32]. Note that thanks to the explicit formula of , the result of Lemma 3.2 is stronger than [32]. Directly using Theorem 3 of [32], we can only conclude that for , for any non-zero eigenvalue of , there exists a group of orthonormal eigenfunctions associated with , where is the multiplicity of , such that
[TABLE]
However we will notice that this result is not sufficient to prove the simplicity of eigenvalues.
Once Lemma 3.2 is proved, we can prove the simplicity of any non-zero eigenvalue of by contradiction. Assume to the contrary that had multiplicity . Then we could choose two orthonormal eigenfunctions associated with . From Lemma 3.4, without loss of generality we can assume that . Then the function
[TABLE]
is also a normalized eigenfunction of . But this function satisfies
[TABLE]
which is a contradiction to Lemma 3.2.
Thus it remains to prove Lemma 3.2.
Proof of Lemma 3.2.
We follow the outline of the proof in [32]. For any , we define the operator on by
[TABLE]
By a change of variable, it is easy to see that a function is an eigenfunction of associated with an eigenvalue if and only if is an eigenfunction of associated with the eigenvalue . By this fact, a positive number is an eigenvalue of with multiplicity if and only if is an eigenvalue of with the same multiplicity for all .
Suppose that is a normalized eigenfunction of associated with non-zero eigenvalue . Then for any we have the following two equations
[TABLE]
and
[TABLE]
We define the function on by
[TABLE]
then is a continuous extension of on . Multiplying both sides of (3.6) by and integrating over , we get
[TABLE]
Note that by the boundedness of , Fubini's theorem applies to the RHS; changing the order of the two integrations and taking into account the definition of , we get
[TABLE]
It is easy to see from (3.9) that
[TABLE]
Letting on the two sides of (3.10), and noting that the continuity of on implies the uniform convergence of to , we get
[TABLE]
and the result follows. ∎
3.3 Proof of Proposition 2.2
Let be an integer. By the proof of Theorem 2.3 in [28], we have . Also by Proposition 2.1, we have . Let be a normalized eigenfunction associated with . In the sequel, we shall rely on the spectral projections (to be defined later) to construct an eigenvector of associated with and prove that such an eigenvector approximates in the sense of (2.4).
Let and be the circle centered at and of radius in the complex plane. We take sufficiently large such that . So we have for all , which implies that only the eigenvalues and are enclosed by and all the other eigenvalues are outside . We define the spectral projections
[TABLE]
By the Riesz decomposition theorem (cf., for example, [13, Theorem 1.5.4 and Theorem 4.3.19]), (resp. ) is a projection onto the eigenspace of (resp. ) corresponding to (resp. ). For readers unfamiliar with the Riesz theorem, we explain the argument with and . Indeed, by the Riesz theorem, is a finite rank projection which commutes with . Let be the range of . Then is an invariant subspace of (due to the commutativity of the projection and ), and the restriction of to is self-adjoint (because is self-adjoint) and has spectrum . By finite dimensional linear algebra, is therefore spanned by the eigenfunctions of associated with . Recalling that is a normalized eigenfunction of associated with , we have . The same argument shows that is a projection onto the eigenspace of and thus is an eigenfunction of associated with , provided that .
We prove that . Indeed we have
[TABLE]
Thus the main task is to uniformly control in terms of for . As is analytic outside of , there exists such that . Let be sufficiently large such that . Then we have
[TABLE]
Thus as we have
[TABLE]
From this convergence we have
[TABLE]
Then from (3.13) we obtain
[TABLE]
Combining (3.13) and (3.14) we conclude
[TABLE]
Notice that the range of consists of step functions
[TABLE]
so the eigenfunctions of must also have this form. Notice also that a -dimensional normalized vector is an eigenvector of associated with if and only if the normalized function
[TABLE]
is an eigenfunction of associated with . Since is a normalized eigenfunction of , by the relation (3.16), up to a change of sign, we have
[TABLE]
From (3.15) we get the desired result (2.4).
3.4 Proof of Theorem 2.3
Let be a Toeplitz matrix with spectral density satisfying (1.2). Let be a Toeplitz matrix with spectral density . Let be the Dirichlet kernel. From the theory of Toeplitz matrices, we have
[TABLE]
By the inequality (2.2.1) of [31], for a certain , there exists such that
[TABLE]
Then because is locally bounded, we have
[TABLE]
The same argument also gives
[TABLE]
Combining the last two inequalities, and using the triangle inequality, we get
[TABLE]
Thus in order to prove that (3.18) tends to [math], we only need to prove that
[TABLE]
By changing variables we write
[TABLE]
Let
[TABLE]
From the properties of Dirichlet kernels, there exists a constant such that for any and ,
[TABLE]
For any , let be a large enough positive number to be determined afterwards. Then
[TABLE]
Because is bounded by , and also by (3.19), we have
[TABLE]
Using (3.19) and Young’s convolution inequality, let , we have
[TABLE]
Let , then there exists such that
[TABLE]
By the uniform convergence theorem for slowly varying functions (see for example (2.2.4) of [31]), for large enough , we have
[TABLE]
Then using Young’s inequality again, we have
[TABLE]
Therefore, there exists such that, for any , for large enough ,
[TABLE]
and the proof of (2.6) is complete.
The convergence (2.7) is an immediate consequence of (2.6), (2.2) and (2.5). To prove (2.8), we use the spectral projections and repeat the same procedure as in the proof of Proposition 2.2 with norm; the result then follows.
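As a rough numerical illustration of the divergence of the top eigenvalues of a long memory Toeplitz matrix (and not as part of the proof), the following sketch builds a symmetric Toeplitz matrix from the hypothetical autocovariance gamma(h) = (1 + |h|)^(-alpha) with alpha = 0.4 and compares its largest eigenvalues for two dimensions. This autocovariance is our own choice for illustration and is not claimed to coincide with condition (1.1) or (1.2); the helper names are also ours.

```python
import numpy as np

def toeplitz_from_autocov(gamma):
    """Symmetric Toeplitz matrix whose (i, j) entry is gamma[|i - j|]."""
    n = len(gamma)
    lags = np.abs(np.subtract.outer(np.arange(n), np.arange(n)))
    return gamma[lags]

def top_eigenvalues(n, alpha=0.4, m=3):
    # Hypothetical long memory autocovariance: gamma(h) = (1 + |h|)^(-alpha), 0 < alpha < 1
    gamma = (1.0 + np.arange(n)) ** (-alpha)
    eigvals = np.linalg.eigvalsh(toeplitz_from_autocov(gamma))  # ascending order
    return eigvals[::-1][:m]                                    # m largest, descending

top_100 = top_eigenvalues(100)
top_400 = top_eigenvalues(400)
growth = top_400[0] / top_100[0]  # ratio of the largest eigenvalues
```

Since the autocovariances are not summable for this choice of alpha, the largest eigenvalue keeps growing with the dimension, which is the qualitative behaviour assumed for the Toeplitz matrix in the paper.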
3.5 Proof of Proposition 2.4
First we prove Item 1. Let satisfy (1.1) or (1.2) and assume . From (2.2) or (2.7), there exists a constant such that
[TABLE]
with . Since , we have
[TABLE]
Then we prove Item 2. Let satisfy (1.1) with . Note that
[TABLE]
Also from (2.2), we have for some , where for two sequences of positive numbers and , the notation means that . We then have
[TABLE]
Since , it follows that
[TABLE]
Hence
[TABLE]
Now let satisfy (1.2) with . From Theorem 2.3, there exists such that . From the formula
[TABLE]
we have
[TABLE]
where is the Fejér kernel. Let , and be such that . Then we have . From the form of , we have . By Young’s inequality,
[TABLE]
Note that
[TABLE]
Then
[TABLE]
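The Fejér kernel enters through quadratic forms of Toeplitz matrices. The following sketch checks one classical instance of this link: the normalized quadratic form of a symmetric Toeplitz matrix at the all-ones vector equals the Fejér-weighted sum of its autocovariances. We assume, without claiming it, that the displayed formulas above are of this general type; the autocovariance sequence used here is hypothetical.

```python
import numpy as np

n = 50
gamma = (1.0 + np.arange(n)) ** (-0.4)   # hypothetical autocovariances gamma(0), ..., gamma(n-1)
lags = np.abs(np.subtract.outer(np.arange(n), np.arange(n)))
T = gamma[lags]                           # symmetric Toeplitz matrix

ones = np.ones(n)
quad_form = ones @ T @ ones / n           # (1/n) * 1^T T 1

# Fejer weights: the lag h appears 2 * (n - h) times among the entries of T
h = np.arange(1, n)
fejer_sum = gamma[0] + 2.0 * np.sum((1.0 - h / n) * gamma[h])
```

The two quantities `quad_form` and `fejer_sum` agree up to rounding, because the weights (1 - |h|/n) are exactly the Fourier coefficients of the Fejér kernel.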
4 Proofs of Propositions 2.5 and 2.9
4.1 Proof of Proposition 2.5
Let be a fixed integer. We first prove the convergence of in probability. Suppose that where is a unitary matrix whose columns are . Recall that A3 holds. For any sufficiently small, let be the smallest integer such that , where . Let be large enough such that for .
Let . Then we have
[TABLE]
As , from [41] we know that for large , with high probability,
[TABLE]
Thus from the stability of spectrum of Hermitian matrices, with high probability, we have
[TABLE]
So we only need to prove that
[TABLE]
The matrix has the same non-zero eigenvalues as the matrix
[TABLE]
Then because is a fixed-dimensional matrix, it suffices to prove that each of its entries converges in probability. Note that
[TABLE]
where are the diagonal entries of . Because is uniformly bounded, we have
[TABLE]
Thus we have
[TABLE]
Combining this with the equality
[TABLE]
we obtain the convergence in probability.
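The reduction to a fixed-dimensional matrix in the argument above relies on the fact that two products XY and YX share their non-zero eigenvalues. A minimal numerical check of this fact, with random matrices of our own choosing rather than the matrices of the proof, can be sketched as follows.

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.standard_normal((3, 8))   # short and wide
Y = X.T                            # tall and thin

small = np.linalg.eigvalsh(X @ Y)  # eigenvalues of the 3 x 3 product, ascending
large = np.linalg.eigvalsh(Y @ X)  # eigenvalues of the 8 x 8 product, rank at most 3

nonzero_large = large[-3:]         # the three largest eigenvalues of the big product
zero_part = large[:-3]             # the remaining ones vanish up to rounding
```

The arrays `small` and `nonzero_large` coincide up to rounding, so the spectrum of the large matrix can be read off from the small one.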
We now assume that are standard real or complex Gaussian and prove the almost sure convergence. We argue as in the proof of Proposition 4.1 of [28]. Precisely, we will prove that for any ,
[TABLE]
where is a constant. Indeed using [10, Theorem 5.6] we only need to prove that the function
[TABLE]
is -Lipschitz with respect to the Frobenius norm . Let be two matrices, and let
[TABLE]
Then by the Hoffman-Wielandt inequality for singular values, we have
[TABLE]
Thus we have
[TABLE]
This proves the Lipschitz property, and hence the concentration inequality (4.1) holds. Then by the Borel-Cantelli lemma, we have
[TABLE]
Together with the convergence in probability
[TABLE]
the almost sure convergence in the Gaussian case follows.
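The Lipschitz property used above rests on the Hoffman-Wielandt inequality for singular values: the sum of squared differences of the ordered singular values of two matrices is at most the squared Frobenius norm of their difference. The following sketch checks this on a pair of random matrices; the matrices and the perturbation size are arbitrary choices of ours.

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((6, 9))
B = A + 0.3 * rng.standard_normal((6, 9))

s_A = np.linalg.svd(A, compute_uv=False)  # singular values, descending
s_B = np.linalg.svd(B, compute_uv=False)

lhs = np.sum((s_A - s_B) ** 2)            # sum over i of (s_i(A) - s_i(B))^2
rhs = np.linalg.norm(A - B, "fro") ** 2   # squared Frobenius norm of A - B
top_gap = abs(s_A[0] - s_B[0])            # change in the largest singular value
```

In particular `top_gap` is at most the Frobenius norm of `A - B`, which is the 1-Lipschitz property of each singular value as a function of the matrix entries.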
We now assume that the bound condition A6 holds and prove the following lemma which will be useful in Section 5. As a byproduct, this lemma implies the almost sure convergence of .
Lemma 4.1.
Under A1, A2, A3, A4, A5 and A6, for any and any , with overwhelming probability,
[TABLE]
For the definition of "overwhelming probability" or "tiny probability", refer to Definition 1.
Proof.
We can repeat the first part of the proof of Proposition 2.5 and we just need to verify that each "high probability" can be replaced by "overwhelming probability" under the assumptions of this lemma. Let be defined as above. From Theorem 3.1 of [40], we know that with overwhelming probability,
[TABLE]
Then we only need to prove that under the assumptions of this lemma, for any , with overwhelming probability,
[TABLE]
As are diagonal, the above inequality is actually
[TABLE]
Let
[TABLE]
Note that if , and if . We assume that , because otherwise we have and almost surely, then (4.2) holds almost surely, and there is nothing to prove.
Using Bennett’s inequality (8b) of [9], for any , one has
[TABLE]
As , let be such that , then we have
[TABLE]
where are some positive constants. Because is an almost sure upper bound of , we can assume that . Then for any fixed , and for large enough ,
[TABLE]
Then the result follows. ∎
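To give a feeling for the type of tail bound used in this proof, the following sketch compares the classical form of Bennett's inequality with an empirical tail probability for a sum of independent Rademacher variables. We use the textbook statement of the inequality, which may differ in presentation from inequality (8b) of [9]; the sample sizes, the threshold and the helper name `bennett_bound` are our own choices.

```python
import numpy as np

def bennett_bound(t, sigma2, b):
    """Bennett's upper bound on P(S >= t), where S is a sum of independent
    centered variables bounded above by b with total variance sigma2."""
    u = b * t / sigma2
    return np.exp(-(sigma2 / b**2) * ((1.0 + u) * np.log1p(u) - u))

rng = np.random.default_rng(3)
n, trials, t = 200, 20000, 30.0

# Rademacher variables: mean 0, variance 1, bounded by 1
signs = rng.choice([-1.0, 1.0], size=(trials, n))
sums = signs.sum(axis=1)

empirical_tail = np.mean(sums >= t)               # Monte Carlo estimate of P(S >= t)
bound = bennett_bound(t, sigma2=float(n), b=1.0)  # total variance n, bound b = 1
```

The empirical tail stays below the bound, as it must; the bound is not sharp for this example, but it decays fast enough in the threshold to yield the overwhelming-probability statements needed in the lemma.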
5 Proof of Theorem 2.7
In this section and in Section 6, to simplify the notation, we omit the subscripts and of and , so they are simply denoted by and . We also simplify the notation as , and denote the eigenvalues of by
[TABLE]
We prove the CLT for largest eigenvalues of in the following steps. First we truncate, recenter and rescale the entries of so that where is a sequence of positive numbers tending to [math]. The truncation step is identical to the approach used in the proof of Theorem 1.1 of [3], from where we know that this does not affect the result. So from now on we assume that A6 holds.
Then in order to prove the weak convergence of , it suffices to prove that for any fixed vector , we have
[TABLE]
where the random vector follows the limiting distribution . For each , we prove that the inequality
[TABLE]
is equivalent to
[TABLE]
for some random variable which is expressed by the entries of . Then we determine the limiting distribution of . Then the result follows by using Slutsky’s Theorem.
Reformulation of the eigenvalue inequality (5.2).
We begin to rewrite the inequality (5.2). For further use we temporarily do not suppose that and are diagonal. So this part is shared with Section 6. We suppose that where and is unitary. By normalizing and , we assume without loss of generality that and . We set
[TABLE]
Then satisfies the equation
[TABLE]
Under A8, applying Proposition 2.5 to both and , for a small enough , with high probability, for we have
[TABLE]
and since the th largest eigenvalue of is , which tends to , we have
[TABLE]
We denote the above event by . Suppose that occurs. Then is not an eigenvalue of , and the equation (5.3) is equivalent to
[TABLE]
Note that the matrix is of rank one, so the equation is in fact
[TABLE]
Moreover, note that for large enough. Note also that
[TABLE]
for , and with holds, the denominator and the numerator of change sign times respectively on . So we deduce that
[TABLE]
and changes sign in exactly at . Thus for large enough such that
[TABLE]
the inequality
[TABLE]
is equivalent to
[TABLE]
If it happens that , then and , and the inequality (5.5) is in fact
[TABLE]
Let the LHS of the above inequality be , then the procedure of rewriting (5.2) is complete.
We now assume that , then we have . We recall some results from [12]. By Proposition 1.1 of [12], for any , the system of equations
[TABLE]
has a unique solution such that . Define
[TABLE]
By plugging the first equation of (5.7) into the second one, and replacing the discrete integrals by sums, we can see that is the unique solution of the equation
[TABLE]
such that . Let
[TABLE]
Then (resp. ) is the Cauchy-Stieltjes transform of the probability measure (resp. ), which is the asymptotic equivalent of (resp. ). See also (1.6)-(1.10) of [1]. Moreover by Lemma 3.3 of [12], for any , the limit exists. Let be defined as
[TABLE]
[TABLE]
Then a main result of [12] says that a non-zero real number is outside the support of , if and only if
[TABLE]
Now we relate the function with . By the definition of , we have
[TABLE]
Because of the assumptions A2, A3, also because the distance between and is bounded away from [math], there is a complex neighborhood of such that
[TABLE]
uniformly for and . So there is a neighborhood of ( may take different values from one place to another) such that for large enough, the function is holomorphic in . Some calculation shows that
[TABLE]
which is also holomorphic in for large enough. Moreover we have
[TABLE]
uniformly for as . So for large enough and for , we have
[TABLE]
From Remark 2.3, we have . Whenever , by holomorphic implicit function theorem [16, Ch. 1, Th 7.6], there exists a holomorphic function , defined in a complex neighborhood of , such that
[TABLE]
Some calculations similar to those between (6)-(8) of [12] give
[TABLE]
for . Then by the uniqueness of the solution of the function for , we have for . This proves that .
Moreover from (5.9) and Proposition 3.2 of [12] we deduce that for large enough, the point is in an interval which lies outside the support of for large enough. For any , we have
[TABLE]
Note that , we rewrite the inequality (5.5) as
[TABLE]
By Lemma 3.4 of [12], for any . Then we can apply Lagrange’s Mean Value Theorem and get
[TABLE]
where is a number between and . As and both tend to , we also have . Then from the formula (5.10), as ,
[TABLE]
Plugging (5.12), (5.13) into (5.11), and multiplying the two sides by , also note that , the inequality (5.11) can be written as
[TABLE]
Using the formula , and letting
[TABLE]
we can rewrite (5.14) as
[TABLE]
Let
[TABLE]
and
[TABLE]
In the following, we prove the CLT for , and prove that , under the diagonal condition A4, A5 in this section, and under A9 or A10 in Section 6, respectively.
CLT for and estimation of the remainder .
We now assume that are diagonal. Then we have
[TABLE]
and by the CLT for independent random variables, we have
[TABLE]
Now we prove that . Let denote the th column of . As are diagonal, we have
[TABLE]
where is defined in (5.15). Note that and are independent, then by Lemma B.1 in [1], we have
[TABLE]
By Proposition 2.2, for any , with high probability, there is only a finite number of eigenvalues of larger than , and the distance between and the spectrum of is bounded away from [math]. So one has
[TABLE]
where denotes the Frobenius norm, and recall that for any matrices . This proves that . We also conclude that , because for any ,
[TABLE]
and by Dominated Convergence Theorem,
[TABLE]
Then we prove that . We set
[TABLE]
Let . We note that for large enough, is analytic in the complex disc of , and with high probability, is also analytic in this disc. If we can prove that for ,
[TABLE]
then, because
[TABLE]
with high probability, and note that , the result follows. The proof of (5.17) uses some techniques of [1, Section 3], and is postponed to Appendix A.
6 Proof of Theorem 2.10
In this section we extend our result to some non-diagonal ’s. We prove the CLT (2.15) under the condition A9 or A10.
It suffices to prove that in any subsequence of the sequence
[TABLE]
there is still a subsequence converging to [math]. Since the entries of are bounded, we can assume that they converge, i.e.
[TABLE]
Then we are led to proving that
[TABLE]
Let and . Let be the th column of . From the last section, the proof of the theorem can be done by proving that
[TABLE]
where
[TABLE]
and that for any ,
[TABLE]
where is defined in (5.15).
We use the Cramér-Wold device to prove the CLT of the -dimensional vector . By a direct calculation, the covariance matrix of is exactly which tends to as we have assumed. Then for any fixed vector , we prove that
[TABLE]
If , it means that . Then and hence (6.1) holds. Now we assume that and prove that
[TABLE]
Since the rows of are i.i.d.,
[TABLE]
is a weighted sum of i.i.d. random variables. We can use Lindeberg’s CLT to prove (6.2). To do so, we need to verify the Lindeberg condition
[TABLE]
as for any . Since the quantities in the expectations are identically distributed for different , and since
[TABLE]
we only need to prove
[TABLE]
Since and since is uniformly bounded, for any , the events
[TABLE]
occur with low probability. By Minkowski’s inequality, we have
[TABLE]
Since is a fixed number, we only need to prove that for each ,
[TABLE]
Since and from the uniform boundedness of we have , then (6.3) is equivalent to
[TABLE]
This is a corollary of the following lemma.
Lemma 6.1.
Let such that . Let be a sequence of i.i.d. random variables satisfying , , . Let be a sequence of events such that , then
[TABLE]
Proof.
We only prove the case where and are real. The complex case follows easily from the real case by separating real and imaginary parts, and then using Minkowski’s inequality.
As all the random variables are identically distributed and integrable, they are uniformly integrable. Thus we have
[TABLE]
Let be a sequence of positive numbers tending to [math] such that
[TABLE]
Note that from we have
[TABLE]
We write
[TABLE]
By Minkowski’s inequality, we have
[TABLE]
For the first part, using again Minkowski’s inequality and noting that , we get
[TABLE]
For the second part, it suffices to prove that from any subsequence of we can extract a subsequence tending to [math]. From any subsequence of , there exists a convergent subsequence. So we can assume that
[TABLE]
Then if we can prove that
[TABLE]
the proof of the lemma will be complete due to the inequality
[TABLE]
To prove (6.4), by the equality for any and , we have
[TABLE]
Let , since , by Lindeberg’s Theorem we have . Let , then we have
[TABLE]
where the first convergence is from direct calculation, the second and the third are from the fact that and that the function is continuous and bounded. So we have
[TABLE]
Finally we take and see that (6.4) holds. ∎
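The Lindeberg argument used in this section concerns weighted sums of i.i.d. centered variables whose weights have no dominant coordinate. The following Monte Carlo sketch illustrates the resulting Gaussian behaviour for a clearly non-Gaussian entry distribution. The weights, the centered exponential distribution and the sample sizes are our own assumptions for illustration and are unrelated to the specific weights appearing in the proof.

```python
import numpy as np

rng = np.random.default_rng(5)
n, trials = 400, 5000

# Hypothetical weights with unit Euclidean norm and no dominant coordinate
a = rng.uniform(0.5, 1.5, size=n)
a /= np.linalg.norm(a)
max_weight = np.max(np.abs(a))

# Centered exponential entries: mean 0, variance 1, skewed
x = rng.exponential(1.0, size=(trials, n)) - 1.0
s = x @ a  # one weighted sum per trial

sample_var = s.var()
# Excess kurtosis is 0 for a Gaussian and 6 for a single centered exponential entry
excess_kurtosis = np.mean((s - s.mean()) ** 4) / sample_var**2 - 3.0
```

The sample variance is close to 1 and the excess kurtosis is close to 0, far from the value 6 of a single entry, which is consistent with the normal limit given by Lindeberg's theorem when the largest weight is small.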
Next we prove that
[TABLE]
under the conditions A4 and A9 or A10, where is defined in (5.15).
We assume that A4 and A9 hold. From the equation satisfied by , we have
[TABLE]
Now it suffices to prove that
[TABLE]
Note that with high probability, we have
[TABLE]
for some . Then the matrices
[TABLE]
are both positive semi-definite. Then we have
[TABLE]
To prove that this is , it suffices to prove that
[TABLE]
Denote the entries of by . By a simple calculation, one has
[TABLE]
Note that
[TABLE]
so the second term of the above sum is zero. Also, if we use to denote the conjugate of the vector , we have
[TABLE]
Therefore by A2 and A9, the limit (6.5) is proved.
We now assume that A10 and A4 hold. Using the formula , and the inequality
[TABLE]
we can write
[TABLE]
The calculation (6.6) also shows that
[TABLE]
To prove , it suffices to prove that
[TABLE]
Recall that and . Note that the rows of are independent, and that are uncorrelated. Let . Then by some calculation, we have
[TABLE]
Using Lemma 2.1 in [2] and Hölder’s inequality, is uniformly bounded. Also we have , so the first term of the above sum is negligible. We only need to prove that
[TABLE]
We now prove the first convergence. By simple algebra, we have
[TABLE]
By the elementary inequality , we only need to prove
[TABLE]
To prove the first convergence in (6.10), we have
[TABLE]
To prove the second convergence in (6.10), we take an arbitrarily small . Then by the assumption A3, the number of larger than is finite. So we have
[TABLE]
Thus we have proved (6.10), implying the first convergence of (6.9).
We then prove the second part of (6.9). We have
[TABLE]
where is the Kronecker symbol. Using the inequality we only need to prove
[TABLE]
and
[TABLE]
The first of (6.12) can be proved similarly to (6.11); the second of (6.12) is a consequence of A10. To prove (6.13), we have
[TABLE]
Then is proved.
To prove that , using the same argument leading to (6.5), we only need to prove that
[TABLE]
Using the same notation as in the proof of (6.8), by simple algebra, we have
[TABLE]
Then we have .
Remark 6.1.
As we have seen, the main difficulty of this proof is the convergence to [math] in probability of a quadratic form where and are not independent. In expansions such as (6.7) and those that follow, we can see that the orthogonality relation between and is crucial for the result. To the best of our knowledge at the time of submission, no method other than moment expansions achieves the same or a stronger result. The method used in [11, Theorem 2.4] could be a possible option. However, instead of the orthogonality between and , the proof in [11] relies on the clear separation between spiked and non-spiked population eigenvalues and on the faster convergence of the non-spiked eigenvalues to [math], which is not satisfied by our model.
Potentially our method also works in the case of non-diagonal , and for more general , especially for the Toeplitz matrices with parameter . Given the complexity of the proof and the length of the paper, we leave this direction to future work.
Appendix A Proof of (5.17)
We write
[TABLE]
To prove that , we use the martingale decomposition. By reassigning , we continue to denote the eigenvalues of by . Let be the th column of . Let denote and . We denote
[TABLE]
[TABLE]
and
[TABLE]
[TABLE]
[TABLE]
Lemma A.1.
Under the conditions of Theorem 2.7 with satisfying A6, for any , there exists a constant such that for large enough , we have
[TABLE]
[TABLE]
Proof.
The first two inequalities are due to the fact that with overwhelming probability the distances from to the spectra of and are bounded away from zero uniformly in (Lemma 4.1), so and are uniformly bounded with overwhelming probability; and with tiny probability, we use the general bound
[TABLE]
Therefore the expectations , are uniformly bounded.
For the third one, the proof is identical to the proof of Lemma A.3 in [1], up to some adaptations. The first steps of the proof of Lemma A.3 in [1] give
[TABLE]
Then we have
[TABLE]
To prove that is uniformly bounded for any fixed , with overwhelming probability, we can use the bound
[TABLE]
and with tiny probability we use the general bound
[TABLE]
Then we have
[TABLE]
For the fourth one, we write
[TABLE]
Note that for large enough , has a positive distance to . We now prove that with overwhelming probability, the term is small enough so that the denominator of is bounded away from [math].
Let be the eigenvalues of , with eigenvectors where . Recall that . Then
[TABLE]
Then by Lemma 4.1, with overwhelming probability, the above quantity is smaller than any fixed .
With tiny probability, the above estimation does not hold. For the estimation of expectation, we should find a new estimation of . Note that if , then ; else if we have
[TABLE]
where we estimate as in the proof of Lemma A.1 in [1]. Thus finally we have
[TABLE]
By the same arguments, one can also verify the boundedness of and . ∎
Then
[TABLE]
Then we have
[TABLE]
By Lemma A.1, the expectation is uniformly bounded; by Lemma B.1 in [1], we have, for any ,
[TABLE]
So we have
[TABLE]
For , note that
[TABLE]
then we have
[TABLE]
Using the same arguments, we get
[TABLE]
Therefore in fact .
To prove that , we consider two cases:
1. ;
2. , where is a positive number or .
By extracting subsequences we can assume that one of these two cases holds. Indeed, we want to prove that for any , the following limit holds:
[TABLE]
It suffices to prove that from any subsequence of
[TABLE]
there exists a subsequence converging to [math]. We know that from any subsequence of , one can extract a subsequence satisfying one of the above two cases.
Suppose that Condition 1 holds. Then we have
[TABLE]
One can see that
[TABLE]
is bounded, and
[TABLE]
thus . For , by (A.1) and the arguments afterwards, we have, for an arbitrary ,
[TABLE]
where are the eigenvalues of , and we have used the equality
[TABLE]
Now we suppose that Condition 2 holds. We define
[TABLE]
Then
[TABLE]
where . On the other hand, following the calculation for (3.41) of [1], we have
[TABLE]
By the definition of and the system of equations (5.7), we have
[TABLE]
Taking the difference of the last two equalities, we obtain
[TABLE]
Combining the equations (A.4) and (A.5), we obtain
[TABLE]
Next we prove that , , and that the multiplier
[TABLE]
is bounded away from [math].
Repeating the calculations in Section 3.3 of [1], one can check that and . The proof is similar so we omit the details. We just point out some adaptations due to the differences between the models. We define
[TABLE]
Because , the matrix is invertible for large enough, and is uniformly bounded. According to (3.33) of [1] and the estimations afterwards, the result can be similarly deduced. We also recall that in Section 3.3 of [1] the proof is made for Gaussian entries. Here we only assume that the entries have finite sixth moment, so according to (A.2), we have for example the following estimation of which is correspondingly defined in the equation next to (3.36) of [1]:
[TABLE]
thus by the formula next to (3.36) and the formula (3.38) of [1], we have
[TABLE]
To prove that the multiplier (A.6) is bounded from below, we recall that , , , , and , thus for any , as ,
[TABLE]
[TABLE]
The above two limits are uniform in , so we have, for any , for large enough,
[TABLE]
Therefore
[TABLE]
which is bounded from below because we are in the case where .
The proof of (5.17) is complete.
References
- [1] Z. Bai, H. Li, and G. Pan. Central limit theorem for linear spectral statistics of large dimensional separable sample covariance matrices. arXiv preprint arXiv:1611.08979, 2016.
- [2] Z. Bai and J.W. Silverstein. No eigenvalues outside the support of the limiting spectral distribution of large-dimensional sample covariance matrices. Annals of Probability, pages 316–345, 1998.
- [3] Z. Bai and J.W. Silverstein. CLT for linear spectral statistics of large-dimensional sample covariance matrices. The Annals of Probability, 32(1A):553–605, 2004.
- [4] Z. Bai, X. Wang, W. Zhou, et al. Functional CLT for sample covariance matrices. Bernoulli, 16(4):1086–1113, 2010.
- [5] Z. Bai and J. Yao. Central limit theorems for eigenvalues in a spiked population model. In Annales de l’IHP Probabilités et statistiques, volume 44, pages 447–474, 2008.
- [6] Z. Bai and J. Yao. On sample eigenvalues in a generalized spiked population model. Journal of Multivariate Analysis, 106:167–177, 2012.
- [7] J. Baik, G. Ben Arous, and S. Péché. Phase transition of the largest eigenvalue for nonnull complex sample covariance matrices. Annals of Probability, pages 1643–1697, 2005.
- [8] Z. Bao, G. Pan, and W. Zhou. Universality for the largest eigenvalue of sample covariance matrices with general population. The Annals of Statistics, 43(1):382–421, 2015.
