On spectral properties of high-dimensional spatial-sign covariance matrices in elliptical distributions with applications
Weiming Li, Wang Zhou

TL;DR
This paper studies the spectral behavior of the spatial-sign covariance matrix in high-dimensional elliptical distributions, deriving a generalized Marčenko-Pastur law and a CLT for spectral statistics, with applications to covariance matrix spectrum estimation and testing.
Contribution
It introduces a new asymptotic spectral analysis of SSCM in high dimensions, including a CLT for linear spectral statistics and explicit formulas for polynomial cases, extending robust covariance estimation methods.
Findings
Empirical spectral distribution converges to a generalized Marčenko-Pastur law.
Established a CLT for linear spectral statistics of SSCM.
Provided explicit formulas for mean and covariance in polynomial spectral statistics.
Table 1. Estimation results for Model 1: empirical mean, standard deviation (St. D.), and coverage probability (C. P.) of the 95% confidence intervals, for three increasing sample sizes.

| Mean | St. D. | C. P. | Mean | St. D. | C. P. | Mean | St. D. | C. P. |
|---|---|---|---|---|---|---|---|---|
| 0.4839 | 0.1145 | 0.9375 | 0.4960 | 0.0550 | 0.9491 | 0.5000 | 0.0269 | 0.9486 |
| 0.4915 | 0.1135 | 0.9137 | 0.4968 | 0.0588 | 0.9423 | 0.4997 | 0.0292 | 0.9488 |
| 1.5030 | 0.1330 | 0.9288 | 1.4990 | 0.0668 | 0.9426 | 1.4998 | 0.0329 | 0.9487 |
| 0.5085 | 0.1135 | 0.9137 | 0.5032 | 0.0588 | 0.9423 | 0.5003 | 0.0292 | 0.9488 |

Table 2. Estimation results for Model 2: empirical mean, standard deviation (St. D.), and coverage probability (C. P.) of the 95% confidence intervals, for three increasing sample sizes.

| Mean | St. D. | C. P. | Mean | St. D. | C. P. | Mean | St. D. | C. P. |
|---|---|---|---|---|---|---|---|---|
| 0.1887 | 0.0429 | 0.9227 | 0.1988 | 0.0147 | 0.9358 | 0.2003 | 0.0071 | 0.9367 |
| 0.2824 | 0.0447 | 0.9403 | 0.2956 | 0.0184 | 0.9525 | 0.2990 | 0.0090 | 0.9483 |
| 0.9960 | 0.1347 | 0.9345 | 0.9924 | 0.0661 | 0.9486 | 0.9991 | 0.0337 | 0.9433 |
| 0.4064 | 0.0373 | 0.9453 | 0.4012 | 0.0209 | 0.9239 | 0.4002 | 0.0110 | 0.9351 |
| 1.7824 | 0.0856 | 0.9236 | 1.7919 | 0.0440 | 0.9413 | 1.7960 | 0.0227 | 0.9392 |
| 0.3113 | 0.0696 | 0.9221 | 0.3031 | 0.0365 | 0.9429 | 0.3008 | 0.0189 | 0.9420 |


Table 3. Empirical rejection rates (in %) of the test for the order of a PSD. In each panel, the header row lists the values of the distance parameter.

Under Model 3:

| 0 | 0.02 | 0.04 | 0.06 | 0.08 | 0.10 | 0.12 | 0.14 | 0.16 | 0.18 |
|---|---|---|---|---|---|---|---|---|---|
| 5.24 | 5.81 | 9.13 | 17.91 | 34.86 | 62.30 | 87.31 | 98.01 | 99.90 | 100 |
| 5.33 | 5.92 | 8.43 | 18.09 | 35.62 | 63.12 | 88.14 | 98.69 | 99.96 | 100 |
| 4.76 | 6.39 | 9.69 | 17.39 | 35.23 | 63.57 | 88.15 | 98.67 | 99.97 | 100 |

Under Model 4:

| 0 | 0.05 | 0.10 | 0.15 | 0.20 | 0.25 | 0.30 | 0.35 | 0.40 | 0.45 |
|---|---|---|---|---|---|---|---|---|---|
| 4.75 | 7.19 | 17.49 | 43.96 | 79.28 | 97.06 | 99.87 | 100 | 100 | 100 |
| 5.05 | 6.31 | 12.22 | 26.78 | 53.74 | 80.74 | 95.07 | 99.52 | 99.97 | 100 |
| 4.88 | 5.65 | 8.56 | 16.33 | 30.09 | 49.17 | 71.60 | 86.54 | 95.20 | 98.61 |

Taxonomy
Topics: Random Matrices and Applications · Statistical Methods and Bayesian Inference · Point processes and geometric inequalities
On spectral properties of high-dimensional spatial-sign covariance matrices in elliptical distributions with applications
Weiming Li, Wang Zhou
School of Statistics and Management, Shanghai University of Finance and Economics, Guoding Road No. 777, Shanghai, 200433, China.
Department of Statistics and Applied Probability, National University of Singapore, Singapore
Abstract.
The spatial-sign covariance matrix (SSCM) is an important substitute for the sample covariance matrix (SCM) in robust statistics. This paper investigates the asymptotic spectral behavior of the SSCM under high-dimensional elliptical populations, where both the dimension of observations and the sample size tend to infinity with their ratio tending to a positive constant. The empirical spectral distribution of this nonparametric scatter matrix is shown to converge in distribution to a generalized Marčenko-Pastur law. Beyond this, a new central limit theorem (CLT) for general linear spectral statistics of the SSCM is also established. For polynomial spectral statistics, explicit formulae of the limiting mean and covariance functions in the CLT are provided. The derived results are then applied to an estimation procedure and a test procedure for the spectrum of the shape component of population covariance matrices.
Key words and phrases:
Spatial-sign, Covariance matrix, High-dimensional data, Elliptical distribution.
2010 Mathematics Subject Classification:
Primary 62H10; Secondary 62H15
Li’s work was partially supported by the National Natural Science Foundation of China, No. 11401037, and the Program of IRTSHUFE. Zhou’s work was partially supported by the MOE Tier 2 grant MOE2015-T2-2-039 (R-155-000-171-112) at the National University of Singapore.
1. Introduction
The elliptical family of distributions, originally introduced in [20], is an important extension of the multivariate normal distribution and has been broadly applied in biology, finance and economics, signal and image processing, etc. [14, 17]. A random vector with zero mean is said to be elliptically distributed if it has a stochastic representation [14]:
[TABLE]
where is a matrix with , is a scalar random variable representing the radius of , and is the random direction, independent of and uniformly distributed on the unit sphere in . Besides the normal distribution, this family includes many other celebrated distributions, such as the multivariate t-distribution, Kotz-type distributions, and Gaussian scale mixtures. In general, the radius need not be independent of the direction but can be a function of the chosen direction [35].
Let be a sequence of independent and identically distributed (i.i.d.) random vectors from the elliptical model in (1.1). Many statistical procedures for this model prefer to transform the original observations into spatial-sign samples for the purpose of robustness, which are defined as
[TABLE]
One can refer to [26] and [29] for a comprehensive review. When an inference is concerned with the shape matrix , assuming so that and can be identified in the model (1.1), one of the most important statistics is the so-called spatial-sign covariance matrix (SSCM), i.e.
[TABLE]
which is actually the sample covariance matrix (SCM) of . As a robust alternative to the SCM , this nonparametric scatter matrix is a fast-to-compute, orthogonally equivariant statistic with a high breakdown point, and is thus highly recommended in applications such as principal component analysis and structural tests for covariance matrices; see [23], [16], [39], [31], to name a few. Despite its merits, the SSCM is also a controversial statistic in “ small , large ” scenarios due to its lack of affine equivariance [27]. However, the pursuit of this property does not seem advisable in high-dimensional situations, since, as claimed in [38], any well-defined affine equivariant scatter matrix must be proportional to the SCM whenever . Therefore, it is of great interest to understand the behavior of the SSCM in high-dimensional robust statistics.
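To make the construction concrete, here is a minimal Python sketch of the SSCM computed from a sample. It assumes the normalization under which the trace of the SSCM equals the dimension (consistent with the first spectral moment being 1, as stated in Section 2.2). The function name `spatial_sign_covariance`, the diagonal scatter matrix, and the Cauchy-distributed radius are illustrative choices and are not taken from the paper.

```python
import numpy as np

def spatial_sign_covariance(X):
    """SSCM of an (n, p) sample X, scaled so that its trace equals p.

    Assumption: the observations are already centered (zero mean) and
    no row of X is exactly the zero vector.
    """
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    S = X / norms                      # spatial signs: rows on the unit sphere
    return (p / n) * S.T @ S           # (p/n) * sum_j s_j s_j^T

# Hypothetical elliptical sample: direction * scatter factor * random radius.
rng = np.random.default_rng(0)
n, p = 200, 50
A = np.diag(np.sqrt(np.linspace(0.5, 1.5, p)))     # illustrative scatter factor
Z = rng.standard_normal((n, p))
U = Z / np.linalg.norm(Z, axis=1, keepdims=True)    # uniform on the unit sphere
w = np.abs(rng.standard_cauchy(n))[:, None]         # heavy-tailed radius
X = w * (U @ A.T)

B = spatial_sign_covariance(X)
B_no_radius = spatial_sign_covariance(U @ A.T)      # same directions, radius removed
```

The two matrices `B` and `B_no_radius` agree up to rounding, which illustrates why the SSCM is insensitive to the distribution of the radius.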
In this paper, using tools of random matrix theory, we investigate asymptotic spectral behaviors of the SSCM in high-dimensional frameworks where both the dimension and the sample size tend to infinity with their ratio , a positive constant in . Specifically, let be the eigenvalues of , then the empirical spectral distribution (ESD) of is by definition
[TABLE]
where denotes the Dirac mass at . Our aim is to study the limiting properties of and the central limit theorem (CLT) for linear spectral statistics (LSS) of the form for a class of smooth test functions . These properties may become powerful tools to recover spectral features of the population SSCM, i.e. , and then those of the shape matrix since the matrices and share the same eigenvectors and their eigenvalues have a one-to-one correspondence [9]. Moreover, as , the two matrices coincide in the sense that the spectral norm , as long as (or ) is uniformly bounded, see Lemma 4.1.
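As a small illustration of the objects just introduced, the following Python sketch evaluates a linear spectral statistic, that is, the integral of a test function against the empirical spectral distribution of a symmetric matrix. The function names `esd_moments` and `linear_spectral_statistic` and the toy diagonal matrix are assumptions made for the example only.

```python
import numpy as np

def linear_spectral_statistic(B, f):
    """Integral of f against the ESD of the symmetric matrix B,
    i.e. the average of f over the eigenvalues of B."""
    eigenvalues = np.linalg.eigvalsh(B)
    return float(np.mean(f(eigenvalues)))

def esd_moments(B, orders=(1, 2, 3)):
    """Moments of the ESD of B: the polynomial case f(x) = x**k."""
    eigenvalues = np.linalg.eigvalsh(B)
    return {k: float(np.mean(eigenvalues ** k)) for k in orders}

# Toy example with known eigenvalues 1, 2, 3.
B_toy = np.diag([1.0, 2.0, 3.0])
moments = esd_moments(B_toy)
log_stat = linear_spectral_statistic(B_toy, np.log)
```

Polynomial test functions are singled out because, later in the paper, they are the case in which the limiting mean and covariance of the CLT are available in explicit form.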
Spectral properties of high-dimensional SCM have been extensively studied in random matrix theory since the pioneering work of [25]. The standard model in the literature has the form
[TABLE]
where is as before, is a constant, and is a set of i.i.d. random variables satisfying E, E, and E. Let be i.i.d. copies of and be the corresponding SCM. It is known that the ESD of converges to the celebrated Marčenko-Pastur (MP) law when , and to a generalized MP law for general matrix , as with . One can refer to [25] and [36]. The CLT for LSS of was first studied in [19] by assuming the population to be standard multivariate normal. One breakthrough on the CLT was obtained in [3], where the population is allowed to be general with E. This fourth moment condition was then weakened to E in [30]. For more references, one can refer to [4], [2], [15], and references therein. However, these results do not apply to general elliptical populations since the two underlying models in (1.1) and (1.2) have little in common, except for normal distributions. In fact, for general elliptical populations, it has been reported that the ESD of the SCM converges to a deterministic distribution that is not a generalized MP law, but has to be characterized by both the distribution of and the limiting spectrum of through a system of implicit equations [11, 24]. The involvement of seriously interferes with our understanding of the spectrum of from the ESD of . This again motivates us to shift our attention to the SSCM, which discards the random radii and focuses only on the directions .
The main contributions of this paper are as follows. First in Section 2, asymptotic results on the eigenvalues of are derived, including the limit of the ESD and a new CLT for LSS of . As a corollary, polynomial spectral statistics are fully addressed with explicit limiting mean and covariance functions in the CLT. Then in Section 3, relying on these results, we develop two statistical applications on the spectrum of , the population SSCM, under a setting that the spectrum forms a discrete distribution with finite support. One is to estimate the spectrum of through moment methods and the other is to test the hypothesis that there are no more than distinct eigenvalues of . Technical proofs of the main theorems are gathered in Section 4. Some lemmas and their necessary proofs are postponed to the last section.
2. High-dimensional theory for eigenvalues of the SSCM
2.1. Limiting spectral distribution of the SSCM
We consider here the limit of the ESD sequence in high-dimensional regimes, namely limiting spectral distribution (LSD). Our main assumptions are listed below.
Assumption (a). Both the sample size and population dimension tend to infinity in such a way that .
Assumption (b). Sample observations are , where is a matrix with and are i.i.d. random vectors, uniformly distributed on the unit sphere in .
Assumption (c). The spectral norm of is bounded and its spectral distribution converges weakly to a probability distribution , called population spectral distribution (PSD).
From Lemma 4.1, it is clear that the spectral distributions of and are asymptotically identical. So one can certainly replace with in Assumption (c), which does not affect the LSD of . However we keep because it is easy to describe the CLT for LSS using the spectral distribution of .
For the characterization of the LSD of , we need to introduce the Stieltjes transform of a measure on the real line, which is defined as
[TABLE]
where denotes the support of .
Theorem 2.1.
Suppose that Assumptions (a)-(c) hold. Then, almost surely, the empirical spectral distribution converges weakly to a probability distribution , whose Stieltjes transform is the unique solution to the equation
[TABLE]
in the set where .
The LSD defined in (2.1) agrees with that in [25]. Let denote the Stieltjes transform of . Then (2.1) can also be represented as
[TABLE]
See [36]. For procedures on finding the density function and the support set of from (2.1) and (2.2), one is referred to [4].
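For a discrete PSD, the Stieltjes transform of the limiting distribution can be evaluated numerically. The sketch below assumes that the companion equation takes the standard Silverstein form z = -1/m + c ∫ t/(1 + t m) dH(t), which is the usual way a generalized Marčenko-Pastur law is written; the display equations of this section are not reproduced here, so this form is an assumption. The function names and the fixed-point scheme are illustrative and are not the procedure of [4].

```python
import numpy as np

def companion_stieltjes(z, c, atoms, weights, tol=1e-12, max_iter=10_000):
    """Solve z = -1/m + c * sum_k w_k t_k / (1 + t_k m) for m, with Im z > 0.

    Uses the fixed-point iteration m <- -1 / (z - c * sum_k w_k t_k / (1 + t_k m)),
    which keeps the iterates in the upper half-plane.
    """
    atoms = np.asarray(atoms, dtype=float)
    weights = np.asarray(weights, dtype=float)
    m = -1.0 / z                                   # starting value with Im m > 0
    for _ in range(max_iter):
        m_new = -1.0 / (z - c * np.sum(weights * atoms / (1.0 + atoms * m)))
        if abs(m_new - m) < tol:
            return m_new
        m = m_new
    raise RuntimeError("fixed-point iteration did not converge")

def stieltjes_from_companion(z, c, m_companion):
    """Recover m from the companion transform: m_companion = -(1 - c)/z + c * m."""
    return (m_companion + (1.0 - c) / z) / c

def companion_residual(z, c, atoms, weights, m):
    """Residual of the assumed equation, for checking a candidate solution."""
    atoms = np.asarray(atoms, dtype=float)
    weights = np.asarray(weights, dtype=float)
    return z - (-1.0 / m + c * np.sum(weights * atoms / (1.0 + atoms * m)))

z, c = 1.0 + 1.0j, 0.5
m_identity = companion_stieltjes(z, c, atoms=[1.0], weights=[1.0])
m_two_point = companion_stieltjes(z, c, atoms=[0.5, 1.5], weights=[0.5, 0.5])
```

When the PSD is a point mass at 1, the recovered transform satisfies the classical Marčenko-Pastur equation m = 1/(1 - c - z - c z m), which gives a quick sanity check of the solver.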
2.2. CLT for linear spectral statistics of the SSCM
Let be the LSD as defined in (2.2) with the parameters replaced by . Writing , we next study the fluctuation of
[TABLE]
which is a centralized linear spectral statistic with analytic .
Theorem 2.2.
Suppose that Assumptions (a)-(c) hold. Let be functions analytic on an open interval containing
[TABLE]
Then the random vector
[TABLE]
converges weakly to a Gaussian vector , whose mean function is
[TABLE]
and covariance function is
[TABLE]
, where the contours and are non-overlapping, closed, and counter-clockwise oriented in the complex plane, and each encloses the support of the LSD .
When the underlying population is multivariate normal, the elliptical model in (1.1) and the linear transformation model in (1.2) hold simultaneously. In this case, it is interesting to compare the limiting distribution in Theorem 2.2 based on SSCM with the classical result in [3] based on SCM. It turns out that there are some additional terms in our new CLT: the second contour integral in the mean function and the second to fourth summands in the covariance function.
Among all LSS, polynomial spectral statistics are of fundamental importance. The bases of these statistics are moments of ESD , i.e.
[TABLE]
The first order moment is 1 since . Other moments , , are random. Their limiting behavior can be described through the following two quantities
[TABLE]
as well as their limits, denoted by and , respectively. From [28], the quantities and are connected through the recursive formulae:
[TABLE]
and , where the sum runs over the following partitions of :
[TABLE]
and The joint limiting distribution of moments can be derived from Theorem 2.2 by taking functions . For this particular case, the mean and covariance functions in the limiting distribution can be explicitly formulated.
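The link between the moments of the PSD and those of the limiting distribution can be sketched in code. The block below assumes the standard moment relation for generalized Marčenko-Pastur laws, in which the k-th limiting moment is a sum over multiplicity vectors (i_1, ..., i_k) with i_1 + 2 i_2 + ... + k i_k = k of c^(i_1+...+i_k-1) times the product of PSD moments raised to the powers i_l times k! / (i_1! ... i_k! (k + 1 - i_1 - ... - i_k)!). This is written from the general theory and is only assumed to match the paper's recursive formulae; the names `multiplicity_vectors` and `lsd_moment` are hypothetical.

```python
from math import factorial, prod

def multiplicity_vectors(k):
    """Yield all (i_1, ..., i_k) of nonnegative integers with sum_l l * i_l = k."""
    def rec(j, remaining):
        if j == 0:
            if remaining == 0:
                yield ()
            return
        for i in range(remaining // j + 1):
            for rest in rec(j - 1, remaining - i * j):
                yield rest + (i,)
    yield from rec(k, k)

def lsd_moment(k, c, gamma):
    """k-th moment of the limiting distribution from PSD moments.

    gamma[l - 1] is the l-th moment of the PSD, for l = 1, ..., k.
    """
    total = 0.0
    for idx in multiplicity_vectors(k):
        s = sum(idx)
        coef = factorial(k) / (prod(factorial(i) for i in idx) * factorial(k + 1 - s))
        term = c ** (s - 1) * coef
        for l, i in enumerate(idx, start=1):
            term *= gamma[l - 1] ** i
        total += term
    return total

# With all PSD moments equal to 1 (point mass at 1) the classical MP moments appear.
c = 0.5
beta = [lsd_moment(k, c, [1.0] * k) for k in (1, 2, 3, 4)]
```

For the point mass at 1, the output reproduces the classical Marčenko-Pastur moments 1, 1 + c, 1 + 3c + c^2, and 1 + 6c + 6c^2 + c^3.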
Corollary 2.1.
Suppose that Assumptions (a)-(c) hold. Then the random vector
[TABLE]
The mean vector satisfies
[TABLE]
where , , and denotes the th derivative of with respect to . The covariance matrix has entries
[TABLE]
where .
3. Applications to spectral inference
Inference on the PSD is fundamentally important in many high-dimensional statistical analyses, such as principal component analysis [18, 8, 40], factor models [12, 13], and covariance matrix estimation [21].
In this section, we illustrate two statistical applications of the theoretical results developed in Section 2: one is estimating a PSD and the other is testing the order of a PSD. The family of PSDs under study is a class of parameterized discrete distributions with finite support on , that is,
[TABLE]
where Here the restriction is due to the fact that . For the model (3.1), the order of refers to the cardinality of its support, which is equal to . This model for PSDs can be viewed as the spectral structure of noise covariance matrices in factor models [12], and extensions of the spiked model [18] which allows the number of leading eigenvalues to grow with the dimension . More discussions on this model can be found in [10], [34], [1], [22], etc. Similar to [10], we adopt the setting of fixed PSDs in this section, i.e. for all large.
3.1. Estimation of a PSD
For the model in (3.1), [1] introduced a moment method for the PSD estimation. By assuming the order to be known, their method first estimates the moments of through the recursive formulae in (2.3), and then solves a system of moment equations to get a consistent estimator of .
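The moment method can be sketched for a PSD of order 2. The block below is a plug-in illustration only: it assumes the first PSD moment equals 1, inverts the low-order Marčenko-Pastur moment relations (second limiting moment = second PSD moment + c; third limiting moment = third PSD moment + 3c times the second PSD moment + c^2), and then solves the two-point moment problem in closed form. It omits the bias correction developed in this section, and the function names and the numerical values (atoms 0.5 and 1.5 with equal weights, c = 0.5) are hypothetical.

```python
import numpy as np

def psd_moments_from_lsd(beta2, beta3, c):
    """Invert the moment relations, assuming the first PSD moment is 1."""
    gamma2 = beta2 - c
    gamma3 = beta3 - 3.0 * c * gamma2 - c ** 2
    return gamma2, gamma3

def two_point_psd(gamma2, gamma3):
    """Atoms a1 < a2 and weight w1 of a two-point distribution with mean 1.

    The atoms are the roots of x^2 + b x + d, the orthogonal polynomial
    determined by the moments (1, 1, gamma2, gamma3); requires gamma2 > 1.
    """
    b = (gamma2 - gamma3) / (gamma2 - 1.0)
    d = -gamma2 - b
    a1, a2 = np.sort(np.roots([1.0, b, d]).real)
    w1 = (a2 - 1.0) / (a2 - a1)        # from w1 * a1 + (1 - w1) * a2 = 1
    return float(a1), float(a2), float(w1)

# Illustrative check: start from a known two-point PSD and go back and forth.
c = 0.5
gamma2_true, gamma3_true = 1.25, 1.75              # atoms 0.5, 1.5, weights 0.5
beta2 = gamma2_true + c
beta3 = gamma3_true + 3.0 * c * gamma2_true + c ** 2
gamma2_hat, gamma3_hat = psd_moments_from_lsd(beta2, beta3, c)
a1_hat, a2_hat, w1_hat = two_point_psd(gamma2_hat, gamma3_hat)
```

In practice the limiting moments would be replaced by the sample moments of the SSCM eigenvalues, which is where the finite-sample bias discussed below enters.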
In our situation, with notation and for , we denote
[TABLE]
as the mappings between the corresponding vectors. These two mappings are both one-to-one and the determinants of their Jacobian matrices are all nonzero. See [1]. Therefore, applying Theorem 2.1, which is followed by , as . However, as shown by the CLT in Corollary 2.1, the estimator is biased by the order of . So it’s natural to modify by subtracting its limiting mean in the CLT to obtain a better estimator of . Beyond this correction, the CLT can also provide confidence regions for the parameter .
Denote the modified estimators of , , and by
[TABLE]
respectively, where with defined in Corollary 2.1 for From Theorem 2.1, Corollary 2.1, and a standard application of the Delta method, one may easily get asymptotic properties of these estimators.
Theorem 3.1.
Suppose that Assumptions (a)-(c) hold and the true value is an inner point of . Then we have , , , and moreover
[TABLE]
where and represent the Jacobian matrices and , respectively, and is defined in Corollary 2.1 with .
3.2. Test for the order of a PSD
The aforementioned estimation procedure requires that the order of the PSD be pre-specified. In general, this prior knowledge should be tested in advance. To deal with this problem, we consider the hypotheses
[TABLE]
where is a known constant. These hypotheses can also be regarded as a generalization of the well-known sphericity hypotheses on covariance matrices, i.e. the case .
In [32], a test procedure was outlined based on a moment matrix and its estimator which can be formulated as
[TABLE]
Here we set and , as defined in (3.2), for . It has been proved that the determinant of is zero if the null hypothesis in (3.4) holds, otherwise is strictly positive [22]. Therefore, the determinant can serve as a test statistic for (3.4) and the null hypothesis shall be rejected if the statistic is significantly greater than zero. Applying Theorem 3.1 and the main theorem in [32], the asymptotic distribution of is obtained immediately.
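The determinant criterion can be illustrated numerically. The sketch below assumes that the moment matrix is the (k+1) x (k+1) Hankel matrix whose (i, j) entry is the PSD moment of order i + j (indices starting at 0); this form is an assumption, since the display defining the matrix is not reproduced here. It relies on the classical fact that such a Hankel determinant vanishes when the distribution has at most k support points and is strictly positive otherwise. The function name and the two-point example are hypothetical.

```python
import numpy as np

def hankel_moment_det(gamma):
    """Determinant of the Hankel matrix built from (gamma_0, ..., gamma_{2k}).

    The matrix has size (k + 1) x (k + 1) with (i, j) entry gamma_{i + j}.
    """
    gamma = np.asarray(gamma, dtype=float)
    k = (len(gamma) - 1) // 2
    G = np.array([[gamma[i + j] for j in range(k + 1)] for i in range(k + 1)])
    return float(np.linalg.det(G))

# Two-point distribution: atoms 0.5 and 1.5 with equal weights.
atoms = np.array([0.5, 1.5])
weights = np.array([0.5, 0.5])
gamma = [float(np.sum(weights * atoms ** r)) for r in range(5)]   # orders 0..4

det_k1 = hankel_moment_det(gamma[:3])   # tests "order <= 1": positive here
det_k2 = hankel_moment_det(gamma)       # tests "order <= 2": zero here
```

With k = 1 and first moment 1, the determinant reduces to the second moment minus 1, which matches the form of the sphericity statistic mentioned later in this section.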
Theorem 3.2.
Suppose that Assumptions (a)-(c) hold. Then the statistic is asymptotically normal, i.e.
[TABLE]
where with , the vectorization of the adjugate matrix of . The first two rows and columns of the matrix consist of zero and the remaining submatrix is defined in (3.3). The matrix is a 0-1 matrix with only , , , where denotes the greatest integer not exceeding .
From Theorem 3.1, the limiting variance in (3.5) is a continuous function of . Under the null hypothesis, however, this variance is a function of , denoted by . Let . Then it is a strongly consistent estimator of .
Corollary 3.1.
Suppose that Assumptions (a)-(c) hold. Then, under the null hypothesis,
[TABLE]
as . In addition, the asymptotic power of tends to 1.
Corollary 3.1 follows directly from Theorem 3.2 and its proof is thus omitted. This corollary includes as a particular case the sphericity test. For this case, the test statistic reduces to and its null distribution is consistent with that in [31].
3.3. Simulation experiments
Simulations are carried out to evaluate the performance of the proposed estimation and test procedures for discrete PSDs in (3.1). Samples of are drawn from and all statistics are calculated from 10,000 independent replications.
The estimation procedure is conducted for two PSDs, Models 1 and 2: Model 1 is of order 2 with the dimension to sample size ratio and Model 2 is of order 3 with the ratio .
- Model 1: and .
- Model 2: and .
The sample size is for Model 1 and for Model 2, respectively. In addition to empirical means and standard deviations of all estimators, we also calculate 95% confidence intervals for all parameters and report their coverage probabilities. Results are collected in Tables 1 and 2, which clearly demonstrate the consistency of all estimators as the sample size becomes large.
Next we examine the test for the order of a PSD. Two models are employed for this experiment:
- Model 3: ,
- Model 4: ,
where the parameter represents the distance between the null and alternative hypotheses. In particular, Model 3 is used for testing (sphericity test) with ranging from 0 to 0.18 by a step of 0.02 and Model 4 is for testing with ranging from 0 to 0.45 by a step of 0.05. The sample size is taken as , the dimension-sample size ratio is , and the significance level is fixed at . Results summarized in Table 3 show that the proposed test has accurate empirical size and its power tends to 1 as the parameter increases under the two models. Unlike in the sphericity test, the power for Model 4 declines significantly as the ratio increases. This phenomenon is consistent with that based on the SCM depicted in [32].
4. Proofs
4.1. Some key lemmas
We present three lemmas which form the core basis for the proofs of Theorems 2.1 and 2.2.
Lemma 4.1.
Let where is a diagonal matrix with the spectral norm bounded. Write , . Then we have for ,
[TABLE]
Proof.
As the three expectations can be evaluated in a similar way, we only present the details for the second one as an illustration. Replacing the denominator of the quantity inside the expectation by and taking the difference yields
[TABLE]
where
[TABLE]
Taking expectations of and , we get
[TABLE]
which combined with (4.1) gives
[TABLE]
∎
Lemma 4.2.
Let where is as defined in Lemma 4.1 such that . For any complex matrices and with bounded spectral norms,
[TABLE]
where .
Proof.
By symmetry, for . Write and , we thus get
[TABLE]
From Lemma 4.1, we have
[TABLE]
From the above quantities and (4.2), we obtain
[TABLE]
On the other hand, from the first conclusion of Lemma 4.1, one may derive that
[TABLE]
for any matrix with bounded spectral norm, which implies
[TABLE]
Therefore,
[TABLE]
Finally, from (4.3), we may replace with and replace with in the above expression and then obtain the result of the Lemma. ∎
Let be arbitrary, any number greater than , and any negative number if , otherwise choose . Define a contour as
[TABLE]
Let and be the Stieltjes transforms of and . Our next aim is to study the fluctuation of the random process
[TABLE]
For this, we define a truncated version of as
[TABLE]
where and the sequence decreasing to zero satisfying for some .
Lemma 4.3.
Under Assumptions (a)-(c), the random process converges weakly to a two-dimensional Gaussian process satisfying for ,
[TABLE]
and covariance function
[TABLE]
Proof.
Split into two parts, , where
[TABLE]
Following the strategy in [3], we prove the convergence of in three steps:
- Step 1: Finite dimensional convergence of in distribution;
- Step 2: Tightness of on ;
- Step 3: Convergence of .
Without loss of generality, we assume for all . Constants appearing in inequalities will be denoted by which may take different values from one expression to the next.
Step 1: Finite dimensional convergence of in distribution. We show in this part, for any complex numbers , the random vector
[TABLE]
converges in distribution to a Gaussian vector. We begin with introducing some notation which will be frequently used in the sequel.
[TABLE]
Note that, for any , the last three quantities are bounded in absolute value by .
Let denote expectation and denote conditional expectation with respect to the -field generated by . From the martingale decomposition and the identity
[TABLE]
we have
[TABLE]
Writing , we have
[TABLE]
Note that
[TABLE]
which is from Lemma 5.1. Similarly, . Thus we get
[TABLE]
which implies that we need only to consider the limiting distribution of
[TABLE]
in finite dimensional situations. For any ,
[TABLE]
which tends to zero according to Lemma 5.1 and thus verifies the Lyapunov condition. Therefore, from the martingale CLT (Lemma 5.4), the random vector in (4.8) will tend to a Gaussian vector with covariance function
[TABLE]
provided this limit exists. By the same arguments as on page 571 of [3], it is sufficient to show that
[TABLE]
converges in probability. Since
[TABLE]
where the last inequality is from
[TABLE]
for any matrix , see Lemma 2.6 in [37]. Moreover, from the definition of and discussions in Page 439 in [5], we also have
[TABLE]
It is hence sufficient to study the convergence of
[TABLE]
whose second mixed partial derivative yields the limit of (4.11). From Lemma 4.2, we know that
[TABLE]
where
[TABLE]
Now we consider the limit of . Let
[TABLE]
Note that
[TABLE]
From the equality , we get
[TABLE]
where
[TABLE]
For any matrix , let denote a non-random upper bound for the spectral norm of . From Lemma 5.1, (4.14), and (4.18), we get
[TABLE]
where the matrix in the first two inequalities is assumed nonrandom.
Using the equality (4.9) we write
[TABLE]
where
[TABLE]
From (4.14) and (4.18) we get and . Using Lemma 5.1 we have, for ,
[TABLE]
and by (4.14),
[TABLE]
These imply that
[TABLE]
Therefore, from (4.19)-(4.24),
[TABLE]
where . From this and applying (4.19)-(4.24) again, we get
[TABLE]
where .
From (4.15) and (4.25), we obtain that
[TABLE]
Here . Letting
[TABLE]
we get
[TABLE]
where
[TABLE]
Elementary calculations reveal that
[TABLE]
Now we derive the limits of , , and their second mixed partial derivatives. From (4.15), (4.19)-(4.22), it’s easy to show that
[TABLE]
where and . We thus get
[TABLE]
Their corresponding derivatives are
[TABLE]
respectively.
Collecting results in (4.17), (4.27)-(4.30), we finally get the covariance function in the lemma.
Step 2: Tightness of . From the arguments in [3], the tightness of can be established by verifying the moment condition:
[TABLE]
We first claim that moments of , and are all bounded in and . Taking for example, it’s clear that for . For , applying Lemma 5.5 with suitably large ,
[TABLE]
where the two constants and satisfy and . Therefore for any positive , we may assume that
[TABLE]
Using the above argument, we can extend the inequality in Lemma 5.1 to
[TABLE]
where the matrices are independent of and
[TABLE]
for some positive , where is or with some ’s removed. In applications of (4.33), can be a product of factors of or or similar terms. It’s easy to verify that these terms satisfy (4.34), see pages 579 and 580 in [3] for details.
Let
[TABLE]
We first handle moments of . By a similar decomposition in (4.10), we may get
[TABLE]
Applying Lemma 5.3 and the Hölder inequality to the above expression we then get, for even ,
[TABLE]
where the last inequality uses the boundedness of and . From (4.33) and (4.35), we get
[TABLE]
for even.
Next we show that is bounded for all . By the equality and the boundedness of and , we have
[TABLE]
and thus, for all large enough,
[TABLE]
Now we prove (4.31). From the martingale decomposition and (4.9), we have
[TABLE]
It is then enough to show , , and are all bounded. The arguments for the boundedness are all similar to those in pages 582 and 583 in [3], and hence we only present the details for for illustration.
Replacing in with we may obtain where
[TABLE]
From (4.33), (4.34), and (4.37),
[TABLE]
Using (4.33), (4.34),(4.36), and (4.37),
[TABLE]
Similarly, we may get . Hence the tightness of is obtained.
Step 3: Convergence of . To finish the proof, it is enough to show that the sequence of is bounded and equicontinuous, and converges to the mean function of the lemma for . The boundedness and equicontinuity can be verified following the arguments on pages 592 and 593 of [3], and thus we only focus on the convergence of .
We first list some results that will be used in the sequel:
[TABLE]
where is any nonrandom matrix. These results can be verified step by step following similar discussions in [3] and we omit the details.
Writing , we decompose as
[TABLE]
Notice that
[TABLE]
We have
[TABLE]
where the second equality uses the convergence in (4.38).
Our next task is to study the limits of and . For simplicity, we suppress the expression when it serves as the argument of a function in the sequel. All expressions and convergence statements hold uniformly for .
We first simplify the expression of . Using the identity , we have
[TABLE]
From (4.9) and ,
[TABLE]
where . From this and (4.42), we get
[TABLE]
Plugging into the first term in the above equation, we obtain
[TABLE]
Note that, from (4.33), (4.36), and (4.39),
[TABLE]
We thus arrive at
[TABLE]
On the other hand, by the identity , we have
[TABLE]
which implies . From this, together with , (4.33), we get
[TABLE]
Applying Lemma 4.2 to the simplified and , and then replacing with in the derived results yield
[TABLE]
To study the limits of and , we compare the difference between and . Similar to (4.19)-(4.22), we have
[TABLE]
where and, for any matrix ,
[TABLE]
Moreover, for nonrandom with bounded norm,
[TABLE]
Similar to (4.23), we write
[TABLE]
where , , and Using (4.32), (4.33), and (4.39), we get
[TABLE]
[TABLE]
From (4.15), (4.45)-(4.51) we get
[TABLE]
Combining the above results with (4.43) and (4.44), we obtain
[TABLE]
Therefore we get
[TABLE]
as . Using the identity
[TABLE]
we finally obtain the mean function of the lemma.
∎
4.2. Proof of Theorem 2.1
Following Theorem 1.1 in [5], it is sufficient to show that, for any bounded sequence of symmetric matrices ,
[TABLE]
Write where . Since the eigenvalues of the SSCM are invariant under orthogonal transformation, it’s enough to consider the diagonal matrix . Therefore, by taking in Lemma 4.2, one can verify the condition (4.53).
4.3. Proof of Theorem 2.2
For any distribution function and function analytic on a simply connected domain containing the support of , it holds that
[TABLE]
where denotes the Stieltjes transform of and is a simple, closed, and positively oriented contour enclosing the support of . Similar to (4.4), we choose , , and such that are all analytic on and inside the contour . We denote by a common upper bound of these functions on . Therefore, almost surely, for all large, satisfy the equation in (4.54) with and moreover,
[TABLE]
which converges to zero as . Since
[TABLE]
is a continuous mapping of into , it follows from Lemma 4.3 that the above random vector converges to a multivariate Gaussian vector whose mean and covariance functions are
[TABLE]
where and are two non-overlapping analogues of the contour .
From the following two identities
[TABLE]
we obtain the form of the limiting covariance function in the theorem.
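The contour representation used at the start of this proof can be checked numerically. The sketch below assumes the Stieltjes transform convention m_G(z) = ∫ 1/(x - z) dG(x), under which the integral of f against G equals -(1/(2πi)) times the counter-clockwise contour integral of f(z) m_G(z). The discrete measure, the circular contour, and the function name `lss_via_contour` are illustrative assumptions; the trapezoidal rule on a circle is used because it converges very quickly for analytic integrands.

```python
import numpy as np

def lss_via_contour(f, atoms, weights, center, radius, n_nodes=400):
    """Integral of f against a discrete measure via a Cauchy-type contour integral.

    The contour is a counter-clockwise circle that must enclose all atoms,
    and f must be analytic on and inside it.
    """
    atoms = np.asarray(atoms, dtype=float)
    weights = np.asarray(weights, dtype=float)
    theta = 2.0 * np.pi * np.arange(n_nodes) / n_nodes
    z = center + radius * np.exp(1j * theta)
    dz = 1j * radius * np.exp(1j * theta) * (2.0 * np.pi / n_nodes)
    # Stieltjes transform of the discrete measure at every node of the contour.
    m = np.sum(weights[None, :] / (atoms[None, :] - z[:, None]), axis=1)
    integral = np.sum(f(z) * m * dz)
    return float((-integral / (2j * np.pi)).real)

atoms, weights = [0.5, 1.5], [0.5, 0.5]
second_moment = lss_via_contour(lambda z: z ** 2, atoms, weights, center=1.0, radius=1.5)
exp_stat = lss_via_contour(np.exp, atoms, weights, center=1.0, radius=1.5)
```

Replacing the discrete measure by an empirical spectral distribution gives the linear spectral statistic itself, which is how fluctuations of the Stieltjes transform process translate into fluctuations of the statistic.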
4.4. Proof of Corollary 2.1
Choose a contour for the integrals such that where is the support of . Let denote the image of under . Then is a simple and closed contour having clockwise direction and enclosing zero [33].
By the identity in (2.2), the integral in the mean function of Theorem 2.2 becomes
[TABLE]
From this and the Cauchy integral theorem, we get the mean function. The covariance function can be obtained following the proof of Theorem 1 in [33].
5. Appendix
Lemma 5.1.
For any complex matrix and with and ,
[TABLE]
where is a positive constant depending only on .
Proof.
This lemma follows from Lemma 2.2 in [3] and similar arguments in the proof of Lemma 5 in [15]. ∎
Lemma 5.2 ([7]).
Let be a complex martingale difference sequence with respect to the increasing -field . Then, for ,
[TABLE]
Lemma 5.3 ([7]).
Let be a complex martingale difference sequence with respect to the increasing -field . Then, for ,
[TABLE]
Lemma 5.4 (Theorem 35.12 of [6]).
Suppose for each is a real martingale difference sequence with respect to the increasing -field having second moments. If for each ,
[TABLE]
as , where is a positive constant, then
[TABLE]
Lemma 5.5.
Suppose that Assumptions (a)-(c) hold. Then, for any positive,
[TABLE]
whenever . If then,
[TABLE]
whenever .
Proof.
Let where and , Also let . From [3], the conclusions of this lemma hold when are replaced with . Choose and satisfying
[TABLE]
where . From Lemma 4.1, we have
[TABLE]
Using inequalities
[TABLE]
we may get
[TABLE]
where the last equality is from the Chebyshev inequality and the fact . Similarly,
∎
References
- [1] Bai, Z. D., Chen, J. Q., and Yao, J. F. (2010). On estimation of the population spectral distribution from a high-dimensional sample covariance matrix. Aust. N. Z. J. Stat., 52, 423–437.
- [2] Bai, Z. D., Hu, J., Pan, G. M., and Zhou, W. (2015). Convergence of the empirical spectral distribution function of Beta matrices. Bernoulli, 21, 1538–1574.
- [3] Bai, Z. D. and Silverstein, J. W. (2004). CLT for linear spectral statistics of large-dimensional sample covariance matrices. Ann. Probab., 32, 553–605.
- [4] Bai, Z. D. and Silverstein, J. W. (2010). Spectral analysis of large dimensional random matrices, 2nd ed., Springer, New York.
- [5] Bai, Z. D. and Zhou, W. (2008). Large sample covariance matrices without independence structures in columns. Statist. Sinica, 18, 425–442.
- [6] Billingsley, P. (1995). Probability and Measure, 3rd ed., Wiley, New York.
- [7] Burkholder, D. L. (1973). Distribution function inequalities for martingales. Ann. Probab., 1, 19–42.
- [8] Cai, T., Ma, Z. M., and Wu, Y. H. (2013). Sparse PCA: Optimal rates and adaptive estimation. Ann. Statist., 41, 3074–3110.
