On spectral properties of high-dimensional spatial-sign covariance matrices in elliptical distributions with applications

Weiming Li; Wang Zhou

arXiv:1705.06427·math.ST·May 19, 2017

TL;DR

This paper studies the spectral behavior of the spatial-sign covariance matrix in high-dimensional elliptical distributions, deriving a generalized Marčenko-Pastur law and a CLT for spectral statistics, with applications to covariance matrix spectrum estimation and testing.

Contribution

It introduces a new asymptotic spectral analysis of SSCM in high dimensions, including a CLT for linear spectral statistics and explicit formulas for polynomial cases, extending robust covariance estimation methods.

Findings

01. Empirical spectral distribution converges to a generalized Marčenko-Pastur law.

02. Established a CLT for linear spectral statistics of SSCM.

03. Provided explicit formulas for mean and covariance in polynomial spectral statistics.

Abstract

The spatial-sign covariance matrix (SSCM) is an important substitute for the sample covariance matrix (SCM) in robust statistics. This paper investigates the asymptotic spectral behavior of the SSCM under high-dimensional elliptical populations, where both the dimension $p$ of observations and the sample size $n$ tend to infinity with their ratio $p / n \to c \in (0, \infty)$ . The empirical spectral distribution of this nonparametric scatter matrix is shown to converge in distribution to a generalized Marčenko-Pastur law. Beyond this, a new central limit theorem (CLT) for general linear spectral statistics of the SSCM is also established. For polynomial spectral statistics, explicit formulae for the limiting mean and covariance functions in the CLT are provided. The derived results are then applied to an estimation procedure and a test procedure for the spectrum of the shape component of…

Tables (3)

Table 1. Estimation for Model 1 with sample size $n=100,200,400$ and $c=2$ . The number of independent replications is 10,000 and the nominal coverage probability (C. P.) is fixed at $95\%$ .

$θ$	$n = 100$			$n = 200$			$n = 400$
	Mean	St. D.	C. P.	Mean	St. D.	C. P.	Mean	St. D.	C. P.
$a_{1} = 0.5$	0.4839	0.1145	0.9375	0.4960	0.0550	0.9491	0.5000	0.0269	0.9486
$w_{1} = 0.5$	0.4915	0.1135	0.9137	0.4968	0.0588	0.9423	0.4997	0.0292	0.9488
$a_{2} = 1.5$	1.5030	0.1330	0.9288	1.4990	0.0668	0.9426	1.4998	0.0329	0.9487
$w_{2} = 0.5$	0.5085	0.1135	0.9137	0.5032	0.0588	0.9423	0.5003	0.0292	0.9488

Table 2. Estimation for Model 2 with sample size $n=400,800,1600$ and $c=1/4$ . The number of independent replications is 10,000 and the nominal coverage probability (C. P.) is fixed at $95\%$ .

$θ$	$n = 400$			$n = 800$			$n = 1600$
	Mean	St. D.	C. P.	Mean	St. D.	C. P.	Mean	St. D.	C. P.
$a_{1} = 0.2$	0.1887	0.0429	0.9227	0.1988	0.0147	0.9358	0.2003	0.0071	0.9367
$w_{1} = 0.3$	0.2824	0.0447	0.9403	0.2956	0.0184	0.9525	0.2990	0.0090	0.9483
$a_{2} = 1.0$	0.9960	0.1347	0.9345	0.9924	0.0661	0.9486	0.9991	0.0337	0.9433
$w_{2} = 0.4$	0.4064	0.0373	0.9453	0.4012	0.0209	0.9239	0.4002	0.0110	0.9351
$a_{3} = 1.8$	1.7824	0.0856	0.9236	1.7919	0.0440	0.9413	1.7960	0.0227	0.9392
$w_{3} = 0.3$	0.3113	0.0696	0.9221	0.3031	0.0365	0.9429	0.3008	0.0189	0.9420

Table 3. Empirical size and power of $T_{n}$ in percentage under Model 3 and Model 4 with the sample size $n=400$ . The number of independent replications is 10,000 and the nominal significance level is $0.05$ .

$H_{0} : d \leq 1$ under Model 3
$x$	0	0.02	0.04	0.06	0.08	0.10	0.12	0.14	0.16	0.18
$c = \frac{1}{2}$	5.24	5.81	9.13	17.91	34.86	62.30	87.31	98.01	99.90	100
$c = 1$	5.33	5.92	8.43	18.09	35.62	63.12	88.14	98.69	99.96	100
$c = 2$	4.76	6.39	9.69	17.39	35.23	63.57	88.15	98.67	99.97	100
$H_{0} : d \leq 2$ under Model 4
$x$	0	0.05	0.10	0.15	0.20	0.25	0.30	0.35	0.40	0.45
$c = \frac{1}{2}$	4.75	7.19	17.49	43.96	79.28	97.06	99.87	100	100	100
$c = 1$	5.05	6.31	12.22	26.78	53.74	80.74	95.07	99.52	99.97	100
$c = 2$	4.88	5.65	8.56	16.33	30.09	49.17	71.60	86.54	95.20	98.61

Equations

$${\mathbf{x}}=w{\mathbf{A}}{\mathbf{u}},$$

$${\mathbf{y}}_{j}=\begin{cases}\sqrt{p}\,\dfrac{{\mathbf{x}}_{j}}{||{\mathbf{x}}_{j}||}, & {\mathbf{x}}_{j}\neq 0,\\ 0, & {\mathbf{x}}_{j}=0.\end{cases}$$

$${\mathbf{B}}_{n}=\frac{1}{n}\sum_{j=1}^{n}{\mathbf{y}}_{j}{\mathbf{y}}_{j}^{\prime},$$

$$F^{{\mathbf{B}}_{n}}=\frac{1}{p}\sum_{j=1}^{p}\delta_{\lambda_{j}},$$

$$\widetilde{{\mathbf{x}}}=\sigma{\mathbf{A}}{\mathbf{z}},$$

$$m_{G}(z)=\int\frac{1}{x-z}dG(x),\quad z\in\mathbb{C}\setminus S_{G},$$

$$m=\int\frac{1}{t(1-c-czm)-z}dH(t),\quad z\in\mathbb{C}^{+},$$

$$z=-\frac{1}{\underline{m}}+c\int\frac{t}{1+t\underline{m}}dH(t),\quad z\in\mathbb{C}^{+}.$$

$$\int f(x)dG_{n}(x)=\int f(x)d[F^{{\mathbf{B}}_{n}}(x)-F^{c_{n},H_{p}}(x)],$$

$$\Big[\liminf_{p\to\infty}\lambda_{\min}^{\Sigma}\delta_{(0,1)}(c)(1-\sqrt{c})^{2},\ \limsup_{p\to\infty}\lambda_{\max}^{\Sigma}(1+\sqrt{c})^{2}\Big].$$

$$p\Big(\int f_{1}(x)dG_{n}(x),\ldots,\int f_{k}(x)dG_{n}(x)\Big)$$

$${\rm E}X_{f}=\cdots\times\bigg[\int\frac{(\gamma_{2}t-t^{2})dH(t)}{1+\underline{m}(z)t}\int\frac{tdH(t)}{(1+\underline{m}(z)t)^{2}}-\int\frac{tdH(t)}{1+\underline{m}(z)t}\int\frac{t^{2}dH(t)}{(1+\underline{m}(z)t)^{2}}\bigg]dz$$

$${\rm Cov}(X_{f},X_{g})=\cdots+2\gamma_{2}c\int xf^{\prime}(x)dF^{c,H}(x)\int xg^{\prime}(x)dF^{c,H}(x)$$

$$-\frac{1}{\pi i}\oint_{\mathcal{C}_{1}}\frac{f(z)\underline{m}^{\prime}(z)}{\underline{m}^{2}(z)}dz\int xg^{\prime}(x)dF^{c,H}(x)$$

$$-\frac{1}{\pi i}\oint_{\mathcal{C}_{1}}\frac{g(z)\underline{m}^{\prime}(z)}{\underline{m}^{2}(z)}dz\int xf^{\prime}(x)dF^{c,H}(x),$$

$$\hat{\beta}_{nj}=\frac{1}{p}{\text{\rm tr}}({\mathbf{B}}_{n}^{j})=\int x^{j}dF^{{\mathbf{B}}_{n}}(x),\quad j=1,2,\ldots.$$

$$\beta_{nj}=\int x^{j}dF^{c_{n},H_{p}}(x)\quad\text{and}\quad\gamma_{nj}=\int t^{j}dH_{p}(t),$$

$$\beta_{nj}=\sum c_{n}^{i_{1}+\cdots+i_{j}-1}(\gamma_{n2})^{i_{2}}\cdots(\gamma_{nj})^{i_{j}}\phi(i_{1},\ldots,i_{j}),\quad j\geq 2,$$

$$(i_{1},\ldots,i_{j}):\ j=i_{1}+2i_{2}+\cdots+ji_{j},\quad i_{l}\in\mathbb{N},$$

$$p\big(\hat{\beta}_{n2}-\beta_{n2},\ldots,\hat{\beta}_{nk}-\beta_{nk}\big)\xrightarrow{D}N_{k-1}(v,\Psi).$$

$$v_{j}=\bigg[\frac{cP^{j}}{(j-2)!}\bigg(\frac{P_{2,3}}{1-cz^{2}P_{2,2}}+2\gamma_{2}P_{1,1}P_{1,2}-2P_{2,1}P_{1,2}-2P_{1,1}P_{2,2}\bigg)\bigg]^{(j-2)}\bigg|_{z=0},$$

$$\psi_{ij}=2\sum_{\ell=0}^{i-1}(i-\ell)u_{i,\ell}u_{j,i+j-\ell}+2c\gamma_{2}ij\beta_{i}\beta_{j}+2j\beta_{j}u_{i,i+1}+2i\beta_{i}u_{j,j+1},$$

$$H({\boldsymbol{\theta}})=w_{1}\delta_{a_{1}}+\cdots+w_{d}\delta_{a_{d}},\quad{\boldsymbol{\theta}}=(a_{1},w_{1},\ldots,a_{d-1},w_{d-1})\in\Theta,$$

$$g_{1}:{\boldsymbol{\gamma}}_{2d-1}\to{\boldsymbol{\theta}}\quad\text{and}\quad g_{2,j}:{\boldsymbol{\beta}}_{j}\to{\boldsymbol{\gamma}}_{j}$$

$$\hat{\boldsymbol{\beta}}_{j}^{*}=\hat{\boldsymbol{\beta}}_{j}-\frac{1}{p}(\hat{v}_{2},\ldots,\hat{v}_{j})^{\prime},\quad\hat{\boldsymbol{\gamma}}_{j}^{*}=g_{2,j}(\hat{\boldsymbol{\beta}}_{j}^{*}),\quad\text{and}\quad\hat{\boldsymbol{\theta}}_{n}^{*}=g_{1}(\hat{\boldsymbol{\gamma}}_{2d-1}^{*}),$$

$$p\big(\hat{\boldsymbol{\gamma}}_{j}^{*}-{\boldsymbol{\gamma}}_{j}\big)\ \cdots$$

$$p\big(\hat{\boldsymbol{\theta}}_{n}^{*}-{\boldsymbol{\theta}}\big)\ \cdots$$

$$H_{0}:d\leq d_{0}\quad\text{v.s.}\quad H_{1}:d>d_{0},$$

$$\Gamma=\begin{pmatrix}1&\gamma_{1}&\cdots&\gamma_{d_{0}}\\ \gamma_{1}&\gamma_{2}&\cdots&\gamma_{d_{0}+1}\\ \vdots&\vdots&&\vdots\\ \gamma_{d_{0}}&\gamma_{d_{0}+1}&\cdots&\gamma_{2d_{0}}\end{pmatrix}\quad\text{and}\quad\widehat{\Gamma}=\begin{pmatrix}1&\hat{\gamma}_{1}&\cdots&\hat{\gamma}_{d_{0}}\\ \hat{\gamma}_{1}&\hat{\gamma}_{2}&\cdots&\hat{\gamma}_{d_{0}+1}\\ \vdots&\vdots&&\vdots\\ \hat{\gamma}_{d_{0}}&\hat{\gamma}_{d_{0}+1}&\cdots&\hat{\gamma}_{2d_{0}}\end{pmatrix}.$$

$$p\big(\det(\widehat{\Gamma})-\det(\Gamma)\big)\xrightarrow{D}N(0,\sigma^{2}),$$

$$T_{n}:=\frac{p\det(\widehat{\Gamma})}{\hat{\sigma}_{H_{0}}}\xrightarrow{D}N(0,1),$$

Taxonomy

Topics: Random Matrices and Applications · Statistical Methods and Bayesian Inference · Point processes and geometric inequalities

Full text

On spectral properties of high-dimensional spatial-sign covariance matrices in elliptical distributions with applications

Weiming Li Wang Zhou

School of Statistics and Management, Shanghai University of Finance and Economics, Guoding Road No. 777, Shanghai, 200433, China.

Department of Statistics and Applied Probability, National University of Singapore, Singapore

Abstract.

The spatial-sign covariance matrix (SSCM) is an important substitute for the sample covariance matrix (SCM) in robust statistics. This paper investigates the asymptotic spectral behavior of the SSCM under high-dimensional elliptical populations, where both the dimension $p$ of observations and the sample size $n$ tend to infinity with their ratio $p/n\to c\in(0,\infty)$ . The empirical spectral distribution of this nonparametric scatter matrix is shown to converge in distribution to a generalized Marčenko-Pastur law. Beyond this, a new central limit theorem (CLT) for general linear spectral statistics of the SSCM is also established. For polynomial spectral statistics, explicit formulae for the limiting mean and covariance functions in the CLT are provided. The derived results are then applied to an estimation procedure and a test procedure for the spectrum of the shape component of population covariance matrices.

Key words and phrases:

Spatial-sign, Covariance matrix, High-dimensional data, Elliptical distribution.

2010 Mathematics Subject Classification:

Primary 62H10; Secondary 62H15

Li’s work was partially supported by National Natural Science Foundation of China, No. 11401037 and Program of IRTSHUFE. Zhou’s work was partially supported by the MOE Tier 2 grant MOE2015-T2-2-039 (R-155-000-171-112) at the National University of Singapore.

1. Introduction

The elliptical family of distributions, originally introduced in [20], is an important extension of the multivariate normal distribution and has been broadly applied in biology, finance and economics, signal and image processing, etc. [14, 17]. A random vector ${\mathbf{x}}$ with zero mean is said to be elliptically distributed if it has a stochastic representation [14]:

$${\mathbf{x}}=w{\mathbf{A}}{\mathbf{u}},\tag{1.1}$$

where ${\mathbf{A}}$ is a $p\times p$ matrix with $rank({\mathbf{A}})=p$ , $w\geq 0$ is a scalar random variable representing the radius of ${\mathbf{x}}$ , and ${\mathbf{u}}\in\mathbb{R}^{p}$ is the random direction, independent of $w$ and uniformly distributed on the unit sphere in $\mathbb{R}^{p}$ . Besides the normal distribution, this family includes many other celebrated distributions, such as the multivariate $t$ -distribution, Kotz-type distributions, and Gaussian scale mixtures. In general, the radius $w$ need not be independent of the direction ${\mathbf{u}}$ but can be a function of the chosen direction [35].

Let ${\mathbf{x}}_{1},\ldots,{\mathbf{x}}_{n}$ be a sequence of independent and identically distributed (i.i.d.) random vectors from the elliptical model in (1.1). Many statistical procedures for this model prefer to transform the original observations into spatial-sign samples for the purpose of robustness, which are defined as

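$${\mathbf{y}}_{j}=\begin{cases}\sqrt{p}\,\dfrac{{\mathbf{x}}_{j}}{||{\mathbf{x}}_{j}||}, & {\mathbf{x}}_{j}\neq 0,\\ 0, & {\mathbf{x}}_{j}=0.\end{cases}$$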

One can refer to [26] and [29] for a comprehensive review. When an inference is concerned with the shape matrix ${\mathbf{T}}={\mathbf{A}}{\mathbf{A}}^{\prime}$ , assuming ${\text{\rm tr}}({\mathbf{T}})=p$ so that $w$ and ${\mathbf{A}}$ can be identified in the model (1.1), one of the most important statistics is the so-called spatial-sign covariance matrix (SSCM), i.e.

$${\mathbf{B}}_{n}=\frac{1}{n}\sum_{j=1}^{n}{\mathbf{y}}_{j}{\mathbf{y}}_{j}^{\prime},$$

which is actually the sample covariance matrix (SCM) of $({\mathbf{y}}_{j})$ . As a robust alternative to the SCM ${\mathbf{S}}_{n}=\sum_{j=1}^{n}{\mathbf{x}}_{j}{\mathbf{x}}_{j}^{\prime}/n$ , this nonparametric scatter matrix ${\mathbf{B}}_{n}$ is quick to compute, orthogonally equivariant, and has a high breakdown point. It is thus highly recommended in applications such as principal component analysis and structural tests for covariance matrices; see [23], [16], [39], [31], to name a few. Despite its merits, the SSCM is also a controversial statistic in “small $p$ , large $n$ ” scenarios due to its lack of affine equivariance [27]. However, pursuing this property seems inadvisable in high-dimensional situations: as claimed in [38], any well-defined affine equivariant scatter matrix must be proportional to the SCM ${\mathbf{S}}_{n}$ whenever $p>n$ . Therefore, it is of great interest to understand the behavior of the SSCM in high-dimensional robust statistics.
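The construction of the spatial-sign samples and of ${\mathbf{B}}_{n}$ can be sketched in a few lines of NumPy. This is only an illustration: the function names `spatial_signs` and `sscm`, the sample sizes, and the heavy-tailed radii are assumptions made for the example, not part of the paper.

```python
import numpy as np

def spatial_signs(X):
    """Map each row x_j of X (n x p) to sqrt(p) * x_j / ||x_j||; zero rows stay zero."""
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    norms = np.linalg.norm(X, axis=1)
    Y = np.zeros_like(X)
    nonzero = norms > 0
    Y[nonzero] = np.sqrt(p) * X[nonzero] / norms[nonzero][:, None]
    return Y

def sscm(X):
    """Spatial-sign covariance matrix B_n = (1/n) * sum_j y_j y_j'."""
    Y = spatial_signs(X)
    return Y.T @ Y / Y.shape[0]

# Hypothetical data: Gaussian directions, then the same directions with
# heavy-tailed random radii (a multivariate-t style scaling).
rng = np.random.default_rng(0)
n, p = 50, 20
Z = rng.standard_normal((n, p))
radii = 1.0 / np.sqrt(rng.chisquare(3, size=n) / 3)
B_gauss = sscm(Z)
B_heavy = sscm(Z * radii[:, None])
```

Because each observation is rescaled to norm $\sqrt{p}$ , `B_gauss` and `B_heavy` coincide and both have trace $p$ , which is the sense in which the SSCM discards the radius $w$ .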

In this paper, using tools of random matrix theory, we investigate asymptotic spectral behaviors of the SSCM ${\mathbf{B}}_{n}$ in high-dimensional frameworks where both the dimension $p$ and the sample size $n$ tend to infinity with their ratio $p/n\rightarrow c$ , a positive constant in $(0,\infty)$ . Specifically, let $(\lambda_{j})_{1\leq j\leq p}$ be the eigenvalues of ${\mathbf{B}}_{n}$ , then the empirical spectral distribution (ESD) of ${\mathbf{B}}_{n}$ is by definition

$$F^{{\mathbf{B}}_{n}}=\frac{1}{p}\sum_{j=1}^{p}\delta_{\lambda_{j}},$$

where $\delta_{b}$ denotes the Dirac mass at $b$ . Our aim is to study the limiting properties of $F_{n}$ and the central limit theorem (CLT) for linear spectral statistics (LSS) of the form $\int f(x)dF_{n}(x)$ for a class of smooth test functions $f$ . These properties may become powerful tools to recover spectral features of the population SSCM, i.e. $\Sigma=p{\rm E}({\mathbf{x}}{\mathbf{x}}^{\prime}/||{\mathbf{x}}||^{2})$ , and then those of the shape matrix ${\mathbf{T}}$ since the matrices $\Sigma$ and ${\mathbf{T}}$ share the same eigenvectors and their eigenvalues have a one-to-one correspondence [9]. Moreover, as $p\rightarrow\infty$ , the two matrices coincide in the sense that the spectral norm $||\Sigma-{\mathbf{T}}||\to 0$ , as long as $||\Sigma||$ (or $||{\mathbf{T}}||$ ) is uniformly bounded, see Lemma 4.1.

Spectral properties of high-dimensional SCM have been extensively studied in random matrix theory since the pioneering work of [25]. The standard model in the literature has the form

$$\widetilde{{\mathbf{x}}}=\sigma{\mathbf{A}}{\mathbf{z}},\tag{1.2}$$

where ${\mathbf{A}}$ is as before, $\sigma$ is a constant, and ${\mathbf{z}}=(z_{1},\ldots,z_{p})^{\prime}\in\mathbb{R}^{p}$ is a set of i.i.d. random variables satisfying E $(z_{1})=0$ , E $(z_{1}^{2})=1$ , and E $(z_{1}^{4})<\infty$ . Let $\widetilde{{\mathbf{x}}}_{1},\ldots,\widetilde{{\mathbf{x}}}_{n}$ be $n$ i.i.d. copies of $\widetilde{{\mathbf{x}}}$ and $\widetilde{\mathbf{S}}_{n}=\sum_{j=1}^{n}\widetilde{{\mathbf{x}}}_{j}\widetilde{{\mathbf{x}}}_{j}^{\prime}/n$ be the corresponding SCM. It has been known that the ESD of $\widetilde{\mathbf{S}}_{n}$ converges to the celebrated Marčenko-Pastur (MP) law when ${\mathbf{A}}=I_{p}$ , and generalized MP law for general matrix ${\mathbf{A}}$ , as $(n,p)\to\infty$ with $p/n\to c>0$ . One can refer to [25] and [36]. The CLT for LSS of $\widetilde{\mathbf{S}}_{n}$ was first studied in [19] by assuming the population to be standard multivariate normal. One breakthrough on the CLT was obtained by [3], where the population is allowed to be general with E $(z_{1}^{4})=3$ . This fourth moment condition was then weakened to be E $(z_{1}^{4})<\infty$ in [30]. For more references, one can refer to [4], [2], [15], and references therein. However, these results do not apply to general elliptical populations since the two underlying models in (1.1) and (1.2) have little in common, except for normal distributions. In fact, for general elliptical populations, it has been reported that the ESD of the SCM ${\mathbf{S}}_{n}$ converges to a deterministic distribution that is not a generalized MP law, but has to be characterized by both the distribution of $w$ and the limiting spectrum of ${\mathbf{T}}$ through a system of implicit equations [11, 24]. The involvement of $w$ seriously interferes with our understanding of the spectrum of ${\mathbf{T}}$ from the ESD of ${\mathbf{S}}_{n}$ . 
This again motivates us to shift our attention to the SSCM ${\mathbf{B}}_{n}$ , which discards the random radii $(w_{j})$ and focuses only on the directions $({\mathbf{A}}{\mathbf{u}}_{j})$ .

The main contributions of this paper are as follows. First in Section 2, asymptotic results on the eigenvalues of ${\mathbf{B}}_{n}$ are derived, including the limit of the ESD $F_{n}$ and a new CLT for LSS of ${\mathbf{B}}_{n}$ . As a corollary, polynomial spectral statistics are fully addressed with explicit limiting mean and covariance functions in the CLT. Then in Section 3, relying on these results, we develop two statistical applications on the spectrum of $\Sigma$ , the population SSCM, under a setting that the spectrum forms a discrete distribution with finite support. One is to estimate the spectrum of $\Sigma$ through moment methods and the other is to test the hypothesis that there are no more than $d_{0}$ distinct eigenvalues of $\Sigma$ . Technical proofs of the main theorems are gathered in Section 4. Some lemmas and their necessary proofs are postponed to the last section.

2. High-dimensional theory for eigenvalues of ${\mathbf{B}}_{n}$

2.1. Limiting spectral distribution of ${\mathbf{B}}_{n}$

We consider here the limit of the ESD sequence $(F^{{\mathbf{B}}_{n}})$ in high-dimensional regimes, namely limiting spectral distribution (LSD). Our main assumptions are listed below.

Assumption (a). Both the sample size and population dimension $n,p$ tend to infinity in such a way that $c_{n}=p/n\to c\in(0,\infty)$ .

Assumption (b). Sample observations are ${\mathbf{y}}_{j}=\sqrt{p}{\mathbf{A}}{\mathbf{u}}_{j}/||{\mathbf{A}}{\mathbf{u}}_{j}||$ , $j=1,\ldots,n,$ where ${\mathbf{A}}$ is a $p\times p$ matrix with ${\mathbf{A}}{\mathbf{A}}^{\prime}={\mathbf{T}}$ and $({\mathbf{u}}_{j})$ are i.i.d. random vectors, uniformly distributed on the unit sphere in $\mathbb{R}^{p}$ .

Assumption (c). The spectral norm of $\Sigma={\rm E}({\mathbf{y}}_{1}{\mathbf{y}}_{1}^{\prime})$ is bounded and its spectral distribution $H_{p}$ converges weakly to a probability distribution $H$ , called population spectral distribution (PSD).

From Lemma 4.1, it is clear that the spectral distributions of $\Sigma$ and ${\mathbf{T}}$ are asymptotically identical. So one can certainly replace $\Sigma$ with ${\mathbf{T}}$ in Assumption (c), which does not affect the LSD of $F^{{\mathbf{B}}_{n}}$ . However we keep $\Sigma$ because it is easy to describe the CLT for LSS using the spectral distribution $H_{p}$ of $\Sigma$ .

For the characterization of the LSD of $F^{{\mathbf{B}}_{n}}$ , we need to introduce the Stieltjes transform of a measure $G$ on the real line, which is defined as

$$m_{G}(z)=\int\frac{1}{x-z}dG(x),\quad z\in\mathbb{C}\setminus S_{G},$$

where $S_{G}\subset\mathbb{R}$ denotes the support of $G$ .

Theorem 2.1.

Suppose that Assumptions (a)-(c) hold. Then, almost surely, the empirical spectral distribution $F^{{\mathbf{B}}_{n}}$ converges weakly to a probability distribution $F^{c,H}$ , whose Stieltjes transform $m=m(z)$ is the unique solution to the equation

$$m=\int\frac{1}{t(1-c-czm)-z}dH(t),\quad z\in\mathbb{C}^{+},\tag{2.1}$$

in the set $\{m\in\mathbb{C}:-(1-c)/z+cm\in{\mathbb{C}^{+}}\}$ where $\mathbb{C}^{+}\equiv\{z\in\mathbb{C}:\Im(z)>0\}$ .

The LSD $F^{c,H}$ defined in (2.1) agrees with that in [25]. Let $\underline{m}=\underline{m}(z)$ denote the Stieltjes transform of $\underline{F}^{c,H}=cF^{c,H}+(1-c)\delta_{0}$ . Then (2.1) can also be represented as

$$z=-\frac{1}{\underline{m}}+c\int\frac{t}{1+t\underline{m}}dH(t),\quad z\in\mathbb{C}^{+}.\tag{2.2}$$

See [36]. For procedures on finding the density function and the support set of $F^{c,H}$ from (2.1) and (2.2), one is referred to [4].
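For a discrete $H$ , the Stieltjes transform can be approximated numerically. The sketch below assumes that (2.2) has the form $z=-1/\underline{m}+c\int t(1+t\underline{m})^{-1}dH(t)$ and that $\underline{m}=-(1-c)/z+cm$ , as follows from the definition of $\underline{F}^{c,H}$ . The fixed-point iteration, its starting value, and the function names are choices made for this illustration, not a procedure taken from the paper or from [4].

```python
import numpy as np

def companion_stieltjes(z, c, atoms, weights, n_iter=500):
    """Fixed-point iteration for underline{m}(z), with H = sum_i weights[i] * delta_{atoms[i]}.

    Rearranging (2.2) gives underline{m} = -1 / (z - c * int t / (1 + t underline{m}) dH(t)),
    which is iterated from the starting value -1/z.
    """
    atoms = np.asarray(atoms, dtype=float)
    weights = np.asarray(weights, dtype=float)
    m_u = -1.0 / z
    for _ in range(n_iter):
        integral = np.sum(weights * atoms / (1.0 + atoms * m_u))
        m_u = -1.0 / (z - c * integral)
    return m_u

def lsd_stieltjes(z, c, atoms, weights):
    """Stieltjes transform m(z) of F^{c,H}, via underline{m} = -(1 - c)/z + c m."""
    m_u = companion_stieltjes(z, c, atoms, weights)
    return (m_u + (1.0 - c) / z) / c

# Check against the classical MP law (H = delta_1), whose Stieltjes transform
# solves c z m^2 + (z + c - 1) m + 1 = 0.
z, c = 1.0 + 1.0j, 0.5
m = lsd_stieltjes(z, c, atoms=[1.0], weights=[1.0])
residual = abs(c * z * m**2 + (z + c - 1.0) * m + 1.0)
```

The iteration is only checked here at a point $z$ well inside $\mathbb{C}^{+}$ . Close to the real axis it may converge slowly, so a more careful solver would be needed to recover densities.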

2.2. CLT for linear spectral statistics of ${\mathbf{B}}_{n}$

Let $F^{c_{n},H_{p}}$ be the LSD as defined in (2.2) with the parameters $(c,H)$ replaced by $(c_{n},H_{p})$ . Writing $G_{n}=F^{{\mathbf{B}}_{n}}-F^{c_{n},H_{p}}$ , we next study the fluctuation of

$$\int f(x)dG_{n}(x)=\int f(x)d[F^{{\mathbf{B}}_{n}}(x)-F^{c_{n},H_{p}}(x)],$$

which is a centralized linear spectral statistic with analytic $f$ .

Theorem 2.2.

Suppose that Assumptions (a)-(c) hold. Let $f_{1},\ldots,f_{k}$ be $k$ functions analytic on an open interval containing

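$$\Big[\liminf_{p\to\infty}\lambda_{\min}^{\Sigma}\delta_{(0,1)}(c)(1-\sqrt{c})^{2},\ \limsup_{p\to\infty}\lambda_{\max}^{\Sigma}(1+\sqrt{c})^{2}\Big].$$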

Then the random vector

$$p\Big(\int f_{1}(x)dG_{n}(x),\ldots,\int f_{k}(x)dG_{n}(x)\Big)$$

converges weakly to a Gaussian vector $(X_{f_{1}},\ldots,X_{f_{k}})$ , whose mean function is

[TABLE]

and covariance function is

[TABLE]

$(f,g\in\{f_{1},\cdots,f_{k}\})$ , where the contours $\mathcal{C}_{1}$ and $\mathcal{C}_{2}$ are non-overlapping, closed, counter-clockwise oriented in the complex plane, and each encloses the support of the LSD $F^{c,H}$ .

When the underlying population is multivariate normal, the elliptical model in (1.1) and the linear transformation model in (1.2) hold simultaneously. In this case, it is interesting to compare the limiting distribution in Theorem 2.2 based on SSCM with the classical result in [3] based on SCM. It turns out that there are some additional terms in our new CLT: the second contour integral in the mean function and the second to fourth summands in the covariance function.

Among all LSS, polynomial spectral statistics are of fundamental importance. The bases of these statistics are moments of ESD $F^{{\mathbf{B}}_{n}}$ , i.e.

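$$\hat{\beta}_{nj}=\frac{1}{p}{\text{\rm tr}}({\mathbf{B}}_{n}^{j})=\int x^{j}dF^{{\mathbf{B}}_{n}}(x),\quad j=1,2,\ldots.$$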

The first order moment $\hat{\beta}_{n1}$ is 1 since ${\text{\rm tr}}({\mathbf{B}}_{n})\equiv{\text{\rm tr}}(\Sigma)\equiv p$ . Other moments $(\hat{\beta}_{nj})$ , $j\geq 2$ , are random. Their limiting behavior can be described through the following two quantities

$$\beta_{nj}=\int x^{j}dF^{c_{n},H_{p}}(x)\quad\text{and}\quad\gamma_{nj}=\int t^{j}dH_{p}(t),$$

as well as their limits, denoted by $\beta_{j}$ and $\gamma_{j}$ , respectively, $j=1,2,\ldots.$ From [28], the quantities $(\beta_{nj})$ and $(\gamma_{nj})$ are connected through the recursive formulae:

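$$\beta_{nj}=\sum c_{n}^{i_{1}+\cdots+i_{j}-1}(\gamma_{n2})^{i_{2}}\cdots(\gamma_{nj})^{i_{j}}\phi(i_{1},\ldots,i_{j}),\quad j\geq 2,\tag{2.3}$$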

and $\beta_{n1}=\gamma_{n1}\equiv 1$ , where the sum runs over the following partitions of $j$ :

$$(i_{1},\ldots,i_{j}):\ j=i_{1}+2i_{2}+\cdots+ji_{j},\quad i_{l}\in\mathbb{N},$$

and $\phi(i_{1},\ldots,i_{j})=j!/[i_{1}!\cdots i_{j}!(j+1-i_{1}-\cdots-i_{j})!].$ The joint limiting distribution of moments $(\hat{\beta}_{nj})_{2\leq j\leq k}$ can be derived from Theorem 2.2 by taking functions $f_{j}(x)=x^{j},j=2,\ldots,k$ . For this particular case, the mean and covariance functions in the limiting distribution can be explicitly formulated.
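The recursive formulae can be evaluated by enumerating the index vectors directly. The sketch below assumes the sum in (2.3) is $\sum c^{i_{1}+\cdots+i_{j}-1}\gamma_{2}^{i_{2}}\cdots\gamma_{j}^{i_{j}}\phi(i_{1},\ldots,i_{j})$ over nonnegative integers with $i_{1}+2i_{2}+\cdots+ji_{j}=j$ , and uses the $\phi$ given above. The function names are hypothetical.

```python
from math import factorial, prod

def weighted_partitions(j):
    """Yield all (i_1, ..., i_j) of nonnegative integers with i_1 + 2 i_2 + ... + j i_j = j."""
    def rec(l, remaining):
        if l == 0:
            if remaining == 0:
                yield ()
            return
        for i_l in range(remaining // l + 1):
            for head in rec(l - 1, remaining - l * i_l):
                yield head + (i_l,)
    yield from rec(j, j)

def beta_from_gamma(j, c, gamma):
    """Moment beta_j of F^{c,H} from the moments of H; gamma[l - 1] holds gamma_l, gamma_1 = 1."""
    total = 0.0
    for idx in weighted_partitions(j):
        s = sum(idx)
        phi = factorial(j) / (prod(factorial(i) for i in idx) * factorial(j + 1 - s))
        total += c ** (s - 1) * phi * prod(g ** i for g, i in zip(gamma, idx))
    return total
```

For $j=2$ and $j=3$ the enumeration reduces to $\beta_{2}=\gamma_{2}+c$ and $\beta_{3}=\gamma_{3}+3c\gamma_{2}+c^{2}$ . With $H=\delta_{1}$ these are the familiar moments $1+c$ and $1+3c+c^{2}$ of the MP law.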

Corollary 2.1.

Suppose that Assumptions (a)-(c) hold. Then the random vector

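$$p\big(\hat{\beta}_{n2}-\beta_{n2},\ldots,\hat{\beta}_{nk}-\beta_{nk}\big)\xrightarrow{D}N_{k-1}(v,\Psi).$$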

The mean vector $v=(v_{j})_{2\leq j\leq k}$ satisfies

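$$v_{j}=\bigg[\frac{cP^{j}}{(j-2)!}\bigg(\frac{P_{2,3}}{1-cz^{2}P_{2,2}}+2\gamma_{2}P_{1,1}P_{1,2}-2P_{2,1}P_{1,2}-2P_{1,1}P_{2,2}\bigg)\bigg]^{(j-2)}\bigg|_{z=0},$$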

where $P_{s,t}=\int x^{s}(1+xz)^{-t}dH(x)$ , $P=(czP_{1,1}-1)$ , and $g^{(\ell)}(z)$ denotes the $\ell$ th derivative of $g(z)$ with respect to $z$ . The covariance matrix $\Psi=(\psi_{ij})_{2\leq i,j\leq k}$ has entries

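$$\psi_{ij}=2\sum_{\ell=0}^{i-1}(i-\ell)u_{i,\ell}u_{j,i+j-\ell}+2c\gamma_{2}ij\beta_{i}\beta_{j}+2j\beta_{j}u_{i,i+1}+2i\beta_{i}u_{j,j+1},$$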

where $u_{s,t}=[P^{s}]^{(t)}/t!|_{z=0}$ .
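As a worked instance, take $j=2$ , so that no derivative is needed. The computation below assumes the mean expression of Corollary 2.1 reads $v_{j}=[cP^{j}/(j-2)!\,(P_{2,3}/(1-cz^{2}P_{2,2})+2\gamma_{2}P_{1,1}P_{1,2}-2P_{2,1}P_{1,2}-2P_{1,1}P_{2,2})]^{(j-2)}|_{z=0}$ . The second display is an independent sanity check for the special case ${\mathbf{T}}=I_{p}$ , using only that two independent uniform directions ${\mathbf{u}},{\mathbf{v}}$ on the unit sphere satisfy ${\rm E}({\mathbf{u}}^{\prime}{\mathbf{v}})^{2}=1/p$ .

```latex
% At z = 0: P_{s,t}(0) = \gamma_s, \gamma_1 = 1, and P(0) = -1.
\begin{align*}
v_2 &= c\,P(0)^2\big(\gamma_2 + 2\gamma_2\gamma_1^2 - 2\gamma_2\gamma_1 - 2\gamma_1\gamma_2\big)
     = -c\,\gamma_2 .
\end{align*}

% Sanity check for T = I_p (so \gamma_2 = 1 and \beta_{n2} = 1 + c_n):
\begin{align*}
\operatorname{tr}(\mathbf{B}_n^2)
  &= \frac{1}{n^2}\sum_{j,k=1}^{n}(\mathbf{y}_j'\mathbf{y}_k)^2,
  \qquad (\mathbf{y}_j'\mathbf{y}_j)^2 = p^2,
  \qquad \mathrm{E}(\mathbf{y}_j'\mathbf{y}_k)^2 = p \quad (j\neq k),\\
\mathrm{E}\,\hat\beta_{n2}
  &= \frac{1}{p\,n^2}\big[n p^2 + n(n-1)p\big] = 1 + c_n - \frac{1}{n},\\
p\big(\mathrm{E}\,\hat\beta_{n2} - \beta_{n2}\big) &= -c_n ,
\end{align*}
% which agrees with v_2 = -c\gamma_2 at \gamma_2 = 1.
```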

3. Applications to spectral inference

Inference on the PSD is fundamentally important in many high-dimensional statistical analyses, such as principal component analysis [18, 8, 40], factor models [12, 13], and covariance matrix estimation [21].

In this section, we illustrate two statistical applications of the theoretical results developed in Section 2: one is estimating a PSD and the other is testing the order of a PSD. The family of PSDs under study is a class of parameterized discrete distributions with finite support on $\mathbb{R}^{+}$ , that is,

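$$H({\boldsymbol{\theta}})=w_{1}\delta_{a_{1}}+\cdots+w_{d}\delta_{a_{d}},\quad{\boldsymbol{\theta}}=(a_{1},w_{1},\ldots,a_{d-1},w_{d-1})\in\Theta,\tag{3.1}$$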

where ${\Theta}=\left\{{\boldsymbol{\theta}}:0<a_{1}<\cdots<a_{d}<\infty;\ 0<\prod_{i=1}^{d}w_{i},\ \sum_{i=1}^{d}a_{i}^{\ell}w_{i}=1,\ell=0,1\right\}.$ Here the restriction $\sum_{i=1}^{d}a_{i}w_{i}=1$ is due to the fact that $\int tdH_{p}(t)={\text{\rm tr}}(\Sigma)/p\equiv 1$ . For the model (3.1), the order of $H$ refers to the cardinality of its support, which is equal to $d$ . This model for PSDs can be viewed as the spectral structure of noise covariance matrices in factor models [12], and extensions of the spiked model [18] which allows the number of leading eigenvalues to grow with the dimension $p$ . More discussions on this model can be found in [10], [34], [1], [22], etc. Similar to [10], we adopt the setting of fixed PSDs in this section, i.e. $(c_{n},H_{p})\equiv(c,H)$ for all $(n,p)$ large.

3.1. Estimation of a PSD

For the model in (3.1), [1] introduced a moment method for the PSD estimation. Assuming the order $d$ to be known, their method first estimates the moments $(\gamma_{j})$ of $H$ through the recursive formulae in (2.3), and then solves a system of moment equations, $\{\hat{\gamma}_{j}=\sum_{i=1}^{d}a_{i}^{j}w_{i},\ j=0,\ldots,2d-1\},$ to get a consistent estimator of ${\boldsymbol{\theta}}$ .
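One way to solve the moment equations $\{\gamma_{j}=\sum_{i=1}^{d}a_{i}^{j}w_{i},\ j=0,\ldots,2d-1\}$ is a Prony-type scheme: the atoms are the roots of a monic polynomial whose coefficients solve a Hankel linear system, and the weights then solve a Vandermonde system. This solver is a stand-in chosen for illustration. The source does not specify how [1] solves the system, and the function name is hypothetical.

```python
import numpy as np

def atoms_weights_from_moments(gamma, d):
    """Recover (a_1..a_d, w_1..w_d) from gamma[j] = sum_i w_i a_i^j, j = 0..2d-1."""
    g = np.asarray(gamma, dtype=float)
    # q(x) = x^d + coef[d-1] x^{d-1} + ... + coef[0] vanishes at every atom, so
    # gamma_{m+d} + sum_k coef[k] gamma_{m+k} = 0 for m = 0..d-1.
    hankel = np.array([[g[m + k] for k in range(d)] for m in range(d)])
    coef = np.linalg.solve(hankel, -g[d:2 * d])
    atoms = np.sort(np.roots(np.concatenate(([1.0], coef[::-1]))).real)
    # Weights from the first d moment equations: sum_i a_i^j w_i = gamma_j.
    vander = np.vander(atoms, d, increasing=True).T
    weights = np.linalg.solve(vander, g[:d])
    return atoms, weights

# Model 1 of the simulation section: H = 0.5 * delta_{0.5} + 0.5 * delta_{1.5}.
a_true = np.array([0.5, 1.5])
w_true = np.array([0.5, 0.5])
gamma = [np.sum(w_true * a_true**j) for j in range(4)]
atoms, weights = atoms_weights_from_moments(gamma, d=2)
```

With exact moments the atoms and weights are recovered up to rounding. With estimated moments the roots can become complex or the weights negative when the sample is small, a case this sketch does not handle.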

In our situation, with notation ${\boldsymbol{\beta}}_{j}=(\beta_{2},\ldots,\beta_{j})^{\prime}$ and ${\boldsymbol{\gamma}}_{j}=(\gamma_{2},\ldots,\gamma_{j})^{\prime}$ for $j\geq 2$ , we denote

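$$g_{1}:{\boldsymbol{\gamma}}_{2d-1}\to{\boldsymbol{\theta}}\quad\text{and}\quad g_{2,j}:{\boldsymbol{\beta}}_{j}\to{\boldsymbol{\gamma}}_{j}$$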

as the mappings between the corresponding vectors. These two mappings are both one-to-one and the determinants of their Jacobian matrices are all nonzero; see [1]. Therefore, applying Theorem 2.1, $\hat{\boldsymbol{\beta}}_{j}:=(\hat{\beta}_{n2},\ldots,\hat{\beta}_{nj})^{\prime}\xrightarrow{a.s.}{\boldsymbol{\beta}}_{j}$ , which implies $\hat{\boldsymbol{\theta}}_{n}:=g_{1}\circ g_{2,2d-1}(\hat{\boldsymbol{\beta}}_{2d-1})\xrightarrow{a.s.}{\boldsymbol{\theta}}$ , as $(n,p)\to\infty$ . However, as shown by the CLT in Corollary 2.1, the estimator $\hat{\boldsymbol{\beta}}_{j}$ has a bias of order $O(1/p)$ . So it is natural to modify $\hat{\boldsymbol{\beta}}_{j}$ by subtracting its limiting mean in the CLT to obtain a better estimator of ${\boldsymbol{\theta}}$ . Beyond this correction, the CLT can also provide confidence regions for the parameter ${\boldsymbol{\theta}}$ .
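The $O(1/p)$ bias is easy to see in a small Monte Carlo experiment for $\hat{\beta}_{n2}$ . The sketch below takes ${\mathbf{T}}=I_{p}$ , for which the finite-sample mean is exactly $1+c_{n}-1/n$ (because ${\rm E}({\mathbf{y}}_{j}^{\prime}{\mathbf{y}}_{k})^{2}=p$ for $j\neq k$ ), whereas $\beta_{n2}=1+c_{n}$ . The sample sizes, the seed, and the helper name `beta2_hat` are arbitrary choices for the illustration.

```python
import numpy as np

def beta2_hat(Y):
    """tr(B_n^2)/p, computed from the n x n Gram matrix of the spatial signs Y (n x p)."""
    n, p = Y.shape
    gram = Y @ Y.T / n
    return np.sum(gram * gram) / p

rng = np.random.default_rng(1)
n, p, reps = 50, 100, 400
vals = np.empty(reps)
for r in range(reps):
    X = rng.standard_normal((n, p))                       # T = I_p
    Y = np.sqrt(p) * X / np.linalg.norm(X, axis=1, keepdims=True)
    vals[r] = beta2_hat(Y)

c_n = p / n
mc_mean = vals.mean()                 # Monte Carlo estimate of E(beta2_hat)
exact_mean = 1.0 + c_n - 1.0 / n      # exact mean when T = I_p
limit_value = 1.0 + c_n               # beta_{n2}, the centering term in the CLT
```

The gap `exact_mean - limit_value` equals $-c_{n}/p$ , which is the kind of shift that subtracting the limiting mean is meant to remove.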

Denote the modified estimators of ${\boldsymbol{\beta}}_{j}$ , ${\boldsymbol{\gamma}}_{j}$ , and ${\boldsymbol{\theta}}$ by

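$$\hat{\boldsymbol{\beta}}_{j}^{*}=\hat{\boldsymbol{\beta}}_{j}-\frac{1}{p}(\hat{v}_{2},\ldots,\hat{v}_{j})^{\prime},\quad\hat{\boldsymbol{\gamma}}_{j}^{*}=g_{2,j}(\hat{\boldsymbol{\beta}}_{j}^{*}),\quad\text{and}\quad\hat{\boldsymbol{\theta}}_{n}^{*}=g_{1}(\hat{\boldsymbol{\gamma}}_{2d-1}^{*}),\tag{3.2}$$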

respectively, where $\hat{v}_{\ell}=v_{\ell}(\hat{\boldsymbol{\beta}}_{\ell})$ with $v_{\ell}$ defined in Corollary 2.1 for $\ell=2,\ldots,j.$ From Theorem 2.1, Corollary 2.1, and a standard application of the Delta method, one may easily get asymptotic properties of these estimators.

Theorem 3.1.

Suppose that Assumptions (a)-(c) hold and the true value ${\boldsymbol{\theta}}$ is an inner point of $\Theta$ . Then we have $\hat{\boldsymbol{\beta}}_{j}^{*}\xrightarrow{a.s.}{\boldsymbol{\beta}}_{j}$ , $\hat{\boldsymbol{\gamma}}_{j}^{*}\xrightarrow{a.s.}{\boldsymbol{\gamma}}_{j}$ , $\hat{\boldsymbol{\theta}}_{n}^{*}\xrightarrow{a.s.}{\boldsymbol{\theta}}$ , and moreover

[TABLE]

where $J_{1}$ and $J_{2,\ell}$ represent the Jacobian matrices $\partial g_{1}/\partial{\boldsymbol{\gamma}}_{2d-1}$ and $\partial g_{2,\ell}/\partial{\boldsymbol{\beta}}_{\ell}$ , respectively, and $\Psi_{\ell}$ is defined in Corollary 2.1 with $k=\ell$ .

3.2. Test for the order of a PSD

The aforementioned estimation procedure requires that the order $d$ of the PSD be pre-specified. In general, this prior knowledge should be verified in advance. To deal with this problem, we consider the hypotheses

$$H_{0}:d\leq d_{0}\quad\text{v.s.}\quad H_{1}:d>d_{0},\tag{3.4}$$

where $d_{0}\geq 1$ is a known constant. These hypotheses can also be regarded as a generalization of the well-known sphericity hypotheses on covariance matrices, i.e. the case $d_{0}=1$ .

In [32], a test procedure was outlined based on a moment matrix $\Gamma$ and its estimator $\widehat{\Gamma}$ which can be formulated as

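$$\Gamma=\begin{pmatrix}1&\gamma_{1}&\cdots&\gamma_{d_{0}}\\ \gamma_{1}&\gamma_{2}&\cdots&\gamma_{d_{0}+1}\\ \vdots&\vdots&&\vdots\\ \gamma_{d_{0}}&\gamma_{d_{0}+1}&\cdots&\gamma_{2d_{0}}\end{pmatrix}\quad\text{and}\quad\widehat{\Gamma}=\begin{pmatrix}1&\hat{\gamma}_{1}&\cdots&\hat{\gamma}_{d_{0}}\\ \hat{\gamma}_{1}&\hat{\gamma}_{2}&\cdots&\hat{\gamma}_{d_{0}+1}\\ \vdots&\vdots&&\vdots\\ \hat{\gamma}_{d_{0}}&\hat{\gamma}_{d_{0}+1}&\cdots&\hat{\gamma}_{2d_{0}}\end{pmatrix}.$$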

Here we set $\hat{\gamma}_{1}=1$ and $\hat{\gamma}_{j}=\hat{\gamma}^{*}_{j}$ , as defined in (3.2), for $j\geq 2$ . It has been proved that the determinant $\det(\Gamma)$ of $\Gamma$ is zero if the null hypothesis in (3.4) holds, otherwise $\det(\Gamma)$ is strictly positive [22]. Therefore, the determinant $\det(\widehat{\Gamma})$ can serve as a test statistic for (3.4) and the null hypothesis shall be rejected if the statistic is significantly greater than zero. Applying Theorem 3.1 and the main theorem in [32], the asymptotic distribution of $\det(\widehat{\Gamma})$ is obtained immediately.
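The role of $\det(\Gamma)$ can be checked numerically on population moments. The sketch below assumes $\Gamma$ is the $(d_{0}+1)\times(d_{0}+1)$ Hankel matrix with entries $\gamma_{i+k}$ , $0\leq i,k\leq d_{0}$ , and $\gamma_{0}=1$ . It uses the two-point and four-point PSDs that appear as Models 3 and 4 in the simulation section. The helper names are hypothetical.

```python
import numpy as np

def psd_moments(atoms, weights, order):
    """Moments gamma_0, ..., gamma_order of H = sum_i weights[i] * delta_{atoms[i]}."""
    atoms = np.asarray(atoms, dtype=float)
    weights = np.asarray(weights, dtype=float)
    return np.array([np.sum(weights * atoms**j) for j in range(order + 1)])

def hankel_det(gamma, d0):
    """Determinant of the (d0 + 1) x (d0 + 1) moment matrix with entries gamma[i + k]."""
    mat = np.array([[gamma[i + k] for k in range(d0 + 1)] for i in range(d0 + 1)])
    return np.linalg.det(mat)

def model3(x):
    return [1 - x, 1 + x], [0.5, 0.5]

def model4(x):
    return [0.5 - x, 0.5 + x, 1.5 - x, 1.5 + x], [0.25, 0.25, 0.25, 0.25]

det_null_3 = hankel_det(psd_moments(*model3(0.0), 2), d0=1)   # order 1: null holds
det_alt_3 = hankel_det(psd_moments(*model3(0.1), 2), d0=1)    # order 2: null fails
det_null_4 = hankel_det(psd_moments(*model4(0.0), 4), d0=2)   # order 2: null holds
det_alt_4 = hankel_det(psd_moments(*model4(0.1), 4), d0=2)    # order 4: null fails
```

For $d_{0}=1$ the determinant is simply $\gamma_{2}-\gamma_{1}^{2}=\gamma_{2}-1$ , which vanishes exactly when $H$ is a point mass.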

Theorem 3.2.

Suppose that Assumptions (a)-(c) hold. Then the statistic $\det(\widehat{\Gamma})$ is asymptotically normal, i.e.

$$p\big(\det(\widehat{\Gamma})-\det(\Gamma)\big)\xrightarrow{D}N(0,\sigma^{2}),\tag{3.5}$$

where $\sigma^{2}=\alpha^{\prime}V\Omega V^{\prime}\alpha$ with $\alpha=vec(adj(\Gamma))$ , the vectorization of the adjugate matrix of $\Gamma$ . The first two rows and columns of the $(2d_{0}+1)\times(2d_{0}+1)$ matrix $\Omega$ consist of zeros and the remaining submatrix $J_{2,2d_{0}}\Psi_{2d_{0}}J_{2,2d_{0}}^{\prime}$ is defined in (3.3). The $(d_{0}+1)^{2}\times(2d_{0}+1)$ matrix $V=(v_{ij})$ is a 0-1 matrix with only $v_{i,a_{i}}=1$ , $a_{i}=i-\lfloor(i-1)/(d_{0}+1)\rfloor d_{0}$ , $i=1,\ldots,(d_{0}+1)^{2}$ , where $\lfloor x\rfloor$ denotes the greatest integer not exceeding $x$ .

From Theorem 3.1, the limiting variance $\sigma^{2}$ in (3.5) is a continuous function of ${\boldsymbol{\gamma}}_{4d_{0}}$ . Under the null hypothesis, however, this variance reduces to a function of ${\boldsymbol{\gamma}}_{2d_{0}-1}$ , denoted by $\sigma^{2}_{H_{0}}({\boldsymbol{\gamma}}_{2d_{0}-1})$ . Let $\hat{\sigma}_{H_{0}}^{2}=\sigma^{2}_{H_{0}}(\hat{\boldsymbol{\gamma}}^{*}_{2d_{0}-1})$ . Then it is a strongly consistent estimator of $\sigma_{H_{0}}^{2}({\boldsymbol{\gamma}}_{2d_{0}-1})$ .

Corollary 3.1.

Suppose that Assumptions (a)-(c) hold. Then, under the null hypothesis,

$$T_{n}:=\frac{p\det(\widehat{\Gamma})}{\hat{\sigma}_{H_{0}}}\xrightarrow{D}N(0,1),$$

as $n\to\infty$ . In addition, the asymptotic power of $T_{n}$ tends to 1.

Corollary 3.1 follows directly from Theorem 3.2 and its proof is thus omitted. This corollary includes as a particular case the sphericity test. For this case, the test statistic reduces to $T_{n}=n(\hat{\gamma}^{*}_{2}-1)/2$ and its null distribution is consistent with that in [31].
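The sphericity case can be simulated directly from $T_{n}=n(\hat{\gamma}^{*}_{2}-1)/2$ . The sketch below rests on several assumptions that are not spelled out in the text: that $g_{2,2}$ is $\gamma_{2}=\beta_{2}-c_{n}$ (the $j=2$ case of (2.3)), that the mean correction is $v_{2}=-c\gamma_{2}$ (the Corollary 2.1 expression evaluated at $j=2$ ), and that $\hat{v}_{2}$ is obtained by plugging in $\hat{\beta}_{n2}-c_{n}$ for $\gamma_{2}$ . The function name, sample sizes, and radius distribution are arbitrary.

```python
import numpy as np

def sphericity_statistic(X):
    """T_n = n * (gamma2_star - 1) / 2 from an n x p data matrix X with nonzero rows."""
    n, p = X.shape
    Y = np.sqrt(p) * X / np.linalg.norm(X, axis=1, keepdims=True)
    gram = Y @ Y.T / n
    c_n = p / n
    beta2 = np.sum(gram * gram) / p           # tr(B_n^2) / p
    gamma2_plugin = beta2 - c_n               # assumed form of g_{2,2}
    v2_hat = -c_n * gamma2_plugin             # assumed plug-in for v_2 = -c * gamma_2
    beta2_star = beta2 - v2_hat / p           # bias-corrected second moment
    gamma2_star = beta2_star - c_n
    return n * (gamma2_star - 1.0) / 2.0

# Null hypothesis T = I_p, with heavy-tailed radii to stress that only directions matter.
rng = np.random.default_rng(2)
n, p, reps = 100, 50, 300
stats = np.empty(reps)
for r in range(reps):
    Z = rng.standard_normal((n, p))
    radii = 1.0 / np.sqrt(rng.chisquare(3, size=n) / 3)
    stats[r] = sphericity_statistic(Z * radii[:, None])
```

Under the null the values in `stats` should be roughly centred at zero with unit spread, and the one-sided test rejects when $T_{n}$ exceeds the upper normal quantile.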

3.3. Simulation experiments

Simulations are carried out to evaluate the performance of the proposed estimation and test procedures for discrete PSDs in (3.1). Samples of $(z_{ij})$ are drawn from $N(0,1)$ and all statistics are calculated from 10,000 independent replications.

The estimation procedure is conducted for two PSDs, Models 1 and 2: Model 1 is of order 2 with the dimension to sample size ratio $c=2$ and Model 2 is of order 3 with the ratio $c=1/4$ .

• Model 1: $H_{1}=0.5\delta_{0.5}+0.5\delta_{1.5}$ and $c=2$ .

• Model 2: $H_{2}=0.3\delta_{0.2}+0.4\delta_{1}+0.3\delta_{1.8}$ and $c=1/4$ .

The sample size is $n=100,200,400$ for Model 1 and $n=400,800,1600$ for Model 2, respectively. In addition to empirical means and standard deviations of all estimators, we also calculate 95% confidence intervals for all parameters and report their coverage probabilities. Results are collected in Tables 1 and 2, which clearly demonstrate the consistency of all estimators as the sample size $n$ becomes large.

Next we examine the test for the order of a PSD. Two models are employed for this experiment:

• Model 3: $H_{3}=0.5\delta_{1-x}+0.5\delta_{1+x}$ ,

• Model 4: $H_{4}=0.25\delta_{0.5-x}+0.25\delta_{0.5+x}+0.25\delta_{1.5-x}+0.25\delta_{1.5+x}$ ,

where the parameter $x\in[0,0.5)$ represents the distance between the null and alternative hypotheses. In particular, Model 3 is used for testing $H_{0}:d\leq 1$ (sphericity test) with $x$ ranging from 0 to 0.2 by a step 0.18 and Model 4 is for testing $H_{0}:d\leq 2$ with $x$ ranging from 0 to 0.45 by a step 0.05. The sample size is taken as $n=400$ , the dimension-sample size ratio is $c=1/2,1,2$ , and the significance level is fixed at $\alpha=0.05$ . Results summarized in Table 3 show that the proposed test has accurate empirical size and that its power tends to 1 as the parameter $x$ increases under the two models. Unlike in the sphericity test, the power for Model 4 declines significantly as the ratio $c$ increases. This phenomenon is consistent with that observed for the SCM-based test in [32].
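To see how $x$ moves Models 3 and 4 away from their null hypotheses, the sketch below merges coincident atoms and counts the distinct ones, taking the order of a discrete PSD to be its number of distinct atoms. The helper names (`discrete_psd`, `model3`, `model4`, `moment`) are hypothetical and only serve this illustration.

```python
from collections import defaultdict

def discrete_psd(atoms, weights, ndigits=12):
    """Merge coincident atoms and return a sorted list of (atom, weight) pairs."""
    merged = defaultdict(float)
    for t, w in zip(atoms, weights):
        merged[round(t, ndigits)] += w
    return sorted(merged.items())

def model3(x):
    """H_3 = 0.5*delta_{1-x} + 0.5*delta_{1+x}."""
    return discrete_psd([1 - x, 1 + x], [0.5, 0.5])

def model4(x):
    """H_4 with four equally weighted atoms at 0.5 -/+ x and 1.5 -/+ x."""
    return discrete_psd([0.5 - x, 0.5 + x, 1.5 - x, 1.5 + x], [0.25] * 4)

def moment(psd, k):
    """k-th moment of a discrete PSD: sum_i w_i * t_i**k."""
    return sum(w * t**k for t, w in psd)

# Order (number of distinct atoms) at the null (x = 0) and at an alternative.
order3_null, order3_alt = len(model3(0.0)), len(model3(0.1))
order4_null, order4_alt = len(model4(0.0)), len(model4(0.1))
```

At $x=0$ Model 3 collapses to $\delta_{1}$ (order 1) and Model 4 collapses to $0.5\delta_{0.5}+0.5\delta_{1.5}$ (order 2), while any $x\in(0,0.5)$ gives orders 2 and 4. The first moment stays equal to 1 for every $x$, and the second moment of $H_{3}$ is $1+x^{2}$ .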

4. Proofs

4.1. Some key lemmas

We present three lemmas which form the core basis for the proofs of Theorems 2.1 and 2.2.

Lemma 4.1.

Let ${\mathbf{x}}=(x_{1},\ldots,x_{p})^{\prime}\sim N_{p}(0,{\mathbf{T}})$ where ${\mathbf{T}}=diag(\sigma_{1}^{2},\ldots,\sigma_{p}^{2})$ is a diagonal matrix with the spectral norm $||{\mathbf{T}}||$ bounded. Write $r_{k}=\sum_{i=1}^{p}\sigma_{i}^{2k}/p$ , $k=1,2$ . Then we have for $1\leq i\neq j\leq p$ ,

[TABLE]

Proof.

As the three expectations can be evaluated in a similar way, we only present the details for the second one as an illustration. Replacing the denominator of the quantity inside the expectation by $r_{1}^{2}$ and taking the difference yields

[TABLE]

where

[TABLE]

Taking expectations of $A_{p}$ and $B_{p}$ , we get

[TABLE]

which combined with (4.1) gives

[TABLE]

∎

Lemma 4.2.

Let ${\mathbf{y}}=\sqrt{p}{\mathbf{x}}/||{\mathbf{x}}||$ where ${\mathbf{x}}$ is as defined in Lemma 4.1 such that ${\rm E}({\mathbf{y}}{\mathbf{y}}^{\prime})=\Sigma$ . For any $p\times p$ complex matrices ${\mathbf{C}}$ and $\tilde{\mathbf{C}}$ with bounded spectral norms,

[TABLE]

where $\gamma_{2}={\text{\rm tr}}\Sigma^{2}/p$ .

Proof.

By symmetry, ${\rm E}(y_{i}^{3}y_{j})={\rm E}(y_{i}^{2}y_{j}y_{k})={\rm E}(y_{i}y_{j}y_{k}y_{l})=0$ for pairwise distinct indices $1\leq i,j,k,l\leq p$ . Writing ${\mathbf{C}}=(C_{ij})$ and $\tilde{\mathbf{C}}=(\tilde{C}_{ij})$ , we thus get

[TABLE]

From Lemma 4.1, we have

[TABLE]

From the above quantities and (4.2), we obtain

[TABLE]

On the other hand, from the first conclusion of Lemma 4.1, one may derive that

[TABLE]

for any $p\times p$ matrix ${\mathbf{M}}$ with bounded spectral norm, which implies

[TABLE]

Therefore,

[TABLE]

Finally, from (4.3), we may replace ${\mathbf{T}}$ with $r_{1}\Sigma$ and replace $r_{2}/r_{1}^{2}$ with ${\text{\rm tr}}(\Sigma^{2})/p$ in the above expression and then obtain the result of the Lemma. ∎
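The replacement of ${\mathbf{T}}$ by $r_{1}\Sigma$ in the last step can be illustrated by Monte Carlo. The sketch below estimates the diagonal of $\Sigma={\rm E}({\mathbf{y}}{\mathbf{y}}^{\prime})$ for a diagonal ${\mathbf{T}}$ and compares it with ${\mathbf{T}}/r_{1}$ , and likewise compares ${\text{\rm tr}}(\Sigma^{2})/p$ with $r_{2}/r_{1}^{2}$ . The dimension, the number of draws, and the spectrum of ${\mathbf{T}}$ are arbitrary illustrative choices, and agreement is only expected up to Monte Carlo error and terms that vanish as $p$ grows.

```python
import numpy as np

rng = np.random.default_rng(1)
p, n_mc = 200, 5000

# Diagonal of T (illustrative): half the variances are 0.5, half are 1.5.
sig2 = np.repeat([0.5, 1.5], p // 2)
r1 = sig2.mean()            # r_1 = sum_i sigma_i^2 / p
r2 = (sig2**2).mean()       # r_2 = sum_i sigma_i^4 / p

# Draw x ~ N_p(0, T) and form y = sqrt(p) * x / ||x||.
X = rng.standard_normal((n_mc, p)) * np.sqrt(sig2)
Y = np.sqrt(p) * X / np.linalg.norm(X, axis=1, keepdims=True)

# Monte Carlo estimate of diag(Sigma); Sigma is diagonal here by symmetry.
diag_Sigma = (Y**2).mean(axis=0)

# Average the estimate within each block of equal variances.
low = diag_Sigma[: p // 2].mean()
high = diag_Sigma[p // 2 :].mean()

# Estimate of gamma_2 = tr(Sigma^2) / p, using that Sigma is diagonal.
gamma2_mc = (diag_Sigma**2).sum() / p
```

With these settings `low` and `high` should be close to $0.5/r_{1}$ and $1.5/r_{1}$ , and `gamma2_mc` close to $r_{2}/r_{1}^{2}$ .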

Let $v_{0}>0$ be arbitrary, $x_{r}$ any number greater than $\limsup_{p\rightarrow\infty}\lambda_{\max}^{\Sigma}(1+\sqrt{c})^{2}$ , and $x_{l}$ any negative number if $\liminf_{p\rightarrow\infty}\lambda_{\min}^{\Sigma}(1-\sqrt{c})^{2}I_{(0,1)}(c)=0$ , otherwise choose $x_{l}\in(0,\liminf_{p\rightarrow\infty}\lambda_{\min}^{\Sigma}(1-\sqrt{c})^{2})$ . Define a contour $\mathcal{C}$ as

[TABLE]

Let $m_{0}(z)$ and $\underline{m}_{0}(z)$ be the Stieltjes transforms of $F^{c_{n},H_{p}}$ and $c_{n}F^{c_{n},H_{p}}+(1-c_{n})\delta_{0}$ . Our next aim is to study the fluctuation of the random process

[TABLE]

For this, we define a truncated version $\widehat{M}_{n}(z)$ of $M_{n}(z)$ as

[TABLE]

where $\mathcal{C}_{n}=\{x\pm iv_{0}:x\in[x_{l},x_{r}]\}\cup\{x\pm iv:x\in\{x_{l},x_{r}\},v\in[n^{-1}\varepsilon_{n},v_{0}]\}$ and the sequence $(\varepsilon_{n})$ decreases to zero and satisfies $\varepsilon_{n}>n^{-a}$ for some $a\in(0,1)$ .

Lemma 4.3.

Under Assumptions (a)-(c), the random process $\widehat{M}_{n}(\cdot)$ converges weakly to a two-dimensional Gaussian process $M(\cdot)$ satisfying for $z,z_{1},z_{2}\in\mathcal{C}$ ,

[TABLE]

and covariance function

[TABLE]

Proof.

Split $\widehat{M}_{n}(z)$ into two parts, $\widehat{M}_{n}(z)=M_{n}^{(1)}(z)+M_{n}^{(2)}(z)$ , where

[TABLE]

Following the strategy in [3], we prove the convergence of $\widehat{M}_{n}(z)$ in three steps:

Step 1: Finite dimensional convergence of $M_{n}^{(1)}(z)$ in distribution;
Step 2: Tightness of $M_{n}^{(1)}(z)$ on $\mathcal{C}_{n}$ ;
Step 3: Convergence of $M_{n}^{(2)}(z)$ .

Without loss of generality, we assume $\|\Sigma\|\leq 1$ for all $p$ . Constants appearing in inequalities will be denoted by $K$ which may take different values from one expression to the next.

Step 1: Finite dimensional convergence of $M_{n}^{(1)}(z)$ in distribution. We show in this part that, for any $w$ complex numbers $z_{1},\ldots,z_{w}\in\mathcal{C}_{n}$ , the random vector

[TABLE]

converges in distribution to a Gaussian vector. We begin by introducing some notation which will be frequently used in the sequel.

[TABLE]

Note that, for any $z=u+iv\in\mathbb{C}^{+}$ , the last three quantities are bounded in absolute value by $|z|/v$ .

Let ${\rm E}_{0}(\cdot)$ denote expectation and ${\rm E}_{j}(\cdot)$ denote conditional expectation with respect to the $\sigma$ -field generated by ${\mathbf{r}}_{1},\ldots,{\mathbf{r}}_{j}$ . From the martingale decomposition and the identity

[TABLE]

we have

[TABLE]

Writing $\beta_{j}(z)=\bar{\beta}_{j}(z)-\bar{\beta}_{j}(z)\beta_{j}(z)\varepsilon_{j}(z)=\bar{\beta}_{j}(z)-\bar{\beta}_{j}^{2}(z)\varepsilon_{j}(z)+\bar{\beta}_{j}^{2}(z)\beta_{j}(z)\varepsilon_{j}^{2}(z)$ , we have

[TABLE]
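For the reader's convenience, the second equality in the expansion of $\beta_{j}(z)$ used above follows by substituting the first identity into itself once. With the argument $z$ suppressed:

```latex
\begin{aligned}
\beta_{j} &= \bar{\beta}_{j}-\bar{\beta}_{j}\,\beta_{j}\,\varepsilon_{j} \\
          &= \bar{\beta}_{j}-\bar{\beta}_{j}\bigl(\bar{\beta}_{j}-\bar{\beta}_{j}\beta_{j}\varepsilon_{j}\bigr)\varepsilon_{j} \\
          &= \bar{\beta}_{j}-\bar{\beta}_{j}^{2}\varepsilon_{j}+\bar{\beta}_{j}^{2}\beta_{j}\varepsilon_{j}^{2}.
\end{aligned}
```

The first line is algebraically equivalent to $1/\beta_{j}-1/\bar{\beta}_{j}=\varepsilon_{j}$ , so it holds whenever the three quantities are related in this way.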

Note that

[TABLE]

which is $o(1)$ from Lemma 5.1. Similarly, ${\rm E}|\sum_{j=1}^{n}({\rm E}_{j}-{\rm E}_{j-1})\bar{\beta}_{j}^{2}(z)\beta_{j}(z)\varepsilon_{j}^{2}(z){\mathbf{r}}_{j}^{\prime}{\mathbf{D}}_{j}^{-2}(z){\mathbf{r}}_{j}|^{2}=o(1)$ . Thus we get

[TABLE]

which implies that we need only consider the limiting distribution of

[TABLE]

in finite dimensional situations. For any $\epsilon>0$ ,

[TABLE]

which tends to zero according to Lemma 5.1 and thus verifies the Lyapunov condition. Therefore, from the martingale CLT (Lemma 5.4), the random vector in (4.8) will tend to a Gaussian vector $(M^{(1)}(z_{1}),\ldots,M^{(1)}(z_{w}))$ with covariance function

[TABLE]

provided this limit exists. By the same arguments as on page 571 of [3], it is sufficient to show that

[TABLE]

converges in probability. Since

[TABLE]

where the last inequality is from

[TABLE]

for any $p\times p$ matrix ${\mathbf{M}}$ ; see Lemma 2.6 in [37]. Moreover, from the definition of $\underline{m}_{0}(z)$ and the discussion on page 439 of [5], we also have

[TABLE]

It is hence sufficient to study the convergence of

[TABLE]

whose second mixed partial derivative yields the limit of (4.11). From Lemma 4.2, we know that

[TABLE]

where

[TABLE]

Now we consider the limit of $T_{1}$ . Let

[TABLE]

Note that

[TABLE]

From the equality ${\mathbf{r}}_{i}^{\prime}{\mathbf{D}}_{j}^{-1}(z)=\beta_{ij}(z){\mathbf{r}}_{i}^{\prime}{\mathbf{D}}_{ij}^{-1}(z)$ , we get

[TABLE]

where

[TABLE]

For any $p\times p$ matrix ${\mathbf{M}}$ , let $|||{\mathbf{M}}|||$ denote a non-random upper bound for the spectral norm of ${\mathbf{M}}$ . From Lemma 5.1, (4.14), and (4.18), we get

[TABLE]

where the matrix ${\mathbf{M}}$ in the first two inequalities is assumed nonrandom.

Using the equality (4.9) we write

[TABLE]

where

[TABLE]

From (4.14) and (4.18) we get $|R_{12}(z_{1},z_{2})|\leq(1+p/(nv))/v^{3}$ and ${\rm E}|R_{13}(z_{1},z_{2})|\leq n^{1/2}(1+p/(nv))/v^{3}$ . Using Lemma 5.1 we have, for $i<j$ ,

[TABLE]

and by (4.14),

[TABLE]

These imply that

[TABLE]

Therefore, from (4.19)-(4.24),

[TABLE]

where ${\rm E}|R_{14}(z_{1},z_{2})|\leq Kn^{1/2}$ . From this and applying (4.19)-(4.24) again, we get

[TABLE]

where ${\rm E}|R_{15}(z_{1},z_{2})|\leq Kn^{1/2}$ .

From (4.15) and (4.25), we obtain that

[TABLE]

Here ${\rm E}|R_{16}(z_{1},z_{2})|\leq Kn^{1/2}$ . Letting

[TABLE]

we get

[TABLE]

where

[TABLE]

Elementary calculations reveal that

[TABLE]

Now we derive the limits of $T_{2}$ , $T_{3}$ , $T_{4}$ and their second mixed partial derivatives. From (4.15), (4.19)-(4.22), it’s easy to show that

[TABLE]

where ${\rm E}|R_{17}(z_{1},z_{2})|\leq Kn$ and ${\rm E}|R_{18}(z_{1},z_{2})|\leq Kn$ . We thus get

[TABLE]

Their corresponding derivatives are

[TABLE]

respectively.

Collecting results in (4.17), (4.27)-(4.30), we finally get the covariance function in the lemma.

Step 2: Tightness of $M_{n}^{(1)}(z)$ . From the arguments in [3], the tightness of $M_{n}^{(1)}(z)$ can be established by verifying the moment condition:

[TABLE]

We first claim that moments of ${\mathbf{D}}^{-1}(z)$ , ${\mathbf{D}}^{-1}_{j}(z)$ and ${\mathbf{D}}^{-1}_{ij}(z)$ are all bounded in $n$ and $z\in\mathcal{C}_{n}$ . Taking ${\mathbf{D}}^{-1}(z)$ for example, it is clear that ${\rm E}||{\mathbf{D}}^{-1}(z)||^{q}<1/v_{0}^{q}$ for $z\in\mathcal{C}_{u}$ . For $z\in\mathcal{C}_{l}\cup\mathcal{C}_{r}$ , applying Lemma 5.5 with suitably large $s$ ,

[TABLE]

where the two constants $\eta_{r}$ and $\eta_{l}$ satisfy $\limsup_{n,p\rightarrow\infty}\lambda_{\max}^{\Sigma}(1+\sqrt{c})^{2}<\eta_{r}<x_{r}$ and $x_{l}<\eta_{l}<\liminf_{n,p\rightarrow\infty}\lambda_{\min}^{\Sigma}I_{(0,1)}(c)(1-\sqrt{c})^{2}$ . Therefore for any positive $q$ , we may assume that

[TABLE]

Using the above argument, we can extend the inequality in Lemma 5.1 to

[TABLE]

where the matrices ${\mathbf{B}}_{l}(v)$ are independent of ${\mathbf{u}}_{1}$ and

[TABLE]

for some positive $s$ , where $\tilde{\mathbf{B}}$ is ${\mathbf{B}}_{n}$ or ${\mathbf{B}}_{n}$ with some ${\mathbf{r}}_{j}$ ’s removed. In applications of (4.33), $a(v)$ can be a product of factors of $\beta_{1}(z)$ or ${\mathbf{r}}_{1}^{\prime}{\mathbf{D}}_{1}^{-1}(z_{1}){\mathbf{D}}_{1}^{-1}(z_{2}){\mathbf{r}}_{1}$ or similar terms. It’s easy to verify that these terms satisfy (4.34), see pages 579 and 580 in [3] for details.

Let

[TABLE]

We first handle moments of $\gamma_{j}(z)$ . By a decomposition similar to that in (4.10), we may get

[TABLE]

Applying Lemma 5.3 and the Hölder inequality to the above expression we then get, for even $q$ ,

[TABLE]

where the last inequality uses the boundedness of ${\rm E}|\beta_{ij}(z)|^{q}$ and ${\rm E}|{\mathbf{r}}_{i}^{\prime}{\mathbf{D}}_{ij}^{-1}(z)\Sigma{\mathbf{D}}_{ij}^{-1}(z){\mathbf{r}}_{i}|^{q}$ . From (4.33) and (4.35), we get

[TABLE]

for $q$ even.

Next we show that $b_{n}(z)$ is bounded for all $n$ . By the equality $b_{n}(z)-\beta_{j}(z)=b_{n}(z)\beta_{j}(z)\gamma_{j}(z)$ and the boundedness of ${\rm E}|\beta_{j}(z)|^{q}$ and ${\rm E}|\gamma_{j}|^{q}$ , we have

[TABLE]

and thus, for all $n$ large enough,

[TABLE]

Now we prove (4.31). From the martingale decomposition and (4.9), we have

[TABLE]

It is then enough to show ${\rm E}|A_{1}|^{2}$ , ${\rm E}|A_{2}|^{2}$ , and ${\rm E}|A_{3}|^{2}$ are all bounded. The arguments for the boundedness are all similar to those in pages 582 and 583 in [3], and hence we only present the details for ${\rm E}|A_{1}|^{2}$ for illustration.

Replacing $\beta_{j}(z)$ in $R_{1}$ with $\beta_{j}(z)=b_{n}(z)-b_{n}(z)\beta_{j}(z)\gamma_{j}(z),$ we may obtain $A_{1}=A_{11}-A_{12}-A_{13}$ where

[TABLE]

From (4.33), (4.34), and (4.37),

[TABLE]

Using (4.33), (4.34),(4.36), and (4.37),

[TABLE]

Similarly, we may get ${\rm E}|A_{13}|^{2}<K$ . Hence the tightness of $M_{n}^{(1)}(z)$ is obtained.

Step 3: Convergence of $M_{n}^{(2)}(z)$ . To finish the proof, it is enough to show that the sequence of $M_{n}^{(2)}(z)$ is bounded and equicontinuous, and converges to the mean function of the lemma for $z\in\mathcal{C}_{n}$ . The boundedness and equicontinuity can be verified following the arguments on pages 592 and 593 of [3], and thus we only focus on the convergence of $M_{n}^{(2)}(z)$ .

We first list some results that will be used in the sequel:

[TABLE]

where ${\mathbf{M}}$ is any nonrandom $p\times p$ matrix. These results can be verified step by step following similar discussions in [3] and we omit the details.

Writing ${\mathbf{V}}(z)=zI-b_{n}(z)\Sigma$ , we decompose $M_{n}^{(2)}(z)$ as

[TABLE]

Notice that

[TABLE]

We have

[TABLE]

where the second equality uses the convergence in (4.38).

Our next task is to study the limits of $S_{n}(z)$ and $\underline{S}_{n}(z)$ . For simplicity, we suppress the argument $z$ of all functions in the sequel. All expressions and convergence statements hold uniformly for $z\in\mathcal{C}_{n}$ .

We first simplify the expression of $S_{n}$ . Using the identity ${\mathbf{r}}_{j}^{\prime}{\mathbf{D}}^{-1}={\mathbf{r}}_{j}^{\prime}{\mathbf{D}}_{j}^{-1}\beta_{j}$ , we have

[TABLE]

From (4.9) and $\beta_{1}=b_{n}-b_{n}\beta_{1}\gamma_{1}$ ,

[TABLE]

where $|{\rm E}\beta_{1}\gamma_{1}{\mathbf{r}}_{1}^{\prime}{\mathbf{D}}_{1}^{-1}{\mathbf{V}}^{-1}\Sigma{\mathbf{D}}_{1}^{-1}{\mathbf{r}}_{1}|\leq Kn^{-1/2}$ . From this and (4.42), we get

[TABLE]

Plugging $\beta_{1}=b_{n}-b_{n}^{2}\gamma_{1}+b_{n}^{3}\gamma_{1}^{2}-\beta_{1}b_{n}^{3}\gamma_{1}^{3}$ into the first term in the above equation, we obtain

[TABLE]

Note that, from (4.33), (4.36), and (4.39),

[TABLE]

We thus arrive at

[TABLE]

On the other hand, by the identity ${\mathbf{r}}_{j}^{\prime}{\mathbf{D}}^{-1}={\mathbf{r}}_{j}^{\prime}{\mathbf{D}}_{j}^{-1}\beta_{j}$ , we have

[TABLE]

which implies $nz\underline{m}_{n}=-\sum_{j=1}^{n}\beta_{j}$ . From this, together with $\beta_{1}=b_{n}-b_{n}^{2}\gamma_{1}+b_{n}^{3}\gamma_{1}^{2}-\beta_{1}b_{n}^{3}\gamma_{1}^{3}$ , (4.33), we get

[TABLE]
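The rank-one identity ${\mathbf{r}}_{j}^{\prime}{\mathbf{D}}^{-1}={\mathbf{r}}_{j}^{\prime}{\mathbf{D}}_{j}^{-1}\beta_{j}$ and its consequence $nz\underline{m}_{n}=-\sum_{j=1}^{n}\beta_{j}$ can be checked numerically. The sketch below assumes the conventions of [3]: ${\mathbf{r}}_{j}={\mathbf{y}}_{j}/\sqrt{n}$ , ${\mathbf{D}}=\sum_{j}{\mathbf{r}}_{j}{\mathbf{r}}_{j}^{\prime}-zI$ , ${\mathbf{D}}_{j}={\mathbf{D}}-{\mathbf{r}}_{j}{\mathbf{r}}_{j}^{\prime}$ , $\beta_{j}=1/(1+{\mathbf{r}}_{j}^{\prime}{\mathbf{D}}_{j}^{-1}{\mathbf{r}}_{j})$ , and $\underline{m}_{n}=-(1-c_{n})/z+c_{n}m_{n}$ with $m_{n}={\text{\rm tr}}\,{\mathbf{D}}^{-1}/p$ and $c_{n}=p/n$ . These definitions are not restated in this part of the text, so they should be read as assumptions of the example; the sizes, seed, and point $z$ are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(2)
n, p = 30, 20
z = 0.7 + 0.5j                     # any point off the real axis

# Rows of R are r_j = y_j / sqrt(n), where y_j = sqrt(p) * x_j / ||x_j||.
X = rng.standard_normal((n, p))
R = X / np.linalg.norm(X, axis=1, keepdims=True) * np.sqrt(p / n)
B = R.T @ R                        # B_n = sum_j r_j r_j'
D_inv = np.linalg.inv(B - z * np.eye(p))

betas = np.empty(n, dtype=complex)
max_err = 0.0
for j in range(n):
    r = R[j]
    # D_j removes the j-th rank-one term from D.
    Dj_inv = np.linalg.inv(B - np.outer(r, r) - z * np.eye(p))
    betas[j] = 1.0 / (1.0 + r @ Dj_inv @ r)
    # Rank-one identity: r_j' D^{-1} = beta_j * r_j' D_j^{-1}.
    max_err = max(max_err, np.abs(r @ D_inv - betas[j] * (r @ Dj_inv)).max())

m_n = np.trace(D_inv) / p                       # Stieltjes transform of F^{B_n}
m_under = -(1 - p / n) / z + (p / n) * m_n      # companion transform
lhs, rhs = n * z * m_under, -betas.sum()
```

Both `max_err` and `abs(lhs - rhs)` should be at the level of floating-point round-off.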

Applying Lemma 4.2 to the simplified $S_{n}$ and $\underline{S}_{n}$ , and then replacing ${\mathbf{D}}_{j}$ with ${\mathbf{D}}$ in the derived results, yields

[TABLE]

To study the limits of $S_{n}$ and $\underline{S}_{n}$ , we compare the difference between ${\mathbf{D}}^{-1}$ and ${\mathbf{V}}^{-1}$ . Similar to (4.19)-(4.22), we have

[TABLE]

where ${\tilde{\mathbf{R}}}_{1}=\sum_{j=1}^{n}{\mathbf{V}}^{-1}({\mathbf{r}}_{j}{\mathbf{r}}_{j}^{\prime}-n^{-1}\Sigma){\mathbf{D}}_{j}^{-1}$ and, for any $p\times p$ matrix ${\mathbf{M}}$ ,

[TABLE]

Moreover, for nonrandom ${\mathbf{M}}$ with bounded norm,

[TABLE]

Similar to (4.23), we write

[TABLE]

where $\tilde{R}_{11}={\text{\rm tr}}\sum_{j=1}^{n}{\mathbf{V}}^{-1}{\mathbf{r}}_{j}{\mathbf{r}}_{j}^{\prime}{\mathbf{D}}_{j}^{-1}\Sigma({\mathbf{D}}^{-1}-{\mathbf{D}}_{j}^{-1}){\mathbf{M}}$ , ${\rm E}\tilde{R}_{12}=0$ , and $|{\rm E}\tilde{R}_{13}|\leq K.$ Using (4.32), (4.33), and (4.39), we get

[TABLE]

From (4.45)-(4.49) we get

[TABLE]

From (4.15), (4.45)-(4.51) we get

[TABLE]

Combining the above results with (4.43) and (4.44), we obtain

[TABLE]

Therefore we get

[TABLE]

as $n\to\infty$ . Using the identity

[TABLE]

we finally obtain the mean function of the lemma.

∎

4.2. Proof of Theorem 2.1

Following Theorem 1.1 in [5], it is sufficient to show that, for any bounded sequence of symmetric matrices $\{{\mathbf{C}}_{p}\}$ ,

[TABLE]

Write ${\mathbf{y}}=\sqrt{p}{\mathbf{A}}{\mathbf{u}}/||{\mathbf{A}}{\mathbf{u}}||=\sqrt{p}{\mathbf{A}}{\mathbf{z}}/||{\mathbf{A}}{\mathbf{z}}||$ where ${\mathbf{z}}\sim N(0,I_{p})$ . Since the eigenvalues of the SSCM ${\mathbf{B}}_{n}$ are invariant under orthogonal transformation, it’s enough to consider the diagonal matrix ${\mathbf{A}}$ . Therefore, by taking ${\mathbf{C}}=\tilde{\mathbf{C}}={\mathbf{C}}_{p}$ in Lemma 4.2, one can verify the condition (4.53).

4.3. Proof of Theorem 2.2

For any distribution function $G$ and function $f$ analytic on a simply connected domain $D$ containing the support of $G$ , it holds that

[TABLE]

where $m_{G}(z)$ denotes the Stieltjes transform of $G$ and $\mathcal{C}\subset D$ is a simple, closed, and positively oriented contour enclosing the support of $G$ . Similar to (4.4), we choose $v_{0}$ , $x_{r}$ , and $x_{l}$ such that $f_{1},\ldots,f_{k}$ are all analytic on and inside the contour $\mathcal{C}$ . We denote by $K$ a common upper bound of these functions on $\mathcal{C}$ . Therefore, almost surely, for all $n$ large, $\{f_{1},\ldots,f_{k}\}$ satisfy the equation in (4.54) with $G=F^{B_{n}}$ and moreover,

[TABLE]

which converges to zero as $n\to\infty$ . Since

[TABLE]

is a continuous mapping of $C(\mathcal{C},{\mathbb{R}}^{2})$ into ${\mathbb{R}}^{k}$ , it follows from Lemma 4.3 that the above random vector converges to a multivariate Gaussian vector $(X_{f_{1}},\ldots,X_{f_{k}})$ whose mean and covariance functions are

[TABLE]

where $f,g\in\{f_{1},\ldots,f_{k}\}$ and $\{\mathcal{C}_{1},\mathcal{C}_{2}\}$ are two non-overlapping analogues of the contour $\mathcal{C}$ .

From the following two identities

[TABLE]

we obtain the form of the limiting covariance function in the theorem.
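The contour representation (4.54) that opens this proof can be verified numerically for a simple case. The sketch assumes the convention $m_{G}(z)=\int(x-z)^{-1}dG(x)$ of [3], under which the identity reads $\int f\,dG=-(2\pi i)^{-1}\oint_{\mathcal{C}}f(z)m_{G}(z)\,dz$ ; the discrete $G$ , the test function $f(z)=z^{2}$ , and the circular contour are illustrative choices.

```python
import numpy as np

atoms = np.array([0.5, 1.5])       # support of an illustrative discrete G
weights = np.array([0.5, 0.5])

def stieltjes(z):
    """m_G(z) = sum_i w_i / (t_i - z), evaluated at an array of points z."""
    return np.sum(weights / (atoms - z[:, None]), axis=1)

def f(z):
    """Test function, analytic on the whole plane."""
    return z**2

def contour_integral(func, center=1.0, radius=1.0, n_nodes=400):
    """Evaluate -(1/(2*pi*i)) * closed integral of func(z) * m_G(z) dz.

    The contour is a counterclockwise circle enclosing the support of G,
    discretized with the trapezoidal rule.
    """
    theta = 2 * np.pi * np.arange(n_nodes) / n_nodes
    z = center + radius * np.exp(1j * theta)
    dz = 1j * radius * np.exp(1j * theta) * (2 * np.pi / n_nodes)
    return -np.sum(func(z) * stieltjes(z) * dz) / (2j * np.pi)

direct = np.sum(weights * f(atoms))     # integral of f with respect to G
via_contour = contour_integral(f)
```

Here `direct` equals $0.5\cdot 0.25+0.5\cdot 2.25=1.25$ , and `via_contour` agrees with it up to discretization error, with a negligible imaginary part.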

4.4. Proof of Corollary 2.1

Choose a contour $\mathcal{C}$ for the integrals such that $\max_{t\in S_{H},z\in\mathcal{C}}|t\underline{m}(z)|<1,$ where $S_{H}$ is the support of $H$ . Let $\underline{m}(\mathcal{C})=\{\underline{m}(z):z\in\mathcal{C}\}$ denote the image of $\mathcal{C}$ under $\underline{m}(z)$ . Then $\underline{m}(\mathcal{C})$ is a simple and closed contour having clockwise direction and enclosing zero [33].

By the identity in (2.2), the integral in the mean function of Theorem 2.2 becomes

[TABLE]

From this and the Cauchy integral theorem, we get the mean function. The covariance function can be obtained following the proof of Theorem 1 in [33].

5. Appendix

Lemma 5.1.

For any $p\times p$ complex matrix ${\mathbf{C}}$ and ${\mathbf{y}}=\sqrt{p}{\mathbf{x}}/||{\mathbf{x}}||$ with ${\mathbf{x}}\sim N(0,\Sigma)$ and $||\Sigma||\leq 1$ ,

[TABLE]

where $K_{q}$ is a positive constant depending only on $q$ .

Proof.

This lemma follows from Lemma 2.2 in [3] and similar arguments in the proof of Lemma 5 in [15]. ∎

Lemma 5.2 ([7]).

Let $\{X_{k}\}$ be a complex martingale difference sequence with respect to the increasing $\sigma$ -field $\{\mathcal{F}_{k}\}$ . Then, for $q\geq 2$ ,

[TABLE]

Lemma 5.3 ([7]).

Let $\{X_{k}\}$ be a complex martingale difference sequence with respect to the increasing $\sigma$ -field $\{\mathcal{F}_{k}\}$ . Then, for $q>1$ ,

[TABLE]

Lemma 5.4 (Theorem 35.12 of [6]).

Suppose that, for each $n$ , $Y_{n1},Y_{n2},\ldots,Y_{nr_{n}}$ is a real martingale difference sequence with respect to the increasing $\sigma$ -field ${\{\mathcal{F}_{nj}\}}$ having second moments. If for each $\varepsilon>0$ ,

[TABLE]

as $n\rightarrow\infty$ , where $\sigma^{2}$ is a positive constant, then

[TABLE]

Lemma 5.5.

Suppose that Assumptions (a)-(c) hold. Then, for any $s$ positive,

[TABLE]

whenever $\eta_{r}>\limsup_{p\rightarrow\infty}||\Sigma||(1+\sqrt{c})^{2}$ . If $0<\liminf_{p\rightarrow\infty}\lambda_{\min}^{\Sigma}I_{(0,1)}(c)$ , then

[TABLE]

whenever $0<\eta_{l}<\lim\inf_{p\rightarrow\infty}\lambda_{\min}^{\Sigma}I_{(0,1)}(c)(1-\sqrt{c})^{2}$ .
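Lemma 5.5 controls the probability that the extreme eigenvalues of ${\mathbf{B}}_{n}$ fall outside $[\eta_{l},\eta_{r}]$ . A single simulated draw can illustrate the upper bound. In the sketch below the population spectrum, the sizes, and the 5% margin used for $\eta_{r}$ are arbitrary illustrative choices, and the largest diagonal entry of ${\mathbf{T}}$ is used as a proxy for $||\Sigma||$ , which is reasonable here because the entries of ${\mathbf{T}}$ average to one.

```python
import numpy as np

rng = np.random.default_rng(3)
n, p = 400, 200
c = p / n

# Illustrative diagonal T with mean 1, so that Sigma is close to T.
sig2 = np.repeat([0.5, 1.5], p // 2)

X = rng.standard_normal((n, p)) * np.sqrt(sig2)
Y = np.sqrt(p) * X / np.linalg.norm(X, axis=1, keepdims=True)
B = Y.T @ Y / n                                  # spatial-sign covariance matrix

lam_max = np.linalg.eigvalsh(B)[-1]              # largest eigenvalue of B_n
edge = sig2.max() * (1 + np.sqrt(c)) ** 2        # proxy for ||Sigma|| (1 + sqrt(c))^2
eta_r = 1.05 * edge                              # any constant above the edge
```

In this draw `lam_max` stays below `eta_r`, as the lemma suggests should happen with overwhelming probability for large $n$ .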

Proof.

Let ${\mathbf{x}}_{j}={\mathbf{A}}{\mathbf{z}}_{j}$ where ${\mathbf{A}}{\mathbf{A}}^{\prime}={\mathbf{T}}$ and ${\mathbf{z}}_{j}\sim N(0,I_{p})$ , $j=1,\ldots,n.$ Also let ${\mathbf{B}}_{n}^{(0)}=(1/n)\sum_{j=1}^{n}{\mathbf{A}}{\mathbf{z}}_{j}{\mathbf{z}}_{j}^{\prime}{\mathbf{A}}^{\prime}$ . From [3], the conclusions of this lemma hold when $({\mathbf{B}}_{n},\Sigma)$ are replaced with $({\mathbf{B}}_{n}^{(0)},{\mathbf{T}})$ . Choose $\eta_{r}^{(0)}$ and $\eta_{l}^{(0)}$ satisfying

[TABLE]

where $r_{1}={\text{\rm tr}}({\mathbf{T}})/p$ . From Lemma 4.1, we have

[TABLE]

Using inequalities

[TABLE]

we may get

[TABLE]

where the last equality is from the Chebyshev inequality and the fact $r_{1}>\eta_{r}^{(0)}/\eta_{r}$ . Similarly, $P(\lambda_{\min}^{{\mathbf{B}}_{n}}<\eta_{l})=o(n^{-s}).$

∎

Bibliography

[1] Bai, Z. D., Chen, J. Q., and Yao, J. F. (2010). On estimation of the population spectral distribution from a high-dimensional sample covariance matrix. Aust. N. Z. J. Stat., 52, 423–437.
[2] Bai, Z. D., Hu, J., Pan, G. M., and Zhou, W. (2015). Convergence of the empirical spectral distribution function of Beta matrices. Bernoulli, 21, 1538–1574.
[3] Bai, Z. D. and Silverstein, J. W. (2004). CLT for linear spectral statistics of large-dimensional sample covariance matrices. Ann. Probab., 32, 553–605.
[4] Bai, Z. D. and Silverstein, J. W. (2010). Spectral analysis of large dimensional random matrices, 2nd ed., Springer, New York.
[5] Bai, Z. D. and Zhou, W. (2008). Large sample covariance matrices without independence structures in columns. Statist. Sinica, 18, 425–442.
[6] Billingsley, P. (1995). Probability and Measure, 3rd ed., Wiley, New York.
[7] Burkholder, D. L. (1973). Distribution function inequalities for martingales. Ann. Probab., 1, 19–42.
[8] Cai, T., Ma, Z. M., and Wu, Y. H. (2013). Sparse PCA: Optimal rates and adaptive estimation. Ann. Statist., 41, 3074–3110.
