Asymptotic power of Rao's score test for independence in high dimensions

Dennis Leung; Qi-Man Shao

arXiv:1701.07249·math.ST·December 12, 2017


TL;DR

This paper analyzes the asymptotic power of Rao's score test for independence in high-dimensional normal data, showing it is rate-optimal for detecting dependencies as both sample size and dimension grow.

Contribution

It derives the asymptotic minimax power function of Rao's score test in high dimensions, establishing its rate-optimality for dependency detection.

Findings

01

Rao's score test is rate-optimal for dependency signals of order sqrt(m/n)

02

The test's power function is characterized asymptotically in high dimensions

03

Both dimension and sample size tend to infinity with bounded ratio

Abstract

Let $R$ be the Pearson correlation matrix of $m$ normal random variables. Rao's score test for the independence hypothesis $H_{0} : R = I_{m}$, where $I_{m}$ is the identity matrix of dimension $m$, was first considered by Schott (2005) in the high-dimensional setting. In this paper, we study the asymptotic minimax power function of this test, under an asymptotic regime in which both $m$ and the sample size $n$ tend to infinity with the ratio $m / n$ upper bounded by a constant. In particular, our result implies that Rao's score test is rate-optimal for detecting the dependency signal $\| R - I_{m} \|_{F}$ of order $\sqrt{m / n}$, where $\| \cdot \|_{F}$ is the matrix Frobenius norm.

Equations

$$T=\sum_{1\leq p<q\leq m}\hat{\rho}_{pq}^{2},$$

$$H_{0}:R=I_{m},$$

$$H_{1}:R\in\Theta(b),$$

$$\Theta(b):=\left\{R:\|R-I_{m}\|_{F}\geq b\sqrt{m/n},\ \operatorname{diag}(R)=I_{m}\right\},$$

$$\max_{1\leq p<q\leq m}\hat{\rho}_{pq}^{2}.$$

$$\hat{\rho}_{pq}^{2}:=\frac{S_{pq}^{2}}{S_{pp}S_{qq}}=f(S_{pp},S_{qq},S_{pq}),$$

$$f(u_{1},u_{2},u_{3}):=u_{1}^{-1}u_{2}^{-1}u_{3}^{2},$$

$$S_{pq}:=\frac{\sum_{i=1}^{n}X_{pi}X_{qi}}{n}.$$

$$\bar{S}_{pq}:=S_{pq}-\rho_{pq}$$

$$\frac{\sum_{i=1}^{n}\left(X_{pi}-n^{-1}\sum_{j=1}^{n}X_{pj}\right)\left(X_{qi}-n^{-1}\sum_{j=1}^{n}X_{qj}\right)}{n-1}$$

$$\psi=I\left(T-\frac{m(m-1)}{2n}>\frac{m}{n}z_{\alpha}\right),$$

$$\lim_{n\to\infty}\inf_{\Theta(b)}\mathbb{E}_{R}[\psi]=\bar{\Phi}\left(z_{\alpha}-2^{-1}b^{2}\right).$$

$$\limsup_{n\to\infty}\inf_{\Theta(b)}\mathbb{E}_{R}[\phi]<\beta$$

$$\hat{\rho}_{pq}^{2}-\rho_{pq}^{2}=\sum_{\lambda\in\mathbb{N}_{\geq 0}^{3}:\,1\leq|\lambda|\leq 4}\frac{\partial^{\lambda}f(1,1,\rho_{pq})}{\lambda!}\bar{S}_{pp}^{\lambda_{1}}\bar{S}_{qq}^{\lambda_{2}}\bar{S}_{pq}^{\lambda_{3}}+III_{pq}\quad\text{a.s.},$$

$$III_{pq}:=\sum_{\lambda\in\mathbb{N}_{\geq 0}^{3}:\,|\lambda|=5}\frac{(\rho_{pq}+k_{pq}\bar{S}_{pq})^{2-\lambda_{1}}\bar{S}_{pp}^{\lambda_{1}}\bar{S}_{qq}^{\lambda_{2}}\bar{S}_{pq}^{\lambda_{3}}}{(1+k_{pq}\bar{S}_{pp})^{1+\lambda_{2}}(1+k_{pq}\bar{S}_{qq})^{1+\lambda_{3}}},$$

$$\frac{\partial^{\lambda}f(1,1,\rho_{pq})}{\lambda!}\bar{S}_{pp}^{\lambda_{1}}\bar{S}_{qq}^{\lambda_{2}}\bar{S}_{pq}^{\lambda_{3}}=\bar{S}_{pq}^{2}=\frac{\sum_{i=1}^{n}(X_{pi}X_{qi}-\rho_{pq})^{2}}{n^{2}}+\frac{2\sum_{1\leq i<j\leq n}(X_{pi}X_{qi}-\rho_{pq})(X_{pj}X_{qj}-\rho_{pq})}{n^{2}},$$

$$\hat{\rho}_{pq}^{2}-\rho_{pq}^{2}=I_{pq}+II_{pq}+III_{pq},$$

$$I_{pq}:=\frac{2\sum_{1\leq i<j\leq n}(X_{pi}X_{qi}-\rho_{pq})(X_{pj}X_{qj}-\rho_{pq})}{n^{2}},\quad\text{and}$$

$$II_{pq}:=\frac{\sum_{i=1}^{n}(X_{pi}X_{qi}-\rho_{pq})^{2}}{n^{2}}+\sum_{\substack{\lambda\in\mathbb{N}_{\geq 0}^{3}:\,1\leq|\lambda|\leq 4\\ \lambda\neq(0,0,2)}}\frac{\partial^{\lambda}f(1,1,\rho_{pq})}{\lambda!}\bar{S}_{pp}^{\lambda_{1}}\bar{S}_{qq}^{\lambda_{2}}\bar{S}_{pq}^{\lambda_{3}}.$$

$$T-\frac{m(m-1)}{2n}-2^{-1}\|R-I_{m}\|_{F}^{2}=I+\left(II-\frac{m(m-1)}{2n}\right)+III,$$

$$\text{Var}[I]=\mathbb{E}[I^{2}]=\frac{m^{2}}{n^{2}}+o\left(\frac{m^{2(1-\gamma)}}{n^{2}}\right)\sum_{k=0}^{2}\|R-I_{m}\|_{F}^{2k}$$

$$\sup_{t\in\mathbb{R}}\left|P\left(\frac{I}{\sqrt{\text{Var}(I)}}\leq t\right)-\Phi(t)\right|\lesssim\left\{\frac{o(m^{4}/n^{4})\sum_{k=0}^{8}\|R-I_{m}\|_{F}^{k}}{\text{Var}(I)^{2}}\right\}^{1/5}.$$

$$\mathbb{E}\left[\left(II-\frac{m(m-1)}{2n}\right)^{2}\right]\lesssim\frac{\|R-I_{m}\|_{F}^{2}+\|R-I_{m}\|_{F}^{4}}{n}+o\left(\frac{m^{2(1-\gamma)}}{n^{2}}\right)\sum_{k=0}^{4}\|R-I_{m}\|_{F}^{k},$$

$$P\left(|III|>C\frac{m^{2}}{n^{5c}}\right)\lesssim\left(n^{c-1}\log m+n^{c-1/2}\log m\right)$$

$$\mathbb{E}[\psi]=P\left(I+II+III-\frac{m(m-1)}{2n}>\frac{m}{n}z_{\alpha}-2^{-1}\|R-I_{m}\|_{F}^{2}\right).$$

$$\Theta(b,B)=\left\{R:B\sqrt{m/n}>\|R-I_{m}\|_{F}\geq b\sqrt{m/n}\right\}$$

$$\Theta(B)=\left\{R:\|R-I_{m}\|_{F}\geq B\sqrt{m/n}\right\},$$

$$\liminf_{n\to\infty}\inf_{\Theta(B)}\mathbb{E}_{R}[\psi]\geq\bar{\Phi}\left(z_{\alpha}-\frac{b^{2}}{2}\right)$$

$$\sup_{\Theta(b,B)}\left|\mathbb{E}_{R}[\psi]-\bar{\Phi}\left(z_{\alpha}-\frac{\|R-I_{m}\|_{F}^{2}}{2m/n}\right)\right|\longrightarrow 0$$

$$\lim_{n\to\infty}\inf_{\Theta(b,B)}\mathbb{E}_{R}[\psi]=\lim_{n\to\infty}\inf_{\Theta(b,B)}\bar{\Phi}\left(z_{\alpha}-\frac{\|R-I_{m}\|_{F}^{2}}{2m/n}\right)=\bar{\Phi}\left(z_{\alpha}-\frac{b^{2}}{2}\right).$$


Taxonomy

Topics: Random Matrices and Applications · Statistical Methods and Inference · Statistical Methods and Bayesian Inference

Full text

Asymptotic power of Rao’s score test for independence in high dimensions

Dennis Leung

Department of Statistics, Chinese University of Hong Kong, Shatin, Hong Kong


and

Qi-Man Shao

Department of Statistics, Chinese University of Hong Kong, Shatin, Hong Kong


Abstract.

Let $\operatorname{{\bf R}}$ be the Pearson correlation matrix of $m$ normal random variables. Rao’s score test for the independence hypothesis $H_{0}:\operatorname{{\bf R}}=\operatorname{{\bf I}_{m}}$, where $\operatorname{{\bf I}_{m}}$ is the identity matrix of dimension $m$, was first considered by Schott (2005) in the high-dimensional setting. In this paper, we study the asymptotic exact power function of this test, under an asymptotic regime in which both $m$ and the sample size $n$ tend to infinity with the ratio $m/n$ upper bounded by a constant. In particular, our result implies that Rao’s score test is minimax rate-optimal for detecting the dependency signal $\|\operatorname{{\bf R}}-\operatorname{{\bf I}_{m}}\|_{F}$ of order $\sqrt{m/n}$, where $\|\cdot\|_{F}$ is the matrix Frobenius norm.

2000 Mathematics Subject Classification:

62H05

1. Introduction

Let $(X_{1},\dots,X_{m})^{\prime}$ be an $m$-variate normal vector with population Pearson correlation matrix denoted by ${\bf R}=(\rho_{pq})_{1\leq p,q\leq m}$. Suppose we observe $n$ independent samples $X_{p1},\dots,X_{pn}$ for each component $X_{p}$, $1\leq p\leq m$. For the setting in which the dimension $m$ can be larger than the sample size $n$, Schott (2005) was the first to consider Rao’s score statistic

$$T=\sum_{1\leq p<q\leq m}\hat{\rho}_{pq}^{2}, \tag{1.1}$$

for testing the independence null hypothesis

$$H_{0}:{\bf R}={\bf I}_{m}, \tag{1.2}$$

where $\hat{\rho}_{pq}$ , $1\leq p\not=q\leq m$ is the sample correlation of the pair $(X_{p},X_{q})$ computed from the data, and ${\bf I}_{m}$ is the $m$ -by- $m$ identity matrix. It was shown to be asymptotically normal under $H_{0}$ as both $m$ and $n$ go to infinity with the ratio $m/n$ converging to a positive constant. The purpose of this paper is to complement the theoretical study of $T$ by investigating its power under alternatives of the form

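$$H_{1}:{\bf R}\in\Theta(b),$$

where for any constant $b>0$ and matrix Frobenius norm $\|\cdot\|_{F}$, we define the set of Pearson correlation matrices

$$\Theta(b):=\left\{{\bf R}:\|{\bf R}-{\bf I}_{m}\|_{F}\geq b\sqrt{m/n},\ \operatorname{diag}({\bf R})={\bf I}_{m}\right\},$$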

which comprises a composite alternative hypothesis delineated by a signal size $\|{\bf R}-{\bf I}_{m}\|_{F}$ of order no less than $\sqrt{m/n}$ .
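As a worked illustration of this alternative class (an example of ours, not taken from the paper), consider an equicorrelated matrix whose off-diagonal entries all equal a common value $\rho$. The sketch below computes its Frobenius signal and the smallest $|\rho|$ for which the matrix falls in the set of alternatives with signal at least $b\sqrt{m/n}$.

```latex
% Illustrative example (assumption: equicorrelation, rho_{pq} = rho for all p != q,
% with rho > -1/(m-1) so that R is a valid correlation matrix)
\begin{align*}
\|{\bf R}-{\bf I}_m\|_F^2 &= \sum_{p\neq q}\rho_{pq}^2 = m(m-1)\rho^2,\\
\|{\bf R}-{\bf I}_m\|_F \geq b\sqrt{m/n}
 &\iff m(m-1)\rho^2 \geq b^2\,\frac{m}{n}
 \iff |\rho| \geq \frac{b}{\sqrt{n(m-1)}} .
\end{align*}
```

In this example each pairwise correlation can be as small as order $(nm)^{-1/2}$ while the aggregate signal still reaches the $\sqrt{m/n}$ scale, which is the kind of diffuse dependency the Frobenius norm is meant to capture.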

There are three major approaches to testing independence with growing dimension $m$ in the literature, to the best of our knowledge. The first is the statistic $T$ considered in this paper. Being a “sum” of squared pairwise sample correlations as in (1.1), it is good at detecting diffuse dependency among many pairs of variables. Such dependency is most naturally described by the signal $\|{\bf R}-{\bf I}_{m}\|_{F}$. In fact, the main result in this paper will show that $T$ is minimax rate optimal for detecting such a signal. The second approach considers the “max” statistic,

$$\max_{1\leq p<q\leq m}\hat{\rho}_{pq}^{2}.$$

Following many previous works (Jiang, 2004; Liu et al., 2008; Li et al., 2010; Zhou, 2007; Li et al., 2012), Cai and Jiang (2011) showed that it admits an asymptotic Gumbel distribution under $H_{0}$ in the ultra-high-dimensional regime where $m$ can be as large as $e^{n^{c}}$ for some constant $0<c<1$, as $m,n\longrightarrow\infty$. Naturally, it is good at detecting a structured alternative whose population correlation matrix $\operatorname{{\bf R}}$ has sparse non-zero off-diagonal entries of considerable magnitude. Both the “sum” and “max” approaches base their tests on intuitive statistics that measure the overall dependency among the $m$ variables, and both have non-parametric extensions; see Leung and Drton (2015) and Han and Liu (2014). The third is the likelihood ratio test (LRT), which is well known to give an implementable test only if the dimension $m$ is smaller than $n$. Despite this limitation, Jiang and Qi (2015) showed the LRT statistic to be asymptotically normal when $m,n\longrightarrow\infty$, as long as $m+4$ is less than $n$.
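To make the contrast between the “sum” and “max” statistics concrete, here is a minimal NumPy sketch that computes both from a data matrix. The function names, the `(n, m)` array layout, and the simulated data are illustrative assumptions; the sketch uses uncentered covariances, which presumes mean-zero variables.

```python
import numpy as np

def squared_sample_correlations(X):
    """Squared sample correlations from an (n, m) data matrix X.

    Assumes mean-zero columns, so covariances are uncentered averages.
    """
    n = X.shape[0]
    S = X.T @ X / n                      # S[p, q] = mean over i of X[i, p] * X[i, q]
    d = np.diag(S)
    return S**2 / np.outer(d, d)

def sum_and_max_statistics(X):
    """Return the 'sum' statistic T and the 'max' statistic over pairs p < q."""
    rho2 = squared_sample_correlations(X)
    upper = rho2[np.triu_indices(rho2.shape[0], k=1)]
    return upper.sum(), upper.max()

rng = np.random.default_rng(0)
X = rng.standard_normal((200, 50))       # hypothetical independent data, n=200, m=50
T, M = sum_and_max_statistics(X)
print(T, M)
```

The sum aggregates evidence across all pairs, whereas the max reacts only to the single strongest pair, which is why the two statistics suit diffuse and sparse alternatives respectively.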

We remark that the derivation of (1.1) as Rao’s score statistic involves taking derivatives of the normal log-likelihood with respect to the mean vector and the precision matrix. The interested reader is referred to Appendix A in Leung and Drton (2015) for those calculations.

2. Notations and main results

For any positive integer $k$, $[k]$ is defined as the set $\{1,\dots,k\}$. $\mathcal{S}_{k}$ is the symmetric group of order $k$. Depending on the context, its elements will sometimes be treated as permutation functions on $k$ elements, or simply permutations of the set $[k]$. $C$ always denotes a positive constant that is universal, i.e., its value may change from place to place but does not depend on $m$ and $n$. “$a\lesssim b$” means that $a\leq Cb$ for some constant $C>0$. $\mathbb{E}[\cdot]$, $\text{Var}[\cdot]$ and $P[\cdot]$ are the expectation, variance and probability operators respectively.

In this paper we shall always assume that, for all $1\leq p\leq m$ , $\text{Var}[X_{p}]=1$ and $\mathbb{E}[X_{p}]=0$ . Thus, for a duple $(p,q)\in[m]\times[m]$ , $\mathbb{E}[X_{p}X_{q}]=\rho_{pq}$ , and its corresponding squared sample correlation is defined as

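$$\hat{\rho}_{pq}^{2}:=\frac{S_{pq}^{2}}{S_{pp}S_{qq}}=f(S_{pp},S_{qq},S_{pq}), \tag{2.1}$$

where $f:\mathbb{R}_{>0}^{2}\times\mathbb{R}\longrightarrow\mathbb{R}$ is the function

$$f(u_{1},u_{2},u_{3}):=u_{1}^{-1}u_{2}^{-1}u_{3}^{2},$$

and

$$S_{pq}:=\frac{\sum_{i=1}^{n}X_{pi}X_{qi}}{n}. \tag{2.3}$$

We will also use

$$\bar{S}_{pq}:=S_{pq}-\rho_{pq}$$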

to denote the centered sample covariance. Imposing the assumption $\text{Var}[X_{p}]=1$ is always permitted, even if we use the more general form of Pearson correlations with all sample covariances $S_{pq}$ defined alternatively as

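$$\frac{\sum_{i=1}^{n}\left(X_{pi}-n^{-1}\sum_{j=1}^{n}X_{pj}\right)\left(X_{qi}-n^{-1}\sum_{j=1}^{n}X_{qj}\right)}{n-1}$$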

in (2.1), since the distribution of $\hat{\rho}_{pq}$ is invariant to the scaling of variables. Under normality, the restrictions $\mathbb{E}[X_{p}]=0$ and (2.3) can still be assumed without forgoing any generality of our results to follow; see the classical result in Anderson (2003, Theorem 3.3.2).

According to Chen and Shao (2012, Theorem 2.2) who refined the asymptotic result of Schott (2005) under $H_{0}$ , for a given $\alpha\in(0,1)$ , a test of asymptotic level $\alpha$ based on (1.1) is given as

$$\psi=I\left(T-\frac{m(m-1)}{2n}>\frac{m}{n}z_{\alpha}\right), \tag{2.5}$$

where $I(\cdot)$ is the indicator function, $z_{\alpha}:=\bar{\Phi}^{-1}(\alpha)$, and $\Phi$ and $\bar{\Phi}(x):=1-\Phi(x)$ are respectively the cumulative distribution function and tail probability of a standard normal variate. Below, $\mathbb{E}_{\operatorname{{\bf R}}}[\cdot]$ simply emphasizes that the expectation is taken with respect to a particular correlation matrix $\operatorname{{\bf R}}\in\Theta(b)$.
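The level-$\alpha$ test $\psi$ rejects when $T-\frac{m(m-1)}{2n}$ exceeds $\frac{m}{n}z_{\alpha}$. The following minimal sketch implements that decision rule; the function name `score_test`, the `(n, m)` array layout, and the simulated data are assumptions for illustration, and the uncentered covariances again presume mean-zero variables.

```python
from statistics import NormalDist
import numpy as np

def score_test(X, alpha=0.05):
    """Decision psi = I(T - m(m-1)/(2n) > (m/n) z_alpha) for an (n, m) array X.

    Assumes mean-zero columns. Returns (reject, T).
    """
    n, m = X.shape
    S = X.T @ X / n                                # uncentered sample covariances
    d = np.diag(S)
    rho2 = S**2 / np.outer(d, d)                   # squared sample correlations
    T = rho2[np.triu_indices(m, k=1)].sum()        # sum over pairs p < q
    z_alpha = NormalDist().inv_cdf(1.0 - alpha)    # upper-alpha normal quantile
    reject = (T - m * (m - 1) / (2 * n)) > (m / n) * z_alpha
    return bool(reject), float(T)

rng = np.random.default_rng(0)
X = rng.standard_normal((200, 50))                 # hypothetical data drawn under H_0
print(score_test(X))
```

If the data are not known to be mean-zero, one would center the columns first; by the Anderson result cited above, that corresponds to working with $n-1$ in place of $n$.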

Theorem 2.1 (Main result: asymptotic power).

Suppose $m,n\longrightarrow\infty$ such that $\frac{m}{n}\leq\kappa$ for some constant $\kappa<\infty$ . For any significance level $\alpha\in(0,1)$ , the asymptotic power of $\psi$ is given as

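$$\lim_{n\to\infty}\inf_{\Theta(b)}\mathbb{E}_{\bf R}[\psi]=\bar{\Phi}\left(z_{\alpha}-2^{-1}b^{2}\right).$$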

This theorem resembles Cai and Ma (2013, Theorem $4$), which studies the different problem of testing $H_{0}:{\boldsymbol{\Sigma}}=\operatorname{{\bf I}_{m}}$, where ${\boldsymbol{\Sigma}}$ is the covariance matrix of $(X_{1},\dots,X_{m})^{\prime}$. Despite this difference, Theorem $1$ and Remark $1$ in their paper indicate that a matching lower bound on the detectable signal size as measured by $\|\operatorname{{\bf R}}-\operatorname{{\bf I}_{m}}\|_{F}$ can be established for our problem (1.2), which we restate next for our readers’ convenience. We add that Theorem 2.1 is slightly weaker than the parallel result of Cai and Ma (2013) in that an upper bound on the ratio $m/n$ is imposed, which we believe to be merely a proof artifact not necessary for the theorem to hold. We defer discussion of this point until after the proof of Theorem 2.1.

Theorem 2.2 (Matching lower bound, Cai and Ma (2013)).

Let $0<\alpha<\beta<1$ . Suppose $m,n\longrightarrow\infty$ such that $\frac{m}{n}\leq\kappa$ for some constant $\kappa<\infty$ . Then there exists a constant $b=b(\kappa,\beta-\alpha)<1$ , such that

$$\limsup_{n\to\infty}\inf_{\Theta(b)}\mathbb{E}_{\bf R}[\phi]<\beta$$

for any test $\phi$ with significance level $\alpha$ for testing $H_{0}$ .

The lower bound result says that no $\alpha$ -level test for $H_{0}$ can achieve a preset target power if the signal size $\|\operatorname{{\bf R}}-\operatorname{{\bf I}_{m}}\|_{F}$ falls below a certain threshold modulo the separation rate $\sqrt{m/n}$ . Our main result in Theorem 2.1 hence suggests that our test $\psi$ is “rate” optimal when the ratio $m/n$ is bounded, since the asymptotic power $\lim_{n\longrightarrow\infty}\inf_{\Theta(b)}\mathbb{E}_{\operatorname{{\bf R}}}[\psi]$ tends to one as $b\longrightarrow\infty$ .
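To see how the limiting power $\bar{\Phi}(z_{\alpha}-b^{2}/2)$ from Theorem 2.1 behaves as the signal constant $b$ grows, the short sketch below evaluates it with the standard library. The function name `asymptotic_power` and the grid of $b$ values are illustrative assumptions.

```python
from statistics import NormalDist

def asymptotic_power(b, alpha=0.05):
    """Limiting minimax power  bar-Phi(z_alpha - b^2 / 2)  from Theorem 2.1."""
    std_normal = NormalDist()
    z_alpha = std_normal.inv_cdf(1.0 - alpha)      # upper-alpha normal quantile
    return 1.0 - std_normal.cdf(z_alpha - b**2 / 2.0)

for b in (0.0, 1.0, 2.0, 3.0):
    print(b, round(asymptotic_power(b), 4))
```

At $b=0$ the formula returns the level $\alpha$, and it increases monotonically towards one in $b$, which is the behaviour the rate-optimality statement relies on.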

Although the result in Theorem 2.1 is neat, its proof, which occupies the rest of this paper, is quite involved. As will become clear later, this is because our statistic $T$ is constructed from Pearson correlations, whose higher-order moments require substantial computation to understand; see Hotelling (1953, Section 7) for classical work on this. At some points in this paper we will use Mathematica to help us with certain symbolic calculations. We shall begin with a Taylor expansion of the expression for $\hat{\rho}^{2}_{pq}$ in terms of the function $f$ in (2.1). We need multi-index notation: For a vector ${\boldsymbol{\lambda}}=(\lambda_{1},\dots,\lambda_{k})$ of $k$ non-negative integers, ${\boldsymbol{\lambda}}!=\lambda_{1}!\dots\lambda_{k}!$ and $|{\boldsymbol{\lambda}}|=\lambda_{1}+\dots+\lambda_{k}$, and if $g=g(u_{1},\dots,u_{k})$ is a function in $k$ arguments, $\partial^{\boldsymbol{\lambda}}g(\tilde{u}_{1},\dots,\tilde{u}_{k})=\frac{\partial^{|{\boldsymbol{\lambda}}|}g}{\partial u_{1}^{\lambda_{1}}\dots\partial u_{k}^{\lambda_{k}}}\Big|_{u_{i}=\tilde{u}_{i}}$ is its partial derivative with respect to $\boldsymbol{\lambda}$ evaluated at the point $(\tilde{u}_{1},\dots,\tilde{u}_{k})$. Since $\rho_{pq}^{2}=f(1,1,\rho_{pq})=f(\rho_{pp},\rho_{qq},\rho_{pq})$, by Taylor’s theorem, for each pair $1\leq p\not=q\leq m$,

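$$\hat{\rho}_{pq}^{2}-\rho_{pq}^{2}=\sum_{\boldsymbol{\lambda}\in\mathbb{N}_{\geq 0}^{3}:\,1\leq|\boldsymbol{\lambda}|\leq 4}\frac{\partial^{\boldsymbol{\lambda}}f(1,1,\rho_{pq})}{\boldsymbol{\lambda}!}\bar{S}_{pp}^{\lambda_{1}}\bar{S}_{qq}^{\lambda_{2}}\bar{S}_{pq}^{\lambda_{3}}+III_{pq}\quad\text{a.s.}, \tag{2.6}$$

where

$$III_{pq}:=\sum_{\boldsymbol{\lambda}\in\mathbb{N}_{\geq 0}^{3}:\,|\boldsymbol{\lambda}|=5}\frac{(\rho_{pq}+k_{pq}\bar{S}_{pq})^{2-\lambda_{1}}\bar{S}_{pp}^{\lambda_{1}}\bar{S}_{qq}^{\lambda_{2}}\bar{S}_{pq}^{\lambda_{3}}}{(1+k_{pq}\bar{S}_{pp})^{1+\lambda_{2}}(1+k_{pq}\bar{S}_{qq})^{1+\lambda_{3}}}, \tag{2.7}$$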

for some $k_{pq}=k_{pq}(S_{pp},S_{qq},S_{pq})\in(0,1)$, is the remainder in Lagrange’s form. The “almost surely” qualifier appears in (2.6) because on an event of measure zero, either $S_{pp}$ or $S_{qq}$ may be zero, in which case Taylor’s theorem does not apply since $f$ is defined on $\mathbb{R}_{>0}^{2}\times\mathbb{R}$. Our proof depends crucially on recognizing that, when ${\boldsymbol{\lambda}}=(\lambda_{1},\lambda_{2},\lambda_{3})=(0,0,2)$,

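$$\frac{\partial^{\boldsymbol{\lambda}}f(1,1,\rho_{pq})}{\boldsymbol{\lambda}!}\bar{S}_{pp}^{\lambda_{1}}\bar{S}_{qq}^{\lambda_{2}}\bar{S}_{pq}^{\lambda_{3}}=\bar{S}_{pq}^{2}=\frac{\sum_{i=1}^{n}(X_{pi}X_{qi}-\rho_{pq})^{2}}{n^{2}}+\frac{2\sum_{1\leq i<j\leq n}(X_{pi}X_{qi}-\rho_{pq})(X_{pj}X_{qj}-\rho_{pq})}{n^{2}},$$

in light of Lemma B.1 which specifies the partial derivatives of $f$. One can then equivalently write (2.6) as

$$\hat{\rho}_{pq}^{2}-\rho_{pq}^{2}=I_{pq}+II_{pq}+III_{pq}, \tag{2.8}$$

where

$$I_{pq}:=\frac{2\sum_{1\leq i<j\leq n}(X_{pi}X_{qi}-\rho_{pq})(X_{pj}X_{qj}-\rho_{pq})}{n^{2}},\quad\text{and} \tag{2.9}$$

$$II_{pq}:=\frac{\sum_{i=1}^{n}(X_{pi}X_{qi}-\rho_{pq})^{2}}{n^{2}}+\sum_{\substack{\boldsymbol{\lambda}\in\mathbb{N}_{\geq 0}^{3}:\,1\leq|\boldsymbol{\lambda}|\leq 4\\ \boldsymbol{\lambda}\neq(0,0,2)}}\frac{\partial^{\boldsymbol{\lambda}}f(1,1,\rho_{pq})}{\boldsymbol{\lambda}!}\bar{S}_{pp}^{\lambda_{1}}\bar{S}_{qq}^{\lambda_{2}}\bar{S}_{pq}^{\lambda_{3}}. \tag{2.10}$$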
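For orientation, the Taylor coefficients of $f(u_{1},u_{2},u_{3})=u_{1}^{-1}u_{2}^{-1}u_{3}^{2}$ can be written in closed form by differentiating each factor separately. The following is our own short derivation, intended as a sketch of the kind of statement Lemma B.1 provides rather than a quotation of it.

```latex
% Sketch (our derivation): Taylor coefficients of f(u_1,u_2,u_3) = u_1^{-1} u_2^{-1} u_3^{2},
% with the convention \binom{2}{\lambda_3} = 0 for \lambda_3 \geq 3.
\begin{align*}
\frac{\partial^{\boldsymbol{\lambda}} f(u_1,u_2,u_3)}{\boldsymbol{\lambda}!}
 &= (-1)^{\lambda_1+\lambda_2}\, u_1^{-1-\lambda_1}\, u_2^{-1-\lambda_2}
    \binom{2}{\lambda_3} u_3^{2-\lambda_3},\\
\frac{\partial^{\boldsymbol{\lambda}} f(1,1,\rho_{pq})}{\boldsymbol{\lambda}!}
 &= (-1)^{\lambda_1+\lambda_2}\binom{2}{\lambda_3}\rho_{pq}^{2-\lambda_3}.
\end{align*}
```

In particular, ${\boldsymbol{\lambda}}=(0,0,2)$ gives coefficient $1$, so that term is exactly $\bar{S}_{pq}^{2}$, while every coefficient with $\lambda_{3}\in\{0,1\}$ carries a factor $\rho_{pq}^{2}$ or $\rho_{pq}$.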

Defining $I:=\sum_{1\leq p<q\leq m}I_{pq}$ , $II:=\sum_{1\leq p<q\leq m}II_{pq}$ and $III:=\sum_{1\leq p<q\leq m}III_{pq}$ by summing over all $1\leq p<q\leq m$ , from (2.8) one can write

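$$T-\frac{m(m-1)}{2n}-2^{-1}\|{\bf R}-{\bf I}_{m}\|_{F}^{2}=I+\left(II-\frac{m(m-1)}{2n}\right)+III, \tag{2.11}$$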

realizing that $2^{-1}\|{\bf R}-{\bf I}_{m}\|_{F}^{2}=\sum_{1\leq p<q\leq m}\rho_{pq}^{2}$. We are now in a position to introduce three supporting lemmas that are the building blocks of Theorem 2.1. The first lemma gives a Berry-Esseen bound comparing the cumulative distribution function of the standardized term $I$ with $\Phi(\cdot)$. This will ultimately drive the form of our power function in Theorem 2.1. The next two lemmas control the variability of the extra terms, $(II-\frac{m(m-1)}{2n})$ and $III$. For the rest of this paper, all big-$O$ and little-$o$ notations are with respect to our asymptotic regime $m,n\longrightarrow\infty$, $m/n\leq\kappa$.
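The split of $\bar{S}_{pq}^{2}$ into a diagonal sum (absorbed into $II_{pq}$) and the off-diagonal sum $I_{pq}$ is a purely algebraic identity, and it can be checked numerically. The sketch below does so for one simulated pair; the sample size, correlation value, and seed are arbitrary assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
n, rho = 30, 0.3                               # hypothetical sample size and correlation
cov = np.array([[1.0, rho], [rho, 1.0]])
Xp, Xq = rng.multivariate_normal([0.0, 0.0], cov, size=n).T

a = Xp * Xq - rho                              # centered products X_pi X_qi - rho_pq
S_bar_sq = a.mean() ** 2                       # squared centered sample covariance

diagonal_part = np.sum(a**2) / n**2            # the i = j terms, placed in II_pq
i_upper, j_upper = np.triu_indices(n, k=1)     # all index pairs i < j
I_pq = 2.0 * np.sum(a[i_upper] * a[j_upper]) / n**2

assert np.isclose(S_bar_sq, diagonal_part + I_pq)
```

Isolating the off-diagonal part matters because it has mean zero and a degenerate U-statistic structure, whereas the diagonal part carries the mean.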

Lemma 2.3 (Berry Esseen theorem for $I$ ).

The following are true for $I$ :

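(i) Variance:

$$\text{Var}[I]=\mathbb{E}[I^{2}]=\frac{m^{2}}{n^{2}}+o\left(\frac{m^{2(1-\gamma)}}{n^{2}}\right)\sum_{k=0}^{2}\|{\bf R}-{\bf I}_{m}\|_{F}^{2k}$$

for any $0<\gamma<1/2$.

(ii) Berry-Esseen bound:

$$\sup_{t\in\mathbb{R}}\left|P\left(\frac{I}{\sqrt{\text{Var}(I)}}\leq t\right)-\Phi(t)\right|\lesssim\left\{\frac{o(m^{4}/n^{4})\sum_{k=0}^{8}\|{\bf R}-{\bf I}_{m}\|_{F}^{k}}{\text{Var}(I)^{2}}\right\}^{1/5}.$$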

Lemma 2.4 (Bound on the 2nd moment of $II-\frac{m(m-1)}{2n}$ ).

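$$\mathbb{E}\left[\left(II-\frac{m(m-1)}{2n}\right)^{2}\right]\lesssim\frac{\|{\bf R}-{\bf I}_{m}\|_{F}^{2}+\|{\bf R}-{\bf I}_{m}\|_{F}^{4}}{n}+o\left(\frac{m^{2(1-\gamma)}}{n^{2}}\right)\sum_{k=0}^{4}\|{\bf R}-{\bf I}_{m}\|_{F}^{k},$$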

for any fixed $0<\gamma<1/2$ .

Lemma 2.5 (Probability bound for $III$ ).

For any $0<c<\frac{1}{2}$ , there exists $C>0$ such that

$$P\left(|III|>C\frac{m^{2}}{n^{5c}}\right)\lesssim\left(n^{c-1}\log m+n^{c-1/2}\log m\right)$$

for large enough $m,n$ .

The proofs of Lemmas 2.3 and 2.4 are separately given in the next two sections. Lemma 2.5 is proved by a standard maximal inequality in Appendix A. With these tools we can now establish Theorem 2.1 based on the general approach laid out in Cai and Ma (2013).

Proof of Theorem 2.1.

From (2.5) and (2.11) the power of our test can be written as

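$$\mathbb{E}[\psi]=P\left(I+II+III-\frac{m(m-1)}{2n}>\frac{m}{n}z_{\alpha}-2^{-1}\|{\bf R}-{\bf I}_{m}\|_{F}^{2}\right). \tag{2.13}$$

By dividing the set $\Theta(b)$ into two subsets

$$\Theta(b,B)=\left\{{\bf R}:B\sqrt{m/n}>\|{\bf R}-{\bf I}_{m}\|_{F}\geq b\sqrt{m/n}\right\}$$

and

$$\Theta(B)=\left\{{\bf R}:\|{\bf R}-{\bf I}_{m}\|_{F}\geq B\sqrt{m/n}\right\},$$

where $B$ is a sufficiently large constant depending on $(\alpha,b,\kappa)$, it suffices to show

$$\liminf_{n\to\infty}\inf_{\Theta(B)}\mathbb{E}_{\bf R}[\psi]\geq\bar{\Phi}\left(z_{\alpha}-\frac{b^{2}}{2}\right) \tag{2.14}$$

and

$$\sup_{\Theta(b,B)}\left|\mathbb{E}_{\bf R}[\psi]-\bar{\Phi}\left(z_{\alpha}-\frac{\|{\bf R}-{\bf I}_{m}\|_{F}^{2}}{2m/n}\right)\right|\longrightarrow 0 \tag{2.15}$$

as $m,n\longrightarrow\infty$, $m/n\leq\kappa$. Together, they lead to the theorem since (2.15) implies that

$$\lim_{n\to\infty}\inf_{\Theta(b,B)}\mathbb{E}_{\bf R}[\psi]=\lim_{n\to\infty}\inf_{\Theta(b,B)}\bar{\Phi}\left(z_{\alpha}-\frac{\|{\bf R}-{\bf I}_{m}\|_{F}^{2}}{2m/n}\right)=\bar{\Phi}\left(z_{\alpha}-\frac{b^{2}}{2}\right).$$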

To prove (2.14) we first suppose that $B$ is larger than $\sqrt{3z_{\alpha}}$, and let $\delta$ be any positive constant satisfying $0<\delta\leq 4^{-1}z_{\alpha}$. By definition, for any $\operatorname{{\bf R}}\in\Theta(B)$, it must be the case that $\|\operatorname{{\bf R}}-\operatorname{{\bf I}_{m}}\|_{F}=\tau\sqrt{m/n}$ for some $\tau\geq B$. Together with the facts that $mn^{-1}z_{\alpha}-2^{-1}\|\operatorname{{\bf R}}-\operatorname{{\bf I}_{m}}\|_{F}^{2}\leq-\frac{m\tau^{2}}{6n}$ and $\delta\leq 12^{-1}\tau^{2}$, which are consequences of the choice of $B$, by a union bound and Chebyshev’s inequality we continue from (2.13) and obtain

[TABLE]

Substituting $\|\operatorname{{\bf R}}-\operatorname{{\bf I}_{m}}\|_{F}$ for $\tau\sqrt{m/n}$ into the bounds for $\mathbb{E}[I^{2}]$ and $\mathbb{E}[(II-\frac{m(m-1)}{2n})^{2}]$ in Lemmas 2.3 and 2.4, it is seen that the first term in (2.16) is bounded by a term of order

[TABLE]

Moreover, the second term in (2.16) converges to $0$ as $m,n\longrightarrow\infty$ by Lemma 2.5, since $\delta m/n$ is larger than $m^{2}/n^{5c}$ asymptotically for any constant $2/5<c<1/2$, given that $m/n\leq\kappa$. Together they imply that the constant $B=B(\alpha,b,\kappa)$ can be taken large enough so that

[TABLE]

which is equivalent to (2.14).
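The elementary inequalities used in this part of the proof can be spelled out as follows. This is our own arithmetic, written for $z_{\alpha}>0$, $\tau\geq B>\sqrt{3z_{\alpha}}$, $0<\delta\leq z_{\alpha}/4$ and $m/n\leq\kappa$.

```latex
% Sketch (our arithmetic) of the consequences of the choice of B, delta and c
\begin{align*}
\frac{m}{n}z_\alpha-\frac12\|{\bf R}-{\bf I}_m\|_F^2
 &= \frac{m}{n}\Big(z_\alpha-\frac{\tau^2}{2}\Big)
 \leq \frac{m}{n}\Big(\frac{\tau^2}{3}-\frac{\tau^2}{2}\Big) = -\frac{m\tau^2}{6n},\\
\delta &\leq \frac{z_\alpha}{4} < \frac{\tau^2}{12},\\
\frac{m^2/n^{5c}}{\delta\, m/n} &= \frac{m\,n^{1-5c}}{\delta}
 \leq \frac{\kappa}{\delta}\,n^{2-5c}\longrightarrow 0
 \quad\text{when } c>2/5 .
\end{align*}
```

The last line is where the bound $m/n\leq\kappa$ enters: without it, $m^{2}/n^{5c}$ need not be dominated by $\delta m/n$ for any $c<1/2$.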

To show (2.15), the uniform convergence of power on the “stripe” of alternatives with the signal $\|\operatorname{{\bf R}}-\operatorname{{\bf I}_{m}}\|_{F}$ bounded from above and below in size, we shall first establish that

[TABLE]

uniformly over the set $\Theta(b,B)$ , where

[TABLE]

and $\gamma$ is any number such that $0<\gamma<1/2$ . By a union bound we have

[TABLE]

for any $(2+\gamma)/5<c<1/2$ and large enough $m,n$ . The last inequality comes from the Chebyshev inequality and the fact that, by taking $(2+\gamma)/5<c<1/2$ in Lemma 2.5, for large enough $m,n$ , under $m/n\leq\kappa$ , we have

[TABLE]

where the constant $C$ is the same as the one in Lemma 2.5. Since $\operatorname{{\bf R}}\in\Theta(b,B)$, it must be that $\|\operatorname{{\bf R}}-\operatorname{{\bf I}_{m}}\|_{F}=\tau\sqrt{m/n}$ for some $b\leq\tau\leq B$, and substituting this into the variance bound in Lemma 2.4 it can be easily seen that

[TABLE]

uniformly over $\Theta(b,B)$ as $m,n\longrightarrow\infty$ , $m/n\leq\kappa$ . This gives (2.17) since $c<1/2$ in (2.18).

To finish the proof of (2.15), by union bound arguments one has

[TABLE]

and

[TABLE]

which collectively imply

[TABLE]

since $|\bar{\Phi}(x\pm\epsilon)-\bar{\Phi}(x)|\leq\epsilon$ for any $x\in\mathbb{R}$ and $\epsilon\geq 0$ . Moreover, all three terms on the right hand side of (2.20) are of order $o(1)$ uniformly over $\Theta(b,B)$ . The first two terms are so by Lemma 2.3 $(ii)$ and (2.17), and the last term is so since by Lemma 2.3 $(i)$ , $\sqrt{\text{Var}(I)}=m/n+o(m^{1-\gamma}/n)$ where the $o(m^{1-\gamma}/n)$ term is also uniform over $\Theta(b,B)$ . Finally, by Lemma 2.3 $(i)$ as $m,n\longrightarrow\infty$ , $m/n\leq\kappa$ , we also have

[TABLE]

and it is not hard to see that this implies

[TABLE]

Applying these facts to (2.20) leads to (2.15). ∎

In establishing the normal tail form of our power function, perhaps the most important step is singling out $I$ as the main term that drives the asymptotic normality of the left hand side in (2.11) under the “stripe” of alternatives $\Theta(b,B)$ via the Berry-Esseen bound in Lemma 2.3 $(ii)$. We note that $I$ is already a rather simple term to handle, but proving Lemma 2.3 $(ii)$ for it still takes considerable effort in the next section. Moreover, $m/n\leq\kappa$ has been used at different places, the convergences in (2.19) and (2.21) for instance. However, the assumption is mostly a convenient one for such statements regarding terms $I$ and $II$, since the estimates presented in Lemmas 2.3 and 2.4 are not the sharpest possible, either for aesthetic purposes or to save us some effort on refining them in the next two sections.

It is the remainder term $III$ that truly prevents us from removing the upper bound on $m/n$. In order to show it tends to zero in probability, as in (2.18), we applied the crude tail bound in Lemma 2.5 based on a maximal inequality (see Appendix A). Such an estimate does not take the correlations among the constituent summands $III_{pq}$ into account, as was done for the $II_{pq}$’s with respect to $II-(m-1)m(2n)^{-1}$ via explicitly estimating its second moment in Lemma 2.4. The major obstacle to computing $\mathbb{E}[III^{2}]$ is the random coefficients

[TABLE]

attached to the products $\bar{S}_{pp}^{\lambda_{1}}\bar{S}_{qq}^{\lambda_{2}}\bar{S}_{pq}^{\lambda_{3}}$ in definition (2.7). Unlike $II$, where the constituents $II_{pq}$ have constant coefficients, not only is the coefficient in (2.22) a rational function in $\bar{S}_{pp}$, $\bar{S}_{pq}$, $\bar{S}_{qq}$, but it also involves the intractable random quantity $k_{pq}=k_{pq}(\bar{S}_{pp},\bar{S}_{pq},\bar{S}_{qq})\in(0,1)$. As such, there is no straightforward way of applying Isserlis’s theorem (Theorem B.2) to compute the moment $\mathbb{E}[III^{2}]$ as we did for $\mathbb{E}[(II-(m-1)m(2n)^{-1})^{2}]$ in Section 4. In fact, even with the help of Mathematica, it still took us substantial effort to get our bound in Lemma 2.4, as seen later. At this moment, we cannot think of other ways to control the term $III$.

3. The Berry Esseen bound for $I$

We will prove Lemma 2.3 in this section. For our presentation, given a finite set $D$ and $|D|$ duples $(p_{d},q_{d})\in[m]\times[m]$ indexed by a subscript $d$ that ranges over $D$ , we define the central moment quantities

[TABLE]

Recall that $I$ is defined as $\sum_{p<q}I_{pq}$ , where each $I_{pq}$ is given in (2.9). We first observe that $I$ has a natural martingale structure: For each $i=1,\dots,n$ , let $\mathcal{F}_{i}$ be the sigma-algebra generated by $\{X_{pj}:1\leq p\leq m;1\leq j\leq i\}$ and $\mathcal{F}_{0}$ be the trivial sigma algebra, and define

[TABLE]

as well as

[TABLE]

Then $I=\sum_{i=0}^{n}Y_{i}$, and $(Y_{i})_{i=0}^{n}$ is a sequence of martingale differences since

[TABLE]

for $i\geq 2$ , where $\mathbb{E}[Y_{i}|\mathcal{F}_{i-1}]=0$ is trivial for $i=0,1$ .
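The martingale decomposition can be illustrated numerically. The explicit form of $Y_{i}$ used below, namely $Y_{i}=\frac{2}{n^{2}}\sum_{p<q}\sum_{j<i}(X_{pi}X_{qi}-\rho_{pq})(X_{pj}X_{qj}-\rho_{pq})$ with $Y_{0}=Y_{1}=0$, is our reconstruction from the definition of $I_{pq}$ and should be read as an assumption; the function name and simulated data are likewise illustrative.

```python
import numpy as np

def martingale_differences(X, R):
    """Y_0, ..., Y_n with Y_i = (2/n^2) * sum_{p<q} sum_{j<i} a_{pq,i} a_{pq,j},

    where a_{pq,i} = X[i-1, p] * X[i-1, q] - R[p, q]. This form of Y_i is an
    assumption reconstructed from the definition of I_pq; Y_0 = Y_1 = 0.
    """
    n, m = X.shape
    p_idx, q_idx = np.triu_indices(m, k=1)
    A = X[:, p_idx] * X[:, q_idx] - R[p_idx, q_idx]   # shape (n, number of pairs)
    partial = np.cumsum(A, axis=0) - A                # row i: sum over earlier samples
    Y = np.zeros(n + 1)
    Y[1:] = 2.0 * np.sum(A * partial, axis=1) / n**2
    return Y, A

rng = np.random.default_rng(2)
n, m = 40, 6
X = rng.standard_normal((n, m))                       # independent case, R = identity
Y, A = martingale_differences(X, np.eye(m))

# Direct evaluation of I = sum_{p<q} I_pq for comparison.
col_sums = A.sum(axis=0)
I_direct = np.sum(col_sums**2 - np.sum(A**2, axis=0)) / n**2
assert np.isclose(Y.sum(), I_direct)
```

Each $Y_{i}$ multiplies the centered products of sample $i$ by a sum over earlier samples only, which is why its conditional mean given the first $i-1$ samples vanishes.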

With the observations just made it is easy to see that $\mathbb{E}[I]=0$ and

[TABLE]

By the i.i.d.’ness of the samples, for each $i=2,\dots,n$ ,

[TABLE]

where, to clarify, $\sum_{\begin{subarray}{c}1\leq p_{d}<q_{d}\leq m\\ d=1,2\end{subarray}}$ means a summation over all pairs of duples $\{(p_{1},q_{1}),(p_{2},q_{2})\}$ such that $1\leq p_{d}<q_{d}\leq m$ for each $d=1,2$ . We have the equality in (3.4) because $\mathbb{E}[(X_{p_{1}j^{\prime}}X_{q_{1}j^{\prime}}-\rho_{p_{1}q_{1}})(X_{p_{2}j}X_{q_{2}j}-\rho_{p_{2}q_{2}})]$ equals $\mathcal{M}_{\begin{subarray}{c}(p_{d},q_{d})\\ d\in\{1,2\}\end{subarray}}$ when $j=j^{\prime}$ and zero otherwise. For $k=2,3,4$ , let

[TABLE]

correspond to a sum over all duples $1\leq p_{d}<q_{d}\leq m$ , $d=1,2$ such that as a set $\cup_{d=1}^{2}\{p_{d},q_{d}\}$ has cardinality $k$ . From (3.3) and (3.4) we can write

[TABLE]

since $\sum_{i=2}^{n}(i-1)=2^{-1}(n^{2}-n)$ . In Appendix C, we will show the following estimates hold:

[TABLE]

Substituting these into (3.6) results in Lemma 2.3 $(i)$ . In fact, this general strategy of decomposing a sum according to the cardinality of an index set as in (3.5) and forming separate estimates will be employed repeatedly in the sequel.
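As a sanity check on the leading term of the variance, the null case ${\bf R}={\bf I}_{m}$ can be computed exactly. The following is our own computation, using that under independence $\mathbb{E}[X_{p}^{2}X_{q}^{2}]=1$ for $p\neq q$ and that cross moments between distinct index pairs, or between distinct pairs of variables, vanish.

```latex
% Sanity check (our computation) under H_0: R = I_m
\begin{align*}
\mathbb{E}[I_{pq}^2]
 &= \frac{4}{n^4}\sum_{1\leq i<j\leq n}
    \mathbb{E}[X_{pi}^2X_{qi}^2]\,\mathbb{E}[X_{pj}^2X_{qj}^2]
  = \frac{4}{n^4}\binom{n}{2} = \frac{2(n-1)}{n^3},\\
\text{Var}[I] &= \sum_{1\leq p<q\leq m}\mathbb{E}[I_{pq}^2]
  = \frac{m(m-1)(n-1)}{n^3} = \frac{m^2}{n^2}\,(1+o(1)).
\end{align*}
```

This agrees with the leading term $m^{2}/n^{2}$ in Lemma 2.3 $(i)$ and with the factor $\sum_{i=2}^{n}(i-1)=2^{-1}(n^{2}-n)$ used above.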

We shall now prove the normal approximation in Lemma 2.3 $(ii)$. By the Berry-Esseen theorem for the martingale central limit theorem in Heyde and Brown (1970), it suffices to verify the fourth moment conditions

[TABLE]

and

[TABLE]

Note that the equality before (3.11) holds because $\mathbb{E}[\sum_{i=2}^{n}\mathbb{E}[Y_{i}^{2}|\mathcal{F}_{i-1}]]=\mathbb{E}[\sum_{i=2}^{n}Y_{i}^{2}]=\text{Var}(I)$ .

We will first show (3.10). For any $2\leq i\leq n$ , on raising $Y_{i}$ to the $4$ th power and taking expectation, by the i.i.d.’ness of samples, we have

[TABLE]

where the summations $\sum_{\begin{subarray}{c}1\leq p_{d}<q_{d}\leq m\\ d=1,2,3,4\end{subarray}}$ and $\sum_{\begin{subarray}{c}1\leq j_{d}<i\\ d=1,2,3,4\end{subarray}}$ are defined similarly as the one in (3.4). The last equality in (3.12) is explained as follows: For a fixed $i$ and a given set of variables index pairs $\{(p_{d},q_{d}):d=1,\dots,4\}$ , with any choice of the sample indices $j_{1},\dots,j_{4}$ in order for the expectation

[TABLE]

to be non-zero, by independence it must be true that there exists a permutation function $\pi\in\mathcal{S}_{4}$ so that

[TABLE]

Since the condition in (3.14) implies that $|\cup_{d=1}^{4}\{j_{d}\}|\leq 2$, at most $O({i-1\choose 2})=O(i^{2})$ many expectations in (3.13) can be non-zero. This leads to (3.12) since the expectations in (3.13), when they are non-zero, can be uniformly bounded regardless of the choice for $\{(p_{d},q_{d},j_{d});d=1,\dots,4\}$, owing to our assumptions at the beginning of Section 2 and Theorem B.2 on higher order normal moments. Since $\sum_{i=2}^{n}i^{2}=6^{-1}(2n^{3}+3n^{2}+n-6)$, with (3.12) we further write

[TABLE]

Now the last term in (3.15) can be decomposed, according to the cardinality of the set of duples $\cup_{d=1}^{4}\{p_{d},q_{d}\}$ , as

[TABLE]

where for $k=2,\dots,8$ ,

[TABLE]

and the $O(m^{4})$ term comes from the fact that there are only $O(m^{4})$ many uniformly bounded extra summands under the restriction $|\cup_{d=1}^{4}\{p_{d},q_{d}\}|\leq 4$. In Appendix C we will show that

[TABLE]

for each $k=5,\dots,8$ . Collecting (3.15), (3.16) and (3.17) we get (3.10).

To show (3.11) it suffices to understand the term $\mathbb{E}[(\sum_{i=1}^{n}\mathbb{E}[Y_{i}^{2}|\mathcal{F}_{i-1}])^{2}]$ since the form of $\text{Var}(I)$ has been proven in Lemma 2.3 $(i)$ . On expansion,

[TABLE]

Proceeding with our calculations,

[TABLE]

where

[TABLE]

By independence, we note that the expression

[TABLE]

on the right hand side of (3.19) can be non-zero only if the four sample indices $i_{1},\dots,i_{4}$ are such that either

[TABLE]

or

[TABLE]

For any fixed given pair $2\leq i,j\leq n$ , by simple counting, there are, respectively, $i\wedge j-1$ , $(i\wedge j-1)(i\vee j-2)$ , $(i\wedge j-1)(i\wedge j-2)$ , $(i\wedge j-1)(i\wedge j-2)$ combinations of $(i_{1},i_{2},i_{3},i_{4})$ that satisfy (3.21), (3.22), (3.23), (3.24) for which $1\leq i_{1},i_{2}<i$ and $1\leq i_{3},i_{4}<j$ , where $a\vee b=\max(a,b)$ and $a\wedge b=\min(a,b)$ . Hence,

[TABLE]

where

[TABLE]

are the value of $\mathbb{E}[\prod_{d=1}^{4}(X_{p_{d}i_{d}}X_{q_{d}i_{d}}-\rho_{p_{d}q_{d}})]$ when $i_{1},\dots,i_{4}$ satisfy the criteria (3.23) and (3.24) respectively. Substituting (3.25) into (3.19) gives

[TABLE]

where the terms $\mathcal{M}_{\begin{subarray}{c}(p_{d},q_{d})\\ d\in[4]\end{subarray}}$ in (3.25) are absorbed into the first $O(n^{-5})$ term because they are uniformly bounded regardless of the choice of $p_{1},q_{1},\dots,p_{4},q_{4}$, again by our assumptions and Theorem B.2. From this it remains to show the estimates

[TABLE]

and

[TABLE]

which, together with Lemma 2.3 $(i)$ and (3.26), imply (3.11). The proofs of these estimates will, again, be deferred to Appendix C.

4. The second moment bound for $II-\frac{m(m-1)}{2n}$

We will now prove Lemma 2.4. Recall that $II:=\sum_{p<q}II_{pq}$ , and from the definition of $II_{pq}$ in (2.10) we can equivalently write it as

[TABLE]

where

[TABLE]

and

[TABLE]

We form this grouping of terms for reasons that will be explained later. As such, by defining $II_{1}:=\sum_{p<q}II_{pq,1}$ and $II_{2}:=\sum_{p<q}II_{pq,2}$ , one can write

[TABLE]

To finish the proof of Lemma 2.4, it suffices to bound the second moments of $II_{1}-\frac{m(m-1)}{2n}$ and $II_{2}$ respectively in terms of $\|\operatorname{{\bf R}}-\operatorname{{\bf I}_{m}}\|_{F}$ .

Lemma 4.1 (Bound on the second moment of $II_{1}-\frac{m(m-1)}{2n}$ ).

[TABLE]

for any $0<\gamma<1/2$ .

Lemma 4.2 (Bound on the second moment of $II_{2}$ ).

[TABLE]

for any $0<\gamma<1/2$ .

Using Lemmas 4.1 and 4.2, Lemma 2.4 immediately follows from

(i) $(II-\frac{m(m-1)}{2n})^{2}=(II_{1}-\frac{m(m-1)}{2n})^{2}+II_{2}^{2}+2(II_{1}-\frac{m(m-1)}{2n})II_{2}$ and (ii) $2|(II_{1}-\frac{m(m-1)}{2n})II_{2}|\leq(II_{1}-\frac{m(m-1)}{2n})^{2}+II_{2}^{2}$.
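Spelled out, and assuming the decomposition $II=II_{1}+II_{2}$ stated above together with the centering $\frac{m(m-1)}{2n}$ from the statement of Lemma 2.4, steps (i) and (ii) combine as follows. This is only a restatement of the elementary inequality $(a+b)^{2}\leq 2a^{2}+2b^{2}$, not an additional result.

```latex
\begin{align*}
\mathbb{E}\Big[\Big(II-\tfrac{m(m-1)}{2n}\Big)^{2}\Big]
&=\mathbb{E}\Big[\Big(II_{1}-\tfrac{m(m-1)}{2n}\Big)^{2}\Big]
 +\mathbb{E}\big[II_{2}^{2}\big]
 +2\,\mathbb{E}\Big[\Big(II_{1}-\tfrac{m(m-1)}{2n}\Big)II_{2}\Big]\\
&\leq 2\,\mathbb{E}\Big[\Big(II_{1}-\tfrac{m(m-1)}{2n}\Big)^{2}\Big]
 +2\,\mathbb{E}\big[II_{2}^{2}\big].
\end{align*}
```

The two expectations on the last line are exactly the quantities controlled by Lemmas 4.1 and 4.2.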

For each pair $p<q$ , the main difference between $II_{pq,1}$ and $II_{pq,2}$ is that when $\lambda_{3}\not=2$ , all the coefficients $\frac{\partial^{\boldsymbol{\lambda}}f(1,1,\rho_{pq})}{{\boldsymbol{\lambda}}!}$ appearing in the second term of (4.2) can be bounded by either $|\rho_{pq}|$ or $\rho_{pq}^{2}$ up to some multiplicative constants. This makes proving the useful bound for $\mathbb{E}[II_{2}^{2}]$ in terms of the norm $\|\operatorname{{\bf R}}-\operatorname{{\bf I}_{m}}\|_{F}$ amenable to the straightforward approach of squaring and taking expectation. Thus we shall defer the proof of Lemma 4.2 to Appendix D and address the bound in Lemma 4.1 for the rest of this section.

We will start with the fact that

[TABLE]

and form estimates for the terms on the right hand side. To understand the mean and variance of $II_{1}$ , it is more instructive to first recognize that each term in (4.1) can be written as a U-statistic of degree $4$ . For instance, for any four distinct indices $1\leq i,j,k,l\leq n$ , if we only treat ${\bf X}_{pq,i}=(X_{pi},X_{qi})^{\prime},\dots,{\bf X}_{pq,l}=(X_{pl},X_{ql})^{\prime}$ as a four tuple in $\mathbb{R}^{2}$ , the function

[TABLE]

is symmetric in its four arguments, and the first term in (4.1) can be written as the U-statistic

[TABLE]

where the summation on the right hand side is over all distinct unordered quadruples $i,j,k,l$ that can be formed from $[n]$. The factor ${n-1\choose 3}$ appears as a denominator in definition (4.5) to account for multiple counting: for each $i\in\{1,\dots,n\}$, the summand $(X_{pi}X_{qi}-\rho_{pq})^{2}$ appears only once on the left hand side of (4.6), while by the definition of $h_{1,pq}$ it appears in ${n-1\choose 3}$ of the kernels summed over on the right hand side of (4.6), since for each $i$ there are ${n-1\choose 3}$ choices of $j,k,l$ that complete a quadruple $(i,j,k,l)$ from $\{1,\dots,n\}$.
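The counting argument can be checked numerically. The sketch below is not the kernel $h_{1,pq}$ of (4.5) itself; it uses a hypothetical stand-in kernel that reassembles $(\sum_{i}a_{i})^{2}$ from a sum over all unordered quadruples, with the numbers `a` playing the role of $X_{pi}X_{qi}-\rho_{pq}$. Squared terms are weighted by $1/{n-1\choose 3}$ and cross terms by $1/{n-2\choose 2}$, the latter being an assumption made here by analogy with the text.

```python
from fractions import Fraction
from itertools import combinations
from math import comb


def squared_sum_via_quadruples(a):
    """Rebuild (sum_i a_i)^2 by summing a symmetric degree-4 kernel over all 4-subsets."""
    n = len(a)
    diag_weight = Fraction(1, comb(n - 1, 3))  # each index i lies in C(n-1, 3) quadruples
    pair_weight = Fraction(1, comb(n - 2, 2))  # each pair {i, j} lies in C(n-2, 2) quadruples
    total = Fraction(0)
    for quad in combinations(range(n), 4):
        squares = sum(Fraction(a[i]) ** 2 for i in quad)
        cross = sum(2 * Fraction(a[i]) * Fraction(a[j]) for i, j in combinations(quad, 2))
        total += diag_weight * squares + pair_weight * cross
    return total


a = [3, -1, 4, 1, -5, 9]  # hypothetical stand-ins for X_{pi} X_{qi} - rho_{pq}
lhs = Fraction(sum(a)) ** 2
rhs = squared_sum_via_quadruples(a)
```

Exact rational arithmetic is used so that the identity `lhs == rhs` holds without rounding error.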

Note that the other terms of the form $\frac{\partial^{\boldsymbol{\lambda}}f(1,1,\rho_{pq})}{{\boldsymbol{\lambda}}!}\bar{S}_{pp}^{\lambda_{1}}\bar{S}_{qq}^{\lambda_{2}}\bar{S}_{pq}^{\lambda_{3}}$ in (4.1) are indexed by ${\boldsymbol{\lambda}}$ equal to $(1,0,2)$ , $(0,1,2)$ , $(2,0,2)$ , $(0,2,2)$ . These terms can be represented as U-statistics of degree $4$ using a similar strategy: With four distinct indices $i,j,k,l$ from $[n]$ , by defining the symmetric kernel function

[TABLE]

for ${\boldsymbol{\lambda}}=(1,0,2)$ , where above we interpret $\pi$ as permutation functions on distinct elements, we have the U-statistic representation of degree $4$

[TABLE]

Note that (4.8) simply comes from Lemma B.1. What we have done here is that, for each term $(X^{2}_{p\tilde{i}}-1)(X_{p\tilde{j}}X_{q\tilde{j}}-\rho_{pq})(X_{p\tilde{k}}X_{q\tilde{k}}-\rho_{pq})$ in (4.8) with $\tilde{i},\tilde{j},\tilde{k}$ not necessarily distinct, we find any $4$ distinct indices $i,j,k,l$ that contain $\tilde{i},\tilde{j},\tilde{k}$ as sets, and arrange the term into one of the three summands of order $O(1)$, $O(n^{-1})$ and $O(n^{-2})$ in (4.7) according to the actual set cardinality $|\{\tilde{i},\tilde{j},\tilde{k}\}|$, which can be equal to $1$, $2$ or $3$. Since there are ${n-|\{\tilde{i},\tilde{j},\tilde{k}\}|\choose 4-|\{\tilde{i},\tilde{j},\tilde{k}\}|}$ choices of distinct $i,j,k,l$ that contain $\{\tilde{i},\tilde{j},\tilde{k}\}$ as sets, to account for the duplications we put the factors ${n-3\choose 1}$, ${n-2\choose 2}$, ${n-1\choose 3}$ as denominators for the three summands in the definition (4.7) of the kernel. By a simple symmetry argument, if we define the kernel

[TABLE]

where $\bar{\bf X}_{pq,i}:=(X_{qi},X_{pi})^{\prime}$ , we have

[TABLE]
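The denominators used in the kernel (4.7) rest on the count ${n-s\choose 4-s}$ of quadruples containing a given index set of cardinality $s$. A brute-force check of that count, with a hypothetical helper name and an arbitrary small $n$, might look like this:

```python
from itertools import combinations
from math import comb


def count_quadruples_containing(indices, n):
    """Count 4-subsets of {0, ..., n-1} that contain every index in `indices`."""
    needed = set(indices)
    return sum(1 for quad in combinations(range(n), 4) if needed <= set(quad))


n = 9
counts = {
    1: count_quadruples_containing((2, 2, 2), n),  # all three indices coincide
    2: count_quadruples_containing((2, 5, 5), n),  # two distinct indices
    3: count_quadruples_containing((2, 5, 7), n),  # three distinct indices
}
expected = {s: comb(n - s, 4 - s) for s in (1, 2, 3)}
```

Here `counts` and `expected` agree for each cardinality $s\in\{1,2,3\}$, matching the factors ${n-1\choose 3}$, ${n-2\choose 2}$ and ${n-3\choose 1}$.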

In the same vein, for ${\boldsymbol{\lambda}}$ equal to $(2,0,2)$ or $(0,2,2)$ and four distinct indices $i,j,k,l$ from $[n]$, we leave it to the reader to check that one can define a symmetric kernel $h_{3,pq}$ of degree $4$ as shown in Appendix D such that

[TABLE]

and

[TABLE]

where

[TABLE]

Letting ${\bf X}_{i}=(X_{1i},\dots,X_{mi})^{\prime}$ denote the entire $i$ -th sample, we have the degree- $4$ U-statistic representation for $II_{1}$ :

[TABLE]

where

[TABLE]

Hence,

[TABLE]

The expectation for each of $h_{1,pq}(\cdot),h_{2,pq}(\cdot),h_{3,pq}(\cdot)$ in the preceding display can be computed by taking expectation for each of the product terms appearing in $\{\cdot\}$ in definitions (4.5), (4.7) as well as the counterparts in the definition of $h_{3,pq}$ in Appendix D (note that quite a few of these expectations are simply zero due to independence of samples). Exploiting symmetry, the same can be done for (4.9) and (4.10). In principle, these higher-order normal moments can all be obtained by laboriously applying Isserlis’s theorem (Theorem B.2) repeatedly. With symbolic computation software such as Mathematica, however, they can be computed with much less effort. These computations lead to

[TABLE]

and further details are given in Appendix D. As a direct consequence of Hoeffding (1948)’s classical result on the variance of U-statistics, we also have the bound

[TABLE]

where

[TABLE]

and the functions $g_{c}:(\mathbb{R}^{m})^{c}\longrightarrow\mathbb{R}$ , $c=1,\dots,4$ , are defined as

[TABLE]

Hence, forming estimates of the quantities $\zeta_{1},\dots,\zeta_{4}$ can lead to an estimate of $\text{Var}[II_{1}]$ .
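For orientation, Hoeffding's classical formula for an averaged U-statistic $U_{n}$ of degree $r$ reads $\text{Var}(U_{n})={n\choose r}^{-1}\sum_{c=1}^{r}{r\choose c}{n-r\choose r-c}\zeta_{c}$, so the weight attached to $\zeta_{c}$ is of order $n^{-c}$. The sketch below computes these weights for $r=4$; the function name and the choice $n=1000$ are illustrative, and the normalization used in (4.14) may differ from the averaged form assumed here.

```python
from fractions import Fraction
from math import comb


def hoeffding_weights(n, r=4):
    """Weights w_c with Var(U_n) = sum_{c=1}^{r} w_c * zeta_c for an averaged degree-r U-statistic."""
    return {c: Fraction(comb(r, c) * comb(n - r, r - c), comb(n, r)) for c in range(1, r + 1)}


n = 1000
weights = hoeffding_weights(n)
# n^c * w_c stays bounded as n grows, i.e. w_c = O(n^{-c}).
scaled = {c: float(w * n ** c) for c, w in weights.items()}
# Together with the c = 0 term the weights sum to one (Vandermonde's identity).
total_with_zero = sum(weights.values()) + Fraction(comb(n - 4, 4), comb(n, 4))
```

In particular the leading weight satisfies $n\,w_{1}\to r^{2}=16$, which is why a bound on $\zeta_{1}$ drives the leading order of the variance.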

Lemma 4.3 (Bound for the $\zeta_{c}$ ’s).

[TABLE]

Again, proving these estimates involves repeatedly applying Theorem B.2 with the help of Mathematica and the details will be deferred to Appendix D. We note that these estimates are by no means sharp, but suffice for our purpose. Putting Lemma 4.3 and (4.14) together, it is a routine task to check that

[TABLE]

for any $0<\gamma<1/2$. This, together with (4.4) and (4.13), proves Lemma 4.1.

5. Conclusion

In this paper, we studied the exact power of the Rao’s score statistic for testing independence, under the asymptotic regime where both the dimension $m$ and sample size $n$ grow to infinity while the ratio $m/n$ remains bounded. A consequence of our main result is that the Rao’s score test is minimax rate optimal under this regime, with respect to a signal size $\|\operatorname{{\bf R}}-\operatorname{{\bf I}_{m}}\|_{F}$ of order $\sqrt{m/n}$.

While previous related work (Chen and Shao, 2012) on the null theory only requires the random variables to have finite moments, our power analysis relied on the normality assumption in different ways. Via applications of Isserlis’ theorem on normal moments (Theorem B.2), all the higher moment quantities appearing in the calculations for the terms $I$ and $II$ in Sections 3 and 4 can be controlled in terms of $\|{\bf R}-{\bf I}_{m}\|_{F}$, a second moment quantity in the original variables $X_{1},\dots,X_{m}$ per se. It is thus conceivable that one can replace normality with appropriate higher moment conditions by carefully keeping track of these calculations. The tail bound for $III$ in Lemma 2.5 relies on a maximal inequality applicable to sub-exponential random variables, which is true for the centered sample covariances $\bar{S}_{pq}$ when they are formed with normal data (see Appendix A). When normality cannot be assumed, we expect that one can use more general maximal inequalities such as Chernozhukov et al. (2015, Lemma 8) along with their consequential moment conditions. A final caveat for pursuing the non-normal generality is that one should consider the more common definition of the sample covariance in (2.4) when constructing the Pearson correlations. Comparing (2.3) with (2.4), the insertion of sample means will likely complicate the calculations to follow under our current proof strategy.

Acknowledgments

We thank the referees for their valuable comments and suggestions. Qi-Man Shao’s research is partially supported by the grant Hong Kong RGC GRF14302515.

Appendix A Probability tail bound of $III$

We will prove the tail bound for $III$ in Lemma 2.5. For $1\leq p,q\leq m$ , by a standard trick (Bickel and Levina, 2008, p.221), for any $t>0$ , one can show the sub-exponential inequality

[TABLE]

under our assumptions at the beginning of Section 2. Then by the maximal inequality in van der Vaart and Wellner (1996, Lemma 2.2.10) and a union bound, we have for any $0<c<1/2$ ,

[TABLE]

Note that by the definition of $III$ ,

[TABLE]

for ${\boldsymbol{\lambda}}=(\lambda_{1},\lambda_{2},\lambda_{3})$ . If $\max_{1\leq p,q\leq m}|\bar{S}_{pq}|\leq n^{-c}$ , for all $1\leq p,q\leq m$ it must be true that

[TABLE]

since $k_{pq}\in(0,1)$. Combining (A.1), (A.2), (A.3), with probability larger than $1-C(n^{c-1}\log m+n^{c-1/2}\sqrt{\log m})$

[TABLE]

for large $m,n$ .

Appendix B Technical tools

In this section we will lay out the technical tools required to finish the proofs in the paper.

Lemma B.1.

Let $f$ be as defined in (2.2). For any ${\boldsymbol{\lambda}}=(\lambda_{1},\lambda_{2},\lambda_{3})\in\mathbb{N}^{3}_{\geq 0}$

[TABLE]

Theorem B.2 (Isserlis (1918)).

For any natural number $k\geq 1$ , let $(Z_{1},\dots,Z_{2k})$ be a mean zero normal vector with covariance matrix $\operatorname{{\bf R}}=(\rho_{pq})_{1\leq p,q\leq 2k}$ . Then

[TABLE]

where the summation is over all possible $\frac{(2k)!}{2^{k}k!}$ partitions of the indices $1,\dots,2k$ into $k$ pairs $(p_{1},p_{2}),\dots,(p_{2k-1},p_{2k})$ .
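As a concrete illustration of Theorem B.2, the following sketch enumerates all pair partitions and sums the corresponding products of covariances. The function names are hypothetical, and the two-variable covariance matrix with $\rho=0.3$ is an arbitrary example chosen only to exercise the formula.

```python
def pair_partitions(items):
    """Yield every partition of `items` (a list of even length) into unordered pairs."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for pos, partner in enumerate(rest):
        remaining = rest[:pos] + rest[pos + 1:]
        for tail in pair_partitions(remaining):
            yield [(first, partner)] + tail


def isserlis_moment(cov, idx):
    """E[prod_d Z_{idx[d]}] for a mean-zero normal vector with covariance matrix `cov`."""
    total = 0.0
    for pairing in pair_partitions(list(idx)):
        term = 1.0
        for a, b in pairing:
            term *= cov[a][b]
        total += term
    return total


rho = 0.3
cov = [[1.0, rho], [rho, 1.0]]
fourth = isserlis_moment(cov, (0, 1, 0, 1))  # E[X_p^2 X_q^2] = 1 + 2 * rho^2
# Number of pairings of 2k indices for k = 1, ..., 4, to compare with (2k)! / (2^k k!).
n_pairings = [sum(1 for _ in pair_partitions(list(range(2 * k)))) for k in (1, 2, 3, 4)]
```

Repeated entries in `idx` are handled by position, so mixed moments such as $\mathbb{E}[X_{p}^{2}X_{q}^{2}]$ come out of the same routine.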

Corollary B.3.

For any four indices $1\leq p_{1},q_{1},p_{2},q_{2}\leq m$ ,

[TABLE]

Proof.

A simple corollary of Theorem B.2. ∎
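To make the step from Theorem B.2 explicit, here is a sketch of the $k=2$ case, assuming $X_{p_{1}},X_{q_{1}},X_{p_{2}},X_{q_{2}}$ are standardized jointly normal variables with correlations $\rho_{pq}$ as in Section 2. Identities of this type are what the later appendices use when they invoke Corollary B.3.

```latex
\begin{align*}
\mathbb{E}[X_{p_{1}}X_{q_{1}}X_{p_{2}}X_{q_{2}}]
&=\rho_{p_{1}q_{1}}\rho_{p_{2}q_{2}}+\rho_{p_{1}p_{2}}\rho_{q_{1}q_{2}}+\rho_{p_{1}q_{2}}\rho_{q_{1}p_{2}},\\
\mathbb{E}\big[(X_{p_{1}}X_{q_{1}}-\rho_{p_{1}q_{1}})(X_{p_{2}}X_{q_{2}}-\rho_{p_{2}q_{2}})\big]
&=\mathbb{E}[X_{p_{1}}X_{q_{1}}X_{p_{2}}X_{q_{2}}]-\rho_{p_{1}q_{1}}\rho_{p_{2}q_{2}}\\
&=\rho_{p_{1}p_{2}}\rho_{q_{1}q_{2}}+\rho_{p_{1}q_{2}}\rho_{q_{1}p_{2}}.
\end{align*}
```

For instance, taking $p_{1}=p_{2}=p$ and $q_{1}=q_{2}=q$ gives $\mathbb{E}[(X_{p}X_{q}-\rho_{pq})^{2}]=1+\rho_{pq}^{2}$.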

Lemma B.4.

For a fixed natural number $k$, suppose $1\leq p_{d},q_{d}\leq m$, $d=1,\dots,2k$ are any $2k$ pairs of variable indices. Then

[TABLE]

where the $O(\cdot)$ term is uniform for all choices of $1\leq p_{d},q_{d}\leq m$, $d=1,\dots,2k$.

Proof.

On expansion,

[TABLE]

so we only need to show the term in $\{\cdot\}$ on the right hand side above is a uniform $O(n^{k})$ term. We note that, by independence, an expectation on the right hand side of the preceding display can only be non-zero if

[TABLE]

One way that (B.1) may happen is when there is a permutation $\pi\in\mathcal{S}_{2k}$ such that

[TABLE]

There can at most be $O(n^{k})$ many combinations of $i_{1},\dots,i_{2k}$ satisfying (B.2) since when (B.2) is true, the set $\cup_{d=1}^{2k}\{i_{d}\}$ can at most have $k$ elements leaving us with $O({n\choose k})=O(n^{k})$ many choices for the combination of $i_{1},\dots,i_{2k}$ . We note that when a configuration in (B.1) is such that the set $\cup_{d=1}^{2k}\{i_{d}\}$ has cardinality exactly equal to $k$ ,

[TABLE]

by Corollary B.3. One can also easily see that there are at most $O(n^{k-1})$ many combinations of $i_{1},\dots,i_{2k}$ other than ones satisfying (B.2) that can lead to (B.1). Hence by Theorem B.2 and our assumption at the beginning of Section 2, we have

[TABLE]

where the $O(\cdot)$ is uniform for all choices of $1\leq p_{d},q_{d}\leq m$ . ∎
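The $O(n^{k})$ count in the proof can be checked by brute force for $k=2$. The sketch below assumes that condition (B.1) amounts to requiring that no sample index among $i_{1},\dots,i_{2k}$ occurs exactly once, which is what independence and centering suggest; the function name is hypothetical.

```python
from collections import Counter
from itertools import product


def count_no_singleton(n, length):
    """Count tuples in {0, ..., n-1}^length in which no value occurs exactly once."""
    return sum(
        1
        for tup in product(range(n), repeat=length)
        if all(c >= 2 for c in Counter(tup).values())
    )


# The case k = 2: tuples (i_1, i_2, i_3, i_4) of sample indices.
counts = {n: count_no_singleton(n, 4) for n in (2, 3, 5, 8)}
# Closed form: 3 pairings with two distinct values, plus n tuples with all indices equal.
closed_form = {n: 3 * n * (n - 1) + n for n in counts}
```

The leading term $3n(n-1)$ comes from the three pairings with exactly $k=2$ distinct indices, while the remaining $n$ configurations are of the lower order $O(n^{k-1})$.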

The next two lemmas on sums of products of the entries in the population correlation matrix $\operatorname{{\bf R}}=(\rho_{pq})_{1\leq p,q\leq m}$ are keys for finishing our proofs.

Lemma B.5.

Suppose $\pi=(\pi_{1},\dots,\pi_{4})$ is a particular permutation of the four indices $p,q,r,s$ , say, $\pi=(p,r,s,q)$ . The following estimates are true:

[TABLE]

Proof of Lemma B.5.

With a slight abuse of notations, the expression “ $r\not=p,q$ ” means that $r$ is a number that is not equal to $p$ nor $q$ .

By the fact that $2|ab|\leq a^{2}+b^{2}$ for all $a,b\in\mathbb{R}$ ,

[TABLE]

which proves (B.4). Similarly,

[TABLE]

where the last inequality comes from a similar proof as the one for (B.4). ∎
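The mechanism behind these estimates, $2|ab|\leq a^{2}+b^{2}$ followed by summing out the free index, can be illustrated numerically. The sketch below does not reproduce the statements (B.4) or (B.5); it checks the representative bound $\sum_{p,q,r\text{ distinct}}|\rho_{pr}\rho_{rq}|\leq(m-2)\|\operatorname{{\bf R}}-\operatorname{{\bf I}_{m}}\|_{F}^{2}$ on a hypothetical AR(1)-type correlation matrix.

```python
from itertools import permutations


def ar1_correlation(m, r):
    """Correlation matrix with entries r^{|p - q|} (an illustrative choice)."""
    return [[r ** abs(p - q) for q in range(m)] for p in range(m)]


def frob_sq_offdiag(R):
    """Squared Frobenius norm of R - I for a matrix R with unit diagonal."""
    m = len(R)
    return sum(R[p][q] ** 2 for p in range(m) for q in range(m) if p != q)


m = 7
R = ar1_correlation(m, 0.6)
# Sum over ordered triples of distinct indices (p, q, r).
triple_sum = sum(abs(R[p][r] * R[r][q]) for p, q, r in permutations(range(m), 3))
# Each rho_{pr}^2 with p != r is counted once for each of the m - 2 admissible q.
bound = (m - 2) * frob_sq_offdiag(R)
```

The same bookkeeping, applied with more indices, is what produces the powers of $m$ multiplying $\|\operatorname{{\bf R}}-\operatorname{{\bf I}_{m}}\|_{F}^{2}$ in the estimates above.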

Lemma B.6.

For $k=5,\dots,8$ ,

(i)

[TABLE]

(ii)

Suppose $\pi=(\pi_{1},\dots,\pi_{8})$ and $\tau=(\tau_{1},\dots,\tau_{8})$ are two fixed permutations of the eight indices $p_{1},q_{1},\dots,p_{4},q_{4}$. For instance, $\pi$ can be equal to, say, $(p_{1},p_{4},q_{3},q_{2},p_{2},q_{1},q_{4},p_{3})$. Then

[TABLE]

Proof of Lemma B.6.

We first note that for $(ii)$, by the inequality that $2|ab|\leq a^{2}+b^{2}$ for all $a,b\in\mathbb{R}$, we have

[TABLE]

hence to show $(ii)$ it suffices to show

[TABLE]

Given $k\in\{5,\dots,8\}$, when $k$ of the indices $p_{1},q_{1},\dots,p_{4},q_{4}$ are distinct, it must be the case that there exist $k-4$ pairs of $(p_{d},q_{d})$ such that all indices from these $k-4$ pairs are distinct elements from $[m]$. Without loss of generality we can assume these $k-4$ pairs to be $(p_{1},q_{1}),\dots,(p_{k-4},q_{k-4})$, which contain a total of $2k-8$ distinct indices, and for proving $(i)$ and (B.6) it suffices to show, respectively,

[TABLE]

and

[TABLE]

As all the $\rho$ ’s are bounded in absolute value by $1$ , summing over the other $k-(2k-8)=8-k$ indices different from $p_{1},q_{1},\dots,p_{k-4},q_{k-4}$ results in a $O(m^{8-k})$ term which gives

[TABLE]

Now since

[TABLE]

by standard norm inequality, evaluating the sum on the right hand side of (B.9) we further obtain

[TABLE]

which is exactly (B.7). Similarly, summing over the other $8-k$ indices different from $p_{1},q_{1},\dots,p_{k-4},q_{k-4}$ on the left hand side of (B.8) results in a $O(m^{8-k})$ term and hence

[TABLE]

Since $\sum_{1\leq p\not=q\leq m}\rho_{pq}^{2}=\|\operatorname{{\bf R}}-\operatorname{{\bf I}_{m}}\|_{F}^{2}$ , we get (B.8) by continuing from the preceding display. ∎
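Two norm facts are used repeatedly in this proof: the identity $\sum_{p\not=q}\rho_{pq}^{2}=\|\operatorname{{\bf R}}-\operatorname{{\bf I}_{m}}\|_{F}^{2}$ and a Cauchy-Schwarz bound of the type $\sum_{p\not=q}|\rho_{pq}|\leq\sqrt{m(m-1)}\,\|\operatorname{{\bf R}}-\operatorname{{\bf I}_{m}}\|_{F}$. The latter is an assumption about what the "standard norm inequality" refers to. Both can be sanity-checked on a small equicorrelation matrix, chosen here purely for illustration:

```python
from math import sqrt


def equicorrelation(m, r):
    """Correlation matrix with all off-diagonal entries equal to r."""
    return [[1.0 if p == q else r for q in range(m)] for p in range(m)]


m, r = 6, 0.25
R = equicorrelation(m, r)
off = [R[p][q] for p in range(m) for q in range(m) if p != q]
# Frobenius norm of R - I, computed entrywise from its definition.
frob = sqrt(sum((R[p][q] - (1.0 if p == q else 0.0)) ** 2 for p in range(m) for q in range(m)))
l1 = sum(abs(x) for x in off)          # sum of |rho_pq| over p != q
cs_bound = sqrt(m * (m - 1)) * frob    # Cauchy-Schwarz over the m(m-1) off-diagonal entries
```

For the equicorrelation matrix the Cauchy-Schwarz bound is attained with equality, so any comparison of `l1` and `cs_bound` should allow for floating-point tolerance.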

Appendix C Proofs for Section 3

C.1. Proof of (3.7)-(3.9)

First, we show the estimates in (3.7)-(3.9). Note that by Corollary B.3,

[TABLE]

for $k=2,3,4$ . Also recall that $\rho_{pp}=1$ for all $1\leq p\leq m$ . We will analyze the sum in (C.1) for different $k$ .

(3.7): When $k=2$ , with $p_{d}<q_{d}$ for $d=1,2$ , it must be that $p_{1}=p_{2}$ and $q_{1}=q_{2}$ , and hence from (C.1)

[TABLE]

since $\sum_{1\leq p<q\leq m}\rho_{pq}^{2}=2^{-1}\|\operatorname{{\bf R}}-\operatorname{{\bf I}_{m}}\|_{F}^{2}$ and $\rho_{pq}^{4}\leq\rho_{pq}^{2}$ .

(3.8): When $k=3$ , one possible configuration of $\cup_{d=1}^{2}\{p_{d},q_{d}\}$ as a set with cardinality $3$ is that

[TABLE]

Taking a sum just over the terms in (C.1) whose indices $p_{1},q_{1},p_{2},q_{2}$ satisfy the configuration (C.2) we get

[TABLE]

where the second last inequality is true because we enlarged the set of indices $p_{1},q_{1},q_{2}$ we are summing over and used the fact that

[TABLE]

since any $\rho^{2}_{pq}$ is less than $1$ , and the last inequality follows from (B.5) and that

[TABLE]

The same estimates can be proved for other set configurations of $\cup_{d=1}^{2}\{p_{d},q_{d}\}$ similar to the one in (C.2). Since there are only finitely many such configurations, we get the estimate in (3.8).

(3.9): By considering different configurations for the set $\cup_{d=1}^{2}\{p_{d},q_{d}\}$ with cardinality $4$ , from (C.1) we have

[TABLE]

where the last inequality used (B.4) and

[TABLE]

C.2. Proof of (3.17)

In fact, the strategy we used in proving (3.8) will also lead to a quick proof of the estimates for $\mathbb{T}(k)$, $k=5,\dots,8$ in (3.17). By definition,

[TABLE]

By expanding the product $\prod_{d\in[4]}(X_{p_{d}}X_{q_{d}}-\rho_{p_{d}q_{d}})$ at the end of the above equation and taking expectation using Theorem B.2, one can see that

[TABLE]

where here we interpret $\pi=(\pi_{1},\dots,\pi_{8})$ as a permutation of the eight indices $p_{1},q_{1},\dots,p_{4},q_{4}$ . When the permutation $\pi=(p_{1},q_{1},p_{2},q_{2},p_{3},q_{3},p_{4},q_{4})$ , we have

[TABLE]

by Lemma B.6 $(i)$. Although (C.4) is only proved for $\pi=(p_{1},q_{1},p_{2},\dots,p_{4},q_{4})$, the same bound easily generalizes to all other permutations, which gives our estimate in (3.17) in light of (C.3).

C.3. Proof of (3.27)-(3.29)

(3.27): We first write

[TABLE]

where the $O(m^{4})$ term comes from a remaining sum of $O(m^{4})$ many uniformly bounded terms when $|\cup_{d=1}^{4}\{p_{d},q_{d}\}|\leq 4$. By the definition of $\mathbb{P}_{1}$ in (3.20), Corollary B.3 and Lemma B.6 $(i)$, it can be seen that for each $k=5,\dots,8$,

[TABLE]

giving (3.27) in light of (C.5).

(3.28) and (3.29): Similar to (C.5) for $u=1,2,3$ , we can write

[TABLE]

By Corollary B.3 we get that $\mathbb{P}_{1}\mathbb{P}_{u}$ is a finite sum of terms each having the form

[TABLE]

for $\pi=(\pi_{1},\dots,\pi_{8})$ and $\tau=(\tau_{1},\dots,\tau_{8})$ that are certain permutations of the $8$ indices $p_{1},q_{1},\dots,p_{4},q_{4}$ . As such, by Lemma B.6 $(ii)$ , for a given $k=5,\dots,8$ ,

[TABLE]

Given (C.6) and (C.3) it remains to show

[TABLE]

and, for $u=2,3$ ,

[TABLE]

to prove (3.28) and (3.29). To that end we make the following claim:

Claim. Suppose $\pi=(\pi_{1},\dots,\pi_{8})$ and $\tau=(\tau_{1},\dots,\tau_{8})$ are two given permutations of eight indices $p_{1},q_{1},\dots,p_{4},q_{4}\in[m]$ . Then

[TABLE]

unless, as elements in $[m]$ ,

[TABLE]

for all $d^{\prime}=1,3,5,7$ when $1\leq p_{d}<q_{d}\leq m$ for all $d=1,\dots,4$ and $|\cup_{d=1}^{4}\{p_{d},q_{d}\}|=4$ .

The proof of this claim will be left till the end of this section. Using this, we will first show (C.9) for $u=2$ while the proof for $u=3$ follows similarly and is thus omitted.

By Corollary B.3, on expansion we get that $\sum_{\begin{subarray}{c}1\leq p_{d}<q_{d}\leq m\\ d=1,2,3,4\\ |\cup_{d=1}^{4}\{p_{d},q_{d}\}|=4\end{subarray}}\mathbb{P}_{1}\mathbb{P}_{2}$ is a finite sum of terms each having the form as in the left hand side of (C.10) with $\pi$ and $\tau$ NOT satisfying the description in (C.11) of the claim. For example, by Corollary B.3, on expansion

[TABLE]

which leads to

[TABLE]

where

[TABLE]

and

[TABLE]

and similar terms are omitted in $\cdots$ above. Note that when $|\cup_{d=1}^{4}\{p_{d},q_{d}\}|=4$ and $1\leq p_{d}<q_{d}\leq m$, there must be a pair among $\{(\pi^{\prime}_{d},\pi^{\prime}_{d+1}),(\tau^{\prime}_{d},\tau^{\prime}_{d+1}):d=1,3,5,7\}$ that contains two distinct elements in $[m]$ due to a mismatch of the permutations $\pi^{\prime}$ and $\tau^{\prime}$: for if not, in consideration of $\pi^{\prime}$ it must be the case that $p_{1}=p_{2}$, $q_{1}=q_{2}$, $p_{3}=p_{4}$ and $q_{3}=q_{4}$ with $p_{1},q_{1},p_{3},q_{3}$ being four distinct elements in $[m]$, but this will imply $\tau^{\prime}_{1}=p_{1}\not=p_{3}=\tau^{\prime}_{2}$, a contradiction. By the claim above the first term on the right hand side of (C.12) equals $O(m^{3})\|\operatorname{{\bf R}}-\operatorname{{\bf I}_{m}}\|_{F}$, whereas the finitely many omitted terms $\cdots$ in (C.12) can be similarly bounded, and (C.9) is proved.

We now show (C.8). Again with Corollary B.3, we expand

[TABLE]

where we leave it to the reader to check that the omitted terms in $\cdots$ of (C.15) are of order $O(m^{3})\|\operatorname{{\bf R}}-\operatorname{{\bf I}_{m}}\|_{F}$ due to mismatch of permutations as in (C.13) and (C.14). In fact, summing over the three terms on the second line of (C.15) also contributes a term of order $O(m^{3})\|\operatorname{{\bf R}}-\operatorname{{\bf I}_{m}}\|_{F}$: for example, summing over the last term on the second line of (C.15) equals

[TABLE]

with

[TABLE]

When $1\leq p_{d}<q_{d}\leq m$ and $|\cup_{d=1}^{4}\{p_{d},q_{d}\}|=4$ , we cannot have $\tilde{\pi}_{d^{\prime}}=\tilde{\pi}_{d^{\prime}+1}$ for all $d^{\prime}=1,3,5,7$ and hence by the previous claim (C.16) is of order $O(m^{3})\|\operatorname{{\bf R}}-\operatorname{{\bf I}_{m}}\|_{F}$ . Hence it remains to show that summing over the terms on the first line of (C.15) gives

[TABLE]

When $|\cup_{d=1}^{4}\{p_{d},q_{d}\}|=4$ with $p_{d}<q_{d}$ for all $d=1,\dots,4$ , as a set $\cup_{d=1}^{4}\{p_{d},q_{d}\}$ can take the configuration

[TABLE]

When (C.18) is true, $\rho_{p_{1}p_{2}}^{2}\rho_{q_{1}q_{2}}^{2}\rho_{p_{3}p_{4}}^{2}\rho_{q_{3}q_{4}}^{2}=1$ , and hence

[TABLE]

For any configurations of the set $\cup_{d=1}^{4}\{p_{d},q_{d}\}$ other than (C.18), one of

(i) $p_{1}\not=p_{2}$ , (ii) $q_{1}\not=q_{2}$ , (iii) $p_{3}\not=p_{4}$ or (iv) $q_{3}\not=q_{4}$ must be true. For example, one such configuration is

[TABLE]

For this particular configuration, $(i)$ $p_{1}\not=p_{2}$ is true. Then we leave it to the reader to verify that by the same line of reasoning as in the proof of the claim below, we can show

[TABLE]

where similar bounds can in fact be proved for all configurations of $\cup_{d=1}^{4}\{p_{d},q_{d}\}$ other than (C.18). This, together with (C.19), leads to (C.17).

Proof of the claim.

Suppose (C.11) is not true for some $d^{\prime}\in\{1,3,5,7\}$ , and without loss of generality we assume $\pi_{1}\not=\pi_{2}$ . Since $|\rho_{pq}|\leq 1$ for all $1\leq p,q\leq m$ , we have

[TABLE]

as desired. ∎

Appendix D Proofs for Section 4

Before finishing the proofs for Section 4, we first give the definition of the kernel $h_{3,pq}$ as mentioned in the main text:

[TABLE]

We now proceed with the remaining proofs.

Proof for Lemma 4.2.

Note that by definition,

[TABLE]

Since there are only finitely many ${\boldsymbol{\lambda}}$ we are summing over for the second term in (D.1), by the general fact that $2|ab|\leq a^{2}+b^{2}$ , it suffices to show that, for ${\boldsymbol{\lambda}}=(\lambda_{1},\lambda_{2},\lambda_{3})$ with $1\leq|{\boldsymbol{\lambda}}|\leq 4$ and $\lambda_{3}\not=2$ , the quantities

[TABLE]

as well as

[TABLE]

can be bounded by the right hand side of (4.3) up to some multiplicative constants. We will first show it for (D.2) case by case according to the multi-index degree of $\boldsymbol{\lambda}$ . The arguments rely on the fact that, by Lemma B.1, it must be true that

[TABLE]

and

[TABLE]

for some constant $C>0$ . Consider $3$ cases:

$|{\boldsymbol{\lambda}}|=3$ or $4$ : With the facts in (D.4) and (D.5), with Lemma B.4, (D.2) is less than

[TABLE]

Respectively, by properties of norms they can be estimated by

[TABLE]

which are both less than the right hand side of (4.3) up to constants since $|\boldsymbol{\lambda}|=3$ or $4$ .

$|{\boldsymbol{\lambda}}|=1$ : The only ${\boldsymbol{\lambda}}\in\mathbb{N}^{3}_{\geq 0}$ with $|{\boldsymbol{\lambda}}|=1$ and $\lambda_{3}\not=2$ are $(1,0,0)$ , $(0,1,0)$ , $(0,0,1)$ . When $\lambda_{3}=0$ , by (D.5) and Lemma B.4 the second moment quantity in (D.2) is bounded by

[TABLE]

which is less than the right hand side of (4.3). When ${\boldsymbol{\lambda}}=(0,0,1)$, by Lemma B.1, (D.2) equals

[TABLE]

where the second equality comes from the fact that

[TABLE]

due to the i.i.d.’ness of samples and Corollary B.3. To show the last equality, by exploiting symmetry it is easy to see that

[TABLE]

Observe that

[TABLE]

In light of Lemma B.5, applying these bounds to (D.8) implies (D.7).

$|{\boldsymbol{\lambda}}|=2$ : The only ${\boldsymbol{\lambda}}$ ’s with $|{\boldsymbol{\lambda}}|=2$ and $\lambda_{3}\not=2$ are $(1,1,0),(2,0,0),(0,2,0),(1,0,1)$ and $(0,1,1)$ . For the first three of these since $\lambda_{3}=0$ , by (D.5) and Lemma B.4 the quantity in (D.2) equals

[TABLE]

For ${\boldsymbol{\lambda}}=(1,0,1)$ , with Lemma B.1 the quantity in (D.2) equals

[TABLE]

By simple argument as in the proof of Lemma B.4 and Corollary B.3, it is not hard to see that

[TABLE]

Substituting (D.11) into (D.10) we get

[TABLE]

where the last two inequalities make use of similar arguments that prove (D.7). By a symmetry argument the same estimate holds for ${\boldsymbol{\lambda}}=(0,1,1)$ . Both (D.9) and (D.12) are less than the right hand side of (4.3).

It remains to form an estimate for (D.3). Note that

[TABLE]

where the $O(n^{-4})$ term comes from an argument similar to the proof of Lemma B.4, and the $O(m^{3})$ term comes from the $O(m^{3})$ many choices for $p_{1},q_{1},p_{2},q_{2}$ when $|\cup_{d=1}^{2}\{p_{d},q_{d}\}|\leq 3$. Hence it now suffices to show the first term on the right hand side of the preceding display is less than the right hand side of (4.3). The argument is simple but a little tedious so we just sketch it here: by a similar argument as in the proof of Lemma B.4 we must have

[TABLE]

where the expectations on the right come from the fact that $(X_{p_{1}}^{2}-1)$ must pair with one of $(X_{p_{2}}^{2}-1)$ , $(X_{q_{1}}^{2}-1)$ , $(X_{q_{2}}^{2}-1)$ , $(X_{p_{1}}X_{q_{1}}-\rho_{p_{1}q_{1}})$ and $(X_{p_{2}}X_{q_{2}}-\rho_{p_{2}q_{2}})$ as in (B.3). By Corollary B.3, for $k$ equals $p_{2}$ , $q_{1}$ or $q_{2}$ , it must be that

[TABLE]

and for $d^{\prime}$ equal to $1$ or $2$, it must be that

[TABLE]

Substituting (D.14) and (D.15) into (D.13) gives that

[TABLE]

which gives us an estimate less than the one required. ∎

Proof of (4.13) and Lemma 4.3.

As described in the main text, with the help of the Expectation function provided by Mathematica, we easily find that

[TABLE]

for each pair $1\leq p<q\leq m$ . Collecting these and summing over all $1\leq p<q\leq m$ gives the expectation of the kernel $h$ in (4.13). We will now prove Lemma 4.3, first dealing with (4.19). Note that $g_{4}(\cdot)$ simply equals the kernel function $h$ , in particular for a set of distinct sample indices $i,j,k,l\in[n]$ we have

[TABLE]

by collecting the $O(1)$ terms in the definition of $h_{2,pq},\bar{h}_{2,pq},h_{3,pq},\bar{h}_{3,pq}$ , where $t_{4}(\cdot)$ is just a fixed polynomial in ${\bf X}_{pq,i},{\bf X}_{pq,j},{\bf X}_{pq,k},{\bf X}_{pq,l},\rho_{pq}$ whose form is irrelevant to us. Using the fact that $2|ab|\leq a^{2}+b^{2}$ for all $a,b\in\mathbb{R}$ , we have

[TABLE]

A key observation is that upon squaring and taking expectation, $\mathbb{E}[\tilde{g}_{4}({\bf X}_{i},{\bf X}_{j},{\bf X}_{k},{\bf X}_{l})^{2}]$ must be a sum of finitely many terms, where for some sample indices $\tilde{i},\tilde{j}\in\{i,j,k,l\}$ , each of these terms can be “ $\lesssim$ ” bounded by the form

[TABLE]

where for any sample index $i\in[n]$ and variable indices $p,q\in[m]$, $A(p,q,i)$ and $B(p,q,i)$ may equal one of

[TABLE]

Now if $\tilde{i}\not=\tilde{j}$ , the form in (D.17) equals zero. If $\tilde{i}=\tilde{j}$ , the form in (D.17) can be bounded by

[TABLE]

and by applying Corollary B.3, we leave it for the reader to check that the leading term in the preceding display must be “ $\lesssim$ ” bounded by $m^{3}\|\operatorname{{\bf R}}-\operatorname{{\bf I}_{m}}\|_{F}$ . Summarizing this gives us the bound in (4.19).

Now we get to (4.16)-(4.18). The functions $g_{c}(\cdot)$, $c=1,\dots,3$ for the kernel $h$ can be found by simply conditioning and taking expectation using Theorem B.2. With the help of Mathematica, they are found to be

[TABLE]

and

[TABLE]

Above, $t_{1}(\cdot),t_{2}(\cdot)$ and $t_{3}(\cdot)$ are simply three fixed polynomials in their respective arguments, and their forms are irrelevant to us. $\tilde{g}_{1}(\cdot)$ , $\tilde{g}_{2}(\cdot)$ and $\tilde{g}_{3}(\cdot)$ simply collect the terms that do not involve $t_{1}(\cdot),t_{2}(\cdot)$ and $t_{3}(\cdot)$ , respectively. Note that by the same fact that $2|ab|\leq a^{2}+b^{2}$ for $a,b\in\mathbb{R}$ ,

[TABLE]

Note that in the definition of $\tilde{g}_{1}$, there is a leading factor of order $n^{-1}$. By applying Theorem B.2 with the help of Mathematica, we find that $\mathbb{E}[\tilde{g}_{1}({\bf X}_{i})^{2}]$ is a finite sum of terms, each of which, up to a factor of order $n^{-2}$, can be bounded by one of the forms:

[TABLE]

We leave it for the reader to check that with the two estimates in Lemma B.5 and the familiar trick of decomposing a sum according to the cardinality of an index set as in (3.5), the forms in (D.21) can all be bounded by

[TABLE]

up to constants, and hence from (D.18) we obtain the estimate for $\zeta_{1}$ in (4.16). By the same token, with the help of Mathematica we observe that

[TABLE]

again by the index set decomposition trick and Lemma B.5 we have

[TABLE]

Collecting (D.19),(D.20), (D.22) and (D.23) gives us (4.17) and (4.18).

∎

Bibliography

Anderson, T. W. (2003). An introduction to multivariate statistical analysis. Wiley Series in Probability and Statistics. Wiley-Interscience [John Wiley & Sons], Hoboken, NJ, third edition.
Bickel, P. J. and Levina, E. (2008). “Regularized estimation of large covariance matrices.” Ann. Statist., 36(1): 199–227.
Cai, T. T. and Jiang, T. (2011). “Limiting laws of coherence of random matrices with applications to testing covariance structure and construction of compressed sensing matrices.” Ann. Statist., 39(3): 1496–1525.
Cai, T. T. and Ma, Z. (2013). “Optimal hypothesis testing for high dimensional covariance matrices.” Bernoulli, 19(5B): 2359–2388.
Chen, Y. and Shao, Q.-M. (2012). “Berry-Esseen inequality for unbounded exchangeable pairs.” In Probability approximations and beyond, volume 205 of Lecture Notes in Statist., 13–30. Springer, New York.
Chernozhukov, V., Chetverikov, D., and Kato, K. (2015). “Comparison and anti-concentration bounds for maxima of Gaussian random vectors.” Probab. Theory Related Fields, 162(1-2): 47–70. URL http://dx.doi.org/10.1007/s00440-014-0565-9
Han, F. and Liu, H. (2014). “Distribution-free tests of independence with applications to testing more structures.” arXiv preprint arXiv:1410.4179.
Heyde, C. C. and Brown, B. M. (1970). “On the departure from normality of a certain class of martingales.” Ann. Math. Statist., 41: 2161–2165.
