How to avoid the zero-power trap in testing for correlation

David Preinerstorfer

arXiv:1812.10752·math.ST·July 1, 2021


TL;DR

This paper addresses the zero-power trap in correlation testing, proposing methods to modify tests so they maintain high power even with strongly correlated errors, thus improving test reliability.

Contribution

It introduces a practical modification to existing tests that avoids the zero-power trap while preserving their optimality properties.

Findings

01 The modified tests have power converging to one as the correlation becomes very strong.

02 The approach approximately preserves the initial test's power function.

03 Numerical illustrations demonstrate its effectiveness in testing for network-generated correlation.


Equations

$$\mathbf{y}=X\beta+\mathbf{u},$$

$$\left\{P_{\beta,\sigma,\rho}:\beta\in\mathbb{R}^{k},\ \sigma\in(0,\infty),\ \rho\in[0,a)\right\}$$

$$H_{0}:\rho=0,\ \beta\in\mathbb{R}^{k},\ 0<\sigma<\infty\quad\text{against}\quad H_{1}:\rho>0,\ \beta\in\mathbb{R}^{k},\ 0<\sigma<\infty,$$

$$G_{Z}:=\left\{g_{\gamma,\theta}:\gamma\in\mathbb{R}\setminus\{0\},\ \theta\in\mathbb{R}^{m}\right\},$$

$$\Phi_{B,c}=\Phi_{B,C_{X},c}=\left\{y\in\mathbb{R}^{n}:T_{B}(y)>c\right\},$$

$$T_{B}(y)=T_{B,C_{X}}(y)=\begin{cases}y^{\prime}C_{X}^{\prime}BC_{X}y/\|C_{X}y\|^{2}&\text{if }y\notin\mathop{\mathrm{span}}(X)\\ \lambda_{1}(B)&\text{if }y\in\mathop{\mathrm{span}}(X).\end{cases}$$

$$P_{\beta,\sigma,0}(\Phi_{B,\kappa(\alpha)})=\alpha\quad\text{for every }\beta\in\mathbb{R}^{k}\text{ and every }\sigma\in(0,\infty).$$

$$\mathbf{G}^{\prime}\left[B-cI_{n-k}\right]\mathbf{G}>0,$$

$$\liminf_{\rho\to a}E_{\beta,\sigma,\rho}(\varphi)=0;$$

$$\lim_{\rho\to a}E_{\beta,\sigma,\rho}(\varphi)=0\quad\text{for every }\beta\in\mathbb{R}^{k}\text{ and every }\sigma\in(0,\infty).$$

$$\lim_{\rho\to a}P_{\beta,\sigma,\rho}(\Phi_{B,\kappa(\alpha)})=0\quad\text{for every }\beta\in\mathbb{R}^{k}\text{ and every }\sigma\in(0,\infty).$$

$$\mathop{\mathrm{rank}}(X)=k\quad\text{and}\quad C_{X}e\notin\mathop{\mathrm{Eig}}\left(B(X),\lambda_{n-k}(B(X))\right),$$

$$\mathop{\mathrm{Eig}}\left(B(X),\lambda_{n-k}(B(X))\right)=\mathop{\mathrm{Eig}}\left(C_{X}MC_{X}^{\prime},\lambda_{n-k}(C_{X}MC_{X}^{\prime})\right).$$

$$\lim_{\rho\to a}P_{\beta,\sigma,\rho}^{X}(\Phi_{B(X),C_{X},\kappa(\alpha)})=0\quad\text{for every }\beta\in\mathbb{R}^{k}\text{ and every }\sigma\in(0,\infty).$$

$$\mathop{\mathrm{Eig}}\left(F(A),\lambda_{1}(F(A))\right)=\mathop{\mathrm{Eig}}\left(A,\lambda_{1}(A)\right).$$

$$\lim_{\rho\to a}P_{\beta,\sigma,\rho}(\Phi_{C_{X}ee^{\prime}C_{X}^{\prime},\kappa(\alpha)})=1.$$

$$\bar{T}_{\bar{B}}(y)=\bar{T}_{\bar{B},C_{\bar{X}}}(y)=\begin{cases}y^{\prime}C_{\bar{X}}^{\prime}\bar{B}C_{\bar{X}}y/\|C_{\bar{X}}y\|^{2}&\text{if }y\notin\mathop{\mathrm{span}}(\bar{X})\\ \lambda_{1}(\bar{B})&\text{if }y\in\mathop{\mathrm{span}}(\bar{X}).\end{cases}$$

$$P_{\beta,\sigma,0}\left(\left\{y\in\mathbb{R}^{n}:\bar{T}_{\bar{B}}(y)>\bar{\kappa}(\alpha)\right\}\right)=\alpha.$$

$$\bar{\Phi}_{\bar{B},\bar{\kappa}(\alpha)}:=\left\{y\in\mathbb{R}^{n}:\bar{T}_{\bar{B}}(y)>\bar{\kappa}(\alpha)\right\}.$$

$$\Lambda:=\lim_{\rho\to a}c(\rho)\,\Pi_{\mathop{\mathrm{span}}(e)^{\bot}}L_{*}(\rho)$$

$$0<\lim_{\rho\to a}P_{\beta,\sigma,\rho}(\bar{\Phi}_{\bar{B},\bar{\kappa}(\alpha)})=\Pr\left(\bar{T}_{\bar{B}}(\Lambda\mathbf{G})>\bar{\kappa}(\alpha)\right)<1,$$

$$(\beta,\sigma,\rho)\mapsto E_{\beta,\sigma,\rho}(\varphi_{\alpha})$$

$$\sup_{\beta\in\mathbb{R}^{k}}\sup_{\sigma\in(0,\infty)}E_{\beta,\sigma,0}(\varphi_{\alpha})=\alpha.$$

$$\left\{y\in\mathbb{R}^{n}:T_{B}(y)=\kappa(\alpha)\right\}$$

$$\min\left(\varphi_{\alpha-\varepsilon}+\mathbf{1}_{\Phi_{C_{X}ee^{\prime}C_{X}^{\prime},\kappa(\varepsilon)}},1\right),$$

$$\varphi_{\alpha,\varepsilon}^{*}:=\min\left(\varphi_{\alpha-\varepsilon}+\mathbf{1}_{\Phi_{C_{X}ee^{\prime}C_{X}^{\prime},c(\alpha,\varepsilon)}},1\right)=\varphi_{\alpha-\varepsilon}+(1-\varphi_{\alpha-\varepsilon})\mathbf{1}_{\Phi_{C_{X}ee^{\prime}C_{X}^{\prime},c(\alpha,\varepsilon)}},$$

$$\sup_{\beta\in\mathbb{R}^{k}}\sup_{\sigma\in(0,\infty)}E_{\beta,\sigma,0}\left[\min\left(\varphi_{\alpha-\varepsilon}+\mathbf{1}_{\Phi_{C_{X}ee^{\prime}C_{X}^{\prime},c(\alpha,\varepsilon)}},1\right)\right]=\alpha,$$

$$\lim_{\rho\to a}E_{\beta,\sigma,\rho}(\varphi_{\alpha,\varepsilon}^{*})=1;$$

$$\left\{\Sigma(\rho)/\|\Sigma(\rho)\|:\rho\in A\right\}$$

$$\lim_{\varepsilon\to 0^{+}}\sup_{\beta\in\mathbb{R}^{k}}\sup_{\sigma\in(0,\infty)}\sup_{\rho\in A}\left|E_{\beta,\sigma,\rho}(\varphi_{\alpha,\varepsilon}^{*})-E_{\beta,\sigma,\rho}(\varphi_{\alpha})\right|=0.$$


Full text

How to avoid the zero-power trap in testing for correlation

David Preinerstorfer

ECARES and SBS-EM

Université libre de Bruxelles


(August 2018)

Abstract

In testing for correlation of the errors in regression models the power of tests can be very low for strongly correlated errors. This counterintuitive phenomenon has become known as the “zero-power trap”. Despite a considerable amount of literature devoted to this problem, mainly focusing on its detection, a convincing solution has not yet been found. In this article we first discuss theoretical results concerning the occurrence of the zero-power trap phenomenon. Then, we suggest and compare three ways to avoid it. Given an initial test that suffers from the zero-power trap, the method we recommend for practice leads to a modified test whose power converges to one as the correlation gets very strong. Furthermore, the modified test has approximately the same power function as the initial test, and thus approximately preserves all of its optimality properties. We also provide some numerical illustrations in the context of testing for network generated correlation.

1 Introduction

Testing whether the errors in a regression model are uncorrelated is a standard problem in econometrics. For many forms of correlation under the alternative there are well-established tests available. Two prominent examples are the Durbin-Watson test for serial autocorrelation, and the Cliff-Ord test for spatial autocorrelation. Nevertheless, this type of testing problem is not completely solved, not even in the Gaussian case. This is partly due to the fact that tests for correlation, including the well-established tests mentioned before, do not always behave as they ideally should in finite samples: Whereas the size of most tests can be easily controlled, at least under suitable distributional assumptions such as Gaussianity, their power function can attain very small values in regions of the alternative where the correlation is very strong. This, however, does not match with the intuition that strong correlations should be easily detectable from the data, i.e., that the power of a test for correlation should be close to one if the degree of correlation in the errors is very strong.

That the power function of a test for correlation can drop to zero as the correlation increases was first formally established in Krämer (1985), who considered the power function of the Durbin-Watson test in testing for serial autocorrelation. The results in Krämer (1985) were extended in later work by Zeisel (1989), Krämer and Zeisel (1990) and Löbus and Ritter (2000). Kleiber and Krämer (2005) obtained similar results for the Durbin-Watson test when the disturbances are fractionally integrated. Krämer (2005) proved related results for Cliff-Ord-type tests in case the regression errors are spatially autocorrelated. A unifying general theory that neither relies on the specific form of correlation nor on very special structural properties of the tests was developed recently in Martellosio (2010) and Preinerstorfer and Pötscher (2017). We refer the interested reader to the latter articles for formal results and a thorough discussion of the literature.

The major practical value of the articles just mentioned is diagnostic: they provide conditions that depend on observable quantities only and that let a user decide whether a particular test is subject to the zero-power trap, i.e., whether its power function drops to zero as the correlation increases. This is important, because if it turns out that an initial test is subject to this trap, one may want to use another test. However, one is then confronted with the problem of finding a test that avoids the zero-power trap. One complication is as follows: Typically, the initial test was chosen for a reason, i.e., for its “optimal” power properties in certain regions of the parameter space (think of a locally best invariant test). In such situations, one would not just like to use some other test that avoids the zero-power trap. Much more likely, one would prefer to slightly modify the initial test in such a way that its optimality properties are preserved, at least approximately, but such that its modified version does not suffer from the zero-power trap. Compared with the extensive literature on diagnostic tools for detecting the zero-power trap, far less attention has been paid to the question of how to construct tests that do not suffer from it. Furthermore, it is not clear how to obtain said “optimality-preserving” modifications. The main contribution of the present article is to fill this gap. In the following paragraphs we provide an overview of the article’s structure together with a more detailed summary of our contributions.

In Section 2 we introduce the framework: the model and the testing problem, some notational conventions and an important class of tests. In Section 3 we formally define the zero-power trap phenomenon, obtain some sufficient conditions for it from results in Preinerstorfer and Pötscher (2017), and then consider in our general framework the question how often, i.e., for “how many” design matrices, the zero-power trap actually arises. We answer this question in Propositions 3.4 and 3.6. The former proposition proves (and generalizes) an observation already made in the discussion section of Krämer (1985). The latter proposition is obtained by generalizing an argument in Martellosio (2012), who considered the same question in a spatial autoregressive setting. Essentially, these two propositions show (for the tests based on the specific family of test statistics and the corresponding critical values considered) respectively that (i) the zero-power trap arises for generic design matrices (i.e., up to a Lebesgue null set of exceptional matrices) for small enough critical values; and (ii) for any critical value that leads to a size in $(0,1)$ there exists an open set of design matrices for which the zero-power trap arises.

In Section 4 we present three ways to avoid the zero-power trap: In Section 4.1 we briefly discuss a test for which Preinerstorfer and Pötscher (2017) have shown that it does not suffer from the zero-power trap. This test typically does not have very favorable power properties, apart from the fact that it avoids the zero-power trap. We shall mainly use it later as a building block in our construction of “optimality-preserving” tests. In Section 4.2 we discuss tests that incorporate artificial regressors to avoid the zero-power trap. The suggestion of adding artificial regressors to the regression and to use “optimal” tests in this expanded model is present already in Krämer (1985), who observed numerically that adding the intercept to a regression without intercept helps to avoid the zero-power trap for the Durbin-Watson test. Our theoretical results in Section 4.2 exploit results in Preinerstorfer and Pötscher (2017), and are related to the methods in Preinerstorfer and Pötscher (2016) and Preinerstorfer (2017), who considered the construction of tests with good size and power properties for testing restrictions on the regression coefficient vector. While the tests in Section 4.2 are “optimality-preserving” to some extent (more specifically they often have the same optimality property as initial tests, but within a smaller class of tests), it turns out that this solution to the zero-power trap is not ideal. For example, the power function of these tests does not increase to one as the strength of the correlation increases (which is the case for the approach outlined in Section 4.1).

In Section 4.3 we construct optimality-preserving modifications that avoid the zero-power trap, starting from an initial test that suffers from it. Our approach overcomes the limitations of the approaches discussed in Sections 4.1 and 4.2. In particular, our method leads to tests that have approximately the same power properties as the initial test. Furthermore, their power converges to one as the strength of the correlation increases. The construction is inspired by the power enhancement principle of Fan et al. (2015) in the formulation used in Section 3 of Kock and Preinerstorfer (2017). The basic idea of this principle is to improve the asymptotic power of an initial test by using another test, a power enhancement component, which has better asymptotic power properties than the initial test in certain regions of the alternative. Since the theory in Fan et al. (2015) and Kock and Preinerstorfer (2017) is asymptotic, and the present article is concerned exclusively with finite sample properties, their results do not apply here. Nevertheless, we can adapt the underlying heuristic to our context: given an initial test that suffers from the zero-power trap, but has favorable power properties in other regions of the alternative, we “combine” this initial test with the test from Section 4.1 to obtain an “enhanced” test.

In Section 5 we compare the approaches for avoiding the zero-power trap discussed in Section 4 numerically. We reconsider an example in Krämer (2005) in which the Cliff-Ord test turns out to suffer from the zero-power trap. Section 6 concludes. All proofs are collected in Appendices A-C.

2 Framework

In the present section we introduce the model, the testing problem and some notation, and we discuss an important class of tests. Most of the notational conventions and terminology we use are standard, and coincide to a large extent with the ones in Preinerstorfer and Pötscher (2017). We repeat them here for the convenience of the reader.

2.1 Model and testing problem

We consider the linear model

$$\mathbf{y}=X\beta+\mathbf{u}, \tag{1}$$

where $X\in\mathbb{R}^{n\times k}$ is a non-stochastic matrix of rank $k$ with $0<k<n$ , and where $\beta\in\mathbb{R}^{k}$ is the regression coefficient vector. The disturbance vector $\mathbf{u}$ is assumed to be Gaussian with mean zero and covariance matrix $\sigma^{2}\Sigma(\rho)$ . Here $\Sigma(.)$ is a known function from $[0,a)$ to the set of symmetric and positive definite $n\times n$ matrices, and $a$ is a prespecified positive real number. Without loss of generality we assume throughout that $\Sigma(0)$ equals the identity matrix $I_{n}$ . The parameters $\beta\in\mathbb{R}^{k}$ , $\sigma\in(0,\infty)$ and $\rho\in[0,a)$ are unknown.

The Gaussianity assumption could be relaxed considerably. It is imposed mainly to avoid technical conditions that do not deliver deeper insights into the problem. For example, we could replace the Gaussianity assumption by the assumption that the distribution of the error vector $\mathbf{u}$ is elliptically symmetric without changing any of our results. This and other generalizations are discussed in detail in Section 3 of Preinerstorfer and Pötscher (2017).

Denoting the Gaussian probability measure with mean $X\beta$ and covariance matrix $\sigma^{2}\Sigma(\rho)$ by $P_{\beta,\sigma,\rho}$ , we see that the model (1) induces the parametric family of distributions

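$$\left\{P_{\beta,\sigma,\rho}:\beta\in\mathbb{R}^{k},\ \sigma\in(0,\infty),\ \rho\in[0,a)\right\} \tag{2}$$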

on the sample space $\mathbb{R}^{n}$ equipped with its Borel $\sigma$ -algebra. The expectation operator with respect to (w.r.t.) $P_{\beta,\sigma,\rho}$ will be denoted by $E_{\beta,\sigma,\rho}$ . Note that the set of probability measures in the previous display is dominated by Lebesgue measure $\mu_{\mathbb{R}^{n}}$ on the Borel sets of $\mathbb{R}^{n}$ , because $\Sigma(\rho)$ is positive definite for every $\rho\in[0,a)$ by assumption.

In the family of distributions (2) we are interested in the testing problem $\rho=0$ against $\rho>0$ . More precisely, the testing problem is

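$$H_{0}:\rho=0,\ \beta\in\mathbb{R}^{k},\ 0<\sigma<\infty\quad\text{against}\quad H_{1}:\rho>0,\ \beta\in\mathbb{R}^{k},\ 0<\sigma<\infty, \tag{3}$$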

with the implicit understanding that always $\rho\in[0,a)$ . In this testing problem the parameter $\rho$ is the target of inference, and the regression coefficient vector $\beta$ and the parameter $\sigma$ are nuisance parameters.

Two specific examples that received a considerable amount of attention in the econometrics literature and which fit into the above framework are testing for positive serial autocorrelation and testing for spatial autocorrelation, cf. Examples 2.1 and 2.2 in Preinerstorfer and Pötscher (2017) for details and a discussion of related literature. See also Section 5 below for more information on testing for spatial autocorrelation and related numerical results.

2.2 Notation, invariance and an important class of tests

2.2.1 Notation

All matrices we shall consider are real matrices, the transpose of a matrix $A$ is denoted by $A^{\prime}$ , and the space spanned by the columns of $A$ is denoted by $\mathop{\mathrm{s}pan}(A)$ . Given a linear subspace $L$ of $\mathbb{R}^{n}$ , the symbol $\Pi_{L}$ denotes the orthogonal projection onto $L$ , and $L^{\bot}$ denotes the orthogonal complement of $L$ . Given an $n\times m$ matrix $Z$ of rank $m$ with $0\leq m<n$ , we denote by $C_{Z}$ a matrix in $\mathbb{R}^{(n-m)\times n}$ such that $C_{Z}C_{Z}^{\prime}=I_{n-m}$ and $C_{Z}^{\prime}C_{Z}=\Pi_{\mathop{\mathrm{s}pan}(Z)^{\bot}}$ where $I_{r}$ denotes the identity matrix of dimension $r$ . We observe that every matrix whose rows form an orthonormal basis of $\mathop{\mathrm{s}pan}(Z)^{\bot}$ satisfies these two conditions and vice versa. Hence, any two choices for $C_{Z}$ are related by premultiplication by an orthogonal matrix. Let $l$ be a positive integer. If $A$ is an $l\times l$ matrix and $\lambda\in\mathbb{R}$ is an eigenvalue of $A$ we denote the corresponding eigenspace by $\mathop{\mathrm{E}ig}\left(A,\lambda\right)$ . The eigenvalues of a symmetric matrix $B\in\mathbb{R}^{l\times l}$ ordered from smallest to largest and counted with their multiplicities are denoted by $\lambda_{1}(B),\ldots,\lambda_{l}(B)$ . We shall sometimes denote $\lambda_{1}(B)$ by $\lambda_{\min}(B)$ , and $\lambda_{l}(B)$ by $\lambda_{\max}(B)$ . Lebesgue measure on the Borel $\sigma$ -algebra of $\mathbb{R}^{n\times l}$ shall be denoted by $\mu_{\mathbb{R}^{n\times l}}$ , and Pr is used as a generic symbol for a probability measure. The Euclidean norm of a vector is denoted by $\|.\|$ , a symbol that is also used to denote a matrix norm.

2.2.2 Invariance, an important class of tests, and size-controlling critical values

Given a matrix $Z\in\mathbb{R}^{n\times m}$ with column rank $m$ and where $1\leq m<n$ , define the group of bijective transformations (the group action being composition of functions)

$$G_{Z}:=\left\{g_{\gamma,\theta}:\gamma\in\mathbb{R}\setminus\{0\},\ \theta\in\mathbb{R}^{m}\right\},$$

where $g_{\gamma,\theta}:\mathbb{R}^{n}\to\mathbb{R}^{n}$ denotes the function $y\mapsto\gamma y+Z\theta$ .

Under our distributional assumptions (and if additionally all parameters of the model are identifiable) the testing problem in Equation (3) is invariant w.r.t. the group $G_{X}$ (cf. Section 6 in Lehmann and Romano (2005)). It thus appears reasonable to consider tests that are $G_{X}$ -invariant, a property shared by most commonly used tests. Recall that a function $f$ defined on the sample space (e.g., a test or a test statistic) is called invariant w.r.t. $G_{X}$ if and only if for every $y\in\mathbb{R}^{n}$ and every $g_{\gamma,\theta}\in G_{X}$ it holds that $f(y)=f(g_{\gamma,\theta}(y))$ . A subset $A$ of $\mathbb{R}^{n}$ will be called invariant w.r.t. $G_{X}$ if the indicator function $\mathbf{1}_{A}$ is $G_{X}$ -invariant.

In addition to being $G_{X}$ -invariant, most tests for (3) used in practice are non-randomized, i.e., they are indicator functions of Borel sets – their corresponding rejection regions. An important class of such tests is based on rejection regions of the form

$$\Phi_{B,c}=\Phi_{B,C_{X},c}=\left\{y\in\mathbb{R}^{n}:T_{B}(y)>c\right\},$$

where $c\in\mathbb{R}$ is a critical value and the test statistic

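$$T_{B}(y)=T_{B,C_{X}}(y)=\begin{cases}y^{\prime}C_{X}^{\prime}BC_{X}y/\|C_{X}y\|^{2}&\text{if }y\notin\mathop{\mathrm{span}}(X)\\ \lambda_{1}(B)&\text{if }y\in\mathop{\mathrm{span}}(X).\end{cases} \tag{5}$$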

Here $B\in\mathbb{R}^{(n-k)\times(n-k)}$ is a symmetric matrix, which typically depends on $X$ and the function $\Sigma$. Recall that the matrix $C_{X}$ satisfies $C_{X}C_{X}^{\prime}=I_{n-k}$ and $C_{X}^{\prime}C_{X}=\Pi_{\mathop{\mathrm{span}}(X)^{\bot}}$ (cf. Section 2.2.1). Clearly, the test statistic $T_{B}$ is $G_{X}$-invariant. Note furthermore that in case $\lambda_{1}(B)=\lambda_{n-k}(B)$ the test statistic $T_{B}$ is constant everywhere on $\mathbb{R}^{n}$. Therefore, such a choice of $B$ is uninteresting for practical purposes. Note also that assigning the value $\lambda_{1}(B)$ (instead of any other value) to the test statistic on $\mathop{\mathrm{span}}(X)$ has no effect on rejection probabilities, because $P_{\beta,\sigma,\rho}$ is absolutely continuous w.r.t. $\mu_{\mathbb{R}^{n}}$ for every $\beta\in\mathbb{R}^{k}$, $\sigma\in(0,\infty)$ and $\rho\in[0,a)$, and $\mathop{\mathrm{span}}(X)$ being of dimension $k<n$ implies $\mu_{\mathbb{R}^{n}}(\mathop{\mathrm{span}}(X))=0$.
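To make the construction concrete, the following Python sketch builds one admissible $C_{X}$ and evaluates $T_{B}$ as in Equation (5). It is an illustration only: the helper names `make_C` and `T_stat`, the use of `scipy.linalg.null_space`, the relative tolerance used to decide whether $y$ lies in the column span of $X$, and the random example data are assumptions of this sketch and do not come from the article.

```python
import numpy as np
from scipy.linalg import null_space


def make_C(X):
    """One admissible C_X: rows form an orthonormal basis of span(X)^perp."""
    # null_space(X.T) returns orthonormal columns spanning {v : X'v = 0}.
    return null_space(X.T).T


def T_stat(y, C, B, tol=1e-12):
    """Evaluate T_B(y) as in Equation (5) for a given C_X and symmetric B."""
    z = C @ y
    denom = z @ z  # ||C_X y||^2
    if denom <= tol * (y @ y):
        # y is (numerically) in span(X): assign the smallest eigenvalue of B.
        return np.linalg.eigvalsh(B)[0]
    return (z @ B @ z) / denom


# Hypothetical example data, chosen only for illustration.
rng = np.random.default_rng(0)
n, k = 8, 2
X = rng.standard_normal((n, k))
C = make_C(X)
M = rng.standard_normal((n, n))
M = (M + M.T) / 2            # an arbitrary symmetric n x n matrix
B = C @ M @ C.T              # symmetric (n-k) x (n-k) matrix
y = rng.standard_normal(n)
print(T_stat(y, C, B))
```

Because $T_{B}$ is $G_{X}$-invariant, replacing `y` by `gamma * y + X @ theta` with nonzero `gamma` should leave the printed value unchanged up to rounding, which is a convenient sanity check for any implementation.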

The following remark discusses two particularly important choices of $B$ :

Remark 2.1.

Under regularity conditions and excluding degenerate cases, point-optimal invariant (w.r.t. $G_{X}$ ) tests and locally best invariant (w.r.t. $G_{X}$ ) tests for the testing problem (3) reject for large values of a test statistic $T_{B}$ as in Equation (5):

(a) Point-optimal invariant tests against the alternative $\bar{\rho}\in(0,a)$ are obtained for $B=-\left(C_{X}\Sigma(\bar{\rho})C_{X}^{\prime}\right)^{-1}$.

(b) Locally best invariant tests are obtained for $B=C_{X}\dot{\Sigma}(0)C_{X}^{\prime}$, for $\dot{\Sigma}(0)$ the derivative of $\Sigma$ at $\rho=0$, ensured to exist under the aforementioned regularity conditions, see, e.g., King and Hillier (1985).

Note that a test statistic $T_{B}$ based on either of the two matrices $B$ in the preceding enumeration does not depend on the specific choice of $C_{X}$, as any two choices of $C_{X}$ differ only by premultiplication by an orthogonal matrix. However, for matrices $B$ of a different form than (a) or (b) the test statistic $T_{B}$ may also depend on the choice of $C_{X}$, a dependence which is typically suppressed in our notation.

The main focus of the present article concerns power properties of tests based on a test statistic as in (5) for the testing problem (3). Before investigating power properties of a test, one needs to ensure that its size does not exceed a given value of significance $\alpha$ . While this can be a nontrivial problem in general, achieving size control through the choice of a proper critical value turns out to be an easy task here. More specifically, the following lemma shows that exact size control for tests based on a test statistic $T_{B}$ introduced in Equation (5) is possible at all levels of significance in the leading case $\lambda_{1}(B)<\lambda_{n-k}(B)$ . The subsequent remark discusses numerical aspects.

Lemma 2.2.

Let $B\in\mathbb{R}^{(n-k)\times(n-k)}$ be symmetric and such that $\lambda_{1}(B)<\lambda_{n-k}(B)$ . Then, there exists a (unique) function $\kappa:[0,1]\to[\lambda_{1}(B),\lambda_{n-k}(B)]$ such that for every $\alpha\in[0,1]$

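$$P_{\beta,\sigma,0}(\Phi_{B,\kappa(\alpha)})=\alpha\quad\text{for every }\beta\in\mathbb{R}^{k}\text{ and every }\sigma\in(0,\infty). \tag{6}$$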

Furthermore, $\kappa$ is a strictly decreasing and continuous bijection.

Remark 2.3.

The rejection probabilities of a $G_{X}$ -invariant test for (3) do not depend on the parameters $\beta$ and $\sigma$ (cf. Remark 2.3 in Preinerstorfer and Pötscher (2017)). As a consequence, the exact critical value $\kappa(\alpha)$ from Lemma 2.2 can easily be obtained numerically: To this end one can exploit the well-known fact that for every $c\in\mathbb{R}$ the rejection probability $P_{\beta,\sigma,0}(\Phi_{B,c})=P_{0,1,0}(\Phi_{B,c})$ can be rewritten as the probability that the quadratic form

$$\mathbf{G}^{\prime}\left[B-cI_{n-k}\right]\mathbf{G}>0, \tag{7}$$

where $\mathbf{G}$ is an $(n-k)$ -variate Gaussian random vector with mean zero and covariance matrix $I_{n-k}$ . This probability can be determined efficiently through an application of standard algorithms, e.g., the algorithm by Davies (1980). The critical value $\kappa(\alpha)$ can then be obtained numerically by simply using a root-finding algorithm to determine the unique root $\kappa(\alpha)$ of $c\mapsto P_{0,1,0}(\Phi_{B,c})-\alpha$ on $[\lambda_{1}(B),\lambda_{n-k}(B)]$ .
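As a rough numerical illustration of this remark, the sketch below approximates $\kappa(\alpha)$ by simulation. Note that it swaps in a different technique from the one described above: instead of the algorithm by Davies (1980) combined with root-finding, it takes an empirical quantile of Monte Carlo draws of $\mathbf{G}^{\prime}B\mathbf{G}/\mathbf{G}^{\prime}\mathbf{G}$, so the result carries simulation error. The function name `mc_critical_value`, its parameters, and the example matrix are hypothetical.

```python
import numpy as np


def mc_critical_value(B, alpha, n_draws=200_000, seed=0):
    """Monte Carlo approximation of kappa(alpha) for the statistic T_B.

    Under rho = 0, T_B is distributed as G'BG / G'G with G ~ N(0, I_{n-k}),
    which equals sum_j lam_j z_j^2 / sum_j z_j^2 for i.i.d. standard normal
    z_j and lam_j the eigenvalues of B.
    """
    rng = np.random.default_rng(seed)
    lam = np.linalg.eigvalsh(B)
    G2 = rng.standard_normal((n_draws, lam.size)) ** 2
    T = (G2 @ lam) / G2.sum(axis=1)
    # P(T_B > kappa) = alpha means kappa is the (1 - alpha)-quantile.
    return np.quantile(T, 1.0 - alpha)


# Hypothetical example matrix with lambda_1(B) < lambda_{n-k}(B).
B = np.diag([-1.0, 0.0, 0.5, 2.0])
kappa = mc_critical_value(B, alpha=0.05)
print(kappa)
```

For applications where accuracy near the critical value matters, an exact method for quadratic forms in Gaussian variables, as suggested in the remark, is preferable to this simulation shortcut.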

3 The zero-power trap in testing for correlation

3.1 Definition and sufficient conditions

In the sequel, a test $\varphi:\mathbb{R}^{n}\to[0,1]$ (measurable) for testing problem (3) is said to be subject to (or suffer from) the zero-power trap, if there exist $\beta\in\mathbb{R}^{k}$ and $\sigma\in(0,\infty)$ such that

$$\liminf_{\rho\to a}E_{\beta,\sigma,\rho}(\varphi)=0; \tag{8}$$

that is, if the power function of $\varphi$ can get arbitrarily close to $0$ as the strength of the correlation in the data, measured in terms of $\rho$, increases. Recall from Remark 2.3 that if $\varphi$ is $G_{X}$-invariant, which is the case for most tests considered in this article, then $E_{\beta,\sigma,\rho}(\varphi)$ does not depend on $\beta$ and $\sigma$. In this case, if Equation (8) holds for some $\beta\in\mathbb{R}^{k}$ and some $\sigma\in(0,\infty)$, it holds for every $\beta\in\mathbb{R}^{k}$ and every $0<\sigma<\infty$.

A set of sufficient conditions that allows one to conclude whether a test $\varphi$ is subject to the zero-power trap was developed in Martellosio (2010) and Preinerstorfer and Pötscher (2017). The underlying effect leading to (8) described in the latter article is a concentration effect in the (rescaled) distribution $P_{\beta,\sigma,\rho}$ when $\rho$ is close to $a$ . Preinerstorfer and Pötscher (2017) obtained their sufficient conditions under the following property of the function $\Sigma$ (cf. also Assumption 1 in Preinerstorfer and Pötscher (2017) and the discussion there showing that this condition is weaker than the one previously used by Martellosio (2010)):

Assumption 1.

$\lambda_{n}^{-1}(\Sigma(\rho))\Sigma(\rho)\to ee^{\prime}$ as $\rho\to a$ for some $e\in\mathbb{R}^{n}$ .
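Assumption 1 can be checked numerically for a concrete covariance model. The sketch below does this for an assumed serial-correlation structure with $\Sigma(\rho)_{ij}=\rho^{|i-j|}$ and $a=1$, for which the candidate limit vector is $e=n^{-1/2}(1,\ldots,1)^{\prime}$. This covariance model, the helper names `ar1_cov` and `normalized_cov`, and the chosen values of $\rho$ are illustrative assumptions; the article itself only refers to serial and spatial autocorrelation as examples.

```python
import numpy as np


def ar1_cov(rho, n):
    """Assumed correlation matrix with entries rho^{|i-j|}; equals I_n at rho = 0."""
    idx = np.arange(n)
    return rho ** np.abs(idx[:, None] - idx[None, :])


def normalized_cov(Sigma):
    """Rescale Sigma by its largest eigenvalue, as in Assumption 1."""
    return Sigma / np.linalg.eigvalsh(Sigma)[-1]


n = 6
e = np.ones(n) / np.sqrt(n)          # candidate limit direction
rhos = (0.9, 0.99, 0.999, 0.9999)
gaps = [np.linalg.norm(normalized_cov(ar1_cov(r, n)) - np.outer(e, e))
        for r in rhos]
for r, g in zip(rhos, gaps):
    print(r, g)                      # Frobenius distance to e e'
```

The printed distances shrink as $\rho$ approaches $1$, which is what Assumption 1 requires in this example. A numerical check of this kind is only suggestive and does not replace verifying the limit analytically.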

For the convenience of the reader and for later use, we shall now formally state two immediate consequences of results in Preinerstorfer and Pötscher (2017). They provide sufficient conditions for the zero-power trap under Assumption 1. Specializing Theorem 2.7 and Remark 2.8 in Preinerstorfer and Pötscher (2017) one obtains the following “high-level”-result.

Theorem 3.1.

Suppose Assumption 1 holds. Let $\varphi$ be a $G_{X}$ -invariant test that is continuous at $e$ and satisfies $\varphi(e)=0$ , where $e$ is the vector figuring in Assumption 1. Then

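$$\lim_{\rho\to a}E_{\beta,\sigma,\rho}(\varphi)=0\quad\text{for every }\beta\in\mathbb{R}^{k}\text{ and every }\sigma\in(0,\infty). \tag{9}$$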

In particular, if $\varphi=\mathbf{1}_{W}$ holds for some $G_{X}$ -invariant Borel set $W\subseteq\mathbb{R}^{n}$ , then (9) holds if $e$ is not in the closure of $W$ .

For the test with rejection region $\Phi_{B,\kappa(\alpha)}$ as discussed in Section 2.2.2 and where $\kappa(\alpha)$ is defined through Lemma 2.2 one obtains the following result from Corollary 2.21 of Preinerstorfer and Pötscher (2017).

Theorem 3.2.

Suppose Assumption 1 holds and $e\notin\mathop{\mathrm{s}pan}(X)$ , where $e$ is the vector figuring in Assumption 1. Let $B\in\mathbb{R}^{(n-k)\times(n-k)}$ be symmetric and such that $\lambda_{1}(B)<\lambda_{n-k}(B)$ . Then, for every $\alpha\in(0,1)$ such that $T_{B}(e)<\kappa(\alpha)$ we have

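$$\lim_{\rho\to a}P_{\beta,\sigma,\rho}(\Phi_{B,\kappa(\alpha)})=0\quad\text{for every }\beta\in\mathbb{R}^{k}\text{ and every }\sigma\in(0,\infty). \tag{10}$$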

Note that the sufficient conditions for the zero-power trap phenomenon pointed out in Theorems 3.1 and 3.2 depend on observable quantities only, and that they are thus checkable by the user. Therefore, a researcher interested in testing problem (3) can use these conditions to check whether or not the given test suffers from the zero-power trap before actually using a test. In particular, one can decide not to use a test that suffers from the zero-power trap. Before addressing the question how to avoid the zero-power trap, which was raised already in the Introduction, we briefly pay some attention to the following question: “how often” does the zero-power trap actually arise? More specifically, in the important class of tests $\Phi_{B,c}$ introduced in Section 2.2.2, and most notably the tests discussed in Remark 2.1, the following question arises: For “how many” design matrices $X$ does the zero-power trap arise? Answering this question is the content of the next section.
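The check described in this paragraph can be sketched in a few lines of Python. The function below evaluates the sufficient condition $T_{B}(e)<\kappa(\alpha)$ of Theorem 3.2, using a Monte Carlo stand-in for the exact critical value from Lemma 2.2. Everything specific here is an assumption made for illustration: the name `trap_diagnostic`, the tolerances, the trend regressor without intercept, and the covariance model $\Sigma(\rho)_{ij}=\rho^{|i-j|}$ with $e=n^{-1/2}(1,\ldots,1)^{\prime}$.

```python
import numpy as np
from scipy.linalg import null_space


def trap_diagnostic(X, B_from_C, e, alpha, n_draws=200_000, seed=0):
    """Check the sufficient condition T_B(e) < kappa(alpha) of Theorem 3.2.

    B_from_C maps a choice of C_X to the symmetric matrix B.
    Returns (T_B(e), approximate kappa(alpha), condition_holds).
    """
    C = null_space(X.T).T                 # one admissible C_X
    B = B_from_C(C)
    lam = np.linalg.eigvalsh(B)
    z = C @ e
    if z @ z <= 1e-12 * (e @ e):
        raise ValueError("e is (numerically) in span(X); Theorem 3.2 does not apply")
    if lam[-1] - lam[0] <= 1e-12 * max(1.0, abs(lam[-1])):
        raise ValueError("lambda_1(B) = lambda_{n-k}(B); T_B is constant")
    t_e = (z @ B @ z) / (z @ z)
    # Monte Carlo approximation of kappa(alpha), not an exact computation.
    rng = np.random.default_rng(seed)
    G2 = rng.standard_normal((n_draws, lam.size)) ** 2
    kappa = np.quantile((G2 @ lam) / G2.sum(axis=1), 1.0 - alpha)
    return t_e, kappa, bool(t_e < kappa)


# Hypothetical example: linear trend regressor without intercept and the
# locally best invariant choice B = C_X dSigma(0) C_X'.
n = 20
X = np.arange(1.0, n + 1).reshape(n, 1)
dSigma0 = np.eye(n, k=1) + np.eye(n, k=-1)   # derivative of rho^{|i-j|} at 0
e = np.ones(n) / np.sqrt(n)
t_e, kappa, holds = trap_diagnostic(X, lambda C: C @ dSigma0 @ C.T, e, alpha=0.05)
print(t_e, kappa, holds)
```

The condition is sufficient only: a result of `False` does not show that the test avoids the zero-power trap, and values of `t_e` very close to `kappa` should be rechecked with an exact critical value.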

3.2 For “how many” design matrices does the zero-power trap arise?

We shall focus on the class of tests with rejection regions $\Phi_{B(X),c}$ introduced in Section 2.2.2. Since the question in the section title depends on the design matrix $X$ , which is otherwise held fixed in this article, we shall make the dependence of $B$ on $X$ explicit by writing $B(X)$ . Furthermore, we shall also write $P_{\beta,\sigma,\rho}^{X}$ to emphasize its dependence on the design matrix $X$ . In our first attempt to answer the question under consideration, we shall use the following simple consequence of Lemma 2.2 and Theorem 3.2, which provides conditions on $X$ under which Equation (10) holds for all “small” levels $\alpha$ .

Lemma 3.3.

Suppose Assumption 1 holds and let $e$ denote the vector figuring in that assumption. Let $B$ be a function from the set of full column rank $n\times k$ matrices to the set of symmetric $(n-k)\times(n-k)$ -dimensional matrices. If an $n\times k$ matrix $X$ satisfies

[TABLE]

then $\lambda_{1}(B(X))<\lambda_{n-k}(B(X))$ , $P^{X}_{0,1,0}(\Phi_{B(X),T_{B(X)}(e)})>0$ and Equation (10) holds for every $\alpha\in(0,P^{X}_{0,1,0}(\Phi_{B(X),T_{B(X)}(e)}))$ .

For a class of functions $X\mapsto B(X)$ that includes the ones discussed in Remark 2.1 we shall now show that condition (11) is generically satisfied, unless the matrix $B(X)$ has a very exceptional form. The result is established under a restriction concerning the eigenspace corresponding to the largest eigenvalue of $B(X)$ .

Proposition 3.4.

Suppose that $k<n-1$ and that Assumption 1 holds. Let $B$ be a function from the set of full column rank $n\times k$ matrices to the set of symmetric $(n-k)\times(n-k)$ -dimensional matrices. Let $M\in\mathbb{R}^{n\times n}$ be a symmetric matrix that can not be written as $c_{1}I_{n}+c_{2}ee^{\prime}$ for real numbers $c_{1},c_{2}$ with $c_{2}\geq 0$ , where $e$ is the vector figuring in Assumption 1. Suppose further that for every $X\in\mathbb{R}^{n\times k}$ of full column rank a $C_{X}\in\mathbb{R}^{(n-k)\times n}$ satisfying $C_{X}C_{X}^{\prime}=I_{n-k}$ and $C_{X}^{\prime}C_{X}=\Pi_{\mathop{\mathrm{s}pan}(X)^{\bot}}$ can be chosen such that

[TABLE]

Then, up to a $\mu_{\mathbb{R}^{n\times k}}$ -null set of exceptional matrices, every $X\in\mathbb{R}^{n\times k}$ satisfies (11). An immediate consequence is as follows: Given $\alpha\in(0,1)$ denote by $\mathscr{X}(\alpha;B)\subseteq\mathbb{R}^{n\times k}$ the set of all $X\in\mathbb{R}^{n\times k}$ of rank $k$ such that $\lambda_{1}(B(X))<\lambda_{n-k}(B(X))$ and such that

[TABLE]

Then, $\mathscr{X}(\alpha_{2};B)\subseteq\mathscr{X}(\alpha_{1};B)$ holds for $0<\alpha_{1}\leq\alpha_{2}<1$ , and for any sequence $\alpha_{m}$ in $(0,1)$ converging to $0$ the complement of $\bigcup_{m\in\mathbb{N}}\mathscr{X}(\alpha_{m};B)$ is contained in a $\mu_{\mathbb{R}^{n\times k}}$ -null set.

Remark 3.5.

Note that for $B(X)=C_{X}\dot{\Sigma}(0)C_{X}^{\prime}$ Condition (12) in Proposition 3.4 is trivially satisfied with $M=\dot{\Sigma}(0)$ . For $B(X)=-\left(C_{X}\Sigma(\bar{\rho})C_{X}^{\prime}\right)^{-1}$ and $\bar{\rho}\in(0,a)$ it is easy to see that Condition (12) is satisfied with $M=\Sigma(\bar{\rho})$ . Therefore, if for either of these two specific choices the respective $M$ cannot be written as $c_{1}I_{n}+c_{2}ee^{\prime}$ for real numbers $c_{1},c_{2}$ with $c_{2}\geq 0$ , then Proposition 3.4 applies.

Proposition 3.4 shows that tests based on $T_{B(X)}$ suffer from the zero-power trap for “most” design matrices $X$ , at least for small choices of $\alpha$ . The discussion section of Krämer (1985) contains a corresponding statement (without proof) in a special case.

Choosing $\alpha$ small is not completely uncommon in practice: Due to the fact that testing for correlation is often just one part of the econometric analysis, the actual level $\alpha$ employed in this test can be quite small. One example is specification testing. Another example is the situation where tests for correlation are “inverted” to build a confidence interval for $\rho$ , which is then used for a Bonferroni-type construction of a data-dependent critical value of another test (cf. Leeb and Pötscher (2017) for further information concerning such critical values).

Nevertheless, the question remains as to how “large” the set $\mathscr{X}(\alpha;B)$ actually is for a fixed $\alpha$ , such as the conventional $\alpha=.05$ or $\alpha=.01$ . For example, Proposition 3.4 does not tell us whether or not the set of design matrices $\mathscr{X}(.01;B)$ is empty. Similarly, one can ask whether $\mathscr{X}(.01;B)$ contains an open set, or whether it has positive $\mu_{\mathbb{R}^{n\times k}}$ measure. The latter questions have already been considered in detail in the main results of Martellosio (2012) for point-optimal invariant and locally best invariant tests in the important context of spatial autoregressive regression models. Adopting his proof strategy, we establish the following proposition. The argument requires a different assumption on $B$ than the one used in Proposition 3.4. First, the condition used now concerns the eigenspace of $B(X)$ corresponding to its smallest eigenvalue (as opposed to the condition on the largest eigenvalue used in Proposition 3.4). Second, continuity conditions are imposed, which are required for limiting arguments in the proof. As discussed in Remark 3.7 below, the assumptions are again satisfied in the leading choices for $B$ discussed in Remark 2.1.

Proposition 3.6.

Suppose that $k<n-1$ and that Assumption 1 holds. Let $B$ be a function from the set of full column rank $n\times k$ matrices to the set of symmetric $(n-k)\times(n-k)$ -dimensional matrices. Suppose there exists a function $F$ from the set of $(n-k)\times(n-k)$ matrices to itself, such that for every $X\in\mathbb{R}^{n\times k}$ of full column rank $B(X)=F(C_{X}MC_{X}^{\prime})$ holds for a suitable choice of $C_{X}\in\mathbb{R}^{(n-k)\times n}$ satisfying $C_{X}C_{X}^{\prime}=I_{n-k}$ and $C_{X}^{\prime}C_{X}=\Pi_{\mathop{\mathrm{s}pan}(X)^{\bot}}$ , and for $M\in\mathbb{R}^{n\times n}$ a symmetric matrix that can not be written as $c_{1}I_{n}+c_{2}ee^{\prime}$ for real numbers $c_{1},c_{2}$ where $c_{2}\geq 0$ . Here $e$ is the vector figuring in Assumption 1. Suppose further that $F$ is continuous at every element $A$ , say, of the closure of $\{C_{X}MC_{X}^{\prime}:X\in\mathbb{R}^{n\times k},~{}\mathop{\mathrm{r}ank}(X)=k\}\subseteq\mathbb{R}^{(n-k)\times(n-k)}$ , and that for every such $A$ we have

[TABLE]

Define $\mathscr{X}(\alpha;B)\subseteq\mathbb{R}^{n\times k}$ as in Proposition 3.4. Then, the following holds:

1. $\mathscr{X}(\alpha;B)\neq\emptyset$ holds for every $\alpha\in(0,1)$ ;

2. suppose that for every $z\in\mathbb{R}^{n}$ the function $X\mapsto T_{B(X),C_{X}}(z)$ is continuous at every $X\in\mathbb{R}^{n\times k}$ of full column rank such that $z\notin\mathop{\mathrm{s}pan}(X)$ . Then, for every $\alpha\in(0,1)$ the interior of $\mathscr{X}(\alpha;B)$ is nonempty (and thus has positive $\mu_{\mathbb{R}^{n\times k}}$ measure).

Remark 3.7.

Similar to Remark 3.5 we note that Proposition 3.6 can be applied to $B(X)=C_{X}\dot{\Sigma}(0)C_{X}^{\prime}$ (with $M=\dot{\Sigma}(0)$ and $F$ the identity function), or to $B(X)=-(C_{X}\Sigma(\overline{\rho})C_{X}^{\prime})^{-1}$ , where $\overline{\rho}\in(0,a)$ , (with $M=\Sigma(\overline{\rho})$ and $F$ the function $A\mapsto-A^{-1}$ , noting that this function satisfies the continuity requirement as $\Sigma(\overline{\rho})$ is positive definite) provided that the corresponding $M$ matrix is not of the exceptional form $c_{1}I_{n}+c_{2}ee^{\prime}$ for $c_{2}\geq 0$ . It is not difficult to show that the continuity requirement in Part 2 of the proposition is satisfied for these two choices of $B$ . For $B(X)=C_{X}\dot{\Sigma}(0)C_{X}^{\prime}$ this is trivial. For $B(X)=-(C_{X}\Sigma(\overline{\rho})C_{X}^{\prime})^{-1}$ , where $\overline{\rho}\in(0,a)$ , an argument is given in Appendix B. We can hence conclude that unless $\dot{\Sigma}(0)$ or $\Sigma(\overline{\rho})$ , respectively, is of the form $c_{1}I_{n}+c_{2}ee^{\prime}$ for some nonnegative $c_{2}$ , the test $\Phi_{B(X),\kappa(\alpha)}$ suffers from the zero-power trap for every $\alpha\in(0,1)$ for every $X$ in a non-empty open set of design matrices.

Remark 3.8.

We emphasize that Propositions 3.4 and 3.6 do not apply in case $M=c_{1}I_{n}+c_{2}ee^{\prime}$ holds for real numbers $c_{1},c_{2}$ where $c_{2}\geq 0$ . On the one hand, it is clear that in case $c_{2}=0$ a test as in these two propositions with $M=c_{1}I_{n}$ trivially breaks down, as the corresponding test statistics are then constant. On the other hand, as already observed (for the special case $c_{1}=0$ and $c_{2}=1$ ) in Preinerstorfer and Pötscher (2017) in the discussion preceding their Remark 2.27, using tests based on $M=c_{1}I_{n}+c_{2}ee^{\prime}$ for a $c_{2}>0$ indeed presents an opportunity to avoid the zero-power trap. This will be discussed more formally in Section 4.1.

From the results in the present section we learn that for tests that satisfy certain structural properties, the zero-power trap arises for generic design matrices for $\alpha$ small enough. Furthermore, for every $\alpha$ there exists (under suitable assumptions) a nonempty open set of design matrices every element of which suffers from the zero-power trap. We would like to emphasize, however, that these results do not rule out the possibility that for a given $X$ the actual level $\alpha$ needed such that the zero-power trap arises can be low (far outside the commonly used range of levels), or that given $\alpha$ the open set of design matrices for which the zero-power trap occurs is “small”. Numerical results that illustrate the “practical severity” of the zero-power trap in spatial regression models are provided in Section 3 of Krämer (2005) (his Table 1 is particularly instructive in this context), and further discussion and examples can be found in Martellosio (2010) and Martellosio (2012). These results seem to suggest that the zero-power trap occurs frequently for commonly used levels of significance in case $n-k$ is “small”, i.e., in “high-dimensional” scenarios, whereas if $n-k$ is large the zero-power trap does not appear that frequently. However, this also depends on the dependence structure.

4 Avoiding the zero-power trap

Having provided some context and motivation, we now discuss three ways to avoid the zero-power trap: In Section 4.1 we expand on the observation just made in Remark 3.8. The strategy discussed in Section 4.2 is based on an idea involving artificial regressors. The method we recommend, however, builds on Section 4.1 and is introduced in Section 4.3. Our suggestion tries to overcome sub-optimality properties of the other methods. As discussed in the Introduction, the idea underlying our approach can be interpreted as a finite sample variant of the power enhancement principle of Fan et al. (2015).

4.1 Tests based on $T_{B}$ with $B=C_{X}ee^{\prime}C_{X}^{\prime}$

As discussed in Remark 3.8, tests based on the test statistic $T_{B}$ with $B=C_{X}ee^{\prime}C_{X}^{\prime}$ do not satisfy the assumptions underlying Propositions 3.4 and 3.6. Hence, these two propositions do not let us conclude anything concerning the question “how often” the zero-power trap occurs for such tests. It turns out that these tests do not suffer from the zero-power trap for any $\alpha\in(0,1)$ in case the additional condition $e\notin\mathop{\mathrm{s}pan}(X)$ holds (note that if $e\in\mathop{\mathrm{s}pan}(X)$ holds, the test statistic $T_{B}$ with $B=C_{X}ee^{\prime}C_{X}^{\prime}$ is useless as it equals $0$ for every $y\in\mathbb{R}^{n}$ ). As pointed out in Remark 3.8, this was already noted in Preinerstorfer and Pötscher (2017). For later use in Section 4.3 we state a corresponding result (which is an immediate consequence of Part 1 of Proposition 2.26 in Preinerstorfer and Pötscher (2017) together with $G_{X}$ -invariance of $T_{B}$ and our Lemma 2.2):

Theorem 4.1.

Suppose that $k<n-1$ , that Assumption 1 holds and that $e\notin\mathop{\mathrm{s}pan}(X)$ , where $e$ is the vector figuring in Assumption 1. Then, for every $\alpha\in(0,1)$ , every $\beta\in\mathbb{R}^{k}$ and every $\sigma\in(0,\infty)$

[TABLE]

From this result we conclude that in case $e\notin\mathop{\mathrm{s}pan}(X)$ and whenever a test $\varphi$ with size $\alpha$ is subject to the zero-power trap, one can alternatively use the test with rejection region $\Phi_{C_{X}ee^{\prime}C_{X}^{\prime},\kappa(\alpha)}$ instead, which does not suffer from the zero-power trap. Moreover, the power of the test $\Phi_{C_{X}ee^{\prime}C_{X}^{\prime},\kappa(\alpha)}$ even increases to $1$ as $\rho\to a$ . This is a desirable property as it matches the intuition that strong correlations should be easily detectable from the data.
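The statistic behind this test is easy to compute, and a small numerical sketch may help fix ideas. The snippet below is only an illustration under stated assumptions, not the implementation used in the paper: it assumes that the statistic in Equation (5) has the form $T_{B}(y)=y^{\prime}C_{X}^{\prime}BC_{X}y/\|C_{X}y\|^{2}$ for $y\notin\mathrm{span}(X)$, that the null distribution can be simulated with standard normal errors (i.e., $\Sigma(0)=I_{n}$), and that a Monte Carlo quantile is an acceptable stand-in for the exact critical value $\kappa(\alpha)$. The helper names `residual_basis`, `t_e` and `null_critical_value` are hypothetical.

```python
import numpy as np

def residual_basis(X):
    """Return C_X whose rows are an orthonormal basis of span(X)^perp.

    Then C_X C_X' = I_{n-k} and C_X' C_X is the projection onto span(X)^perp.
    Assumes X has full column rank.
    """
    n, k = X.shape
    Q, _ = np.linalg.qr(X, mode="complete")
    return Q[:, k:].T

def t_e(y, X, e):
    """Assumed form of T_B(y) with B = C_X e e' C_X' (y not in span(X))."""
    C = residual_basis(X)
    u = C @ y
    return float((C @ e) @ u) ** 2 / float(u @ u)

def null_critical_value(X, e, alpha, draws=20000, seed=0):
    """Monte Carlo approximation of kappa(alpha) under standard normal errors."""
    rng = np.random.default_rng(seed)
    C = residual_basis(X)
    U = rng.standard_normal((draws, X.shape[0])) @ C.T  # rows are C_X y
    stats = (U @ (C @ e)) ** 2 / np.sum(U ** 2, axis=1)
    return float(np.quantile(stats, 1.0 - alpha))
```

By the Cauchy-Schwarz inequality the statistic lies in $[0,\|C_{X}e\|^{2}]$, and it is unchanged when $y$ is replaced by $\gamma y+Xb$ with $\gamma\neq 0$, which is the $G_{X}$ -invariance used in the text.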

While avoiding the zero-power trap problem, the test $\Phi_{C_{X}ee^{\prime}C_{X}^{\prime},\kappa(\alpha)}$ suffers from one major disadvantage: the power function of $\Phi_{C_{X}ee^{\prime}C_{X}^{\prime},\kappa(\alpha)}$ can be, and often will be, quite low for values $\rho\in(0,a)$ distant from $a$ . If the initial test $\varphi$ , which was dismissed because it is subject to the zero-power trap, was chosen because of its good power properties in this region of the alternative, the test $\Phi_{C_{X}ee^{\prime}C_{X}^{\prime},\kappa(\alpha)}$ will then not constitute a convincing alternative. This is illustrated in the example discussed in Section 5. A method that tries to take optimality properties of the initial test into account, at least for the classes of tests discussed in Remark 2.1, is discussed next.

4.2 Tests based on artificial regressors

The sufficient condition for the zero-power trap in Theorem 3.2 requires that the vector $e$ from Assumption 1 is not an element of $\mathop{\mathrm{s}pan}(X)$ . While this of course does not prove that the zero-power trap does not arise if $e\in\mathop{\mathrm{s}pan}(X)$ , this indeed turns out to be the case under an additional assumption (cf. Corollary 2.22 in Preinerstorfer and Pötscher (2017)). In this section we shall exploit this fact. The method of avoiding the zero-power trap we discuss in this section “enforces” the condition $e\in\mathop{\mathrm{s}pan}(X)$ . More specifically, it is based on adding the vector $e$ from Assumption 1 as an “artificial” regressor to the design matrix (if it is not already an element of $\mathop{\mathrm{s}pan}(X)$ ), and then constructing tests as if this artificially expanded design matrix were the true one. As discussed in the Introduction, the idea underlying the construction in the present section can be traced back to Krämer (1985).

To formally describe the artificial regressor based method in our general setting, consider a situation where a researcher initially wants to use the test $\Phi_{B,\kappa(\alpha)}$ as in Section 2.2.2 with $\lambda_{1}(B)<\lambda_{n-k}(B)$ , but discovers (e.g., by checking the sufficient conditions in Theorem 3.2) that $\Phi_{B,\kappa(\alpha)}$ suffers from the zero-power trap. Suppose further that the initial test $\Phi_{B,\kappa(\alpha)}$ has certain optimality properties (cf. Remark 2.1). The researcher does not want to completely sacrifice the optimality properties of the initial test, which prevents him from using the test just discussed in Section 4.1. Assume further that $e\notin\mathop{\mathrm{s}pan}(X)$ .

The trick now is to work with the design matrix $\bar{X}=(X,e)$ in the construction of a test statistic, assuming that $k+1<n$ . More precisely, let $\bar{B}$ be a symmetric $(n-k-1)\times(n-k-1)$ matrix (cf. Remark 4.2 below), and define the adjusted test statistic

[TABLE]

Under the additional assumption that $\lambda_{1}(\bar{B})<\lambda_{n-k-1}(\bar{B})$ , one obtains from Lemma 2.2 (applied to model (1) but with design matrix $\bar{X}$ instead of $X$ ; note that this leads to an “enlarged” model that encompasses the true model as a submodel, and that the distributions satisfying the null hypothesis in the true model also satisfy the null hypothesis in the enlarged model) for every $\alpha\in(0,1)$ the existence and uniqueness of a critical value $\bar{\kappa}(\alpha)\in(\lambda_{1}(\bar{B}),\lambda_{n-k-1}(\bar{B}))$ , say, such that for every $\beta\in\mathbb{R}^{k}$ and every $\sigma\in(0,\infty)$ it holds that

[TABLE]

Finally, define the rejection region

[TABLE]

Remark 4.2.

We think about $\bar{B}$ as an “updated version” of $B$ , i.e., as the matrix one would use if $\bar{X}$ was the underlying design matrix. For example, if the initial matrix $B$ equals $C_{X}\dot{\Sigma}(0)C_{X}^{\prime}$ one could use $\bar{B}=C_{\bar{X}}\dot{\Sigma}(0)C_{\bar{X}}^{\prime}$ , or if the initial matrix $B=-(C_{X}\Sigma(\bar{\rho})C_{X}^{\prime})^{-1}$ one could use $\bar{B}=-(C_{\bar{X}}\Sigma(\bar{\rho})C_{\bar{X}}^{\prime})^{-1}$ . Recall that the rejection region (18) based on these two versions of $\bar{B}$ corresponds to locally best invariant tests and point-optimal invariant tests, respectively, in the model where the true design matrix is $\bar{X}$ (cf. Remark 2.1).
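The construction can be sketched in a few lines. The following is a hedged illustration rather than the paper's code: it assumes that the adjusted statistic has the same ratio form as Equation (5), computed with $C_{\bar{X}}$ in place of $C_{X}$, and that $\bar{B}=C_{\bar{X}}MC_{\bar{X}}^{\prime}$ for a symmetric $n\times n$ matrix $M$ (for instance $M=\dot{\Sigma}(0)$). The function names are hypothetical.

```python
import numpy as np

def residual_basis(X):
    """Rows form an orthonormal basis of span(X)^perp (X of full column rank)."""
    n, k = X.shape
    Q, _ = np.linalg.qr(X, mode="complete")
    return Q[:, k:].T

def artificial_regressor_stat(y, X, e, M):
    """Statistic built as if X_bar = (X, e) were the design matrix.

    Uses B_bar = C_{X_bar} M C_{X_bar}' with M symmetric; assumes y is not
    in span(X_bar).
    """
    X_bar = np.column_stack([X, e])
    if np.linalg.matrix_rank(X_bar) != X.shape[1] + 1:
        raise ValueError("e must not lie in span(X)")
    C = residual_basis(X_bar)
    u = C @ y
    return float(u @ (C @ M @ C.T) @ u) / float(u @ u)
```

Because $C_{\bar{X}}e=0$, the resulting statistic does not change when a multiple of $e$ is added to $y$; this is the invariance with respect to the larger group $G_{\bar{X}}$.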

We shall now prove that the test with rejection region (18) does not suffer from the zero-power trap. The following result requires an additional assumption on $\Sigma(.)$ . This is Assumption 4 in Preinerstorfer and Pötscher (2017) to which we refer the reader for equivalent formulations, examples and further discussion.

Assumption 2.

There exists a function $c:[0,a)\to(0,\infty)$ , a normalized vector $e\in\mathbb{R}^{n}$ , and a square root $L_{*}(.)$ of $\Sigma(.)$ such that

[TABLE]

exists in $\mathbb{R}^{n\times n}$ and such that the linear map $\Lambda$ is injective when restricted to $\mathop{\mathrm{s}pan}(e)^{\bot}$ .

The main result concerning artificial regressor based tests is as follows:

Theorem 4.3.

Suppose Assumptions 1 and 2 are satisfied with the same vector $e$ , that $e\notin\mathop{\mathrm{s}pan}(X)$ , and that $k<n-1$ . Suppose further that $\bar{B}$ is a symmetric $(n-k-1)\times(n-k-1)$ matrix such that $\lambda_{1}(\bar{B})<\lambda_{n-k-1}(\bar{B})$ . Then, for every $\alpha\in(0,1)$ , every $\beta\in\mathbb{R}^{k}$ and every $\sigma\in(0,\infty)$ it holds that

[TABLE]

where $\mathbf{G}$ denotes a Gaussian random vector with mean $0$ and covariance matrix $I_{n}$ .

Theorem 4.3 shows that $\bar{\Phi}_{\bar{B},\bar{\kappa}(\alpha)}$ is not subject to the zero-power trap. However, its “limiting power” $\lim_{\rho\to a}P_{\beta,\sigma,\rho}(\bar{\Phi}_{\bar{B},\bar{\kappa}(\alpha)})=\mathrm{Pr}(\bar{T}_{\bar{B}}(\Lambda\mathbf{G})>\bar{\kappa}(\alpha))$ can in principle be low. In particular, it is always smaller than one. This is different to the behavior of the test discussed in Section 4.1, which has limiting power equal to one. Another limitation of Theorem 4.3 is its reliance on the additional Assumption 2.

Following up on the examples discussed in Remark 4.2, an advantage of passing from $\Phi_{B,\kappa(\alpha)}$ to $\bar{\Phi}_{\bar{B},\bar{\kappa}(\alpha)}$ , instead of passing from $\Phi_{B,\kappa(\alpha)}$ to the test discussed in Section 4.1, is that $\bar{\Phi}_{\bar{B},\bar{\kappa}(\alpha)}$ “preserves” in some sense the optimality properties of $\Phi_{B,\kappa(\alpha)}$ , but with respect to the larger group $G_{\bar{X}}$ . Note, however, that this does not imply that the power functions of $\Phi_{B,\kappa(\alpha)}$ and $\bar{\Phi}_{\bar{B},\bar{\kappa}(\alpha)}$ are “close”.

4.3 Optimality-preserving tests that avoid the zero-power trap

The starting point in this section is an (initial) family of tests $\varphi_{\alpha}:\mathbb{R}^{n}\to[0,1]$ for the testing problem (3) indexed by $\alpha\in(0,1)$ . Given $\alpha\in(0,1)$ we interpret $\varphi_{\alpha}$ as the (initial) test one would like to use because of some optimality property. That is, the power function of $\varphi_{\alpha}$

[TABLE]

is “large” for certain parameter values $(\beta,\sigma,\rho)$ in a given subset pertaining to the alternative hypothesis $\{0\}\times(0,\infty)\times(0,a)$ .

We shall suppose that the initial test $\varphi_{\alpha}$ suffers from the zero-power trap, which one would like to avoid. Ideally a test should have limiting power equal to $1$ , a property of the test in Section 4.1, but not of the test in Section 4.2. Furthermore, we would like to keep, at least approximately, the optimal power properties of $\varphi_{\alpha}$ , which was the reason why $\varphi_{\alpha}$ was considered for use initially. This is a property of the test in Section 4.2 (at least to some extent), but not of the test in Section 4.1. We shall now present an approach that achieves these two goals.

In what follows, we assume that the family of tests $\{\varphi_{\alpha}\}$ under consideration satisfies Property A, i.e., satisfies the following:

A.1: For every $\alpha\in(0,1)$ the test $\varphi_{\alpha}$ is $G_{X}$ -invariant.

A.2: For every $\alpha\in(0,1)$ the test $\varphi_{\alpha}$ has size $\alpha$ , i.e.,

[TABLE]

A.3: For every $\alpha\in(0,1)$ and every sequence $\alpha_{m}\in[0,\alpha]$ converging to $\alpha$ we have that $\varphi_{\alpha_{m}}(y)\to\varphi_{\alpha}(y)$ holds for $\mu_{\mathbb{R}^{n}}$ -almost every $y\in\mathbb{R}^{n}$ .

To illustrate the assumption, consider the following important example:

Example 4.1.

Let $T_{B}$ be as in (5) with $B$ an $(n-k)\times(n-k)$ symmetric matrix such that $\lambda_{1}(B)<\lambda_{n-k}(B)$ . For every $\alpha\in(0,1)$ let $\kappa(\alpha)$ be the critical value from Lemma 2.2. Set $\varphi_{\alpha}$ equal to the non-randomized test with rejection region $\Phi_{B,\kappa(\alpha)}$ , i.e., $\varphi_{\alpha}:=\mathbf{1}_{\Phi_{B,\kappa(\alpha)}}$ . We already know that $T_{B}$ is $G_{X}$ -invariant, and thus $\varphi_{\alpha}$ is $G_{X}$ -invariant for every $\alpha$ . Hence A.1 is satisfied. Furthermore, from Lemma 2.2 we see that $\varphi_{\alpha}$ satisfies A.2. That A.3 is satisfied is an immediate consequence of continuity of $\kappa(.)$ , which was established in Lemma 2.2, together with the fact that for every $\alpha\in(0,1)$ the set

[TABLE]

is a $\mu_{\mathbb{R}^{n}}$ -null set; the latter is a consequence of Lemma B.4 in Preinerstorfer and Pötscher (2017), which shows that the cdf. $F$ , say, corresponding to $P_{0,1,0}\circ T_{B}$ is continuous.

Remark 4.4.

While not required in Property A, typical families $\{\varphi_{\alpha}\}$ will also satisfy the condition that for any real numbers $\alpha_{1}\leq\alpha_{2}$ in $(0,1)$ it holds for $\mu_{\mathbb{R}^{n}}$ -almost every $y\in\mathbb{R}^{n}$ that $\varphi_{\alpha_{1}}(y)\leq\varphi_{\alpha_{2}}(y)$ . For instance, this is the case for the families of tests discussed in Example 4.1 (this follows from the monotonicity property of $\kappa(.)$ established in Lemma 2.2). One obvious consequence of this condition is that if $\varphi_{\alpha_{2}}$ suffers from the zero-power trap, then $\varphi_{\alpha_{1}}$ suffers from the zero-power trap as well. Therefore, for such families, if $\varphi_{\alpha}$ suffers from the zero-power trap, there is no hope that one can easily avoid the zero-power trap by using $\varphi_{\alpha-\varepsilon}$ for some $\varepsilon>0$ (which would at least be a test whose size does not exceed $\alpha$ ).

Suppose in the following discussion that $k<n-1$ , that Assumption 1 holds and that $e\notin\mathop{\mathrm{s}pan}(X)$ . Recall from Theorem 4.1 that under these conditions the $G_{X}$ -invariant test $\Phi_{C_{X}ee^{\prime}C_{X}^{\prime},\kappa(\alpha)}$ does not suffer from the zero-power trap, and in fact has limiting power one, at all levels $\alpha\in(0,1)$ . Using this property, we shall now define a $G_{X}$ -invariant test that has approximately the same power properties as $\varphi_{\alpha}$ , with the advantage that it has limiting power $1$ , just as the test $\Phi_{C_{X}ee^{\prime}C_{X}^{\prime},\kappa(\alpha)}$ .

The basic idea is as follows (precise statements are provided further below): From Property A.3 one obtains that for $\varepsilon\in(0,\alpha)$ small, the power functions of $\varphi_{\alpha}$ and $\varphi_{\alpha-\varepsilon}$ are similar. Theorem 4.1 tells us that the test with rejection region $\Phi_{C_{X}ee^{\prime}C_{X}^{\prime},\kappa(\varepsilon)}$ has limiting power (as $\rho\to a$ ) equal to $1$ , and Lemma 2.2 shows that this test has size equal to $\varepsilon$ . Hence, we could use the $G_{X}$ -invariant test

[TABLE]

whose power function is similar to $\varphi_{\alpha}$ (at least for $\varepsilon$ small), but which has limiting power equal to one (for every $0<\varepsilon<\alpha$ ). Trivially, this test has size not greater than $\alpha$ , but potentially its size is smaller than $\alpha$ , implying some unnecessary loss in power, which one can try to avoid by decreasing $\kappa(\varepsilon)$ .

More specifically, define the $G_{X}$ -invariant test

[TABLE]

where $0<c(\alpha,\varepsilon)\leq\kappa(\varepsilon)$ is chosen to be the smallest number such that $\varphi^{*}_{\alpha,\varepsilon}$ has size equal to $\alpha$ . That such a choice of $c(\alpha,\varepsilon)$ is indeed possible is the content of the next proposition. Note that $\varphi^{*}_{\alpha,\varepsilon}$ is non-randomized if the test $\varphi_{\alpha-\varepsilon}$ is non-randomized.

Proposition 4.5.

Suppose that $k<n-1$ , that $e\in\mathbb{R}^{n}$ satisfies $e\notin\mathop{\mathrm{s}pan}(X)$ , and that the family $\{\varphi_{\alpha}\}$ satisfies Properties A.1 and A.2. Then, for every $\alpha\in(0,1)$ and every $\varepsilon\in(0,\alpha)$ there exists a $c(\alpha,\varepsilon)\in(0,\kappa(\varepsilon)]$ such that

[TABLE]

and such that for every $c^{\prime}\in(0,c(\alpha,\varepsilon))$ it holds that the supremum in the previous display is greater than $\alpha$ ; here $\kappa(\varepsilon)\in(0,\|C_{X}e\|^{2})$ denotes the unique real number such that $\Phi_{C_{X}ee^{\prime}C_{X}^{\prime},\kappa(\varepsilon)}$ has size equal to $\varepsilon$ (cf. Lemma 2.2).

Note that the critical value $c(\alpha,\varepsilon)$ can be easily determined numerically by a simple line search algorithm, cf. also Remark 2.3.
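One way such a line search could look is sketched below. This is a minimal illustration under explicit assumptions, not the procedure of Remark 2.3: null rejection probabilities are approximated by Monte Carlo with standard normal errors instead of being computed exactly, the initial test is taken to be a non-randomized test based on $T_{B}$ with $B=C_{X}MC_{X}^{\prime}$ as in Example 4.1, and the combined test is taken to reject whenever either the level-$(\alpha-\varepsilon)$ initial test rejects or the $e$ -based statistic exceeds $c$ . The names `simulate_null_stats` and `line_search_c` are hypothetical.

```python
import numpy as np

def residual_basis(X):
    """Rows form an orthonormal basis of span(X)^perp (X of full column rank)."""
    n, k = X.shape
    Q, _ = np.linalg.qr(X, mode="complete")
    return Q[:, k:].T

def simulate_null_stats(X, e, M, draws=20000, seed=0):
    """Null draws of the initial statistic (B = C M C') and of the e-based one."""
    rng = np.random.default_rng(seed)
    C = residual_basis(X)
    U = rng.standard_normal((draws, X.shape[0])) @ C.T  # rows are C_X y
    norm2 = np.sum(U ** 2, axis=1)
    t_init = np.einsum("ij,jk,ik->i", U, C @ M @ C.T, U) / norm2
    t_e = (U @ (C @ e)) ** 2 / norm2
    return t_init, t_e

def line_search_c(t_init, t_e, alpha, eps, iters=60):
    """Bisection for the smallest c in (0, kappa(eps)] with simulated size <= alpha."""
    kappa_init = np.quantile(t_init, 1.0 - (alpha - eps))  # level alpha - eps
    kappa_e = float(np.quantile(t_e, 1.0 - eps))           # level eps
    reject_init = t_init > kappa_init

    def size(c):
        # Simulated null rejection probability of the combined test.
        return float(np.mean(reject_init | (t_e > c)))

    lo, hi = 0.0, kappa_e  # size(lo) is (almost surely) 1, size(hi) <= alpha
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if size(mid) <= alpha + 1e-12:
            hi = mid
        else:
            lo = mid
    return hi, kappa_e, size(hi)
```

The bisection relies on the simulated size being non-increasing in $c$ ; the returned value is at most the simulated $\kappa(\varepsilon)$ , mirroring the range $(0,\kappa(\varepsilon)]$ in Proposition 4.5.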

Having established that the test in Equation (23) is actually well-defined, we now prove that it does not suffer from the zero-power trap but has limiting power $1$ for any choice of $\varepsilon$ . Furthermore, we show that the power function of $\varphi^{*}_{\alpha,\varepsilon}$ approximates (even uniformly over suitable subsets of the parameter space) the power function of $\varphi_{\alpha}$ as $\varepsilon$ converges to $0$ . In this sense, choosing $\varepsilon>0$ small, the test $\varphi^{*}_{\alpha,\varepsilon}$ preserves “optimal” power properties (such as point-optimal invariance, or locally best invariance, cf. Example 4.1 above) from $\varphi_{\alpha}$ at least approximately. Furthermore, the degree of approximation can be tuned by the user via $\varepsilon$ .

Theorem 4.6.

Suppose that $k<n-1$ , that Assumption 1 holds and that $e\notin\mathop{\mathrm{s}pan}(X)$ , where $e$ is the vector figuring in Assumption 1. Assume that the family $\{\varphi_{\alpha}\}$ satisfies Properties A.1 and A.2. Let $\alpha\in(0,1)$ . Then, the following holds:

1. For every $\varepsilon\in(0,\alpha)$ , every $\beta\in\mathbb{R}^{k}$ and every $\sigma\in(0,\infty)$ we have

[TABLE]

in particular $\varphi^{*}_{\alpha,\varepsilon}$ does not suffer from the zero-power trap.

2. Suppose that the family $\{\varphi_{\alpha}\}$ also satisfies Property A.3. Let $A\subseteq[0,a)$ be such that the closure of the set

[TABLE]

is contained in the set of positive definite symmetric matrices. Then

[TABLE]

Remark 4.7.

In the leading case $\Sigma(.)$ is a continuous function. In this case one can choose the set $A$ in the second part of Theorem 4.6 equal to $[0,c]$ for any $0<c<a$ [recall that $\Sigma(\rho)$ is positive definite for every $\rho\in[0,a)$ by assumption]. Note further that since we are primarily interested in situations where the initial test $\varphi_{\alpha}$ suffers from the zero-power trap, while the adjusted tests $\varphi_{\alpha,\varepsilon}^{*}$ have limiting power $1$ , it is not restrictive to confine ourselves to intervals $[0,c]$ as above, as we do not want the power of the adjusted test to be close to the power of the initial test in a neighborhood of $a$ . Furthermore, the optimality properties of point-optimal invariant tests (against an alternative $\bar{\rho}\in(0,a)$ ) or of locally best invariant tests (which are characterized by favorable power properties in the neighborhood of $0$ ) concern only the power function over $[0,c]$ for a suitably chosen $c<a$ .

Remark 4.8.

The tuning parameter $\varepsilon$ needs to be chosen by the user in each particular application. In principle, the user can plot the power functions for various values of $\varepsilon$ , and can then decide upon inspection, which value of $\varepsilon$ provides the best solution. For a specific example we refer to Section 5 below.

Remark 4.9.

Finally, we point out that the construction of $\varphi^{*}_{\alpha,\varepsilon}$ in Equation (23) and the conditions in Proposition 4.5 and Theorem 4.6 do not require the initial test $\varphi_{\alpha}$ to suffer from the zero-power trap. While this is clearly our main focus, this observation shows that our method can also be applied in case the limiting power of $\varphi_{\alpha}$ is greater than $0$ but smaller than one. In such a situation, using $\varphi_{\alpha,\varepsilon}^{*}$ instead of $\varphi_{\alpha}$ can be advantageous as well.

5 Numerical results

In order to illustrate and compare the power properties of the tests introduced in Section 4, we now consider a simple example from spatial econometrics in which the zero-power trap occurs for a popular test. We focus on a situation where the correlation between the observations is a consequence of their proximity, which might be spatial, but could also be, e.g., social, and which is encoded in the adjacency (“weights”) matrix of a graph.

One important model in this case is the spatial (autoregressive) error model, which leads to

[TABLE]

for $W$ a fixed weights matrix which is assumed to be (elementwise) nonnegative and irreducible with zero elements on the main diagonal. By the Perron-Frobenius theorem (e.g., Horn and Johnson (1985), Theorem 8.4.4), the matrix $W$ then has a positive (real) eigenvalue $\lambda_{\max}(W)$ , say, with algebraic multiplicity (and thus also geometric multiplicity) equal to 1, such that any other real or complex zero of the characteristic polynomial of $W$ is in absolute value not larger than $\lambda_{\max}(W)$ . We assume that the parameter $\rho\in[0,\lambda_{\max}(W)^{-1})$ . For $f_{\max}$ a normalized eigenvector of $W$ w.r.t. $\lambda_{\max}(W)$ it is not too difficult to see that Assumption 1 is satisfied (with $e=f_{\max}$ ), and that Assumption 2 is satisfied. For details we refer to Section 4.1 in Preinerstorfer and Pötscher (2017).

The model depends, besides the design matrix $X$ , on the specific form of the weights matrix $W$ , which encodes the dependence relation of the observations. Subsequently we revisit a simple example considered in Section 3 of Krämer (2005), who has observed (cf. his Figure 1) that for a weights matrix derived by the Queen criterion from a $4\times 4$ regular lattice, and for $X=(1,\ldots,1)^{\prime}\in\mathbb{R}^{16}$ , the Cliff-Ord test suffers from the zero-power trap for $\alpha=5\%$ . We recall that the Cliff-Ord test is based on a test statistic as in Equation (5) with $B=C_{X}(W+W^{\prime})C_{X}^{\prime}$ .
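For readers who wish to reproduce the ingredients of this example, the following sketch builds a Queen-contiguity weights matrix on a $4\times 4$ lattice, its Perron eigenpair, and the Cliff-Ord statistic. It rests on assumptions that are not stated in the text: the weights are taken to be binary and not row-normalized (two cells are neighbors if they share an edge or a corner), the statistic is taken to have the ratio form $y^{\prime}C_{X}^{\prime}BC_{X}y/\|C_{X}y\|^{2}$ , and all function names are hypothetical.

```python
import numpy as np

def queen_weights(rows, cols):
    """Binary Queen-contiguity matrix: neighbors share an edge or a corner."""
    n = rows * cols
    W = np.zeros((n, n))
    for r in range(rows):
        for c in range(cols):
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    rr, cc = r + dr, c + dc
                    if (dr or dc) and 0 <= rr < rows and 0 <= cc < cols:
                        W[r * cols + c, rr * cols + cc] = 1.0
    return W

def perron_pair(W):
    """Largest eigenvalue and normalized eigenvector of a symmetric W."""
    vals, vecs = np.linalg.eigh(W)
    f = vecs[:, -1]
    if f.sum() < 0:  # fix the sign so that the entries are positive
        f = -f
    return float(vals[-1]), f

def cliff_ord_stat(y, X, W):
    """Assumed ratio form of T_B(y) with B = C_X (W + W') C_X'."""
    n, k = X.shape
    Q, _ = np.linalg.qr(X, mode="complete")
    C = Q[:, k:].T
    u = C @ y
    return float(u @ (C @ (W + W.T) @ C.T) @ u) / float(u @ u)

W = queen_weights(4, 4)
lam_max, f_max = perron_pair(W)
X = np.ones((16, 1))
```

Under these assumptions $f_{\max}$ is not constant across cells, so $e=f_{\max}$ does not lie in $\mathrm{span}(X)$ for the intercept-only design; the admissible parameter range is $\rho\in[0,1/\lambda_{\max}(W))$ with `lam_max` as computed above.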

The power function of the Cliff-Ord test and the power functions of the tests described in Section 4 were obtained numerically (cf. also Remark 2.3), and are shown in Figure 1. The figure also shows the power envelope in the class of $G_{X}$ -invariant tests. That is, for each alternative $\bar{\rho}\in(0,\lambda_{\max}(W)^{-1})$ Figure 1 shows the power of the point-optimal $G_{X}$ -invariant level $\alpha=5\%$ test against the alternative $\bar{\rho}$ . Recall from Remark 2.1 that the point-optimal invariant test against alternative $\bar{\rho}$ is based on a test statistic as in (5) and with $B=-[C_{X}\Sigma(\bar{\rho})C_{X}^{\prime}]^{-1}$ . In this example the power envelope is not attained by any $G_{X}$ -invariant test, but it serves the purpose of providing an upper bound for comparison.
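As a rough cross-check of such power computations, power can also be approximated by simulation. The sketch below is not the numerical method used for Figure 1; it swaps in plain Monte Carlo and assumes the standard spatial error specification $u=\rho Wu+\varepsilon$ with standard normal $\varepsilon$ , binary Queen weights on the $4\times 4$ lattice, an intercept-only design, and the ratio form of the test statistic. All function names are hypothetical.

```python
import numpy as np

def queen_weights(rows, cols):
    """Binary Queen-contiguity matrix: neighbors share an edge or a corner."""
    n = rows * cols
    W = np.zeros((n, n))
    for r in range(rows):
        for c in range(cols):
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    rr, cc = r + dr, c + dc
                    if (dr or dc) and 0 <= rr < rows and 0 <= cc < cols:
                        W[r * cols + c, rr * cols + cc] = 1.0
    return W

def mc_power(B, C, W, rho, alpha=0.05, draws=20000, seed=0):
    """Monte Carlo power of the test rejecting for large u'Bu / u'u, u = C y."""
    rng = np.random.default_rng(seed)
    n = W.shape[0]

    def stats(Y):
        U = Y @ C.T
        return np.einsum("ij,jk,ik->i", U, B, U) / np.sum(U ** 2, axis=1)

    # Critical value from simulated null draws (rho = 0).
    kappa = np.quantile(stats(rng.standard_normal((draws, n))), 1.0 - alpha)
    # Draws under the alternative: u = (I - rho W)^{-1} eps.
    A_inv = np.linalg.inv(np.eye(n) - rho * W)
    Y_alt = rng.standard_normal((draws, n)) @ A_inv.T
    return float(np.mean(stats(Y_alt) > kappa))

W = queen_weights(4, 4)
vals, vecs = np.linalg.eigh(W)
lam_max, e = vals[-1], vecs[:, -1]
Q, _ = np.linalg.qr(np.ones((16, 1)), mode="complete")
C = Q[:, 1:].T                       # C_X for the intercept-only design
B_cliff_ord = C @ (W + W.T) @ C.T    # Cliff-Ord choice of B
B_e = np.outer(C @ e, C @ e)         # B = C_X e e' C_X' from Section 4.1
```

Calling `mc_power(B_cliff_ord, C, W, rho)` and `mc_power(B_e, C, W, rho)` over a grid of `rho` values in `[0, 1 / lam_max)` gives simulated counterparts of two of the curves discussed here, up to Monte Carlo error.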

While Figure 1 illustrates that the approaches discussed in Sections 4.1 and 4.2 avoid the zero-power trap, it reveals at the same time that the power functions of these tests are not completely satisfying. On the one hand, even though the test introduced in Section 4.1 does not suffer from the zero-power trap, it has low power in a large region of the alternative. On the other hand, the test from Section 4.2 based on the Cliff-Ord test (i.e., as in Equation (18) with $\bar{B}=C_{(X,e)}(W+W^{\prime})C_{(X,e)}^{\prime}$ ) with artificial regressor $e=f_{\max}$ avoids the zero-power trap as well and has a power function that practically coincides with the power envelope for small values of $\rho$ . But its limiting power is smaller than one (in fact, it is only $0.619$ ).

Figure 1 also contains the power functions of several tests corresponding to the procedure outlined in Section 4.3 applied to the family $\varphi_{\alpha}$ of level- $\alpha$ Cliff-Ord tests (cf. Example 4.1). It shows the power functions corresponding to $\varepsilon\in\{.002,.006,.01\}$ . These tests have very good power properties. Their power functions are practically identical to that of the Cliff-Ord test (and hence to the power envelope) for small values of $\rho$ . For larger values of $\rho$ , however, their power functions are much closer to the power envelope than the power function of the Cliff-Ord test. In particular, by construction, their power converges to $1$ as $\rho$ gets close to $a$ . One can also observe that smaller values of $\varepsilon$ lead to power functions that are closer to the power function of the Cliff-Ord test for $\rho$ close to $0$ , whereas larger values of $\varepsilon$ lead to power functions that are closer to the power envelope for $\rho$ close to $a$ .

6 Conclusion

In the present article we have reconsidered the zero-power trap phenomenon in testing for correlation in a general framework. Most importantly, we have suggested a way to construct “approximately optimal tests” that avoid the trap. For practical purposes, if an initial test, such as the Cliff-Ord test in the example discussed in Section 5, turns out to suffer from the zero-power trap, we suggest using the method introduced in Section 4.3 to obtain a modified test with the following properties: (i) its power function is similar to that of the initial test, (ii) it does not suffer from the zero-power trap, and (iii) its limiting power equals one. The tuning parameter $\varepsilon$ involved in the construction of the modified test can be chosen by graphically comparing the power functions of modified tests corresponding to different values of the tuning parameter with the power envelope and the power function of the initial test. The heuristic underlying our construction can be interpreted as a finite sample variant of the power enhancement principle of Fan et al. (2015). The approach, which is not restricted to the testing problem under consideration, might be of some interest in its own right.

Appendix A Proofs for results in Section 2

Proof of Lemma 2.2.

Lemma B.4 in Preinerstorfer and Pötscher (2017) shows that the cdf. $F$ , say, corresponding to $P_{0,1,0}\circ T_{B}$ is continuous, that $F(\lambda_{1}(B))=0$ , $F(\lambda_{n-k}(B))=1$ , and that $F$ is strictly increasing on $[\lambda_{1}(B),\lambda_{n-k}(B)]$ . Hence, the function $f:[\lambda_{1}(B),\lambda_{n-k}(B)]\to[0,1]$ defined via

$$f(c)=P_{0,1,0}\left(\Phi_{B,c}\right)=1-F(c)$$

is continuous, strictly decreasing, and satisfies $f(\lambda_{1}(B))=1$ and $f(\lambda_{n-k}(B))=0$ . Set $\kappa=f^{-1}$ , i.e., the inverse of $f$ , which is continuous, strictly decreasing, and obviously satisfies $\kappa(0)=\lambda_{n-k}(B)$ and $\kappa(1)=\lambda_{1}(B)$ . Then, $P_{0,1,0}\left(\Phi_{B,\kappa(\alpha)}\right)=\alpha$ for every $\alpha\in[0,1]$ . Finally, recall that $T_{B}$ is $G_{X}$ -invariant, from which it follows (cf. Remark 2.3 in Preinerstorfer and Pötscher (2017)) that for every $c\in\mathbb{R}$ every $\beta\in\mathbb{R}^{k}$ and every $\sigma\in(0,\infty)$ we have $P_{\beta,\sigma,0}(\Phi_{B,c})=P_{0,1,0}(\Phi_{B,c}).$ Hence, $P_{\beta,\sigma,0}\left(\Phi_{B,\kappa(\alpha)}\right)=\alpha$ holds for every $\beta\in\mathbb{R}^{k}$ , every $\sigma\in(0,\infty)$ , and every $\alpha\in[0,1]$ . The uniqueness part is obvious.
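The role of $\kappa$ can be illustrated by simulation. The sketch assumes that $\Phi_{B,c}$ is the event that the statistic exceeds $c$, so that $\kappa(\alpha)$ is the $(1-\alpha)$-quantile of the null distribution, and that under $P_{0,1,0}$ the statistic is distributed as $z^{\prime}Bz/z^{\prime}z$ with $z$ standard normal in $\mathbb{R}^{n-k}$. The matrix `B` is an arbitrary symmetric stand-in and the function name `kappa` is chosen to mirror the notation.

```python
import numpy as np

rng = np.random.default_rng(1)
m = 15                                    # plays the role of n - k
G = rng.standard_normal((m, m))
B = (G + G.T) / 2                         # arbitrary symmetric matrix, a stand-in for B
lam = np.linalg.eigvalsh(B)               # lam[0] and lam[-1]: smallest and largest eigenvalue

Z = rng.standard_normal((100_000, m))     # draws of C_X y under the null (beta = 0, sigma = 1)
T = np.einsum("ij,jk,ik->i", Z, B, Z) / np.einsum("ij,ij->i", Z, Z)

def kappa(alpha):
    """Monte Carlo approximation of the (1 - alpha)-quantile of the null distribution."""
    return float(np.quantile(T, 1 - alpha))

alphas = [0.01, 0.05, 0.10, 0.50]
kappas = [kappa(al) for al in alphas]     # decreasing in alpha, inside (lam[0], lam[-1])
```

The simulated critical values decrease in $\alpha$ and stay strictly between the smallest and the largest eigenvalue of `B`, in line with the lemma.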

Appendix B Proofs for results in Section 3

Proof of Theorem 3.1.

We apply Theorem 2.7 in Preinerstorfer and Pötscher (2017). Their Assumption 1 coincides with ours and is thus satisfied. Furthermore, by our Gaussianity assumption, their Assumption 3 is satisfied in our framework (with $\mathbf{z}$ a normally distributed random vector with mean $0$ and covariance matrix $I_{n}$ ), and we can use Part 1 of their Proposition 2.6 to conclude that their Assumption 2 is satisfied. The statement now follows from Theorem 2.7 in Preinerstorfer and Pötscher (2017) for the special case $\varphi(e)=0$ . The last statement follows from Remark 2.8(i) in the same reference.

Proof of Theorem 3.2.

We use Corollary 2.21 in Preinerstorfer and Pötscher (2017). That their Assumptions 1 and 2 are satisfied follows as in the proof of Theorem 3.1 above. Recall from Lemma 2.2 that $\kappa$ is a strictly decreasing and continuous bijection from $[0,1]$ to $[\lambda_{1}(B),\lambda_{n-k}(B)]$ , implying that for $\alpha\in(0,1)$ we have $\kappa(\alpha)\in(\lambda_{1}(B),\lambda_{n-k}(B))$ . We can hence apply Corollary 2.21 in Preinerstorfer and Pötscher (2017) to conclude that (under our assumptions) for $\alpha\in(0,1)$ such that $T_{B}(e)<\kappa(\alpha)$ we have (10).

Proof of Lemma 3.3.

Noting that both $e\notin\mathop{\mathrm{s}pan}(X)$ and $\lambda_{1}(B(X))<\lambda_{n-k}(B(X))$ follow from $C_{X}e\notin\mathop{\mathrm{E}ig}(B(X),\lambda_{n-k}(B(X)))$ , Condition (11) together with the definition of $T_{B(X)}$ in Equation (5) can be used to verify $\lambda_{1}(B(X))\leq T_{B(X)}(e)<\lambda_{n-k}(B(X))$ . Thus, Lemma 2.2 gives $P_{0,1,0}^{X}(\Phi_{B(X),T_{B(X)}(e)})\in(0,1]$ and $T_{B(X)}(e)<\kappa(\alpha)$ for every $\alpha\in(0,P_{0,1,0}^{X}(\Phi_{B(X),T_{B(X)}(e)}))$ . We can now apply Theorem 3.2 to conclude.

Lemma B.1.

Let $M\in\mathbb{R}^{n\times n}$ be symmetric, let $v\in\mathbb{R}^{n}$ be such that $\|v\|=1$ , and suppose that $1\leq d<n-1$ . Then,

[TABLE]

can be written as

[TABLE]

for $p_{M}:\mathbb{R}^{n\times d}\to\mathbb{R}$ a multivariate polynomial, which is given in the proof. Furthermore, $p_{M}\equiv 0$ if and only if $M=c_{1}I_{n}+c_{2}vv^{\prime}$ holds for real numbers $c_{1}$ and $c_{2}$ .

Proof.

Let $L\in\mathbb{R}^{n\times d}$ satisfy $\mathop{\mathrm{r}ank}(L)=d$ , or equivalently $\mathop{\mathrm{d}et}(L^{\prime}L)\neq 0$ . If $\Pi_{\mathop{\mathrm{s}pan}(L)^{\bot}}v=0$ , the vector $\Pi_{\mathop{\mathrm{s}pan}(L)^{\bot}}v$ can not be an eigenvector of $\Pi_{\mathop{\mathrm{s}pan}(L)^{\bot}}M\Pi_{\mathop{\mathrm{s}pan}(L)^{\bot}}$ . If $\Pi_{\mathop{\mathrm{s}pan}(L)^{\bot}}v\neq 0$ , $\Pi_{\mathop{\mathrm{s}pan}(L)^{\bot}}v$ is an eigenvector of the symmetric matrix $\Pi_{\mathop{\mathrm{s}pan}(L)^{\bot}}M\Pi_{\mathop{\mathrm{s}pan}(L)^{\bot}}$ if and only if

[TABLE]

We can write this rank condition equivalently as

[TABLE]

Writing $\Pi_{\mathop{\mathrm{s}pan}(L)^{\bot}}=I_{n}-\det(L^{\prime}L)^{-1}L\mathop{\mathrm{a}dj}(L^{\prime}L)L^{\prime}$ (throughout we use the convention that the adjoint of a $1\times 1$ matrix equals $1$ ), and premultiplying (30) by $\det(L^{\prime}L)^{16}\neq 0$ , one sees that (30) is equivalent to

[TABLE]

where $Q(L):=\det(L^{\prime}L)I_{n}-L\mathop{\mathrm{a}dj}(L^{\prime}L)L^{\prime}$ . Note that $L\mapsto p_{M}(L)$ defines a multivariate polynomial on $\mathbb{R}^{n\times d}$ . It follows that $\mathscr{D}(n,d)$ has the claimed form.

To prove the second statement, note that if $M$ is of the specific form $c_{1}I_{n}+c_{2}vv^{\prime}$ for real numbers $c_{1}$ and $c_{2}$ , one has for every $L\in\mathbb{R}^{n\times d}$ that

[TABLE]

For $L$ such that $\det(L^{\prime}L)\neq 0$ the statement $p_{M}(L)=0$ is equivalent to (29). But (29) holds because of the previous display. If $L$ satisfies $\det(L^{\prime}L)=0$ we obviously have $p_{M}(L)=0$ . Thus, $p_{M}\equiv 0$ for all $M$ of this specific form.
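The first direction of the equivalence is easy to check numerically. The sketch below verifies, for one randomly drawn $L$, that $Q(L)$ equals $\det(L^{\prime}L)$ times the projection onto $\mathop{\mathrm{s}pan}(L)^{\bot}$ and that the projected vector is an eigenvector of the projected matrix when $M=c_{1}I_{n}+c_{2}vv^{\prime}$, but not for a generic symmetric $M$. The dimensions, the constants and the helper `is_eigvec` are illustrative choices, and the adjugate is computed as determinant times inverse, which is valid only for invertible matrices.

```python
import numpy as np

rng = np.random.default_rng(2)
n, d = 6, 2
v = rng.standard_normal(n)
v /= np.linalg.norm(v)
L = rng.standard_normal((n, d))           # full column rank with probability one

G = L.T @ L
adjG = np.linalg.det(G) * np.linalg.inv(G)          # adjugate of the invertible matrix L'L
Q = np.linalg.det(G) * np.eye(n) - L @ adjG @ L.T   # Q(L) as defined in the proof
P = np.eye(n) - L @ np.linalg.solve(G, L.T)         # projection onto span(L)^perp

def is_eigvec(M):
    """True if P v is an eigenvector of P M P, checked via the rank of (P v, P M P v)."""
    a, b = P @ v, P @ M @ P @ v
    return np.linalg.matrix_rank(np.column_stack([a, b]), tol=1e-9) < 2

M_special = 1.5 * np.eye(n) - 0.7 * np.outer(v, v)  # of the form c1 I + c2 v v'
S = rng.standard_normal((n, n))
M_generic = (S + S.T) / 2                           # generic symmetric matrix
```

Here `is_eigvec(M_special)` holds for every full-rank `L`, whereas `is_eigvec(M_generic)` fails for typical draws, which is the content of the lemma expressed through the rank condition.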

Now assume that $M$ can not be written as $c_{1}I_{n}+c_{2}vv^{\prime}$ for real numbers $c_{1}$ and $c_{2}$ . It suffices to construct a single $L$ such that $p_{M}(L)\neq 0$ holds. We consider two cases:

(a) We first show that one can find an $L$ as required in the special case where $v$ is not an eigenvector of $M$ . Let $u_{1},\ldots,u_{n}$ be an orthonormal basis of eigenvectors of $M$ with corresponding eigenvalues $\lambda_{1}(M),\ldots,\lambda_{n}(M)$ . Note that there then exist two indices $j\neq l$ , say, such that $\lambda_{j}(M)\neq\lambda_{l}(M)$ and such that $v^{\prime}u_{j}\neq 0$ and $v^{\prime}u_{l}\neq 0$ (otherwise $v$ would be an eigenvector of $M$ ; recall that $v\neq 0$ ). Now, define the matrix $L_{\bot}=(u_{j},u_{l},z_{1},\ldots,z_{n-d-2})$ for $z_{1},\ldots,z_{n-d-2}$ linearly independent elements of $\mathop{\mathrm{s}pan}(u_{j},u_{l},v)^{\bot}$ (with the convention that $L_{\bot}=(u_{j},u_{l})$ if $n-d=2$ ; note that $n-d\geq 2$ holds by assumption). Such a choice of $z_{1},\ldots,z_{n-d-2}$ is possible as $d\geq 1$ by assumption. Note that $\mathop{\mathrm{r}ank}(L_{\bot})=n-d$ . Next, let $L$ be an $n\times d$ matrix with $\mathop{\mathrm{s}pan}(L)=\mathop{\mathrm{s}pan}(L_{\bot})^{\bot}$ . Then, $L$ is of full column rank, and $\Pi_{\mathop{\mathrm{s}pan}(L)^{\bot}}v\neq 0$ . From the discussion preceding the definition of $p_{M}$ we see that it thus remains to verify that $\Pi_{\mathop{\mathrm{s}pan}(L)^{\bot}}v$ is not an eigenvector of $\Pi_{\mathop{\mathrm{s}pan}(L)^{\bot}}M\Pi_{\mathop{\mathrm{s}pan}(L)^{\bot}}$ . But $\Pi_{\mathop{\mathrm{s}pan}(L)^{\bot}}v=\Pi_{\mathop{\mathrm{s}pan}((u_{j},u_{l}))}v=u_{j}^{\prime}vu_{j}+u_{l}^{\prime}vu_{l}$ , implying $\Pi_{\mathop{\mathrm{s}pan}(L)^{\bot}}M\Pi_{\mathop{\mathrm{s}pan}(L)^{\bot}}v=\lambda_{j}(M)u_{j}^{\prime}vu_{j}+\lambda_{l}(M)u_{l}^{\prime}vu_{l}$ . Hence, if $\Pi_{\mathop{\mathrm{s}pan}(L)^{\bot}}v$ was an eigenvector of $\Pi_{\mathop{\mathrm{s}pan}(L)^{\bot}}M\Pi_{\mathop{\mathrm{s}pan}(L)^{\bot}}$ , we would have

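$$\lambda_{j}(M)u_{j}^{\prime}vu_{j}+\lambda_{l}(M)u_{l}^{\prime}vu_{l}=c\left(u_{j}^{\prime}vu_{j}+u_{l}^{\prime}vu_{l}\right)$$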

for some $c\in\mathbb{R}$ , which gives the contradiction $\lambda_{j}(M)=\lambda_{l}(M)=c$ .

(b) Next we consider the case where $v$ is an eigenvector of $M$ to the eigenvalue $\lambda_{i}(M)$ , say. Let $u_{1},\ldots,u_{n}$ be an orthonormal basis of eigenvectors of $M$ corresponding to its eigenvalues $\lambda_{1}(M),\ldots,\lambda_{n}(M)$ , and where $u_{i}=v$ holds. By assumption, $M$ is not of the form $c_{1}I_{n}+c_{2}vv^{\prime}$ . Together with $v$ being an eigenvector of $M$ this implies (via a diagonalization argument) existence of two indices $j$ and $l$ , say, such that $i,j,l$ are pairwise distinct and such that $\lambda_{j}(M)\neq\lambda_{l}(M)$ . Now, define $L_{\bot}=(x,y,z_{1},\ldots,z_{n-d-2})$ where $x=v+u_{j}$ , $y=v+u_{l}$ and where $z_{1},\ldots,z_{n-d-2}$ are linearly independent elements of $\mathop{\mathrm{s}pan}(u_{j},u_{l},v)^{\bot}$ (with the convention that $L_{\bot}=(x,y)$ if $n-d=2$ ; recall that $n-d\geq 2$ holds by assumption). Such a construction is possible as $d\geq 1$ by assumption. Note that $\mathop{\mathrm{r}ank}(L_{\bot})=n-d$ . Define $L$ as an $n\times d$ matrix with $\mathop{\mathrm{s}pan}(L)=\mathop{\mathrm{s}pan}(L_{\bot})^{\bot}$ . Then, $L$ is of full column rank, and $\Pi_{\mathop{\mathrm{s}pan}(L)^{\bot}}v\neq 0$ . Arguing as in (a) it now remains to verify that $\Pi_{\mathop{\mathrm{s}pan}(L)^{\bot}}v$ is not an eigenvector of $\Pi_{\mathop{\mathrm{s}pan}(L)^{\bot}}M\Pi_{\mathop{\mathrm{s}pan}(L)^{\bot}}$ : It is easy to see that

[TABLE]

and that, using the expression in the previous display and a simple computation,

[TABLE]

Hence, for this choice of $L$ the vector $\Pi_{\mathop{\mathrm{s}pan}(L)^{\bot}}v$ is an eigenvector of $\Pi_{\mathop{\mathrm{s}pan}(L)^{\bot}}M\Pi_{\mathop{\mathrm{s}pan}(L)^{\bot}}$ if and only if

[TABLE]

for some $c\in\mathbb{R}$ . The number $c$ must then necessarily be nonzero. But this implies (premultiply both sides of (33) by $u_{j}^{\prime}$ , then by $u_{l}^{\prime}$ , and compare the two equations obtained) that $\lambda_{j}(M)=\lambda_{l}(M)$ , a contradiction.

Proof of Proposition 3.4.

We start with the claim that up to a $\mu_{\mathbb{R}^{n\times k}}$ -null set of exceptional matrices, every $X\in\mathbb{R}^{n\times k}$ satisfies (11). From $k<n$ it follows that $\mu_{\mathbb{R}^{n\times k}}(\{X\in\mathbb{R}^{n\times k}:\mathrm{rank}(X)<k\})=0$ . Hence, it suffices to show that

[TABLE]

is a $\mu_{\mathbb{R}^{n\times k}}$ -null set. We consider two cases:

(a) Suppose first that $M=c_{1}I_{n}+c_{2}ee^{\prime}$ for real numbers $c_{1},c_{2}$ where $c_{2}<0$ . Then, the set in Equation (34) simplifies to

[TABLE]

To see this note that in this case and for $X\in\mathbb{R}^{n\times k}$ so that $\mathop{\mathrm{r}ank}(X)=k$ we have

[TABLE]

where we used the assumption in (12) to obtain the first equality, and the specific structure of $M$ and $c_{2}<0$ to obtain the second equality. Thus, $C_{X}e\in\mathop{\mathrm{E}ig}\left(B(X),~{}\lambda_{n-k}(B(X))\right)$ is possible only if $C_{X}e=0$ , which is equivalent to $e\in\mathop{\mathrm{s}pan}(X)$ . Therefore, (34) simplifies to (35). But, by assumption $1\leq k<n$ holds, from which it is easy to see, noting that $\|e\|=1$ , that $\mu_{\mathbb{R}^{n\times k}}(\{X\in\mathbb{R}^{n\times k}:e\in\mathop{\mathrm{s}pan}(X)\})=0$ . Therefore, the set in (35), and equivalently the set in (34), is a $\mu_{\mathbb{R}^{n\times k}}$ -null set in this case.
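Case (a) can be checked numerically in the simplest instance in which $B(X)=C_{X}MC_{X}^{\prime}$; this choice is an assumption made only for the illustration, as are the dimensions and constants. With $c_{2}<0$, the vector $C_{X}e$ is an eigenvector of $B(X)$ belonging to the smallest eigenvalue, so it cannot lie in the eigenspace of the largest eigenvalue unless it vanishes.

```python
import numpy as np

rng = np.random.default_rng(3)
n, k = 7, 2
e = rng.standard_normal(n)
e /= np.linalg.norm(e)
X = rng.standard_normal((n, k))                    # full column rank with probability one
C = np.linalg.qr(X, mode="complete")[0][:, k:].T   # rows: orthonormal basis of span(X)^perp

c1, c2 = 2.0, -0.5                                 # c2 < 0 as in case (a)
M = c1 * np.eye(n) + c2 * np.outer(e, e)
B = C @ M @ C.T                                    # equals c1 I + c2 (C e)(C e)'

lam, U = np.linalg.eigh(B)                         # eigenvalues in ascending order
Ce = C @ e
top = U[:, np.isclose(lam, lam[-1])]               # basis of the top eigenspace
comp = top.T @ Ce                                  # coordinates of C e in that eigenspace
```

The vector `comp` is numerically zero, the largest eigenvalue equals `c1`, and `Ce` satisfies the eigenvalue equation for `lam[0]`, which equals `c1 + c2 * Ce @ Ce`.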

(b) Consider now the case where $M$ is not a linear combination of $I_{n}$ and $ee^{\prime}$ . Using Equation (12) we can write the set defined in (34) equivalently as

[TABLE]

For $X\in\mathbb{R}^{n\times k}$ of full column rank the property $C_{X}^{\prime}C_{X}=\Pi_{\mathop{\mathrm{s}pan}(X)^{\bot}}=\Pi_{\mathop{\mathrm{s}pan}(X)^{\bot}}^{2}$ can be used to verify that

[TABLE]

implies

[TABLE]

Thus, if $e\notin\mathop{\mathrm{s}pan}(X)$ then $\Pi_{\mathop{\mathrm{s}pan}(X)^{\bot}}e\neq 0$ , and $C_{X}e\in\mathrm{Eig}(C_{X}MC_{X}^{\prime},\lambda_{n-k}(C_{X}MC_{X}^{\prime}))$ implies that $\Pi_{\mathop{\mathrm{s}pan}(X)^{\bot}}e$ is an eigenvector of $\Pi_{\mathop{\mathrm{s}pan}(X)^{\bot}}M\Pi_{\mathop{\mathrm{s}pan}(X)^{\bot}}$ . Thus, the set in Equation (37) is contained in the union of the $\mu_{\mathbb{R}^{n\times k}}$ -null set $\{X\in\mathbb{R}^{n\times k}:e\in\mathop{\mathrm{s}pan}(X)\}$ and the set

[TABLE]

It thus remains to verify that the set in (38) is a $\mu_{\mathbb{R}^{n\times k}}$ -null set. Lemma B.1 (applied with $d=k$ and $v=e$ ) shows that (38) is a subset of an algebraic set. Note that the assumptions in Lemma B.1 are satisfied as $1\leq k<n-1$ is assumed. The lemma also provides the information that a multivariate polynomial defining this algebraic set does not vanish everywhere. Hence, it follows that the set in the previous display is contained in a $\mu_{\mathbb{R}^{n\times k}}$ -null set. Since the set is Borel measurable (cf., e.g., the representation obtained via Lemma B.1), it follows that it is itself a $\mu_{\mathbb{R}^{n\times k}}$ -null set.

We now prove the two remaining claims concerning $\mathscr{X}(\alpha;B)$ . For the monotonicity claim: If $\mathscr{X}(\alpha_{2};B)$ is empty, there is nothing to prove. Consider the case where $\mathscr{X}(\alpha_{2};B)\neq\emptyset$ . Let $X\in\mathscr{X}(\alpha_{2};B)$ . By definition of $\mathscr{X}(\alpha_{2};B)$ the matrix $X$ has full column rank and $\lambda_{1}(B(X))<\lambda_{n-k}(B(X))$ . From $0<\alpha_{1}\leq\alpha_{2}<1$ it thus follows from Lemma 2.2 that $\kappa(\alpha_{2})\leq\kappa(\alpha_{1})$ . Hence, $\Phi_{B(X),C_{X},\kappa(\alpha_{1})}\subseteq\Phi_{B(X),C_{X},\kappa(\alpha_{2})}$ and one obtains $X\in\mathscr{X}(\alpha_{1};B)$ . Finally, note that Lemma 3.3 shows that if $X$ satisfies (11), then $X\in\bigcup_{m\in\mathbb{N}}\mathscr{X}(\alpha_{m};B)$ . The first (already established) part of the current proposition hence proves the last claim.

Lemma B.2.

Let $M\in\mathbb{R}^{n\times n}$ be symmetric, let $v\in\mathbb{R}^{n}$ such that $\|v\|=1$ , and suppose that $M$ can not be written as $c_{1}I_{n}+c_{2}vv^{\prime}$ for real numbers $c_{1},c_{2}$ where $c_{2}\geq 0$ . Let $d\in\mathbb{N}$ such that $d<n-1$ . Then:

1. There exists a sequence $L_{m}\in\mathbb{R}^{n\times d}$ such that $L_{m}^{\prime}L_{m}=I_{d}$ and $L_{m}\to L^{*}$ as $m\to\infty$ , a vector $u\in\mathbb{R}^{n}$ with $\|u\|=1$ and a real number $c>\lambda_{\min}(M)$ , such that: $\Pi_{\mathop{\mathrm{s}pan}(L_{m})^{\bot}}v\neq 0$ and $\Pi_{\mathop{\mathrm{s}pan}(L_{m})^{\bot}}u\neq 0$ holds for every $m\in\mathbb{N}$ , such that

[TABLE]

and such that for every $m\in\mathbb{N}$ we have

[TABLE]

2. Let $B$ be a function from the set of full column rank $n\times d$ matrices to the set of symmetric $(n-d)\times(n-d)$ -dimensional matrices. Suppose there exists a function $F$ from the set of $(n-d)\times(n-d)$ matrices to itself, such that for every $L\in\mathbb{R}^{n\times d}$ of full column rank $B(L)=F(C_{L}MC_{L}^{\prime})$ holds for a suitable choice of $C_{L}\in\mathbb{R}^{(n-d)\times n}$ satisfying $C_{L}C_{L}^{\prime}=I_{n-d}$ and $C_{L}^{\prime}C_{L}=\Pi_{\mathop{\mathrm{s}pan}(L)^{\bot}}$ . Suppose further that $F$ is continuous at every element $A$ , say, of the closure of $\{C_{L}MC_{L}^{\prime}:L\in\mathbb{R}^{n\times d},~{}\mathop{\mathrm{r}ank}(L)=d\}\subseteq\mathbb{R}^{(n-d)\times(n-d)}$ , and that for every such $A$ we have

[TABLE]

Then, the sequence $L_{m}$ obtained in Part 1 satisfies $C_{L_{m}}v\neq 0$ for every $m\in\mathbb{N}$ ,

[TABLE]

and

[TABLE]

for some positive real number $\delta$ .

Proof.

Before we prove Part 1, we note that it suffices to verify the existence claim without the requirement that $L_{m}$ converges: Convergence of $L_{m}$ can then be achieved by passing to a subsequence.

1.a) Consider first the case where $v\in\mathop{\mathrm{E}ig}(M,\lambda_{\min}(M))$ : Let $u\in\mathop{\mathrm{E}ig}(M,\lambda_{\max}(M))$ such that $\|u\|=1$ , and set $L_{m,\bot}:=(u,v,w_{1},\ldots,w_{n-d-2})$ for $w_{1},\ldots,w_{n-d-2}$ linearly independent elements of $\mathop{\mathrm{s}pan}((u,v))^{\bot}$ (with the implicit understanding that $L_{m,\bot}=(u,v)$ in case $d=n-2$ ). By assumption $M$ is not a multiple of $I_{n}$ , thus $\lambda_{\min}(M)<\lambda_{\max}(M)$ , from which it also follows that $L_{m,\bot}$ has full column rank $n-d\geq 2$ for every $m\in\mathbb{N}$ . For every $m\in\mathbb{N}$ set $L_{m}$ equal to an $n\times d$ matrix such that $L_{m}^{\prime}L_{m}=I_{d}$ and $\mathop{\mathrm{s}pan}(L_{m})^{\bot}=\mathop{\mathrm{s}pan}(L_{m,\bot})$ . Then Equations (39) and (40) (with $c=\lambda_{\max}(M)$ ) follow immediately from $\Pi_{\mathop{\mathrm{s}pan}(L_{m})^{\bot}}v=v$ and $\Pi_{\mathop{\mathrm{s}pan}(L_{m})^{\bot}}u=u$ .

1.b) Next, we consider the case where $v\notin\mathop{\mathrm{E}ig}(M,\lambda_{\min}(M))$ : We first claim that there must exist an $x\in\mathop{\mathrm{E}ig}(M,\lambda_{\min}(M))$ such that $\|x\|=1$ and a vector $u\in\mathop{\mathrm{s}pan}(v,x)^{\bot}$ such that $\|u\|=1$ and such that $u^{\prime}Mu>\lambda_{\min}(M)$ . We argue by contradiction: First of all, if the claim was false, then $\mathop{\mathrm{d}im}(\mathop{\mathrm{E}ig}(M,\lambda_{\min}(M)))=n-1$ would follow. We could then choose $v_{1},\ldots,v_{n-1}$ an orthonormal basis of $\mathop{\mathrm{E}ig}(M,\lambda_{\min}(M))$ . Under the assumption that the above claim was wrong, it would further follow that $\mathop{\mathrm{s}pan}(v,v_{i})^{\bot}\subseteq\mathop{\mathrm{E}ig}(M,\lambda_{\min}(M))$ for every $i=1,\ldots,n-1$ , implying $\mathop{\mathrm{s}pan}(v,v_{i})^{\bot}\subseteq\mathop{\mathrm{s}pan}(v_{1},\ldots,v_{i-1},v_{i+1},\ldots,v_{n-1})$ for every $i=1,\ldots,n-1$ , which, by a dimension argument using $v\notin\mathop{\mathrm{E}ig}(M,\lambda_{\min}(M))$ , is equivalent to

[TABLE]

or equivalently

[TABLE]

Since $n\geq 3$ , setting $i=1$ and $i=2$ in the previous display then shows that $v$ is orthogonal to $v_{1},\ldots,v_{n-1}$ , and hence $\mathop{\mathrm{s}pan}(v)=\mathop{\mathrm{E}ig}(M,\lambda_{\max}(M))$ would follow. But then we could conclude that $M=\lambda_{\min}(M)I_{n}+(\lambda_{\max}(M)-\lambda_{\min}(M))vv^{\prime}$ , a contradiction. Now, let $x\in\mathop{\mathrm{E}ig}(M,\lambda_{\min}(M))$ be such that $\|x\|=1$ , and let $u\in\mathop{\mathrm{s}pan}(v,x)^{\bot}$ be a corresponding vector such that $\|u\|=1$ and such that $u^{\prime}Mu>\lambda_{\min}(M)$ . Let $b_{m}\neq 0$ be a sequence that converges to $0$ and such that $b_{m}\neq-v^{\prime}x$ holds for every $m\in\mathbb{N}$ . Then, we define $v_{m}:=x+b_{m}v~{}\bot~{}u$ and set $L_{m,\bot}:=(u,v_{m},w_{1},\ldots,w_{n-d-2})$ (with $L_{m,\bot}=(u,v_{m})$ in case $d=n-2$ ), for $w_{1},\ldots,w_{n-d-2}$ linearly independent elements of $\mathop{\mathrm{s}pan}(u,v,x)^{\bot}$ (which is possible as $d\geq 1$ ). As $v_{m}\neq 0$ follows from $b_{m}\neq-v^{\prime}x$ , the matrix $L_{m,\bot}$ has full column rank $n-d\geq 2$ for every $m\in\mathbb{N}$ . Now, for every $m\in\mathbb{N}$ set $L_{m}$ equal to an $n\times d$ matrix such that $L_{m}^{\prime}L_{m}=I_{d}$ and $\mathop{\mathrm{s}pan}(L_{m})^{\bot}=\mathop{\mathrm{s}pan}(L_{m,\bot})$ . Then

[TABLE]

where $a_{m}=(v^{\prime}x+b_{m})/(v_{m}^{\prime}v_{m})\neq 0$ holds for all $m$ . From $v_{m}\neq 0$ , we thus obtain $\Pi_{\mathop{\mathrm{s}pan}(L_{m})^{\bot}}v\neq 0$ for every $m\in\mathbb{N}$ . But $v_{m}\to x$ hence shows that

[TABLE]

which implies (39). Equation (40) follows because $u\in\mathop{\mathrm{s}pan}(L_{m})^{\bot}$ gives $\Pi_{\mathop{\mathrm{s}pan}(L_{m})^{\bot}}u=u$ , and since $u$ was chosen such that $\|u\|=1$ and $u^{\prime}Mu>\lambda_{\min}(M)$ .
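The construction in Part 1.b) can be traced numerically. The sketch assumes that the limit statement amounts to the Rayleigh quotient of the projected vector $\Pi_{\mathop{\mathrm{s}pan}(L_{m})^{\bot}}v$ with respect to $M$ converging to $\lambda_{\min}(M)$; it uses a generic symmetric $M$, the sequence $b_{m}=1/m$, and picks $u$ as a maximiser of the Rayleigh quotient on $\mathop{\mathrm{s}pan}(v,x)^{\bot}$, which is one admissible choice among many. The helpers `complement` and `rayleigh_of_projected_v` are hypothetical names.

```python
import numpy as np

def complement(A):
    """Columns: orthonormal basis of span(A)^perp, for A of full column rank."""
    return np.linalg.qr(A, mode="complete")[0][:, A.shape[1]:]

rng = np.random.default_rng(4)
n, d = 6, 2
S = rng.standard_normal((n, n))
M = (S + S.T) / 2
lam, U = np.linalg.eigh(M)
x = U[:, 0]                                   # unit eigenvector for lambda_min(M)
v = rng.standard_normal(n)
v /= np.linalg.norm(v)                        # generic v, not an eigenvector of M

N = complement(np.column_stack([v, x]))       # basis of span(v, x)^perp
u = N @ np.linalg.eigh(N.T @ M @ N)[1][:, -1] # unit vector there with largest u'Mu
Wc = complement(np.column_stack([u, v, x]))[:, : n - d - 2]

def rayleigh_of_projected_v(b):
    """Rayleigh quotient of the projection of v onto span(u, x + b v, w_1, ..., w_{n-d-2})."""
    v_m = x + b * v
    L_perp = np.column_stack([u, v_m, Wc])
    Qm = np.linalg.qr(L_perp)[0]              # orthonormal basis of span(L_perp)
    p = Qm @ (Qm.T @ v)                       # projection of v onto span(L_m)^perp
    return float(p @ M @ p / (p @ p))

vals = [rayleigh_of_projected_v(1.0 / m) for m in (1, 10, 100, 1000)]
```

The values in `vals` decrease towards `lam[0]`, while `u @ M @ u` stays bounded away from it, matching the two requirements placed on the sequence.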

2. Obviously, $C_{L_{m}}v\neq 0$ follows from $\Pi_{\mathop{\mathrm{s}pan}(L_{m})^{\bot}}v\neq 0$ . Consider first Equation (42). Let $m^{\prime}$ be an arbitrary subsequence of $m$ . Define $v_{m}:=C_{L_{m}}v/\|C_{L_{m}}v\|$ and $A_{m}:=C_{L_{m}}MC_{L_{m}}^{\prime}$ . Clearly $\|v_{m}\|=1$ , and $A_{m}$ is a norm-bounded sequence because $C_{L_{m}}C_{L_{m}}^{\prime}=I_{n-d}$ . The latter also implies

[TABLE]

Hence, we can choose a subsequence $m^{\prime\prime}$ of $m^{\prime}$ , say, along which $v_{m}$ and $A_{m}$ converge to $v_{*}$ and $A$ , say, respectively. Note that $\|v_{*}\|=1$ . Next, we use $C_{L_{m}}^{\prime}C_{L_{m}}=\Pi_{\mathop{\mathrm{s}pan}(L_{m})^{\bot}}$ to rewrite

[TABLE]

and use Equation (39) to conclude that along $m^{\prime\prime}$ we have $v_{m}^{\prime}A_{m}v_{m}\to v_{*}^{\prime}Av_{*}=\lambda_{\min}(M)$ . From Equation (48) we obtain $\lambda_{\min}(M)=\lambda_{\min}(A)$ , hence

[TABLE]

where the equality is obtained from (41). Finally, we observe that along $m^{\prime\prime}$ we have (using continuity of $F$ ) that $B(L_{m})=F(A_{m})\to F(A)$ , from which

[TABLE]

and $\lambda_{1}(B(L_{m}))\to\lambda_{1}(F(A))$ follows (along $m^{\prime\prime}$ ). Hence, we have shown that the statement in Equation (42) holds along the subsequence $m^{\prime\prime}$ of $m^{\prime}$ . But $m^{\prime}$ was arbitrary. Therefore, we are done.

For (43) we argue by contradiction. Note first that the limit inferior in (43) can not be infinite, because $B(L_{m})=F(C_{L_{m}}MC_{L_{m}}^{\prime})$ , and the continuity property of $F$ together with boundedness of $C_{L_{m}}MC_{L_{m}}^{\prime}$ . Now, assuming (43) were false, we could choose a subsequence $m^{\prime}$ of $m$ such that $\lambda_{n-k}(B(L_{m^{\prime}}))-\lambda_{1}(B(L_{m^{\prime}}))\to 0$ . Choose a subsequence $m^{\prime\prime}$ of $m^{\prime}$ along which $v_{m}$ just defined above, $u_{m}:=C_{L_{m}}u/\|C_{L_{m}}u\|$ (note that $C_{L_{m}}u\neq 0$ follows from $\Pi_{\mathop{\mathrm{s}pan}(L_{m})^{\bot}}u\neq 0$ ) and $A_{m}:=C_{L_{m}}MC_{L_{m}}^{\prime}$ converge to $v_{*}$ , $u_{*}$ and $A$ , respectively (where $v_{*}$ and $A$ might differ from the limits in the preceding paragraph where we established Equation (42)). Note also that $\|v_{*}\|=\|u_{*}\|=1$ . Recall that $B(L_{m})=F(A_{m})$ , note that

[TABLE]

and that, using $\lambda_{n-k}(F(A_{m^{\prime}}))-\lambda_{1}(F(A_{m^{\prime}}))\to 0$ together with continuity of $F$ at $A$ , the upper and lower bound in the previous display converge along $m^{\prime\prime}$ to $\lambda_{1}(F(A))$ . It follows that $u_{*}^{\prime}F(A)u_{*}=\lambda_{1}(F(A))$ , and hence $u_{*}\in\mathop{\mathrm{E}ig}(F(A),\lambda_{1}(F(A)))=\mathop{\mathrm{E}ig}(A,\lambda_{1}(A))$ , the equality following from Equation (41). But from Equation (40) we conclude that $\lambda_{\min}(M)<c=u_{m}^{\prime}A_{m}u_{m}=u_{*}^{\prime}Au_{*}=\lambda_{1}(A)$ holds. To arrive at a contradiction it suffices to show that $\lambda_{\min}(M)=\lambda_{\min}(A)$ . But (similarly to the argument given above in the proof of (42)) this follows from Equation (39), showing that $v_{m}^{\prime}A_{m}v_{m}\to v_{*}^{\prime}Av_{*}=\lambda_{\min}(M)$ along $m^{\prime\prime}$ , together with Equation (48).

Proof of Proposition 3.6.

We start with (1.): Let $\alpha\in(0,1)$ . Let $X_{m}$ be a sequence of $n\times k$ -dimensional orthonormal matrices converging to some $Z\in\mathbb{R}^{n\times k}$ orthonormal, such that $e\notin\mathop{\mathrm{s}pan}(X_{m})$ holds for every $m\in\mathbb{N}$ , such that

[TABLE]

and such that $\liminf_{m\to\infty}\lambda_{n-k}(B(X_{m}))-\lambda_{1}(B(X_{m}))=\delta>0$ , and where $\delta$ is a real number. Such a sequence exists as a consequence of Part 2 of Lemma B.2 (applied with $d=k$ and $v=e$ ). Without loss of generality, passing to a subsequence if necessary, we assume that $\lambda_{n-k}(B(X_{m}))-\lambda_{1}(B(X_{m}))>0$ holds for every $m\in\mathbb{N}$ . Denote by $\kappa_{m}$ the critical value $\kappa(\alpha)$ corresponding to $\Phi_{B(X_{m}),C_{X_{m}},\kappa(\alpha)}$ , cf. Lemma 2.2, and recall from that lemma that $\lambda_{1}(B(X_{m}))<\kappa_{m}<\lambda_{n-k}(B(X_{m}))$ then holds as $\alpha\in(0,1)$ . Passing to a subsequence if necessary, we can assume that $C_{X_{m}}$ converges to $D_{Z}$ , say, an $(n-k)\times n$ matrix the rows of which form an orthonormal basis of $\mathop{\mathrm{s}pan}(Z)^{\bot}$ . Recall the continuity property of $F$ and that $B(X_{m})=F(C_{X_{m}}MC_{X_{m}}^{\prime})$ . It follows that $B(X_{m})$ , $\lambda_{1}(B(X_{m}))$ , $\lambda_{n-k}(B(X_{m}))$ converge to $H:=F(D_{Z}MD_{Z}^{\prime})$ , $b:=\lambda_{1}(H)$ and $c:=\lambda_{n-k}(H)$ , respectively, with $c-b\geq\delta>0$ . Passing to another subsequence, if necessary, we can additionally achieve that $\kappa_{m}\to\kappa^{*}$ , say. Obviously, $b\leq\kappa^{*}\leq c$ holds. We now argue that $b<\kappa^{*}$ must hold: By the definition of $\kappa_{m}$

[TABLE]

Denoting by $G_{m}$ the cdf. of the image measure $P^{X_{m}}_{0,1,0}\circ T_{B(X_{m}),C_{X_{m}}}$ this implies $1-\alpha=G_{m}(\kappa_{m})$ . From Lemma B.4 of Preinerstorfer and Pötscher (2017) we obtain that the support of $P^{X_{m}}_{0,1,0}\circ T_{B(X_{m}),C_{X_{m}}}$ coincides with $[\lambda_{1}(B(X_{m})),\lambda_{n-k}(B(X_{m}))]$ , that $G_{m}$ is a continuous function, and that $G_{m}$ is strictly increasing on $[\lambda_{1}(B(X_{m})),\lambda_{n-k}(B(X_{m}))]$ . Hence, from $1-\alpha\in(0,1)$ , it follows that $G_{m}^{-1}(1-\alpha)=\kappa_{m}$ , where $G_{m}^{-1}$ denotes the quantile function corresponding to $G_{m}$ . It is easy to see that $G_{m}$ converges in distribution to the cdf. $G$ , say, of $P_{0,1,0}^{Z}\circ T_{H,D_{Z}}$ , where the function $T_{H,D_{Z}}:\mathbb{R}^{n}\to\mathbb{R}$ is defined as

[TABLE]

Again, Lemma B.4 of Preinerstorfer and Pötscher (2017) (with “ $B=H$ and $C_{X}=D_{Z}$ ”) shows that the support of $G$ is $[b,c]$ , that $G$ is continuous (recall that $c-b\geq\delta>0$ ), and that $G$ is strictly increasing on $[b,c]$ . This implies that the quantile function $G^{-1}$ corresponding to $G$ is continuous on $(0,1)$ , and that $G^{-1}(1-\alpha)>b$ . Using the convergence in distribution pointed out above, we conclude that the quantiles $\kappa_{m}=G_{m}^{-1}(1-\alpha)\to G^{-1}(1-\alpha)=\kappa^{*}>b$ . Using Equation (51) we can now conclude that there exists an $m_{*}\in\mathbb{N}$ such that $X_{m_{*}}=:X_{*}$ is of full column rank, such that $e\notin\mathop{\mathrm{s}pan}(X_{*})$ , such that $\lambda_{1}(B(X_{*}))<\lambda_{n-k}(B(X_{*}))$ , and such that $T_{B(X_{*}),C_{X_{*}}}(e)<\kappa_{m_{*}}$ (with $\kappa_{m_{*}}$ the critical value $\kappa(\alpha)$ corresponding to $\Phi_{B(X_{*}),C_{X_{*}},\kappa(\alpha)}$ and $\alpha\in(0,1)$ ). Theorem 3.2 establishes $X_{*}\in\mathscr{X}(\alpha;B)$ .

We now prove (2.): Recall that $X_{*}$ has full column rank and $e\notin\mathop{\mathrm{s}pan}(X_{*})$ . We conclude that both statements (i) $X$ is of full column rank and (ii) $e\notin\mathop{\mathrm{s}pan}(X)$ hold for every $X$ in an open set $\mathscr{N}$ , say, containing $X_{*}$ . We now claim that

[TABLE]

holds for every $X$ in an open set $\mathscr{O}\subseteq\mathscr{N}$ containing $X_{*}$ (that $X_{*}$ satisfies the display was just shown above). Arguing as above, this claim and Theorem 3.2 (together with Lemma 2.2) would imply $\mathscr{O}\subseteq\mathscr{X}(\alpha;B)$ , and we would be done. To prove the claim, it suffices to verify that $T_{B(X),C_{X}}(e)$ and $\overline{\kappa}(X)$ as in the previous display are (well defined) continuous functions of $X$ on a neighborhood of $X_{*}$ . First, in order to ensure via Lemma 2.2 that a $\overline{\kappa}(X)$ as in the previous display uniquely exists on a neighborhood of $X_{*}$ , we show that $\lambda_{1}(B(X))<\lambda_{n-k}(B(X))$ holds on an open subset of $\mathscr{N}$ containing $X_{*}$ . Recalling that $\lambda_{1}(B(X_{*}))<\lambda_{n-k}(B(X_{*}))$ , and noting that the map $y\mapsto C_{X_{*}}y$ is a surjection of $\mathbb{R}^{n}\backslash\mathop{\mathrm{s}pan}(X_{*})$ to $\mathbb{R}^{n-k}\backslash\{0\}$ , we conclude that there exist two vectors $y_{1}$ and $y_{2}$ in $\mathbb{R}^{n}\backslash\mathop{\mathrm{s}pan}(X_{*})$ such that

[TABLE]

holds. From the additional continuity property in (2.) it follows that $y_{1},y_{2}\notin\mathop{\mathrm{s}pan}(X)$ and $T_{B(X),C_{X}}(y_{1})<T_{B(X),C_{X}}(y_{2})$ hold on an open set $\mathscr{O}_{1}\ni X_{*}$ , say, such that $\mathscr{O}_{1}\subseteq\mathscr{N}$ , from which it follows that for every $X\in\mathscr{O}_{1}$ we have $\lambda_{1}(B(X))<\lambda_{n-k}(B(X))$ . From $\mathscr{O}_{1}\subseteq\mathscr{N}$ we conclude from Lemma 2.2 that a $\overline{\kappa}(X)$ satisfying the property to the right in penultimate display uniquely exists for every $X\in\mathscr{O}_{1}$ . Since $X\mapsto T_{B(X),C_{X}}(e)$ is continuous on $\mathscr{O}_{1}\subseteq\mathscr{N}$ by assumption, it remains to verify that $X\mapsto\overline{\kappa}(X)$ is continuous on $\mathscr{O}_{1}$ . Lemma B.4 of Preinerstorfer and Pötscher (2017) and the definition of $\overline{\kappa}(X)$ show that for $X\in\mathscr{O}_{1}$ we have $\overline{\kappa}(X)=F_{X}^{-1}(1-\alpha)$ , where $F_{X}$ denotes the cdf. of the image measure $P_{0,1,0}\circ T_{B(X),C_{X}}$ . It is easy to see (using the additional continuity condition in (2.)) that the map $X\mapsto F_{X}$ is continuous on $\mathscr{O}_{1}$ (equipping the co-domain with the topology of weak convergence). Furthermore, for every $X\in\mathscr{O}_{1}$ it holds (via Lemma B.4 in Preinerstorfer and Pötscher (2017)) that $P_{0,1,0}\circ T_{B(X),C_{X}}$ has support $[\lambda_{1}(B(X)),\lambda_{n-k}(B(X))]$ (which is non-degenerate), that the cdf. $F_{X}$ is continuous, and strictly increasing on $[\lambda_{1}(B(X)),\lambda_{n-k}(B(X))]$ . Hence, for every $X\in\mathscr{O}_{1}$ the quantile function $F_{X}^{-1}$ is continuous at $1-\alpha\in(0,1)$ . Continuity of $X\mapsto\overline{\kappa}(X)=F_{X}^{-1}(1-\alpha)$ on $\mathscr{O}_{1}$ follows.

Proof for the claim made in Remark 3.7.

We verify that for $B(X)=-(C_{X}\Sigma(\overline{\rho})C_{X}^{\prime})^{-1}$ , $\overline{\rho}\in(0,a)$ , and every $z\in\mathbb{R}^{n}$ the function $X\mapsto T_{B(X),C_{X}}(z)$ is continuous at every $X\in\mathbb{R}^{n\times k}$ of full column rank such that $z\notin\mathop{\mathrm{s}pan}(X)$ . Fix $z\in\mathbb{R}^{n}$ . Let $X$ be of full column rank such that $z\notin\mathop{\mathrm{s}pan}(X)$ , and let $X_{m}$ be a sequence converging to $X$ . Eventually, $X_{m}$ is of full column rank and satisfies $z\notin\mathop{\mathrm{s}pan}(X_{m})$ , hence we may assume that this is the case for the whole sequence. We need to show that as $m\to\infty$ we have $T_{B(X_{m}),C_{X_{m}}}(z)\to T_{B(X),C_{X}}(z)$ , or equivalently that

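$$\frac{z^{\prime}C_{X_{m}}^{\prime}(C_{X_{m}}\Sigma(\overline{\rho})C_{X_{m}}^{\prime})^{-1}C_{X_{m}}z}{z^{\prime}\Pi_{\operatorname{span}(X_{m})^{\bot}}z}\to\frac{z^{\prime}C_{X}^{\prime}(C_{X}\Sigma(\overline{\rho})C_{X}^{\prime})^{-1}C_{X}z}{z^{\prime}\Pi_{\operatorname{span}(X)^{\bot}}z}.$$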

Since $X$ is of full column rank, $z^{\prime}\Pi_{\operatorname{span}(X_{m})^{\bot}}z\to z^{\prime}\Pi_{\operatorname{span}(X)^{\bot}}z\neq 0$ obviously holds. For the numerators, let $m^{\prime}$ be an arbitrary subsequence of $m$, and choose $m^{\prime\prime}$ a subsequence of $m^{\prime}$ such that along $m^{\prime\prime}$ the sequence $C_{X_{m}}$ converges to $D$, say. Note that $D$ is necessarily orthonormal and $\operatorname{span}(D)=\operatorname{span}(X)^{\bot}$. Hence, along $m^{\prime\prime}$, noting that $\Sigma(\overline{\rho})$ is positive definite by assumption, we have $z^{\prime}C_{X_{m}}^{\prime}(C_{X_{m}}\Sigma(\overline{\rho})C_{X_{m}}^{\prime})^{-1}C_{X_{m}}z\to z^{\prime}D^{\prime}(D\Sigma(\overline{\rho})D^{\prime})^{-1}Dz$. Since $D=UC_{X}$ holds for an $(n-k)\times(n-k)$ orthonormal matrix $U$, say, it follows that

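$$z^{\prime}D^{\prime}(D\Sigma(\overline{\rho})D^{\prime})^{-1}Dz=z^{\prime}C_{X}^{\prime}U^{\prime}(UC_{X}\Sigma(\overline{\rho})C_{X}^{\prime}U^{\prime})^{-1}UC_{X}z=z^{\prime}C_{X}^{\prime}(C_{X}\Sigma(\overline{\rho})C_{X}^{\prime})^{-1}C_{X}z.$$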

Since the subsequence $m^{\prime}$ was arbitrary, we are done.
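The key algebraic step of this argument, namely that the quadratic form $z^{\prime}D^{\prime}(D\Sigma(\overline{\rho})D^{\prime})^{-1}Dz$ does not change when $C_{X}$ is replaced by $D=UC_{X}$ with $U$ orthonormal, can be checked numerically. The following is only a minimal sketch and not part of the paper: the AR(1)-type matrix standing in for $\Sigma(\overline{\rho})$, the dimensions, and the helper names `complement_basis` and `quad_form` are assumptions made for illustration.

```python
import numpy as np

def complement_basis(X):
    """Return an (n-k) x n matrix whose rows are an orthonormal basis of span(X)^perp."""
    n, k = X.shape
    Q, _ = np.linalg.qr(X, mode="complete")  # Q is an n x n orthogonal matrix
    return Q[:, k:].T                        # last n-k columns span span(X)^perp

def quad_form(M, Sigma, z):
    """Compute z' M' (M Sigma M')^{-1} M z for a matrix M of full row rank."""
    w = M @ z
    return float(w @ np.linalg.solve(M @ Sigma @ M.T, w))

rng = np.random.default_rng(0)
n, k, rho_bar = 8, 2, 0.6
X = rng.standard_normal((n, k))   # hypothetical design matrix
z = rng.standard_normal(n)

# Assumption: an AR(1)-type correlation matrix stands in for Sigma(rho_bar).
idx = np.arange(n)
Sigma = rho_bar ** np.abs(idx[:, None] - idx[None, :])

C = complement_basis(X)                                    # one choice of C_X
U, _ = np.linalg.qr(rng.standard_normal((n - k, n - k)))   # random orthonormal U
D = U @ C                                                  # another valid choice

value_C = quad_form(C, Sigma, z)
value_D = quad_form(D, Sigma, z)
print(value_C, value_D)
```

Both printed values should agree up to floating-point error, which is the invariance used in the last display of the proof.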

Appendix C Proofs for results in Section 4

Proof of Theorem 4.3:

Denote by $\bar{P}_{(\beta,\gamma),\sigma,\rho}$ the distribution induced by (1), but where $X$ is replaced by $\bar{X}=(X,e)$ (a matrix with column rank $k+1<n$), and where $\gamma$ is the regression coefficient corresponding to $e$. Note also that for every $\beta\in\mathbb{R}^{k}$, every $\sigma\in(0,\infty)$ and every $\rho\in[0,a)$ the measure $\bar{P}_{(\beta,0),\sigma,\rho}$ coincides with $P_{\beta,\sigma,\rho}$. Applying Corollary 2.22 in Preinerstorfer and Pötscher (2017) (recalling that $\bar{\kappa}(\alpha)\in(\lambda_{1}(\bar{B}),\lambda_{n-k-1}(\bar{B}))$ from the discussion preceding Equation (17), and acting as if $\bar{X}$ were the underlying design matrix), one then immediately obtains that for every $\beta\in\mathbb{R}^{k}$, every $\sigma\in(0,\infty)$ and every $\gamma\in\mathbb{R}$ it holds that

[TABLE]

Setting $\gamma=0$ then delivers the claim.

Proof of Proposition 4.5:

We proceed in 3 steps:

By a simple $G_{X}$ -invariance argument (recall A.1 and that $T_{C_{X}ee^{\prime}C_{X}^{\prime}}$ is $G_{X}$ -invariant) it suffices to verify that for every $\alpha\in(0,1)$ and every $\varepsilon\in(0,\alpha)$ there exists a $c(\alpha,\varepsilon)\in(0,\kappa(\varepsilon)]$ such that

[TABLE]

and such that for every $c^{\prime}\in(0,c(\alpha,\varepsilon))$ it holds that the supremum in the previous display is greater than $\alpha$ .

We claim that the non-increasing function $g:\mathbb{R}\to\mathbb{R}$ defined via

[TABLE]

is continuous. To verify this claim let $c\in\mathbb{R}$ , and let $c_{m}\to c$ be a real sequence. By the Dominated Convergence Theorem, to show that $g(c_{m})\to g(c)$ holds, it is enough to verify

[TABLE]

for $P_{0,1,0}$ -almost every $y\in\mathbb{R}^{n}$ . It suffices to verify that

[TABLE]

holds for $P_{0,1,0}$-almost every $y\in\mathbb{R}^{n}$. The statement in the previous display holds for every $y$ such that $T_{C_{X}ee^{\prime}C_{X}^{\prime}}(y)\neq c$. The claim now follows from $P_{0,1,0}(\{y\in\mathbb{R}^{n}:T_{C_{X}ee^{\prime}C_{X}^{\prime}}(y)=c\})=0$, which can be obtained from Part 1 of Lemma B.4 in Preinerstorfer and Pötscher (2017) upon noting that $\lambda_{1}(C_{X}ee^{\prime}C_{X}^{\prime})=0$ (recall that $k<n-1$) and that $0<\|C_{X}e\|^{2}=\lambda_{n-k}(C_{X}ee^{\prime}C_{X}^{\prime})$ (the inequality following from $e\notin\operatorname{span}(X)$).

Next, note that $\alpha-\varepsilon\leq g\leq 1$ (using A.2 for the lower bound). Observe that $g(0)=1$ follows from $1\geq g(0)\geq P_{0,1,0}(\Phi_{C_{X}ee^{\prime}C_{X}^{\prime},0})=1$, the last equality following from Part 1 of Lemma B.4 in Preinerstorfer and Pötscher (2017). Observe also that $g(\|C_{X}e\|^{2})=\alpha-\varepsilon$ follows from $\alpha-\varepsilon\leq g(\|C_{X}e\|^{2})\leq\alpha-\varepsilon+P_{0,1,0}(\Phi_{C_{X}ee^{\prime}C_{X}^{\prime},\|C_{X}e\|^{2}})=\alpha-\varepsilon$, the last equality following again from Part 1 of Lemma B.4 in Preinerstorfer and Pötscher (2017). From these two observations, monotonicity of $g$, and the continuity of $g$ it follows that $\{c\in\mathbb{R}:g(c)=\alpha\}$ is a closed interval contained in $(0,\|C_{X}e\|^{2})$. Define $c(\alpha,\varepsilon)$ as the lower endpoint of this closed interval. Equation (56) and thus Equation (24) follow. Furthermore, since $c(\alpha,\varepsilon)$ was defined as the lower endpoint, monotonicity of $g$ implies that every $c^{\prime}\in(0,c(\alpha,\varepsilon))$ must satisfy $g(c^{\prime})>g(c(\alpha,\varepsilon))=\alpha$. To finally show that $c(\alpha,\varepsilon)\leq\kappa(\varepsilon)$ holds, note first that $0<\kappa(\varepsilon)<\|C_{X}e\|^{2}$ follows from Lemma 2.2. Now suppose the opposite, i.e., $\kappa(\varepsilon)<c(\alpha,\varepsilon)$. Then $\kappa(\varepsilon)\in(0,c(\alpha,\varepsilon))$, so what was already shown gives $g(\kappa(\varepsilon))>\alpha$, which is obviously false (cf. the discussion surrounding (22)).
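The construction of $c(\alpha,\varepsilon)$ as the lower endpoint of $\{c:g(c)=\alpha\}$ suggests a simple numerical recipe: estimate $g$ by Monte Carlo under $P_{0,1,0}$ and bisect over $[0,\|C_{X}e\|^{2}]$. The sketch below is one possible way to do this and is not taken from the paper. It assumes that $g(c)$ is the null rejection probability of "reject if the initial level-$(\alpha-\varepsilon)$ test rejects or $T_{C_{X}ee^{\prime}C_{X}^{\prime}}(y)>c$", that tests reject for large values of $T_{B,C_{X}}(y)=y^{\prime}C_{X}^{\prime}BC_{X}y/\|C_{X}y\|^{2}$, and it uses a hypothetical design matrix, a normalized vector of ones for $e$, and an AR(1)-type $\Sigma(\overline{\rho})$ for the initial test.

```python
import numpy as np

def complement_basis(X):
    """Return an (n-k) x n matrix whose rows are an orthonormal basis of span(X)^perp."""
    n, k = X.shape
    Q, _ = np.linalg.qr(X, mode="complete")
    return Q[:, k:].T

def T(B, C, Y):
    """Evaluate y'C'BCy / ||Cy||^2 for every row y of Y."""
    W = Y @ C.T                                   # rows are C y
    return np.einsum("ij,jk,ik->i", W, B, W) / np.einsum("ij,ij->i", W, W)

rng = np.random.default_rng(1)
n, alpha, eps, N = 10, 0.05, 0.01, 100_000

# Hypothetical setup: quadratic-trend regressors and e proportional to the ones vector.
t = np.arange(n) - (n - 1) / 2
X = np.column_stack([t, t**2])
e = np.ones(n) / np.sqrt(n)
C = complement_basis(X)
Ce = C @ e

# Assumption: initial test based on B = -(C Sigma(rho_bar) C')^{-1}, AR(1)-type Sigma.
idx = np.arange(n)
Sigma = 0.5 ** np.abs(idx[:, None] - idx[None, :])
B_init = -np.linalg.inv(C @ Sigma @ C.T)
B_e = np.outer(Ce, Ce)                            # C e e' C'

Y = rng.standard_normal((N, n))                   # draws from P_{0,1,0}
T_init, T_e = T(B_init, C, Y), T(B_e, C, Y)
kappa_init = np.quantile(T_init, 1 - (alpha - eps))   # level alpha - eps
kappa_e = np.quantile(T_e, 1 - eps)                    # level eps

def g_hat(c):
    """Monte Carlo estimate of g(c): null rejection frequency of the combined test."""
    return np.mean((T_init > kappa_init) | (T_e > c))

# Bisection for the smallest c with g_hat(c) <= alpha; g_hat is non-increasing.
lo, hi = 0.0, float(Ce @ Ce)
for _ in range(60):
    mid = 0.5 * (lo + hi)
    if g_hat(mid) > alpha:
        lo = mid
    else:
        hi = mid
c_hat = hi
print(c_hat, kappa_e, g_hat(c_hat))
```

In this sketch `c_hat` plays the role of $c(\alpha,\varepsilon)$ and `kappa_e` that of $\kappa(\varepsilon)$. Because the two rejection regions overlap, the estimated cut-off typically lies strictly below `kappa_e`, in line with the inequality $c(\alpha,\varepsilon)\leq\kappa(\varepsilon)$ established above.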

Proof of Theorem 4.6:

1.) Let $\varepsilon\in(0,\alpha)$ . Obviously

[TABLE]

which shows that for every $\beta\in\mathbb{R}^{k}$ , every $\sigma\in(0,\infty)$ and every $\rho\in[0,a)$ we have

[TABLE]

From Proposition 4.5 we know that $0=\lambda_{1}(C_{X}ee^{\prime}C_{X}^{\prime})<c(\alpha,\varepsilon)<\lambda_{n-k}(C_{X}ee^{\prime}C_{X}^{\prime})=\|C_{X}e\|^{2}$ . We can therefore use Lemma 2.2 (with $B=C_{X}ee^{\prime}C_{X}^{\prime}$ ) to conclude that $c(\alpha,\varepsilon)=\kappa(\alpha^{*})$ for some $\alpha^{*}\in(0,1)$ , and apply Theorem 4.1 to conclude that for every $\beta\in\mathbb{R}^{k}$ and every $\sigma\in(0,\infty)$ we have $\lim_{\rho\to a}P_{\beta,\sigma,\rho}(\Phi_{C_{X}ee^{\prime}C_{X}^{\prime},c(\alpha,\varepsilon)})=1$ , which together with the lower bound in the previous display proves the claim.

2.) Using $G_{X}$ -invariance of $\varphi^{*}_{\alpha,\varepsilon}$ (for every $\varepsilon\in(0,\alpha)$ ) and of $\varphi_{\alpha}$ , together with $\|\Sigma(\rho)\|>0$ for every $\rho\in[0,a)$ , it suffices to verify that

[TABLE]

Let $\varepsilon_{m}\to 0$ be a sequence in $(0,\alpha)$ and let $\rho_{m}$ be a sequence in $A$ . For convenience, set $\sigma_{m}:=\|\Sigma(\rho_{m})\|^{-1/2}$ . We verify that

[TABLE]

Let $m^{\prime}$ be an arbitrary subsequence of $m$. By compactness of the unit sphere in $\mathbb{R}^{n\times n}$, we can choose a subsequence $m^{\prime\prime}$ of $m^{\prime}$ along which $\|\Sigma(\rho_{m})\|^{-1}\Sigma(\rho_{m})$ converges to a symmetric matrix $\Gamma$, say, which due to the additional assumption on the set $A$ is positive definite. It follows from Scheffé’s lemma that along $m^{\prime\prime}$ the sequence $P_{0,\sigma_{m},\rho_{m}}$ (i.e., the Gaussian probability measure with mean $0$ and covariance matrix $\|\Sigma(\rho_{m})\|^{-1}\Sigma(\rho_{m})$) converges in total variation distance to $Q$, a Gaussian probability measure with mean $0$ and covariance matrix $\Gamma$. Obviously $|E_{0,\sigma_{m},\rho_{m}}(\varphi^{*}_{\alpha,\varepsilon_{m}}-\varphi_{\alpha})|\leq 2E_{0,\sigma_{m},\rho_{m}}(.5|\varphi^{*}_{\alpha,\varepsilon_{m}}-\varphi_{\alpha}|)$. By, e.g., Lemma 2.3 in Strasser (1985) and since $.5|\varphi^{*}_{\alpha,\varepsilon_{m}}-\varphi_{\alpha}|$ is a sequence of tests, it follows from the total variation convergence established above that along $m^{\prime\prime}$ we have

[TABLE]

where $E_{Q}$ denotes expectation w.r.t. $Q$ . We now claim that

[TABLE]

This claim, if true, then implies Equation (62) as the subsequence $m^{\prime}$ we started with was arbitrary. We first show that the sequence in the previous display converges to $0$ when the expectation is taken w.r.t. $P_{0,1,0}$ instead of $Q$. To this end write

[TABLE]

From A.3 and the Dominated Convergence Theorem we obtain $E_{0,1,0}[|\varphi_{\alpha-\varepsilon_{m}}-\varphi_{\alpha}|]\to 0$ . It remains to show that $E_{0,1,0}(\psi_{m})\to 0$ for $\psi_{m}:=(1-\varphi_{\alpha-\varepsilon_{m}}(y))\mathbf{1}_{\Phi_{C_{X}ee^{\prime}C_{X}^{\prime},c(\alpha,\varepsilon_{m})}}\geq 0$ . By construction and A.2, however, we have $E_{0,1,0}(\varphi^{*}_{\alpha,\varepsilon_{m}})=\alpha=E_{0,1,0}(\varphi_{\alpha})$ . Therefore, the preceding display shows that $-E_{0,1,0}[\varphi_{\alpha-\varepsilon_{m}}-\varphi_{\alpha}]=E_{0,1,0}(\psi_{m})$ . The statement hence follows from $E_{0,1,0}[\varphi_{\alpha-\varepsilon_{m}}-\varphi_{\alpha}]\to 0$ . Now, suppose (64) were false. Then, there would exist a subsequence $m^{\star}$ of $m$ along which the sequence in (64) converges to $b>0$ , say. Since $E_{0,1,0}(|\varphi^{*}_{\alpha,\varepsilon_{m^{\star}}}-\varphi_{\alpha}|)\to 0$ , there exists a subsequence $m^{\star\star}$ of $m^{\star}$ and a set $N$ such that $P_{0,1,0}(N)=0$ , and such that for every $y\in\mathbb{R}^{n}\backslash N$ it holds that $|\varphi^{*}_{\alpha,\varepsilon_{m^{\star\star}}}(y)-\varphi_{\alpha}(y)|\to 0$ (cf., e.g., Theorem 3.12 in Rudin (1987)). From positive-definiteness of $\Gamma$ it follows, however, that $Q(N)=0$ , and (by the Dominated Convergence Theorem) that $0=\lim_{m^{\star\star}\to\infty}E_{Q}(|\varphi^{*}_{\alpha,\varepsilon_{m^{\star\star}}}-\varphi_{\alpha}|)=b$ , a contradiction.
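Part 1 of the proof rests on the fact that the rejection probability of $\Phi_{C_{X}ee^{\prime}C_{X}^{\prime},c}$ tends to one as $\rho\to a$ for any cut-off $c$ strictly below $\|C_{X}e\|^{2}$. A small simulation can make this visible. The following is only an illustrative sketch under assumptions not taken from the paper: AR(1)-type correlations with $a=1$, $e$ proportional to the vector of ones, a hypothetical design matrix, and an arbitrary cut-off $c=0.9\,\|C_{X}e\|^{2}$ in place of $c(\alpha,\varepsilon)$.

```python
import numpy as np

def complement_basis(X):
    """Return an (n-k) x n matrix whose rows are an orthonormal basis of span(X)^perp."""
    n, k = X.shape
    Q, _ = np.linalg.qr(X, mode="complete")
    return Q[:, k:].T

def rejection_frequency(rho, c, C, Ce, rng, draws=20_000):
    """Monte Carlo frequency of {T_{C e e' C'}(y) > c} when y ~ N(0, Sigma(rho))."""
    n = C.shape[1]
    idx = np.arange(n)
    # Assumption: AR(1)-type correlation matrix stands in for Sigma(rho), with a = 1.
    Sigma = rho ** np.abs(idx[:, None] - idx[None, :])
    Y = rng.standard_normal((draws, n)) @ np.linalg.cholesky(Sigma).T
    W = Y @ C.T                                        # rows are C_X y
    T_e = (W @ Ce) ** 2 / np.einsum("ij,ij->i", W, W)  # y'C'(Ce)(Ce)'Cy / ||Cy||^2
    return float(np.mean(T_e > c))

rng = np.random.default_rng(2)
n = 10
t = np.arange(n) - (n - 1) / 2
X = np.column_stack([t, t**2])       # hypothetical regressors, e not in span(X)
e = np.ones(n) / np.sqrt(n)
C = complement_basis(X)
Ce = C @ e
c = 0.9 * float(Ce @ Ce)             # hypothetical cut-off below ||C_X e||^2

rhos = [0.0, 0.9, 0.99, 0.9999, 0.999999]
freqs = [rejection_frequency(r, c, C, Ce, rng) for r in rhos]
for r, f in zip(rhos, freqs):
    print(f"rho = {r:<9} rejection frequency = {f:.3f}")
```

Under these assumptions the rejection frequency is small at $\rho=0$ and approaches one as $\rho$ approaches $1$, which is the behavior that the lower bound in part 1 transfers to the modified test $\varphi^{*}_{\alpha,\varepsilon}$.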

Bibliography

1. Davies, R. B. (1980). Algorithm AS 155: The distribution of a linear combination of $\chi^{2}$ random variables. Journal of the Royal Statistical Society. Series C (Applied Statistics) 29 (3), 323–333.
2. Fan, J., Y. Liao, and J. Yao (2015). Power enhancement in high-dimensional cross-sectional tests. Econometrica 83 (4), 1497–1541.
3. Horn, R. A. and C. R. Johnson (1985). Matrix analysis. Cambridge: Cambridge University Press.
4. King, M. L. and G. H. Hillier (1985). Locally best invariant tests of the error covariance matrix of the linear regression model. Journal of the Royal Statistical Society. Series B (Methodological) 47, 98–102.
5. Kleiber, C. and W. Krämer (2005). Finite-sample power of the Durbin-Watson test against fractionally integrated disturbances. Econometrics Journal 8 (3), 406–417.
6. Kock, A. B. and D. Preinerstorfer (2017). Power in high-dimensional testing problems. arXiv preprint arXiv:1709.04418.
7. Krämer, W. (1985). The power of the Durbin-Watson test for regressions without an intercept. Journal of Econometrics 28 (3), 363–370.
8. Krämer, W. (2005). Finite sample power of Cliff-Ord-type tests for spatial disturbance correlation in linear regression. Journal of Statistical Planning and Inference 128 (2), 489–496.


