How to avoid the zero-power trap in testing for correlation
David Preinerstorfer

ECARES and SBS-EM
Université libre de Bruxelles
(August 2018)
Abstract
In testing for correlation of the errors in regression models the power of tests can be very low for strongly correlated errors. This counterintuitive phenomenon has become known as the “zero-power trap”. Despite a considerable amount of literature devoted to this problem, mainly focusing on its detection, a convincing solution has not yet been found. In this article we first discuss theoretical results concerning the occurrence of the zero-power trap phenomenon. Then, we suggest and compare three ways to avoid it. Given an initial test that suffers from the zero-power trap, the method we recommend for practice leads to a modified test whose power converges to one as the correlation gets very strong. Furthermore, the modified test has approximately the same power function as the initial test, and thus approximately preserves all of its optimality properties. We also provide some numerical illustrations in the context of testing for network generated correlation.
1 Introduction
Testing whether the errors in a regression model are uncorrelated is a standard problem in econometrics. For many forms of correlation under the alternative there are well-established tests available. Two prominent examples are the Durbin-Watson test for serial autocorrelation, and the Cliff-Ord test for spatial autocorrelation. Nevertheless, this type of testing problem is not completely solved, not even in the Gaussian case. This is partly because tests for correlation, including the well-established tests just mentioned, do not always behave as they ideally should in finite samples: whereas the size of most tests can be easily controlled, at least under suitable distributional assumptions such as Gaussianity, their power function can attain very small values in regions of the alternative where the correlation is very strong. This, however, conflicts with the intuition that strong correlations should be easily detectable from the data, i.e., that the power of a test for correlation should be close to one if the degree of correlation in the errors is very strong.
That the power function of a test for correlation can drop to zero as the correlation increases was first formally established in Krämer (1985), who considered the power function of the Durbin-Watson test in testing for serial autocorrelation. The results in Krämer (1985) were extended in later work by Zeisel (1989), Krämer and Zeisel (1990) and Löbus and Ritter (2000). Kleiber and Krämer (2005) obtained similar results for the Durbin-Watson test when the disturbances are fractionally integrated. Krämer (2005) proved related results for Cliff-Ord-type tests in case the regression errors are spatially autocorrelated. A unifying general theory that neither relies on the specific form of correlation nor on very special structural properties of the tests was developed recently in Martellosio (2010) and Preinerstorfer and Pötscher (2017). We refer the interested reader to the latter articles for formal results and a thorough discussion of the literature.
The major practical value of the just mentioned articles is of a diagnostic nature: they provide conditions which depend on observable quantities only and which let a user decide whether a particular test is subject to the zero-power trap, i.e., whether its power function drops to zero as the correlation increases. This is important, because if it turns out that an initial test is subject to this trap, one may want to use another test. However, one is then confronted with the problem of finding a test that avoids the zero-power trap. One complication is as follows: Typically, the initial test was chosen for a reason, i.e., for its “optimal” power properties in certain regions of the parameter space (think of a locally best invariant test). In such situations, one would not just like to use some other test that avoids the zero-power trap. Rather, one would prefer to slightly modify the initial test in such a way that its optimality properties are preserved, at least approximately, but such that its modified version does not suffer from the zero-power trap. Compared to the literature on diagnostic tools for detecting the zero-power trap, far less attention has been paid to the question of how to construct tests that do not suffer from it. Furthermore, it is not clear how to obtain said “optimality-preserving” modifications. The main contribution of the present article is to fill this gap. In the following paragraphs we provide an overview of the article’s structure together with a more detailed summary of our contributions.
In Section 2 we introduce the framework: the model and the testing problem, some notational conventions and an important class of tests. In Section 3 we formally define the zero-power trap phenomenon, obtain some sufficient conditions for it from results in Preinerstorfer and Pötscher (2017), and then consider in our general framework the question of how often, i.e., for “how many” design matrices, the zero-power trap actually arises. We answer this question in Propositions 3.4 and 3.6. The former proposition proves (and generalizes) an observation already made in the discussion section of Krämer (1985). The latter proposition is obtained by generalizing an argument in Martellosio (2012), who considered the same question in a spatial autoregressive setting. Essentially, these two propositions show (for the tests based on the specific family of test statistics and the corresponding critical values considered) respectively that (i) the zero-power trap arises for generic design matrices (i.e., up to a Lebesgue null set of exceptional matrices) for small enough significance levels; and (ii) for any critical value that leads to a size in $(0,1)$ there exists an open set of design matrices for which the zero-power trap arises.
In Section 4 we present three ways to avoid the zero-power trap: In Section 4.1 we briefly discuss a test for which Preinerstorfer and Pötscher (2017) have shown that it does not suffer from the zero-power trap. This test typically does not have very favorable power properties, apart from the fact that it avoids the zero-power trap. We shall mainly use it later as a building block in our construction of “optimality-preserving” tests. In Section 4.2 we discuss tests that incorporate artificial regressors to avoid the zero-power trap. The suggestion of adding artificial regressors to the regression and to use “optimal” tests in this expanded model is present already in Krämer (1985), who observed numerically that adding the intercept to a regression without intercept helps to avoid the zero-power trap for the Durbin-Watson test. Our theoretical results in Section 4.2 exploit results in Preinerstorfer and Pötscher (2017), and are related to the methods in Preinerstorfer and Pötscher (2016) and Preinerstorfer (2017), who considered the construction of tests with good size and power properties for testing restrictions on the regression coefficient vector. While the tests in Section 4.2 are “optimality-preserving” to some extent (more specifically they often have the same optimality property as initial tests, but within a smaller class of tests), it turns out that this solution to the zero-power trap is not ideal. For example, the power function of these tests does not increase to one as the strength of the correlation increases (which is the case for the approach outlined in Section 4.1).
In Section 4.3 we construct optimality-preserving modifications avoiding the zero-power trap out of an initial test that suffers from the zero-power trap. Our approach overcomes the limitations of the approaches discussed in Sections 4.1 and 4.2. In particular, our method leads to tests that have approximately the same power properties as the initial test. Furthermore, their power converges to one as the strength of the correlation increases. The construction is inspired by the power enhancement principle of Fan et al. (2015) in the formulation used in Section 3 of Kock and Preinerstorfer (2017). The basic idea of this principle is to improve the asymptotic power of an initial test by using another test, a power enhancement component, which has better asymptotic power properties than the initial test in certain regions of the alternative. Since the theory in Fan et al. (2015) and Kock and Preinerstorfer (2017) is asymptotic, and the present article is concerned exclusively with finite sample properties, their results do not apply here. Nevertheless, we can adapt the underlying heuristic to our context: given an initial test that suffers from the zero-power trap, but has favorable power properties in other regions of the alternative, we “combine” this initial test with the test from Section 4.1 to obtain an “enhanced” test.
In Section 5 we compare the approaches for avoiding the zero-power trap discussed in Section 4 numerically. We reconsider an example in Krämer (2005) in which the Cliff-Ord test turns out to suffer from the zero-power trap. Section 6 concludes. All proofs are collected in Appendices A-C.
2 Framework
In the present section we introduce the model, the testing problem and some notation, and we discuss an important class of tests. Most of the notational conventions and terminology we use are standard, and coincide to a large extent with the ones in Preinerstorfer and Pötscher (2017). We repeat them here for the convenience of the reader.
2.1 Model and testing problem
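We consider the linear model
$$y = X\beta + u, \tag{1}$$
where $X \in \mathbb{R}^{n \times k}$ is a non-stochastic matrix of rank $k$ with $0 \leq k < n$, and where $\beta \in \mathbb{R}^{k}$ is the regression coefficient vector. The disturbance vector $u$ is assumed to be Gaussian with mean zero and covariance matrix $\sigma^2 \Sigma(\rho)$. Here $\Sigma(\cdot)$ is a known function from $[0,a)$ to the set of symmetric and positive definite $n \times n$ matrices, and $a$ is a prespecified positive real number. Without loss of generality we assume throughout that $\Sigma(0)$ equals the identity matrix $I_n$. The parameters $\beta \in \mathbb{R}^{k}$, $\sigma \in (0,\infty)$ and $\rho \in [0,a)$ are unknown.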
The Gaussianity assumption could be relaxed considerably. It is imposed mainly to avoid technical conditions that do not deliver deeper insights into the problem. For example, we could replace the Gaussianity assumption by the assumption that the distribution of the error vector is elliptically symmetric without changing any of our results. This and other generalizations are discussed in detail in Section 3 of Preinerstorfer and Pötscher (2017).
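Denoting the Gaussian probability measure with mean $X\beta$ and covariance matrix $\sigma^2 \Sigma(\rho)$ by $P_{\beta,\sigma,\rho}$, we see that the model (1) induces the parametric family of distributions
$$\left\{ P_{\beta,\sigma,\rho} : \beta \in \mathbb{R}^{k},\ 0 < \sigma < \infty,\ \rho \in [0,a) \right\} \tag{2}$$
on the sample space $\mathbb{R}^{n}$ equipped with its Borel $\sigma$-algebra. The expectation operator with respect to (w.r.t.) $P_{\beta,\sigma,\rho}$ will be denoted by $E_{\beta,\sigma,\rho}$. Note that the set of probability measures in the previous display is dominated by Lebesgue measure on the Borel sets of $\mathbb{R}^{n}$, because $\Sigma(\rho)$ is positive definite for every $\rho \in [0,a)$ by assumption.
In the family of distributions (2) we are interested in the testing problem $\rho = 0$ against $\rho > 0$. More precisely, the testing problem is
$$H_0: \rho = 0 \quad \text{against} \quad H_1: \rho \in (0,a), \tag{3}$$
with the implicit understanding that always $\beta \in \mathbb{R}^{k}$ and $0 < \sigma < \infty$. In this testing problem the parameter $\rho$ is the target of inference, and the regression coefficient vector $\beta$ and the parameter $\sigma$ are nuisance parameters.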
Two specific examples that received a considerable amount of attention in the econometrics literature and which fit into the above framework are testing for positive serial autocorrelation and testing for spatial autocorrelation, cf. Examples 2.1 and 2.2 in Preinerstorfer and Pötscher (2017) for details and a discussion of related literature. See also Section 5 below for more information on testing for spatial autocorrelation and related numerical results.
2.2 Notation, invariance and an important class of tests
2.2.1 Notation
All matrices we shall consider are real matrices, the transpose of a matrix $A$ is denoted by $A'$, and the space spanned by the columns of $A$ is denoted by $\operatorname{span}(A)$. Given a linear subspace $L$ of $\mathbb{R}^{n}$, the symbol $\Pi_L$ denotes the orthogonal projection onto $L$, and $L^{\bot}$ denotes the orthogonal complement of $L$. Given an $n \times m$ matrix $Z$ of rank $m$ with $0 \leq m < n$, we denote by $C_Z$ a matrix in $\mathbb{R}^{(n-m) \times n}$ such that $C_Z C_Z' = I_{n-m}$ and $C_Z' C_Z = \Pi_{\operatorname{span}(Z)^{\bot}}$, where $I_r$ denotes the identity matrix of dimension $r$. We observe that every matrix whose rows form an orthonormal basis of $\operatorname{span}(Z)^{\bot}$ satisfies these two conditions and vice versa. Hence, any two choices for $C_Z$ are related by premultiplication by an orthogonal matrix. Let $l$ be a positive integer. If $A$ is an $l \times l$ matrix and $\lambda$ is an eigenvalue of $A$ we denote the corresponding eigenspace by $\operatorname{Eig}(A, \lambda)$. The eigenvalues of a symmetric matrix $B \in \mathbb{R}^{l \times l}$ ordered from smallest to largest and counted with their multiplicities are denoted by $\lambda_1(B) \leq \ldots \leq \lambda_l(B)$. We shall sometimes denote $\lambda_1(B)$ by $\lambda_{\min}(B)$, and $\lambda_l(B)$ by $\lambda_{\max}(B)$. Lebesgue measure on the Borel $\sigma$-algebra of $\mathbb{R}^{n}$ shall be denoted by $\mu_{\mathbb{R}^{n}}$, and Pr is used as a generic symbol for a probability measure. The Euclidean norm of a vector is denoted by $\|\cdot\|$, a symbol that is also used to denote a matrix norm.
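To make the matrix whose rows form an orthonormal basis of the orthogonal complement of the column space of the design matrix more concrete, the following minimal Python sketch constructs one such matrix from a complete QR decomposition and checks the two defining identities numerically. The function name `complement_basis`, the random design matrix and the dimensions are illustrative assumptions and not part of the article.

```python
import numpy as np


def complement_basis(X):
    """Return a matrix whose rows are an orthonormal basis of span(X)^perp.

    Assumes X has full column rank k with 1 <= k < n.
    """
    n, k = X.shape
    # Complete QR: the first k columns of Q span span(X),
    # the remaining n - k columns span its orthogonal complement.
    Q, _ = np.linalg.qr(X, mode="complete")
    return Q[:, k:].T


rng = np.random.default_rng(0)
n, k = 8, 3
X = rng.standard_normal((n, k))
C = complement_basis(X)

# Orthogonal projection onto span(X)^perp, computed without using C.
P_perp = np.eye(n) - X @ np.linalg.solve(X.T @ X, X.T)

print("C C' = I      :", np.allclose(C @ C.T, np.eye(n - k)))
print("C' C = P_perp :", np.allclose(C.T @ C, P_perp))
```

Any other valid choice differs from the matrix returned here only by premultiplication by an orthogonal matrix, so the QR-based choice is merely a convenient one.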
2.2.2 Invariance, an important class of tests, and size-controlling critical values
Given a matrix $X \in \mathbb{R}^{n \times k}$ with column rank $k$ and where $k < n$, define the group of bijective transformations (the group action being composition of functions)
$$G_X := \left\{ g_{\gamma,\theta} : \gamma \in \mathbb{R} \setminus \{0\},\ \theta \in \mathbb{R}^{k} \right\},$$
where $g_{\gamma,\theta}$ denotes the function $y \mapsto \gamma y + X\theta$.
Under our distributional assumptions (and if additionally all parameters of the model are identifiable) the testing problem in Equation (3) is invariant w.r.t. the group $G_X$ (cf. Section 6 in Lehmann and Romano (2005)). It thus appears reasonable to consider tests that are $G_X$-invariant, a property shared by most commonly used tests. Recall that a function $f$ defined on the sample space (e.g., a test or a test statistic) is called invariant w.r.t. $G_X$ if and only if for every $y \in \mathbb{R}^{n}$ and every $g \in G_X$ it holds that $f(y) = f(g(y))$. A subset $A$ of $\mathbb{R}^{n}$ will be called invariant w.r.t. $G_X$ if the indicator function $\mathbf{1}_A$ is $G_X$-invariant.
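In addition to being $G_X$-invariant, most tests for (3) used in practice are non-randomized, i.e., they are indicator functions of Borel sets, their corresponding rejection regions. An important class of such tests is based on rejection regions of the form
$$\Phi_{B,\kappa} := \left\{ y \in \mathbb{R}^{n} : T_B(y) > \kappa \right\},$$
where $\kappa \in \mathbb{R}$ is a critical value and the test statistic
$$T_B(y) := \begin{cases} \dfrac{y' C_X' B C_X y}{\|C_X y\|^{2}} & \text{if } y \notin \operatorname{span}(X), \\ \lambda_1(B) & \text{if } y \in \operatorname{span}(X). \end{cases} \tag{5}$$
Here $B \in \mathbb{R}^{(n-k) \times (n-k)}$ is a symmetric matrix, which typically depends on $X$ and the function $\Sigma(\cdot)$. Recall that the matrix $C_X$ satisfies $C_X C_X' = I_{n-k}$ and $C_X' C_X = \Pi_{\operatorname{span}(X)^{\bot}}$ (cf. Section 2.2.1). Clearly, the test statistic $T_B$ is $G_X$-invariant. Note furthermore that in case $\lambda_1(B) = \lambda_{n-k}(B)$ the test statistic is constant everywhere on $\mathbb{R}^{n}$. Therefore, such a choice of $B$ is uninteresting for practical purposes. Note also that assigning the value $\lambda_1(B)$ (instead of any other value) to the test statistic on $\operatorname{span}(X)$ has no effect on rejection probabilities, because $P_{\beta,\sigma,\rho}$ is absolutely continuous w.r.t. $\mu_{\mathbb{R}^{n}}$ for every $\beta \in \mathbb{R}^{k}$, $\sigma \in (0,\infty)$ and $\rho \in [0,a)$, and $\operatorname{span}(X)$ being of dimension $k < n$ implies $\mu_{\mathbb{R}^{n}}(\operatorname{span}(X)) = 0$.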
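The invariance of a test statistic of this ratio form can be checked numerically. The sketch below assumes that the statistic is the quadratic form in the projected data divided by the squared norm of the projected data, and that the group consists of the maps that rescale the data by a nonzero factor and add an element of the column space of the design matrix. The helper names, the design matrix and the symmetric matrix `B` are hypothetical illustrations.

```python
import numpy as np


def complement_basis(X):
    # Rows: orthonormal basis of the orthogonal complement of span(X).
    n, k = X.shape
    Q, _ = np.linalg.qr(X, mode="complete")
    return Q[:, k:].T


def ratio_statistic(y, X, B):
    """(C y)' B (C y) / ||C y||^2, defined for y outside span(X)."""
    z = complement_basis(X) @ y
    return float(z @ B @ z) / float(z @ z)


rng = np.random.default_rng(1)
n, k = 10, 2
X = np.column_stack([np.ones(n), np.arange(n, dtype=float)])

# An arbitrary symmetric (n - k) x (n - k) matrix, for illustration only.
M = rng.standard_normal((n - k, n - k))
B = (M + M.T) / 2

y = rng.standard_normal(n)
gamma, theta = -3.5, np.array([2.0, -1.0])

t0 = ratio_statistic(y, X, B)
t1 = ratio_statistic(gamma * y + X @ theta, X, B)  # transformed data
print("invariant:", np.isclose(t0, t1))
```

Because the statistic is a Rayleigh quotient of `B`, its values always lie between the smallest and the largest eigenvalue of `B`, which is why only critical values in that range are of interest.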
The following remark discusses two particularly important choices of :
Remark 2.1.
Under regularity conditions and excluding degenerate cases, point-optimal invariant (w.r.t. $G_X$) tests and locally best invariant (w.r.t. $G_X$) tests for the testing problem (3) reject for large values of a test statistic $T_B$ as in Equation (5):
(a) Point-optimal invariant tests against the alternative $\bar{\rho} \in (0,a)$ are obtained for $B = -(C_X \Sigma(\bar{\rho}) C_X')^{-1}$.
(b) Locally best invariant tests are obtained for $B = C_X \dot{\Sigma}(0) C_X'$, for $\dot{\Sigma}(0)$ the derivative of $\Sigma$ at $\rho = 0$, ensured to exist under the aforementioned regularity conditions, see, e.g., King and Hillier (1985).
Note that a test statistic $T_B$ based on any of the two matrices in the preceding enumeration does not depend on the specific choice of $C_X$, as any two choices of $C_X$ differ only by premultiplication of an orthogonal matrix. However, for matrices $B$ of a different form than (a) or (b) the test statistic may also depend on the choice of $C_X$, a dependence which is typically suppressed in our notation.
The main focus of the present article concerns power properties of tests based on a test statistic as in (5) for the testing problem (3). Before investigating power properties of a test, one needs to ensure that its size does not exceed a given value of significance . While this can be a nontrivial problem in general, achieving size control through the choice of a proper critical value turns out to be an easy task here. More specifically, the following lemma shows that exact size control for tests based on a test statistic introduced in Equation (5) is possible at all levels of significance in the leading case . The subsequent remark discusses numerical aspects.
Lemma 2.2.
Let be symmetric and such that . Then, there exists a (unique) function such that for every
[TABLE]
Furthermore, is a strictly decreasing and continuous bijection.
Remark 2.3.
The rejection probabilities of a -invariant test for (3) do not depend on the parameters and (cf. Remark 2.3 in Preinerstorfer and Pötscher (2017)). As a consequence, the exact critical value from Lemma 2.2 can easily be obtained numerically: To this end one can exploit the well-known fact that for every the rejection probability can be rewritten as the probability that the quadratic form
[TABLE]
where is an -variate Gaussian random vector with mean zero and covariance matrix . This probability can be determined efficiently through an application of standard algorithms, e.g., the algorithm by Davies (1980). The critical value can then be obtained numerically by simply using a root-finding algorithm to determine the unique root of on .
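The numerical recipe of this remark can be sketched as follows. Under the null hypothesis the rejection probability at a candidate critical value equals the probability that a weighted sum of independent chi-square variables with one degree of freedom is positive, the weights being the eigenvalues of the matrix in the test statistic minus the candidate value. The sketch below is a simplification: it replaces the algorithm of Davies (1980) by plain Monte Carlo and uses bisection as the root-finding step. The function names and the example matrix are assumptions made for illustration.

```python
import numpy as np


def null_rejection_prob(c, eigvals, chi2_draws):
    # Monte Carlo estimate of Pr( sum_i (lambda_i - c) * W_i > 0 ),
    # where the W_i are independent chi-square(1) variables.
    return float(np.mean(chi2_draws @ (eigvals - c) > 0.0))


def critical_value(B, alpha, n_draws=100_000, seed=0, tol=1e-6):
    """Approximate the size-alpha critical value by bisection."""
    eigvals = np.linalg.eigvalsh(B)  # sorted from smallest to largest
    rng = np.random.default_rng(seed)
    chi2_draws = rng.chisquare(1, size=(n_draws, eigvals.size))
    lo, hi = eigvals[0], eigvals[-1]
    # The rejection probability is nonincreasing in the critical value.
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if null_rejection_prob(mid, eigvals, chi2_draws) > alpha:
            lo = mid  # still rejecting too often: raise the critical value
        else:
            hi = mid
    return 0.5 * (lo + hi)


# Hypothetical symmetric matrix with distinct extreme eigenvalues.
B = np.diag([-2.0, -1.0, 0.0, 0.5, 1.0, 3.0])
kappa = critical_value(B, alpha=0.05)
print("approximate critical value:", kappa)
```

In applications where high accuracy is needed, an exact method for the distribution of quadratic forms in Gaussian variables would replace the Monte Carlo step; the bisection logic stays the same.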
3 The zero-power trap in testing for correlation
3.1 Definition and sufficient conditions
In the sequel, a test $\varphi$ (measurable) for testing problem (3) is said to be subject to (or suffer from) the zero-power trap, if there exist $\beta \in \mathbb{R}^{k}$ and $\sigma \in (0,\infty)$ such that
$$\liminf_{\rho \to a} E_{\beta,\sigma,\rho}(\varphi) = 0, \tag{8}$$
that is, if the power function of $\varphi$ can get arbitrarily close to $0$ as the strength of the correlation in the data, measured in terms of $\rho$, increases. Recall from Remark 2.3 that if $\varphi$ is $G_X$-invariant, which is the case for most tests considered in this article, then $E_{\beta,\sigma,\rho}(\varphi)$ does not depend on $\beta$ and $\sigma$. In this case, if Equation (8) holds for some $\beta$ and some $\sigma$, it holds for every $\beta$ and every $\sigma$.
A set of sufficient conditions that allows one to conclude whether a test is subject to the zero-power trap was developed in Martellosio (2010) and Preinerstorfer and Pötscher (2017). The underlying effect leading to (8) described in the latter article is a concentration effect in the (rescaled) distribution when is close to . Preinerstorfer and Pötscher (2017) obtained their sufficient conditions under the following property of the function (cf. also Assumption 1 in Preinerstorfer and Pötscher (2017) and the discussion there showing that this condition is weaker than the one previously used by Martellosio (2010)):
Assumption 1.
$\lambda_n^{-1}(\Sigma(\rho)) \Sigma(\rho) \to e e'$ as $\rho \to a$ for some $e \in \mathbb{R}^{n}$.
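As an illustration of this type of condition, consider a covariance function of first-order autoregressive type, whose entry in row $i$ and column $j$ equals $\rho^{|i-j|}$ with $\rho \in [0,1)$. The sketch below assumes that Assumption 1 requires the covariance matrix, divided by its largest eigenvalue, to converge to a rank-one matrix $ee'$ as $\rho$ approaches the upper end of its range, and checks numerically that this happens with $e$ the normalized vector of ones. The function names and the sample size are hypothetical.

```python
import numpy as np


def ar1_correlation(rho, n):
    # Matrix with entries rho^{|i - j|}; equals the identity for rho = 0.
    idx = np.arange(n)
    return rho ** np.abs(idx[:, None] - idx[None, :])


def distance_to_limit(rho, n):
    """Frobenius distance between the rescaled covariance matrix and ee'."""
    S = ar1_correlation(rho, n)
    scaled = S / np.linalg.eigvalsh(S)[-1]  # divide by largest eigenvalue
    e = np.ones(n) / np.sqrt(n)             # candidate limit direction
    return float(np.linalg.norm(scaled - np.outer(e, e)))


n = 6
for rho in (0.5, 0.9, 0.99, 0.9999):
    print(f"rho = {rho}: distance = {distance_to_limit(rho, n):.6f}")
```

The printed distances shrink as $\rho$ approaches one, which is the concentration effect behind the zero-power trap: for strongly correlated errors the data essentially lie along a single direction (up to scale and the regression part).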
For the convenience of the reader and for later use, we shall now formally state two immediate consequences of results in Preinerstorfer and Pötscher (2017). They provide sufficient conditions for the zero-power trap under Assumption 1. Specializing Theorem 2.7 and Remark 2.8 in Preinerstorfer and Pötscher (2017) one obtains the following “high-level”-result.
Theorem 3.1.
Suppose Assumption 1 holds. Let be a -invariant test that is continuous at and satisfies , where is the vector figuring in Assumption 1. Then
[TABLE]
In particular, if holds for some -invariant Borel set , then (9) holds if is not in the closure of .
For the test with rejection region as discussed in Section 2.2.2 and where is defined through Lemma 2.2 one obtains the following result from Corollary 2.21 of Preinerstorfer and Pötscher (2017).
Theorem 3.2.
Suppose Assumption 1 holds and , where is the vector figuring in Assumption 1. Let be symmetric and such that . Then, for every such that we have
[TABLE]
Note that the sufficient conditions for the zero-power trap phenomenon pointed out in Theorems 3.1 and 3.2 depend on observable quantities only, and that they are thus checkable by the user. Therefore, a researcher interested in testing problem (3) can use these conditions to check whether or not the given test suffers from the zero-power trap before actually using a test. In particular, one can decide not to use a test that suffers from the zero-power trap. Before addressing the question how to avoid the zero-power trap, which was raised already in the Introduction, we briefly pay some attention to the following question: “how often” does the zero-power trap actually arise? More specifically, in the important class of tests introduced in Section 2.2.2, and most notably the tests discussed in Remark 2.1, the following question arises: For “how many” design matrices does the zero-power trap arise? Answering this question is the content of the next section.
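A check of this kind might be sketched as follows. By Theorem 3.1, a test given by an invariant rejection region is subject to the zero-power trap if the vector from Assumption 1 is not in the closure of that region. For a ratio statistic that is continuous at this vector, it suffices that the vector lies outside the column space of the design matrix and that the statistic evaluated at it is strictly below the critical value. The code assumes this reading; it approximates the critical value by a Monte Carlo quantile rather than computing it exactly, and the function names, the trend regressor and the first-off-diagonal matrix are hypothetical choices.

```python
import numpy as np


def complement_basis(X):
    # Rows: orthonormal basis of the orthogonal complement of span(X).
    n, k = X.shape
    Q, _ = np.linalg.qr(X, mode="complete")
    return Q[:, k:].T


def trap_indicated(X, B, e, alpha, n_draws=100_000, seed=0):
    """Return (flag, statistic at e, approximate critical value).

    flag is True if the sufficient condition described above is met.
    """
    z = complement_basis(X) @ e
    if np.linalg.norm(z) < 1e-10:
        return False, None, None  # e lies in span(X): check not applicable
    t_e = float(z @ B @ z) / float(z @ z)
    # Null distribution of the statistic: ratio of quadratic forms in
    # standard Gaussian vectors of dimension n - k.
    rng = np.random.default_rng(seed)
    G = rng.standard_normal((n_draws, B.shape[0]))
    null_stats = np.sum((G @ B) * G, axis=1) / np.sum(G * G, axis=1)
    kappa = float(np.quantile(null_stats, 1.0 - alpha))
    return t_e < kappa, t_e, kappa


n = 12
X = np.arange(1, n + 1, dtype=float).reshape(-1, 1)  # trend, no intercept
e = np.ones(n) / np.sqrt(n)
A = np.eye(n, k=1) + np.eye(n, k=-1)  # ones on the first off-diagonals
C = complement_basis(X)
B = C @ A @ C.T

flag, t_e, kappa = trap_indicated(X, B, e, alpha=0.05)
print("statistic at e:", t_e, "critical value:", kappa, "flag:", flag)
```

A `True` flag only indicates that the sufficient condition holds; a `False` flag does not by itself show that the test avoids the trap.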
3.2 For “how many” design matrices does the zero-power trap arise?
We shall focus on the class of tests with rejection regions introduced in Section 2.2.2. Since the question in the section title depends on the design matrix , which is otherwise held fixed in this article, we shall make the dependence of on explicit by writing . Furthermore, we shall also write to emphasize its dependence on the design matrix . In our first attempt to answer the question under consideration, we shall use the following simple consequence of Lemma 2.2 and Theorem 3.2, which provides conditions on under which Equation (10) holds for all “small” levels .
Lemma 3.3.
Suppose Assumption 1 holds and let denote the vector figuring in that assumption. Let be a function from the set of full column rank matrices to the set of symmetric -dimensional matrices. If an matrix satisfies
[TABLE]
then , and Equation (10) holds for every .
For a class of functions that includes the ones discussed in Remark 2.1 we shall now show that condition (11) is generically satisfied, unless the matrix has a very exceptional form. The result is established under a restriction concerning the eigenspace corresponding to the largest eigenvalue of .
Proposition 3.4.
Suppose that and that Assumption 1 holds. Let be a function from the set of full column rank matrices to the set of symmetric -dimensional matrices. Let be a symmetric matrix that can not be written as for real numbers with , where is the vector figuring in Assumption 1. Suppose further that for every of full column rank a satisfying and can be chosen such that
[TABLE]
Then, up to a -null set of exceptional matrices, every satisfies (11). An immediate consequence is as follows: Given denote by the set of all of rank such that and such that
[TABLE]
Then, holds for , and for any sequence in converging to $0$ the complement of is contained in a -null set.
Remark 3.5.
Note that for Condition (12) in Proposition 3.4 is trivially satisfied with . For and it is easy to see that Condition (12) is satisfied with . Therefore, if for any of these two specific choices the additional condition holds that the respective can not be written as for real numbers where is satisfied, then Proposition 3.4 applies.
Proposition 3.4 shows that tests based on suffer from the zero-power trap for “most” design matrices , at least for small choices of . The discussion section of Krämer (1985) contains a corresponding statement (without proof) in a special case.
Choosing small is not completely uncommon in practice: Due to the fact that testing for correlation is often just one part of the econometric analysis, the actual level employed in this test can be quite small. One example is specification testing. Another example is the situation where tests for correlation are “inverted” to build a confidence interval for , which is then used for a Bonferroni-type construction of a data-dependent critical value of another test (cf. Leeb and Pötscher (2017) for further information concerning such critical values).
Nevertheless, the question remains as to how “large” the set actually is for a fixed , such as the conventional or . For example, Proposition 3.4 does not tell us whether or not the set of design matrices is empty. Similarly, one can ask if contains an open set, or if it has positive measure? The latter questions have already been considered in detail in the main results of Martellosio (2012) for point-optimal invariant and locally best invariant tests in the important context of spatial autoregressive regression models. Adopting his proof strategy, we establish the following proposition. The argument requires a different assumption on than the one used in Proposition 3.4. First, the condition used now concerns the eigenspace of corresponding to its smallest eigenvalue (as opposed to the condition on the largest eigenvalue used in Proposition 3.4). Second, continuity conditions are imposed, which are required for limiting arguments in the proof. As discussed in Remark 3.7 below, the assumptions are again satisfied in the leading choices for discussed in Remark 2.1.
Proposition 3.6.
Suppose that and that Assumption 1 holds. Let be a function from the set of full column rank matrices to the set of symmetric -dimensional matrices. Suppose there exists a function from the set of matrices to itself, such that for every of full column rank holds for a suitable choice of satisfying and , and for a symmetric matrix that can not be written as for real numbers where . Here is the vector figuring in Assumption 1. Suppose further that is continuous at every element , say, of the closure of , and that for every such we have
[TABLE]
Define as in Proposition 3.4. Then, the following holds:
1. holds for every ;
2. suppose that for every the function is continuous at every of full column rank such that . Then, for every the interior of is nonempty (and thus has positive measure).
Remark 3.7.
Similar to Remark 3.5 we note that Proposition 3.6 can be applied to (with and the identity function), or to , where , (with and the function , noting that this function satisfies the continuity requirement as is positive definite) provided that the corresponding matrix is not of the exceptional form for . It is not difficult to show that the continuity requirement in Part 2 of the proposition is satisfied for these two choices of . For this is trivial. For , where , an argument is given in Appendix B. We can hence conclude that unless or , respectively, is of the form for some nonnegative , the test suffers from the zero-power trap for every for every in a non-empty open set of design matrices.
Remark 3.8.
We emphasize that Propositions 3.4 and 3.6 do not apply in case holds for real numbers where . On the one hand, it is clear that in case a test as in these two propositions with trivially breaks down, as the corresponding test statistics are then constant. But on the other hand, as already observed (for the special case and ) in Preinerstorfer and Pötscher (2017) in the discussion preceding their Remark 2.27, using tests based on for a indeed presents an opportunity to avoid the zero power trap. This will be discussed more formally in Section 4.1.
From the results in the present section we learn that for tests that satisfy certain structural properties, the zero power trap arises for generic design matrices for small enough. Furthermore, for every there exists (under suitable assumptions) a nonempty open set of design matrices every element of which suffers from the zero-power trap. We would like to emphasize, however, that these results do not rule out the possibility that for a given the actual level needed such that the zero-power trap arises can be low (far outside the commonly used range of levels), or that given the open set of design matrices for which the zero-power trap occurs is “small”. Numerical results that illustrate the “practical severity” of the zero-power trap in spatial regression models are provided in Section 3 of Krämer (2005), in particular his Table 1 is very interesting in this context, and further discussion and examples can be found in Martellosio (2010) and Martellosio (2012). These results seem to suggest that the zero-power trap occurs frequently for commonly used levels of significance in case is “small”, i.e., in “high-dimensional” scenarios, whereas if is large the zero-power trap does not appear that frequently. However, this also depends on the dependence structure.
4 Avoiding the zero-power trap
Having provided some context and motivation, we now discuss three ways to avoid the zero-power trap: In Section 4.1 we expand on the observation just made in Remark 3.8. The strategy discussed in Section 4.2 is based on an idea involving artificial regressors. The method we recommend, however, builds on Section 4.1 and is introduced in Section 4.3. Our suggestion tries to overcome sub-optimality properties of the other methods. As discussed in the Introduction, the idea underlying our approach can be interpreted as a finite sample variant of the power enhancement principle of Fan et al. (2015).
4.1 Tests based on with
As discussed in Remark 3.8, tests based on the test statistic with do not satisfy the assumptions underlying Propositions 3.4 and 3.6. Hence, these two propositions do not let us conclude anything concerning the question “how often” the zero-power trap occurs for such tests. It turns out that these tests do not suffer from the zero-power trap for any in case the additional condition holds (note that if holds, the test statistic with is useless as it equals [math] for every ). As pointed out in Remark 3.8, this was already noted in Preinerstorfer and Pötscher (2017). For later use in Section 4.3 we state a corresponding result (which is an immediate consequence of Part 1 of Proposition 2.26 in Preinerstorfer and Pötscher (2017) together with -invariance of and our Lemma 2.2):
Theorem 4.1.
Suppose that , that Assumption 1 holds and that , where is the vector figuring in Assumption 1. Then, for every , every and every
[TABLE]
From this result we conclude that in case and whenever a test with size is subject to the zero-power trap, one can alternatively use the test with rejection region instead, which does not suffer from the zero-power trap. Moreover, the power of the test even increases to one as the correlation becomes very strong. This is a desirable property as it matches the intuition that strong correlations should be easily detectable from the data.
While avoiding the zero-power trap problem, the test suffers from one major disadvantage: the power function of can be, and often will be, quite low for values distant from . If the initial test , which was dismissed because it is subject to the zero-power trap, was chosen because of its good power properties in this region of the alternative, the test will then not constitute a convincing alternative. This is illustrated in the example discussed in Section 5. A method that tries to take optimality properties of the initial test into account, at least for the classes of tests discussed in Remark 2.1, is discussed next.
4.2 Tests based on artificial regressors
The sufficient condition for the zero-power trap in Theorem 3.2 requires that the vector from Assumption 1 is not an element of . While this of course does not prove that the zero-power trap does not arise if , this indeed turns out to be the case under an additional assumption (cf. Corollary 2.22 in Preinerstorfer and Pötscher (2017)). In this section we shall exploit this fact. The method of avoiding the zero-power trap we discuss in this section “enforces” the condition . More specifically, it is based on adding the vector from Assumption 1 as an “artificial” regressor to the design matrix (if it is not already an element of ), and from then constructing tests as if this artificially expanded design matrix was the true one. As discussed in the Introduction, the idea underlying the construction in the present section can be traced back to Krämer (1985).
To formally describe the artificial regressor based method in our general setting, consider a situation where a researcher initially wants to use the test as in Section 2.2.2 with , but discovers (e.g., by checking the sufficient conditions in Theorem 3.2) that suffers from the zero-power trap. Suppose further that the initial test has certain optimality properties (cf. Remark 2.1). The researcher does not want to completely sacrifice the optimality properties of the initial test, which prevents him from using the test just discussed in Section 4.1. Assume further that .
The trick now is to work with the design matrix in the construction of a test statistic, assuming that . More precisely, let be a symmetric matrix (cf. Remark 4.2 below), and define the adjusted test statistic
[TABLE]
Under the additional assumption that , one obtains from Lemma 2.2 (applied to model (1), but with design matrix instead of ; note that this leads to an “enlarged” model that encompasses the true model as a submodel, and that the distributions satisfying the null hypothesis in the true model also satisfy the null hypothesis in the enlarged model) for every the existence and uniqueness of a critical value , say, such that for every and every it holds that
[TABLE]
Finally, define the rejection region
[TABLE]
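The construction just described can be sketched in Python as follows. This is a minimal illustration under our own assumptions: the statistic is taken to be a ratio of quadratic forms in the OLS residuals, the weights matrix is a toy ring graph, and the helper names `augment_design` and `ratio_statistic` are hypothetical. It only shows the mechanical step of appending the artificial regressor and then proceeding as if the enlarged matrix were the true design.

```python
import numpy as np

def augment_design(X, e, tol=1e-10):
    """Append e as an extra column unless it already lies in span(X)."""
    beta, *_ = np.linalg.lstsq(X, e, rcond=None)
    if np.linalg.norm(e - X @ beta) <= tol:
        return X
    return np.column_stack([X, e])

def ratio_statistic(y, X, W):
    """u' W u / u' u, where u are the OLS residuals from regressing y on X."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    u = y - X @ beta
    return float(u @ W @ u / (u @ u))

n = 8
W = np.zeros((n, n))
for i in range(n):                      # toy ring graph (assumption)
    W[i, (i + 1) % n] = W[(i + 1) % n, i] = 1.0

X = np.arange(n, dtype=float).reshape(-1, 1)   # toy design without intercept
e = np.ones(n) / np.sqrt(n)                    # stands in for the vector of Assumption 1

X_bar = augment_design(X, e)
rng = np.random.default_rng(0)
y = rng.standard_normal(n)

t_adjusted = ratio_statistic(y, X_bar, W)
# Adding any multiple of e to y leaves the adjusted statistic unchanged,
# because e is now projected out together with the original regressors.
t_shifted = ratio_statistic(y + 5.0 * e, X_bar, W)
```

Calling `augment_design` on a design that already contains the vector returns the design unchanged, which matches the proviso that the regressor is only added if it is not already in the column span.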
Remark 4.2.
We think about as an “updated version” of , i.e., as the matrix one would use if was the underlying design matrix. For example, if the initial matrix equals one could use , or if the initial matrix one could use . Recall that the rejection region (18) based on these two versions of corresponds to locally best invariant tests and point-optimal invariant tests, respectively, in the model where the true design matrix is (cf. Remark 2.1).
We shall now prove that the test with rejection region (18) does not suffer from the zero-power trap. The following result requires an additional assumption on . This is Assumption 4 in Preinerstorfer and Pötscher (2017) to which we refer the reader for equivalent formulations, examples and further discussion.
Assumption 2.
There exists a function , a normalized vector , and a square root of such that
[TABLE]
exists in and such that the linear map is injective when restricted to .
The main result concerning artificial regressor based tests is as follows:
Theorem 4.3.
Suppose Assumptions 1 and 2 are satisfied with the same vector , that , and that . Suppose further that is a symmetric matrix such that . Then, for every , every and every it holds that
[TABLE]
where denotes a Gaussian random vector with mean [math] and covariance matrix .
Theorem 4.3 shows that is not subject to the zero-power trap. However, its “limiting power” can in principle be low. In particular, it is always smaller than one. This differs from the behavior of the test discussed in Section 4.1, which has limiting power equal to one. Another limitation of Theorem 4.3 is its reliance on the additional Assumption 2.
Following up on the examples discussed in Remark 4.2, an advantage of passing from to , instead of passing from to the test discussed in Section 4.1, is that “preserves” in some sense the optimality properties of , but with respect to the larger group . Note, however, that this does not imply that the power functions of and are “close”.
4.3 Optimality-preserving tests that avoid the zero-power trap
The starting point in this section is an (initial) family of tests for the testing problem (3) indexed by . Given we interpret as the (initial) test one would like to use because of some optimality property. That is, the power function of
[TABLE]
is “large” for certain parameter values in a given subset pertaining to the alternative hypothesis .
We shall suppose that the initial test suffers from the zero-power trap, which one would like to avoid. Ideally a test should have limiting power equal to , a property of the test in Section 4.1, but not of the test in Section 4.2. Furthermore, we would like to keep, at least approximately, the optimal power properties of , which was the reason why was considered for use initially. This is a property of the test in Section 4.2 (at least to some extent), but not of the test in Section 4.1. We shall now present an approach that achieves these two goals.
In what follows, we assume that the family of tests under consideration satisfies Property A, i.e., satisfies the following:
- A.1: For every the test is -invariant.
- A.2: For every the test has size , i.e.,
[TABLE]
- A.3: For every and every sequence converging to we have that holds for -almost every .
To illustrate the assumption, consider the following important example:
Example 4.1.
Let be as in (5) with an symmetric matrix such that . For every let be the critical value from Lemma 2.2. Set equal to the non-randomized test with rejection region , i.e., . We already know that is -invariant, and thus is -invariant for every . Hence A.1 is satisfied. Furthermore, from Lemma 2.2 we see that satisfies A.2. That A.3 is satisfied is an immediate consequence of continuity of , which was established in Lemma 2.2, together with the fact that for every the set
[TABLE]
is a -null set; the latter is a consequence of Lemma B.4 in Preinerstorfer and Pötscher (2017), which shows that the cdf. , say, corresponding to is continuous.
Remark 4.4.
While not required in Property A, typical families will also satisfy the condition that for any real numbers in it holds for -almost every that . For instance, this is the case for the families of tests discussed in Example 4.1 (this follows from the monotonicity property of established in Lemma 2.2). One obvious consequence of this condition is that if suffers from the zero-power trap, then suffers from the zero-power trap as well. Therefore, for such families, if suffers from the zero-power trap, there is no hope that one can easily avoid the zero-power trap by using for some (which would at least be a test whose size does not exceed ).
Suppose in the following discussion that , that Assumption 1 holds and that . Recall from Theorem 4.1 that under these conditions the -invariant test does not suffer from the zero-power trap and in fact has limiting power one at all levels . Using this property, we shall now define a -invariant test that has approximately the same power properties as with the advantage that it has limiting power just as the test .
The basic idea is as follows (precise statements are provided further below): From Property A.3 one obtains that for small, the power functions of and are similar. Theorem 4.1 tells us that the test with rejection region has limiting power (as ) equal to , and Lemma 2.2 shows that this test has size equal to . Hence, we could use the -invariant test
[TABLE]
whose power function is similar to (at least for small), but which has limiting power equal to one (for every ). Trivially, this test has size not greater than , but potentially its size is smaller than , implying some unnecessary loss in power, which one can try to avoid by decreasing .
More specifically, define the -invariant test
[TABLE]
where is chosen to be the smallest number such that has size equal to . That such a choice of is indeed possible is the content of the next proposition. Note that is non-randomized if the test is non-randomized.
Proposition 4.5.
Suppose that , that satisfies , and that the family satisfies Properties A.1 and A.2. Then, for every and every there exists a such that
[TABLE]
and such that for every it holds that the supremum in the previous display is greater than ; here denotes the unique real number such that has size equal to (cf. Lemma 2.2).
Note that the critical value can be easily determined numerically by a simple line search algorithm, cf. also Remark 2.3.
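The line search can be sketched in Python as follows. This is a minimal illustration under our own assumptions: the size is approximated by Monte Carlo from draws under the null, the modified test is taken to reject whenever the initial test at the reduced level rejects or the second statistic is at least the cutoff, and the draws below are toy placeholders rather than simulated values of the actual statistics. The names `combined_reject` and `calibrate_cutoff` are hypothetical.

```python
import numpy as np

def combined_reject(reject_init, t_b, c):
    """Reject if the initial test rejects or the second statistic is >= c."""
    return np.logical_or(reject_init, t_b >= c)

def calibrate_cutoff(reject_init, t_b, alpha, tol=1e-8):
    """Bisection for (approximately) the smallest c with empirical size <= alpha.

    The empirical size is non-increasing in c, so bisection applies.
    """
    def size(c):
        return combined_reject(reject_init, t_b, c).mean()

    lo = t_b.min()            # every draw rejects here, so size(lo) = 1
    hi = t_b.max() + 1.0      # only the initial test rejects here
    if size(hi) > alpha:
        raise ValueError("initial test already exceeds level alpha")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if size(mid) <= alpha:
            hi = mid
        else:
            lo = mid
    return hi

rng = np.random.default_rng(1)
n_sim, alpha, eps = 20000, 0.05, 0.01
s_init = rng.standard_normal(n_sim)   # placeholder null draws of the initial statistic
t_b = rng.uniform(size=n_sim)         # placeholder null draws of the second statistic

kappa = np.quantile(s_init, 1 - (alpha - eps))   # initial test at level alpha - eps
reject_init = s_init > kappa

c = calibrate_cutoff(reject_init, t_b, alpha)
size_hat = combined_reject(reject_init, t_b, c).mean()
```

In an application, the placeholder draws would be replaced by values of the two statistics computed from data simulated under the null; by invariance, the regression coefficient and the error variance used for this simulation should not matter.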
Having established that the test in Equation (23) is actually well-defined, we now prove that it does not suffer from the zero-power trap but has limiting power for any choice of . Furthermore, we show that the power function of approximates (even uniformly over suitable subsets of the parameter space) the power function of as converges to [math]. In this sense, choosing small, the test preserves “optimal” power properties (such as point-optimal invariance, or locally best invariance, cf. Example 4.1 above) from at least approximately. Furthermore, the degree of approximation can be tuned by the user via .
Theorem 4.6.
Suppose that , that Assumption 1 holds and that , where is the vector figuring in Assumption 1. Assume that the family satisfies Properties A.1 and A.2. Let . Then, the following holds:
1. For every , every and every we have
[TABLE]
in particular does not suffer from the zero-power trap.
2. Suppose that the family also satisfies Property A.3. Let be such that the closure of the set
[TABLE]
is contained in the set of positive definite symmetric matrices. Then
[TABLE]
Remark 4.7.
In the leading case is a continuous function. In this case one can choose the set in the second part of Theorem 4.6 equal to for any [recall that is positive definite for every by assumption]. Note further that since we are primarily interested in situations where the initial test suffers from the zero-power trap, while the adjusted tests have limiting power , it is not restrictive to confine ourselves to intervals as above, as we do not want the power of the adjusted test to be close to the power of the initial test in a neighborhood of . Furthermore, the optimality properties of point-optimal invariant tests (against an alternative ) or of locally best invariant tests (which are characterized by favorable power properties in the neighborhood of [math]) concern only the power function over for a suitably chosen .
Remark 4.8.
The tuning parameter needs to be chosen by the user in each particular application. In principle, the user can plot the power functions for various values of , and can then decide upon inspection, which value of provides the best solution. For a specific example we refer to Section 5 below.
Remark 4.9.
Finally, we point out that the construction of in Equation (23) and the conditions in Proposition 4.5 and Theorem 4.6 do not require the initial test to suffer from the zero-power trap. While this is clearly our main focus, this observation shows that our method can also be applied in case the limiting power of is greater than [math] but smaller than one. In such a situation, using instead of can be advantageous as well.
5 Numerical results
In order to illustrate and compare the power properties of the tests introduced in Section 4, we now consider a simple example from spatial econometrics in which the zero-power trap occurs for a popular test. We focus on a situation where the correlation between the observations is a consequence of their proximity, which might be spatial, but could also be, e.g., social, and which is encoded in the adjacency (“weights”) matrix of a graph.
One important model in this case is the spatial (autoregressive) error model, which leads to
[TABLE]
for a fixed weights matrix which is assumed to be (elementwise) nonnegative and irreducible with zero elements on the main diagonal. By the Perron-Frobenius theorem (e.g., Horn and Johnson (1985), Theorem 8.4.4), the matrix then has a positive (real) eigenvalue , say, with algebraic multiplicity (and thus also geometric multiplicity) equal to 1, such that any other real or complex zero of the characteristic polynomial of is in absolute value not larger than . We assume that the parameter . For a normalized eigenvector of w.r.t. it is not too difficult to see that Assumption 1 is satisfied (with ), and that Assumption 2 is satisfied. For details we refer to Section 4.1 in Preinerstorfer and Pötscher (2017).
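The objects in this paragraph can be computed numerically. The Python sketch below assumes the standard form of the spatial autoregressive error model, in which the errors solve u = ρ W u + ε with standard normal ε, so that the covariance matrix is, up to scale, the inverse of (I − ρW)′(I − ρW). The ring graph used as weights matrix and the names `perron_eigenvalue` and `sar_error_covariance` are hypothetical choices for illustration.

```python
import numpy as np

def perron_eigenvalue(W):
    """Largest real eigenvalue of a nonnegative irreducible matrix W."""
    return float(np.max(np.linalg.eigvals(W).real))

def sar_error_covariance(W, rho):
    """Covariance (up to scale) of u solving u = rho * W u + eps, eps ~ N(0, I)."""
    A_inv = np.linalg.inv(np.eye(W.shape[0]) - rho * W)
    return A_inv @ A_inv.T

n = 6
W = np.zeros((n, n))
for i in range(n):                      # toy ring graph: nonnegative, irreducible, zero diagonal
    W[i, (i + 1) % n] = W[(i + 1) % n, i] = 1.0

lam = perron_eigenvalue(W)              # equals 2 for a ring graph
rho = 0.499                             # close to the upper bound 1 / lam
Sigma = sar_error_covariance(W, rho)

# Near the boundary, the covariance is dominated by the direction of the
# normalized Perron eigenvector (the constant vector for a ring graph).
vals, vecs = np.linalg.eigh(Sigma)
top_direction = vecs[:, -1]
e = np.ones(n) / np.sqrt(n)
alignment = abs(top_direction @ e)
```

In this toy example the leading eigenvector of the covariance matrix is essentially the normalized Perron eigenvector as ρ approaches its upper bound, which is the concentration effect that Assumption 1 formalizes.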
The model depends, besides the design matrix , on the specific form of the weights matrix , which encodes the dependence relation of the observations. Subsequently we reconsider a simple example considered in Section 3 of Krämer (2005), who has observed (cf. his Figure 1) that for a weights matrix derived by the Queen criterion from a regular lattice, and for the Cliff-Ord test suffers from the zero-power trap for . We recall that the Cliff-Ord test is based on a test statistic as in Equation (5) and with .
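For readers who want to experiment with this setting, the following Python sketch builds a Queen-criterion weights matrix on a regular lattice and evaluates a Cliff-Ord (Moran-type) ratio of the OLS residuals. The 4 × 4 lattice, the intercept-only design, the use of unnormalized 0/1 weights and the names `queen_weights` and `cliff_ord_statistic` are our assumptions for illustration; they are not the configuration of Krämer (2005).

```python
import numpy as np

def queen_weights(n_rows, n_cols):
    """0/1 adjacency matrix: cells are neighbors if they share an edge or a corner."""
    n = n_rows * n_cols
    W = np.zeros((n, n))
    for r in range(n_rows):
        for c in range(n_cols):
            i = r * n_cols + c
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    rr, cc = r + dr, c + dc
                    inside = 0 <= rr < n_rows and 0 <= cc < n_cols
                    if (dr, dc) != (0, 0) and inside:
                        W[i, rr * n_cols + cc] = 1.0
    return W

def cliff_ord_statistic(y, X, W):
    """Moran-type ratio u' W u / u' u computed from the OLS residuals u."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    u = y - X @ beta
    return float(u @ W @ u / (u @ u))

W = queen_weights(4, 4)
X = np.ones((16, 1))                      # intercept-only design (assumption)
rng = np.random.default_rng(0)
y = rng.standard_normal(16)

stat = cliff_ord_statistic(y, X, W)
# The ratio does not change when y is rescaled and shifted within span(X).
stat_transformed = cliff_ord_statistic(2.0 * y + X @ np.array([3.0]), X, W)
```

Corner cells have three neighbors, edge cells five and interior cells eight, which is a quick sanity check on the adjacency structure.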
The power function of the Cliff-Ord test and the power functions of the tests described in Section 4 were obtained numerically (cf. also Remark 2.3), and are shown in Figure 1. The figure also shows the power envelope in the class of -invariant tests. That is, for each alternative Figure 1 shows the power of the point-optimal -invariant level test against the alternative . Recall from Remark 2.1 that the point-optimal invariant test against alternative is based on a test statistic as in (5) and with . In this example the power envelope is not attained by any -invariant test, but it serves the purpose of providing an upper bound for comparison.
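One simple way to approximate such power functions is Monte Carlo simulation. The Python sketch below is a rough illustration and not the numerical method used for Figure 1: it assumes the autoregressive error model u = ρ W u + ε, a toy ring graph as weights matrix, a single trend regressor, and a critical value estimated from simulated null draws. The names `moran_ratio` and `mc_power_curve` are hypothetical.

```python
import numpy as np

def moran_ratio(Y, M, W):
    """Row-wise u' W u / u' u with residuals u = M y for each row y of Y."""
    U = Y @ M        # M is symmetric, so each row of Y @ M is a residual vector
    num = np.einsum("ij,jk,ik->i", U, W, U)
    den = np.einsum("ij,ij->i", U, U)
    return num / den

def mc_power_curve(X, W, rhos, alpha=0.05, n_sim=4000, seed=0):
    """Monte Carlo rejection frequencies of the ratio test over a grid of rho values."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    M = np.eye(n) - X @ np.linalg.pinv(X)       # residual maker matrix
    # Critical value from null draws; by invariance, a zero regression
    # coefficient and unit error variance suffice.
    null_stats = moran_ratio(rng.standard_normal((n_sim, n)), M, W)
    crit = float(np.quantile(null_stats, 1 - alpha))
    powers = []
    for rho in rhos:
        A_inv = np.linalg.inv(np.eye(n) - rho * W)
        Y = rng.standard_normal((n_sim, n)) @ A_inv.T   # rows are u = A^{-1} eps
        powers.append(float(np.mean(moran_ratio(Y, M, W) > crit)))
    return crit, np.array(powers)

n = 10
W = np.zeros((n, n))
for i in range(n):                      # toy ring graph, Perron eigenvalue 2
    W[i, (i + 1) % n] = W[(i + 1) % n, i] = 1.0
X = np.arange(n, dtype=float).reshape(-1, 1)

rhos = [0.0, 0.2, 0.4, 0.49]            # grid inside [0, 1/2)
crit, powers = mc_power_curve(X, W, rhos)
```

Plotting `powers` against `rhos` on a finer grid gives a simulated power curve; whether the zero-power trap shows up depends on the design matrix, the weights matrix and the level, as discussed in Section 3.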
While Figure 1 illustrates that the approaches discussed in Sections 4.1 and 4.2 avoid the zero-power trap, it reveals at the same time that the power functions of these tests are not completely satisfying. On the one hand, even though the test introduced in Section 4.1 does not suffer from the zero-power trap, it has low power in a large region of the alternative. On the other hand, the test from Section 4.2 based on the Cliff-Ord test (i.e., as in Equation (18) with ) with artificial regressor avoids the zero-power trap as well and has a power function that practically coincides with the power envelope for small values of . But its limiting power is smaller than one (in fact is only ).
Figure 1 also contains the power function of some tests corresponding to the procedure outlined in Section 4.3 applied to the family of level- Cliff-Ord tests (cf. Example 4.1). It shows the power functions corresponding to . These tests have very good power properties. The power functions are practically identical to the one of the Cliff-Ord test (and hence to the power envelope) for small values of . But for larger values of their power function is much closer to the power envelope than the power of the Cliff-Ord test. In particular, by construction, their power converges to as gets close to . One can also observe that smaller values of lead to power functions that are closer to the power function of the Cliff-Ord test for close to [math], whereas larger values of lead to power functions that are closer to the power envelope for close to .
6 Conclusion
In the present article we have reconsidered the zero-power trap phenomenon in testing for correlation in a general framework. Most importantly, we have suggested a way to construct “approximately optimal tests” that avoid the trap. For practical purposes, if an initial test, such as the Cliff-Ord test in the example discussed in Section 5, turns out to suffer from the zero-power trap, we suggest using the method introduced in Section 4.3 to obtain a modified test with the following properties: (i) its power function is similar to that of the initial test, (ii) it does not suffer from the zero-power trap, and (iii) its limiting power equals one. The tuning parameter involved in the construction of the modified test can be chosen by graphically comparing the power functions of modified tests corresponding to different values of the tuning parameter with the power envelope and the power function of the initial test. The heuristic underlying our construction can be interpreted as a finite sample variant of the power enhancement principle of Fan et al. (2015). The approach, which is not restricted to the testing problem under consideration, might be of some interest in its own right.
Appendix A Proofs for results in Section 2
Proof of Lemma 2.2.
Lemma B.4 in Preinerstorfer and Pötscher (2017) shows that the cdf. , say, corresponding to is continuous, that , , and that is strictly increasing on . Hence, the function defined via
[TABLE]
is continuous, strictly decreasing, and satisfies and . Set , i.e., the inverse of , which is continuous, strictly decreasing, and obviously satisfies and . Then, for every . Finally, recall that is -invariant, from which it follows (cf. Remark 2.3 in Preinerstorfer and Pötscher (2017)) that for every every and every we have Hence, holds for every , every , and every . The uniqueness part is obvious.
Appendix B Proofs for results in Section 3
Proof of Theorem 3.1.
We apply Theorem 2.7 in Preinerstorfer and Pötscher (2017). Their Assumption 1 coincides with ours and is thus satisfied. Furthermore, by our Gaussianity assumption, their Assumption 3 is satisfied in our framework (with a normally distributed random vector with mean [math] and covariance matrix ), and we can use Part 1 of their Proposition 2.6 to conclude that their Assumption 2 is satisfied. The statement now follows from Theorem 2.7 in Preinerstorfer and Pötscher (2017) for the special case . The last statement follows from Remark 2.8(i) in the same reference.
Proof of Theorem 3.2.
We use Corollary 2.21 in Preinerstorfer and Pötscher (2017). That their Assumptions 1 and 2 are satisfied follows as in the proof of Theorem 3.1 above. Recall from Lemma 2.2 that is a strictly decreasing and continuous bijection from to , implying that for we have . We can hence apply Corollary 2.21 in Preinerstorfer and Pötscher (2017) to conclude that (under our assumptions) for such that we have (10).
Proof of Lemma 3.3.
Noting that both and follow from , Condition (11) together with the definition of in Equation (5) can be used to verify . Thus, Lemma 2.2 gives and for every . We can now apply Theorem 3.2 to conclude.
Lemma B.1.
Let be symmetric, let be such that , and suppose that . Then,
[TABLE]
can be written as
[TABLE]
for a multivariate polynomial, which is given in the proof. Furthermore, if and only if holds for real numbers and .
Proof.
Let satisfy , or equivalently . If , the vector can not be an eigenvector of . If , is an eigenvector of the symmetric matrix if and only if
[TABLE]
We can write this rank condition equivalently as
[TABLE]
Writing (throughout we use the convention that the adjoint of a matrix equals ), and premultiplying (30) by , one sees that (30) is equivalent to
[TABLE]
where . Note that defines a multivariate polynomial on . It follows that has the claimed form.
To prove the second statement, note that if is of the specific form for real numbers and , one has for every that
[TABLE]
For such that the statement is equivalent to (29). But (29) holds because of the previous display. If satisfies we obviously have . Thus, for all of this specific form.
Now assume that can not be written as for real numbers and . It suffices to construct a single such that holds. We consider two cases:
(a) We first show that one can find an as required in the special case where is not an eigenvector of . Let be an orthonormal basis of eigenvectors of with corresponding eigenvalues . Note that there then exist two indices , say, such that and such that and (otherwise would be an eigenvector of ; recall that ). Now, define the matrix for linearly independent elements of (with the convention that if ; note that holds by assumption). Such a choice of is possible as by assumption. Note that . Next, let be an matrix with . Then, is of full column rank, and . From the discussion preceding the definition of we see that it thus remains to verify that is not an eigenvector of . But , implying . Hence, if was an eigenvector of , we would have
[TABLE]
for some , which gives the contradiction .
(b) Next we consider the case where is an eigenvector of to the eigenvalue , say. Let be an orthonormal basis of eigenvectors of corresponding to its eigenvalues , and where holds. By assumption, is not of the form . Together with being an eigenvector of this implies (via a diagonalization argument) existence of two indices and , say, such that are pairwise distinct and such that . Now, define where , and where are linearly independent elements of (with the convention that if ; recall that holds by assumption). Such a construction is possible as by assumption. Note that . Define as an matrix with . Then, is of full column rank, and . Arguing as in (a) it now remains to verify that is not an eigenvector of : It is easy to see that
[TABLE]
and that, using the expression in the previous display and a simple computation,
[TABLE]
Hence, for this choice of the vector is an eigenvector of if and only if
[TABLE]
for some . The number must then necessarily be nonzero. But this implies (premultiply both sides of (33) by , then by , and compare the two equations obtained) that , a contradiction.
Proof of Proposition 3.4.
We start with the claim that up to a -null set of exceptional matrices, every satisfies (11). From it follows that . Hence, it suffices to show that
[TABLE]
is a -null set. We consider two cases:
(a) Suppose first that for real numbers where . Then, the set in Equation (34) simplifies to
[TABLE]
To see this note that in this case and for so that we have
[TABLE]
where we used the assumption in (12) to obtain the first equality, and the specific structure of and to obtain the second equality. Thus, is possible only if , which is equivalent to . Therefore, (34) simplifies to (35). But, by assumption holds, from which it is easy to see, noting that , that . Therefore, the set in (35), and equivalently the set in (34), is a -null set in this case.
(b) Consider now the case where is not a linear combination of and . Using Equation (12) we can write the set defined in (34) equivalently as
[TABLE]
For of full column rank the property can be used to verify that
[TABLE]
implies
[TABLE]
Thus, if then , and implies that is an eigenvector of . Thus, the set in Equation (37) is contained in the union of the -null set and the set
[TABLE]
It thus remains to verify that the set in (38) is a -null set. Lemma B.1 (applied with and ) shows that (38) is the subset of an algebraic set. Note that the assumptions in Lemma B.1 are satisfied as is assumed. The lemma also provides the information that a multivariate polynomial defining this algebraic set does not vanish everywhere. Hence, it follows that the set in the previous display is contained in a -null set. Since the set is Borel measurable (cf., e.g., the representation obtained via Lemma B.1), it follows that it is itself a -null set.
We now prove the two remaining claims concerning . For the monotonicity claim: If is empty, there is nothing to prove. Consider the case where . Let . By definition of the matrix has full column rank and . From it thus follows from Lemma 2.2 that . Hence, and one obtains . Finally, note that Lemma 3.3 shows that if satisfies (11), then . The first (already established) part of the current proposition hence proves the last claim.
Lemma B.2.
Let be symmetric, let such that , and suppose that can not be written as for real numbers where . Let such that . Then:
1. There exists a sequence such that and as , a vector with and a real number , such that: and holds for every , such that
[TABLE]
and such that for every we have
[TABLE]
2. Let be a function from the set of full column rank matrices to the set of symmetric -dimensional matrices. Suppose there exists a function from the set of matrices to itself, such that for every of full column rank holds for a suitable choice of satisfying and . Suppose further that is continuous at every element , say, of the closure of , and that for every such we have
[TABLE]
Then, the sequence obtained in Part 1 satisfies for every ,
[TABLE]
and
[TABLE]
for some positive real number .
Proof.
Before we prove Part 1, we note that it suffices to verify the existence claim without the requirement that converges: Convergence of can then be achieved by passing to a subsequence.
1.a) Consider first the case where : Let such that , and set for linearly independent elements of (with the implicit understanding that in case ). By assumption is not a multiple of , thus , from which it also follows that has full column rank for every . For every set equal to an matrix such that and . Then Equations (39) and (40) (with ) follow immediately from and .
1.b) Next, we consider the case where : We first claim that there must exist an such that and a vector such that and such that . We argue by contradiction: First of all, if the claim was false, then would follow. We could then choose an orthonormal basis of . Under the assumption that the above claim was wrong, it would further follow that for every , implying for every , which, by a dimension argument using , is equivalent to
[TABLE]
or equivalently
[TABLE]
Since , setting and in the previous display then shows that is orthogonal to , and hence would follow. But then we could conclude that , a contradiction. Now, let be such that and a corresponding such that and such that . Let be a sequence that converges to [math] and such that holds for every . Then, we define and set (with in case ), for linearly independent elements of (which is possible as ). As follows from , the matrix has full column rank for every . Now, for every set equal to an matrix such that and . Then
[TABLE]
where holds for all . From , we thus obtain for every . But hence shows that
[TABLE]
which implies (39). Equation (40) follows because gives , and since was chosen such that and .
2. Obviously, follows from . Consider first Equation (42). Let be an arbitrary subsequence of . Define and . Clearly , and is a norm-bounded sequence because . The latter also implies
[TABLE]
Hence, we can choose a subsequence of , say, along which and converge to and , say, respectively. Note that . Next, we use to rewrite
[TABLE]
and use Equation (39) to conclude that along we have . From Equation (48) we obtain , hence
[TABLE]
where the equality is obtained from (41). Finally, we observe that along we have (using continuity of ) that , from which
[TABLE]
and follows (along ). Hence, we have shown that the statement in Equation (42) holds along the subsequence of . But was arbitrary. Therefore, we are done.
For (43) we argue by contradiction. Note first that the limit inferior in (43) can not be infinite, because , and the continuity property of together with boundedness of . Now, assuming (43) were false, we could choose a subsequence of such that . Choose a subsequence of along which just defined above, (note that follows from ) and converge to , and , respectively (where and might differ from the limits in the preceding paragraph where we established Equation (42)). Note also that . Recall that , note that
[TABLE]
and that, using together with continuity of at , the upper and lower bound in the previous display converge along to . It follows that , and hence , the equality following from Equation (41). But from Equation (40) we conclude that holds. To arrive at a contradiction it suffices to show that . But (similarly to the argument in the proof of (42) above) this follows from Equation (39), showing that along , together with Equation (48).
Proof of Proposition 3.6.
We start with (1.): Let . Let be a sequence of -dimensional orthonormal matrices converging to some orthonormal, such that holds for every , such that
[TABLE]
and such that , and where is a real number. Such a sequence exists as a consequence of Part 2 of Lemma B.2 (applied with and ). Without loss of generality, passing to a subsequence if necessary, we assume that holds for every . Denote by the critical value corresponding to , cf. Lemma 2.2, and recall from that lemma that then holds as . Passing to a subsequence if necessary, we can assume that converges to , say, an matrix the rows of which form an orthonormal basis of . Recall the continuity property of and that . It follows that , , converge to , and , respectively, with . Passing to another subsequence, if necessary, we can additionally achieve that , say. Obviously, holds. We now argue that must hold: By the definition of
[TABLE]
Denoting by the cdf. of the image measure this implies . From Lemma B.4 of Preinerstorfer and Pötscher (2017) we obtain that the support of coincides with , that is a continuous function, and that is strictly increasing on . Hence, from , it follows that , where denotes the quantile function corresponding to . It is easy to see that converges in distribution to the cdf. , say, of , where the function is defined as
[TABLE]
Again, Lemma B.4 of Preinerstorfer and Pötscher (2017) (with “ and ”) shows that the support of is , that is continuous (recall that ), and that is strictly increasing on . This implies that the quantile function corresponding to is continuous on , and that . Using the convergence in distribution pointed out above, we conclude that the quantiles . Using Equation (51) we can now conclude that there exists an such that is of full column rank, such that , such that , and such that (with the critical value corresponding to and ). Theorem 3.2 establishes .
We now prove (2.): Recall that has full column rank and . We conclude that both statements (i) is of full column rank and (ii) hold for every in an open set , say, containing . We now claim that
[TABLE]
holds for every in an open set containing (that satisfies the display was just shown above). Arguing as above, this claim and Theorem 3.2 (together with Lemma 2.2) would imply , and we would be done. To prove the claim, it suffices to verify that and as in the previous display are (well defined) continuous functions of on a neighborhood of . First, in order to ensure via Lemma 2.2 that a as in the previous display uniquely exists on a neighborhood of , we show that holds on an open subset of containing . Recalling that , and noting that the map is a surjection of to , we conclude that there exist two vectors and in , and such that
[TABLE]
holds. From the additional continuity property in (2.) it follows that and hold on an open set , say, such that , from which it follows that for every we have . From we conclude from Lemma 2.2 that a satisfying the property to the right in the penultimate display uniquely exists for every . Since is continuous on by assumption, it remains to verify that is continuous on . Lemma B.4 of Preinerstorfer and Pötscher (2017) and the definition of show that for we have , where denotes the cdf. of the image measure . It is easy to see (using the additional continuity condition in (2.)) that the map is continuous on (equipping the co-domain with the topology of weak convergence). Furthermore, for every it holds (via Lemma B.4 in Preinerstorfer and Pötscher (2017)) that has support (which is non-degenerate), that the cdf. is continuous, and strictly increasing on . Hence, for every the quantile function is continuous at . Continuity of on follows.
Proof of the claim made in Remark 3.7:
We verify that for , , and every the function is continuous at every of full column rank such that . Fix . Let be of full column rank such that , and let be a sequence converging to . Eventually, is of full column rank and satisfies , hence we may assume that this is the case for the whole sequence. We need to show that as we have , or equivalently that
[TABLE]
Since is of full column rank obviously holds. For the numerators, let be an arbitrary subsequence of , and choose a subsequence of such that along the sequence converges to , say. Note that is necessarily orthonormal and . Hence, along , noting that is positive definite by assumption, we have . Since holds for an orthonormal matrix , say, it follows that
[TABLE]
Since the subsequence was arbitrary, we are done.
Appendix C Proofs for results in Section 4
Proof of Theorem 4.3:
Denote by the distribution induced by (1), but where is replaced by (a matrix with column rank ), and where is the regression coefficient corresponding to . Note also that for every , every and every the measure coincides with . Applying Corollary 2.22 in Preinerstorfer and Pötscher (2017) (recall that from the discussion preceding Equation (17), and acting as if was the underlying design matrix), one then immediately obtains that for every , every and every it holds that
[TABLE]
Setting then delivers the claim.
Proof of Proposition 4.5:
We proceed in 3 steps:
- By a simple -invariance argument (recall A.1 and that is -invariant) it suffices to verify that for every and every there exists a such that
[TABLE]
and such that for every it holds that the supremum in the previous display is greater than .
- We claim that the non-increasing function defined via
[TABLE]
is continuous. To verify this claim let , and let be a real sequence. By the Dominated Convergence Theorem, to show that holds, it is enough to verify
[TABLE]
for -almost every . It suffices to verify that
[TABLE]
holds for -almost every . The statement in the previous display holds for every such that . The claim now follows from , which can be obtained from Part 1 of Lemma B.4 in Preinerstorfer and Pötscher (2017) upon noting that (recall that ) and that (the inequality following from ).
- Next, note that (using A.2 for the lower bound). Observe that follows from , the last equality following from Part 1 of Lemma B.4 in Preinerstorfer and Pötscher (2017). Observe also that follows from , the last equality following again from Part 1 of Lemma B.4 in Preinerstorfer and Pötscher (2017). From these two observations, monotonicity of , and the continuity of it follows that is a closed interval contained in . Define as the lower endpoint of this closed interval. Equation (56), and thus Equation (24), follows. Furthermore, since was defined as the lower endpoint, monotonicity of implies that every must satisfy . To finally show that holds, suppose the opposite. It then follows from what was already shown that , which is obviously false (cf. the discussion surrounding (22)). Note also that follows from Lemma 2.2.
Proof of Theorem 4.6:
1.) Let . Obviously
[TABLE]
which shows that for every , every and every we have
[TABLE]
From Proposition 4.5 we know that . We can therefore use Lemma 2.2 (with ) to conclude that for some , and apply Theorem 4.1 to conclude that for every and every we have , which together with the lower bound in the previous display proves the claim.
2.) Using -invariance of (for every ) and of , together with for every , it suffices to verify that
[TABLE]
Let be a sequence in and let be a sequence in . For convenience, set . We verify that
[TABLE]
Let be an arbitrary subsequence of . By compactness of the unit sphere in , we can choose a subsequence of along which converges to a symmetric matrix , say, which due to the additional assumption on the set is positive definite. It follows from Scheffé’s lemma that along the sequence (i.e., the Gaussian probability measure with mean [math] and covariance matrix ) converges in total-variation-distance to , a Gaussian probability measure with mean [math] and covariance matrix . Obviously . By, e.g., Lemma 2.3 in Strasser (1985) and since is a sequence of tests, it follows from the total variation convergence established above that along we have
[TABLE]
where denotes expectation w.r.t. . We now claim that
[TABLE]
This claim, if true, then implies Equation (62) as the subsequence we started with was arbitrary. We first show that the sequence in the previous display converges to [math], when the expectation is taken w.r.t. instead of . To this end write
[TABLE]
From A.3 and the Dominated Convergence Theorem we obtain . It remains to show that for . By construction and A.2, however, we have . Therefore, the preceding display shows that . The statement hence follows from . Now, suppose (64) were false. Then, there would exist a subsequence of along which the sequence in (64) converges to , say. Since , there exists a subsequence of and a set such that , and such that for every it holds that (cf., e.g., Theorem 3.12 in Rudin (1987)). From positive-definiteness of it follows, however, that , and (by the Dominated Convergence Theorem) that , a contradiction.
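The total variation step in the proof above (covariance matrices converging to a positive definite limit, hence Gaussian measures converging in total variation by Scheffé's lemma) can be checked numerically in a simple case. The sketch below is a one-dimensional illustration under stated assumptions: the function name `gaussian_tv`, the integration grid, and the chosen standard deviations are hypothetical and do not come from the paper.

```python
import numpy as np


def gaussian_tv(sigma_a: float, sigma_b: float,
                half_width: float = 12.0, n: int = 200_001) -> float:
    """Approximate the total variation distance between N(0, sigma_a^2) and
    N(0, sigma_b^2) as 0.5 * integral of |f_a - f_b|, via a Riemann sum."""
    x = np.linspace(-half_width, half_width, n)
    dx = x[1] - x[0]

    def pdf(sigma: float) -> np.ndarray:
        # Density of a centered Gaussian with standard deviation sigma.
        return np.exp(-0.5 * (x / sigma) ** 2) / (sigma * np.sqrt(2.0 * np.pi))

    return 0.5 * float(np.sum(np.abs(pdf(sigma_a) - pdf(sigma_b))) * dx)


# Standard deviations approaching the (positive) limit value 1.
tvs = [gaussian_tv(s, 1.0) for s in (2.0, 1.5, 1.1, 1.01)]
print(tvs)
```

Pointwise convergence of the densities (which requires the limiting variance to be positive) drives the distance to zero; this mirrors the role of positive definiteness of the limiting matrix in the proof.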
References
- Davies, R. B. (1980). Algorithm AS 155: The distribution of a linear combination of $\chi^{2}$ random variables. Journal of the Royal Statistical Society. Series C (Applied Statistics) 29 (3), 323–333.
- Fan, J., Y. Liao, and J. Yao (2015). Power enhancement in high-dimensional cross-sectional tests. Econometrica 83 (4), 1497–1541.
- Horn, R. A. and C. R. Johnson (1985). Matrix analysis. Cambridge: Cambridge University Press.
- King, M. L. and G. H. Hillier (1985). Locally best invariant tests of the error covariance matrix of the linear regression model. Journal of the Royal Statistical Society. Series B (Methodological) 47, 98–102.
- Kleiber, C. and W. Krämer (2005). Finite-sample power of the Durbin-Watson test against fractionally integrated disturbances. Econometrics Journal 8 (3), 406–417.
- Kock, A. B. and D. Preinerstorfer (2017). Power in high-dimensional testing problems. arXiv preprint arXiv:1709.04418.
- Krämer, W. (1985). The power of the Durbin-Watson test for regressions without an intercept. Journal of Econometrics 28 (3), 363–370.
- Krämer, W. (2005). Finite sample power of Cliff-Ord-type tests for spatial disturbance correlation in linear regression. Journal of Statistical Planning and Inference 128 (2), 489–496.
