Likelihood ratio tests for many groups in high dimensions
Holger Dette, Nina Dörnemann

TL;DR
This paper studies the asymptotic behavior of likelihood ratio tests in high-dimensional models with many groups, deriving central limit theorems for test statistics as both the number of groups and data dimensions grow large.
Contribution
It introduces new asymptotic results for likelihood ratio tests in high-dimensional, multi-group settings, extending classical theory to many groups.
Findings
Derived CLTs for test statistics in high dimensions
Compared asymptotic distributions with two-step approximation
Provided insights into the behavior of likelihood ratio tests with many groups
Abstract
In this paper we investigate the asymptotic distribution of likelihood ratio tests in models with several groups, when the number of groups converges with the dimension and sample size to infinity. We derive central limit theorems for the logarithm of various test statistics and compare our results with the approximations obtained from a central limit theorem using a two-step procedure: first, the number of groups is held fixed while the sample size and dimension converge to infinity; second, the resulting distribution is investigated as the number of groups converges to infinity.
Holger Dette, Nina Dörnemann
Fakultät für Mathematik
Ruhr-Universität Bochum
44799 Bochum, Germany
Abstract
In this paper we investigate the asymptotic distribution of likelihood ratio tests in models with several groups, when the number of groups converges with the dimension and sample size to infinity. We derive central limit theorems for the logarithm of various test statistics and compare our results with the approximations obtained from a central limit theorem where the number of groups is fixed.
Keywords: likelihood ratio test, high-dimensional inference
AMS subject classification: 62H15, 62H10
1 Introduction
Classical multivariate analysis tools, as can be found in the textbooks of Muirhead, (1982) or Anderson, (1984), are developed under the paradigm that the dimension is substantially smaller than the sample size, and they do not yield reliable statistical inference if this assumption is not satisfied. Because modern datasets, as they occur in biostatistics, wireless communications and finance, are high-dimensional [see, e.g., Fan and Li, (2006), Johnstone, (2006) and references therein], there exists an enormous amount of literature developing statistical methods for the case where the dimension of the data is comparable to (or even larger than) the sample size. Many authors have worked on this problem, and a large part of the literature investigates the asymptotic properties of “classical” test procedures under the assumption that the dimension is proportional to the sample size [see Ledoit and Wolf, (2002); Fujikoshi et al., (2004); Birke and Dette, (2005); Schott, (2007); Bai et al., (2009); Chen et al., (2010); Kakizawa and Iwashita, (2008); Jiang and Yang, (2013); Jiang et al., (2013); Li and Qin, (2014); Wang, (2014); Jiang and Qi, (2015); Hyodo et al., (2015); Bao et al., (2017); Yamada et al., (2017); Yang and Pan, (2017); Han et al., (2017); Chen and Jiang, (2018); Chen and Liu, (2018); Bodnar et al., (2019) among many others].
If the dimension is smaller than the sample size, likelihood ratio tests are still well defined, and many papers have shown that the asymptotic theory under this proportionality assumption yields a substantially better approximation of the nominal level of the corresponding tests than classical asymptotic considerations, which keep the dimension fixed. In this paper we continue this discussion and investigate approximations for likelihood ratio test statistics in cases where high-dimensional inference has to be performed for a large number of groups. Consider, for example, the problem of testing whether the covariance matrix of a multivariate normally distributed vector has a block diagonal structure with a given number of blocks. In Figure 1 we show the p-values of the corresponding likelihood ratio test [see Wilks, (1935)] under the null hypothesis with blocks of size (thus the total dimension is ) and sample size . The components of all vectors are independent and identically standard normally distributed, so the null hypothesis is obviously satisfied. The left panel shows the simulated p-values (based on simulation runs) using the approximation provided by Jiang and Yang, (2013), which considers the number of blocks as fixed, while the right panel shows the p-values using the approximation derived in this paper under the assumption that the number of blocks tends to infinity. The two figures look very similar.
On the other hand, Figure 2 shows the p-values of the likelihood ratio test for the equality of normal distributions under the null hypothesis [see Wilks, (1946)]. The sample size in each group is , (), the dimension is and different groups are considered (again all components of all vectors are independent and identically standard normally distributed, and simulation runs have been performed). The left panel of the figure shows the results obtained using the quantiles of the asymptotic distribution for a fixed number of groups [see Theorem 3 in Jiang and Yang, (2013)], while the right panel corresponds to the asymptotic distribution derived in this paper under the assumption that the number of groups tends to infinity (see Theorem 3.2 for more details). In this case we observe that the latter approach provides a better approximation of the nominal level.
The present paper is devoted to giving a (partial) explanation of observations of this type. We consider classical testing problems in high-dimensional statistical inference, where the data can be decomposed into groups, and investigate the asymptotic properties of likelihood ratio tests for various hypotheses if the dimension and the number of groups converge to infinity with increasing sample size. In all cases we establish the asymptotic normality of the log-likelihood ratio after appropriate standardization.
The work most similar in spirit to our paper is that of Jiang and Yang, (2013), who considered the corresponding problems for a fixed number of groups. In contrast to these authors, who used the fact that the moment generating function of the log-likelihood ratio statistic can essentially be expressed as a product of ratios of Gamma functions, we use a central limit theorem for sums of a triangular array of independent random variables (see Theorem A.1 in Section 4) to establish asymptotic normality. This approach is also applicable to other high-dimensional problems. As an example, we revisit the problem of testing a linear hypothesis about regression coefficients as considered in Bai et al., (2013). These authors showed the asymptotic normality of the (standardized) log-likelihood ratio test statistic using recent results about linear spectral statistics of large-dimensional F-matrices. With our approach we are able to extend their result and to provide a more convenient representation of the asymptotic bias.
2 One sample problems
2.1 Testing for independence
A very prominent problem in high-dimensional data analysis is testing for the independence of sub-vectors of a multivariate normal distribution. To be precise, let denote a -dimensional normally distributed vector with mean and positive definite covariance matrix and assume that is decomposed as
[TABLE]
where are vectors of dimension () such that . Let
[TABLE]
denote the corresponding decomposition of the covariance matrix, where . The hypothesis of independent sub-vectors is formulated as
[TABLE]
Several authors have developed tests for the hypothesis (2.2) [see Jiang et al., (2013); Hyodo et al., (2015); Bao et al., (2017); Jiang and Qi, (2015); Yamada et al., (2017); Chen and Liu, (2018); Bodnar et al., (2019) among others], and in this section we focus on the likelihood ratio test based on a sample of independent identically distributed observations . Wilks, (1935) showed that the likelihood ratio statistic for the hypotheses (2.2) is given by
[TABLE]
where
[TABLE]
is the usual estimator of the covariance matrix, the sample mean, and denotes the block in the th row and th column of the estimate corresponding to the decomposition (2.1). The following result specifies the asymptotic distribution of the likelihood ratio test under the null hypothesis of independent blocks when the number of blocks increases with the sample size. A proof can be found in Section 4.2. Here and throughout this paper the symbol denotes weak convergence.
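In its classical form, Wilks' statistic for the hypothesis (2.2) is a power (depending on the sample size) of the ratio of the determinant of the estimated covariance matrix to the product of the determinants of its diagonal blocks. The sketch below computes the logarithm of that ratio in pure Python, under the assumption that this is the quantity in the display above up to notation; the function names and the `block_sizes` argument are hypothetical. Any positive rescaling of the estimator cancels in the ratio, and by Fischer's inequality the value is never positive.

```python
import math
import random


def log_det_spd(a):
    """log-determinant of a symmetric positive definite matrix via Cholesky."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = a[i][j] - sum(low[i][m] * low[j][m] for m in range(j))
            if i == j:
                if s <= 0.0:
                    raise ValueError("matrix is not positive definite")
                low[i][i] = math.sqrt(s)
            else:
                low[i][j] = s / low[j][j]
    return 2.0 * sum(math.log(low[i][i]) for i in range(n))


def sample_covariance(data):
    """Sample covariance matrix with divisor n - 1 (the scaling cancels below)."""
    n, p = len(data), len(data[0])
    mean = [sum(x[j] for x in data) / n for j in range(p)]
    return [[sum((x[i] - mean[i]) * (x[j] - mean[j]) for x in data) / (n - 1)
             for j in range(p)] for i in range(p)]


def log_wilks_independence(data, block_sizes):
    """log(|S| / prod_b |S_bb|) for consecutive diagonal blocks of given sizes."""
    s = sample_covariance(data)
    if sum(block_sizes) != len(s):
        raise ValueError("block sizes must add up to the dimension")
    value = log_det_spd(s)
    start = 0
    for size in block_sizes:
        block = [row[start:start + size] for row in s[start:start + size]]
        value -= log_det_spd(block)
        start += size
    return value


rng = random.Random(0)
data = [[rng.gauss(0.0, 1.0) for _ in range(6)] for _ in range(50)]
stat = log_wilks_independence(data, [2, 2, 2])
print(stat)  # never positive, by Fischer's inequality
```

A single block reproduces the full matrix, so the statistic is then exactly zero.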
Theorem 2.1**.**
If , , and , then under the null hypothesis (2.2)
[TABLE]
where
[TABLE]
Remark 2.1**.**
- (a)
Theorem 2.1 provides an asymptotic level test for the null hypothesis (2.2) by rejecting , whenever
[TABLE]
where denotes the -quantile of the standard normal distribution and
[TABLE]
- (b)
Jiang and Yang, (2013) derived the asymptotic distribution of the statistic in the case where the number of groups is fixed, that is and the dimension is proportional to the sample size. In particular, they showed that under the null hypothesis
[TABLE]
where is defined in (2.6) and
[TABLE]
(note that these authors use a slightly different notation). The corresponding asymptotic level test for the null hypothesis (2.2) rejects , whenever
[TABLE]
It is easy to see that under the assumptions of Theorem 2.1
[TABLE]
where is defined in (2.4). Moreover, recalling the definition of in (2.4) we obtain by a straightforward calculation
[TABLE]
These results explain why, in Figure 1, the simulated p-values of the likelihood ratio test (2.8), obtained from a central limit theorem for a fixed number of blocks, and of the likelihood ratio test (2.5), obtained from a central limit theorem for an increasing number of blocks, are very similar.
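Both approximations in Remark 2.1 standardize the log-likelihood ratio by a centering term and a scaling term and compare the result with a standard normal quantile, so the two tests can be sketched as one rule with different inputs. In the sketch below, `mu` and `sigma` stand for whichever centering and scaling terms are used (those of Theorem 2.1 or of the fixed-number-of-blocks result); the function names are hypothetical. The lower-tail direction is our assumption, based on the fact that small values of the likelihood ratio speak against the null hypothesis.

```python
from statistics import NormalDist


def standardized_statistic(log_lambda, mu, sigma):
    """(log Lambda - mu) / sigma for given centering and scaling terms."""
    return (log_lambda - mu) / sigma


def reject_null(log_lambda, mu, sigma, alpha=0.05):
    """Assumed lower-tail rule: reject H0 if the standardized statistic lies
    below the alpha-quantile of N(0, 1)."""
    z = standardized_statistic(log_lambda, mu, sigma)
    return z < NormalDist().inv_cdf(alpha)


def p_value(log_lambda, mu, sigma):
    """Approximate lower-tail p-value P(Z <= z) from the normal approximation."""
    return NormalDist().cdf(standardized_statistic(log_lambda, mu, sigma))


# Hypothetical inputs, only to illustrate the call pattern.
print(reject_null(log_lambda=-12.3, mu=-10.0, sigma=1.0))
print(p_value(log_lambda=-12.3, mu=-10.0, sigma=1.0))
```

Histograms of such p-values over repeated null samples are what Figures 1 and 2 display; they should look roughly uniform when the approximation is accurate.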
2.2 Testing a linear hypothesis about regression coefficients
A further problem appears if the -dimensional (independent) random variables depend linearly on -dimensional regressors, say . To be precise, assume (), where the covariance matrix is positive definite and are known design vectors such that the matrix has rank . Consider the decomposition
[TABLE]
with and matrices and , respectively, such that . We are interested in the hypothesis that the matrix coincides with a given matrix , that is
[TABLE]
The likelihood ratio test statistic for this hypothesis is given by
[TABLE]
where the matrices and are defined by
[TABLE]
respectively, and
[TABLE]
are the maximum likelihood estimators of under the null hypothesis and alternative, respectively [see Sugiura and Fujikoshi, (1969) or Anderson, (1984)]. Here we use the partition of the vector into vectors and of dimension and , respectively. In the following theorem we present the asymptotic null distribution of the likelihood ratio test statistic for a general linear hypothesis (2.9) in a high-dimensional regression model, where the dimensions , , and increase with the sample size. Part of this result, namely the case , was established by Bai et al., (2013) using random matrix theory. In contrast to these authors, we are also able to deal with the case .
Theorem 2.2**.**
If , and , then under the null hypothesis (2.9)
[TABLE]
where
[TABLE]
Remark 2.2**.**
Bai et al., (2013) considered the testing problem (2.9) in a similar high-dimensional framework. Note that these authors use the negative log-likelihood ratio test statistic , while Theorem 2.2 is formulated for . They made use of recent results about linear spectral statistics of large-dimensional F-matrices and require a more restrictive condition on the ratio to apply this theory, that is . To be more precise, Bai et al., (2013) proved that under the null hypothesis (2.9)
[TABLE]
whenever , and . For explicit definitions of the expressions and , we refer the reader to formulas (26), (27) and (29) in their paper. Theorem 2.2 extends the result of Bai et al., (2013) to the case where and provides a simpler representation of the bias. Moreover, we have checked numerically that the standardizing terms in the central limit theorems stated in (2.12) and Theorem 2.2 behave similarly.
Consequently, the likelihood ratio tests based on the asymptotic distribution of Theorem 2.2 and on Theorem 3.1 of Bai et al., (2013) have very similar properties. Numerical results which confirm this observation are not displayed for the sake of brevity.
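In the classical treatment [see Anderson, (1984)], the likelihood ratio statistic for a linear hypothesis on regression coefficients is a power of Wilks' ratio |E| / |E + H|, where E is n times the maximum likelihood estimator of the covariance matrix under the alternative and E + H is n times its counterpart under the null hypothesis. The minimal sketch below assumes E and H are already available (their computation from the regression fit is not shown) and evaluates the logarithm of this ratio; the names `e`, `h` and `log_wilks_lambda` are hypothetical.

```python
import math


def log_det_spd(a):
    """log-determinant of a symmetric positive definite matrix via Cholesky."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = a[i][j] - sum(low[i][m] * low[j][m] for m in range(j))
            if i == j:
                if s <= 0.0:
                    raise ValueError("matrix is not positive definite")
                low[i][i] = math.sqrt(s)
            else:
                low[i][j] = s / low[j][j]
    return 2.0 * sum(math.log(low[i][i]) for i in range(n))


def log_wilks_lambda(e, h):
    """log(|E| / |E + H|) for a residual SSP matrix E (alternative) and a
    positive semidefinite hypothesis SSP matrix H."""
    size = len(e)
    e_plus_h = [[e[i][j] + h[i][j] for j in range(size)] for i in range(size)]
    return log_det_spd(e) - log_det_spd(e_plus_h)


e = [[4.0, 1.0], [1.0, 3.0]]
h = [[2.0, 0.5], [0.5, 1.0]]
print(log_wilks_lambda(e, h))
```

Since H is positive semidefinite, |E + H| is at least |E|, so the logarithm is never positive and equals zero when H vanishes.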
3 Some -sample problems
In this section we consider the comparison of normal distributions with mean vectors and covariance matrices , where for each group a sample of size is available () and the dimension and number of groups are increasing with the sample size.
3.1 Testing equality of several covariance matrices
An important assumption for multivariate analysis of variance (MANOVA) is that of equal covariances in the different groups. Thus we are interested in a test of the hypothesis
[TABLE]
This problem has been considered by several authors in the context of high-dimensional inference [see O’Brien, (1992), Schott, (2007), Srivastava and Yanagihara, (2010) or Jiang and Yang, (2013) among others].
In this section we add to this line of literature and investigate the asymptotic distribution of the likelihood ratio test based on samples of independent normally distributed observations , , when the number of groups is large, i.e. . To be precise, let be the total sample size; the likelihood ratio statistic for the hypothesis (3.1) was derived by Wilks, (1932) and is given by
[TABLE]
where the matrices and are defined as
[TABLE]
As proposed by Bartlett, (1937) we consider the modified likelihood ratio test statistic
[TABLE]
where each sample size is replaced by the corresponding degrees of freedom. Our next result deals with the asymptotic distribution of the test statistic for an increasing dimension and an increasing number of groups.
Theorem 3.1**.**
Let for all , , , , assume that
[TABLE]
and that
[TABLE]
converges to a positive limit, say . Then, under the null hypothesis (3.1),
[TABLE]
where
[TABLE]
Remark 3.1**.**
- (a)
Note that under the assumptions of Theorem 3.1 the asymptotic distributions of and are not identical. In fact we have
[TABLE]
and in general this is not of order (consider for example the case , , ).
- (b)
Jiang and Yang, (2013) determined the asymptotic distribution of the statistic for a fixed number assuming that the limit does not vanish. In particular, they showed that under the null hypothesis (3.1)
[TABLE]
as , where the asymptotic bias and variance are given by
[TABLE]
respectively. Since the number of groups is fixed, the authors assumed for their result that for all and . Note that the order of the standardization in Theorem 3.1 differs from that in (3.4). The standardization is of order , which is, under the assumptions of Theorem 3.1, substantially smaller than the one used in (3.4). Comparing the variance in Theorem 3.1 with an adjusted version of (such that the different standardizations cancel out) yields, under the assumptions of Theorem 3.1,
[TABLE]
On the other hand the difference of the means is given by
[TABLE]
Note that the first summand divided by the standardization vanishes under the assumptions of Theorem 3.1, while the other terms give a notable contribution to the expected value. Thus we expect that the corresponding likelihood ratio tests behave differently, if the number of groups is large.
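Bartlett's modification replaces each sample size by its degrees of freedom. In its classical form, log Λ* = Σ_i (n_i − 1)/2 · log|A_i/(n_i − 1)| − (N − k)/2 · log|A/(N − k)|, where A_i is the within-group sum-of-squares-and-products (SSP) matrix of group i, A = Σ_i A_i, N is the total sample size and k the number of groups. We assume this matches the modified statistic of Section 3.1 up to notation; the symbols here and the function names below are our own. The sketch computes it from raw samples.

```python
import math
import random


def log_det_spd(a):
    """log-determinant of a symmetric positive definite matrix via Cholesky."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = a[i][j] - sum(low[i][m] * low[j][m] for m in range(j))
            if i == j:
                if s <= 0.0:
                    raise ValueError("matrix is not positive definite")
                low[i][i] = math.sqrt(s)
            else:
                low[i][j] = s / low[j][j]
    return 2.0 * sum(math.log(low[i][i]) for i in range(n))


def ssp_matrix(sample):
    """Sum of squares and products about the sample mean."""
    n, p = len(sample), len(sample[0])
    mean = [sum(x[j] for x in sample) / n for j in range(p)]
    return [[sum((x[i] - mean[i]) * (x[j] - mean[j]) for x in sample)
             for j in range(p)] for i in range(p)]


def log_bartlett_equal_cov(groups):
    """Bartlett-modified log-LR statistic for H0: equal group covariances."""
    k, p = len(groups), len(groups[0][0])
    a_list = [ssp_matrix(g) for g in groups]
    a_pooled = [[sum(a[i][j] for a in a_list) for j in range(p)] for i in range(p)]
    value = 0.0
    for a_i, g in zip(a_list, groups):
        df = len(g) - 1  # degrees of freedom instead of the sample size
        value += 0.5 * df * (log_det_spd(a_i) - p * math.log(df))
    df_pooled = sum(len(g) for g in groups) - k
    value -= 0.5 * df_pooled * (log_det_spd(a_pooled) - p * math.log(df_pooled))
    return value


rng = random.Random(2)
groups = [[[rng.gauss(0.0, 1.0) for _ in range(3)] for _ in range(12)]
          for _ in range(4)]
print(log_bartlett_equal_cov(groups))
```

By concavity of the log-determinant the statistic is never positive, and it vanishes when all scaled matrices A_i/(n_i − 1) coincide, for instance when the same sample is used twice.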
3.2 Testing equality of several normal distributions
We consider the same setting as in Section 3.1 but now want to test whether the normal distributions are identical, that is,
[TABLE]
The test statistic of the likelihood ratio test for the hypothesis (3.5) is given by
[TABLE]
where the matrix is defined in (3.3) and
[TABLE]
Note that is the product of the likelihood ratio statistic in (3.2) for testing equality of covariance matrices and the likelihood ratio statistic for testing equality of the means [see Yao et al., (2015), Gregory et al., (2015)]. Several authors have considered this testing problem [see, e.g., Wilks, (1946), Jiang and Yang, (2013)]. The following result specifies the asymptotic distribution of the statistic for an increasing dimension and an increasing number of groups.
Theorem 3.2**.**
Let for all , , , , assume that
[TABLE]
and that
[TABLE]
converges to a positive limit, say . Then, under the null hypothesis (3.5), we have
[TABLE]
where
[TABLE]
Remark 3.2**.**
The asymptotic distribution of the statistic in the case where is fixed was determined in Theorem 3 of Jiang and Yang, (2013) who showed that under the null hypothesis (3.5)
[TABLE]
if (). Here the asymptotic bias and variance are given by
[TABLE]
(note that these authors use a slightly different notation). It is important to note that the orders in the standardizations in both results are different. While the standardizing factor in (3.7) is of order , it is of order in Theorem 3.2.
Similarly, as in Remark 3.1 it can be shown that
[TABLE]
under the assumptions of Theorem 3.2, while in general the difference is not of order (consider, for example, the case , , , ). Based on these observations we expect differences between the likelihood ratio tests that use as critical values the quantiles of the normal approximation for a fixed number of groups, as derived by Jiang and Yang, (2013), and those that use the quantiles from Theorem 3.2. This is illustrated in Figures 3b and 4, where we display the simulated p-values of the tests under the null hypothesis. In Figure 3b we consider the case , and a relatively small number of groups . We observe that both approximations yield histograms close to the expected uniform distribution. On the other hand, in Figure 4 we consider the cases , and and and observe larger differences between the two approximations. In particular, the critical values derived from Theorem 3.2 yield a likelihood ratio test for the hypothesis (3.5) with a better performance than the test using the quantiles from the fixed-number-of-groups asymptotics.
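The factorization of the statistic for (3.5) can be checked numerically. In the classical form of Wilks, (1946), log Λ = Σ_i n_i/2 · log|A_i/n_i| − N/2 · log|B/N|, where A_i are the within-group SSP matrices, A = Σ_i A_i and B is the SSP matrix of all observations about the grand mean. Writing it as log Λ_cov + log Λ_mean with log Λ_mean = N/2 · (log|A| − log|B|) separates the covariance part from the mean part. The sketch below assumes this form matches the display above up to notation; names are hypothetical.

```python
import math
import random


def log_det_spd(a):
    """log-determinant of a symmetric positive definite matrix via Cholesky."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = a[i][j] - sum(low[i][m] * low[j][m] for m in range(j))
            if i == j:
                if s <= 0.0:
                    raise ValueError("matrix is not positive definite")
                low[i][i] = math.sqrt(s)
            else:
                low[i][j] = s / low[j][j]
    return 2.0 * sum(math.log(low[i][i]) for i in range(n))


def ssp_matrix(sample):
    """Sum of squares and products about the sample mean."""
    n, p = len(sample), len(sample[0])
    mean = [sum(x[j] for x in sample) / n for j in range(p)]
    return [[sum((x[i] - mean[i]) * (x[j] - mean[j]) for x in sample)
             for j in range(p)] for i in range(p)]


def log_lr_equal_normals(groups):
    """Return (log_cov, log_mean); their sum is the unmodified log-LR
    statistic for H0: all normal distributions coincide."""
    p = len(groups[0][0])
    sizes = [len(g) for g in groups]
    n_total = sum(sizes)
    a_list = [ssp_matrix(g) for g in groups]
    a_within = [[sum(a[i][j] for a in a_list) for j in range(p)] for i in range(p)]
    b_total = ssp_matrix([x for g in groups for x in g])  # about the grand mean
    log_cov = sum(0.5 * n_i * (log_det_spd(a_i) - p * math.log(n_i))
                  for a_i, n_i in zip(a_list, sizes))
    log_cov -= 0.5 * n_total * (log_det_spd(a_within) - p * math.log(n_total))
    log_mean = 0.5 * n_total * (log_det_spd(a_within) - log_det_spd(b_total))
    return log_cov, log_mean


rng = random.Random(3)
groups = [[[rng.gauss(0.0, 1.0) for _ in range(3)] for _ in range(10)]
          for _ in range(3)]
log_cov, log_mean = log_lr_equal_normals(groups)
print(log_cov, log_mean, log_cov + log_mean)

# Shifting one group's mean leaves log_cov unchanged but lowers log_mean.
shifted = [groups[0], groups[1], [[v + 5.0 for v in x] for x in groups[2]]]
print(log_lr_equal_normals(shifted))
```

Both parts are never positive: B equals A plus a positive semidefinite between-group matrix, and the covariance part is controlled by concavity of the log-determinant.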
4 Some proofs
In this section we present proofs of our results, restricting ourselves to the proofs of Theorems 2.1 and 2.2. The other statements are shown by similar arguments, which are omitted for the sake of brevity.
4.1 A central limit theorem
We begin by stating a central limit theorem for triangular arrays of independent random variables, which is used extensively in the proofs of Theorems 2.1-3.2. It follows by arguments similar to those given in Dette and Tomecki, (2019), and therefore its proof is omitted.
Theorem A.1**.**
Let be a sequence of finite sets, let denote an array of random variables and an array of weights satisfying the following conditions:
- (A.1)
The random variables are independent for all .
- (A.2)
The random variables are centered, that is, .
- (A.3)
for some universal constant and for all .
- (A.4)
.
- (A.5)
There exists a constant such that
[TABLE]
Then the random variable
[TABLE]
converges in distribution to a normal distribution with mean 0 and variance .
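Theorem A.1 concerns weighted sums of independent, centered random variables. As a rough illustration only, the Monte Carlo sketch below uses a hypothetical array (centered Exp(1) variables with bounded weights, scaled so the sum has variance exactly 1) and checks that the standardized sums have mean near 0 and variance near 1. It does not verify the conditions (A.1)-(A.5) for the arrays used in the proofs.

```python
import math
import random
import statistics


def standardized_weighted_sum(rng, weights):
    """One draw of sum_j w_j * (E_j - 1) / sqrt(sum_j w_j ** 2), E_j iid Exp(1).

    The summands are independent and centered; the scaling gives variance 1."""
    scale = math.sqrt(sum(w * w for w in weights))
    return sum(w * (rng.expovariate(1.0) - 1.0) for w in weights) / scale


def simulate(m, replications, seed=1):
    """Monte Carlo draws for one row (with m summands) of a hypothetical array."""
    rng = random.Random(seed)
    weights = [1.0 + (j % 3) for j in range(m)]  # bounded, hypothetical weights
    return [standardized_weighted_sum(rng, weights) for _ in range(replications)]


draws = simulate(m=200, replications=4000)
print(statistics.fmean(draws), statistics.pvariance(draws))
```

A histogram of `draws` should resemble the standard normal density; increasing `m` reduces the remaining skewness inherited from the exponential summands.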
4.2 Proof of Theorem 2.1
Define
[TABLE]
and note that under the null hypothesis (2.2) the distribution of is given by a product of independent Beta-distributions [see Anderson, (1984)], that is
[TABLE]
where the random variables
[TABLE]
are independent and . Consequently, with the notation
[TABLE]
the assertion follows from
[TABLE]
where and are defined in (2.3) and (2.4), respectively.
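The argument below rests on the moments of the logarithm of a Beta variable. For B ~ Beta(a, b), E log B = ψ(a) − ψ(a + b) and Var log B = ψ'(a) − ψ'(a + b), where ψ is the digamma function and ψ' its derivative. Python's standard library has no polygamma function, so the sketch implements ψ and ψ' through the recurrences ψ(x) = ψ(x + 1) − 1/x and ψ'(x) = ψ'(x + 1) + 1/x², followed by their standard asymptotic series, and compares the moments with a Monte Carlo estimate. The parameters (3.5, 2.0) are arbitrary and are not those of the product representation above.

```python
import math
import random
import statistics

EULER_GAMMA = 0.5772156649015329  # Euler-Mascheroni constant


def digamma(x):
    """psi(x) for x > 0: shift x to at least 10, then use the asymptotic series."""
    result = 0.0
    while x < 10.0:
        result -= 1.0 / x
        x += 1.0
    inv2 = 1.0 / (x * x)
    return result + (math.log(x) - 0.5 / x
                     - inv2 * (1.0 / 12 - inv2 * (1.0 / 120 - inv2 * (1.0 / 252 - inv2 / 240))))


def trigamma(x):
    """psi'(x) for x > 0: shift x to at least 10, then use the asymptotic series."""
    result = 0.0
    while x < 10.0:
        result += 1.0 / (x * x)
        x += 1.0
    inv = 1.0 / x
    inv2 = inv * inv
    return result + (inv + 0.5 * inv2
                     + inv * inv2 * (1.0 / 6 - inv2 * (1.0 / 30 - inv2 * (1.0 / 42 - inv2 / 30))))


def log_beta_moments(a, b):
    """Mean and variance of log(B) for B ~ Beta(a, b)."""
    return digamma(a) - digamma(a + b), trigamma(a) - trigamma(a + b)


mean, var = log_beta_moments(3.5, 2.0)
rng = random.Random(4)
logs = [math.log(rng.betavariate(3.5, 2.0)) for _ in range(20000)]
print(mean, statistics.fmean(logs))
print(var, statistics.pvariance(logs))
```

Known values such as ψ(1) = −γ and ψ'(1) = π²/6 offer a quick check of the series implementation.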
For a proof of this statement we use Theorem A.1 and show that its conditions (A.1)-(A.5) are satisfied. We begin with a calculation of the variance of , noting that the variance of the logarithm of a Beta-distributed random variable is given by
[TABLE]
where () denotes the polygamma function of order [see Abramowitz and Stegun, (1964)]. This yields
[TABLE]
where
[TABLE]
Observing the expansion for the logarithmic Gamma function of order , [see Abramowitz and Stegun, (1964)] we obtain for the first term
[TABLE]
where we used the expansion
[TABLE]
(here denotes the Euler-Mascheroni constant). For the second term we use the same expansion as above and obtain
[TABLE]
where we used for the last equality that for sufficiently large and the fact that for all . Using the expansion of the harmonic series in (A.5), it follows that
[TABLE]
Here the last equality is a consequence of the dominated convergence theorem [see Kallenberg, (1997), Theorem 1.21]. Combining (A.3), (A.4) and (A.7) finally shows
[TABLE]
which yields the asymptotic variance in (2.3) and assumption (A.5) in Theorem A.1.
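Several steps above use the expansion of the harmonic series involving the Euler-Mascheroni constant γ. Assuming the standard form H_n = ln n + γ + 1/(2n) − 1/(12n²) + O(n⁻⁴) (equivalently ψ(n + 1) = H_n − γ), the short check below compares the exact partial sums with this expansion; the helper names are hypothetical.

```python
import math

EULER_GAMMA = 0.5772156649015329  # Euler-Mascheroni constant


def harmonic(n):
    """H_n = 1 + 1/2 + ... + 1/n, summed accurately with math.fsum."""
    return math.fsum(1.0 / j for j in range(1, n + 1))


def harmonic_expansion(n):
    """ln n + gamma + 1/(2n) - 1/(12 n^2); the neglected terms are O(n^-4)."""
    return math.log(n) + EULER_GAMMA + 1.0 / (2 * n) - 1.0 / (12 * n * n)


for n in (10, 100, 1000):
    print(n, harmonic(n) - harmonic_expansion(n))
```

The differences shrink roughly like 1/(120 n⁴), which is why lower-order remainders can be absorbed into the error terms of the proof.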
Moreover, as the function is positive and decreasing we have
[TABLE]
(note that ), which proves (A.4). The conditions (A.1) and (A.2) are obviously satisfied, and the remaining moment inequality (A.3) is a consequence of Lemma A.7 and Theorem A.8 in Dette and Tomecki, (2019). Consequently, we obtain from Theorem A.1 the weak convergence
[TABLE]
and it remains to calculate the representation of the expectation. For this purpose note that
[TABLE]
where
[TABLE]
Observing the expansion and (A.5) we obtain
[TABLE]
and
[TABLE]
where we used similar arguments as in the derivation of (A.6) and (A.7). Combining these results with (A.8) finally yields
[TABLE]
Finally an application of Stirling’s formula
[TABLE]
yields for the first term
[TABLE]
and the representation of the expectation in (2.4) follows, which completes the proof of Theorem 2.1.
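The last step of the proof uses Stirling's formula for the logarithm of the Gamma function. A quick numerical check of the standard form log Γ(x) ≈ (x − 1/2) log x − x + (1/2) log(2π) + 1/(12x), compared against `math.lgamma`, is sketched below; the helper name is hypothetical.

```python
import math


def log_gamma_stirling(x):
    """Stirling approximation of log Gamma(x) including the 1/(12x) correction."""
    return (x - 0.5) * math.log(x) - x + 0.5 * math.log(2.0 * math.pi) + 1.0 / (12.0 * x)


for x in (5.0, 50.0, 500.0):
    print(x, math.lgamma(x) - log_gamma_stirling(x))
```

The error behaves like −1/(360x³), so for the large arguments arising in the proof the approximation is very accurate.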
4.3 Proof of Theorem 2.2
Define and note that under the null hypothesis, the distribution of is given by a product of independent Beta distributions [see Anderson, (1984)], that is,
[TABLE]
where the random variables
[TABLE]
Now consider the transformation
[TABLE]
then the assertion follows from
[TABLE]
where and are defined in (2.10) and (2.11). In order to prove the asymptotic normality of , we show that the conditions (A.1)-(A.5) hold, beginning with a derivation of the variance.
[TABLE]
Regarding the error terms, note that the assumptions of Theorem 2.2 imply and . We continue by expanding the expected value
[TABLE]
Furthermore, we have
[TABLE]
which is condition (A.4). Obviously, (A.1) and (A.2) are also satisfied. The inequality for the moments in (A.3) follows from Lemma A.7 and Theorem A.8 in Dette and Tomecki, (2019). Therefore, all conditions (A.1)-(A.5) are satisfied and the assertion follows from Theorem A.1.
Acknowledgements. The authors would like to thank M. Stein, who typed this manuscript with considerable technical expertise. The work of H. Dette and N. Dörnemann was partially supported by the Deutsche Forschungsgemeinschaft (DFG Research Unit 1735, DE 502/26-2, RTG 2131).
- Abramowitz, M. and Stegun, I. A. (1964). Handbook of Mathematical Functions with Formulas, Graphs, and Mathematical Tables. Dover, New York, ninth Dover printing, tenth GPO printing edition.
- Anderson, T. (1984). An Introduction to Multivariate Statistical Analysis. Wiley Series in Probability and Statistics - Applied Probability and Statistics Section Series. Wiley.
- Bai, Z., Jiang, D., Yao, J.-F., and Zheng, S. (2009). Corrections to LRT on large-dimensional covariance matrix by RMT. Annals of Statistics, 37:3822–3840.
- Bai, Z., Jiang, D., Yao, J.-F., and Zheng, S. (2013). Testing linear hypotheses in high-dimensional regressions. A Journal of Theoretical and Applied Statistics, 47:1207–1223.
- Bao, Z., Hu, J., Pan, G., and Zhou, W. (2017). Test of independence for high-dimensional random vectors based on freeness in block correlation matrices. Electron. J. Statist., 11(1):1527–1548.
- Bartlett, M. S. (1937). Properties of sufficiency and statistical tests. Proc. R. Soc. London Ser. A, 160:268–282.
- Birke, M. and Dette, H. (2005). A note on testing the covariance matrix for large dimension. Statistics & Probability Letters, 74:281–289.
- Bodnar, T., Dette, H., and Parolya, N. (2019). Testing for independence of large dimensional vectors. To appear in: Annals of Statistics, arXiv:1708.03964v3.
