A More Powerful Two-Sample Test in High Dimensions using Random Projection
Miles E. Lopes, Laurent J. Jacob, Martin J. Wainwright

TL;DR
This paper introduces a high-dimensional two-sample test of means that combines a random projection with Hotelling's T^2 statistic. It reports higher power than competing tests on synthetic data and a favorable false positive rate on gene expression data.
Contribution
It proposes a novel test statistic that integrates random projection with Hotelling's T^2 for high-dimensional mean comparison, with theoretical power analysis and empirical validation.
Findings
In the parameter regimes identified by the theoretical analysis, the new test achieves greater power than competing tests on synthetic data.
The test maintains controlled false positive rates in gene expression data.
Empirical results confirm theoretical advantages in synthetic and real datasets.
Abstract
We consider the hypothesis testing problem of detecting a shift between the means of two multivariate normal distributions in the high-dimensional setting, allowing for the data dimension p to exceed the sample size n. Specifically, we propose a new test statistic for the two-sample test of means that integrates a random projection with the classical Hotelling T^2 statistic. Working under a high-dimensional framework with (p,n) tending to infinity, we first derive an asymptotic power function for our test, and then provide sufficient conditions for it to achieve greater power than other state-of-the-art tests. Using ROC curves generated from synthetic data, we demonstrate superior performance against competing tests in the parameter regimes anticipated by our theoretical results. Lastly, we illustrate an advantage of our procedure's false positive rate with comparisons on…
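The construction described in the abstract (project the data to a lower dimension, then apply Hotelling's T^2) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' reference implementation: the function name `rp_hotelling_test`, the Gaussian projection matrix, the default projected dimension `k = n // 2`, and the F-distribution calibration (the classical Hotelling result applied to the projected data, conditional on the projection) are all choices made here for the example.

```python
import numpy as np
from scipy import stats


def rp_hotelling_test(X, Y, k=None, seed=None):
    """Sketch: random projection followed by Hotelling's T^2 (hypothetical helper).

    X has shape (n1, p) and Y has shape (n2, p); rows are observations.
    Returns the T^2 statistic on the projected data and an F-based p-value.
    """
    rng = np.random.default_rng(seed)
    n1, p = X.shape
    n2 = Y.shape[0]
    n = n1 + n2 - 2  # degrees of freedom of the pooled covariance

    if k is None:
        # Assumed default for illustration; k must not exceed n for S to be invertible.
        k = max(1, min(p, n // 2))

    # Assumption: projection matrix with i.i.d. standard normal entries.
    P = rng.standard_normal((p, k))
    Xk, Yk = X @ P, Y @ P

    # Difference of sample means and pooled covariance in the projected space.
    mean_x, mean_y = Xk.mean(axis=0), Yk.mean(axis=0)
    diff = mean_x - mean_y
    Xc, Yc = Xk - mean_x, Yk - mean_y
    S = (Xc.T @ Xc + Yc.T @ Yc) / n

    # Hotelling's T^2 on the k-dimensional projected samples.
    t2 = (n1 * n2 / (n1 + n2)) * float(diff @ np.linalg.solve(S, diff))

    # Classical calibration: under equal means and Gaussian data with a common
    # covariance, the scaled statistic follows F(k, n - k + 1) given P.
    f_stat = t2 * (n - k + 1) / (k * n)
    p_value = float(stats.f.sf(f_stat, k, n - k + 1))
    return t2, p_value


# Usage on synthetic data with p > n (values chosen only for illustration).
rng = np.random.default_rng(0)
X = rng.standard_normal((20, 100))
Y = rng.standard_normal((20, 100)) + 3.0  # mean shift in every coordinate
t2, p_value = rp_hotelling_test(X, Y, seed=1)
print(t2, p_value)
```

Because the projected dimension k is below the pooled degrees of freedom, the projected covariance estimate is invertible even when p exceeds n, which is the situation where the unprojected Hotelling statistic is undefined.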
Taxonomy
Topics: Statistical Methods and Inference · Bayesian Methods and Mixture Models · Gene expression and cancer classification
