A high-dimensional two-sample test for the mean using random subspaces
Måns Thulin

TL;DR
This paper introduces a permutation-based high-dimensional two-sample mean test using random subspaces, improving power and invariance over existing methods, especially for dependent gene expression data.
Contribution
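The paper develops a new test that combines random subspaces with permutation p-values. It addresses two limitations of the Lopes et al. (2012) test in high-dimensional settings with dependent variables: the lack of invariance and asymptotic p-values that are too conservative.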
Findings
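In simulation studies, the new test has higher power than existing methods.
Unlike the Lopes et al. test, the random-subspace statistic is invariant under linear transformations of the marginal distributions (for example, rescaling individual variables).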
The R implementation enables efficient application to gene expression data.
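To illustrate the invariance claim, the sketch below rescales every variable by a different positive constant (one kind of linear transformation of the marginal distributions) and recomputes two statistics. The first averages Hotelling's T² over fixed random coordinate subsets. The second is a Hotelling T² computed after a fixed p × k Gaussian matrix, which is our assumption about the form of the Lopes et al. (2012) pseudo-projection. The averaging over subsets, the function names and the toy data are hypothetical illustrations, not the paper's code.

```python
import numpy as np


def pooled_cov(x, y):
    """Pooled sample covariance of two samples (rows are observations)."""
    n1, n2 = len(x), len(y)
    return ((n1 - 1) * np.cov(x, rowvar=False)
            + (n2 - 1) * np.cov(y, rowvar=False)) / (n1 + n2 - 2)


def quad_form(d, s, n1, n2):
    """Hotelling-type statistic n1*n2/(n1+n2) * d' s^{-1} d."""
    return n1 * n2 / (n1 + n2) * d @ np.linalg.solve(s, d)


def subspace_stat(x, y, subsets):
    """Average Hotelling T^2 over fixed coordinate subsets (assumed form)."""
    d, s = x.mean(axis=0) - y.mean(axis=0), pooled_cov(x, y)
    return np.mean([quad_form(d[i], s[np.ix_(i, i)], len(x), len(y)) for i in subsets])


def projection_stat(x, y, proj):
    """Hotelling T^2 after mapping the data through a p x k matrix proj (assumed form)."""
    d, s = x.mean(axis=0) - y.mean(axis=0), pooled_cov(x, y)
    return quad_form(proj.T @ d, proj.T @ s @ proj, len(x), len(y))


rng = np.random.default_rng(0)
n1, n2, p, k = 12, 12, 50, 4
x = rng.normal(size=(n1, p))
y = rng.normal(size=(n2, p)) + 0.5
scale = rng.uniform(0.1, 10.0, size=p)  # a different positive scale for every variable
subsets = [rng.choice(p, size=k, replace=False) for _ in range(30)]
proj = rng.normal(size=(p, k))  # hypothetical Gaussian pseudo-projection

sub_before = subspace_stat(x, y, subsets)
sub_after = subspace_stat(x * scale, y * scale, subsets)
proj_before = projection_stat(x, y, proj)
proj_after = projection_stat(x * scale, y * scale, proj)
print(sub_before, sub_after)    # equal up to rounding error
print(proj_before, proj_after)  # generally different
```

Hotelling's T² does not change when the selected coordinates are rescaled, so every term in the average is unchanged. A projection mixes coordinates, so rescaling alters P'd and P'SP in ways that do not cancel.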
Abstract
A common problem in genetics is that of testing whether a set of highly dependent gene expressions differ between two populations, typically in a high-dimensional setting where the data dimension is larger than the sample size. Most high-dimensional tests for the equality of two mean vectors rely on naive diagonal or trace estimators of the covariance matrix, ignoring dependencies between variables. A test recently proposed by Lopes et al. (2012) implicitly incorporates dependencies by using random pseudo-projections to a lower-dimensional space. Their test offers higher power when the variables are dependent, but lacks desirable invariance properties and relies on asymptotic p-values that are too conservative. We illustrate how a permutation approach can be used to obtain p-values for the Lopes et al. test and how modifying the test using random subspaces leads to a test statistic that…
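The procedure in the abstract can be sketched in Python under a few stated assumptions. The statistic averages Hotelling's T² over `n_subspaces` coordinate subsets of size `k`, drawn uniformly at random. The subsets are drawn once and held fixed across permutations, which is valid because they do not depend on the group labels. The p-value counts the observed labelling as one of the permutations. The function names, defaults and toy data are hypothetical; this is not the author's R implementation. Swapping in a projection-based statistic would give permutation p-values for the Lopes et al. test in the same way.

```python
import numpy as np


def hotelling_t2(x, y):
    """Two-sample Hotelling T^2 on the columns of x and y (rows are observations)."""
    n1, n2 = len(x), len(y)
    diff = x.mean(axis=0) - y.mean(axis=0)
    # Pooled covariance; invertible only when k < n1 + n2 - 1 columns are used
    s = ((n1 - 1) * np.atleast_2d(np.cov(x, rowvar=False))
         + (n2 - 1) * np.atleast_2d(np.cov(y, rowvar=False))) / (n1 + n2 - 2)
    return n1 * n2 / (n1 + n2) * diff @ np.linalg.solve(s, diff)


def subspace_stat(x, y, subsets):
    """Average Hotelling T^2 over randomly chosen coordinate subsets (assumed form)."""
    return np.mean([hotelling_t2(x[:, idx], y[:, idx]) for idx in subsets])


def random_subspace_test(x, y, k, n_subspaces=100, n_perm=999, seed=None):
    """Hypothetical sketch: permutation p-value for the averaged statistic."""
    rng = np.random.default_rng(seed)
    n1, p = x.shape
    # Draw the subspaces once; they are independent of the group labels
    subsets = [rng.choice(p, size=k, replace=False) for _ in range(n_subspaces)]
    observed = subspace_stat(x, y, subsets)
    pooled = np.vstack([x, y])
    exceed = 0
    for _ in range(n_perm):
        # Reassign group labels at random and recompute the statistic
        idx = rng.permutation(len(pooled))
        if subspace_stat(pooled[idx[:n1]], pooled[idx[n1:]], subsets) >= observed:
            exceed += 1
    # Count the observed labelling as one of the permutations
    return observed, (exceed + 1) / (n_perm + 1)


rng = np.random.default_rng(1)
x = rng.normal(size=(10, 200))        # group 1: n1 = 10, p = 200 > n
y = rng.normal(size=(10, 200)) + 1.0  # group 2: every mean shifted by 1
stat, pval = random_subspace_test(x, y, k=5, n_subspaces=20, n_perm=199, seed=2)
print(stat, pval)
```

Each Hotelling statistic needs an invertible k × k pooled covariance, so k must stay below n1 + n2 - 1. The cost grows with `n_subspaces` × `n_perm`, which is why an efficient implementation matters for gene expression data.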
