A high-dimensional two-sample test for the mean using random subspaces

Måns Thulin

arXiv:1304.4564 · stat.ME · March 11, 2015 · Comput. Stat. Data Anal.

Open Access

TL;DR

This paper introduces a permutation-based high-dimensional two-sample mean test using random subspaces, improving power and invariance over existing methods, especially for dependent gene expression data.

Contribution

The paper develops a new invariant test based on random subspaces and permutation p-values, addressing limitations of previous methods for high-dimensional, dependent data.

Findings

1. The new test outperforms existing methods in simulation studies.

2. It maintains invariance under linear transformations.

3. The R implementation enables efficient application to gene expression data.

Abstract

A common problem in genetics is that of testing whether a set of highly dependent gene expressions differ between two populations, typically in a high-dimensional setting where the data dimension is larger than the sample size. Most high-dimensional tests for the equality of two mean vectors rely on naive diagonal or trace estimators of the covariance matrix, ignoring dependencies between variables. A test recently proposed by Lopes et al. (2012) implicitly incorporates dependencies by using random pseudo-projections to a lower-dimensional space. Their test offers higher power when the variables are dependent, but lacks desirable invariance properties and relies on asymptotic p-values that are too conservative. We illustrate how a permutation approach can be used to obtain p-values for the Lopes et al. test and how modifying the test using random subspaces leads to a test statistic that…
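
The general idea of combining random subspaces with permutation p-values can be sketched in a few lines. The following is a minimal illustration, not the paper's statistic or its R implementation: it assumes the statistic is the average of ordinary two-sample Hotelling T² statistics computed on randomly chosen subsets of k variables, and that the same subsets are reused for every permutation. The function names (`hotelling_t2`, `subspace_statistic`, `random_subspace_test`) and the defaults for `k`, `n_subspaces`, and `n_perm` are hypothetical choices made for this example.

```python
import numpy as np


def hotelling_t2(x, y):
    """Two-sample Hotelling T^2 statistic using the pooled sample covariance."""
    n1, n2 = len(x), len(y)
    diff = x.mean(axis=0) - y.mean(axis=0)
    pooled = ((n1 - 1) * np.cov(x, rowvar=False)
              + (n2 - 1) * np.cov(y, rowvar=False)) / (n1 + n2 - 2)
    pooled = np.atleast_2d(pooled)  # np.cov returns a 0-d array when k == 1
    return (n1 * n2 / (n1 + n2)) * (diff @ np.linalg.solve(pooled, diff))


def subspace_statistic(x, y, subsets):
    """Average T^2 over variable subsets (an assumed form of the statistic)."""
    return float(np.mean([hotelling_t2(x[:, s], y[:, s]) for s in subsets]))


def random_subspace_test(x, y, k=3, n_subspaces=50, n_perm=199, seed=0):
    """Return (observed statistic, permutation p-value).

    Requires k <= n1 + n2 - 2 so the pooled covariance is invertible.
    """
    rng = np.random.default_rng(seed)
    p = x.shape[1]
    # Draw the random subspaces once and reuse them for every permutation.
    subsets = [rng.choice(p, size=k, replace=False) for _ in range(n_subspaces)]
    observed = subspace_statistic(x, y, subsets)

    combined = np.vstack([x, y])
    n1 = len(x)
    exceed = 0
    for _ in range(n_perm):
        # Shuffle group labels by permuting the rows of the combined sample.
        idx = rng.permutation(len(combined))
        xs, ys = combined[idx[:n1]], combined[idx[n1:]]
        if subspace_statistic(xs, ys, subsets) >= observed:
            exceed += 1
    # The +1 terms count the observed labelling itself, so the p-value is never 0.
    return observed, (exceed + 1) / (n_perm + 1)


if __name__ == "__main__":
    data_rng = np.random.default_rng(1)
    x = data_rng.normal(size=(15, 40)) + 0.5  # p = 40 variables, n1 = 15
    y = data_rng.normal(size=(15, 40))        # n2 = 15
    stat, p_value = random_subspace_test(x, y)
    print(f"statistic = {stat:.2f}, permutation p-value = {p_value:.3f}")
```

Because each subset has only k variables, the pooled covariance within a subset stays invertible even when the full dimension exceeds the sample size, and dependencies among the selected variables enter through that covariance. Calibrating the p-value by permutation avoids relying on an asymptotic null distribution.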

Taxonomy

Topics: Gene expression and cancer classification · Genetic Mapping and Diversity in Plants and Animals · Bayesian Methods and Mixture Models