A two-sample test for high-dimensional data with applications to gene-set testing
Song Xi Chen, Ying-Li Qin

TL;DR
This paper introduces a new two-sample test designed for high-dimensional data where traditional methods fail, providing a flexible approach for gene-set significance testing in genomics.
Contribution
It proposes a novel two-sample test that works without strict conditions on data dimension and sample size, suitable for high-dimensional gene-set analysis.
Findings
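Effective for gene-set testing when the data dimension far exceeds the sample size
Demonstrated in an empirical study on a leukemia gene expression data set
Does not require an explicit relationship between the data dimension p and the sample size n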
Abstract
We propose a two-sample test for the means of high-dimensional data when the data dimension is much larger than the sample size. Hotelling's classical test does not work for this "large p, small n" situation. The proposed test does not require explicit conditions in the relationship between the data dimension and sample size. This offers much flexibility in analyzing high-dimensional data. An application of the proposed test is in testing significance for sets of genes which we demonstrate in an empirical study on a leukemia data set.
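The abstract does not state the test statistic, so the following is only a minimal sketch under stated assumptions. It uses the form of the statistic commonly attributed to this paper: an unbiased estimate of the squared distance between the two mean vectors, built from cross-products of distinct observations only, which avoids inverting a sample covariance matrix that is singular when p exceeds n. The calibration is a swapped-in technique: the p-value here comes from a simple permutation scheme, not from the asymptotic normal approximation the paper develops. The names `cq_statistic` and `permutation_pvalue` are hypothetical.

```python
import random


def _dot(a, b):
    """Inner product of two equal-length sequences."""
    return sum(u * v for u, v in zip(a, b))


def cq_statistic(x, y):
    """Estimate ||mean(x) - mean(y)||^2 using only products of distinct rows.

    x and y are lists of observations, each observation a list of p floats.
    Assumed form (not given in the summary above):
        sum_{i != j} x_i'x_j / (n1 (n1 - 1))
      + sum_{i != j} y_i'y_j / (n2 (n2 - 1))
      - 2 sum_{i, j} x_i'y_j / (n1 n2)
    """
    n1, n2 = len(x), len(y)
    if n1 < 2 or n2 < 2:
        raise ValueError("each sample needs at least two observations")
    sum_x = [sum(col) for col in zip(*x)]  # column sums of sample 1
    sum_y = [sum(col) for col in zip(*y)]  # column sums of sample 2
    # sum_{i != j} x_i'x_j equals ||sum_i x_i||^2 minus sum_i ||x_i||^2
    off_x = _dot(sum_x, sum_x) - sum(_dot(r, r) for r in x)
    off_y = _dot(sum_y, sum_y) - sum(_dot(r, r) for r in y)
    cross = _dot(sum_x, sum_y)
    return (off_x / (n1 * (n1 - 1))
            + off_y / (n2 * (n2 - 1))
            - 2.0 * cross / (n1 * n2))


def permutation_pvalue(x, y, n_perm=200, seed=0):
    """One-sided permutation p-value (an assumption, not the paper's calibration)."""
    rng = random.Random(seed)
    observed = cq_statistic(x, y)
    pooled = list(x) + list(y)
    n1 = len(x)
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # random relabelling of the two groups
        if cq_statistic(pooled[:n1], pooled[n1:]) >= observed:
            exceed += 1
    # Add-one correction keeps the p-value strictly positive.
    return (exceed + 1) / (n_perm + 1)
```

For gene-set testing, `x` and `y` would hold the expression values of the genes in one set for the two groups of subjects, with one row per subject. Because the statistic depends on the data only through column sums and row norms, its cost grows linearly in the dimension, so it stays cheap even when p is far larger than either sample size.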
Taxonomy
Topics: Gene expression and cancer classification · Bioinformatics and Genomic Networks · Molecular Biology Techniques and Applications
