Hotelling's test for highly correlated data
Peter Bubeliny

TL;DR
This paper investigates the properties of Hotelling's test applied to highly correlated gene expression data, revealing unexpected behaviors and conditions under which the test maximizes power, especially in the context of differential gene expression analysis.
Contribution
It uncovers novel properties of Hotelling's test in highly correlated data, showing how correlation affects its power and optimal conditions for detecting differential expression.
Findings
Hotelling's test does not always have maximum power when all marginals differ.
Maximum power occurs when about half of the marginals are different in highly correlated data.
Increasing correlation enhances the test's power, especially when only one marginal is shifted.
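The findings above can be illustrated with a minimal sketch. It rests on assumptions that are not taken from the paper: an equicorrelated covariance matrix (unit variances, common correlation `rho`) and an equal shift applied to the first `k` of `p` marginals. For fixed sample sizes and dimension, the power of the two-sample Hotelling's test increases with the noncentrality term δᵀΣ⁻¹δ, so the sketch compares that quantity across `k`. The function name `noncentrality` and the values of `p` and `rho` are chosen only for illustration.

```python
import numpy as np

def noncentrality(p, k, rho, shift=1.0):
    """Return delta^T Sigma^{-1} delta for an ASSUMED equicorrelated model.

    Sigma has unit variances and common correlation rho (an illustrative
    assumption, not the paper's setup); delta shifts the first k of p
    marginals by `shift` and leaves the rest at zero.
    """
    sigma = (1.0 - rho) * np.eye(p) + rho * np.ones((p, p))
    delta = np.zeros(p)
    delta[:k] = shift
    # Solve Sigma x = delta instead of forming the explicit inverse.
    return float(delta @ np.linalg.solve(sigma, delta))

p = 20  # hypothetical number of genes in the set
for rho in (0.2, 0.6, 0.9):
    values = [noncentrality(p, k, rho) for k in range(1, p + 1)]
    best_k = int(np.argmax(values)) + 1
    print(f"rho={rho}: one shifted={values[0]:.3f}, "
          f"all shifted={values[-1]:.3f}, maximum at k={best_k}")
```

Under this assumed model the term has the closed form (k − ρk²/(1 + (p − 1)ρ))/(1 − ρ) for a unit shift, so shifting a single marginal yields a larger value than shifting all of them exactly when ρ > 0.5, and the maximizing k approaches p/2 as ρ approaches 1.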
Abstract
This paper is motivated by the analysis of gene expression sets, especially by finding differentially expressed gene sets between two phenotypes. Gene expression levels are highly correlated and, very likely, have an approximately normal distribution. Therefore, it seems reasonable to use the two-sample Hotelling's test for such data. We discover some unexpected properties of the test that make it different from the majority of tests previously used for such data. It appears that Hotelling's test does not always reach maximal power when all marginal distributions are differentially expressed. For highly correlated data, its maximal power is attained when about half of the marginal distributions are essentially different. When the correlation coefficient is greater than 0.5, this test is more powerful if only one marginal distribution is shifted, compared to the case when all…
Taxonomy
Topics: Gene expression and cancer classification · Gene Regulatory Network Analysis · Bioinformatics and Genomic Networks
