Distributions associated with simultaneous multiple hypothesis testing
Chang Yu, Daniel Zelterman

TL;DR
This paper derives the distribution of the number of hypotheses declared significant by the Benjamini-Hochberg FDR-controlling procedure, introduces a parametric model for the marginal distribution of p-values, and applies both to data from cancer studies, with dependence modeling to support power analysis.
Contribution
The paper provides a distributional framework for the number of discoveries in multiple testing that accommodates dependence among p-values and non-uniform p-value distributions under the alternative hypotheses.
Findings
The distribution of the number of significant hypotheses is approximated by a mixture of normal and Borel-Tanner distributions.
The proposed parametric distribution fits p-value data from cancer studies.
Dependence among p-values can be modeled with copulas and latent variables.
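As a rough illustration of the dependence finding, the sketch below draws correlated p-values under the null. It is an assumption-laden example, not the paper's model: it uses a one-factor Gaussian copula in which a single shared latent normal variable induces a common correlation `rho`, and the names `correlated_null_pvalues` and `rho` are hypothetical. The summary does not say which copula family the authors use.

```python
import math
import random
from statistics import NormalDist


def correlated_null_pvalues(m, rho, rng):
    """Draw m null p-values with exchangeable dependence.

    Assumed construction (not taken from the paper): each latent score is
    z_i = sqrt(rho) * w + sqrt(1 - rho) * e_i with w, e_i standard normal,
    so corr(z_i, z_j) = rho. Applying the normal CDF gives p-values that
    are marginally uniform on (0, 1) but dependent through w.
    """
    if not 0.0 <= rho <= 1.0:
        raise ValueError("rho must lie in [0, 1]")
    std_normal = NormalDist()
    shared = rng.gauss(0.0, 1.0)  # latent variable common to all tests
    pvals = []
    for _ in range(m):
        z = math.sqrt(rho) * shared + math.sqrt(1.0 - rho) * rng.gauss(0.0, 1.0)
        pvals.append(std_normal.cdf(z))
    return pvals


if __name__ == "__main__":
    rng = random.Random(2024)
    print(correlated_null_pvalues(5, 0.5, rng))
```

Setting `rho = 0` recovers independent uniform p-values, while `rho = 1` makes every p-value identical, which gives two easy sanity checks on the construction.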
Abstract
We develop the distribution of the number of hypotheses found to be statistically significant using the rule from Benjamini and Hochberg (1995) for controlling the false discovery rate (FDR). This distribution has both a small sample form and an asymptotic expression for testing many independent hypotheses simultaneously. We propose a parametric distribution to approximate the marginal distribution of p-values under a non-uniform alternative hypothesis. This distribution is useful when there are many different alternative hypotheses and these are not individually well understood. We fit this distribution to data from three cancer studies and use it to illustrate the distribution of the number of notable hypotheses observed in these examples. We model dependence of sampled p-values using a copula model and a latent variable approach. These methods can be combined to…
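The quantity studied in the abstract, the number of hypotheses rejected by the Benjamini-Hochberg rule, can be sketched as follows. This is a minimal illustration, not the authors' code: the function names, the default level `alpha = 0.05`, and the Monte Carlo settings are assumptions, and the simulation covers only the simplest case of independent uniform p-values with every null hypothesis true.

```python
import random


def bh_num_rejections(pvalues, alpha=0.05):
    """Count rejections under the Benjamini-Hochberg step-up rule.

    With sorted p-values p_(1) <= ... <= p_(m), the rule rejects the k
    smallest, where k is the largest i such that p_(i) <= alpha * i / m.
    """
    m = len(pvalues)
    k = 0
    for i, p in enumerate(sorted(pvalues), start=1):
        if p <= alpha * i / m:
            k = i  # keep the largest index that meets its threshold
    return k


def simulate_rejection_counts(m=50, alpha=0.05, reps=2000, seed=0):
    """Empirical distribution of the rejection count when all m nulls are
    true and the p-values are independent Uniform(0, 1)."""
    rng = random.Random(seed)
    counts = {}
    for _ in range(reps):
        pvals = [rng.random() for _ in range(m)]
        k = bh_num_rejections(pvals, alpha)
        counts[k] = counts.get(k, 0) + 1
    return counts


if __name__ == "__main__":
    counts = simulate_rejection_counts()
    total = sum(counts.values())
    for k in sorted(counts):
        print(k, counts[k] / total)
```

In this all-null, independent setting the probability of at least one rejection equals `alpha`, so roughly 95% of the simulated counts should be zero. That is a convenient check before replacing the uniform draws with p-values from an alternative or dependent model.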
Taxonomy
Topics: Statistical Methods in Clinical Trials · Statistical Methods and Bayesian Inference · Optimal Experimental Design Methods
