The more you test, the more you find: Smallest P-values become increasingly enriched with real findings as more tests are conducted
Olga A. Vsevolozhskaya, Chia-Ling Kuo, Gabriel Ruiz, Luda Diatchenko, Dmitri V. Zaykin

TL;DR
This paper demonstrates that as the number of statistical tests increases, the smallest P-values become more enriched with true associations, highlighting a statistical phenomenon relevant to large-scale data analysis.
Contribution
The study quantifies how the proportion of genuine signals among top P-values increases with the number of tests, providing insights into large-scale hypothesis testing.
Findings
The smallest P-values become more likely to reflect genuine signals as the number of tests increases.
This enrichment occurs when the rate of genuine signals remains relatively stable as more tests are added.
The results bear on how top hits are interpreted in large-scale genetic and other data analyses.
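The enrichment described in the findings can be illustrated with a small Monte Carlo sketch. It is not the authors' model: it assumes independent two-sided z-tests, a fixed prior probability that any test is a genuine signal (the hypothetical `prior`), and a single fixed standardized effect size for genuine signals (the hypothetical `effect`). Because the two-sided P-value decreases as |z| grows, the test with the largest |z| is the one with the smallest P-value.

```python
import random


def top_hit_true_rate(m, prior=0.05, effect=3.0, reps=500, seed=1):
    """Estimate how often the smallest of m P-values belongs to a genuine signal.

    Assumptions (illustrative only, not taken from the paper):
    independent tests, each genuine with probability `prior`;
    genuine z-scores ~ N(effect, 1), null z-scores ~ N(0, 1).
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        best_abs_z, best_is_true = -1.0, False
        for _ in range(m):
            is_true = rng.random() < prior
            z = rng.gauss(effect if is_true else 0.0, 1.0)
            # Largest |z| corresponds to the smallest two-sided P-value.
            if abs(z) > best_abs_z:
                best_abs_z, best_is_true = abs(z), is_true
        hits += best_is_true
    return hits / reps


# The prior stays fixed while the number of tests grows.
rates = {m: top_hit_true_rate(m) for m in (10, 100, 1000)}
for m, rate in rates.items():
    print(f"{m:>5} tests: top hit genuine in {rate:.2f} of replicates")
```

Under these assumptions the share of replicates in which the top hit is genuine rises with the number of tests, even though the prior rate of genuine signals stays the same. Other effect-size distributions or correlated tests would change the numbers.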
Abstract
Increasing accessibility of data to researchers makes it possible to conduct massive amounts of statistical testing. Rather than follow a carefully crafted set of scientific hypotheses with statistical analysis, researchers can now test many possible relations and let P-values or other statistical summaries generate hypotheses for them. The field of genetic epidemiology is an illustrative case of this paradigm shift. Driven by technological advances, testing a handful of genetic variants in relation to a health outcome has been abandoned in favor of agnostic screening of the entire genome, followed by selection of top hits, e.g., by selection of genetic variants with the smallest association P-values. At the same time, the nearly total lack of replication of claimed associations that had been shaming the field has turned into a flow of reports whose findings replicate robustly. Researchers…
Taxonomy
Topics: Genetic Associations and Epidemiology · Statistical Methods in Clinical Trials · Gene expression and cancer classification
