The Power of Tests for Detecting $p$-Hacking
Graham Elliott, Nikolay Kudrin, Kaspar Wüthrich

TL;DR
This paper analyzes the effectiveness of statistical tests in detecting p-hacking by examining how different hacking strategies influence p-value distributions and identifying the most powerful detection methods.
Contribution
It provides a theoretical assessment of the power of various tests for detecting p-hacking, highlighting the influence of hacking strategies and true effect distributions.
Findings
Power of detection tests varies with hacking strategies
Combined tests for bounds and monotonicity are most effective
Detection power depends on true effect distribution
Abstract
A flourishing empirical literature investigates the prevalence of $p$-hacking based on the distribution of $p$-values across studies. Interpreting results in this literature requires a careful understanding of the power of methods for detecting $p$-hacking. We theoretically study the implications of likely forms of $p$-hacking on the distribution of $p$-values to understand the power of tests for detecting it. Power can be low and depends crucially on the $p$-hacking strategy and the distribution of true effects. Combined tests for upper bounds and monotonicity and tests for continuity of the $p$-curve tend to have the highest power for detecting $p$-hacking.
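To make concrete how a hacking strategy reshapes the distribution of reported $p$-values, here is a minimal simulation sketch. It is not taken from the paper. It assumes one stylized strategy in which each researcher runs several independent specifications and reports the smallest one-sided $p$-value, and it assumes a true effect of zero. The function names and parameters (`simulate_p_values`, `k_specs`, `effect`) are hypothetical and chosen for illustration only.

```python
import math
import random


def one_sided_p(z):
    """One-sided p-value for a z-statistic, testing H0: effect <= 0."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))


def simulate_p_values(n_studies, k_specs, effect, rng):
    """Return the reported p-value of each simulated study.

    Assumption (illustrative only): every study draws k_specs independent
    z-statistics with mean `effect` and unit variance, then reports the
    smallest p-value. k_specs=1 corresponds to no p-hacking.
    """
    reported = []
    for _ in range(n_studies):
        p_vals = [one_sided_p(rng.gauss(effect, 1.0)) for _ in range(k_specs)]
        reported.append(min(p_vals))
    return reported


def share_below(p_values, threshold=0.05):
    """Fraction of reported p-values at or below the threshold."""
    return sum(p <= threshold for p in p_values) / len(p_values)


rng = random.Random(0)  # fixed seed so the sketch is reproducible
honest = simulate_p_values(20000, 1, 0.0, rng)
hacked = simulate_p_values(20000, 5, 0.0, rng)

print("share of p <= 0.05 without hacking:", share_below(honest))
print("share of p <= 0.05 with min-of-5 hacking:", share_below(hacked))
```

Under these assumptions the share of significant results rises from about 0.05 to about $1 - 0.95^5 \approx 0.23$. The minimum of $K$ independent uniform $p$-values has density $K(1-p)^{K-1}$, which is still non-increasing in $p$, so in this stylized setting the hacked $p$-curve does not violate monotonicity. This is one simple way to see why detection power can depend on the hacking strategy.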
Taxonomy
Topics: Information and Cyber Security
