Detecting p-hacking
Graham Elliott, Nikolay Kudrin, Kaspar Wuthrich

TL;DR
This paper develops statistical tests for p-hacking based on the distribution of p-values across studies. The tests exploit shape restrictions implied by the power functions of t-tests, become joint tests for p-hacking and publication bias when publication bias is present, and are applied to two prominent datasets.
Contribution
The paper derives novel testable restrictions on the distribution of p-values under the null of no p-hacking, especially for p-values based on t-tests, which yield more powerful tests.
Findings
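The new tests are based on shape restrictions on the p-value distribution: non-increasingness in general, plus complete monotonicity and bounds for p-values based on t-tests.
When publication bias is also present, the tests are joint tests for p-hacking and publication bias.
A reanalysis of two prominent datasets shows the usefulness of the new tests.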
Abstract
We theoretically analyze the problem of testing for p-hacking based on distributions of p-values across multiple studies. We provide general results for when such distributions have testable restrictions (are non-increasing) under the null of no p-hacking. We find novel additional testable restrictions for p-values based on t-tests. Specifically, the shape of the power functions results in both complete monotonicity as well as bounds on the distribution of p-values. These testable restrictions result in more powerful tests for the null hypothesis of no p-hacking. When there is also publication bias, our tests are joint tests for p-hacking and publication bias. A reanalysis of two prominent datasets shows the usefulness of our new tests.
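To make the non-increasing restriction concrete, the following is a minimal sketch and not the paper's new tests. It simulates p-values from one-sided z-tests without p-hacking, then applies a simple binomial check for bunching just below 0.05: if the p-value density is non-increasing, at most half of the p-values in [0.04, 0.05) should fall in the upper half [0.045, 0.05). The function names, the exponential effect-size distribution, and the interval endpoints are all illustrative assumptions.

```python
import math
import random
from statistics import NormalDist


def simulate_p_values(n, seed=0):
    """Simulate n one-sided z-test p-values with no p-hacking.

    Assumption for illustration: true effects h >= 0 are drawn from an
    exponential distribution, and each test statistic is N(h, 1).
    """
    rng = random.Random(seed)
    std_normal = NormalDist()
    p_values = []
    for _ in range(n):
        h = rng.expovariate(1.0)       # hypothetical effect-size distribution
        z = h + rng.gauss(0.0, 1.0)    # observed test statistic
        p_values.append(1.0 - std_normal.cdf(z))
    return p_values


def binomial_bunching_test(p_values, lo=0.04, mid=0.045, hi=0.05):
    """One-sided binomial check for excess mass in [mid, hi).

    Returns (n, k, tail), where n counts p-values in [lo, hi), k counts
    those in [mid, hi), and tail = P(X >= k) for X ~ Binomial(n, 0.5).
    A small tail is evidence against a non-increasing density on [lo, hi).
    """
    n = sum(lo <= p < hi for p in p_values)
    k = sum(mid <= p < hi for p in p_values)
    tail = sum(math.comb(n, j) for j in range(k, n + 1)) / 2 ** n
    return n, k, tail


if __name__ == "__main__":
    n, k, tail = binomial_bunching_test(simulate_p_values(20000))
    print(f"p-values in [0.04, 0.05): {n}, in upper half: {k}, tail prob: {tail:.3f}")
```

A check of this kind uses only the non-increasing restriction on a narrow window around the significance threshold. The additional restrictions described in the abstract (complete monotonicity and bounds for t-tests) use more of the distribution's shape, which is the stated source of the power gains.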
Taxonomy
Topics: Statistical Methods in Clinical Trials · Bayesian Modeling and Causal Inference · Explainable Artificial Intelligence (XAI)
