The Power of Tests for Detecting $p$-Hacking

Graham Elliott; Nikolay Kudrin; Kaspar Wüthrich

arXiv:2205.07950 · econ.EM · August 12, 2025 · 1 citation

TL;DR

This paper analyzes the effectiveness of statistical tests in detecting p-hacking by examining how different hacking strategies influence p-value distributions and identifying the most powerful detection methods.

Contribution

It provides a theoretical assessment of the power of various tests for detecting p-hacking, highlighting the influence of hacking strategies and true effect distributions.

Findings

01 The power of detection tests varies with the $p$-hacking strategy

02 Combined tests for upper bounds and monotonicity, and tests for continuity of the $p$-curve, tend to have the highest power

03 Detection power depends on the distribution of true effects

Abstract

A flourishing empirical literature investigates the prevalence of $p$-hacking based on the distribution of $p$-values across studies. Interpreting results in this literature requires a careful understanding of the power of methods for detecting $p$-hacking. We theoretically study the implications of likely forms of $p$-hacking on the distribution of $p$-values to understand the power of tests for detecting it. Power can be low and depends crucially on the $p$-hacking strategy and the distribution of true effects. Combined tests for upper bounds and monotonicity and tests for continuity of the $p$-curve tend to have the highest power for detecting $p$-hacking.
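
To make the dependence of power on the $p$-hacking strategy concrete, the following is a minimal sketch under stated assumptions, not the paper's own test statistics or code. It assumes a zero true effect and a researcher who runs `n_specs` independent tests and reports the smallest $p$-value, so the reported $p$-value has CDF $1-(1-p)^K$ and a decreasing density. It pairs this with a simple one-sided binomial check of monotonicity on two bins just below 0.05. All function names, bin edges, and the choice of strategy are hypothetical and chosen for illustration only.

```python
import math


def min_p_cdf(p, n_specs):
    """CDF of the reported p-value when n_specs independent tests are run
    under a zero true effect and the smallest p-value is reported
    (illustrative assumption, not taken from the paper)."""
    return 1.0 - (1.0 - p) ** n_specs


def upper_bin_share(n_specs, lo=0.04, mid=0.045, hi=0.05):
    """Among reported p-values in (lo, hi], the expected share in (mid, hi]."""
    lower = min_p_cdf(mid, n_specs) - min_p_cdf(lo, n_specs)
    upper = min_p_cdf(hi, n_specs) - min_p_cdf(mid, n_specs)
    return upper / (lower + upper)


def binomial_upper_tail(k, n, prob=0.5):
    """Exact P(X >= k) for X ~ Binomial(n, prob)."""
    return sum(
        math.comb(n, i) * prob**i * (1.0 - prob) ** (n - i)
        for i in range(k, n + 1)
    )


def binomial_test(p_values, lo=0.04, mid=0.045, hi=0.05):
    """One-sided binomial check of a non-increasing p-curve on (lo, hi].

    Under a non-increasing p-curve, at most half of the p-values in (lo, hi]
    should fall in the upper bin (mid, hi]. A small returned value suggests
    a violation of monotonicity. Returns 1.0 if both bins are empty.
    """
    lower = sum(1 for p in p_values if lo < p <= mid)
    upper = sum(1 for p in p_values if mid < p <= hi)
    n = lower + upper
    return binomial_upper_tail(upper, n) if n > 0 else 1.0
```

Under these assumptions, `upper_bin_share(1)` equals one half (no $p$-hacking, uniform $p$-values) and `upper_bin_share(5)` is slightly below one half. The $p$-curve therefore stays non-increasing under this particular strategy, and a monotonicity check of this kind has no power against it, even though every reported result is $p$-hacked.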

Code & Models

Repositories

nvkudrin/phackingpower (Official)

Taxonomy

Topics: Information and Cyber Security