A Short Note on P-Value Hacking

Nassim Nicholas Taleb

arXiv:1603.07532 · stat.AP · January 29, 2018 · 1 citation

TL;DR

This paper analyzes how p-value hacking affects the interpretation of statistical tests: the distribution of p-values across repetitions is highly skewed and volatile, so the minimum p-value among multiple tests can fall well below the "true" one, which undermines reproducibility and meta-analysis.

Contribution

It provides an exact distribution for p-values under multiple testing scenarios, highlighting the extreme skewness and volatility caused by p-hacking and small sample sizes.
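
As a reference point only, consider the simplest case. This is an assumption made here for illustration and is not the paper's meta-distribution: if the null hypothesis holds and the $m$ tests are independent, each p-value $p_i$ is uniform on $[0,1]$, and the minimum satisfies

$$
P\left(\min_{1 \le i \le m} p_i \le x\right) = 1 - (1 - x)^m, \qquad E\left[\min_{1 \le i \le m} p_i\right] = \frac{1}{m+1}.
$$

Under that assumption, the smallest of $m = 20$ p-values falls below $0.05$ with probability $1 - 0.95^{20} \approx 0.64$, even though no effect is present.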

Findings

01 P-values are highly skewed and volatile across repetitions.

02 Minimum p-values can significantly underestimate the true p-value.

03 Increasing sample size or lowering p-value threshold reduces volatility.

Abstract

We present the expected values from p-value hacking as a choice of the minimum p-value among $m$ independent tests, which can be considerably lower than the "true" p-value, even with a single trial, owing to the extreme skewness of the meta-distribution. We first present an exact probability distribution (meta-distribution) for p-values across ensembles of statistically identical phenomena. We derive the distribution for small samples $2 < n \leq n^{*} \approx 30$ as well as the limiting one as the sample size $n$ becomes large. We also look at the properties of the "power" of a test through the distribution of its inverse for a given p-value and parametrization. The formulas allow the investigation of the stability of the reproduction of results and "p-hacking" and other aspects of meta-analysis. P-values are shown to be extremely skewed and volatile, regardless of the sample size…
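
The effect described in the abstract can be illustrated with a small Monte Carlo sketch. It is not the paper's exact derivation. It assumes a one-sided z-test with known standard deviation, and it uses the p-value evaluated at the expected test statistic (which is the median p-value in this setup) as a stand-in for the "true" p-value. The function names and the parameters `true_mean`, `n`, `m`, `trials`, and `sigma` are hypothetical and chosen only for this illustration.

```python
import random
from statistics import NormalDist, median

STD_NORMAL = NormalDist()  # standard normal, used for the z-test


def one_sided_p_value(sample_mean, n, sigma=1.0):
    """P-value of a one-sided z-test of H0: mean <= 0 (sigma assumed known)."""
    z = sample_mean * n ** 0.5 / sigma
    return 1.0 - STD_NORMAL.cdf(z)


def reference_p_value(true_mean, n, sigma=1.0):
    """P-value at the expected test statistic (the median p-value here).

    Assumption of this sketch: this plays the role of the "true" p-value.
    """
    return one_sided_p_value(true_mean, n, sigma)


def simulate_min_p(true_mean, n, m, trials, sigma=1.0, seed=0):
    """For each trial, run m independent tests and keep the smallest p-value."""
    rng = random.Random(seed)
    std_error = sigma / n ** 0.5  # sampling std. dev. of the mean of n draws
    minima = []
    for _ in range(trials):
        p_values = [
            one_sided_p_value(rng.gauss(true_mean, std_error), n, sigma)
            for _ in range(m)
        ]
        minima.append(min(p_values))
    return minima


if __name__ == "__main__":
    true_mean, n = 0.2145, 30  # gives a reference p-value of roughly 0.12
    print("reference p-value:", round(reference_p_value(true_mean, n), 3))
    for m in (1, 5, 20):
        minima = simulate_min_p(true_mean, n, m, trials=20_000)
        below = sum(p < 0.05 for p in minima) / len(minima)
        print(f"m={m:2d}  median min-p={median(minima):.4f}  share < 0.05: {below:.2f}")
```

Running the script prints, for each `m`, the median of the minimum p-value and the share of trials in which it falls below 0.05, which can be compared against the reference value printed first.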

Taxonomy

Topics: Meta-analysis and systematic reviews · Statistical Methods in Clinical Trials · Data Analysis with R