The statistical significance filter leads to overconfident expectations of replicability
Shravan Vasishth, Andrew Gelman

TL;DR
This paper demonstrates that relying solely on the statistical significance filter (p<0.05) inflates expectations of replicability, leading to overconfidence and an illusion of robustness in scientific findings.
Contribution
It analytically and empirically shows how the significance filter causes overestimation of power and replicability, highlighting a bias in published research.
Findings
Significance filter inflates perceived replicability.
Low true power leads to overestimated power from significant results.
Case study confirms illusion of replicability from significance-based assessments.
Abstract
We show that publishing results using the statistical significance filter (publishing only when the p-value is less than 0.05) leads to a vicious cycle of overoptimistic expectation of the replicability of results. First, we show analytically that when true statistical power is relatively low, computing power based on statistically significant results will lead to overestimates of power. Then, we present a case study using 10 experimental comparisons drawn from a recently published meta-analysis in psycholinguistics (Jäger et al., 2017). We show that the statistically significant results yield an illusion of replicability. This illusion holds even if the researcher doesn't conduct any formal power analysis but just uses statistical significance to informally assess robustness (i.e., replicability) of results.
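The analytical claim in the abstract can be illustrated with a small simulation. The sketch below is not the authors' code: it assumes a simple two-sided z-test with a known standard error, and the effect size, standard error, number of studies, and function names (`power`, `simulate`) are all hypothetical choices made for illustration. It draws many estimates of one fixed true effect, keeps only those with p < 0.05, and then computes power as if each significant estimate were the true effect.

```python
import random
from statistics import NormalDist, mean

STD_NORMAL = NormalDist()
Z_CRIT = STD_NORMAL.inv_cdf(0.975)  # two-sided 5% critical value, about 1.96


def power(effect, se):
    """Two-sided power of a z-test if the true effect were `effect`."""
    shift = effect / se
    upper_tail = 1 - STD_NORMAL.cdf(Z_CRIT - shift)
    lower_tail = STD_NORMAL.cdf(-Z_CRIT - shift)
    return upper_tail + lower_tail


def simulate(true_effect, se, n_studies=20000, seed=1):
    """Simulate repeated studies and apply the significance filter."""
    rng = random.Random(seed)
    estimates = [rng.gauss(true_effect, se) for _ in range(n_studies)]
    # The significance filter: keep only estimates with |z| above the cutoff.
    significant = [e for e in estimates if abs(e) / se > Z_CRIT]
    return {
        "true_power": power(true_effect, se),
        "share_significant": len(significant) / n_studies,
        # Power computed by treating each significant estimate as the truth.
        "mean_power_from_significant": mean(power(e, se) for e in significant),
        "mean_abs_significant_estimate": mean(abs(e) for e in significant),
    }


# Hypothetical low-power setting: true effect equal to one standard error.
result = simulate(true_effect=10.0, se=10.0)
for name, value in result.items():
    print(f"{name}: {value:.3f}")
```

Under these assumptions the true power is below 0.2, yet every estimate that passes the filter implies a power of at least 0.5, because an estimate must exceed roughly 1.96 standard errors to be significant. The filtered estimates are also larger in magnitude than the true effect, which is the source of the overestimated power.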
Taxonomy
Topics: Natural Language Processing Techniques · Opinion Dynamics and Social Influence · Computational and Text Analysis Methods
