Statistical Significance Revisited

Reason Machete

arXiv:2605.06568·stat.OT·May 8, 2026

TL;DR

This paper reviews the historical development, current debates, and potential reforms of statistical significance testing, highlighting its evolution, controversies, and alternative approaches.

Contribution

It critically examines recent calls for reform in significance testing, analyzing their strengths and limitations based on historical and contemporary perspectives.

Findings

01

Significance testing has evolved from Fisher's original formulation to the Neyman-Pearson framework, which added the alternative hypothesis and the error of the second kind.

02

Recent debates focus on thresholds, null hypothesis dichotomy, and alternative methods.

03

Reforms propose abandoning thresholds and adopting Bayesian or confidence interval approaches.

Abstract

Since its introduction by Fisher, the method of hypothesis testing that relies on computing error probabilities has undergone several developments. Perhaps the most significant was the seminal contribution of Neyman and Pearson, who introduced the concept of the alternative hypothesis with its corresponding error of the second kind. Significance tests have played a major role in various scientific and technological developments, but not without controversy. Although significance tests were originally cast as frequentist approaches, Bayesian ideas have since been incorporated into them, widening access to them. The quantities central to computing error probabilities are the sampling distributions, which can be computed even without thresholds or alternative hypotheses. Even though Fisher used the significance threshold of 0.05 in his calculations, he cautioned against prescribing any…
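
The abstract's remark that sampling distributions can be computed without thresholds or alternative hypotheses can be illustrated with a small sketch. The example below is not taken from the paper: the coin-toss setup (10 tosses, 8 heads, null success probability 0.5) and the function names are assumptions chosen for illustration. It builds the exact binomial sampling distribution under the null and reports a tail probability as a continuous quantity, with no cutoff and no alternative hypothesis specified.

```python
from math import comb


def binomial_sampling_distribution(n, p0):
    """Return [P(X = k) for k = 0..n] under the null success probability p0."""
    return [comb(n, k) * p0**k * (1 - p0) ** (n - k) for k in range(n + 1)]


def upper_tail_p_value(n, k_observed, p0):
    """Probability, under the null, of observing k_observed or more successes."""
    dist = binomial_sampling_distribution(n, p0)
    return sum(dist[k_observed:])


# Hypothetical data: 8 heads in 10 tosses, null hypothesis of a fair coin.
dist = binomial_sampling_distribution(10, 0.5)
p_value = upper_tail_p_value(10, 8, 0.5)
print(p_value)  # 0.0546875, i.e. 56/1024
```

The result is reported as a number rather than compared against 0.05, and only the null distribution is needed to compute it.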
