
TL;DR
This paper reviews the historical development of statistical significance testing, the current debates around it, and proposed reforms, including alternative approaches.
Contribution
It critically examines recent calls for reform in significance testing, analyzing their strengths and limitations based on historical and contemporary perspectives.
Findings
Significance testing evolved from Fisher's original method to the Neyman-Pearson framework, which added the alternative hypothesis and the error of the second kind.
Recent debates focus on thresholds, null hypothesis dichotomy, and alternative methods.
Reforms propose abandoning thresholds and adopting Bayesian or confidence interval approaches.
Abstract
Since its introduction by Fisher, the method of hypothesis testing that relies on computing error probabilities has undergone several developments. Perhaps the most significant was the seminal contribution of Neyman and Pearson, who introduced the concept of the alternative hypothesis with its corresponding error of the second kind. Significance tests have played a major role in various scientific and technological developments, but not without controversy. Although significance tests were originally cast as frequentist approaches, Bayesian ideas have since been incorporated into them, widening access to them. The quantities central to computations of error probabilities are the sampling distributions, which can be computed even without thresholds or alternative hypotheses. Even though Fisher used the significance threshold of 0.05 in his calculations, he cautioned against prescribing any…
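The abstract's point that sampling distributions can be computed without thresholds or alternative hypotheses can be illustrated with a minimal sketch. It is not taken from the paper: the binomial model, the sample size n = 20, the observed count of 15, and the alternative value 0.7 are all assumptions chosen for illustration. The first part computes a Fisher-style p-value from the null sampling distribution alone. The second part adds the two Neyman-Pearson ingredients, a threshold and an alternative hypothesis, to obtain the errors of the first and second kind.

```python
from math import comb

def binom_pmf(k, n, p):
    """Probability of exactly k successes in n Bernoulli(p) trials."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

def upper_tail(k_min, n, p):
    """P(X >= k_min) under the Binomial(n, p) sampling distribution."""
    return sum(binom_pmf(k, n, p) for k in range(k_min, n + 1))

# Illustrative numbers only (assumptions, not values from the paper).
n = 20          # number of trials
p_null = 0.5    # null hypothesis: success probability is 0.5
k_obs = 15      # observed number of successes

# Fisher-style p-value: needs only the null sampling distribution,
# with no threshold and no alternative hypothesis.
p_value = upper_tail(k_obs, n, p_null)

# Neyman-Pearson additions: a threshold (alpha) and an alternative.
alpha = 0.05    # conventional threshold, assumed here
p_alt = 0.7     # hypothetical alternative success probability

# Smallest critical count whose null tail probability is at most alpha.
k_crit = min(k for k in range(n + 2) if upper_tail(k, n, p_null) <= alpha)

# Error of the first kind: rejecting a true null.
type_1 = upper_tail(k_crit, n, p_null)
# Error of the second kind: failing to reject when the alternative holds.
type_2 = 1 - upper_tail(k_crit, n, p_alt)

print(f"p-value = {p_value:.4f}")
print(f"critical count = {k_crit}, type I = {type_1:.4f}, type II = {type_2:.4f}")
```

The p-value is a property of the data and the null model alone; the error of the second kind exists only once a specific alternative and a rejection rule have been fixed, which is the distinction between the two frameworks that the abstract draws.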
