Caveats for using statistical significance tests in research assessments
Jesper W. Schneider

TL;DR
This paper critically examines the use of statistical significance tests in research assessments, highlighting their limitations, misconceptions, and potential harm, and advocates for reform in their application within scientometrics.
Contribution
The paper provides a detailed critique of statistical significance tests in research assessments, argues against their mechanical use, and calls for methodological reform.
Findings
Statistical significance tests are often misused and misunderstood.
Their application can be detrimental to critical thinking and decision making.
The paper advocates for reform and cautious use of significance tests in research evaluations.
Abstract
This paper raises concerns about the advantages of using statistical significance tests in research assessments, as has recently been suggested in the debate about proper normalization procedures for citation indicators. Statistical significance tests are highly controversial, and numerous criticisms have been leveled against their use. Based on examples from articles by proponents of the use of statistical significance tests in research assessments, we address some of the numerous problems with such tests. The issues specifically discussed are the ritual practice of such tests, their dichotomous application in decision making, the difference between statistical and substantive significance, the implausibility of most null hypotheses, the crucial assumption of randomness, as well as the utility of standard errors and confidence intervals for inferential purposes. We argue that applying…
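The distinction between statistical and substantive significance mentioned in the abstract can be illustrated with a small sketch. All numbers here are hypothetical and are not taken from the paper: two units with mean normalized citation scores of 1.02 and 1.00, a common standard deviation of 1.0, and equal group sizes. The helper `two_sample_z` is an illustrative name, and the known-variance z-test is a simplifying assumption. The sketch also presumes random sampling, which is one of the assumptions the paper questions.

```python
import math

def two_sample_z(mean_a, mean_b, sd, n):
    """Two-sided z-test for a difference in means.

    Assumes (for illustration only) a known common standard
    deviation `sd` and `n` randomly sampled items per group.
    Returns (difference, standard error, z, p-value, 95% CI).
    """
    se = sd * math.sqrt(2.0 / n)            # standard error of the difference
    diff = mean_a - mean_b
    z = diff / se
    p = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal p-value
    ci = (diff - 1.96 * se, diff + 1.96 * se)
    return diff, se, z, p, ci

# Hypothetical inputs: the same tiny difference (0.02), two sample sizes.
small = two_sample_z(1.02, 1.00, sd=1.0, n=100)
large = two_sample_z(1.02, 1.00, sd=1.0, n=1_000_000)

for label, (diff, se, z, p, ci) in (("n=100", small), ("n=1,000,000", large)):
    print(f"{label}: diff={diff:.3f}, se={se:.4f}, z={z:.2f}, "
          f"p={p:.3g}, 95% CI=({ci[0]:.4f}, {ci[1]:.4f})")
```

The difference between the groups is identical in both runs. Only the sample size changes, yet the dichotomous verdict flips from "not significant" to "significant". Whether a difference of 0.02 matters for an assessment is a substantive judgment that the p-value does not answer; the confidence interval at least keeps the size of the difference in view.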
