Replicability: Terminology, Measuring Success, and Strategy
Werner A. Stahel (ETH Zurich, Switzerland)

TL;DR
This paper discusses the importance of clear terminology, relevance thresholds, and strategies for successful scientific replication, emphasizing effect size estimation and heterogeneity in empirical research.
Contribution
It clarifies key issues in replication, proposing a focus on effect size relevance and heterogeneity, and offers a strategy to improve replicability in empirical science.
Findings
Effect size estimation with precision is crucial for scientific claims.
Testing for effect relevance is more meaningful than zero-effect tests.
Multiple replications are needed to assess heterogeneity accurately.
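The contrast between a zero-effect test and a relevance test can be illustrated with a minimal sketch. It is not the paper's procedure: the function names, the normal-approximation interval (z = 1.96), the assumption that a positive effect is of interest, and the four verdict labels are all hypothetical choices made for illustration.

```python
import math
from statistics import mean, variance


def effect_interval(treated, control, z=1.96):
    """Estimate the difference in means with an approximate 95% interval.

    Assumption: a normal approximation is adequate; small samples would
    call for a t quantile instead of z = 1.96.
    """
    diff = mean(treated) - mean(control)
    se = math.sqrt(variance(treated) / len(treated)
                   + variance(control) / len(control))
    return diff, diff - z * se, diff + z * se


def classify(low, high, threshold):
    """Compare an interval with zero and with a relevance threshold.

    Assumption: threshold > 0 and only positive effects are of interest.
    The labels are illustrative, not terminology from the paper.
    """
    if low > threshold:
        return "relevant"                   # whole interval above the threshold
    if high < threshold and low > 0:
        return "non-zero but not relevant"  # a zero-effect test would reject
    if high < threshold:
        return "not relevant"               # at most a negligible effect
    return "ambiguous"                      # interval straddles the threshold


if __name__ == "__main__":
    treated = [5.1, 5.3, 5.2, 5.4, 5.0]
    control = [5.0, 5.1, 4.9, 5.2, 5.0]
    diff, low, high = effect_interval(treated, control)
    print(f"estimate {diff:.2f}, interval ({low:.2f}, {high:.2f})")
    print(classify(low, high, threshold=0.5))
```

The second verdict is the case the findings warn about: an interval that excludes zero can still lie entirely below the relevance threshold, so a "significant" result need not be a relevant one.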
Abstract
Empirical science needs to be based on facts and claims that can be reproduced. This calls for replicating the studies that make the claims, but practice in most fields still fails to implement this idea. When such replication studies emerged in the past decade, the results were generally disappointing. An overwhelming number of papers have addressed the "reproducibility crisis" in the last 20 years. Nevertheless, terminology is not yet settled, and there is no consensus about when a replication should be called successful. This paper intends to clarify such issues. A fundamental problem in empirical science is that usual claims only state that effects are non-zero, and such statements are scientifically void. An effect must have a relevant size to become a reasonable item of knowledge. Therefore, estimation of an effect, with an indication of precision, forms a substantial…
