
TL;DR
This paper argues that many replication studies in software engineering are uninformative due to wide prediction intervals, and advocates for meta-analysis instead of replication to better estimate effects.
Contribution
It demonstrates through simulation that replication often confirms results without adding meaningful knowledge, and promotes meta-analysis as a more effective approach.
Findings
Most replications are confirmatory due to wide prediction intervals.
Replication efforts with under-powered studies are often scientifically wasteful.
Meta-analysis provides a better estimate of the true effect size.
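As a rough illustration of why pooling can estimate the true effect more precisely than any single study, the sketch below combines several effect estimates by inverse-variance weighting. It assumes a fixed-effect model, which may not be the model the paper uses, and the three (effect, standard error) pairs are hypothetical values chosen for illustration, not results from the paper.

```python
import math

def pool_fixed_effect(studies):
    """Inverse-variance weighted pooling of (effect, standard_error) pairs.

    Assumes a fixed-effect model: every study estimates the same true effect.
    """
    weights = [1.0 / (se ** 2) for _, se in studies]
    total_weight = sum(weights)
    pooled = sum(w * d for w, (d, _) in zip(weights, studies)) / total_weight
    pooled_se = math.sqrt(1.0 / total_weight)
    return pooled, pooled_se

# Hypothetical small studies: each has the same, fairly large, standard error.
studies = [(0.2, 0.32), (0.6, 0.32), (0.4, 0.32)]
pooled, pooled_se = pool_fixed_effect(studies)

# 95% confidence interval for the pooled effect (normal approximation).
ci_low = pooled - 1.96 * pooled_se
ci_high = pooled + 1.96 * pooled_se
print(f"pooled effect = {pooled:.3f}, SE = {pooled_se:.3f}, "
      f"95% CI = [{ci_low:.3f}, {ci_high:.3f}]")
```

With equal standard errors, the pooled standard error shrinks by a factor of the square root of the number of studies, so each added study narrows the interval around the combined estimate.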
Abstract
CONTEXT: There is growing interest in establishing software engineering as an evidence-based discipline. To that end, replication is often used to gain confidence in empirical findings, as opposed to reproduction, where the goal is showing the correctness or validity of the published results. OBJECTIVE: To consider what is required for a replication study to confirm the original experiment and apply this understanding in software engineering. METHOD: Simulation is used to demonstrate why the prediction interval for confirmation can be surprisingly wide. This analysis is applied to three recent replications. RESULTS: It is shown that because the prediction intervals are wide, almost all replications are confirmatory, so in that sense there is no 'replication crisis'; however, the contributions to knowledge are negligible. CONCLUSIONS: Replicating empirical software engineering…
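The kind of simulation the METHOD describes can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's actual procedure: it assumes a two-group design with Cohen's d as the effect size, the usual large-sample approximation for the standard error of d, and a 95% prediction interval of the form d_orig ± 1.96·sqrt(se_orig² + se_rep²). The true effect (0.5), group size (20), and number of trials are hypothetical parameters.

```python
import math
import random
import statistics

def cohens_d(treatment, control):
    """Standardized mean difference using the pooled standard deviation."""
    n1, n2 = len(treatment), len(control)
    pooled_var = ((n1 - 1) * statistics.variance(treatment)
                  + (n2 - 1) * statistics.variance(control)) / (n1 + n2 - 2)
    return (statistics.mean(treatment) - statistics.mean(control)) / math.sqrt(pooled_var)

def se_of_d(d, n1, n2):
    """Large-sample approximation to the standard error of Cohen's d."""
    return math.sqrt((n1 + n2) / (n1 * n2) + d * d / (2 * (n1 + n2)))

def run_study(rng, true_effect, n):
    """Simulate one two-group experiment with unit-variance normal outcomes."""
    treatment = [rng.gauss(true_effect, 1.0) for _ in range(n)]
    control = [rng.gauss(0.0, 1.0) for _ in range(n)]
    return cohens_d(treatment, control)

def simulate(true_effect=0.5, n=20, trials=2000, seed=1):
    """Return (share of replications inside the interval, mean interval width)."""
    rng = random.Random(seed)
    inside = 0
    widths = []
    for _ in range(trials):
        d_orig = run_study(rng, true_effect, n)
        d_rep = run_study(rng, true_effect, n)
        # The interval is built before the replication is seen, so both
        # standard errors are evaluated at the original estimate.
        se_orig = se_of_d(d_orig, n, n)
        se_rep = se_of_d(d_orig, n, n)
        half_width = 1.96 * math.sqrt(se_orig ** 2 + se_rep ** 2)
        widths.append(2 * half_width)
        if d_orig - half_width <= d_rep <= d_orig + half_width:
            inside += 1
    return inside / trials, statistics.mean(widths)

coverage, mean_width = simulate()
print(f"replications inside the prediction interval: {coverage:.1%}")
print(f"mean prediction interval width (in units of d): {mean_width:.2f}")
```

With small groups the interval spans well over one standard deviation of effect size, so nearly any replication result counts as confirmatory; rerunning `simulate` with a larger `n` shows how the interval narrows as sample size grows.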
