Diagnosing the role of observable distribution shift in scientific replications
Ying Jin, Kevin Guo, Dominik Rothenhäusler

TL;DR
This paper investigates how much of the effect size discrepancy between original studies and their replications is explained by observable distribution shift. It finds that such shifts often contribute little, with sampling variability and residual factors such as unobserved moderators also playing a role.
Contribution
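It introduces a method that decomposes the effect size discrepancy between an original study and its replication into components attributable to sampling variability, observable distribution shift, and residual factors, highlighting the limited role of observable distribution shift in non-replicability.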
Findings
Observable distribution shifts often do not explain replication failures.
Statistical noise can obscure the impact of distribution shifts.
Controlling for additional moderators may improve replication reliability.
Abstract
Many researchers have identified distribution shift as a likely contributor to the reproducibility crisis in behavioral and biomedical sciences. The idea is that if treatment effects vary across individual characteristics and experimental contexts, then studies conducted in different populations will estimate different average effects. This paper uses "generalizability" methods to quantify how much of the effect size discrepancy between an original study and its replication can be explained by distribution shift on observed unit-level characteristics. More specifically, we decompose this discrepancy into "components" attributable to sampling variability (including publication bias), observable distribution shifts, and residual factors. We compute this decomposition for several directly-replicated behavioral science experiments and find little evidence that observable distribution…
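To make the idea of the decomposition concrete, here is a minimal sketch, not the authors' estimator. It assumes a single discrete covariate, balanced treatment assignment within each stratum, and that every stratum in the original study also appears in the replication. It reweights the replication's stratum-level effects to the original study's covariate distribution (simple post-stratification) and splits the discrepancy into a part explained by the observed shift and a remainder. It does not separate sampling variability or publication bias, which would need standard errors and further modeling. All function names and the toy data are hypothetical, and the data are constructed so that the covariate shift explains the whole gap, which is not what the paper reports for real replications.

```python
from collections import defaultdict
from statistics import mean


def diff_in_means(rows):
    """Difference in mean outcome, treated minus control. Rows are (x, treated, y)."""
    treated = [y for _, t, y in rows if t == 1]
    control = [y for _, t, y in rows if t == 0]
    return mean(treated) - mean(control)


def stratum_effects(rows):
    """Effect estimate within each level of the covariate x."""
    by_x = defaultdict(list)
    for row in rows:
        by_x[row[0]].append(row)
    return {x: diff_in_means(group) for x, group in by_x.items()}


def covariate_shares(rows):
    """Share of units at each level of the covariate x."""
    counts = defaultdict(int)
    for x, _, _ in rows:
        counts[x] += 1
    return {x: c / len(rows) for x, c in counts.items()}


def decompose(original, replication):
    """Split theta_orig - theta_rep into an observed-shift part and a remainder."""
    theta_orig = diff_in_means(original)
    theta_rep = diff_in_means(replication)
    effects_rep = stratum_effects(replication)
    shares_orig = covariate_shares(original)
    # Replication effects reweighted to the original covariate distribution.
    # Assumes every original stratum is present in the replication (overlap).
    theta_rep_reweighted = sum(shares_orig[x] * effects_rep[x] for x in shares_orig)
    return {
        "discrepancy": theta_orig - theta_rep,
        "explained_by_observed_shift": theta_rep_reweighted - theta_rep,
        # Remainder: sampling noise plus residual factors, not separated here.
        "remainder": theta_orig - theta_rep_reweighted,
    }


# Hypothetical toy data: the effect is 2 for "young" units and 0 for "old" units.
original = (
    [("young", 1, 3)] * 3 + [("young", 0, 1)] * 3
    + [("old", 1, 1)] + [("old", 0, 1)]
)
replication = (
    [("young", 1, 3)] + [("young", 0, 1)]
    + [("old", 1, 1)] * 3 + [("old", 0, 1)] * 3
)

result = decompose(original, replication)
print(result)
```

In this toy case the original sample is 75% "young" and the replication is 25% "young", so reweighting the replication to the original's covariate mix closes the gap entirely. In real data the remainder would generally be nonzero, and judging whether it exceeds sampling noise requires uncertainty estimates that this sketch omits.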
Taxonomy
Topics: Mental Health Research Topics
