PaperRepro: Automated Computational Reproducibility Assessment for Social Science Papers
Linhao Zhang, Tong Xia, Jinghua Piao, Lizhen Cui, Yong Li

TL;DR
PaperRepro is a two-stage, multi-agent system that automates the assessment of computational reproducibility in social science papers, with a reported 21.9% improvement in score-agreement accuracy on REPRO-Bench.
Contribution
The paper presents a novel multi-agent approach that separates execution from evaluation and supports reproducibility assessment with task-specific tools and expert prompts.
Findings
Achieves a 21.9% improvement in score-agreement accuracy on REPRO-Bench
Introduces REPRO-Bench-S for more diagnostic evaluation
Uses the coding capabilities of language models to edit reproduction code so that reproduced results are captured as explicit artifacts
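The findings above refer to score-agreement accuracy without defining it. As a rough illustration only, the sketch below assumes the metric is the fraction of papers for which the system's reproducibility score exactly matches a reference score; that definition, the function name score_agreement_accuracy, and the toy scores are all assumptions made for illustration, not details taken from the paper or from REPRO-Bench.

```python
from typing import Sequence


def score_agreement_accuracy(predicted: Sequence[int], reference: Sequence[int]) -> float:
    """Fraction of papers whose predicted score equals the reference score.

    Assumed definition for illustration; the paper may define the metric differently.
    """
    if len(predicted) != len(reference):
        raise ValueError("predicted and reference must have the same length")
    if not predicted:
        raise ValueError("at least one paper is required")
    matches = sum(p == r for p, r in zip(predicted, reference))
    return matches / len(predicted)


# Toy, made-up scores for five hypothetical papers (not benchmark data).
toy_predicted = [4, 2, 3, 1, 4]
toy_reference = [4, 2, 1, 1, 3]
toy_accuracy = score_agreement_accuracy(toy_predicted, toy_reference)
print(f"{toy_accuracy:.1%}")
```

Under this reading, a stated improvement could be either an absolute gain in percentage points or a relative gain over a baseline; the summary does not say which.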
Abstract
Computational reproducibility is essential for the credibility of scientific findings, particularly in the social sciences, where findings often inform real-world decisions. Manual reproducibility assessment is costly and time-consuming, as it is nontrivial to reproduce the reported findings using the authors' released code and data. Recent advances in large language models (LMs) have inspired agent-based approaches for automated reproducibility assessment. However, existing approaches often struggle due to limited context capacity, inadequate task-specific tooling, and insufficient result capture. To address these limitations, we propose PaperRepro, a novel two-stage, multi-agent approach that separates execution from evaluation. In the execution stage, agents execute the reproduction package and edit the code to capture reproduced results as explicit artifacts. In the evaluation stage, agents evaluate…
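The separation of execution from evaluation described in the abstract can be pictured with a small sketch. This is not PaperRepro's implementation: plain functions stand in for the agents, and the names execution_stage, evaluation_stage, TOY_SCRIPT, the JSON artifact format, and the tolerance are all hypothetical choices made for illustration. The point it shows is that the evaluation step reads only the captured artifact, never the console output of the run.

```python
import json
import subprocess
import sys
import tempfile
from pathlib import Path


def execution_stage(script: Path, artifact: Path) -> None:
    """Run the reproduction script, which is expected to write results to `artifact`."""
    subprocess.run([sys.executable, str(script), str(artifact)], check=True)


def evaluation_stage(artifact: Path, reported: dict[str, float], tol: float = 1e-3) -> dict[str, bool]:
    """Compare each reported value against the captured artifact only."""
    reproduced = json.loads(artifact.read_text())
    return {
        name: name in reproduced and abs(reproduced[name] - value) <= tol
        for name, value in reported.items()
    }


# Hypothetical stand-in for an author's script after it has been edited to
# write its result to a JSON file instead of only printing it.
TOY_SCRIPT = """
import json, sys
mean_effect = sum([0.2, 0.4, 0.6]) / 3
with open(sys.argv[1], "w") as fh:
    json.dump({"mean_effect": mean_effect}, fh)
"""

with tempfile.TemporaryDirectory() as tmp:
    script_path = Path(tmp) / "analysis.py"
    artifact_path = Path(tmp) / "results.json"
    script_path.write_text(TOY_SCRIPT)
    execution_stage(script_path, artifact_path)
    # Made-up "reported" values: one the toy script produces, one it does not.
    verdict = evaluation_stage(artifact_path, {"mean_effect": 0.4, "missing_stat": 1.0})

print(verdict)
```

Writing results to a file gives the evaluation step a stable, inspectable input, which is one plausible reading of "explicit artifacts" in the abstract; a reported value with no matching entry in the artifact is simply marked as not reproduced.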
Taxonomy
Topics: Scientific Computing and Data Management · Biomedical Text Mining and Ontologies · Data Visualization and Analytics
