What Papers Don't Tell You: Recovering Tacit Knowledge for Automated Paper Reproduction
Lehui Li, Ruining Wang, Haochen Song, Yaoxin Mao, Tong Zhang, Yuyao Wang, Jiayi Fan, Yitong Zhang, Jieping Ye, Chengqi Zhang, Yongshun Gong

TL;DR
This paper introduces \method, a graph-based agent framework that recovers the tacit knowledge academic papers leave implicit, in order to improve automated generation of executable code from papers. It reports gains over the strongest baseline on an extended ReproduceBench.
Contribution
It formalizes the challenge of recovering tacit knowledge in paper reproduction and proposes a novel graph-based agent framework with specialized mechanisms for relational, somatic, and collective knowledge recovery.
Findings
\method reduces the performance gap by 10.04.
It outperforms the strongest baseline by 24.68.
It achieves consistent improvements across multiple domains and tasks.
Abstract
Automated paper reproduction -- generating executable code from academic papers -- is bottlenecked not by information retrieval but by the tacit knowledge that papers inevitably leave implicit. We formalize this challenge as the progressive recovery of three types of tacit knowledge -- relational, somatic, and collective -- and propose \method, a graph-based agent framework with a dedicated mechanism for each: node-level relation-aware aggregation recovers relational knowledge by analyzing implementation-unit-level reuse and adaptation relationships between the target paper and its citation neighbors; execution-feedback refinement recovers somatic knowledge through iterative debugging driven by runtime signals; and graph-level knowledge induction distills collective knowledge from clusters of papers sharing similar implementations. On an extended ReproduceBench spanning 3 domains, 10…
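To make the execution-feedback idea in the abstract concrete, here is a minimal Python sketch of an iterative debugging loop driven by runtime signals. It is not the authors' implementation: the names `run_candidate`, `refine`, and `toy_repair` are hypothetical, the repair step is a stand-in for whatever agent the framework actually uses, and treating the exit code plus stderr as the only feedback signal is an assumption.

```python
import subprocess
import sys
import tempfile
from pathlib import Path
from typing import Callable, Tuple


def run_candidate(source: str, timeout: float = 30.0) -> Tuple[bool, str]:
    """Run candidate code in a fresh interpreter; return (succeeded, stderr)."""
    with tempfile.TemporaryDirectory() as tmp:
        script = Path(tmp) / "candidate.py"
        script.write_text(source)
        try:
            proc = subprocess.run(
                [sys.executable, str(script)],
                capture_output=True,
                text=True,
                timeout=timeout,
            )
        except subprocess.TimeoutExpired:
            return False, "timeout"
        return proc.returncode == 0, proc.stderr


def refine(
    source: str,
    repair: Callable[[str, str], str],
    max_rounds: int = 3,
) -> Tuple[str, bool, int]:
    """Alternate between running the code and repairing it from its error output.

    Returns (final_source, succeeded, number_of_repairs_applied).
    """
    for attempt in range(max_rounds + 1):
        ok, feedback = run_candidate(source)
        if ok:
            return source, True, attempt
        if attempt < max_rounds:
            # Hypothetical repair step: an agent would rewrite the code here.
            source = repair(source, feedback)
    return source, False, max_rounds


def toy_repair(source: str, feedback: str) -> str:
    """Illustrative stand-in for an agent: fixes one known failure mode only."""
    if "NameError" in feedback and "math" in feedback:
        return "import math\n" + source
    return source
```

Calling `refine("print(math.sqrt(16))", toy_repair)` first fails with a `NameError`, applies one repair that prepends the missing import, and then succeeds. A real system would replace `toy_repair` with a model-driven edit and would likely use richer signals than stderr alone.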
Taxonomy
Topics: Scientific Computing and Data Management · Software Engineering Research · Web Data Mining and Analysis
