Artisan: Agentic Artifact Evaluation
Doehyun Baek, Michael Pradel

TL;DR
Artisan is an automated LLM agent that generates reproduction scripts for software engineering research artifacts. It frames reproduction as a code generation task and pairs it with an automated judging mechanism, improving both reproducibility and the efficiency of artifact evaluation.
Contribution
We introduce Artisan, an automated agent that formulates artifact reproduction as a code generation task and includes an automated judging system, along with the first benchmark for artifact evaluation in software engineering.
Findings
Artisan successfully generated 44 out of 60 reproduction scripts.
It outperformed the baselines by a factor of 3.14 in the number of reproduction scripts generated.
It also uncovered 20 new errors in papers or artifacts.
Abstract
Artifact evaluation has become standard practice in the software engineering community to ensure the reproducibility of research results. However, the current manual process is labor-intensive, and hence, done only as a one-time assessment for a subset of all papers. To support the artifact evaluation effort, we present Artisan, an automated LLM agent for reproducing research results given a paper and its artifact. The approach is enabled by two key contributions: First, we frame the reproduction problem as a code generation task where the goal is to generate a reproduction script that, when executed, reproduces the results reported in a paper. Unlike prior work on automatically reproducing research results in other domains, this formulation allows for running the script independently of the agent and for assessing the reproduction process at a fine-grained level. Second, we design…
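The idea that a reproduction script can be run independently of the agent and checked at a fine-grained level can be illustrated with a minimal sketch. This is not Artisan's implementation: the `name: value` output format, the function names `parse_results`, `compare`, and `run_script`, and the 1% tolerance are all assumptions made for illustration.

```python
import math
import re
import subprocess
import sys

# Hypothetical convention: the reproduction script prints one "name: value" line per result.
RESULT_LINE = re.compile(r"^\s*([\w.-]+)\s*:\s*(-?\d+(?:\.\d+)?)\s*$")


def parse_results(stdout: str) -> dict[str, float]:
    """Collect numeric results from the script's output, ignoring other lines."""
    results = {}
    for line in stdout.splitlines():
        match = RESULT_LINE.match(line)
        if match:
            results[match.group(1)] = float(match.group(2))
    return results


def compare(observed: dict[str, float], reported: dict[str, float],
            rel_tol: float = 0.01) -> dict[str, bool]:
    """Check each reported value separately; a missing result counts as not reproduced."""
    return {
        name: name in observed and math.isclose(observed[name], value, rel_tol=rel_tol)
        for name, value in reported.items()
    }


def run_script(script_path: str, timeout_s: int = 3600) -> dict[str, float]:
    """Execute the script as a plain subprocess, with no agent in the loop."""
    proc = subprocess.run(
        [sys.executable, script_path],
        capture_output=True, text=True, timeout=timeout_s,
    )
    return parse_results(proc.stdout)
```

Calling `compare(run_script("reproduce.py"), reported)` with a hypothetical script path and a dictionary of values taken from the paper would yield one verdict per reported number, so a partial reproduction remains visible instead of collapsing into a single pass or fail.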
Taxonomy
Topics: Software Engineering Research · Scientific Computing and Data Management · Software Engineering Techniques and Practices
