Agent-Based Software Artifact Evaluation
Zhaonan Wu, Yanjie Zhao, Zhenpeng Chen, Zheng Wang, Haoyu Wang

TL;DR
This paper introduces ArtifactCopilot, an agent-based framework that reduces human effort in software artifact evaluation by automating environment setup, execution, and error recovery. It matches human evaluation outcomes for 85.42% of artifacts at an average cost of $0.091 per artifact.
Contribution
We propose ArtifactCopilot, the first end-to-end agent-based system for automated artifact evaluation in software engineering, addressing scalability challenges in manual review processes.
Findings
Matches human evaluation outcomes for 85.42% of artifacts
Outperforms Claude Code by 52.09 percentage points
Costs only $0.091 per artifact on average
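To make the first two findings concrete, here is a minimal sketch of how an agreement rate and a percentage-point gap can be computed. It assumes the gap is measured on the same agreement metric, which this summary does not state explicitly. The function names and the toy verdicts are hypothetical and are not taken from the paper.

```python
def agreement_rate(agent_verdicts, human_verdicts):
    """Return the percentage of artifacts where the agent's verdict matches the human's."""
    if len(agent_verdicts) != len(human_verdicts):
        raise ValueError("verdict lists must have the same length")
    if not human_verdicts:
        raise ValueError("at least one artifact is required")
    matches = sum(a == h for a, h in zip(agent_verdicts, human_verdicts))
    return 100.0 * matches / len(human_verdicts)


def percentage_point_gap(rate_a, rate_b):
    """Difference between two percentages, expressed in percentage points."""
    return rate_a - rate_b


# Toy data for illustration only (NOT the paper's results).
human = ["pass", "pass", "fail", "pass"]
system_a = ["pass", "pass", "fail", "fail"]   # agrees on 3 of 4 artifacts
system_b = ["fail", "fail", "fail", "fail"]   # agrees on 1 of 4 artifacts

rate_a = agreement_rate(system_a, human)
rate_b = agreement_rate(system_b, human)
gap = percentage_point_gap(rate_a, rate_b)
```

A percentage-point gap is an absolute difference between two rates, so it should not be read as a relative improvement.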
Abstract
Artifact evaluation has been adopted in the Software Engineering (SE) research community for 15 years, substantially improving research reproducibility across major SE conferences. However, this success has introduced a growing scalability challenge, as artifact evaluation relies heavily on reviewers' manual execution and debugging, leading to escalating human effort amid rapidly increasing paper submissions. To address this problem, we investigate automated artifact evaluation. We first conduct a preliminary study on artifacts from top-tier SE conferences and identify three key challenges: perceiving execution states, maintaining stable execution environments, and recovering from execution errors. Inspired by these findings, we propose ArtifactCopilot, the first end-to-end agent-based framework for automated artifact evaluation. ArtifactCopilot automates environment construction,…
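The three challenges named in the abstract (perceiving execution state, keeping the environment stable, and recovering from errors) suggest an execute, observe, recover loop. The sketch below is a generic illustration of that pattern under stated assumptions and is not ArtifactCopilot's implementation. The function `run_with_recovery`, the `recover` callback, and the demo commands are all hypothetical.

```python
import subprocess
import sys


def run_with_recovery(command, recover, max_attempts=3):
    """Run a command, and on failure ask `recover` for a replacement command.

    `recover` is a hypothetical callback that receives the failed command and
    its CompletedProcess, and returns a new command or None to give up.
    Returns the last CompletedProcess and the number of attempts made.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    attempts = 0
    result = None
    while command is not None and attempts < max_attempts:
        attempts += 1
        # Observe the execution state through the exit code and captured output.
        result = subprocess.run(command, capture_output=True, text=True)
        if result.returncode == 0:
            break
        # Hand the failure to the recovery step, which may propose a fix.
        command = recover(command, result)
    return result, attempts


# Illustrative commands: the first always fails, the second succeeds.
failing = [sys.executable, "-c", "import sys; sys.exit(1)"]
fixed = [sys.executable, "-c", "print('ok')"]


def swap_in_fixed(command, result):
    """Toy recovery policy: always retry with the known-good command."""
    return fixed
```

In an agent-based setting, the recovery callback is where a model would inspect the captured output and propose a repaired command. Here it is a fixed stand-in so the loop can be run without any external service.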
Taxonomy
Topics: Software Engineering Research · Scientific Computing and Data Management · Software System Performance and Reliability
