Evaluating and Improving Automated Repository-Level Rust Issue Resolution with LLM-based Agents
Jiahong Xiang, Wenxiao He, Xihua Wang, Hongliang Tian, Yuqun Zhang

TL;DR
This paper introduces Rust-SWE-bench, a large benchmark for Rust issue resolution, evaluates LLM-based agents' performance, identifies key challenges, and proposes RUSTFORGER, a novel agent that significantly improves resolution success.
Contribution
The paper presents Rust-SWE-bench, the first large-scale Rust repository benchmark, and introduces RUSTFORGER, a new agentic approach that outperforms existing methods in resolving issues.
Findings
ReAct-style agents resolve up to 21.2% of issues.
Issue reproduction is critical for task resolution.
RUSTFORGER resolves 28.6% of tasks, a 34.9% relative improvement over the 21.2% achieved by the best ReAct-style agent.
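The two percentages in the findings can be reconciled with a quick calculation. The sketch below assumes that the 34.9% figure is a relative gain measured against the 21.2% resolution rate reported for the best ReAct-style agent. The summary does not state the baseline explicitly, and the function and variable names are illustrative only.

```python
def relative_improvement(baseline: float, improved: float) -> float:
    """Return the gain of `improved` over `baseline` as a percentage of `baseline`."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return (improved - baseline) / baseline * 100.0


# Resolution rates quoted in the findings, in percent of tasks resolved.
baseline_rate = 21.2    # assumed baseline: best ReAct-style agent
rustforger_rate = 28.6  # RUSTFORGER

gain = relative_improvement(baseline_rate, rustforger_rate)
print(f"absolute gain: {rustforger_rate - baseline_rate:.1f} percentage points")
print(f"relative gain: {gain:.1f}%")
```

Under that assumption the absolute gain is 7.4 percentage points, and the relative gain rounds to the 34.9% quoted above.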
Abstract
The Rust programming language presents a steep learning curve and significant coding challenges, making the automation of issue resolution essential for its broader adoption. Recently, LLM-powered code agents have shown remarkable success in resolving complex software engineering tasks, yet their application to Rust has been limited by the absence of a large-scale, repository-level benchmark. To bridge this gap, we introduce Rust-SWE-bench, a benchmark comprising 500 real-world, repository-level software engineering tasks from 34 diverse and popular Rust repositories. We then perform a comprehensive study on Rust-SWE-bench with four representative agents and four state-of-the-art LLMs to establish a foundational understanding of their capabilities and limitations in the Rust ecosystem. Our extensive study reveals that while ReAct-style agents are promising, i.e., resolving up to 21.2%…
Taxonomy
Topics: Software Engineering Research · Software Testing and Debugging Techniques · Software Engineering Techniques and Practices
