Evaluating and Improving Automated Repository-Level Rust Issue Resolution with LLM-based Agents

Jiahong Xiang; Wenxiao He; Xihua Wang; Hongliang Tian; Yuqun Zhang

arXiv:2602.22764 · cs.SE · February 27, 2026

TL;DR

This paper introduces Rust-SWE-bench, a benchmark of 500 real-world issue-resolution tasks drawn from 34 Rust repositories. It evaluates four LLM-based agents paired with four LLMs, identifies the key challenges they face, and proposes RUSTFORGER, a new agent that resolves 28.6% of the tasks.

Contribution

The paper presents Rust-SWE-bench, the first large-scale, repository-level benchmark for Rust issue resolution, and introduces RUSTFORGER, a new agentic approach that resolves more issues than the existing agents evaluated.

Findings

01

ReAct-style agents resolve up to 21.2% of issues.

02

Issue reproduction is critical for task resolution.

03

RUSTFORGER resolves 28.6% of tasks, a 34.9% relative improvement over the 21.2% best ReAct-style result.
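
The two percentages in finding 03 measure different things: 28.6% is a resolution rate, while 34.9% reads as a relative gain. The sketch below checks that interpretation, assuming the baseline is the 21.2% figure from finding 01 (the truncated abstract does not confirm which baseline is meant). The function name `relative_improvement` is illustrative and not taken from the paper.

```python
def relative_improvement(baseline_pct: float, new_pct: float) -> float:
    """Return the relative gain of new_pct over baseline_pct, in percent."""
    return (new_pct - baseline_pct) / baseline_pct * 100.0


# Assumption: the baseline is the best ReAct-style resolution rate (21.2%).
baseline = 21.2   # percent of tasks resolved, finding 01
rustforger = 28.6  # percent of tasks resolved, finding 03

absolute_gain = rustforger - baseline                  # percentage points
gain = relative_improvement(baseline, rustforger)      # percent, relative

print(f"absolute gain: {absolute_gain:.1f} percentage points")
print(f"relative gain: {gain:.1f}%")
```

Under that assumption the relative gain matches the reported 34.9%, while the absolute gain is 7.4 percentage points.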

Abstract

The Rust programming language presents a steep learning curve and significant coding challenges, making the automation of issue resolution essential for its broader adoption. Recently, LLM-powered code agents have shown remarkable success in resolving complex software engineering tasks, yet their application to Rust has been limited by the absence of a large-scale, repository-level benchmark. To bridge this gap, we introduce Rust-SWE-bench, a benchmark comprising 500 real-world, repository-level software engineering tasks from 34 diverse and popular Rust repositories. We then perform a comprehensive study on Rust-SWE-bench with four representative agents and four state-of-the-art LLMs to establish a foundational understanding of their capabilities and limitations in the Rust ecosystem. Our extensive study reveals that while ReAct-style agents are promising, i.e., resolving up to 21.2%…

Taxonomy

Topics: Software Engineering Research · Software Testing and Debugging Techniques · Software Engineering Techniques and Practices