SoRFT: Issue Resolving with Subtask-oriented Reinforced Fine-Tuning
Zexiong Ma, Chao Peng, Pengfei Gao, Xiangxin Meng, Yanzhen Zou, Bing Xie

TL;DR
SoRFT introduces a structured, two-stage training method for large language models that significantly improves issue resolution capabilities by decomposing tasks and leveraging reinforcement learning, outperforming existing open-source models.
Contribution
The paper presents a novel subtask-oriented reinforced fine-tuning approach that improves open-source LLMs at issue resolving, addressing the poor generalization of existing training approaches and the cost and privacy concerns of relying on commercial models.
Findings
Achieves state-of-the-art performance among open-source models on SWE-Bench Verified and SWE-Bench Lite.
Significantly improves issue resolution accuracy.
Provides a cost-effective alternative to commercial models.
Abstract
Mainstream issue-resolving frameworks predominantly rely on commercial models, leading to high costs and privacy concerns. Existing training approaches for issue resolving struggle with poor generalization and fail to fully leverage open-source development resources. We propose Subtask-oriented Reinforced Fine-Tuning (SoRFT), a novel training approach to enhance the issue resolving capability of LLMs. We decompose issue resolving into structured subtasks: file localization, function localization, line localization, and code edit generation. SoRFT consists of two training stages: (1) rejection-sampled supervised fine-tuning, in which Chain of Thought (CoT) data is filtered using the ground truth before the LLM is fine-tuned, and (2) rule-based reinforcement learning, which leverages PPO with ground-truth-based rewards. We evaluate the SoRFT-trained model on SWE-Bench Verified and SWE-Bench Lite,…
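The two stages described in the abstract can be illustrated with a minimal sketch for a localization subtask. The abstract does not give the filtering criterion or the reward formula, so everything here is an assumption made for illustration: the `Sample` structure, the F1-style overlap score in `overlap_reward`, and the "any overlap" acceptance rule in `rejection_sample` are hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    """One sampled response for a localization subtask (hypothetical structure)."""
    cot: str              # chain-of-thought text produced by the model
    predicted: frozenset  # items the model selected, e.g. file paths


def overlap_reward(predicted, ground_truth):
    """Assumed rule-based reward: F1 overlap between prediction and ground truth."""
    predicted, ground_truth = set(predicted), set(ground_truth)
    hits = len(predicted & ground_truth)
    if hits == 0:
        # Covers empty predictions, empty ground truth, and disjoint sets.
        return 0.0
    precision = hits / len(predicted)
    recall = hits / len(ground_truth)
    return 2 * precision * recall / (precision + recall)


def rejection_sample(samples, ground_truth):
    """Keep samples whose prediction overlaps the ground truth (assumed criterion)."""
    return [s for s in samples if overlap_reward(s.predicted, ground_truth) > 0.0]


# Illustrative data only: the paths below are made up.
truth = {"src/parser.py"}
samples = [
    Sample("reasoning A", frozenset({"src/parser.py", "src/utils.py"})),
    Sample("reasoning B", frozenset({"docs/index.md"})),
]
kept = rejection_sample(samples, truth)  # retains only the first sample
```

In this sketch the same ground-truth comparison serves both stages: it acts as a filter when building the supervised fine-tuning set, and as a scalar reward that a PPO trainer could consume during reinforcement learning.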
Taxonomy
Topics: Software Engineering Research · Advanced Malware Detection Techniques · Software Testing and Debugging Techniques
Methods: Entropy Regularization · Proximal Policy Optimization
