SoRFT: Issue Resolving with Subtask-oriented Reinforced Fine-Tuning

Zexiong Ma; Chao Peng; Pengfei Gao; Xiangxin Meng; Yanzhen Zou; Bing Xie

arXiv:2502.20127·cs.SE·February 28, 2025

TL;DR

SoRFT introduces a structured, two-stage training method for large language models that significantly improves issue resolution capabilities by decomposing tasks and leveraging reinforcement learning, outperforming existing open-source models.

Contribution

The paper presents a novel subtask-oriented reinforced fine-tuning approach that enhances open-source LLMs for issue resolving, addressing the poor generalization of existing training approaches and the cost and privacy concerns of relying on commercial models.

Findings

01

Achieves state-of-the-art performance on SWE-Bench datasets.

02

Significantly improves issue resolution accuracy.

03

Provides a cost-effective alternative to commercial models.

Abstract

Mainstream issue-resolving frameworks predominantly rely on commercial models, leading to high costs and privacy concerns. Existing training approaches for issue resolving struggle with poor generalization and fail to fully leverage open-source development resources. We propose Subtask-oriented Reinforced Fine-Tuning (SoRFT), a novel training approach to enhance the issue-resolving capability of LLMs. We decompose issue resolving into structured subtasks: file localization, function localization, line localization, and code edit generation. SoRFT consists of two training stages: (1) rejection-sampled supervised fine-tuning, in which Chain of Thought (CoT) data is filtered using the ground truth before fine-tuning the LLM, and (2) rule-based reinforcement learning, which leverages PPO with ground-truth-based rewards. We evaluate the SoRFT-trained model on SWE-Bench Verified and SWE-Bench Lite,…
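The two stages described in the abstract can be illustrated with a minimal sketch for a localization subtask. Everything here is an assumption made for illustration: the function names `overlap_reward` and `rejection_sample`, the use of an F1 overlap score as the ground-truth-based rule, and the threshold are hypothetical and are not taken from the paper.

```python
from typing import Iterable, List, Tuple


def overlap_reward(predicted: Iterable[str], ground_truth: Iterable[str]) -> float:
    """Hypothetical rule-based reward: F1 overlap between the predicted
    items (e.g. file paths) and the ground-truth items."""
    pred, gold = set(predicted), set(ground_truth)
    hits = len(pred & gold)
    if hits == 0:
        # Covers empty predictions, empty ground truth, and no overlap.
        return 0.0
    precision = hits / len(pred)
    recall = hits / len(gold)
    return 2 * precision * recall / (precision + recall)


def rejection_sample(
    samples: List[Tuple[str, List[str]]],
    ground_truth: List[str],
    threshold: float = 0.0,
) -> List[Tuple[str, List[str]]]:
    """Keep only (chain_of_thought, answer) pairs whose answer scores
    above the threshold against the ground truth (assumed filter rule)."""
    return [
        (cot, answer)
        for cot, answer in samples
        if overlap_reward(answer, ground_truth) > threshold
    ]


if __name__ == "__main__":
    gold_files = ["src/parser.py"]
    candidates = [
        ("The traceback points at the parser.", ["src/parser.py", "src/cli.py"]),
        ("This looks like a docs problem.", ["docs/index.md"]),
    ]
    kept = rejection_sample(candidates, gold_files)
    print(len(kept), overlap_reward(candidates[0][1], gold_files))
```

In this sketch the same scoring function serves both stages: as a filter for the supervised fine-tuning data and as the scalar reward a PPO trainer would receive for each sampled response. Whether the paper shares one rule across stages is not stated in the abstract.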

Taxonomy

Topics: Software Engineering Research · Advanced Malware Detection Techniques · Software Testing and Debugging Techniques

Methods: Entropy Regularization · Proximal Policy Optimization