OpenRFT: Adapting Reasoning Foundation Model for Domain-specific Tasks with Reinforcement Fine-Tuning
Yuxiang Zhang, Yuqi Yang, Jiangming Shu, Yuhang Wang, Jinlin Xiao, Jitao Sang

TL;DR
OpenRFT adapts reasoning foundation models for domain-specific tasks using reinforcement fine-tuning, addressing data scarcity and reasoning step limitations to improve performance with minimal samples.
Contribution
This paper introduces OpenRFT, a novel method for fine-tuning reasoning models on domain-specific tasks using reinforcement fine-tuning with data augmentation techniques.
Findings
OpenRFT achieves significant performance improvements on SciKnowEval.
Effective use of only 100 domain-specific samples per task.
Demonstrates the potential of reinforcement fine-tuning for domain adaptation.
Abstract
OpenAI's recent introduction of Reinforcement Fine-Tuning (RFT) showcases the potential of reasoning foundation models and offers a new paradigm for fine-tuning beyond simple pattern imitation. This technical report presents OpenRFT, our attempt to fine-tune generalist reasoning models for domain-specific tasks under the same settings as RFT. OpenRFT addresses two key challenges, the lack of reasoning-step data and the limited quantity of training samples, by leveraging the domain-specific samples in three ways: question augmentation, synthesizing reasoning-process data, and few-shot ICL. The evaluation is conducted on SciKnowEval, where OpenRFT achieves notable performance gains with only 100 domain-specific samples for each task. More experimental results will be updated continuously in later versions. Source codes, datasets, and models are disclosed at:…
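As an illustration of the third use of domain samples named in the abstract (few-shot ICL), here is a minimal Python sketch. It is not the authors' implementation: the `Sample` type, the `build_few_shot_prompt` helper, the prompt template, and the random choice of demonstrations are all assumptions made for this example, and the abstract does not say how demonstrations are selected.

```python
import random
from dataclasses import dataclass


@dataclass
class Sample:
    """Hypothetical container for one domain-specific training sample."""
    question: str
    answer: str


def build_few_shot_prompt(pool, query, k=2, seed=0):
    """Prepend up to k demonstrations drawn from the domain pool to the query.

    Random selection is an assumption made for this sketch.
    """
    rng = random.Random(seed)
    # Exclude the query itself so its answer never appears as a demonstration.
    candidates = [s for s in pool if s.question != query]
    demos = rng.sample(candidates, min(k, len(candidates)))
    parts = [f"Question: {d.question}\nAnswer: {d.answer}" for d in demos]
    # The query goes last, with the answer left open for the model to complete.
    parts.append(f"Question: {query}\nAnswer:")
    return "\n\n".join(parts)


pool = [
    Sample("Which element has atomic number 6?", "Carbon"),
    Sample("What is the chemical formula of water?", "H2O"),
    Sample("Which gas do plants absorb during photosynthesis?", "Carbon dioxide"),
]
prompt = build_few_shot_prompt(pool, "What is the chemical formula of water?", k=2)
print(prompt)
```

With a pool of about 100 samples per task, the same pool could plausibly serve both as training data and as the source of demonstrations, which is why the sketch filters out the query before sampling.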
Taxonomy
Topics: Advanced Software Engineering Methodologies · Reinforcement Learning in Robotics · Software Engineering Research
