ReFT: Reasoning with Reinforced Fine-Tuning
Trung Quoc Luong, Xinbo Zhang, Zhanming Jie, Peng Sun, Xiaoran Jin, Hang Li

TL;DR
ReFT enhances reasoning in Large Language Models by combining supervised fine-tuning with reinforcement learning, allowing models to learn from multiple reasoning paths and significantly improve performance on math problem-solving tasks.
Contribution
The paper introduces Reinforced Fine-Tuning (ReFT), a method that improves the reasoning generalization of LLMs by following an SFT warm-up with on-line reinforcement learning (PPO), without requiring extra training data.
Findings
ReFT outperforms standard SFT on GSM8K, MathQA, and SVAMP datasets.
ReFT's performance can be further improved with inference-time strategies such as majority voting and re-ranking.
ReFT generalizes better than SFT without using additional training data.
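Majority voting, one common inference-time strategy, can be illustrated as follows: sample several reasoning paths for a question, extract each final answer, and return the most frequent one. This is a minimal sketch and not the authors' implementation; the function name `majority_vote` and the sample answers are hypothetical.

```python
from collections import Counter
from typing import Optional, Sequence


def majority_vote(answers: Sequence[Optional[str]]) -> Optional[str]:
    """Return the most frequent non-empty answer, or None if there is none.

    Ties are broken in favour of the answer that appeared first.
    """
    # Drop paths from which no answer could be extracted.
    valid = [a.strip() for a in answers if a is not None and a.strip()]
    if not valid:
        return None
    counts = Counter(valid)
    # Equal counts keep first-encountered order (Python 3.7+).
    return counts.most_common(1)[0][0]


# Hypothetical final answers extracted from five sampled reasoning paths.
sampled_answers = ["18", "18", "20", None, "18"]
voted = majority_vote(sampled_answers)
```

Re-ranking differs in that a separately trained scorer picks the best candidate instead of a frequency count; it is not sketched here because it requires a trained model.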
Abstract
One way to enhance the reasoning capability of Large Language Models (LLMs) is to conduct Supervised Fine-Tuning (SFT) using Chain-of-Thought (CoT) annotations. This approach does not show sufficiently strong generalization ability, however, because the training relies only on the given CoT data. In math problem-solving, for example, there is usually only one annotated reasoning path for each question in the training data. Intuitively, it would be better for the algorithm to learn from multiple annotated reasoning paths given a question. To address this issue, we propose a simple yet effective approach called Reinforced Fine-Tuning (ReFT) to enhance the generalizability of learning LLMs for reasoning, with math problem-solving as an example. ReFT first warms up the model with SFT, and then employs on-line reinforcement learning, specifically the PPO algorithm in this paper, to further…
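To make the idea of learning from several reasoning paths per question concrete, here is a minimal sketch of how a scalar reward could be assigned to a sampled path by comparing its final answer with the ground truth. It is an illustration under stated assumptions, not the paper's implementation: the `extract_final_answer` helper, the "The answer is" output convention, the binary 1.0/0.0 reward values, and the example paths are all hypothetical.

```python
import re
from typing import Optional

# Assumed convention: each reasoning path ends with "The answer is <number>".
_ANSWER_PATTERN = re.compile(r"The answer is\s*(-?\d+(?:\.\d+)?)")


def extract_final_answer(cot: str) -> Optional[float]:
    """Return the last number following 'The answer is', or None if absent."""
    matches = _ANSWER_PATTERN.findall(cot)
    return float(matches[-1]) if matches else None


def terminal_reward(cot: str, gold: float, tol: float = 1e-6) -> float:
    """Hypothetical binary reward: 1.0 for a correct final answer, else 0.0."""
    pred = extract_final_answer(cot)
    if pred is None:
        return 0.0
    return 1.0 if abs(pred - gold) <= tol else 0.0


# Hypothetical reasoning paths sampled for one question whose answer is 18.
sampled_paths = [
    "16 - 3 - 4 = 9 eggs are left. 9 * 2 = 18. The answer is 18",
    "16 - 3 = 13 eggs are left. 13 * 2 = 26. The answer is 26",
    "She sells the remaining eggs for some money.",
]
rewards = [terminal_reward(p, gold=18.0) for p in sampled_paths]
```

In a PPO-style loop, a score like this would be attached to the end of each sampled path. A correct path is rewarded even when it differs from the single annotated CoT, which is how one question can supply more than one useful reasoning path.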
Code & Models
- 🤗 lqtrung1998/Codellama-7b-hf-ReFT-GSM8k (model) · 220 downloads · 1 like
- 🤗 lqtrung1998/Codellama-7b-hf-ReFT-Rerank-GSM8k (model) · 103 downloads · 2 likes
- 🤗 lqtrung1998/Codellama-7b-hf-SFT-warmup-GSM8k (model) · 5 downloads
- 🤗 lqtrung1998/Codellama-7b-hf-SFT-GSM8k (model) · 5 downloads
- 🤗 lqtrung1998/galactica-6.7b-SFT-GSM8k (model) · 2 downloads
- 🤗 lqtrung1998/galactica-6.7b-SFT-warmup-GSM8k (model) · 82 downloads
- 🤗 lqtrung1998/galactica-6.7b-ReFT-GSM8k (model) · 409 downloads
- 🤗 lqtrung1998/galactica-6.7b-ReFT-Rerank-GSM8k (model) · 329 downloads
- 🤗 lqtrung1998/galactica-6.7b-SFT-Rerank-GSM8k (model) · 3 downloads
- 🤗 lqtrung1998/Codellama-7b-hf-SFT-Rerank-GSM8k (model) · 2 downloads
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Speech and dialogue systems
Methods: Entropy Regularization · Shrink and Fine-Tune · Proximal Policy Optimization
