ReFT: Reasoning with Reinforced Fine-Tuning

Trung Quoc Luong; Xinbo Zhang; Zhanming Jie; Peng Sun; Xiaoran Jin; Hang Li

arXiv:2401.08967 · cs.CL · December 16, 2024 · 1 citation

TL;DR

ReFT improves reasoning in Large Language Models by first warming the model up with supervised fine-tuning and then applying reinforcement learning (PPO), so that the model learns from multiple reasoning paths per question and performs better on math problem-solving tasks.

Contribution

The paper introduces Reinforced Fine-Tuning (ReFT), a novel method that improves reasoning generalization in LLMs by integrating reinforcement learning with supervised fine-tuning without extra data.

Findings

01

ReFT outperforms standard SFT on GSM8K, MathQA, and SVAMP datasets.

02

ReFT's performance can be further improved with inference-time strategies.

03

ReFT demonstrates superior generalization ability without additional training data.
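
Finding 02 does not say which inference-time strategies are meant. As a minimal sketch, assuming majority voting over several sampled answers is one of them, the idea looks like this; the function name `majority_vote` and the sample answers are hypothetical and do not come from the paper's repository.

```python
from collections import Counter
from typing import Optional, Sequence


def majority_vote(answers: Sequence[Optional[str]]) -> Optional[str]:
    """Return the most frequent answer among the sampled reasoning paths.

    `None` entries stand for samples from which no final answer could be
    extracted; they are ignored. Ties go to the answer seen first.
    """
    valid = [a for a in answers if a is not None]
    if not valid:
        return None
    return Counter(valid).most_common(1)[0][0]


# Hypothetical final answers extracted from four sampled reasoning paths.
sampled_answers = ["18", "18", "26", None]
voted = majority_vote(sampled_answers)
```

Voting needs only the final answers, not the reasoning text, so it can be applied on top of any fine-tuned model without further training.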

Abstract

One way to enhance the reasoning capability of Large Language Models (LLMs) is to conduct Supervised Fine-Tuning (SFT) using Chain-of-Thought (CoT) annotations. This approach does not show sufficiently strong generalization ability, however, because the training relies only on the given CoT data. In math problem-solving, for example, there is usually only one annotated reasoning path for each question in the training data. Intuitively, it would be better for the algorithm to learn from multiple annotated reasoning paths given a question. To address this issue, we propose a simple yet effective approach called Reinforced Fine-Tuning (ReFT) to enhance the generalizability of learning LLMs for reasoning, with math problem-solving as an example. ReFT first warms up the model with SFT, and then employs online reinforcement learning, specifically the PPO algorithm in this paper, to further…
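
The abstract is cut off before it describes the training signal used in the reinforcement learning stage. As a rough sketch only, assuming each sampled reasoning path is scored by comparing its final number with the ground-truth answer of the same training question, the scoring step could look like this. The helper names, the binary reward values, and the example paths are all illustrative assumptions, not the authors' implementation.

```python
import re
from typing import List, Optional


def extract_final_answer(cot: str) -> Optional[float]:
    """Return the last number that appears in a reasoning path, if any."""
    # Drop commas so that "1,000" is read as 1000 (a simplifying assumption).
    numbers = re.findall(r"-?\d+(?:\.\d+)?", cot.replace(",", ""))
    return float(numbers[-1]) if numbers else None


def terminal_reward(cot: str, gold: float, tol: float = 1e-6) -> float:
    """Assumed binary reward: 1.0 for a correct final answer, else 0.0."""
    pred = extract_final_answer(cot)
    if pred is None:
        return 0.0
    return 1.0 if abs(pred - gold) <= tol else 0.0


# Hypothetical reasoning paths sampled from the policy for ONE question
# whose annotated answer is 18. No new questions or annotations are needed.
sampled_paths: List[str] = [
    "16 - 3 - 4 = 9 eggs left. 9 * 2 = 18. The answer is 18",
    "16 - 3 = 13. 13 * 2 = 26. The answer is 26",
    "I am not sure.",
]
rewards = [terminal_reward(path, gold=18.0) for path in sampled_paths]
```

In a PPO loop these per-path rewards would feed the policy update. The point of the sketch is that one annotated question can yield many differently scored reasoning paths, which is how the method can learn from more than the single annotated CoT.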

Code & Models

Repositories

lqtrung1998/mwp_reft
PyTorch · Official

Videos

ReFT: Reasoning with Reinforced Fine-Tuning · Underline

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Speech and dialogue systems

Methods: Entropy Regularization · Supervised Fine-Tuning · Proximal Policy Optimization