GFT: From Imitation to Reward Fine-Tuning with Unbiased Group Advantages and Dynamic Coefficient Rectification

Wangjie Gan; Miao Pan; Linbo Xi; Wenqi Zhang; Jintao Chen; Jianwei Yin; Xuhong Zhang

arXiv:2604.14258 · cs.AI · May 5, 2026

TL;DR

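This paper introduces Group Fine-Tuning (GFT), a unified post-training framework for large language models. It targets limitations the authors attribute to supervised fine-tuning (single-path dependency, entropy collapse, and gradient explosion) in order to combine efficient knowledge injection with more stable training.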

Contribution

GFT combines group advantage learning and dynamic coefficient rectification to enhance fine-tuning stability and effectiveness for large language models.
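
As a rough illustration of what "group advantage" can mean, the sketch below centers and scales rewards within one group of responses to the same prompt. This is the generic GRPO-style mean/std normalization, swapped in here for illustration; it is an assumption and not necessarily the formula GFT uses. The function name `group_normalized_advantages`, the `eps` parameter, and the reward values are all hypothetical.

```python
from statistics import mean, pstdev


def group_normalized_advantages(rewards, eps=1e-8):
    """Center and scale rewards within one response group.

    Assumption: generic mean/std normalization (GRPO-style),
    not the paper's exact advantage definition.
    """
    mu = mean(rewards)        # group baseline
    sigma = pstdev(rewards)   # population std of the group
    # eps avoids division by zero when all rewards are equal
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical rewards for four sampled responses to one prompt
rewards = [1.0, 0.0, 0.0, 1.0]
advantages = group_normalized_advantages(rewards)
print(advantages)
```

Responses above the group mean receive positive advantages and those below receive negative ones, so every response in the group contributes a training signal rather than only a single reference answer.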

Findings

01 GFT outperforms traditional SFT methods in experiments.

02 GFT produces more stable and generalizable policies.

03 The framework facilitates smoother integration with subsequent reinforcement learning.

Abstract

Large language models are typically post-trained using supervised fine-tuning (SFT) and reinforcement learning (RL), yet effectively unifying efficient knowledge injection with robust generalization remains challenging. In this work, we provide a training-dynamics analysis showing that SFT can be interpreted as a special case of policy gradient optimization with an extremely sparse implicit reward and unstable inverse-probability weighting, which together lead to single-path dependency, entropy collapse, and gradient explosion. Motivated by this diagnosis, we propose Group Fine-Tuning (GFT), a unified post-training framework that addresses these intrinsic limitations through two mechanisms: Group Advantage Learning, which constructs diverse response groups and derives normalized contrastive supervision to alleviate reward sparsity, and Dynamic Coefficient Rectification, which adaptively…
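
One standard way to see the claim that SFT is a special case of policy gradient is sketched below for a single demonstration. The notation ($\pi_\theta$, $x$, $y^*$, $r$) is assumed for illustration, and the paper's own derivation may differ in form.

```latex
% SFT loss gradient for one demonstration y^* given prompt x
\begin{align*}
\nabla_\theta \mathcal{L}_{\mathrm{SFT}}(\theta)
  &= -\nabla_\theta \log \pi_\theta(y^* \mid x)
   = -\frac{1}{\pi_\theta(y^* \mid x)}\,\nabla_\theta \pi_\theta(y^* \mid x).
\end{align*}

% Policy-gradient estimator with the implicit reward
% r(x, y) = 1[y = y^*] / \pi_\theta(y^* \mid x)
\begin{align*}
\mathbb{E}_{y \sim \pi_\theta(\cdot \mid x)}
  \big[\, r(x, y)\, \nabla_\theta \log \pi_\theta(y \mid x) \big]
  &= \sum_{y} \pi_\theta(y \mid x)\,
     \frac{\mathbb{1}[y = y^*]}{\pi_\theta(y^* \mid x)}\,
     \nabla_\theta \log \pi_\theta(y \mid x) \\
  &= \nabla_\theta \log \pi_\theta(y^* \mid x)
   = -\nabla_\theta \mathcal{L}_{\mathrm{SFT}}(\theta).
\end{align*}
```

Under these assumptions the implicit reward is nonzero only for the single response $y^*$, which matches the sparsity described in the abstract, and its $1/\pi_\theta(y^* \mid x)$ factor grows without bound as the model assigns low probability to the demonstration, which matches the unstable inverse-probability weighting.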

Code & Models

Repositories

zju-omniai/GFT (GitHub)

Datasets

OmniAI-ZJU/NuminaMath-Cot-Distillation-100K (dataset, 79 downloads)
