Enhancing Reinforcement Learning Fine-Tuning with an Online Refiner

Hao Ma; Zhiqiang Pu; Yang Liu; Xiaolin Ai

arXiv:2603.18088·cs.LG·March 20, 2026

TL;DR

This paper introduces dynamic constraints for reinforcement learning fine-tuning (RFT): a reference model acts as an online refiner, so the constraint intervenes only when the fine-tuned model produces degenerate outputs, which improves task reward while keeping training stable.

Contribution

It proposes a novel online refiner mechanism that dynamically adjusts constraints during reinforcement learning fine-tuning, enhancing performance and stability.

Findings

01. Outperforms KL regularization and unconstrained baselines.
02. Achieves higher task rewards in dialogue and code generation.
03. Maintains training stability with dynamic constraints.

Abstract

Constraints are essential for stabilizing reinforcement learning fine-tuning (RFT) and preventing degenerate outputs, yet they inherently conflict with the optimization objective because stronger constraints limit the ability of a fine-tuned model to discover better solutions. We propose dynamic constraints that resolve this tension by adapting to the evolving capabilities of the fine-tuned model, based on the insight that constraints should only intervene when degenerate outputs occur. We implement this by using a reference model as an online refiner that takes the response from the fine-tuned model and generates a minimally corrected version which preserves correct content verbatim while fixing errors. A supervised fine-tuning loss then trains the fine-tuned model to produce the refined output. This mechanism yields a constraint that automatically strengthens or…
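
The refine-then-imitate step described in the abstract can be illustrated with a toy sketch. Everything here is an assumption made for illustration and is not taken from the paper: the policy is a context-free unigram table rather than a language model, `toy_refiner` is a hypothetical stand-in for the reference model, and the `<error>` and `<fixed>` tokens and all probabilities are made up. The sketch only shows how a supervised loss on the refined output behaves when the refiner leaves a response unchanged versus when it has to correct it.

```python
import math
from typing import Callable, Dict, List

Tokens = List[str]


def sft_loss(policy: Dict[str, float], target: Tokens) -> float:
    """Mean negative log-likelihood of `target` under a toy unigram policy."""
    return -sum(math.log(policy[tok]) for tok in target) / len(target)


def toy_refiner(response: Tokens) -> Tokens:
    """Hypothetical stand-in for the reference model: copy correct tokens
    verbatim and replace only the tokens marked as errors."""
    return ["<fixed>" if tok == "<error>" else tok for tok in response]


def refinement_loss(
    policy: Dict[str, float],
    response: Tokens,
    refiner: Callable[[Tokens], Tokens],
) -> float:
    """Refine the policy's own response, then score the refined text
    with the supervised loss."""
    refined = refiner(response)
    return sft_loss(policy, refined)


# Assumed token probabilities, chosen only for illustration.
policy = {"good": 0.8, "<error>": 0.15, "<fixed>": 0.05}

clean = ["good", "good", "good"]             # nothing to correct
degenerate = ["good", "<error>", "<error>"]  # two tokens need fixing

loss_clean = refinement_loss(policy, clean, toy_refiner)
loss_degenerate = refinement_loss(policy, degenerate, toy_refiner)

print(f"loss on unchanged response: {loss_clean:.4f}")
print(f"loss on corrected response: {loss_degenerate:.4f}")
```

In this toy setting the refined text equals the original response when no correction is needed, so the supervised target consists of tokens the policy already favors and the loss stays small. When the refiner has to replace tokens, the target contains tokens the policy assigns low probability to, and the loss grows. How the paper combines this loss with the reinforcement learning objective is not shown here.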

Taxonomy

Topics: Topic Modeling · Reinforcement Learning in Robotics · Machine Learning and Algorithms