Defeating the Training-Inference Mismatch via FP16

Penghui Qi; Zichen Liu; Xiangxin Zhou; Tianyu Pang; Chao Du; Wee Sun Lee; Min Lin

arXiv:2510.26788 · cs.LG · October 31, 2025

TL;DR

This paper shows that switching from BF16 to FP16 floating-point precision in RL fine-tuning of large language models reduces the training-inference mismatch, leading to more stable training, faster convergence, and better final performance.

Contribution

The study shows that using FP16 instead of BF16 precision effectively resolves the training-inference mismatch in RL fine-tuning of LLMs, with minimal implementation effort.
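
The precision argument can be made concrete. BF16 keeps FP32's 8 exponent bits but has only 7 explicit mantissa bits, while FP16 has 5 exponent bits and 10 mantissa bits. FP16 therefore gives up dynamic range (largest finite value 65504, versus roughly 3.4e38 for BF16) in exchange for about 8x finer rounding. The minimal NumPy sketch below is an illustration, not code from the paper: it measures the worst-case relative rounding error of each format on random values. NumPy has no bfloat16 dtype, so the helper `round_to_bf16` (a name introduced here) emulates it by rounding a float32 to its upper 16 bits.

```python
import numpy as np

def round_to_bf16(x):
    """Emulate float32 -> bfloat16 rounding (nearest, ties to even), returned as float32.

    bfloat16 is the upper 16 bits of an IEEE float32 (1 sign, 8 exponent, 7 mantissa bits).
    Illustrative helper only; NaN/Inf inputs are not handled.
    """
    bits = np.ascontiguousarray(x, dtype=np.float32).view(np.uint32)
    # Adding 0x7FFF (+1 if the lowest kept bit is odd) before truncating the low
    # 16 bits implements round-to-nearest-even.
    lsb = (bits >> np.uint32(16)) & np.uint32(1)
    rounded = (bits + np.uint32(0x7FFF) + lsb) & np.uint32(0xFFFF0000)
    return rounded.view(np.float32)

def round_to_fp16(x):
    """Round float32 values to IEEE float16 (NumPy rounds to nearest even), returned as float32."""
    return np.asarray(x, dtype=np.float32).astype(np.float16).astype(np.float32)

def max_relative_rounding_error(round_fn, x):
    return float(np.max(np.abs(round_fn(x) - x) / np.abs(x)))

rng = np.random.default_rng(0)
n = 100_000
# Magnitudes log-uniform in [2^-10, 2^10): inside FP16's normal range, so only
# precision (not overflow or underflow) is being compared.
signs = rng.choice([-1.0, 1.0], size=n)
x = (signs * 2.0 ** rng.uniform(-10, 10, size=n)).astype(np.float32)

err_bf16 = max_relative_rounding_error(round_to_bf16, x)
err_fp16 = max_relative_rounding_error(round_to_fp16, x)
print(f"max relative rounding error  BF16: {err_bf16:.2e}  FP16: {err_fp16:.2e}")
```

The printed maxima cannot exceed the unit roundoffs 2^-8 ≈ 3.9e-3 (BF16) and 2^-11 ≈ 4.9e-4 (FP16). Because FP16 overflows much earlier than BF16, FP16 training is usually paired with dynamic loss scaling (for example PyTorch's `GradScaler`); this sketch ignores range and looks only at precision.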

Findings

1. FP16 reduces training-inference mismatch
2. FP16 improves training stability and convergence
3. FP16 enhances model performance across tasks

Abstract

Reinforcement learning (RL) fine-tuning of large language models (LLMs) often suffers from instability due to the numerical mismatch between the training and inference policies. While prior work has attempted to mitigate this issue through algorithmic corrections or engineering alignments, we show that its root cause lies in the floating-point precision itself. The widely adopted BF16, despite its large dynamic range, introduces large rounding errors that break the consistency between training and inference. In this work, we demonstrate that simply reverting to FP16 effectively eliminates this mismatch. The change is simple: it is fully supported by modern frameworks, requires only a few lines of code, and needs no modification to the model architecture or learning algorithm. Our results suggest that using FP16 uniformly yields more stable optimization, faster convergence, and…
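
The mismatch arises because the trainer and the inference engine compute the same policy with different kernels, so rounding happens at different points. The sketch below is a toy simulation under our own assumptions, not the paper's experimental setup: a single random linear "LM head" whose logits are computed by two hypothetical engines that differ only in reduction chunk size (64 vs. 128), with inputs and every partial sum rounded to BF16 or FP16. The helpers `logits_in_precision` and `mean_logprob_mismatch`, the chunk sizes, and all tensor shapes are illustrative choices. It reports the mean absolute difference between the two engines' log-probabilities of sampled tokens.

```python
import numpy as np

def round_to_bf16(x):
    """Emulate float32 -> bfloat16 rounding (nearest, ties to even); NaN/Inf not handled."""
    bits = np.ascontiguousarray(x, dtype=np.float32).view(np.uint32)
    lsb = (bits >> np.uint32(16)) & np.uint32(1)
    return ((bits + np.uint32(0x7FFF) + lsb) & np.uint32(0xFFFF0000)).view(np.float32)

def round_to_fp16(x):
    """Round to IEEE float16 and back to float32."""
    return np.asarray(x, dtype=np.float32).astype(np.float16).astype(np.float32)

def logits_in_precision(h, W, round_fn, chunk):
    """Hypothetical engine: logits = h @ W, reduced over the hidden dimension in
    `chunk`-sized slices, rounding the inputs and every partial sum with `round_fn`.
    Different chunk sizes stand in for different kernels' reduction orders."""
    h_r, W_r = round_fn(h), round_fn(W)
    acc = np.zeros((h.shape[0], W.shape[1]), dtype=np.float32)
    for start in range(0, h.shape[1], chunk):
        partial = h_r[:, start:start + chunk] @ W_r[start:start + chunk]
        acc = round_fn(acc + round_fn(partial))
    return acc

def log_softmax(z):
    # Computed in float64 so that only the mismatch in the logits is measured.
    z = z.astype(np.float64)
    z = z - z.max(axis=-1, keepdims=True)
    return z - np.log(np.exp(z).sum(axis=-1, keepdims=True))

rng = np.random.default_rng(0)
n_tokens, d_model, vocab = 256, 512, 1000
h = rng.standard_normal((n_tokens, d_model)).astype(np.float32)  # toy final hidden states
W = (rng.standard_normal((d_model, vocab)) / np.sqrt(d_model)).astype(np.float32)  # toy LM head
tokens = rng.integers(0, vocab, size=n_tokens)  # stand-in for sampled tokens

def mean_logprob_mismatch(round_fn):
    """Mean |log pi_train(token) - log pi_infer(token)| over the sampled tokens."""
    lp_train = log_softmax(logits_in_precision(h, W, round_fn, chunk=64))
    lp_infer = log_softmax(logits_in_precision(h, W, round_fn, chunk=128))
    rows = np.arange(n_tokens)
    return float(np.mean(np.abs(lp_train[rows, tokens] - lp_infer[rows, tokens])))

mismatch_bf16 = mean_logprob_mismatch(round_to_bf16)
mismatch_fp16 = mean_logprob_mismatch(round_to_fp16)
print(f"mean |log-prob mismatch|  BF16: {mismatch_bf16:.2e}  FP16: {mismatch_fp16:.2e}")
```

In this toy setting the BF16 mismatch should come out several times larger than the FP16 one, in line with the gap in unit roundoff. Real trainer and inference-engine pairs differ in many more ways (fused kernels, attention implementations, batch-dependent reductions), so these numbers say nothing about the magnitude of the mismatch in actual RL runs.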
