Decomposing MXFP4 quantization error for LLM reinforcement learning: reducible bias, recoverable deadzone, and an irreducible floor
Xiaocan Li, Shiliang Wu, Zheng Shen

TL;DR
This paper presents a detailed three-way decomposition of MXFP4 quantization error in LLM reinforcement learning, identifying distinct error components and proposing targeted corrections to recover near-BF16 accuracy.
Contribution
It introduces an exact decomposition of quantization error into scale bias, deadzone truncation, and grid noise, with specific correction methods for each component.
Findings
Targeted corrections recover accuracy to within 0.7% and 3.0% of the BF16 baseline.
Decomposition reveals how each error component affects RL training.
Proposes methods to mitigate quantization errors in LLM RL post-training.
Abstract
MXFP4 arithmetic can dramatically accelerate reinforcement learning (RL) post-training of large language models (LLMs), yet its quantization error causes severe accuracy degradation. Existing work treats the quantization error as a monolithic noise term, overlooking the distinct mechanisms by which it damages training. We prove an exact three-way decomposition of quantization error and show how each component dominates a distinct RL training pathway. Our theoretical and empirical analysis decomposes the MXFP4 quantization error into three additive components: "scale bias" from power-of-two rounding, "deadzone truncation" from zeroing small values, and "grid noise" from rounding to the nearest 4-bit grid point. Each component dominates a distinct RL failure mode: scale bias accumulates multiplicatively through the backward pass, affecting gradient accuracy;…
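To make the three additive components concrete, the following is a minimal NumPy sketch of one way such a split can be written for a single block. It is an illustration under stated assumptions, not the paper's definitions, which the truncated abstract does not give. The assumptions are: elements use the FP4 E2M1 magnitudes {0, 0.5, 1, 1.5, 2, 3, 4, 6}; the shared scale is the power of two 2^(floor(log2(amax)) - 2), as in the OCP Microscaling format; and the reference "ideal" scale is the real-valued amax / 6. The function names (`round_to_fp4`, `quantize_block`, `decompose_block`) and the telescoping split itself are hypothetical.

```python
import numpy as np

# Representable magnitudes of FP4 E2M1 (sign handled separately).
FP4_GRID = np.array([0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0])


def round_to_fp4(y):
    """Round to the nearest E2M1 value, saturating at +/-6.

    Ties go to the smaller magnitude here (argmin picks the first match);
    real hardware may use a different tie-breaking rule.
    """
    mag = np.minimum(np.abs(y), FP4_GRID[-1])
    idx = np.abs(mag[..., None] - FP4_GRID).argmin(axis=-1)
    return np.sign(y) * FP4_GRID[idx]


def quantize_block(x, scale):
    """Quantize one block with a given shared scale and dequantize it."""
    return scale * round_to_fp4(x / scale)


def decompose_block(x):
    """Split the error q_mx - x of one block into three additive parts.

    ASSUMPTION: this telescoping split is illustrative only.
      scale_bias : change caused by using the power-of-two scale
                   instead of the real-valued scale amax / 6
      deadzone   : error on nonzero elements that round to zero
                   under the real-valued scale
      grid_noise : remaining rounding error under the real-valued scale
    """
    x = np.asarray(x, dtype=np.float64)
    amax = np.max(np.abs(x))
    if amax == 0.0:
        zeros = np.zeros_like(x)
        return zeros, zeros, zeros, zeros

    ideal_scale = amax / FP4_GRID[-1]
    pow2_scale = 2.0 ** (np.floor(np.log2(amax)) - 2)  # E2M1 max exponent is 2

    q_ideal = quantize_block(x, ideal_scale)
    q_mx = quantize_block(x, pow2_scale)

    dead = (q_ideal == 0.0) & (x != 0.0)
    scale_bias = q_mx - q_ideal
    deadzone = np.where(dead, -x, 0.0)
    grid_noise = np.where(dead, 0.0, q_ideal - x)
    return scale_bias, deadzone, grid_noise, q_mx


rng = np.random.default_rng(0)
block = rng.standard_normal(32)  # MX formats share one scale per 32 elements
scale_bias, deadzone, grid_noise, q_mx = decompose_block(block)
total_error = q_mx - block
print(np.allclose(scale_bias + deadzone + grid_noise, total_error))
```

By construction the three terms sum to the total error up to floating-point rounding, so the split is additive in the same sense the abstract describes. How each term maps onto a specific RL failure mode is the paper's claim and is not reproduced by this sketch.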
