Clipping Bottleneck: Stabilizing RLVR via Stochastic Recovery of Near-Boundary Signals
Shuo Yang, Jinda Lu, Chiyu Ma, Kexin Huang, Haoming Meng, Qihui Zhang, Yuyang Liu, Bolin Ding, Guoyin Wang, Li Yuan, and Jingren Zhou

TL;DR
This paper identifies a clipping bottleneck in RLVR training and introduces NSR, a stochastic boundary rescue method, to improve stability and performance across large language models.
Contribution
The paper proposes NSR, a simple stochastic boundary rescue technique that mitigates clipping-induced information loss in RLVR training, improving both training stability and final performance.
Findings
NSR improves training stability across various model sizes.
Stochastic boundary rescue outperforms deterministic gradient decay.
NSR delivers significant performance gains over strong baselines such as DAPO and GSPO.
Abstract
Reinforcement Learning with Verifiable Rewards (RLVR) has emerged as a central paradigm for scaling LLM reasoning, yet its optimization often suffers from training instability and suboptimal convergence. Through a systematic dissection of clipping-based GRPO-style objectives, we identify the rigid clipping decision induced by hard clipping as a key practical bottleneck in the studied RLVR setups. Specifically, our analysis suggests that informative signals can lie in the near-boundary region just beyond the clipping threshold, and are therefore discarded by the standard hard-clipping rule. Notably, once this bottleneck is precisely identified, even simple stochastic perturbations at the boundary can recover meaningful performance gains. Building on this finding, we propose Near-boundary Stochastic Rescue (NSR), a minimal, plug-and-play modification that stochastically retains these…
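To make the clipping bottleneck concrete, the sketch below contrasts the standard hard-clipping decision of a PPO/GRPO-style surrogate with a stochastic retention rule for tokens whose importance ratio lies just beyond the threshold. It is an illustration only, not the paper's NSR implementation: the abstract is truncated before the method is specified, so the function names, the width of the near-boundary band (`band`), and the retention probability (`keep_prob`) are all assumptions introduced here.

```python
import random


def overshoot(ratio, advantage, eps=0.2):
    """Distance by which the ratio lies beyond the active clipping threshold.

    For a positive advantage the active threshold is 1 + eps; for a negative
    advantage it is 1 - eps. Returns 0.0 when the ratio is inside the range.
    """
    if advantage > 0:
        return max(0.0, ratio - (1.0 + eps))
    if advantage < 0:
        return max(0.0, (1.0 - eps) - ratio)
    return 0.0


def hard_clip_keeps_gradient(ratio, advantage, eps=0.2):
    """Standard hard clipping: the token contributes a gradient only if its
    ratio has not crossed the active threshold."""
    if advantage == 0:
        return False  # a zero advantage contributes no gradient at all
    return overshoot(ratio, advantage, eps) == 0.0


def stochastic_rescue_keeps_gradient(ratio, advantage, eps=0.2,
                                     band=0.05, keep_prob=0.5, rng=random):
    """Hypothetical near-boundary rescue (an assumption, not the paper's rule).

    Tokens inside the clipping range are kept as usual. Tokens that overshoot
    the threshold by at most `band` are kept with probability `keep_prob`.
    Tokens further out are discarded, as under hard clipping.
    """
    if advantage == 0:
        return False
    d = overshoot(ratio, advantage, eps)
    if d == 0.0:
        return True
    if d <= band:
        return rng.random() < keep_prob
    return False


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so repeated runs give the same counts
    ratio, advantage = 1.23, 1.0  # just beyond the 1 + eps = 1.2 threshold
    trials = 1000
    kept = sum(
        stochastic_rescue_keeps_gradient(ratio, advantage, rng=rng)
        for _ in range(trials)
    )
    print("hard clipping keeps gradient:", hard_clip_keeps_gradient(ratio, advantage))
    print("stochastic rescue kept", kept, "of", trials, "draws")
```

Under hard clipping the near-boundary token above never contributes a gradient, whereas the stochastic rule lets it through on a fraction of updates. Setting `keep_prob=0.0` in this sketch recovers hard clipping exactly.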
