Clipping-Free Policy Optimization for Large Language Models

Ömer Veysel Çağatan; Barış Akgün; Gözde Gül Şahin; Xuandong Zhao

arXiv:2601.22801·cs.LG·February 2, 2026

Open Access

TL;DR

This paper introduces Clipping-Free Policy Optimization (CFPO), a reinforcement learning method for post-training large language models that removes the clipping mechanism and the zero-gradient regions, reward hacking, and training instability associated with it, while remaining competitive with clipping-based methods on reasoning and alignment tasks.

Contribution

CFPO replaces heuristic clipping with a convex quadratic penalty based on Total Variation divergence, providing a stable, differentiable objective with minimal code changes.
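
To make the contrast between a clipped objective and a quadratic penalty concrete, here is a minimal Python sketch for a single token. The source does not give CFPO's exact formula, so the penalty form below, `|A| / (2 * eps) * (ratio - 1) ** 2`, is an assumption chosen for illustration and should not be read as the authors' objective. The function names and the `eps=0.2` default are likewise hypothetical.

```python
def clipped_surrogate(ratio, advantage, eps=0.2):
    """PPO-style clipped surrogate for one token (a quantity to maximize)."""
    clipped_ratio = max(1.0 - eps, min(1.0 + eps, ratio))
    return min(ratio * advantage, clipped_ratio * advantage)


def quadratic_penalty_surrogate(ratio, advantage, eps=0.2):
    """Hypothetical clipping-free surrogate: linear term minus a convex
    quadratic penalty on the ratio's distance from 1.

    ASSUMPTION: this penalty form is illustrative only; the source does
    not state the exact expression used by CFPO.
    """
    penalty = abs(advantage) / (2.0 * eps) * (ratio - 1.0) ** 2
    return ratio * advantage - penalty


def central_difference(objective, ratio, advantage, eps=0.2, h=1e-6):
    """Numerical derivative of an objective with respect to the ratio."""
    upper = objective(ratio + h, advantage, eps)
    lower = objective(ratio - h, advantage, eps)
    return (upper - lower) / (2.0 * h)


if __name__ == "__main__":
    advantage = 1.0
    for ratio in (0.9, 1.0, 1.2, 1.5):
        g_clip = central_difference(clipped_surrogate, ratio, advantage)
        g_quad = central_difference(quadratic_penalty_surrogate, ratio, advantage)
        print(f"ratio={ratio:.1f}  clipped grad={g_clip:+.3f}  quadratic grad={g_quad:+.3f}")
```

With a positive advantage and a ratio well above `1 + eps`, the clipped surrogate is flat, so its gradient is zero and the sample no longer influences the update. The assumed quadratic form instead has a negative gradient there, which pulls the ratio back toward 1. Under this assumption the per-token objective peaks at `ratio = 1 + eps` for positive advantages and at `ratio = 1 - eps` for negative ones.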

Findings

01. CFPO matches clipping-based methods on downstream benchmarks.

02. CFPO extends stable training regimes and mitigates verbosity exploitation.

03. CFPO reduces capability degradation while maintaining instruction-following performance.

Abstract

Reinforcement learning has become central to post-training large language models, yet dominant algorithms rely on clipping mechanisms that introduce optimization issues at scale, including zero-gradient regions, reward hacking, and training instability. We propose Clipping-Free Policy Optimization (CFPO), which replaces heuristic clipping with a convex quadratic penalty derived from Total Variation divergence constraints, yielding an everywhere-differentiable objective that enforces stable policy updates without hard boundaries. We evaluate CFPO across both reasoning and alignment settings. In reasoning, CFPO matches clipping-based methods on downstream benchmarks while extending the stable training regime. In alignment, CFPO mitigates verbosity exploitation and reduces capability degradation, while achieving competitive instruction-following performance. CFPO requires only a one-line…

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling · Machine Learning and Data Classification