Implicit Compression Regularization: Concise Reasoning via Internal Shorter Distributions in RL Post-Training
Chen Wang, Hexuan Deng, Yining Zhang, Yuchen Zhang, Jionghao Bai, Zhaochun Li, Ge Lan, Yue Wang

TL;DR
This paper introduces Implicit Compression Regularization (ICR), an on-policy method that encourages concise reasoning in language models trained with reinforcement learning by leveraging the shortest correct responses sampled during training.
Contribution
ICR regularizes RL training toward shorter responses while maintaining a positive length-accuracy correlation, so responses become more concise without a loss in accuracy.
Findings
ICR consistently shortens responses across benchmarks.
ICR maintains or improves accuracy while reducing response length.
ICR achieves a better accuracy-length Pareto frontier.
Abstract
Reinforcement learning with verifiable rewards improves LLM reasoning but often induces overthinking, where models generate unnecessarily long reasoning traces. Existing methods mainly rely on length penalties or early-exit strategies; however, the former may degrade accuracy and induce underthinking, whereas the latter assumes that substantial portions of reasoning traces can be safely truncated. To obtain a compression signal without these limitations, we revisit the training dynamics of existing compression methods. We observe that the length-accuracy correlation is initially negative but continually increases during compression, indicating that shorter responses are initially more likely to be correct but gradually lose this property as the policy moves toward underthinking. Based on this observation, we formalize overthinking: a negative correlation indicates an overthinking…
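
To make the length-accuracy correlation in the abstract concrete, here is a minimal sketch of one way such a diagnostic could be computed over a group of sampled responses. It assumes a plain Pearson correlation between response length in tokens and a 0/1 correctness label. The excerpt above does not give the paper's exact definition, so the function name, the inputs, and the handling of the degenerate case are all illustrative assumptions rather than the authors' implementation.

```python
import math


def length_accuracy_correlation(lengths, correct):
    """Pearson correlation between response length and 0/1 correctness.

    Hypothetical helper for illustration only; the paper's exact
    estimator is not shown in this excerpt.
    """
    n = len(lengths)
    if n != len(correct) or n < 2:
        raise ValueError("need two equally sized samples with at least 2 items")

    mean_len = sum(lengths) / n
    mean_acc = sum(correct) / n

    cov = sum((l - mean_len) * (c - mean_acc) for l, c in zip(lengths, correct))
    var_len = sum((l - mean_len) ** 2 for l in lengths)
    var_acc = sum((c - mean_acc) ** 2 for c in correct)

    # Assumption: when all lengths or all labels are identical the
    # correlation is undefined, so report 0.0 (no signal).
    if var_len == 0 or var_acc == 0:
        return 0.0
    return cov / math.sqrt(var_len * var_acc)


# Toy rollouts for one prompt: the shorter responses are the correct ones.
lengths = [120, 300, 450, 800]
correct = [1, 1, 0, 0]
r = length_accuracy_correlation(lengths, correct)
print("negative (shorter is more often correct)" if r < 0 else "non-negative")
```

On the toy data the correlation is negative, which the abstract associates with overthinking: shorter responses are more likely to be correct. Flipping the labels so that only the longer responses are correct makes the value positive.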
