Shorten After You're Right: Lazy Length Penalties for Reasoning RL

Danlong Yuan; Tian Xie; Shaohan Huang; Zhuocheng Gong; Huishuai Zhang; Chong Luo; Furu Wei; Dongyan Zhao

arXiv:2505.12284 · cs.AI · March 17, 2026

TL;DR

This paper integrates three reward designs directly into the reinforcement learning of large reasoning models. Response length (averaged by steps) drops by 40% on logic reasoning and 33% on math, with no extra training stages and with performance maintained or improved.

Contribution

Three reward designs that shorten reasoning paths inside the RL process itself, in contrast to existing methods that rely on additional training data and stages.

Findings

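01. 40% reduction in response length (averaged by steps) on logic reasoning, alongside a 14% performance gain

02. 33% reduction in response length (averaged by steps) on math problems, with performance preserved

03. Across four experimental settings, performance is maintained or improved despite shorter responses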

Abstract

Large reasoning models, such as OpenAI o1 or DeepSeek R1, have demonstrated remarkable performance on reasoning tasks but often incur a long reasoning path with significant memory and time costs. Existing methods primarily aim to shorten reasoning paths by introducing additional training data and stages. In this paper, we propose three critical reward designs integrated directly into the reinforcement learning process of large reasoning models, which reduce the response length without extra training stages. Experiments on four settings show that our method significantly decreases response length while maintaining or even improving performance. Specifically, in a logic reasoning setting, we achieve a 40% reduction in response length averaged by steps alongside a 14% gain in performance. For math problems, we reduce response length averaged by steps by 33% while preserving performance.
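The abstract does not spell out the three reward designs, so the following is only a minimal sketch of the general idea the title suggests: a length penalty that applies only to responses that are already correct. The `Rollout` type, the `gated_length_rewards` function, the `penalty_weight` parameter, and the linear penalty shape are all assumptions made for illustration; they are not the paper's formulation or the API of the linked repository.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Rollout:
    """Hypothetical container for one sampled response."""
    correct: bool  # whether the final answer was judged correct
    length: int    # response length in tokens


def gated_length_rewards(rollouts: List[Rollout],
                         penalty_weight: float = 0.5) -> List[float]:
    """Return one scalar reward per rollout in a sampled group.

    Assumed scheme (not taken from the paper):
    - incorrect rollouts receive 0.0 and are never rewarded for being short;
    - correct rollouts receive 1.0 minus a penalty that grows linearly with
      how much longer they are than the shortest correct rollout in the group.
    Keeping penalty_weight <= 1 ensures a correct rollout never scores below
    an incorrect one.
    """
    correct_lengths = [r.length for r in rollouts if r.correct]
    if not correct_lengths:
        # Nothing is correct yet, so no length pressure is applied.
        return [0.0 for _ in rollouts]

    shortest = min(correct_lengths)
    span = max(correct_lengths) - shortest

    rewards = []
    for r in rollouts:
        if not r.correct:
            rewards.append(0.0)
        elif span == 0:
            # All correct rollouts have the same length: no penalty.
            rewards.append(1.0)
        else:
            excess = (r.length - shortest) / span  # 0.0 .. 1.0
            rewards.append(1.0 - penalty_weight * excess)
    return rewards


group = [Rollout(True, 100), Rollout(True, 300),
         Rollout(False, 50), Rollout(True, 200)]
print(gated_length_rewards(group))
```

In this sketch the short but incorrect rollout gets no credit for brevity, while the three correct rollouts are ranked by length. Gating the penalty on correctness is one way to avoid rewarding a model for truncating its reasoning before it can solve the task.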

Code & Models

Repositories

lblankl/short-rl (PyTorch, official)

Taxonomy

Topics: Reinforcement Learning in Robotics · Explainable Artificial Intelligence (XAI) · Multimodal Machine Learning Applications