Leash: Adaptive Length Penalty and Reward Shaping for Efficient Large Reasoning Model

Yanhao Li; Lu Ma; Jiaran Zhang; Lexiang Tang; Wentao Zhang; Guibo Luo

arXiv:2512.21540·cs.AI·December 29, 2025

Open Access

TL;DR

Leash introduces an adaptive reinforcement learning framework that dynamically adjusts length penalties to optimize reasoning efficiency in large language models, reducing reasoning length significantly while maintaining performance.

Contribution

We propose Leash, a novel reinforcement learning approach using a Lagrangian primal-dual method for adaptive length control in LLM reasoning tasks.

Findings

01. Reduces average reasoning length by 60% across tasks

02. Maintains competitive task performance

03. Effective in diverse domains including coding and instruction following

Abstract

Existing approaches typically rely on fixed length penalties, but such penalties are hard to tune and fail to adapt to the evolving reasoning abilities of LLMs, leading to suboptimal trade-offs between accuracy and conciseness. To address this challenge, we propose Leash (adaptive LEngth penAlty and reward SHaping), a reinforcement learning framework for efficient reasoning in LLMs. We formulate length control as a constrained optimization problem and employ a Lagrangian primal-dual method to dynamically adjust the penalty coefficient. When generations exceed the target length, the penalty is intensified; when they are shorter, it is relaxed. This adaptive mechanism guides models toward producing concise reasoning without sacrificing task performance. Experiments on DeepSeek-R1-Distill-Qwen-1.5B and Qwen3-4B-Thinking-2507 show that Leash reduces the average reasoning length by 60%…
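
The adaptive mechanism described in the abstract can be illustrated with a generic dual-ascent update on a penalty coefficient. The following is a minimal sketch under stated assumptions, not the paper's implementation: the class name `AdaptiveLengthPenalty`, the normalized constraint (mean length divided by target, minus one), the overshoot-only reward shaping, the step size `dual_lr`, and the cap `lam_max` are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class AdaptiveLengthPenalty:
    """Hypothetical dual-variable controller (illustrative, not from the paper)."""

    target_length: float   # assumed target for the average reasoning length, in tokens
    dual_lr: float = 0.05  # assumed step size for the dual (penalty) update
    lam: float = 0.0       # penalty coefficient (Lagrange multiplier), kept >= 0
    lam_max: float = 1.0   # assumed upper bound that keeps the penalty from dominating

    def shaped_reward(self, task_reward: float, length: int) -> float:
        # Assumed shaping: penalize only the relative overshoot beyond the target.
        overshoot = max(0.0, length / self.target_length - 1.0)
        return task_reward - self.lam * overshoot

    def update(self, lengths: Sequence[int]) -> float:
        # Dual ascent on the normalized constraint mean_len / target - 1 <= 0:
        # lam grows when the batch is too long and shrinks when it is shorter.
        mean_len = sum(lengths) / len(lengths)
        violation = mean_len / self.target_length - 1.0
        self.lam = min(self.lam_max, max(0.0, self.lam + self.dual_lr * violation))
        return self.lam
```

In a training loop, `update` would be called once per batch with the sampled generation lengths, and `shaped_reward` would replace the raw task reward passed to the policy-gradient step. Projecting `lam` onto a non-negative range is the standard way to keep a multiplier for an inequality constraint valid; the cap is an extra safeguard assumed here.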

Taxonomy

Topics: Reinforcement Learning in Robotics · Multimodal Machine Learning Applications · Explainable Artificial Intelligence (XAI)