TL;DR
LEAD introduces an adaptive, online method to optimize the balance between correctness and efficiency in large language model reasoning, reducing verbosity and improving performance across benchmarks.
Contribution
LEAD replaces static heuristics with self-adaptive mechanisms that dynamically calibrate the trade-off between reasoning length and correctness during training.
Findings
LEAD achieves higher accuracy and efficiency scores than previous RL-trained methods.
It produces significantly shorter reasoning outputs without sacrificing correctness.
It is evaluated on five mathematical benchmarks.
Abstract
Large reasoning models, such as OpenAI o1 and DeepSeek-R1, tend to become increasingly verbose as their reasoning capabilities improve. These inflated Chain-of-Thought (CoT) trajectories often exceed what the underlying problems require, wasting compute, latency, and context budgets. While introducing length-based efficiency rewards during reinforcement learning offers a natural remedy, existing methods struggle with two fundamental challenges: the optimal balance between correctness and efficiency is non-stationary throughout training, and intrinsic reasoning budgets vary drastically across problems. Relying on static reward weights and global length constraints inevitably forces a compromise between degraded accuracy and unrealized compression. To overcome these limitations, we propose LEAD (Length-Efficient Adaptive and Dynamic reasoning), a method that replaces static heuristics…
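To make the criticized baseline concrete, the following is a minimal sketch of a length-penalized reward with a static weight and a single global length budget, the kind of setup the abstract argues against. It is not LEAD's reward. The function name `static_length_reward` and the parameters `max_tokens` and `length_weight`, including their default values, are illustrative assumptions and do not come from the paper.

```python
def static_length_reward(
    is_correct: bool,
    num_tokens: int,
    max_tokens: int = 4096,       # assumed global length budget (hypothetical value)
    length_weight: float = 0.2,   # assumed fixed trade-off weight (hypothetical value)
) -> float:
    """Correctness reward minus a fixed-weight length penalty.

    Illustrative baseline only: the weight and the budget are identical for
    every problem and never change during training.
    """
    correctness = 1.0 if is_correct else 0.0
    # Fraction of the global budget consumed, clipped to [0, 1].
    usage = min(max(num_tokens / max_tokens, 0.0), 1.0)
    return correctness - length_weight * usage


if __name__ == "__main__":
    # An easy and a hard problem are both judged against the same budget and weight.
    print(static_length_reward(True, 200))
    print(static_length_reward(True, 3000))
```

Because `length_weight` and `max_tokens` are fixed, a correct answer to a hard problem that genuinely needs a long derivation is penalized on the same scale as a padded answer to an easy one. The abstract identifies this as the source of the compromise between degraded accuracy and unrealized compression.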
