TL;DR
ConciseRL uses a hyperparameter-free conciseness score, evaluated by an LLM judge, as a reinforcement learning reward that steers large language models toward correct, concise reasoning traces. On the MATH dataset it cuts token usage by up to 31x on simple problems while improving accuracy by 7%.
Contribution
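The paper introduces a conciseness score, evaluated by a large language model acting as a judge, and uses it as a reward signal in reinforcement learning. Because the score needs no hyperparameters and reflects context rather than raw token length, it gives reasoning models dynamic guidance toward concise and accurate outputs.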
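To make the idea of a judge-scored reward concrete, here is a minimal sketch in Python. It is not the paper's implementation: the summary does not say how correctness and conciseness are combined, so the correctness gate, the [0, 1] score range, and the names `concise_reward` and `toy_judge` are all assumptions made for illustration. In particular, `toy_judge` is a word-count stand-in so the example runs offline, whereas the paper's judge is an LLM giving context-aware feedback beyond simple token length.

```python
from typing import Callable


def concise_reward(
    trace: str,
    predicted_answer: str,
    reference_answer: str,
    judge: Callable[[str], float],
) -> float:
    """Hypothetical reward: the judge's conciseness score, granted only
    when the final answer is correct (an assumed gating rule)."""
    if predicted_answer.strip() != reference_answer.strip():
        return 0.0  # assumption: incorrect answers earn no reward
    score = judge(trace)
    # Clamp to [0, 1]; the score range is assumed, not taken from the paper.
    return min(max(score, 0.0), 1.0)


def toy_judge(trace: str) -> float:
    """Offline stand-in for an LLM judge: fewer words gives a higher score.
    A real judge would be a model call, not a length heuristic."""
    words = len(trace.split())
    return 1.0 / (1.0 + words / 50.0)


if __name__ == "__main__":
    short_trace = "2 + 2 = 4"
    long_trace = "Let me reconsider this step once more. " * 40
    print(concise_reward(short_trace, "4", "4", toy_judge))
    print(concise_reward(long_trace, "4", "4", toy_judge))
    print(concise_reward(short_trace, "5", "4", toy_judge))
```

Passing the judge in as a callable keeps the reward function independent of any particular model or API, so a real LLM-backed judge could be swapped in without changing the surrounding training code.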
Findings
Reduces token usage by up to 31x on simple MATH problems.
Improves accuracy by 7% on those same simple problems.
Outperforms full reasoning by +7.5% accuracy on the hardest problems, with up to 3.6x fewer tokens.
Abstract
Large language models excel at complex tasks by breaking down problems into structured reasoning steps. However, reasoning traces often extend beyond reaching a correct answer, causing wasted computation, reduced readability, and hallucinations. To address this, we introduce a novel hyperparameter-free conciseness score used as a reward signal within a reinforcement learning framework to guide models toward generating correct and concise reasoning traces. This score is evaluated by a large language model acting as a judge, enabling dynamic, context-aware feedback beyond simple token length. Our method achieves state-of-the-art efficiency-accuracy trade-offs on the MATH dataset, reducing token usage by up to 31x on simple problems while improving accuracy by 7%, and on the hardest problems, it outperforms full reasoning by +7.5% accuracy with up to 3.6x fewer tokens. On TheoremQA, our…
