Correct, Concise and Complete: Multi-stage Training For Adaptive Reasoning
Nathanaël Carraz Rakotonirina, Ren Pang, Neha Anna John, Michael Bohlke-Schneider, Momchil Hardalov

TL;DR
This paper introduces a multi-stage training approach for large language models that shortens reasoning responses by about 28-40% at a cost of only 1.6-2.5 accuracy points.
Contribution
It presents a novel combination of supervised fine-tuning and reinforcement learning with an adaptive length penalty to optimize reasoning efficiency.
Findings
Reduces response length by 28-40% across models
Maintains high accuracy, with drops of only 1.6-2.5 points
Achieves superior efficiency trade-offs compared to state-of-the-art methods
Abstract
The reasoning capabilities of large language models (LLMs) have improved substantially through increased test-time computation, typically in the form of intermediate tokens known as chain-of-thought (CoT). However, CoT often becomes unnecessarily long, increasing computation cost without actual accuracy gains and sometimes even degrading performance, a phenomenon known as "overthinking". We propose a multi-stage efficient reasoning method that combines supervised fine-tuning (via rejection sampling or reasoning trace reformatting) with reinforcement learning using an adaptive length penalty. We introduce a lightweight reward function that penalizes tokens generated after the first correct answer while encouraging self-verification only when beneficial. We conduct a holistic evaluation across seven diverse reasoning tasks, analyzing the trade-off between accuracy and response length. Our approach…
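The abstract's idea of penalizing tokens generated after the first correct answer can be illustrated with a minimal sketch. This is not the paper's reward function: the names `first_correct_index` and `length_penalized_reward`, the linear penalty, the coefficient `alpha`, and the prefix-based correctness check are all assumptions made for illustration, and the adaptive, self-verification-aware part of the method is omitted.

```python
from typing import Callable, Optional, Sequence


def first_correct_index(
    tokens: Sequence[str],
    is_correct_prefix: Callable[[Sequence[str]], bool],
) -> Optional[int]:
    """Return the smallest n such that tokens[:n] already contains a
    correct answer, or None if no prefix does (hypothetical helper)."""
    for n in range(1, len(tokens) + 1):
        if is_correct_prefix(tokens[:n]):
            return n
    return None


def length_penalized_reward(
    tokens: Sequence[str],
    is_correct_prefix: Callable[[Sequence[str]], bool],
    alpha: float = 0.001,
) -> float:
    """Illustrative reward: 1.0 for a correct response, minus a linear
    penalty (assumed form) for every token after the first correct answer.
    Responses that never reach a correct answer get 0.0."""
    n = first_correct_index(tokens, is_correct_prefix)
    if n is None:
        return 0.0
    extra_tokens = len(tokens) - n
    # Clamp at zero so a correct answer is never scored below an incorrect one.
    return max(0.0, 1.0 - alpha * extra_tokens)


if __name__ == "__main__":
    # Toy correctness check: the answer "4" appears somewhere in the prefix.
    contains_four = lambda prefix: "4" in prefix
    concise = ["think", "4"]
    verbose = ["think", "4", "wait", "check", "4"]
    print(length_penalized_reward(concise, contains_four, alpha=0.1))
    print(length_penalized_reward(verbose, contains_four, alpha=0.1))
```

Under these assumptions, a response that stops right after its first correct answer keeps the full reward, while one that keeps generating afterwards is scored lower in proportion to the extra tokens.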
Taxonomy
Topics: Topic Modeling · Multimodal Machine Learning Applications · Natural Language Processing Techniques
