Correct, Concise and Complete: Multi-stage Training For Adaptive Reasoning

Nathanaël Carraz Rakotonirina; Ren Pang; Neha Anna John; Michael Bohlke-Schneider; Momchil Hardalov

arXiv:2601.02972·cs.CL·January 7, 2026

Open Access

TL;DR

This paper introduces a multi-stage training approach for large language models that shortens reasoning responses by about 28-40% while lowering accuracy by only 1.6-2.5 points.

Contribution

It presents a novel combination of supervised fine-tuning and reinforcement learning with an adaptive length penalty to optimize reasoning efficiency.

Findings

01 Reduces response length by 28-40% across models

02 Maintains high accuracy with only 1.6-2.5 point drops

03 Achieves superior efficiency trade-offs compared to state-of-the-art methods

Abstract

The reasoning capabilities of large language models (LLMs) have improved substantially through increased test-time computation, typically in the form of intermediate tokens known as chain-of-thought (CoT). However, CoT often becomes unnecessarily long, increasing computation cost without actual accuracy gains and sometimes even degrading performance, a phenomenon known as "overthinking". We propose a multi-stage efficient reasoning method that combines supervised fine-tuning (via rejection sampling or reasoning trace reformatting) with reinforcement learning using an adaptive length penalty. We introduce a lightweight reward function that penalizes tokens generated after the first correct answer while encouraging self-verification only when beneficial. We conduct a holistic evaluation across seven diverse reasoning tasks, analyzing the trade-off between accuracy and response length. Our approach…
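The reward idea in the abstract can be illustrated with a minimal sketch. This is not the paper's reward function: the linear penalty, the coefficient `alpha`, the exact-match check, and the function names are all assumptions made for illustration. The sketch only shows the general shape: tokens produced after the first correct answer are treated as waste, while tokens spent before it (including verification that turns a wrong answer into a right one) are not penalized.

```python
from typing import Optional, Sequence, Tuple

# Hypothetical input format: each entry is (token_position, candidate_answer),
# listing every point in the response where the model states an answer.
AnswerTrace = Sequence[Tuple[int, str]]


def first_correct_position(trace: AnswerTrace, gold: str) -> Optional[int]:
    """Return the token position of the first answer matching the gold answer."""
    for position, answer in trace:
        if answer.strip() == gold.strip():
            return position
    return None


def length_penalized_reward(
    num_tokens: int, trace: AnswerTrace, gold: str, alpha: float = 0.5
) -> float:
    """Illustrative reward (assumed form, not taken from the paper).

    - 0.0 if the final answer is wrong or no answer was produced.
    - Otherwise 1.0 minus alpha times the fraction of tokens generated
      after the first correct answer appeared.
    """
    if num_tokens <= 0 or not trace:
        return 0.0
    final_answer = trace[-1][1]
    if final_answer.strip() != gold.strip():
        return 0.0
    first = first_correct_position(trace, gold)
    wasted = max(0, num_tokens - first)  # tokens after the first correct answer
    return 1.0 - alpha * wasted / num_tokens


# The answer was already correct at token 40, but the response ran to 100 tokens.
overthinking = length_penalized_reward(100, [(40, "7"), (100, "7")], "7")
# The first answer was wrong; later verification fixed it, so nothing is penalized.
useful_check = length_penalized_reward(100, [(40, "5"), (100, "7")], "7")
```

Under these assumptions, the overthinking response scores lower than the one where the extra tokens actually changed the outcome, which is the behaviour the abstract describes as encouraging self-verification only when beneficial.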

Taxonomy

Topics: Topic Modeling · Multimodal Machine Learning Applications · Natural Language Processing Techniques