Large Language Models Decide Early and Explain Later
Ayan Datta, Zhixue Zhao, Bhuvanesh Verma, Radhika Mamidi, Mounika Marreddy, Alexander Mehler

TL;DR
This paper investigates when large language models finalize their answers during reasoning, revealing that many answers are decided early and subsequent tokens often serve as explanations, enabling more efficient inference.
Contribution
The study uncovers that models often decide answers early in reasoning, and proposes early stopping heuristics to reduce token usage with minimal accuracy loss.
Findings
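For Qwen3-4B, averaged across all datasets considered, predicted answers change in only 32% of queries.
After the final answer switch, the model generates an average of 760 additional reasoning tokens per query.
Early stopping heuristics can cut token usage by 500 tokens with only a 2% accuracy drop.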
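To make the idea of an early stopping heuristic concrete, here is a minimal sketch of one possible rule: stop once the intermediate answer has stayed the same for a fixed number of consecutive checkpoints. This is an illustrative assumption, not the heuristic the paper proposes; the function name `early_stop_index` and the `patience` parameter are hypothetical.

```python
from typing import List


def early_stop_index(answers: List[str], patience: int) -> int:
    """Return the checkpoint index at which reasoning would be stopped.

    `answers` holds the intermediate answer elicited at each checkpoint.
    Hypothetical rule (an assumption, not the paper's method): stop at the
    first checkpoint where the answer has been identical for `patience`
    consecutive checkpoints. If that never happens, return the last index.
    """
    if not answers:
        raise ValueError("answers must contain at least one checkpoint")
    if patience < 1:
        raise ValueError("patience must be at least 1")

    streak = 1  # the first checkpoint trivially agrees with itself
    if streak >= patience:
        return 0
    for i in range(1, len(answers)):
        # Extend the streak on agreement, reset it on a change of answer.
        streak = streak + 1 if answers[i] == answers[i - 1] else 1
        if streak >= patience:
            return i
    return len(answers) - 1


# Toy trajectory: the answer flips once, then stays fixed.
stop_at = early_stop_index(["A", "B", "B", "B", "B"], patience=3)
```

A larger `patience` trades token savings for a lower risk of stopping before a late answer switch.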
Abstract
Large Language Models often achieve strong performance by generating long intermediate chain-of-thought reasoning. However, it remains unclear when a model's final answer is actually determined during generation. If the answer is already fixed at an intermediate stage, subsequent reasoning tokens may constitute post-decision explanation, increasing inference cost and latency without improving correctness. We study the evolution of predicted answers over reasoning steps using forced answer completion, which elicits the model's intermediate predictions at partial reasoning prefixes. Focusing on Qwen3-4B and averaging results across all datasets considered, we find that predicted answers change in only 32% of queries. Moreover, once the final answer switch occurs, the model generates an average of 760 additional reasoning tokens per query, accounting for a substantial fraction of the total…
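The measurement described in the abstract can be sketched roughly as follows: elicit an answer at successive reasoning prefixes, find the last point at which that answer changes, and count the reasoning tokens generated afterwards. This is a sketch under stated assumptions, not the authors' implementation. The `force_answer` callable stands in for a forced-answer-completion call to the model, and the checkpoint `stride` and the convention of counting all tokens when the answer never changes are illustrative choices.

```python
from typing import Callable, List, Sequence, Tuple


def answer_trajectory(
    reasoning_tokens: Sequence[str],
    force_answer: Callable[[Sequence[str]], str],
    stride: int,
) -> List[Tuple[int, str]]:
    """Return (prefix_length, forced_answer) pairs.

    `force_answer` is a hypothetical stand-in for prompting the model to
    commit to an answer given only a partial reasoning prefix.
    """
    total = len(reasoning_tokens)
    # Checkpoints every `stride` tokens, plus the full reasoning trace.
    checkpoints = list(range(0, total, stride)) + [total]
    return [(n, force_answer(reasoning_tokens[:n])) for n in checkpoints]


def tokens_after_last_switch(trajectory: List[Tuple[int, str]]) -> int:
    """Count reasoning tokens generated after the answer last changed.

    Assumption: if the answer never changes, the last switch is placed at
    prefix length 0, so every reasoning token counts as post-decision.
    """
    total = trajectory[-1][0]
    last_switch = 0
    for (_, prev), (n, cur) in zip(trajectory, trajectory[1:]):
        if cur != prev:
            last_switch = n
    return total - last_switch


# Toy stand-in for the model: the answer flips from "A" to "B" at 6 tokens.
toy_tokens = [f"t{i}" for i in range(10)]
toy_force = lambda prefix: "A" if len(prefix) < 6 else "B"
toy_trajectory = answer_trajectory(toy_tokens, toy_force, stride=2)
post_decision = tokens_after_last_switch(toy_trajectory)
```

In the toy run the answer last changes at a prefix of 6 tokens, so the remaining 4 of the 10 reasoning tokens count as post-decision.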
