Answer Convergence as a Signal for Early Stopping in Reasoning
Xin Liu, Lu Wang

TL;DR
This paper investigates how early stopping based on answer convergence can reduce inference costs in large language models during reasoning tasks, demonstrating significant token savings with minimal accuracy loss.
Contribution
It introduces three inference-time strategies for stopping reasoning early. They exploit answer stability to improve efficiency without sacrificing accuracy.
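The answer-stability idea behind early stopping can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. It assumes the reasoning trace is already split into steps. It also assumes that a caller-supplied `extract_answer` function can read an intermediate answer off any prefix of the trace. The stopping rule is to halt once the last `k` extracted answers agree. The names `extract_answer`, `last_integer`, and the window size `k` are hypothetical.

```python
import re
from typing import Callable, List, Optional, Tuple


def early_stop_by_consistency(
    steps: List[str],
    extract_answer: Callable[[str], Optional[str]],
    k: int = 3,
) -> Tuple[Optional[str], int]:
    """Return (answer, number_of_steps_used).

    Hypothetical stopping rule: after each reasoning step, extract an
    intermediate answer from the reasoning so far. Stop as soon as the
    last k extracted answers are identical and not None. If that never
    happens, fall back to the answer after the full trace.
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    answers: List[Optional[str]] = []
    prefix = ""
    for i, step in enumerate(steps, start=1):
        prefix += step + "\n"
        answers.append(extract_answer(prefix))
        window = answers[-k:]
        if len(window) == k and window[0] is not None and all(a == window[0] for a in window):
            return window[0], i
    return (answers[-1] if answers else None), len(steps)


def last_integer(text: str) -> Optional[str]:
    """Toy (hypothetical) answer extractor: the last integer in the text."""
    matches = re.findall(r"-?\d+", text)
    return matches[-1] if matches else None


if __name__ == "__main__":
    trace = [
        "Let x be the total.",
        "3 + 4 = 7",
        "Double-check: 7",
        "So the answer is 7",
        "Restating once more: 7",
    ]
    # Stops after step 4, skipping the redundant final step.
    print(early_stop_by_consistency(trace, last_integer, k=3))
```

A larger `k` makes the rule more robust to an answer that appears briefly and then changes. The cost is fewer tokens saved.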
Findings
Models typically converge to their final answers after 60% of the reasoning steps on math tasks
Answer consistency-based early stopping reduces token usage by over 40% on NaturalQuestions
Proposed methods maintain accuracy while significantly decreasing inference costs
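One plausible way to measure the convergence finding is sketched below. It assumes an intermediate answer has already been extracted after every reasoning step. The function name `convergence_fraction` is hypothetical, and this is not necessarily the paper's measurement protocol. The sketch returns the fraction of steps completed at the point where the answer matches the final answer and never changes again.

```python
from typing import List, Optional


def convergence_fraction(step_answers: List[Optional[str]]) -> float:
    """Fraction of reasoning steps completed when the answer settles.

    step_answers[i] is the answer extracted after step i + 1 (None if no
    answer is readable yet). The last entry is taken as the final answer.
    Returns (1-based index of the earliest step from which every later
    answer equals the final one) / (total number of steps).
    """
    if not step_answers:
        raise ValueError("need at least one step")
    final = step_answers[-1]
    first = len(step_answers) - 1  # 0-based index of the final step
    # Walk backwards while the preceding answer still equals the final one.
    while first > 0 and step_answers[first - 1] == final:
        first -= 1
    return (first + 1) / len(step_answers)


if __name__ == "__main__":
    # Toy trace: the answer settles on "7" at step 3 of 5.
    print(convergence_fraction([None, "5", "7", "7", "7"]))
```

Averaging this value over the problems in a benchmark gives a dataset-level convergence point. The toy trace is illustrative only and is not data from the paper.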
Abstract
Chain-of-thought (CoT) prompting enhances reasoning in large language models (LLMs), but it often produces verbose and redundant outputs, which increases inference cost. We hypothesize that many reasoning steps are unnecessary for producing correct answers. To investigate this, we begin with a systematic study of the minimum amount of reasoning a model needs to reach a stable decision. We find that on math reasoning tasks such as MATH, models typically converge to their final answers after 60% of the reasoning steps. This suggests substantial redundancy in the remaining content. Based on these insights, we propose three inference-time strategies to improve efficiency: (1) early stopping via answer consistency, (2) boosting the probability of generating end-of-reasoning signals, and (3) a supervised method that learns when to stop based on internal activations. Experiments across…
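Strategy (2) can be illustrated with a simple additive logit bias. This is a minimal sketch, not the authors' method. It assumes the end-of-reasoning signal is a single token, written here as the hypothetical `</think>`. It also assumes that "boosting" means adding a constant `bias` to that token's logit before the softmax. The function name `boosted_distribution` and the toy logits are hypothetical.

```python
import math
from typing import Dict


def boosted_distribution(
    logits: Dict[str, float], end_token: str, bias: float
) -> Dict[str, float]:
    """Softmax over logits after adding `bias` to the end-of-reasoning token.

    A positive bias raises the probability of emitting end_token, which
    makes the model more likely to stop reasoning earlier.
    """
    adjusted = dict(logits)  # copy so the caller's logits are unchanged
    adjusted[end_token] += bias
    peak = max(adjusted.values())  # subtract the max for numerical stability
    exps = {tok: math.exp(v - peak) for tok, v in adjusted.items()}
    total = sum(exps.values())
    return {tok: e / total for tok, e in exps.items()}


if __name__ == "__main__":
    # Toy logits for three candidate next tokens (made-up numbers).
    logits = {"</think>": 0.0, "Wait": 1.0, "So": 0.5}
    before = boosted_distribution(logits, "</think>", bias=0.0)
    after = boosted_distribution(logits, "</think>", bias=2.0)
    print(round(before["</think>"], 3), round(after["</think>"], 3))
```

In a real decoder, the bias would typically be applied to the full vocabulary logits at every decoding step. A larger bias saves more tokens but risks cutting off reasoning before the answer has converged.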
Taxonomy
Topics: Intelligent Tutoring Systems and Adaptive Learning · Child and Animal Learning Development · AI-based Problem Solving and Planning
