ESTAR: Early-Stopping Token-Aware Reasoning For Efficient Inference
Junda Wang, Zhichao Yang, Dongxu Zhang, Sanjit Singh Batra, Robert E. Tillman

TL;DR
ESTAR introduces an early-stopping mechanism for large reasoning models that detects when to halt reasoning, significantly reducing computation while maintaining accuracy across multiple datasets.
Contribution
The paper presents a novel early-stopping method combining trajectory-based classification, supervised fine-tuning, and reinforcement learning to improve reasoning efficiency.
Findings
Reasoning length reduced by 3.7x
Accuracy preserved at around 74%
Strong cross-domain generalization
Abstract
Large reasoning models (LRMs) achieve state-of-the-art performance by generating long chains-of-thought, but often waste computation on redundant reasoning after the correct answer has already been reached. We introduce Early-Stopping for Token-Aware Reasoning (ESTAR), which detects and reduces such reasoning redundancy to improve efficiency without sacrificing accuracy. Our method combines (i) a trajectory-based classifier that identifies when reasoning can be safely stopped, (ii) supervised fine-tuning to teach LRMs to propose self-generated <stop> signals, and (iii) <stop>-aware reinforcement learning that truncates rollouts at self-generated stop points with compute-aware rewards. Experiments on four reasoning datasets show that ESTAR reduces reasoning length by about 3.7x (from 4,799 to 1,290) while preserving accuracy (74.9% vs. 74.2%), with strong cross-domain generalization.…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsMultimodal Machine Learning Applications · Topic Modeling · Explainable Artificial Intelligence (XAI)
