Efficient Test-Time Scaling via Temporal Reasoning Aggregation
Jiakun Li, Xingwei He, Kefan Li, Hongzheng Chai, Hongyue Yu, and Yuan Yuan

TL;DR
TRACE is a training-free framework that improves test-time reasoning efficiency in large language models by aggregating temporal signals to detect reasoning convergence and halt inference early.
Contribution
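TRACE introduces a temporal aggregation approach to early stopping: it decides when to halt reasoning from evidence accumulated over several recent steps rather than from a single-step confidence signal. It requires no additional training and is reported to outperform existing dynamic early-exit methods.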
Findings
Reduces reasoning token usage by 25-30% on average.
Maintains accuracy within 1-2% of full-length reasoning.
Outperforms existing dynamic reasoning methods.
Abstract
Test-time scaling improves the reasoning performance of large language models but often results in token-inefficient overthinking, where models continue reasoning beyond what is necessary for a correct answer. Existing dynamic early-exit methods typically rely on single-step confidence signals, which are often unreliable for detecting reasoning convergence in multi-step settings. To mitigate this limitation, we propose TRACE, a training-free framework for efficient test-time scaling that determines when to terminate reasoning based on temporal aggregation of multi-step evidence rather than instantaneous signals. TRACE detects reasoning convergence over time by aggregating two complementary signals across recent reasoning steps: answer consistency, capturing the persistence of predicted answers, and confidence trajectory, modeling the temporal evolution of model confidence. Benefiting…
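The abstract describes the stopping rule only at a high level, so the following is a minimal sketch of the general idea and not the authors' implementation. It assumes each reasoning step yields a tentative answer and a scalar confidence in [0, 1]. The names `Step`, `answer_consistency`, `confidence_trajectory`, and `should_stop`, the linear combination of the two signals, and the parameters `window_size`, `alpha`, and `threshold` are all hypothetical choices made for illustration.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class Step:
    answer: Optional[str]  # tentative answer extracted after this step (hypothetical)
    confidence: float      # confidence in that answer, assumed to lie in [0, 1]


def answer_consistency(window: Sequence[Step]) -> float:
    """Fraction of steps in the window that agree with the most common answer."""
    answers = [s.answer for s in window if s.answer is not None]
    if not answers:
        return 0.0
    top_count = Counter(answers).most_common(1)[0][1]
    return top_count / len(window)


def confidence_trajectory(window: Sequence[Step]) -> float:
    """Mean confidence over the window, reduced when confidence is falling."""
    confs = [s.confidence for s in window]
    mean = sum(confs) / len(confs)
    drop = max(0.0, confs[0] - confs[-1])  # net decline from first to last step
    return max(0.0, mean - drop)


def should_stop(
    history: Sequence[Step],
    window_size: int = 4,    # assumed number of recent steps to aggregate
    alpha: float = 0.5,      # assumed weight on answer consistency
    threshold: float = 0.8,  # assumed convergence threshold
) -> bool:
    """Return True when the aggregated evidence suggests reasoning has converged."""
    if len(history) < window_size:
        return False  # not enough steps to aggregate yet
    window = history[-window_size:]
    score = (alpha * answer_consistency(window)
             + (1 - alpha) * confidence_trajectory(window))
    return score >= threshold


# Usage: a stable answer with rising confidence triggers an early stop,
# while a changing answer does not.
stable = [Step("42", c) for c in (0.7, 0.8, 0.9, 0.95)]
unstable = [Step(a, 0.5) for a in ("1", "2", "3", "4")]
print(should_stop(stable), should_stop(unstable))
```

Aggregating over a window means a single high-confidence step cannot end reasoning by itself: the answer must persist and confidence must not be declining, which is the contrast with single-step signals that the abstract draws.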
