TL;DR
This paper shows that large language models encode information predictive of reasoning success early in the Chain-of-Thought (CoT) process, which enables early stopping of CoT reasoning without significant performance loss.
Contribution
It demonstrates that a probing classifier trained on initial LLM representations can predict whether zero-shot CoT reasoning will succeed before the reasoning completes, which supports early stopping strategies for more efficient reasoning.
Findings
Probing classifiers based on LLM representations perform well even before a single token is generated.
Early-step representations already contain crucial information about reasoning success.
Early stopping can improve efficiency with minimal performance loss.
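The probing idea behind these findings can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: it assumes a logistic-regression probe, and it uses synthetic vectors in place of real LLM hidden states. The names `train_probe`, `predict_success`, and `make_synthetic_data` are hypothetical.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def train_probe(features, labels, lr=0.1, epochs=200):
    """Fit a logistic-regression probe with batch gradient descent."""
    dim = len(features[0])
    weights = [0.0] * dim
    bias = 0.0
    n = len(features)
    for _ in range(epochs):
        grad_w = [0.0] * dim
        grad_b = 0.0
        for x, y in zip(features, labels):
            p = sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)
            err = p - y  # gradient of the log loss w.r.t. the logit
            for i, xi in enumerate(x):
                grad_w[i] += err * xi
            grad_b += err
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias


def predict_success(weights, bias, x):
    """Probability that the reasoning will succeed, according to the probe."""
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)


def make_synthetic_data(n, dim, seed):
    # ASSUMPTION: stand-in for hidden states captured before generation.
    # The label (1 = reasoning succeeded) shifts the first coordinate.
    rng = random.Random(seed)
    features, labels = [], []
    for _ in range(n):
        y = rng.randint(0, 1)
        x = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        x[0] += 2.0 if y == 1 else -2.0
        features.append(x)
        labels.append(y)
    return features, labels


train_x, train_y = make_synthetic_data(200, 8, seed=0)
test_x, test_y = make_synthetic_data(100, 8, seed=1)
weights, bias = train_probe(train_x, train_y)
accuracy = sum(
    (predict_success(weights, bias, x) >= 0.5) == bool(y)
    for x, y in zip(test_x, test_y)
) / len(test_y)
print(f"held-out probe accuracy: {accuracy:.2f}")
```

In a real setting, the feature vectors would be hidden states extracted from the LLM at a chosen step, and the labels would record whether the final CoT answer was correct.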
Abstract
We investigate whether the success of a zero-shot Chain-of-Thought (CoT) process can be predicted before completion. We discover that a probing classifier, based on LLM representations, performs well even before a single token is generated, suggesting that crucial information about the reasoning process is already present in the initial steps' representations. In contrast, a strong BERT-based baseline, which relies solely on the generated tokens, performs worse, likely because it depends on shallow linguistic cues rather than deeper reasoning dynamics. Surprisingly, using later reasoning steps does not always improve classification. When additional context is unhelpful, earlier representations resemble later ones more, suggesting LLMs encode key information early. This implies reasoning can often stop early without loss. To test this, we conduct early stopping experiments, showing…
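The observation that earlier representations can resemble later ones may be illustrated with a short similarity check. The sketch below assumes cosine similarity as the measure, which the abstract does not specify, and uses toy vectors rather than real LLM representations. The function names `cosine_similarity` and `similarity_to_final` are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # convention for zero vectors
    return dot / (norm_a * norm_b)


def similarity_to_final(step_representations):
    """Compare each step's representation with the final step's."""
    final = step_representations[-1]
    return [cosine_similarity(rep, final) for rep in step_representations]


# ASSUMPTION: toy vectors standing in for per-step LLM representations.
steps = [
    [1.0, 0.0, 0.0],
    [1.0, 1.0, 0.0],
    [1.0, 1.0, 1.0],
]
sims = similarity_to_final(steps)
for i, s in enumerate(sims):
    print(f"step {i}: similarity to final step = {s:.3f}")
```

A high similarity at an early step would be consistent with the claim that later steps add little new information, although the measure used in the paper may differ.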
Taxonomy
Methods: Early Stopping
