Knowing Before Saying: LLM Representations Encode Information About Chain-of-Thought Success Before Completion

Anum Afzal; Florian Matthes; Gal Chechik; Yftah Ziser

arXiv:2505.24362·cs.CL·June 3, 2025

TL;DR

This paper shows that LLM representations encode information about whether a Chain-of-Thought will succeed early in the reasoning process, which makes it possible to stop reasoning early without significant performance loss.

Contribution

It demonstrates that a probing classifier trained on early LLM representations can predict whether reasoning will succeed before it completes, which supports early-stopping strategies for more efficient reasoning.

Findings

01 Probing classifiers perform well even before a single token is generated.

02 Early representations often contain enough information about reasoning success; later reasoning steps do not always improve classification.

03 Early stopping can improve efficiency with minimal performance loss.
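
One way to picture the third finding is a confidence-threshold stopping rule. The sketch below is illustrative only and is not the procedure used in the paper: the function names `stop_step` and `fraction_saved`, the 0.9 threshold, and the list of per-step probe confidences are all assumptions introduced here.

```python
from typing import Sequence


def stop_step(probe_confidences: Sequence[float], threshold: float = 0.9) -> int:
    """Return how many reasoning steps to generate before stopping.

    probe_confidences[i] is a hypothetical probe's estimated probability that
    the chain ends in a correct answer, measured after i reasoning steps
    (index 0 means before any step has been generated).
    """
    if not probe_confidences:
        raise ValueError("need at least one confidence value")
    for step, confidence in enumerate(probe_confidences):
        if confidence >= threshold:
            return step  # confident enough: stop generating here
    # Threshold never reached: fall back to the full chain.
    return len(probe_confidences) - 1


def fraction_saved(probe_confidences: Sequence[float], threshold: float = 0.9) -> float:
    """Fraction of reasoning steps skipped relative to the full chain."""
    total_steps = len(probe_confidences) - 1
    if total_steps <= 0:
        return 0.0
    return 1.0 - stop_step(probe_confidences, threshold) / total_steps


# Assumed confidences after 0, 1, 2 and 3 reasoning steps.
example = [0.4, 0.6, 0.95, 0.97]
print(stop_step(example))  # 2
```

A lower threshold stops earlier and saves more computation at a higher risk of stopping on a chain that would have failed; how that trade-off plays out in practice is an empirical question that this sketch does not answer.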

Abstract

We investigate whether the success of a zero-shot Chain-of-Thought (CoT) process can be predicted before completion. We discover that a probing classifier, based on LLM representations, performs well even before a single token is generated, suggesting that crucial information about the reasoning process is already present in the representations of the initial steps. In contrast, a strong BERT-based baseline, which relies solely on the generated tokens, performs worse, likely because it depends on shallow linguistic cues rather than deeper reasoning dynamics. Surprisingly, using later reasoning steps does not always improve classification. When additional context is unhelpful, earlier representations resemble later ones more, suggesting LLMs encode key information early. This implies reasoning can often stop early without loss. To test this, we conduct early stopping experiments, showing…
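
The probing setup described in the abstract can be pictured as a simple classifier trained on fixed representation vectors. The example below is a minimal sketch under loud assumptions: the vectors are synthetic stand-ins produced by a hypothetical `make_synthetic_states` helper rather than real LLM hidden states, and plain logistic regression is used as the probe. The paper's actual probe architecture, features, and training details may differ.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def make_synthetic_states(n, dim=8, seed=0):
    """ASSUMPTION: synthetic stand-ins for LLM representations.

    Label 1 means "the CoT will succeed". The first three dimensions are
    shifted by the label; the remaining dimensions are pure noise.
    """
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        label = rng.randint(0, 1)
        shift = 1.5 if label == 1 else -1.5
        vec = [rng.gauss(0.0, 1.0) + (shift if i < 3 else 0.0) for i in range(dim)]
        data.append((vec, label))
    return data


def score(w, b, x):
    """Linear probe output before the sigmoid."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def train_probe(data, epochs=100, lr=0.1):
    """Fit a logistic-regression probe with full-batch gradient descent."""
    dim = len(data[0][0])
    w, b = [0.0] * dim, 0.0
    n = len(data)
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * dim, 0.0
        for x, y in data:
            err = sigmoid(score(w, b, x)) - y
            for i in range(dim):
                grad_w[i] += err * x[i]
            grad_b += err
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def probe_accuracy(w, b, data):
    """Share of examples whose predicted label matches the true label."""
    hits = sum(1 for x, y in data if (sigmoid(score(w, b, x)) >= 0.5) == (y == 1))
    return hits / len(data)


train = make_synthetic_states(200, seed=0)
held_out = make_synthetic_states(100, seed=1)
weights, bias = train_probe(train)
print(f"held-out probe accuracy: {probe_accuracy(weights, bias, held_out):.2f}")
```

With real models, the synthetic vectors would be replaced by representations extracted at a chosen point in generation (for example, before the first token), and the labels by whether the completed chain reached the correct answer.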

Code & Models

Repositories

anum94/cotpred (PyTorch, official)

Videos

Knowing Before Saying: LLM Representations Encode Information About Chain-of-Thought Success Before Completion (underline)

Taxonomy

Methods: Early Stopping