Understanding Hidden Computations in Chain-of-Thought Reasoning
Aryasomayajula Ram Bharadwaj

TL;DR
This paper examines how transformer models trained with filler Chain-of-Thought sequences internally represent hidden reasoning steps. It shows that the hidden characters can be decoded without loss of performance, which makes the models' reasoning easier to interpret.
Contribution
It introduces methods to decode hidden reasoning tokens in transformer models, providing new insights into their internal representations during Chain-of-Thought reasoning.
Findings
Hidden characters can be recovered without performance loss
Layer-wise representations reveal internal reasoning processes
Decoding improves interpretability of model reasoning
Abstract
Chain-of-Thought (CoT) prompting has significantly enhanced the reasoning abilities of large language models. However, recent studies have shown that models can still perform complex reasoning tasks even when the CoT is replaced with filler (hidden) characters (e.g., "..."), leaving open questions about how models internally process and represent reasoning steps. In this paper, we investigate methods to decode these hidden characters in transformer models trained with filler CoT sequences. By analyzing layer-wise representations using the logit lens method and examining token rankings, we demonstrate that the hidden characters can be recovered without loss of performance. Our findings provide insights into the internal mechanisms of transformer models and open avenues for improving interpretability and transparency in language model reasoning.
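To make the logit lens and token-ranking ideas in the abstract concrete, here is a minimal sketch in Python with NumPy. It is not the paper's code. The function names (logit_lens, token_ranks, decode_skipping_filler), the parameter-free layer norm, and the toy hidden states and unembedding matrix are all assumptions made for illustration; a real analysis would use the trained model's own final layer norm and unembedding weights. The last function reflects one plausible reading of "examining token rankings": ignore the filler token and take the best-ranked remaining token at each layer.

```python
import numpy as np

def layer_norm(x, eps=1e-5):
    # Parameter-free layer norm (assumption: a real model has learned gain/bias).
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)

def logit_lens(hidden_states, unembed):
    # hidden_states: (n_layers, d_model), one residual-stream vector per layer
    # unembed:       (d_model, vocab_size)
    # Projects every layer's hidden state into vocabulary space.
    return layer_norm(hidden_states) @ unembed

def token_ranks(logits):
    # Rank of every token at each layer; rank 0 is the highest logit.
    order = np.argsort(-logits, axis=-1)
    ranks = np.empty_like(order)
    rows = np.arange(logits.shape[0])[:, None]
    ranks[rows, order] = np.arange(logits.shape[1])
    return ranks

def decode_skipping_filler(logits, filler_id):
    # Best-ranked token per layer once the filler token is masked out.
    masked = logits.copy()
    masked[..., filler_id] = -np.inf
    return masked.argmax(axis=-1)

# Toy example (hypothetical numbers): 2 layers, vocabulary of 5 tokens,
# token 0 plays the role of the filler character.
unembed = np.eye(5)
hidden = np.array([
    [5.0, 1.0, 0.0, 3.0, 2.0],   # layer 0
    [6.0, 0.0, 1.0, 2.0, 4.0],   # layer 1
])
logits = logit_lens(hidden, unembed)
print(token_ranks(logits))
print(decode_skipping_filler(logits, filler_id=0))
```

In this toy setup the filler token has the highest logit at both layers, so a plain argmax would only ever return the filler. Masking it exposes the next-ranked token, which is the kind of signal a ranking-based decoder would look for.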
Taxonomy
Topics: Semantic Web and Ontologies · Logic, Reasoning, and Knowledge
