How Chain-of-Thought Works? Tracing Information Flow from Decoding, Projection, and Activation
Hao Yang, Qinghua Zhao, Lei Li

TL;DR
This paper investigates the internal workings of Chain-of-Thought prompting, revealing how it guides model reasoning through information flow and neuron engagement, and offers insights for improving prompt design.
Contribution
The paper proposes a mechanistic interpretability framework for CoT that analyzes information flow and neuron activity, showing how CoT acts as a decoding space pruner and modulates neuron engagement.
Findings
CoT acts as a decoding space pruner guiding output with answer templates
Higher template adherence correlates with better performance
Neuron engagement varies with task type, decreasing in open-domain and increasing in closed-domain tasks
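The neuron-engagement finding compares how many neurons fire with and without CoT. The sketch below shows one way such a comparison could be set up. It assumes, as an illustration only, that engagement is the fraction of FFN neurons whose activation exceeds a threshold, averaged over generated tokens. The function names `engagement_rate` and `engagement_shift`, the threshold, and the metric itself are hypothetical and are not taken from the paper.

```python
from statistics import mean
from typing import Sequence


def engagement_rate(token_activations: Sequence[Sequence[float]], threshold: float = 0.0) -> float:
    """Hypothetical metric: per token, the fraction of neurons whose activation
    exceeds `threshold`, averaged over all tokens."""
    if not token_activations or any(len(acts) == 0 for acts in token_activations):
        raise ValueError("need at least one token with at least one neuron")
    per_token = [sum(1 for a in acts if a > threshold) / len(acts) for acts in token_activations]
    return mean(per_token)


def engagement_shift(without_cot: Sequence[Sequence[float]],
                     with_cot: Sequence[Sequence[float]],
                     threshold: float = 0.0) -> float:
    """Positive: the CoT prompt engages more neurons; negative: it engages fewer."""
    return engagement_rate(with_cot, threshold) - engagement_rate(without_cot, threshold)


# Toy activations for a single token over four neurons (made-up numbers).
print(engagement_shift([[0.5, -0.2, 1.3, 0.0]], [[0.7, 0.1, 1.1, -0.4]]))
```

Under this toy setup, the paper's reported pattern would correspond to a negative shift on open-domain tasks and a positive shift on closed-domain tasks. The actual measurement in the paper may differ in layer choice, threshold, and aggregation.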
Abstract
Chain-of-Thought (CoT) prompting significantly enhances model reasoning, yet its internal mechanisms remain poorly understood. We analyze CoT's operational principles by tracing information flow backward through the decoding, projection, and activation phases. Our quantitative analysis suggests that CoT may serve as a decoding space pruner, leveraging answer templates to guide output generation, with higher template adherence strongly correlating with improved performance. Furthermore, we find, surprisingly, that CoT modulates neuron engagement in a task-dependent manner: it reduces neuron activation in open-domain tasks but increases it in closed-domain scenarios. These findings offer a novel mechanistic interpretability framework and critical insights for enabling targeted CoT interventions to design more efficient and robust prompts. We released our code and data at…
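The abstract links template adherence to performance. The following is a minimal sketch of how such a correlation could be computed, under assumptions that are not from the paper. It assumes a template is a list of marker strings (for example "Step 1:" and "The answer is") and scores adherence as the fraction of markers that appear in the output in template order. It then computes the Pearson correlation between adherence and accuracy. The names `template_adherence` and `pearson` and the scoring rule are hypothetical.

```python
import math
from typing import Sequence


def template_adherence(output: str, template_markers: Sequence[str]) -> float:
    """Hypothetical score: fraction of template markers found in `output`,
    each searched for after the previous match so order is respected."""
    if not template_markers:
        return 0.0
    pos, hits = 0, 0
    for marker in template_markers:
        idx = output.find(marker, pos)
        if idx != -1:
            hits += 1
            pos = idx + len(marker)
    return hits / len(template_markers)


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length samples."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two samples of equal length >= 2")
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        raise ValueError("zero variance")
    return cov / (sx * sy)


markers = ["Step 1:", "Step 2:", "The answer is"]
outputs = ["Step 1: 2+3=5. Step 2: 5*2=10. The answer is 10.", "The answer is 12."]
print([template_adherence(o, markers) for o in outputs])
```

Searching each marker after the previous match means an output that mentions the final-answer phrase before the reasoning steps scores lower than one that follows the template's order. This is one plausible reading of "adherence", chosen for illustration.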
