Finite State Automata Inside Transformers with Chain-of-Thought: A Mechanistic Study on State Tracking
Yifan Zhang, Wenyu Du, Dongming Jin, Jie Fu, Zhi Jin

TL;DR
This paper investigates how Transformer models with Chain-of-Thought reasoning internally implement finite state automata for effective state tracking, providing mechanistic insights and robustness analysis.
Contribution
It reveals the implicit FSA structure within Transformer+CoT models and identifies key neurons responsible for state tracking, advancing understanding of their internal algorithms.
Findings
Transformer+CoT effectively tracks states using implicit FSAs.
Late-layer MLP neurons are crucial for state representation.
Models remain robust across the three challenging settings explored.
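As a rough illustration of what "state tracking with an implicit FSA" means, the sketch below runs a toy two-state parity automaton and records the state after every input symbol, which is the kind of intermediate trace a chain-of-thought would spell out. The parity task, the `TRANSITIONS` table, and the function names are assumptions chosen for illustration; they are not taken from the paper.

```python
from typing import Dict, List, Tuple

# Hypothetical toy automaton (not from the paper): tracks whether the
# number of "1" symbols seen so far is even or odd.
TRANSITIONS: Dict[Tuple[str, str], str] = {
    ("even", "0"): "even",
    ("even", "1"): "odd",
    ("odd", "0"): "odd",
    ("odd", "1"): "even",
}


def track_states(inputs: List[str], start: str = "even") -> List[str]:
    """Return the state reached after each input symbol.

    The returned list plays the role of a chain-of-thought trace:
    one explicit intermediate state per step.
    """
    state = start
    trace: List[str] = []
    for symbol in inputs:
        state = TRANSITIONS[(state, symbol)]  # one FSA transition
        trace.append(state)
    return trace


def final_state(inputs: List[str], start: str = "even") -> str:
    """Return only the last state, i.e. the answer without the trace."""
    trace = track_states(inputs, start)
    return trace[-1] if trace else start


if __name__ == "__main__":
    print(track_states(["1", "0", "1", "1"]))
    print(final_state(["1", "0", "1", "1"]))
```

Emitting the full trace rather than only the final state mirrors the CoT setting: each step depends on just the previous state and the current symbol.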
Abstract
Chain-of-thought (CoT) significantly enhances the performance of large language models (LLMs) across a wide range of tasks, and prior research shows that CoT can theoretically increase expressiveness. However, there is limited mechanistic understanding of the algorithms that Transformer+CoT can learn. Our key contributions are: (1) We evaluate the state tracking capabilities of Transformer+CoT and its variants, confirming the effectiveness of CoT. (2) Next, we identify the circuit (a subset of model components, responsible for tracking the world state), indicating that late-layer MLP neurons play a key role. We propose two metrics, compression and distinction, and show that the neuron sets for each state achieve nearly 100% accuracy, providing evidence of an implicit finite state automaton (FSA) embedded within the model. (3) Additionally, we explore three challenging settings: skipping…
Taxonomy
Topics: Machine Learning and Algorithms · Advanced Memory and Neural Computing · Ferroelectric and Negative Capacitance Devices
