Emergent Stack Representations in Modeling Counter Languages Using Transformers
Utkarsh Tiwari, Aviral Gupta, Michael Hahn

TL;DR
This paper investigates how transformer models trained on counter languages develop internal stack-like representations, shedding light on their algorithmic understanding of formal languages.
Contribution
It demonstrates that transformers trained on counter languages learn internal representations that encode stack depths, advancing understanding of their language modeling capabilities.
Findings
Transformers trained on counter languages develop stack-like internal representations.
Probing reveals models encode stack depths at each input token.
Results contribute to understanding the algorithmic mechanisms of transformers.
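The probing idea in the findings above can be sketched as follows. This is a minimal illustration, not the authors' setup: it assumes a linear probe fit by least squares, and the "hidden states" are synthetic vectors constructed so that depth is linearly encoded. The names `fit_linear_probe`, `probe_predict`, `hidden`, and `depths` are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# ASSUMPTION: synthetic stand-in for per-token hidden states, NOT model data.
# Each vector encodes its depth along one random direction, plus small noise.
n_tokens, d_model = 500, 16
depths = rng.integers(0, 6, size=n_tokens).astype(float)
direction = rng.normal(size=d_model)
hidden = np.outer(depths, direction) + 0.05 * rng.normal(size=(n_tokens, d_model))


def fit_linear_probe(X, y):
    """Fit weights (with a bias term) mapping hidden states to depth via least squares."""
    X1 = np.hstack([X, np.ones((X.shape[0], 1))])
    w, *_ = np.linalg.lstsq(X1, y, rcond=None)
    return w


def probe_predict(w, X):
    """Apply the fitted probe to hidden states."""
    X1 = np.hstack([X, np.ones((X.shape[0], 1))])
    return X1 @ w


# Fit on the first 400 tokens, evaluate on the held-out remainder.
train, test = slice(0, 400), slice(400, None)
w = fit_linear_probe(hidden[train], depths[train])
pred = np.rint(probe_predict(w, hidden[test]))
accuracy = float(np.mean(pred == depths[test]))
print(f"held-out probe accuracy: {accuracy:.2f}")
```

With real activations, `hidden` would be replaced by a chosen layer's per-token states and `depths` by the stack depth of each input prefix; high held-out accuracy would then suggest that depth is linearly recoverable from that layer.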
Abstract
Transformer architectures are the backbone of most modern language models, but understanding the inner workings of these models remains largely an open problem. One way past research has tackled this problem is by isolating the learning capabilities of these architectures by training them on well-understood classes of formal languages. We extend this literature by analyzing models trained on counter languages, which can be modeled using counter variables. We train transformer models on four counter languages and equivalently formulate these languages using stacks, whose depths can be understood as the counter values. We then probe their internal representations for stack depths at each input token to show that these models, when trained as next-token predictors, learn stack-like representations. This brings us closer to understanding the algorithmic details of how…
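To make the stack formulation concrete, here is a minimal sketch of how per-token stack depths (the quantities a probe would target) could be computed. The abstract does not name the four languages here, so this assumes a single-bracket language of balanced parentheses as a stand-in; `stack_depths` and its parameters are hypothetical names.

```python
def stack_depths(tokens, push="(", pop=")"):
    """Return the stack depth after reading each token.

    ASSUMPTION: a one-counter language with a single push symbol and a
    single pop symbol; the depth equals the counter value.
    """
    depths = []
    depth = 0
    for tok in tokens:
        if tok == push:
            depth += 1          # push: counter increments
        elif tok == pop:
            depth -= 1          # pop: counter decrements
            if depth < 0:
                raise ValueError("pop on empty stack")
        else:
            raise ValueError(f"unexpected token {tok!r}")
        depths.append(depth)
    return depths


print(stack_depths("(()())"))  # [1, 2, 1, 2, 1, 0]
```

Because only one kind of symbol is ever on the stack, the depth alone carries all the state, which is why a counter and a stack are interchangeable for languages of this kind.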
Taxonomy
Topics: Ferroelectric and Negative Capacitance Devices · Topic Modeling
