One-layer transformers fail to solve the induction heads task
Clayton Sanford, Daniel Hsu, Matus Telgarsky

TL;DR
This paper shows that a one-layer transformer cannot solve the induction heads task unless it is exponentially larger than a two-layer transformer that suffices for the same task, giving a concrete separation between depth one and depth two.
Contribution
It gives a short communication complexity proof that one-layer transformers cannot solve the induction heads task without being exponentially larger than a sufficient two-layer transformer.
Findings
One-layer transformers need exponentially greater size than two-layer transformers to solve the induction heads task.
The lower bound is proved with a simple communication complexity argument.
For this task, a second layer substitutes for an exponential increase in size.
Abstract
A simple communication complexity argument proves that no one-layer transformer can solve the induction heads task unless its size is exponentially larger than the size sufficient for a two-layer transformer.
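The abstract does not spell out the task itself. As a rough illustration only, the sketch below assumes the common formulation of the induction heads task: for each position, find the most recent earlier occurrence of the current token and output the token that followed it. The function name `induction_heads` and the use of `None` for "no earlier occurrence" are choices made for this example, not taken from the paper.

```python
from typing import Hashable, List, Optional, Sequence


def induction_heads(tokens: Sequence[Hashable]) -> List[Optional[Hashable]]:
    """Reference solution for an assumed formulation of the task.

    For each position i, return the token that followed the most recent
    earlier occurrence of tokens[i], or None if tokens[i] has not
    appeared before position i.
    """
    last_seen = {}  # token -> index of its most recent occurrence so far
    outputs: List[Optional[Hashable]] = []
    for i, tok in enumerate(tokens):
        j = last_seen.get(tok)
        # j < i always holds, so j + 1 is a valid index into tokens.
        outputs.append(tokens[j + 1] if j is not None else None)
        last_seen[tok] = i
    return outputs


if __name__ == "__main__":
    print(induction_heads(list("abcab")))
```

On the input `a b c a b`, the second `a` maps to `b` and the second `b` maps to `c`. The sequential scan is trivial; the paper's claim concerns how large a one-layer transformer must be to compute the same function.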
