One-layer transformers fail to solve the induction heads task

Clayton Sanford; Daniel Hsu; Matus Telgarsky

arXiv:2408.14332·cs.LG·August 27, 2024

TL;DR

This paper shows that a one-layer transformer cannot solve the induction heads task unless it is exponentially larger than a two-layer transformer that solves the same task, which gives a depth separation between one and two layers.

Contribution

It proves a size lower bound for one-layer transformers on the induction heads task using a simple communication complexity argument.

Findings

01

On the induction heads task, a one-layer transformer must be exponentially larger than the size that suffices for a two-layer transformer.

02

The limitation is established by a theoretical proof based on communication complexity.

03

The result identifies induction heads as a concrete task where a second layer cannot be replaced by a modestly larger single layer.

Abstract

A simple communication complexity argument proves that no one-layer transformer can solve the induction heads task unless its size is exponentially larger than the size sufficient for a two-layer transformer.
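
The abstract does not restate the task itself. As a minimal sketch, assuming one common formulation of the induction heads task (for each position, output the token that followed the most recent earlier occurrence of the current token), a reference solution might look like the following. The function name `induction_heads` and the use of `None` for "no earlier occurrence" are illustrative assumptions, not taken from the paper.

```python
from typing import Dict, Hashable, List, Optional, Sequence


def induction_heads(tokens: Sequence[Hashable]) -> List[Optional[Hashable]]:
    """Reference solution for an assumed formulation of the task.

    For each position i, find the largest j < i with tokens[j] == tokens[i]
    and output tokens[j + 1]; output None when no such j exists.
    """
    last_seen: Dict[Hashable, int] = {}  # token -> index of its latest occurrence
    out: List[Optional[Hashable]] = []
    for i, tok in enumerate(tokens):
        j = last_seen.get(tok)
        # j < i, so j + 1 <= i and the lookup is always in range.
        out.append(tokens[j + 1] if j is not None else None)
        last_seen[tok] = i
    return out


result = induction_heads("abcab")
```

Under this assumed definition, each output depends on both a match against earlier positions and the token one step after the match, which is the kind of two-step lookup the paper's depth separation concerns.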
