On the Universality of Transformer Architectures; How Much Attention Is Enough?
Amirreza Abbasi, Mohsen Hooshmand

TL;DR
This paper investigates the universality and expressiveness of Transformer architectures, reviews recent advances and architectural refinements, and identifies future research directions for better understanding their capabilities.
Contribution
It provides a comprehensive review of recent progress on Transformer universality, clarifies known theoretical guarantees, and highlights key directions for future research.
Findings
Transformers exhibit strong approximation capabilities.
Recent architectural refinements improve expressiveness.
Theoretical understanding of Transformers' universality is advancing.
Abstract
Transformers are crucial across many AI fields, such as large language models, computer vision, and reinforcement learning. This prominence stems from the architecture's perceived universality and scalability compared to alternatives. This work examines the problem of universality in Transformers, reviews recent progress, including architectural refinements such as structural minimality and approximation rates, and surveys state-of-the-art advances that inform both theoretical and practical understanding. Our aim is to clarify what is currently known about Transformers' expressiveness, separate robust guarantees from fragile ones, and identify key directions for future theoretical research.
Taxonomy
Topics: Ferroelectric and Negative Capacitance Devices · Advanced Neural Network Applications · Multimodal Machine Learning Applications
