On the Universality of Transformer Architectures; How Much Attention Is Enough?

Amirreza Abbasi; Mohsen Hooshmand

arXiv:2512.18445·cs.LG·December 23, 2025

TL;DR

This paper investigates the universality and expressiveness of Transformer architectures, reviews recent advances and architectural refinements, and identifies future research directions for better understanding their capabilities.

Contribution

It provides a comprehensive review of recent progress on Transformer universality, clarifies known theoretical guarantees, and highlights key directions for future research.

Findings

01. Transformers exhibit strong approximation capabilities.

02. Recent architectural refinements improve expressiveness.

03. Theoretical understanding of Transformers' universality is advancing.

Abstract

Transformers are crucial across many AI fields, such as large language models, computer vision, and reinforcement learning. This prominence stems from the architecture's perceived universality and scalability compared to alternatives. This work examines the problem of universality in Transformers, reviews recent progress, including architectural refinements such as structural minimality and approximation rates, and surveys state-of-the-art advances that inform both theoretical and practical understanding. Our aim is to clarify what is currently known about Transformers' expressiveness, separate robust guarantees from fragile ones, and identify key directions for future theoretical research.

Taxonomy

Topics: Ferroelectric and Negative Capacitance Devices · Advanced Neural Network Applications · Multimodal Machine Learning Applications