Expressivity of Transformers: A Tropical Geometry Perspective
Ye Su, Yong Liu

TL;DR
This paper introduces a tropical geometry framework to analyze the expressivity of transformers, revealing their exact spatial partitioning capabilities and combinatorial complexity growth with network parameters.
Contribution
It models self-attention as a tropical rational map, establishes bounds on linear regions, and demonstrates the stability of these partitions under soft attention.
Findings
Self-attention evaluates exactly to a Power Voronoi Diagram in the zero-temperature limit.
Multi-head self-attention increases polyhedral complexity exponentially with the number of heads.
The number of linear regions scales as Θ(N^{d_model}L), a combinatorial explosion in sequence length N and embedding dimension d_model.
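The zero-temperature finding can be checked numerically with a minimal sketch. It rests on an assumption not taken from the paper: the keys serve as the sites of the diagram, with weights w_j = ||k_j||^2, so the power distance ||q - k_j||^2 - w_j equals ||q||^2 - 2 q·k_j, and minimizing it is the same as maximizing the attention score q·k_j. The paper's own construction of sites and weights may differ, and the usual 1/sqrt(d) scaling is omitted. All function names are illustrative.

```python
import numpy as np

def softmax(scores, temperature):
    """Row-wise softmax of scores / temperature, numerically stabilized."""
    z = scores / temperature
    z = z - z.max(axis=-1, keepdims=True)
    w = np.exp(z)
    return w / w.sum(axis=-1, keepdims=True)

def hard_attention_cells(queries, keys):
    """Index of the key with the largest dot-product score for each query."""
    return np.argmax(queries @ keys.T, axis=-1)

def power_diagram_cells(queries, keys):
    """Index of the power-diagram cell containing each query.

    Assumption for this sketch: sites are the keys and weights are
    w_j = ||k_j||^2, so the power distance is ||q - k_j||^2 - w_j.
    """
    sq_dist = ((queries[:, None, :] - keys[None, :, :]) ** 2).sum(axis=-1)
    weights = (keys ** 2).sum(axis=-1)
    return np.argmin(sq_dist - weights[None, :], axis=-1)

rng = np.random.default_rng(0)
keys = rng.normal(size=(6, 3))       # N = 6 keys in 3 dimensions
queries = rng.normal(size=(200, 3))  # sample points in query space

cells_attn = hard_attention_cells(queries, keys)
cells_power = power_diagram_cells(queries, keys)

# Soft attention at a low temperature; each row sums to 1 and its
# largest weight sits on the key selected by hard attention.
attn_cold = softmax(queries @ keys.T, temperature=1e-3)
```

On these samples the two labelings agree, and the largest soft-attention weight at low temperature lands on the same key. This illustrates the limit; it does not prove it.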
Abstract
To quantify the geometric expressivity of transformers, we introduce a tropical geometry framework to characterize their exact spatial partitioning capabilities. By modeling self-attention as a vector-valued tropical rational map, we prove it evaluates exactly to a Power Voronoi Diagram in the zero-temperature limit. Building on this equivalence, we establish a combinatorial rationale for Multi-Head Self-Attention (MHSA): via the Minkowski sum of Newton polytopes, multi-head aggregation expands the polyhedral complexity, overcoming the bottleneck of single heads. Extending this to deep architectures, we derive the first tight asymptotic bounds on the number of linear regions in transformers (Θ(N^{d_model}L)), demonstrating a combinatorial explosion driven intrinsically by sequence length N, ambient embedding dimension…
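The single-head bottleneck can be illustrated empirically. This is a sketch, not the paper's Minkowski-sum argument: each hypothetical head is a set of N random effective keys acting directly on the input point, and a region is identified by the tuple of winning key indices across heads. Counting distinct tuples over sampled points gives only a lower bound on the true number of cells. One head can produce at most N cells; H heads jointly can produce at most N^H.

```python
import numpy as np

def joint_cell_count(points, heads):
    """Count distinct tuples of per-head argmax indices seen on the sample."""
    ids = np.stack(
        [np.argmax(points @ keys.T, axis=-1) for keys in heads], axis=-1
    )
    return len(np.unique(ids, axis=0))

rng = np.random.default_rng(0)
N, H, d = 5, 3, 2  # keys per head, number of heads, input dimension

# Hypothetical heads: independent random effective keys in input space.
heads = [rng.normal(size=(N, d)) for _ in range(H)]
points = rng.uniform(-3.0, 3.0, size=(20000, d))

single = [joint_cell_count(points, [keys]) for keys in heads]
joint = joint_cell_count(points, heads)

print("cells per single head:", single)
print("cells for all heads jointly:", joint)
```

The joint count is never smaller than any single-head count, because the joint partition is the common refinement of the per-head partitions.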
