Transformers are Inherently Succinct

Pascal Bergstr\"a{\ss}er; Ryan Cotterell; Anthony W. Lin

arXiv:2510.19315·cs.FL·May 18, 2026

Transformers are Inherently Succinct

Pascal Bergstr\"a{\ss}er, Ryan Cotterell, Anthony W. Lin

PDF

1 Video 3 Reviews

TL;DR

This paper demonstrates that fixed-precision transformers are exponentially more succinct than traditional formal models like LTL and RNNs, with significant implications for their expressive power and verification complexity.

Contribution

It establishes the exponential succinctness of fixed-precision transformers compared to classical formalisms and improves bounds on their translation to LTL formulas.

Findings

01

Transformers can be exponentially more succinct than LTL and RNNs.

02

Any fixed-precision transformer can be converted to an LTL formula with exponential blow-up.

03

Verification problems like emptiness and equivalence are EXPSPACE-complete for transformers.

Abstract

We study succinctness as a measure of the expressive power of transformers. Succinctness -- how compactly a formalism can describe a language relative to other formalisms -- is a classical notion in logic and automata theory. We prove that fixed-precision transformers are remarkably succinct: they can be exponentially more succinct than both linear temporal logic (LTL) and recurrent neural networks, and, by extension, state-space models, and doubly exponentially more succinct than finite automata. In other words, there exist families of languages describable by polynomial-size transformers whose smallest equivalent LTL formula or recurrent neural network is exponentially large, and whose smallest equivalent automaton is doubly exponentially large. We also establish matching upper bounds, showing that any fixed-precision transformer can be converted to an LTL formula with at most an…

Peer Reviews

Decision·ICLR 2026 Poster

Reviewer 01Rating 4Confidence 4

Strengths

* The proposal of using "succintness" of representation as a way of measuring expressivity is novel in the study of the transformer architecture. It seem very interesting to pursue. * The results (but see caveats below in the weaknesses section) are quite intriguing, showing how transformers can represent certain counting and tiling problems vastly more succinctly than LTL formulas and RNNs. * The authors clearly identify the architectural assumptions (more like choices) early on in the paper,

Weaknesses

* While the paper makes it clear that the architectures use finite precision and unique rightmost hard attention, the authors don't mention other key aspects that seem important and consequential. E.g., the B-RASP program construction in the proof of Proposition 6 (tiling) uses $\Theta(n)$ predicates at each position $i$, which allows each # position to have access to the entire counter value represented by the previous $n$ positions, which then allows checking "locally" for increment by $1$. Th

Reviewer 02Rating 8Confidence 3

Strengths

* This paper provides a rigorous and clear analysis of a special case of transformer models (UHATs). Upper bounds have complementary matching bounds. These help build a growing literature analyzing transformer architectures via a complexity-theoretic lens. * The paper demonstrates that even though RNNs are more expressive than UHATs, there are languages that UHATs can recognize that would require an exponentially larger RNN to recognize. This is interesting, since it is a neat demonstration tha

Weaknesses

* It is shown that UHATs are shown to have more succinct representations than RNNs for some languages. However, it is also known that RNNs are more expressive than UHATs. Therefore, in terms of succinctness of representations the two classes are incomparable. Thus, the title "Transformers are inherently succinct" is a little misleading, because RNNs are also "inherently succinct" at other languages that Transformers are not succinct at. A sentence acknowledging this (e.g. in the discussion or af

Reviewer 03Rating 8Confidence 4

Strengths

- The results enrich our understanding of the attention mechanism in general and complement recent theoretical results on UHAT as well as on the emptiness problem for transformers with other attention mechanisms. - The succinctness results and EXPSPACE bounds are technically non-trivial. - Despite heavy theory, the paper is readable and provides good intuition

Weaknesses

- UHAT is an intentionally minimal transformer abstraction. The paper would benefit from a discussion of how the results transfer to *practical architectures* (e.g., softmax attention, non-unique ties) and how much of the succinctness phenomenon is specific to UHAT - The $\textsf{EXPSPACE}$-hardness construction (Prop. 6 & 8) relies on several non-trivial translations but omits the concrete UHAT layer-level construction. Without at least an appendix-level sketch (including an explicit bound on n

Videos

Transformers are Inherently Succinct· slideslive

Taxonomy

TopicsFormal Methods in Verification · Logic, programming, and type systems · semigroups and automata theory