How Many Different Outputs Can a Transformer Generate?

Maxime Meyer; Mario Michelessa; Caroline Chaux; Vincent Y. F. Tan

arXiv:2605.22223 · cs.LG · May 22, 2026

TL;DR

This paper analyzes how many distinct sequences a transformer can generate. It derives upper bounds on this number, supported by theoretical and empirical results, and uses them to explain transformers' failures on simple tasks such as copying and cramming.

Contribution

It offers a theoretical framework, with empirical validation, for predicting how many distinct sequences a transformer can generate as a function of prompt length.

Findings

01. The maximal length of accessible sequences grows linearly with prompt length.

02. Beyond a critical threshold, the proportion of accessible sequences decays exponentially with sequence length.

03. Theoretical upper bounds on the linear coefficient relating prompt length to accessible sequence length.

Abstract

We study how we can leverage only a handful of characteristics of a transformer's architecture to closely predict the number of different sequences it can output, both qualitatively and quantitatively. We provide an upper bound depending on the length of the prompt, which we show empirically to be tight up to a factor less than 10, across architectures and model sizes. Our analysis also provides a theoretical explanation for previously observed empirical failures of transformers on simple sequence tasks, such as copying and cramming. Formally, we prove that (i) the maximal length of accessible sequences (those that the transformer can output for some prompt) grows linearly with the prompt length, (ii) beyond a critical threshold, the proportion of accessible sequences decays exponentially with sequence length, and (iii) the linear coefficient relating prompt length to accessible…
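Claim (ii) has an elementary counting intuition behind it, sketched below. This is a simplified pigeonhole illustration, not the authors' bound. It assumes greedy (deterministic) decoding and prompts of exactly n tokens over a vocabulary V, and it ignores the architectural characteristics the paper's bound depends on. There are only |V|^n prompts, so at most |V|^n distinct outputs exist, and the accessible fraction of length-m sequences is at most |V|^n / |V|^m = |V|^{-(m-n)} once m ≥ n. The function `toy_greedy_decoder` is hypothetical: any fixed deterministic map stands in for the model, and it is not a transformer.

```python
from itertools import product


def pigeonhole_fraction_bound(vocab_size: int, prompt_len: int, output_len: int) -> float:
    """Upper bound on the fraction of all length-`output_len` sequences that a
    deterministic prompt -> output map can produce from prompts of length
    exactly `prompt_len`: there are only vocab_size**prompt_len prompts,
    hence at most that many distinct outputs."""
    max_distinct_outputs = vocab_size ** prompt_len
    total_sequences = vocab_size ** output_len
    return min(1.0, max_distinct_outputs / total_sequences)


def toy_greedy_decoder(prompt: tuple[int, ...], output_len: int, vocab_size: int) -> tuple[int, ...]:
    """HYPOTHETICAL stand-in for greedy decoding. Any fixed deterministic
    function of the prompt suffices for the counting argument; this one is
    NOT a transformer, it just mixes the prompt into an integer state."""
    state = 0
    for tok in prompt:
        state = (state * 31 + tok + 1) % 1_000_003
    out = []
    for _ in range(output_len):
        # Linear congruential step; use high bits because the low bit alternates.
        state = (state * 1103515245 + 12345) % 2**31
        out.append((state >> 16) % vocab_size)
    return tuple(out)


def count_accessible(vocab_size: int, prompt_len: int, output_len: int) -> int:
    """Number of distinct length-`output_len` outputs reached by enumerating every prompt."""
    outputs = {
        toy_greedy_decoder(prompt, output_len, vocab_size)
        for prompt in product(range(vocab_size), repeat=prompt_len)
    }
    return len(outputs)


if __name__ == "__main__":
    V, n = 2, 3  # toy vocabulary size and prompt length
    for m in range(1, 9):
        reached = count_accessible(V, n, m)
        print(f"m={m}: accessible fraction {reached / V**m:.4f} "
              f"<= bound {pigeonhole_fraction_bound(V, n, m):.4f}")
```

With |V| = 2 and n = 3, the bound is 1 for m ≤ 3 and halves with each additional output token beyond that (for example 1/32 at m = 8). This shows only why the accessible proportion must eventually shrink exponentially; the paper's tighter, architecture-dependent bound is not reproduced here.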
