How Many Different Outputs Can a Transformer Generate?
Maxime Meyer, Mario Michelessa, Caroline Chaux, Vincent Y. F. Tan

TL;DR
This paper analyzes how many distinct sequences a transformer can generate. It derives an upper bound that depends on prompt length and explains previously observed failures on simple sequence tasks such as copying and cramming, with both theoretical and empirical support.
Contribution
The paper offers a theoretical framework, validated empirically, for predicting the number of distinct sequences a transformer can generate from its prompt length and a handful of architectural characteristics.
Findings
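The maximal length of accessible sequences (those the transformer can output for some prompt) grows linearly with prompt length.
Beyond a critical threshold, the proportion of accessible sequences decays exponentially with sequence length.
The paper derives theoretical upper bounds on the linear coefficient relating prompt length and sequence length.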
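To build intuition for why the accessible proportion can decay exponentially, here is a minimal counting sketch. It is not the paper's bound, which according to the abstract depends on characteristics of the architecture. It is only a generic pigeonhole argument under two assumptions introduced here: decoding is deterministic (one output per prompt), and prompts have exactly `prompt_len` tokens. The function name and parameters are hypothetical.

```python
from fractions import Fraction


def accessible_fraction_upper_bound(vocab_size: int, prompt_len: int, seq_len: int) -> Fraction:
    """Pigeonhole upper bound on the fraction of length-`seq_len` sequences
    that can be reached from some prompt of length `prompt_len`.

    Assumption (not from the paper): decoding is deterministic, so each
    prompt maps to exactly one output sequence.
    """
    # There are vocab_size**prompt_len distinct prompts, hence at most that
    # many distinct outputs.
    num_prompts = vocab_size ** prompt_len
    # There are vocab_size**seq_len candidate sequences of the target length.
    num_sequences = vocab_size ** seq_len
    # A fraction cannot exceed 1, so cap the ratio.
    return min(Fraction(1), Fraction(num_prompts, num_sequences))


if __name__ == "__main__":
    # With a fixed prompt length, the bound shrinks by a factor of
    # vocab_size for every extra output token once seq_len > prompt_len.
    for seq_len in range(2, 8):
        print(seq_len, accessible_fraction_upper_bound(2, 3, seq_len))
```

In this toy setting the threshold sits exactly at `seq_len == prompt_len`. That value is an artifact of the simplifying assumptions and should not be read as the linear coefficient studied in the paper.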
Abstract
We study how we can leverage only a handful of characteristics of a transformer's architecture to closely predict the number of different sequences it can output, both qualitatively and quantitatively. We provide an upper bound depending on the length of the prompt, which we show empirically to be tight up to a factor less than 10, across architectures and model sizes. Our analysis also provides a theoretical explanation for previously observed empirical failures of transformers on simple sequence tasks, such as copying and cramming. Formally, we prove that (i) the maximal length of accessible sequences (those that the transformer can output for some prompt) grows linearly with the prompt length, (ii) beyond a critical threshold, the proportion of accessible sequences decays exponentially with sequence length, and (iii) the linear coefficient relating prompt length to accessible…
