Length Generalization Bounds for Transformers
Andy Yang, Pascal Bergsträßer, Georg Zetzsche, David Chiang, Anthony W. Lin

TL;DR
This paper proves that no computable length generalization bound exists for transformers in general, but gives a computable bound for a positive fragment of CRASP, showing that the length complexity of this fragment is exponential and that the bound is optimal.
Contribution
It establishes the non-existence of computable bounds for CRASP with two layers, and offers computable bounds for a positive fragment linked to fixed-precision transformers.
Findings
No computable length generalization bounds for CRASP with two layers.
A computable bound exists for the positive fragment of CRASP.
Length complexity for positive CRASP and fixed-precision transformers is exponential.
Abstract
Length generalization is a key property of a learning algorithm that enables it to make correct predictions on inputs of any length, given finite training data. To provide such a guarantee, one needs to be able to compute a length generalization bound, beyond which the model is guaranteed to generalize. This paper concerns the open problem of the computability of such generalization bounds for CRASP, a class of languages which is closely linked to transformers. A positive partial result was recently shown by Chen et al. for CRASP with only one layer and, under some restrictions, also with two layers. We provide complete answers to the above open problem. Our main result is the non-existence of computable length generalization bounds for CRASP (already with two layers) and hence for transformers. To complement this, we provide a computable bound for the positive fragment of CRASP, which…
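To make the notion of a length generalization bound concrete, here is a minimal Python sketch. It is a toy illustration, not the paper's construction. It assumes a small, hand-picked hypothesis class over the alphabet {a, b} and treats a length N as a bound when any two distinct hypotheses already disagree on some string of length at most N. The names `first_disagreement`, `toy_length_bound`, and `HYPOTHESES` are hypothetical, and the `horizon` parameter is a stand-in for "all lengths", which cannot be checked exhaustively.

```python
from itertools import combinations, product

ALPHABET = "ab"  # assumed toy alphabet


def strings_of_length(n):
    """Yield every string of length n over ALPHABET."""
    return ("".join(t) for t in product(ALPHABET, repeat=n))


def first_disagreement(h1, h2, horizon):
    """Return the shortest length (<= horizon) at which h1 and h2 differ.

    Returns None if the two hypotheses agree on every string up to the
    horizon. This is only a proxy for equivalence on all lengths.
    """
    for n in range(horizon + 1):
        if any(h1(w) != h2(w) for w in strings_of_length(n)):
            return n
    return None


def toy_length_bound(hypotheses, horizon):
    """Largest first-disagreement length over all pairs of hypotheses.

    Training data up to this length separates every pair of hypotheses
    that can be separated within the horizon.
    """
    lengths = [
        first_disagreement(h1, h2, horizon)
        for h1, h2 in combinations(hypotheses, 2)
    ]
    return max((n for n in lengths if n is not None), default=0)


# Hypothetical hypothesis class: three simple counting languages.
HYPOTHESES = [
    lambda w: w.count("a") > w.count("b"),  # more a's than b's
    lambda w: w.count("a") >= 1,            # at least one a
    lambda w: w.count("a") >= 3,            # at least three a's
]

if __name__ == "__main__":
    print(toy_length_bound(HYPOTHESES, horizon=5))
```

For a finite class like this one the bound can be found by brute force. The abstract's negative result concerns the general setting, where no algorithm can compute such a bound for two-layer CRASP.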
Taxonomy
Topics: Machine Learning and Algorithms · Natural Language Processing Techniques · Complexity and Algorithms in Graphs
