From Small to Large: Generalization Bounds for Transformers on Variable-Size Inputs
Anastasiia Alokhina, Pan Li

TL;DR
This paper provides a theoretical analysis of how Transformers trained on small inputs generalize to larger variable-size inputs, especially for geometric data such as point clouds and graphs, by establishing error bounds based on sampling density and the intrinsic dimension of the data manifold.
Contribution
It introduces a novel theoretical framework and error bounds for Transformers' size generalization on geometric data, linking performance to sampling density and data manifold properties.
Findings
Theoretical bounds accurately predict Transformer behavior across input sizes.
Experimental results on graphs and point clouds validate the tightness of the bounds.
For Transformers with stable positional encodings, the error bound is determined by the sampling density and the intrinsic dimension of the data manifold.
Abstract
Transformers exhibit a notable property of size generalization: they can extrapolate from smaller token sets to significantly longer ones. This behavior has been documented across diverse applications, including point clouds, graphs, and natural language. Despite its empirical success, this capability still lacks a rigorous theoretical characterization. In this paper, we develop a theoretical framework to analyze this phenomenon for geometric data, which we represent as discrete samples from a continuous source (e.g., point clouds from manifolds, graphs from graphons). Our core contribution is a bound on the error between the Transformer's output for a discrete sample and its continuous-domain equivalent. We prove that for Transformers with stable positional encodings, this bound is determined by the sampling density and the intrinsic dimensionality of the…
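The central object in the abstract is the gap between a Transformer's output on a finite sample and its output on the continuous source. A toy sketch of that gap is given below, under assumptions that are ours and not the paper's: a single softmax attention head with identity query/key/value maps, no positional encodings, and tokens sampled uniformly from the unit circle (a manifold of intrinsic dimension 1). The function names `attention_readout`, `continuous_readout`, and `mean_error` are hypothetical. The decay it shows is the generic Monte Carlo behavior of a softmax-weighted average, not the paper's bound.

```python
import math
import random


def attention_readout(query, points, temperature=1.0):
    """Softmax attention of one query over a finite set of 2-D tokens.

    Hypothetical toy head: identity query/key/value maps, so the output is
    sum_j softmax_j(<query, x_j> / temperature) * x_j.
    """
    scores = [(query[0] * x + query[1] * y) / temperature for x, y in points]
    m = max(scores)  # subtract the max score for numerical stability
    weights = [math.exp(s - m) for s in scores]
    total = sum(weights)
    out_x = sum(w * p[0] for w, p in zip(weights, points)) / total
    out_y = sum(w * p[1] for w, p in zip(weights, points)) / total
    return (out_x, out_y)


def circle_points(angles):
    """Embed angles as points on the unit circle (assumed data manifold)."""
    return [(math.cos(t), math.sin(t)) for t in angles]


def continuous_readout(query, temperature=1.0, grid=20000):
    """Approximate the continuous-domain output under the uniform measure.

    A dense equispaced grid is used as quadrature; for smooth periodic
    integrands like these it converges very quickly.
    """
    angles = [2 * math.pi * k / grid for k in range(grid)]
    return attention_readout(query, circle_points(angles), temperature)


def mean_error(n, trials=20, seed=0, query=(1.0, 0.0)):
    """Average distance between the n-sample output and the continuous one."""
    rng = random.Random(seed)
    target = continuous_readout(query)
    errs = []
    for _ in range(trials):
        angles = [rng.uniform(0.0, 2 * math.pi) for _ in range(n)]
        out = attention_readout(query, circle_points(angles))
        errs.append(math.dist(out, target))
    return sum(errs) / trials


if __name__ == "__main__":
    for n in (16, 256, 4096):
        print(n, mean_error(n))
```

Denser samples of the same manifold bring the discrete output closer to its continuous counterpart, which is the qualitative link between sampling density and error that the abstract describes. The paper's actual architecture, positional encodings, and rates may differ from this toy setup.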
Taxonomy
Topics: Advanced Graph Neural Networks · Topological and Geometric Data Analysis · Graph Theory and Algorithms
