On the Interpolation Error of Nonlinear Attention versus Linear Regression
Zhenyu Liao, Jiaqing Liu, TianQi Hou, Difan Zou, Zenan Ling

TL;DR
This paper analyzes the interpolation error of nonlinear Attention models in high-dimensional settings, showing they generally perform worse than linear regression unless the input contains a structured signal aligned with Attention weights.
Contribution
The paper provides the first explicit characterization of the interpolation error for nonlinear Attention in high dimensions, and identifies the conditions under which Attention outperforms linear regression.
Findings
Nonlinear Attention typically has higher interpolation error than linear regression.
The error gap closes or reverses when inputs contain structured signals aligned with Attention weights.
Numerical experiments support the theoretical predictions.
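The comparison in these findings can be illustrated with a small numerical sketch. This is not the authors' code or their exact model. It assumes a rank-one signal-plus-noise input, a softmax Attention with one fixed random weight matrix, and a ridge readout, and it measures in-sample mean-squared error as a stand-in for interpolation error. Every name and default below (compare_interpolation_errors, signal, reg, the scaling choices) is a hypothetical choice made for illustration.

```python
import numpy as np


def softmax_rows(scores):
    """Row-wise softmax with max-subtraction for numerical stability."""
    shifted = scores - scores.max(axis=1, keepdims=True)
    expd = np.exp(shifted)
    return expd / expd.sum(axis=1, keepdims=True)


def ridge_train_mse(features, targets, reg):
    """In-sample MSE of the ridge solution of (1/n)||y - F w||^2 + reg ||w||^2."""
    n, d = features.shape
    gram = features.T @ features + reg * n * np.eye(d)
    weights = np.linalg.solve(gram, features.T @ targets)
    residual = targets - features @ weights
    return float(np.mean(residual ** 2))


def compare_interpolation_errors(n=200, p=100, signal=2.0, reg=1e-2, seed=0):
    """Compare in-sample error of an Attention readout and plain linear regression.

    Assumed data model (illustrative only): X = y mu^T + Z, with labels
    y in {-1, +1}, a signal direction mu of norm `signal`, and i.i.d.
    standard Gaussian noise Z. n tokens and embedding dimension p are
    of comparable size.
    """
    rng = np.random.default_rng(seed)

    mu = rng.standard_normal(p)
    mu *= signal / np.linalg.norm(mu)
    y = rng.choice([-1.0, 1.0], size=n)
    X = np.outer(y, mu) + rng.standard_normal((n, p))

    # Fixed (untrained) Attention weights: one random matrix standing in
    # for the product of key and query weights. This is an assumption.
    W = rng.standard_normal((p, p)) / np.sqrt(p)
    attention = softmax_rows(X @ W @ X.T / np.sqrt(p))  # n x n, rows sum to 1
    attention_features = attention @ X                  # n x p

    return {
        "attention": ridge_train_mse(attention_features, y, reg),
        "linear": ridge_train_mse(X, y, reg),
    }


if __name__ == "__main__":
    print(compare_interpolation_errors())
```

Varying signal, or replacing W with a matrix aligned with mu, is one way to probe the regime in which the gap between the two errors narrows, but the sketch makes no claim about matching the paper's limiting expressions.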
Abstract
Attention has become the core building block of modern machine learning (ML) by efficiently capturing the long-range dependencies among input tokens. Its inherently parallelizable structure allows for efficient performance scaling with the rapidly increasing size of both data and model parameters. Despite its central role, the theoretical understanding of Attention, especially in the nonlinear setting, is progressing at a more modest pace. This paper provides a precise characterization of the interpolation error for a nonlinear Attention, in the high-dimensional regime where the number of input tokens and the embedding dimension are both large and comparable. Under a signal-plus-noise data model and for fixed Attention weights, we derive explicit (limiting) expressions for the mean-squared interpolation error. Leveraging recent advances in random matrix theory, we show that…
Taxonomy
Methods: ALIGN
