Pretrained transformer efficiently learns low-dimensional target functions in-context
Kazusato Oko, Yujin Song, Taiji Suzuki, Denny Wu

TL;DR
This paper shows that pretrained transformers with a nonlinear MLP layer can efficiently learn low-dimensional target functions in-context. They outperform algorithms that learn directly on the test prompt, whose sample complexity scales with the ambient dimension, which highlights the transformer's adaptivity to low-dimensional structure.
Contribution
The paper gives a theoretical analysis showing that nonlinear transformers can learn single-index functions efficiently in-context, with sample complexity that depends on the low-dimensional structure of the function class rather than on the ambient dimension.
Findings
Transformers learn low-dimensional target functions in-context with a prompt length that depends on the dimension of the function class rather than on the ambient dimension.
Nonlinear transformers optimized by gradient descent outperform methods that learn directly on the test prompt in high-dimensional settings.
Sample efficiency is achieved through the transformer's adaptivity to low-dimensional structures.
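To make the setting concrete, here is a minimal sketch of how one in-context prompt could be sampled from a single-index task. It is an illustration under stated assumptions, not the paper's construction: the Gaussian inputs, the quadratic link function, the choice of the first r coordinates as the low-dimensional subspace, and all function names are hypothetical.

```python
import math
import random


def sample_index_feature(d, r, rng):
    """Draw a unit-norm index feature in R^d supported on r coordinates.

    Assumption: the r-dimensional subspace is spanned by the first r
    standard basis vectors, chosen only to keep the sketch short.
    """
    coeffs = [rng.gauss(0.0, 1.0) for _ in range(r)]
    norm = math.sqrt(sum(c * c for c in coeffs))
    return [c / norm for c in coeffs] + [0.0] * (d - r)


def link(z):
    """Hypothetical nonlinear link function (second Hermite polynomial)."""
    return z * z - 1.0


def dot(u, v):
    """Inner product of two equal-length lists."""
    return sum(a * b for a, b in zip(u, v))


def sample_prompt(d, r, n, rng):
    """Sample n labelled examples plus one query from a single task.

    Each label is link(<x, beta>), so the target depends on x only
    through a one-dimensional projection.
    """
    beta = sample_index_feature(d, r, rng)
    # Assumption: isotropic Gaussian inputs in the ambient dimension d.
    xs = [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(n + 1)]
    ys = [link(dot(x, beta)) for x in xs]
    return beta, xs[:n], ys[:n], xs[n], ys[n]


rng = random.Random(0)
beta, xs, ys, x_query, y_query = sample_prompt(d=16, r=2, n=8, rng=rng)
```

The ambient dimension d and the subspace dimension r are separate parameters here, which is the distinction the findings turn on: inputs live in d dimensions, while the task distribution only varies over r of them.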
Abstract
Transformers can efficiently learn in-context from example demonstrations. Most existing theoretical analyses studied the in-context learning (ICL) ability of transformers for linear function classes, where it is typically shown that the minimizer of the pretraining loss implements one gradient descent step on the least squares objective. However, this simplified linear setting arguably does not demonstrate the statistical efficiency of ICL, since the pretrained transformer does not outperform directly solving linear regression on the test prompt. In this paper, we study ICL of a nonlinear function class via a transformer with a nonlinear MLP layer: given a class of single-index target functions $f_*(x) = \sigma_*(\langle x, \beta \rangle)$, where the index features $\beta \in \mathbb{R}^d$ are drawn from an $r$-dimensional subspace, we…
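The linear baseline mentioned in the abstract, one gradient descent step on the least squares objective, can be sketched as follows. This is an illustrative sketch only: the zero initialization, the plain (unpreconditioned) step, the default step size, and the function name are assumptions, not details taken from the paper.

```python
def one_step_gd_predict(xs, ys, x_query, step_size=1.0):
    """Predict the query label after one gradient step from w = 0.

    Objective: L(w) = (1 / 2n) * sum_i (<w, x_i> - y_i)^2.
    At w = 0 the gradient is -(1 / n) * sum_i y_i * x_i, so one step
    gives w = (step_size / n) * sum_i y_i * x_i.
    """
    n = len(xs)
    d = len(x_query)
    w = [0.0] * d
    for x_i, y_i in zip(xs, ys):
        for j in range(d):
            w[j] += step_size * y_i * x_i[j] / n
    # The prediction is linear in the labels and uses the examples only
    # through their inner products with the query.
    return sum(w_j * x_j for w_j, x_j in zip(w, x_query))


# Two in-context examples in two dimensions, then one query point.
prediction = one_step_gd_predict(
    xs=[[1.0, 0.0], [0.0, 1.0]],
    ys=[2.0, 4.0],
    x_query=[1.0, 1.0],
)
```

Because this predictor is linear in the query, it cannot represent a nonlinear link function, which is one way to see why the abstract moves to a transformer with a nonlinear MLP layer.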
Taxonomy
Topics: Neural Networks and Applications
Methods: Linear Regression
