Transformers are almost optimal metalearners for linear classification
Roey Magen, Gal Vardi

TL;DR
This paper provides a theoretical analysis showing that a simplified transformer trained via gradient descent can act as a near-optimal metalearner for linear classification, generalizing to new tasks from only a few in-context examples.
Contribution
It provides the first formal analysis showing that transformers can act as effective metalearners in a linear classification setting, with bounds on the number of in-context examples that are independent of the ambient dimension.
Findings
Transformers trained with gradient descent can act as near-optimal metalearners.
The number of in-context examples needed scales with the subspace dimension, not the ambient dimension.
The trained transformer outperforms learners that have access only to the in-context data, requiring fewer examples to generalize.
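The dimension claim in the findings can be illustrated with a toy simulation. This is a minimal sketch under our own assumptions, not the paper's construction or proof: the shared subspace is taken to be span(e_1, ..., e_k) of R^d, each task draws labels y in {-1, +1} with x ~ N(y·μ, I), and the "subspace-aware" learner is simply handed the subspace, standing in for what a metalearner could extract from many training tasks. All function names are hypothetical. Both learners classify with sign(w · x), where w is the empirical mean (1/n) Σ y_i x_i; the only difference is whether w is projected onto the subspace. For this Gaussian model the test accuracy of sign(w · x) is exactly Φ(w · μ / ‖w‖), so no test sampling is needed.

```python
import math
import random


def sample_task_mean(d, k, signal, rng):
    """Assumption: task mean of norm `signal` lying in span(e_1, ..., e_k) of R^d."""
    coeffs = [rng.gauss(0.0, 1.0) for _ in range(k)]
    norm = math.sqrt(sum(c * c for c in coeffs))
    return [signal * c / norm for c in coeffs] + [0.0] * (d - k)


def sample_examples(mu, n, rng):
    """Assumed model: labels y uniform in {-1, +1}; features x ~ N(y * mu, I)."""
    examples = []
    for _ in range(n):
        y = rng.choice((-1, 1))
        examples.append(([y * m + rng.gauss(0.0, 1.0) for m in mu], y))
    return examples


def mean_estimate(examples):
    """(1/n) * sum_i y_i x_i, an unbiased estimate of mu."""
    n, d = len(examples), len(examples[0][0])
    return [sum(y * x[i] for x, y in examples) / n for i in range(d)]


def accuracy(w, mu):
    """Exact test accuracy of sign(w . x) under the model: Phi(w . mu / ||w||)."""
    dot = sum(a * b for a, b in zip(w, mu))
    norm = math.sqrt(sum(a * a for a in w))
    return 0.5 * (1.0 + math.erf(dot / (norm * math.sqrt(2.0))))


def compare(d=100, k=2, n=10, signal=2.0, num_tasks=200, seed=0):
    """Average accuracy of the two hypothetical learners over random tasks."""
    rng = random.Random(seed)
    icl_only, subspace = 0.0, 0.0
    for _ in range(num_tasks):
        mu = sample_task_mean(d, k, signal, rng)
        w = mean_estimate(sample_examples(mu, n, rng))
        # In-context-only learner: uses the raw estimate in all d coordinates.
        icl_only += accuracy(w, mu)
        # Subspace-aware learner: keeps only the k subspace coordinates.
        subspace += accuracy(w[:k] + [0.0] * (d - k), mu)
    return icl_only / num_tasks, subspace / num_tasks


icl_acc, sub_acc = compare()
print(f"in-context only: {icl_acc:.3f}  subspace-aware: {sub_acc:.3f}")
```

With n much smaller than d, the unprojected estimate carries noise of squared norm roughly d/n spread over all coordinates, while the projected one only pays for the k subspace coordinates (roughly k/n). That is the intuition behind a sample requirement that scales with the subspace dimension rather than the ambient one.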
Abstract
Transformers have demonstrated impressive in-context learning (ICL) capabilities, raising the question of whether they can serve as metalearners that adapt to new tasks using only a small number of in-context examples, without any further training. While recent theoretical work has studied transformers' ability to perform ICL, most of these analyses do not address the formal metalearning setting, where the objective is to solve a collection of related tasks more efficiently than would be possible by solving each task individually. In this paper, we provide the first theoretical analysis showing that a simplified transformer architecture trained via gradient descent can act as a near-optimal metalearner in a linear classification setting. We consider a natural family of tasks where each task corresponds to a class-conditional Gaussian mixture model, with the mean vectors lying in a…
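To make "simplified transformer trained via gradient descent" concrete, the sketch below assumes a single linear-attention layer whose output on a query x_q reduces to the bilinear score x_q^T W μ̂, with μ̂ = (1/N) Σ_i y_i x_i aggregated from the prompt. This form appears in some theoretical analyses of in-context classification, but it is an assumption here; the paper's exact parameterization, loss, and training schedule may differ. W is trained with plain gradient descent on the logistic loss log(1 + exp(-y_q x_q^T W μ̂)) over prompts from a class-conditional Gaussian mixture whose means lie in a k-dimensional subspace (identity covariance and the subspace span(e_1, ..., e_k) are our assumptions). The function names are hypothetical.

```python
import math
import random


def sample_task(d, k, signal, rng):
    # Assumption: task mean of norm `signal` in span(e_1, ..., e_k).
    c = [rng.gauss(0.0, 1.0) for _ in range(k)]
    s = math.sqrt(sum(v * v for v in c))
    return [signal * v / s for v in c] + [0.0] * (d - k)


def draw(mu, rng):
    # One labelled example: y uniform in {-1, +1}, x ~ N(y * mu, I).
    y = rng.choice((-1, 1))
    return [y * m + rng.gauss(0.0, 1.0) for m in mu], y


def prob_wrong(margin):
    # Numerically stable 1 / (1 + exp(margin)), the logistic-loss weight.
    if margin >= 0:
        e = math.exp(-margin)
        return e / (1.0 + e)
    return 1.0 / (1.0 + math.exp(margin))


def train_linear_attention(d=10, k=2, n_ctx=20, batch=16, steps=200,
                           lr=0.2, signal=2.0, seed=0):
    """Gradient descent on W for the hypothetical predictor x_q^T W mu_hat."""
    rng = random.Random(seed)
    W = [[0.0] * d for _ in range(d)]
    for _ in range(steps):
        descent = [[0.0] * d for _ in range(d)]  # negative batch gradient
        for _ in range(batch):
            mu = sample_task(d, k, signal, rng)
            ctx = [draw(mu, rng) for _ in range(n_ctx)]
            # Prompt summary (1/N) sum_i y_i x_i.
            mu_hat = [sum(y * x[j] for x, y in ctx) / n_ctx for j in range(d)]
            x_q, y_q = draw(mu, rng)
            w_mu = [sum(W[i][j] * mu_hat[j] for j in range(d)) for i in range(d)]
            margin = y_q * sum(x_q[i] * w_mu[i] for i in range(d))
            coef = prob_wrong(margin) * y_q / batch
            # -d/dW log(1 + exp(-margin)) = prob_wrong(margin) * y_q * x_q mu_hat^T
            for i in range(d):
                for j in range(d):
                    descent[i][j] += coef * x_q[i] * mu_hat[j]
        for i in range(d):
            for j in range(d):
                W[i][j] += lr * descent[i][j]
    return W


W = train_linear_attention()
d, k = len(W), 2
on_subspace = sum(W[i][i] for i in range(k)) / k
outside = [abs(W[i][j]) for i in range(d) for j in range(d) if i >= k or j >= k]
off_subspace = sum(outside) / len(outside)
print(f"mean diagonal on subspace: {on_subspace:.3f}, "
      f"mean |entry| elsewhere: {off_subspace:.3f}")
```

Under these assumptions, the diagonal entries of W on the subspace coordinates should grow while entries touching the remaining d - k coordinates stay near zero, so the trained predictor approximately projects onto the subspace before scoring. That is one way to see how training across many tasks can let a model get by with fewer in-context examples than a learner that sees only the prompt.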
Taxonomy
Topics: Domain Adaptation and Few-Shot Learning · Face recognition and analysis · Advanced Neural Network Applications
