Trained Transformer Classifiers Generalize and Exhibit Benign Overfitting In-Context
Spencer Frei, Gal Vardi

TL;DR
This paper analyzes the implicit regularization of gradient descent to show how linear transformers trained on random linear classification tasks can generalize well and exhibit benign overfitting, even when in-context examples contain label noise.
Contribution
It provides a theoretical analysis of the implicit regularization in linear transformers trained on classification tasks, revealing conditions for generalization and benign overfitting.
Findings
Trained transformers can generalize well given sufficiently many pre-training tasks and in-context examples.
Transformers memorize noisy in-context examples but still generalize near-optimally.
This benign overfitting can occur in-context: the transformer fits in-context examples with noisy labels, yet its predictions on test examples remain accurate.
Abstract
Transformers have the capacity to act as supervised learning algorithms: by properly encoding a set of labeled training ("in-context") examples and an unlabeled test example into an input sequence of vectors of the same dimension, the forward pass of the transformer can produce predictions for that unlabeled test example. A line of recent work has shown that when linear transformers are pre-trained on random instances for linear regression tasks, these trained transformers make predictions using an algorithm similar to that of ordinary least squares. In this work, we investigate the behavior of linear transformers trained on random linear classification tasks. Via an analysis of the implicit regularization of gradient descent, we characterize how many pre-training tasks and in-context examples are needed for the trained transformer to generalize well at test-time. We further show that…
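The encoding described in the abstract can be illustrated with a minimal sketch. It assumes one common convention: each labeled example (x_i, y_i) becomes a token of dimension d+1, and the unlabeled test example becomes a final token whose label slot is 0. The function names `encode` and `predict`, the matrix `W`, and the averaging form of the score are hypothetical choices made for illustration; they are not taken from the paper, whose exact parameterization may differ.

```python
def encode(xs, ys, x_test):
    """Stack n labeled examples and one unlabeled query into n+1 tokens of dimension d+1."""
    tokens = [list(x) + [float(y)] for x, y in zip(xs, ys)]
    tokens.append(list(x_test) + [0.0])  # the query's label slot is left at 0
    return tokens


def predict(tokens, W=None):
    """Return a real-valued score for the query; its sign is the predicted label.

    Assumed form (illustrative only): score = (1/n * sum_i y_i x_i)^T W x_test,
    where W is a stand-in for trained weights (identity if not given).
    """
    *context, query = tokens
    d = len(query) - 1
    x_test = query[:d]
    if W is None:
        W = [[1.0 if i == j else 0.0 for j in range(d)] for i in range(d)]
    # Apply W to the query features.
    Wx = [sum(W[i][j] * x_test[j] for j in range(d)) for i in range(d)]
    n = len(context)
    # Label-weighted average of inner products between context features and W x_test.
    return sum(tok[d] * sum(tok[i] * Wx[i] for i in range(d)) for tok in context) / n


# Two in-context examples in d = 2, plus one query point.
tokens = encode([[1.0, 0.0], [-1.0, 0.0]], [1, -1], [2.0, 0.0])
score = predict(tokens)
label = 1 if score >= 0 else -1
```

Here every token has the same dimension (d+1 = 3), which is what lets labeled and unlabeled examples share one input sequence. The sign of `score` plays the role of the predicted label for the query.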
Taxonomy
Topics: Structural Health Monitoring Techniques · Neural Networks and Applications
Methods: Sparse Evolutionary Training · Linear Regression
