Trained Transformer Classifiers Generalize and Exhibit Benign Overfitting In-Context

Spencer Frei; Gal Vardi

arXiv:2410.01774·cs.LG·December 16, 2024

TL;DR

This paper analyzes the implicit regularization of gradient descent to show that linear transformers trained on random linear classification tasks generalize well and exhibit benign overfitting in-context, even when the in-context examples contain label noise.

Contribution

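The paper gives a theoretical analysis of the implicit regularization of gradient descent in linear transformers trained on classification tasks, characterizing how many pre-training tasks and in-context examples are needed for generalization and the conditions under which benign overfitting occurs.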

Findings

1. Trained transformers generalize well given sufficiently many pre-training tasks and in-context examples.

2. Transformers memorize noisy in-context examples but still generalize near-optimally.

3. This combination of memorizing label noise while still generalizing is benign overfitting, here occurring in-context.

Abstract

Transformers have the capacity to act as supervised learning algorithms: by properly encoding a set of labeled training ("in-context") examples and an unlabeled test example into an input sequence of vectors of the same dimension, the forward pass of the transformer can produce predictions for that unlabeled test example. A line of recent work has shown that when linear transformers are pre-trained on random instances for linear regression tasks, these trained transformers make predictions using an algorithm similar to that of ordinary least squares. In this work, we investigate the behavior of linear transformers trained on random linear classification tasks. Via an analysis of the implicit regularization of gradient descent, we characterize how many pre-training tasks and in-context examples are needed for the trained transformer to generalize well at test-time. We further show that…
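
The encoding described in the abstract can be illustrated with a small NumPy sketch. It is not the authors' implementation: the prompt layout (`build_prompt`), the simplified linear-attention readout (`linear_attention_predict`), the Gaussian-mixture task sampler (`sample_task`), and the choice `W = I` in place of trained weights are all assumptions made for illustration.

```python
import numpy as np

def build_prompt(X, y, x_query):
    """Stack N labeled examples and one unlabeled query into a (d+1) x (N+1) matrix.

    Assumed layout: column i holds (x_i, y_i); the last column holds (x_query, 0).
    """
    N, d = X.shape
    Z = np.zeros((d + 1, N + 1))
    Z[:d, :N] = X.T
    Z[d, :N] = y
    Z[:d, N] = x_query  # the query's label slot stays 0
    return Z

def linear_attention_predict(Z, W):
    """Assumed simplified readout: x_query^T W (1/N) sum_i y_i x_i."""
    d = Z.shape[0] - 1
    N = Z.shape[1] - 1
    X, y, x_query = Z[:d, :N], Z[d, :N], Z[:d, N]
    return float(x_query @ W @ (X @ y) / N)

def sample_task(rng, N, d, signal, noise_rate):
    """Hypothetical task: x = y * mu + Gaussian noise, with in-context labels flipped at noise_rate."""
    mu = rng.standard_normal(d)
    mu *= signal / np.linalg.norm(mu)
    y_clean = rng.choice([-1.0, 1.0], size=N + 1)
    X = y_clean[:, None] * mu + rng.standard_normal((N + 1, d))
    flips = rng.random(N) < noise_rate
    y_obs = np.where(flips, -y_clean[:N], y_clean[:N])
    # Return the noisy in-context set, the query input, and the query's clean label.
    return X[:N], y_obs, X[N], y_clean[N]

rng = np.random.default_rng(0)
d, N, trials = 20, 40, 500
W = np.eye(d)  # stand-in for trained weights (assumption)

correct = 0
for _ in range(trials):
    X, y, x_query, y_query = sample_task(rng, N, d, signal=3.0, noise_rate=0.1)
    score = linear_attention_predict(build_prompt(X, y, x_query), W)
    correct += int(np.sign(score) == y_query)

accuracy = correct / trials
print(f"in-context test accuracy over {trials} tasks: {accuracy:.3f}")
```

Each trial draws a fresh task, so the predictor sees the class mean only through the in-context examples. With `W = I` the readout reduces to correlating the query with the label-weighted average of the examples, which is one simple way a forward pass can act as a classifier.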

Code & Models

Repositories

spencerfrei/icl_classification (PyTorch, official)

Taxonomy

Topics: Structural Health Monitoring Techniques · Neural Networks and Applications

Methods: Sparse Evolutionary Training · Linear Regression