Optimal In-context Adaptivity and Distributional Robustness of Transformers

Tianyi Ma; Tengyao Wang; Richard J. Samworth

arXiv:2510.23254·stat.ML·May 8, 2026

TL;DR

This paper analyzes in-context learning by Transformers pretrained on a mixture of tasks of varying difficulty. It proves that, at test time, the pretrained Transformer attains the optimal convergence rate for the test tasks' difficulty level and remains robust to distribution shift bounded in chi-squared divergence.

Contribution

It provides a theoretical framework showing that pretrained Transformers achieve optimal convergence rates under distribution shift, together with an optimality guarantee that is more appropriate than a standard minimax lower bound.

Findings

01

A large Transformer pretrained on sufficient data adapts to the difficulty level of the test tasks, converging faster on easier tasks.

02

The optimal convergence rate for difficulty level $β$ holds uniformly over test distributions $μ$ satisfying $χ^{2} (μ, π_{β}) \le κ$.

03

Even an estimator with access to the test distribution cannot achieve a faster convergence rate than the pretrained Transformer.

Abstract

We study in-context learning problems where a Transformer is pretrained on tasks drawn from a mixture distribution $π = \sum_{α \in A} λ_{α} π_{α}$, called the pretraining prior, in which each mixture component $π_{α}$ is a distribution on tasks of a specific difficulty level indexed by $α$. Our goal is to understand the performance of the pretrained Transformer when evaluated on a different test distribution $μ$, consisting of tasks of fixed difficulty $β \in A$, and with potential distribution shift relative to $π_{β}$, subject to the chi-squared divergence $χ^{2} (μ, π_{β})$ being at most $κ$. In particular, we consider nonparametric regression problems with random smoothness, and multi-index models with both random smoothness and random effective dimension. We prove that a large Transformer pretrained on…
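
To make the setup in the abstract concrete, here is a minimal Python sketch of a toy, finite version of it. This is an illustration only, not the paper's construction: tasks are reduced to indices in a finite set, the function names (`mixture_pmf`, `chi_squared_divergence`, `sample_task`) and all numerical values are hypothetical, and the divergence is assumed to follow the common convention $χ^{2}(μ, π) = \sum_{x} (μ(x) - π(x))^{2} / π(x)$, which may differ from the paper's normalization.

```python
import random


def mixture_pmf(weights, components):
    """Return pi = sum_alpha lambda_alpha * pi_alpha as a pmf on a shared finite support."""
    size = len(components[0])
    return [sum(w * comp[i] for w, comp in zip(weights, components)) for i in range(size)]


def chi_squared_divergence(mu, pi):
    """Assumed convention: sum_x (mu(x) - pi(x))^2 / pi(x).

    Returns infinity when mu puts mass where pi has none.
    """
    total = 0.0
    for m, p in zip(mu, pi):
        if p == 0.0:
            if m > 0.0:
                return float("inf")
            continue
        total += (m - p) ** 2 / p
    return total


def sample_task(weights, components, rng):
    """Draw a difficulty index alpha from lambda, then a task index from pi_alpha."""
    alpha = rng.choices(range(len(weights)), weights=weights)[0]
    task = rng.choices(range(len(components[alpha])), weights=components[alpha])[0]
    return alpha, task


# Hypothetical toy values: two difficulty levels, four possible tasks.
lam = [0.5, 0.5]                      # mixture weights lambda_alpha
pi_components = [
    [0.5, 0.5, 0.0, 0.0],             # pi_0: tasks of difficulty level 0
    [0.0, 0.0, 0.5, 0.5],             # pi_1: tasks of difficulty level 1
]
pi = mixture_pmf(lam, pi_components)  # pretraining prior

beta = 1                              # fixed test difficulty
mu = [0.0, 0.0, 0.75, 0.25]           # shifted test distribution on level-1 tasks
kappa = 0.5                           # allowed chi-squared radius

shift = chi_squared_divergence(mu, pi_components[beta])
within_ball = shift <= kappa

rng = random.Random(0)
alpha, task = sample_task(lam, pi_components, rng)
```

In this toy instance `shift` measures how far the test distribution `mu` has moved from the matching pretraining component, and `within_ball` checks the constraint $χ^{2} (μ, π_{β}) \le κ$ under which the abstract's guarantees are stated.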
