Transformers Meet In-Context Learning: A Universal Approximation Theory

Gen Li; Yuchen Jiao; Yu Huang; Yuting Wei; Yuxin Chen

arXiv:2506.05200 · cs.LG · August 29, 2025

TL;DR

This paper develops a universal approximation theory for transformers that explains how they perform in-context learning for a broad class of functions without weight updates, covering settings beyond convex optimization problems.

Contribution

It introduces a theoretical framework that combines Barron's universal function approximation theory with the algorithm-approximator viewpoint to explain transformers' in-context learning capabilities.

Findings

01. A transformer can be constructed that, given a few noisy in-context examples, predicts functions from a general class with vanishingly small risk.

02. A transformer can be constructed to find linear representations in a manner similar to Lasso.

03. The theory extends beyond convex problems such as linear regression.
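To make the Lasso-style finding concrete, here is a minimal sketch of in-context prediction carried out explicitly rather than by a transformer forward pass. It is an illustration under stated assumptions, not the paper's construction: the fixed sinusoidal feature dictionary (`features`, `freqs`), the penalty `lam`, the ISTA solver, and all function names are hypothetical choices made here. The idea it shows is that a feature map shared across tasks stays fixed, and only a sparse linear readout is fit from the in-context examples.

```python
import math
import random


def soft_threshold(x, t):
    """Proximal operator of t * |x|: shrink x toward zero by t."""
    return math.copysign(max(abs(x) - t, 0.0), x)


def features(x, freqs):
    """Hypothetical fixed dictionary shared by all tasks: 1, cos(w x), sin(w x)."""
    return [1.0] + [f(w * x) for w in freqs for f in (math.cos, math.sin)]


def lasso_ista(Phi, y, lam, steps=500):
    """Minimize (1/(2n)) * ||Phi c - y||^2 + lam * ||c||_1 with ISTA."""
    n, d = len(Phi), len(Phi[0])
    # Upper bound on the gradient's Lipschitz constant via the trace of Phi^T Phi / n.
    lip = sum(v * v for row in Phi for v in row) / n
    coef = [0.0] * d
    for _ in range(steps):
        resid = [sum(c * p for c, p in zip(coef, row)) - yi
                 for row, yi in zip(Phi, y)]
        grad = [sum(r * row[j] for r, row in zip(resid, Phi)) / n
                for j in range(d)]
        coef = [soft_threshold(c - g / lip, lam / lip)
                for c, g in zip(coef, grad)]
    return coef


def in_context_predict(examples, x_query, freqs, lam=0.01):
    """Fit a sparse readout on the in-context examples, then predict at x_query."""
    Phi = [features(x, freqs) for x, _ in examples]
    y = [yi for _, yi in examples]
    coef = lasso_ista(Phi, y, lam)
    return sum(c * p for c, p in zip(coef, features(x_query, freqs)))


# Toy task (an assumption for illustration): y = sin(2x) plus small noise.
rng = random.Random(0)
xs = [rng.uniform(-math.pi, math.pi) for _ in range(40)]
examples = [(x, math.sin(2 * x) + rng.gauss(0.0, 0.05)) for x in xs]
prediction = in_context_predict(examples, 0.5, freqs=[1.0, 2.0, 3.0])
print(prediction, math.sin(1.0))
```

No parameters of the feature map change between tasks; swapping in a different set of `examples` changes only the readout coefficients, which loosely mirrors prediction without weight updates.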

Abstract

Large language models are capable of in-context learning, the ability to perform new tasks at test time using a handful of input-output examples, without parameter updates. We develop a universal approximation theory to elucidate how transformers enable in-context learning. For a general class of functions (each representing a distinct task), we demonstrate how to construct a transformer that, without any further weight updates, can predict based on a few noisy in-context examples with vanishingly small risk. Unlike prior work that frames transformers as approximators of optimization algorithms (e.g., gradient descent) for statistical learning tasks, we integrate Barron's universal function approximation theory with the algorithm approximator viewpoint. Our approach yields approximation guarantees that are not constrained by the effectiveness of the optimization algorithms being…


Taxonomy

Topics: Stochastic Gradient Optimization Techniques · Domain Adaptation and Few-Shot Learning · Machine Learning and Algorithms