Efficient and Minimax Optimal In-context Nonparametric Regression with Transformers

Michelle Ching; Ioana Popescu; Nico Smith; Tianyi Ma; William G. Underwood; Richard J. Samworth

arXiv:2601.15014 · stat.ML · May 20, 2026

TL;DR

This paper proves that a pretrained transformer can perform in-context nonparametric regression at the minimax optimal rate, using fewer parameters and pretraining sequences than previous results in the literature.

Contribution

It shows that transformers can efficiently approximate local polynomial estimators, which yields the minimax optimal rate of convergence with $\Theta(\log n)$ parameters and $\Omega(n^{2\alpha/(2\alpha+d)} \log^{3} n)$ pretraining sequences.

Findings

01

Transformers achieve minimax optimal convergence rates in nonparametric regression.

02

The method requires fewer parameters and pretraining sequences than prior approaches.

03

Transformers approximate local polynomial estimators by implementing a kernel-weighted polynomial basis and then running gradient descent.

Abstract

We study in-context learning for nonparametric regression with $\alpha$-Hölder smooth regression functions, for some $\alpha > 0$. We prove that, with $n$ in-context examples and $d$-dimensional regression covariates, a pretrained transformer with $\Theta(\log n)$ parameters and $\Omega(n^{2\alpha/(2\alpha+d)} \log^{3} n)$ pretraining sequences can achieve the minimax optimal rate of convergence $O(n^{-2\alpha/(2\alpha+d)})$ in mean squared error. Our result requires substantially fewer transformer parameters and pretraining sequences than previous results in the literature. This is achieved by showing that transformers are able to approximate local polynomial estimators efficiently by implementing a kernel-weighted polynomial basis and then running gradient descent.
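
To make the last sentence of the abstract concrete, the following is a minimal sketch of a classical local polynomial estimator computed by gradient descent on a kernel-weighted least-squares objective. It is not the paper's transformer construction. It only illustrates the statistical procedure that the transformer is said to approximate, in dimension $d = 1$. The Epanechnikov kernel, the bandwidth `h`, the polynomial degree, the step size, and all function names are assumptions chosen for illustration.

```python
import numpy as np

def epanechnikov(u):
    # Compactly supported kernel (an assumed choice); zero weight outside |u| <= 1.
    return np.where(np.abs(u) <= 1.0, 0.75 * (1.0 - u**2), 0.0)

def local_poly_design(x, x0, h, degree):
    # Polynomial basis in the rescaled covariate u = (x - x0) / h,
    # together with the kernel weight of each example.
    u = (x - x0) / h
    basis = np.vander(u, degree + 1, increasing=True)  # columns 1, u, u^2, ...
    return basis, epanechnikov(u)

def local_poly_gd(x, y, x0, h, degree=2, steps=2000):
    # Gradient descent on (1/n) * sum_i w_i * (y_i - basis_i @ theta)^2.
    basis, w = local_poly_design(x, x0, h, degree)
    n = len(x)
    hessian = 2.0 / n * basis.T @ (w[:, None] * basis)
    step = 1.0 / np.linalg.eigvalsh(hessian)[-1]  # 1 / largest eigenvalue
    theta = np.zeros(degree + 1)
    for _ in range(steps):
        grad = -2.0 / n * basis.T @ (w * (y - basis @ theta))
        theta -= step * grad
    return theta[0]  # the intercept is the estimate of f(x0)

def local_poly_exact(x, y, x0, h, degree=2):
    # Closed-form weighted least squares, used only as a reference.
    basis, w = local_poly_design(x, x0, h, degree)
    gram = basis.T @ (w[:, None] * basis)
    return np.linalg.solve(gram, basis.T @ (w * y))[0]

# Synthetic in-context examples (hypothetical data for illustration only).
rng = np.random.default_rng(0)
n = 2000
x = rng.uniform(0.0, 1.0, size=n)
f = lambda t: np.sin(2 * np.pi * t)
y = f(x) + 0.1 * rng.standard_normal(n)

x0, h = 0.3, 0.1
est_gd = local_poly_gd(x, y, x0, h)
est_exact = local_poly_exact(x, y, x0, h)
print(est_gd, est_exact, f(x0))
```

In this sketch the gradient descent iterate and the closed-form weighted least-squares solution should agree closely, since the objective is a well-conditioned quadratic in `theta` at an interior point. The bandwidth is fixed by hand here; a theoretically motivated choice would scale it with $n$, which the sketch does not attempt.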

Taxonomy

Topics: Stochastic Gradient Optimization Techniques · Statistical Methods and Inference · Gaussian Processes and Bayesian Inference