Efficient and Minimax Optimal In-context Nonparametric Regression with Transformers
Michelle Ching, Ioana Popescu, Nico Smith, Tianyi Ma, William G. Underwood, Richard J. Samworth

TL;DR
This paper demonstrates that pretrained transformers can efficiently perform minimax optimal nonparametric regression in-context, requiring fewer parameters and pretraining sequences than previous methods.
Contribution
It introduces a transformer-based construction that approximates local polynomial estimators, achieving the minimax optimal rate of convergence with fewer parameters and pretraining sequences than previous results.
Findings
Transformers achieve minimax optimal convergence rates in nonparametric regression.
The method requires fewer parameters and pretraining sequences than prior approaches.
Transformers approximate local polynomial estimators by implementing a kernel-weighted polynomial basis and then running gradient descent.
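For orientation, the classical minimax benchmark for this problem can be written down explicitly. The notation here is an assumption introduced for illustration (n observations, d-dimensional covariates, a Hölder class \mathcal{H}(\alpha, L) with smoothness α > 0 and radius L); it states the standard textbook rate for Hölder regression, not a formula quoted from the paper.

```latex
\inf_{\hat f}\ \sup_{f \in \mathcal{H}(\alpha, L)}
  \mathbb{E}\bigl[\{\hat f(X) - f(X)\}^2\bigr]
  \;\asymp\; n^{-2\alpha/(2\alpha + d)}
% Example: \alpha = 2, d = 1 gives the familiar rate n^{-4/5}.
```

The exponent shrinks as d grows, which is the usual curse of dimensionality for nonparametric regression.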
Abstract
We study in-context learning for nonparametric regression with α-Hölder smooth regression functions, for some α > 0. We prove that, with n in-context examples and d-dimensional regression covariates, a pretrained transformer with parameters and pretraining sequences can achieve the minimax optimal rate of convergence in mean squared error. Our result requires substantially fewer transformer parameters and pretraining sequences than previous results in the literature. This is achieved by showing that transformers are able to approximate local polynomial estimators efficiently by implementing a kernel-weighted polynomial basis and then running gradient descent.
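The two ingredients named in the abstract, a kernel-weighted polynomial basis followed by gradient descent, can be illustrated with the classical local polynomial estimator itself. The following is a minimal NumPy sketch under stated assumptions: one-dimensional covariates, an Epanechnikov-type kernel, a fixed bandwidth `h`, and a step size of one over the largest eigenvalue of the Gram matrix. The function names and all of these choices are hypothetical; the sketch does not reproduce the paper's transformer construction.

```python
import numpy as np

def kernel_weighted_basis(X, x0, h, degree=1):
    """Square-root-kernel-weighted polynomial features centred at x0.

    Assumes 1-D covariates and an unnormalised Epanechnikov-type kernel.
    Returns the weighted basis matrix and the square-root weights.
    """
    u = (X - x0) / h
    w = np.maximum(1.0 - u**2, 0.0)                  # zero outside |u| <= 1
    P = np.vander(u, N=degree + 1, increasing=True)  # columns 1, u, u^2, ...
    sw = np.sqrt(w)
    return sw[:, None] * P, sw

def local_poly_gd(X, y, x0, h, degree=1, steps=500):
    """Estimate f(x0) by gradient descent on the kernel-weighted least squares loss."""
    B, sw = kernel_weighted_basis(X, x0, h, degree)
    t = sw * y                                       # weighted responses
    n = len(y)
    A = B.T @ B / n                                  # Gram matrix of the weighted basis
    b = B.T @ t / n
    lr = 1.0 / np.linalg.eigvalsh(A).max()           # assumes some points fall in the window
    theta = np.zeros(degree + 1)
    for _ in range(steps):
        theta -= lr * (A @ theta - b)                # gradient of 0.5 * theta'A theta - b'theta
    return theta[0]                                  # intercept is the fitted value at x0

# Illustrative usage on synthetic data (all values here are arbitrary choices).
rng = np.random.default_rng(0)
X_demo = rng.uniform(0.0, 1.0, size=2000)
y_demo = np.sin(2 * np.pi * X_demo) + 0.1 * rng.standard_normal(2000)
print(local_poly_gd(X_demo, y_demo, x0=0.3, h=0.1))
```

Because the basis is centred at the query point, the intercept coefficient is the estimate of the regression function there. The gradient descent iterates converge to the same solution as the closed-form weighted least squares fit whenever the Gram matrix is well conditioned.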
Taxonomy
Topics: Stochastic Gradient Optimization Techniques · Statistical Methods and Inference · Gaussian Processes and Bayesian Inference
