Understanding In-Context Learning for Nonlinear Regression with Transformers: Attention as Featurizer

Alexander Hsu; Zhaiming Shen; Wenjing Liao; Rongjie Lai

arXiv:2605.05176·cs.LG·May 7, 2026

TL;DR

This paper develops a theoretical framework for understanding in-context learning with transformers in nonlinear regression, constructing explicit nonlinear features via attention and providing finite-sample error bounds.

Contribution

The paper gives an explicit construction of transformer networks whose attention layers realize nonlinear features, such as polynomial or spline bases, extending ICL theory beyond linear models.

Findings

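01 Finite-sample generalization error bounds are derived in terms of context length and training set size.

02 The constructed transformer networks realize nonlinear features such as polynomial and spline bases.

03 The theory is validated numerically on synthetic regression tasks.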
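
To give a feel for how a multiplicative interaction can produce polynomial features, here is a minimal sketch. It is not the paper's construction: it omits softmax, learned weights, and positional structure, and the helper names `interact` and `monomial_features` are hypothetical. It only illustrates that repeatedly taking pairwise products with a base token `[1, x]` yields the monomials `1, x, ..., x^d`.

```python
def interact(u, v):
    """Bilinear interaction: all pairwise products u[i] * v[j] (a flattened outer product)."""
    return [ui * vj for ui in u for vj in v]


def monomial_features(x, degree):
    """Build [1, x, ..., x^degree] by repeated interaction with the base token [1, x].

    Assumption for illustration: each "layer" keeps its input features and
    appends the single new highest-degree product.
    """
    base = [1.0, x]
    feats = [1.0]
    for _ in range(degree):
        prod = interact(feats, base)   # last entry is feats[-1] * x
        feats = feats + [prod[-1]]     # append the new highest power of x
    return feats


features = monomial_features(2.0, 3)   # powers of 2 up to degree 3
```

Each pass through the loop raises the maximum degree by one, which loosely mirrors the idea that stacking interaction layers enlarges the feature class.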

Abstract

Pre-trained transformers are able to learn from examples provided as part of the prompt without any weight updates, a remarkable ability known as in-context learning (ICL). Despite its demonstrated efficacy across various domains, the theoretical understanding of ICL is still developing. Whereas most existing theory has focused on linear models, we study ICL in the nonlinear regression setting. Through the interaction mechanism in attention, we explicitly construct transformer networks to realize nonlinear features, such as polynomial or spline bases, which span a wide class of functions. Based on this construction, we establish a framework to analyze end-to-end in-context nonlinear regression with the constructed features. Our theory provides finite-sample generalization error bounds in terms of context length and training set size. We numerically validate the theory on synthetic…
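
The in-context regression setting described above can be sketched as follows. This is an illustrative stand-in, not the paper's method: the context examples are featurized with a polynomial basis and a least-squares fit is computed directly from the prompt, with no trained weights involved. The target function `f`, the degree, the context length of 32, and the function names `poly_features` and `in_context_predict` are all assumptions chosen for the example.

```python
import numpy as np


def poly_features(x, degree):
    """Map a 1-D array x to columns [1, x, ..., x^degree]."""
    return np.vander(x, degree + 1, increasing=True)


def in_context_predict(x_ctx, y_ctx, x_query, degree=3):
    """Fit least squares on the context pairs only, then predict at the queries.

    Nothing is learned ahead of time: the estimator is a function of the prompt.
    """
    phi_ctx = poly_features(x_ctx, degree)
    coef, *_ = np.linalg.lstsq(phi_ctx, y_ctx, rcond=None)
    return poly_features(x_query, degree) @ coef


# Hypothetical nonlinear target (a cubic), used only for this illustration.
def f(x):
    return 1.0 - 2.0 * x + 0.5 * x**3


rng = np.random.default_rng(0)
x_ctx = rng.uniform(-1.0, 1.0, size=32)   # context inputs in the prompt
y_ctx = f(x_ctx)                          # noiseless context labels
x_query = np.array([-0.5, 0.0, 0.7])      # query points to predict

pred = in_context_predict(x_ctx, y_ctx, x_query)
err = float(np.max(np.abs(pred - f(x_query))))
```

Because the hypothetical target lies in the span of the chosen features and the labels are noiseless, the prediction error here is at the level of floating-point precision. With noisy labels or a target outside the feature span, the error would depend on the context length, which is the kind of dependence the paper's bounds quantify.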
