Transformers Don't In-Context Learn Least Squares Regression
Joshua Hill, Benjamin Eyre, and Elliot Creager

TL;DR
This paper investigates how large transformers perform in-context learning on linear regression, finding that they do not implement classical algorithms such as least squares and that their behaviour is heavily influenced by the pretraining data distribution.
Contribution
The study challenges the assumption that transformers implement classical learning algorithms during in-context learning, showing that they fail to generalize out of distribution and that their behaviour is tied to a spectral signature associated with in-distribution inputs.
Findings
Transformers trained for ICL fail to generalize after shifts in the prompt distribution.
Spectral analysis reveals a unique signature for in-distribution inputs.
Performance correlates with the presence of this spectral signature.
Abstract
In-context learning (ICL) has emerged as a powerful capability of large pretrained transformers, enabling them to solve new tasks implicit in example input-output pairs without any gradient updates. Despite its practical success, the mechanisms underlying ICL remain largely mysterious. In this work we study synthetic linear regression to probe how transformers implement learning at inference time. Previous works have demonstrated that transformers match the performance of learning rules such as Ordinary Least Squares (OLS) regression or gradient descent and have suggested ICL is facilitated in transformers through the learned implementation of one of these techniques. In this work, we demonstrate through a suite of out-of-distribution generalization experiments that transformers trained for ICL fail to generalize after shifts in the prompt distribution, a behaviour that is inconsistent…
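The synthetic setup described in the abstract can be illustrated with a minimal sketch. It is not the authors' code: the dimension, prompt length, Gaussian sampling, and the use of an input rescaling as the "shift" are all assumptions made for illustration. It shows why out-of-distribution tests are informative: on noiseless prompts with more examples than dimensions, OLS recovers the weight vector exactly, so its error stays near zero however the inputs are scaled. A model that truly implemented OLS would share this invariance.

```python
import numpy as np

def sample_prompt(rng, n_points, dim, input_scale=1.0):
    """Draw one noiseless linear-regression prompt (hypothetical setup)."""
    w = rng.standard_normal(dim)                      # task weights
    xs = input_scale * rng.standard_normal((n_points, dim))
    ys = xs @ w                                       # in-context labels
    x_query = input_scale * rng.standard_normal(dim)  # held-out query
    return xs, ys, x_query, float(x_query @ w)

def ols_predict(xs, ys, x_query):
    """Fit least squares on the in-context pairs and predict the query."""
    w_hat, *_ = np.linalg.lstsq(xs, ys, rcond=None)
    return float(x_query @ w_hat)

def ols_squared_error(input_scale, n_prompts=100, n_points=40, dim=20, seed=0):
    """Mean squared query error of OLS over many sampled prompts."""
    rng = np.random.default_rng(seed)
    errors = []
    for _ in range(n_prompts):
        xs, ys, x_query, y_query = sample_prompt(rng, n_points, dim, input_scale)
        errors.append((ols_predict(xs, ys, x_query) - y_query) ** 2)
    return float(np.mean(errors))

# input_scale=1.0 plays the role of the pretraining distribution;
# input_scale=3.0 is an assumed shift in the prompt distribution.
in_dist_error = ols_squared_error(input_scale=1.0)
shifted_error = ols_squared_error(input_scale=3.0)
print(f"OLS error in-distribution: {in_dist_error:.2e}")
print(f"OLS error under shift:     {shifted_error:.2e}")
```

Replacing `ols_predict` with a trained transformer's query prediction would give the comparison the abstract describes: a gap that opens only under the shifted setting is evidence that the model is not running OLS.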
Taxonomy
Topics: Neural Networks and Applications
