Transformer learns the cross-task prior and regularization for in-context learning
Fei Lu, Yue Yu

TL;DR
This paper investigates how transformers perform in-context learning for inverse linear regression. It shows that they learn a cross-task prior and a regularization strategy, and that they outperform traditional methods such as ridge regression, especially on ill-posed problems.
Contribution
The paper introduces a linear transformer that learns the inverse mapping from contextual examples to the underlying weight vector, shows that it implicitly learns both a prior and a regularization strategy, and offers insight into how transformers extract useful knowledge from context.
Findings
Transformers outperform ridge regression in inverse linear regression tasks.
The error scales linearly with the noise level, the ratio of task dimension to context length, and the condition number of the data.
Low task dimensionality relative to context length is crucial for successful learning.
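To make the findings above concrete, here is a minimal NumPy sketch of a rank-deficient inverse linear regression problem. It is not the paper's linear transformer. As a stand-in for a learned cross-task prior, it uses an oracle estimator that is handed the true prior covariance, and compares it with plain ridge regression. The generative model (weights drawn from a shared low-dimensional subspace `B`), the function name `simulate`, and all parameter values are assumptions chosen for illustration.

```python
import numpy as np


def simulate(d=20, n=10, k=3, sigma=0.05, lam=1e-2, trials=200, seed=0):
    """Compare ridge regression with a prior-aware estimator when n < d.

    d: number of unknowns, n: context length, k: task dimension (k < n < d).
    Assumed model (illustrative only): w = B z with z ~ N(0, I_k),
    y = X w + sigma * noise.
    """
    rng = np.random.default_rng(seed)
    # Hypothetical cross-task structure: every task's weight vector lies in
    # the same k-dimensional subspace spanned by the orthonormal columns of B.
    B, _ = np.linalg.qr(rng.standard_normal((d, k)))
    prior_cov = B @ B.T  # oracle prior covariance of w

    err_ridge = 0.0
    err_prior = 0.0
    for _ in range(trials):
        w = B @ rng.standard_normal(k)           # task-specific weights
        X = rng.standard_normal((n, d))          # n < d: rank-deficient
        y = X @ w + sigma * rng.standard_normal(n)

        # Ridge regression in dual form: X^T (X X^T + lam I)^{-1} y.
        w_ridge = X.T @ np.linalg.solve(X @ X.T + lam * np.eye(n), y)

        # Gaussian posterior mean under the oracle prior covariance:
        # S X^T (X S X^T + sigma^2 I)^{-1} y.
        gram = X @ prior_cov @ X.T + sigma**2 * np.eye(n)
        w_prior = prior_cov @ X.T @ np.linalg.solve(gram, y)

        err_ridge += np.sum((w_ridge - w) ** 2)
        err_prior += np.sum((w_prior - w) ** 2)

    return err_ridge / trials, err_prior / trials


mse_ridge, mse_prior = simulate()
print(f"ridge MSE: {mse_ridge:.4f}, prior-aware MSE: {mse_prior:.4f}")
```

In this toy setting ridge can only recover the part of the weight vector that lies in the row space of `X`, whereas the prior-aware estimator effectively solves for `k` unknowns from `n > k` observations. Changing `sigma` or `k / n` in `simulate` gives a rough feel for how the error responds to noise and to the task-dimension ratio, though it says nothing about what the transformer in the paper actually learns.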
Abstract
Transformers have shown a remarkable ability for in-context learning (ICL), making predictions based on contextual examples. However, while theoretical analyses have explored this prediction capability, the nature of the inferred context and its utility for downstream predictions remain open questions. This paper aims to address these questions by examining ICL for inverse linear regression (ILR), where context inference can be characterized by unsupervised learning of underlying weight vectors. Focusing on the challenging scenario of rank-deficient inverse problems, where context length is smaller than the number of unknowns in the weight vectors and regularization is necessary, we introduce a linear transformer to learn the inverse mapping from contextual examples to the underlying weight vector. Our findings reveal that the transformer implicitly learns both a prior distribution and…
Taxonomy
Topics: Domain Adaptation and Few-Shot Learning · Explainable Artificial Intelligence (XAI) · Human Pose and Action Recognition
Methods: Linear Regression
