In-context Learning for Mixture of Linear Regressions: Existence, Generalization and Training Dynamics
Yanhao Jin, Krishnakumar Balasubramanian, Lifeng Lai

TL;DR
This paper analyzes, theoretically and empirically, how transformers learn mixtures of linear regressions in context. It covers existence, generalization bounds, and training dynamics, and supports the theory with simulations.
Contribution
It offers the first theoretical insights into transformers' in-context learning for mixture models, including existence proofs, generalization bounds, and analysis of training dynamics.
Findings
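There exists a transformer that achieves prediction error of order √(d/n) with high probability in the high-SNR regime, where d is the dimension and n is the training prompt size.
In-context excess risk bounds of order L/√B are derived for the two-component case, where B is the number of training prompts and L is the number of attention layers.
Simulations show transformers outperforming traditional algorithms such as Expectation-Maximization (EM).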
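To make the EM baseline mentioned in the findings concrete, here is a minimal NumPy sketch. It assumes a symmetric two-component model in which each sample carries its own hidden sign, y_i = z_i·⟨β, x_i⟩ + noise with z_i ∈ {−1, +1}, Gaussian covariates, and a known noise level σ. That setup, and the function names `sample_prompt`, `em_symmetric_mlr`, and `sign_invariant_error`, are illustrative assumptions and are not taken from the paper.

```python
import numpy as np


def sample_prompt(beta, n, sigma, rng):
    """Draw n samples from an assumed symmetric two-component mixture.

    Each label is y_i = z_i * <beta, x_i> + sigma * noise, z_i uniform on {-1, +1}.
    """
    d = beta.shape[0]
    X = rng.standard_normal((n, d))
    z = rng.choice([-1.0, 1.0], size=n)
    y = z * (X @ beta) + sigma * rng.standard_normal(n)
    return X, y


def em_symmetric_mlr(X, y, sigma, beta_init, n_iters=50):
    """EM for the symmetric two-component mixture of linear regressions.

    E-step: tanh(y_i <x_i, beta> / sigma^2) is the posterior mean of z_i.
    M-step: weighted least squares using those soft signs.
    """
    beta = beta_init.copy()
    gram = X.T @ X
    for _ in range(n_iters):
        soft_sign = np.tanh(y * (X @ beta) / sigma**2)
        beta = np.linalg.solve(gram, X.T @ (soft_sign * y))
    return beta


def sign_invariant_error(beta_hat, beta):
    """Estimation error up to the global sign, which the model cannot identify."""
    return min(np.linalg.norm(beta_hat - beta), np.linalg.norm(beta_hat + beta))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    d, n, sigma = 5, 2000, 0.1
    beta = np.ones(d) / np.sqrt(d)
    X, y = sample_prompt(beta, n, sigma, rng)
    beta_hat = em_symmetric_mlr(X, y, sigma, beta_init=rng.standard_normal(d))
    print("error:", sign_invariant_error(beta_hat, beta))
```

The error is measured up to sign because β and −β generate the same distribution under this assumed model. Comparing that error across values of n and σ is one way to probe the √(d/n) scaling in a simulation.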
Abstract
We investigate the in-context learning capabilities of transformers for the d-dimensional mixture of linear regression model, providing theoretical insights into their existence, generalization bounds, and training dynamics. Specifically, we prove that there exists a transformer capable of achieving a prediction error of order √(d/n) with high probability, where n represents the training prompt size in the high signal-to-noise ratio (SNR) regime. Moreover, we derive in-context excess risk bounds of order L/√B for the case of two mixtures, where B denotes the number of training prompts, and L represents the number of attention layers. The dependence of L on the SNR is explicitly characterized, differing between low and high SNR settings. We further analyze the training dynamics of transformers with single linear self-attention layers,…
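For readers unfamiliar with the term, the following is a minimal sketch of a single linear self-attention layer acting on a prompt. It uses a parameterization common in the in-context learning literature: a residual connection plus a softmax-free attention term, with the prediction read off the query's label slot. Whether the paper uses exactly this form is an assumption, and the names `build_prompt`, `linear_self_attention`, `predict`, `W_pv`, and `W_kq` are hypothetical.

```python
import numpy as np


def build_prompt(X, y, x_query):
    """Stack n labelled examples and one query into a (d+1) x (n+1) matrix."""
    n, d = X.shape
    E = np.zeros((d + 1, n + 1))
    E[:d, :n] = X.T      # covariates of the in-context examples
    E[d, :n] = y         # their labels
    E[:d, n] = x_query   # query covariate; its label slot E[d, n] stays 0
    return E


def linear_self_attention(E, W_pv, W_kq):
    """One linear (softmax-free) attention layer with a residual connection."""
    n = E.shape[1] - 1
    return E + W_pv @ E @ (E.T @ W_kq @ E) / n


def predict(E, W_pv, W_kq):
    """Read the prediction from the query's label slot (bottom-right entry)."""
    return linear_self_attention(E, W_pv, W_kq)[-1, -1]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, d = 50, 4
    X = rng.standard_normal((n, d))
    y = rng.standard_normal(n)
    x_query = rng.standard_normal(d)

    # Illustrative weights: W_pv keeps only the label row, W_kq applies a
    # d x d matrix Gamma to the covariate block.
    Gamma = np.eye(d)
    W_pv = np.zeros((d + 1, d + 1))
    W_pv[d, d] = 1.0
    W_kq = np.zeros((d + 1, d + 1))
    W_kq[:d, :d] = Gamma

    print("prediction:", predict(build_prompt(X, y, x_query), W_pv, W_kq))
```

With the illustrative weights in the demo, the layer's output reduces to (1/n)·Σ y_i·x_iᵀ Γ x_query, a bilinear function of the prompt statistics and the query. Training dynamics in this kind of setting describe how such weight matrices evolve under gradient-based training.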
Taxonomy
Topics: Face and Expression Recognition
Methods: Softmax · Attention Is All You Need · Linear Regression
