In-context Learning for Mixture of Linear Regressions: Existence, Generalization and Training Dynamics

Yanhao Jin; Krishnakumar Balasubramanian; Lifeng Lai

arXiv:2410.14183 · stat.ML · February 11, 2025

TL;DR

This paper provides a theoretical and empirical analysis of the in-context learning abilities of transformers for mixtures of linear regressions, covering existence, generalization bounds, and training dynamics, with simulations in which transformers outperform traditional algorithms such as EM.

Contribution

It offers the first theoretical insights into transformers' in-context learning for mixture models, including existence proofs, generalization bounds, and analysis of training dynamics.

Findings

01

In the high-SNR regime, there exists a transformer that achieves, with high probability, a prediction error of order √(d/n), where d is the dimension and n the prompt size.

02

For a mixture of two components, in-context excess risk bounds of order L/√B are derived, where B is the number of training prompts and L the number of attention layers.

03

Simulations show that transformers outperform traditional algorithms such as expectation-maximization (EM).
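For reference, the sketch below is a minimal textbook EM for one simple instance of the model, the symmetric two-component mixture y = z⟨β, x⟩ + ε with z uniform on {−1, +1}, x ~ N(0, I_d) and ε ~ N(0, σ²). This specific model, the known noise level σ, the warm start near the truth, and the helper names (`sample_symmetric_mlr`, `em_symmetric_mlr`, `estimation_error`) are illustrative assumptions, not the paper's experimental configuration or its EM baseline.

```python
import numpy as np


def sample_symmetric_mlr(n, beta, sigma, rng):
    """Draw n pairs from the assumed symmetric two-component model
    y = z * <beta, x> + eps, z uniform on {-1, +1}, x ~ N(0, I_d), eps ~ N(0, sigma^2)."""
    d = beta.shape[0]
    X = rng.standard_normal((n, d))
    z = rng.choice([-1.0, 1.0], size=n)
    y = z * (X @ beta) + sigma * rng.standard_normal(n)
    return X, y


def em_symmetric_mlr(X, y, beta0, sigma, n_iters=50):
    """Textbook EM for the symmetric two-component mixture (sigma assumed known)."""
    beta = beta0.copy()
    gram = X.T @ X
    for _ in range(n_iters):
        # E-step: posterior mean of the latent sign, E[z_i | x_i, y_i] = tanh(y_i <x_i, beta> / sigma^2)
        w = np.tanh(y * (X @ beta) / sigma**2)
        # M-step: least squares of the sign-corrected responses w_i * y_i on x_i
        beta = np.linalg.solve(gram, X.T @ (w * y))
    return beta


def estimation_error(beta_hat, beta):
    # beta and -beta give the same distribution, so compare up to sign.
    return min(np.linalg.norm(beta_hat - beta), np.linalg.norm(beta_hat + beta))


rng = np.random.default_rng(0)
d, n, sigma = 5, 2000, 0.1  # high-SNR setting: ||beta|| = 1, sigma = 0.1
beta_star = rng.standard_normal(d)
beta_star /= np.linalg.norm(beta_star)
X, y = sample_symmetric_mlr(n, beta_star, sigma, rng)
beta0 = beta_star + 0.3 * rng.standard_normal(d) / np.sqrt(d)  # hypothetical warm start near the truth
beta_hat = em_symmetric_mlr(X, y, beta0, sigma)
print(f"estimation error: {estimation_error(beta_hat, beta_star):.4f}")
```

The E-step uses the exact posterior mean of the latent sign, and the M-step is ordinary least squares on the sign-corrected responses. Because the sign of β is not identifiable in the symmetric model, the error is measured up to sign.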

Abstract

We investigate the in-context learning capabilities of transformers for the $d$-dimensional mixture of linear regression model, providing theoretical insights into their existence, generalization bounds, and training dynamics. Specifically, we prove that, in the high signal-to-noise ratio (SNR) regime, there exists a transformer capable of achieving a prediction error of order $O(\sqrt{d/n})$ with high probability, where $n$ represents the training prompt size. Moreover, we derive in-context excess risk bounds of order $O(L/\sqrt{B})$ for the case of two mixtures, where $B$ denotes the number of training prompts and $L$ represents the number of attention layers. The dependence of $L$ on the SNR is explicitly characterized, differing between low and high SNR settings. We further analyze the training dynamics of transformers with single linear self-attention layers,…
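To make the role of the prompt size n concrete, here is a minimal sketch of the simplified single-layer linear self-attention readout that is common in the in-context regression literature, ŷ = x_qᵀ Γ (1/n) Σᵢ yᵢ xᵢ. The helper names (`sample_prompt`, `lsa_predict`, `prompt_mse`), the Gaussian design, and the choice to draw one mixture component per prompt are hypothetical assumptions; the paper's parameterization and data model may differ. With Γ fixed to the identity, the sketch only illustrates that the prediction error shrinks as n grows; it is not the paper's trained transformer and does not reproduce its √(d/n) bound.

```python
import numpy as np


def sample_prompt(n, betas, probs, sigma, rng):
    """Hypothetical prompt sampler: draw one component k ~ probs for the whole prompt, then
    n in-context examples and one query with y = <beta_k, x> + eps, x ~ N(0, I_d)."""
    k = rng.choice(len(betas), p=probs)
    beta = betas[k]
    X = rng.standard_normal((n + 1, beta.shape[0]))
    y = X @ beta + sigma * rng.standard_normal(n + 1)
    return X[:n], y[:n], X[n], y[n]


def lsa_predict(X, y, x_query, Gamma):
    """Simplified single-layer linear self-attention readout:
    y_hat = x_query^T Gamma (1/n) sum_i y_i x_i, with Gamma a d x d matrix."""
    n = X.shape[0]
    return x_query @ Gamma @ (X.T @ y / n)


def prompt_mse(n, betas, probs, sigma, Gamma, rng, n_prompts=200):
    # Average squared error of the readout over freshly sampled prompts of length n.
    errs = []
    for _ in range(n_prompts):
        X, y, xq, yq = sample_prompt(n, betas, probs, sigma, rng)
        errs.append((lsa_predict(X, y, xq, Gamma) - yq) ** 2)
    return float(np.mean(errs))


rng = np.random.default_rng(1)
d, sigma = 5, 0.1
beta = rng.standard_normal(d)
beta /= np.linalg.norm(beta)
betas, probs = [beta, -beta], [0.5, 0.5]  # two components with equal weights (assumed)
Gamma = np.eye(d)                         # untrained identity matrix, not a learned Gamma
mse_short = prompt_mse(50, betas, probs, sigma, Gamma, rng)
mse_long = prompt_mse(2000, betas, probs, sigma, Gamma, rng)
print(f"MSE with n=50: {mse_short:.3f}, n=2000: {mse_long:.3f}")
```

With Γ = I and isotropic Gaussian inputs, (1/n) Σᵢ yᵢ xᵢ is an unbiased estimate of the prompt's β, so the excess error comes only from its variance, which decays like 1/n; the floor is the query noise σ².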
