Investigation into In-Context Learning Capabilities of Transformers
Rushil Chandrupatla, Leo Bangayan, Sebastian Leng

TL;DR
This paper empirically investigates the scaling behavior of in-context learning in transformers on Gaussian-mixture binary classification tasks, analyzing how test accuracy depends on the input dimension, the number of in-context examples, and task diversity (the number of pre-training tasks).
Contribution
The paper provides a systematic empirical analysis of in-context learning, extending the theoretical framework of Frei and Vardi (2024) with a detailed study of the geometric and data factors that affect success.
Findings
In-context test accuracy depends on the input dimension, the number of in-context examples, and task diversity (the number of pre-training tasks).
Benign overfitting occurs under specific data geometry and training conditions.
The study identifies parameter regions where in-context learning succeeds or fails.
Abstract
Transformers have demonstrated a strong ability for in-context learning (ICL), enabling models to solve previously unseen tasks using only example input-output pairs provided at inference time. While prior theoretical work has established conditions under which transformers can perform linear classification in-context, the empirical scaling behavior governing when this mechanism succeeds remains insufficiently characterized. In this paper, we conduct a systematic empirical study of in-context learning for Gaussian-mixture binary classification tasks. Building on the theoretical framework of Frei and Vardi (2024), we analyze how in-context test accuracy depends on three fundamental factors: the input dimension, the number of in-context examples, and the number of pre-training tasks. Using a controlled synthetic setup and a linear in-context classifier formulation, we isolate the…
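To make the setting in the abstract concrete, here is a minimal sketch of a Gaussian-mixture binary classification task with a simple linear in-context rule. It is an illustration under stated assumptions, not the paper's implementation: the task prior (class mean drawn uniformly on a sphere of radius signal_norm), the identity-covariance noise, the untrained averaging rule in icl_predict, and all function names are hypothetical choices made here. The sketch varies only the input dimension and the number of in-context examples; it has no pre-training stage, so it does not capture the dependence on the number of pre-training tasks.

```python
import math
import random


def sample_task(d, signal_norm, rng):
    """Draw a class-mean vector mu with ||mu|| = signal_norm (assumed task prior)."""
    g = [rng.gauss(0.0, 1.0) for _ in range(d)]
    norm = math.sqrt(sum(v * v for v in g))
    return [signal_norm * v / norm for v in g]


def sample_example(mu, rng):
    """Draw (x, y): y is uniform in {-1, +1}, x = y * mu + standard Gaussian noise."""
    y = rng.choice((-1, 1))
    x = [y * m + rng.gauss(0.0, 1.0) for m in mu]
    return x, y


def icl_predict(context, x_query):
    """Assumed linear in-context rule: sign of <mean_i(y_i * x_i), x_query>."""
    n = len(context)
    d = len(x_query)
    # Label-weighted average of the context inputs estimates the class mean.
    mu_hat = [sum(y * x[j] for x, y in context) / n for j in range(d)]
    score = sum(a * b for a, b in zip(mu_hat, x_query))
    return 1 if score >= 0 else -1


def icl_accuracy(d, n_context, signal_norm, n_tasks=200, seed=0):
    """Estimate query accuracy by averaging over freshly sampled tasks."""
    rng = random.Random(seed)
    correct = 0
    for _ in range(n_tasks):
        mu = sample_task(d, signal_norm, rng)
        context = [sample_example(mu, rng) for _ in range(n_context)]
        x_q, y_q = sample_example(mu, rng)
        correct += icl_predict(context, x_q) == y_q
    return correct / n_tasks
```

Calling icl_accuracy over a grid of d and n_context values (for example, d in 5, 50, 500 at a fixed signal_norm) gives a rough picture of how accuracy in this toy rule changes with dimension and context length; the numbers it produces are not results from the paper.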
