Investigation into In-Context Learning Capabilities of Transformers

Rushil Chandrupatla; Leo Bangayan; Sebastian Leng

arXiv:2604.25858 · cs.LG · May 19, 2026

TL;DR

This paper empirically investigates the scaling behavior of in-context learning in transformers on Gaussian-mixture classification tasks, analyzing how performance depends on the input dimension, the number of in-context examples, and task diversity (the number of pre-training tasks).

Contribution

It provides a systematic empirical analysis of in-context learning that complements prior theoretical results (Frei and Vardi, 2024) with a detailed study of the geometric and data factors that determine when in-context classification succeeds.

Findings

01

In-context test accuracy depends on the input dimension, the number of in-context examples, and task diversity (the number of pre-training tasks).

02

Benign overfitting occurs under specific data geometry and training conditions.

03

The study identifies parameter regions in which in-context learning succeeds or fails.

Abstract

Transformers have demonstrated a strong ability for in-context learning (ICL), enabling models to solve previously unseen tasks using only example input-output pairs provided at inference time. While prior theoretical work has established conditions under which transformers can perform linear classification in-context, the empirical scaling behavior governing when this mechanism succeeds remains insufficiently characterized. In this paper, we conduct a systematic empirical study of in-context learning for Gaussian-mixture binary classification tasks. Building on the theoretical framework of Frei and Vardi (2024), we analyze how in-context test accuracy depends on three fundamental factors: the input dimension, the number of in-context examples, and the number of pre-training tasks. Using a controlled synthetic setup and a linear in-context classifier formulation, we isolate the…
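
To make the three factors concrete, here is a minimal NumPy sketch of the kind of controlled synthetic setup the abstract describes. It is not the authors' code, and every detail is an assumption. The data model is assumed: each task has a mean μ with ‖μ‖ = snr, labels are ±1, and x = yμ + N(0, I). The classifier ŷ = sign(x_qᵀ W z) with z = (1/n) Σ y_i x_i is a bilinear form of the kind analyzed in the linear-transformer setting of Frei and Vardi (2024), assumed here rather than taken from the paper. Pre-training uses gradient descent on the logistic loss. All function names and hyperparameters (`sample_tasks`, `pretrain`, `snr`, `steps`, `lr`) are hypothetical.

```python
import numpy as np


def sample_tasks(rng, num_tasks, d, n, snr):
    """Draw Gaussian-mixture binary tasks, each with n context examples and one query.

    Assumed (hypothetical) data model: the task mean mu is uniform on the sphere of
    radius `snr`, labels are +/-1 with equal probability, x = y * mu + N(0, I_d) noise.
    """
    mu = rng.standard_normal((num_tasks, d))
    mu *= snr / np.linalg.norm(mu, axis=1, keepdims=True)
    y = rng.choice([-1.0, 1.0], size=(num_tasks, n + 1))
    x = y[..., None] * mu[:, None, :] + rng.standard_normal((num_tasks, n + 1, d))
    # The first n points are the in-context examples; the last point is the query.
    return x[:, :n], y[:, :n], x[:, n], y[:, n]


def context_summary(x_ctx, y_ctx):
    """z = (1/n) * sum_i y_i x_i for every task; shape (num_tasks, d)."""
    return np.mean(y_ctx[..., None] * x_ctx, axis=1)


def scores(W, x_q, z):
    """Per-task bilinear score x_q^T W z."""
    return np.sum((x_q @ W) * z, axis=1)


def pretrain(rng, num_tasks, d, n, snr, steps=100, lr=0.1):
    """Fit W by full-batch gradient descent on the logistic loss over
    `num_tasks` pre-training tasks, starting from W = 0 (hypothetical recipe)."""
    x_ctx, y_ctx, x_q, y_q = sample_tasks(rng, num_tasks, d, n, snr)
    z = context_summary(x_ctx, y_ctx)
    W = np.zeros((d, d))
    for _ in range(steps):
        margins = y_q * scores(W, x_q, z)
        # sigmoid(-margin), written via tanh so large margins do not overflow.
        coef = 0.5 * (1.0 - np.tanh(margins / 2.0))
        # Gradient of mean log(1 + exp(-margin)) with respect to W: -mean(coef * y_q * x_q z^T).
        grad = -((coef * y_q)[:, None] * x_q).T @ z / num_tasks
        W -= lr * grad
    return W


def accuracy(rng, W, num_tasks, d, n, snr):
    """In-context test accuracy of sign(x_q^T W z) on freshly drawn tasks."""
    x_ctx, y_ctx, x_q, y_q = sample_tasks(rng, num_tasks, d, n, snr)
    preds = np.where(scores(W, x_q, context_summary(x_ctx, y_ctx)) >= 0, 1.0, -1.0)
    return float(np.mean(preds == y_q))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Illustrative grid over input dimension d, context length n, and number of
    # pre-training tasks M; the values are arbitrary, not the paper's settings.
    for d in (10, 50, 200):
        for n in (10, 40):
            for m in (20, 200):
                W = pretrain(rng, num_tasks=m, d=d, n=n, snr=5.0)
                acc = accuracy(rng, W, num_tasks=500, d=d, n=n, snr=5.0)
                print(f"d={d:4d} n={n:3d} M={m:4d} test accuracy={acc:.3f}")
```

In this sketch, d, n, and M correspond to the input dimension, the number of in-context examples, and the number of pre-training tasks named in the abstract. The printed grid is only a way to explore those axes and implies nothing about the paper's reported results. Starting from W = 0 and using a fixed task pool for pre-training is a design choice that makes the effect of task diversity (M) easy to vary in isolation.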
