When and How Unlabeled Data Provably Improve In-Context Learning

Yingcong Li; Xiangyu Chang; Muti Kara; Xiaofeng Liu; Amit Roy-Chowdhury; Samet Oymak

arXiv:2506.15329·cs.LG·January 27, 2026

TL;DR

This paper gives a theoretical analysis of how multilayer transformers can leverage unlabeled data in in-context learning when some demonstrations have missing labels, and demonstrates practical improvements on semi-supervised tabular data tasks.

Contribution

It offers a theoretical framework explaining how deep transformers construct estimators from unlabeled data, and it proposes looping foundation models to improve their semi-supervised performance.

Findings

01. Multilayer transformers can implicitly construct polynomial estimators from unlabeled data.

02. Deeper models exponentially increase the polynomial degree, improving semi-supervised learning.

03. Applying looping to foundation models enhances semi-supervised tabular data performance.

Abstract

Recent research shows that in-context learning (ICL) can be effective even when demonstrations have missing or incorrect labels. To shed light on this capability, we examine a canonical setting where the demonstrations are drawn according to a binary Gaussian mixture model (GMM) and a certain fraction of the demonstrations have missing labels. We provide a comprehensive theoretical study to show that: (1) The loss landscape of one-layer linear attention models recovers the optimal fully-supervised estimator but completely fails to exploit unlabeled data; (2) In contrast, multilayer or looped transformers can effectively leverage unlabeled data by implicitly constructing estimators of the form $\sum_{i \geq 0} a_{i} (X^{\top} X)^{i} X^{\top} y$, with $X$ and $y$ denoting features and partially-observed labels (with missing entries set to zero). We characterize the class of polynomials that can be…
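As a rough illustration of the estimator form quoted in the abstract, the sketch below evaluates $\sum_{i \geq 0} a_{i} (X^{\top} X)^{i} X^{\top} y$ for a finite list of coefficients on data from a binary GMM with missing labels set to zero. It is a minimal NumPy sketch, not the construction a transformer learns: the function names `polynomial_estimator` and `sample_gmm`, the unit-variance isotropic noise, the mean vector, and the coefficient values are all assumptions made for the example.

```python
import numpy as np


def polynomial_estimator(X, y, coeffs):
    """Return sum_i coeffs[i] * (X^T X)^i X^T y for a finite coefficient list."""
    v = X.T @ y                      # i = 0 term: X^T y (uses labeled rows only, since missing y = 0)
    gram = X.T @ X                   # X^T X uses every row, labeled or not
    beta = np.zeros_like(v, dtype=float)
    for a in coeffs:
        beta += a * v
        v = gram @ v                 # advance to the next power of X^T X
    return beta


def sample_gmm(n, d, mu, labeled_fraction, rng):
    """Binary GMM sample: x = label * mu + noise. Unit-variance noise is an assumption."""
    labels = rng.choice([-1.0, 1.0], size=n)
    X = labels[:, None] * mu + rng.standard_normal((n, d))
    observed = rng.random(n) < labeled_fraction
    y = np.where(observed, labels, 0.0)   # missing labels are set to zero
    return X, y, labels


rng = np.random.default_rng(0)
mu = np.full(8, 0.5)                      # hypothetical class mean
X, y, labels = sample_gmm(n=200, d=8, mu=mu, labeled_fraction=0.1, rng=rng)

beta_supervised = polynomial_estimator(X, y, coeffs=[1.0])        # degree 0: X^T y only
beta_semi = polynomial_estimator(X, y, coeffs=[1.0, 0.01])        # hypothetical degree-1 coefficients

# Classify a fresh point by the sign of its inner product with the estimate.
x_new = mu + rng.standard_normal(8)
prediction = np.sign(x_new @ beta_semi)
```

With `coeffs=[1.0]` the estimate depends only on the labeled rows, because unlabeled rows contribute zero to $X^{\top} y$. Higher-order terms multiply by $X^{\top} X$, which is where the unlabeled features enter.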

Taxonomy

Topics: Machine Learning and Data Classification · Machine Learning and Algorithms

Methods: Sparse Evolutionary Training