Does learning the right latent variables necessarily improve in-context learning?

Sarthak Mittal; Eric Elmoznino; Leo Gagnon; Sangnie Bhardwaj; Tom Marty; Dhanya Sridhar; Guillaume Lajoie

arXiv:2405.19162 · cs.LG · June 17, 2025

TL;DR

This paper investigates whether explicitly inferring task-relevant latent variables in Transformers improves in-context learning, finding that such structured approaches do not significantly enhance out-of-distribution performance or robustness.

Contribution

The study introduces a minimal bottleneck modification to Transformers to explicitly extract task latents and compares it with standard models, revealing limited benefits for generalization.
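The difference between the two setups can be sketched as a pair of function signatures. This is a minimal illustration and not the authors' implementation: `implicit_predict`, `explicit_predict`, `toy_encoder`, and `toy_head` are hypothetical names, and the toy encoder and head are closed-form stand-ins for learned networks on 1-D linear data. The point is only the information flow: in the bottleneck variant the latent is computed from the context alone, and the prediction is computed from the latent and the query alone.

```python
from typing import Callable, List, Tuple

Example = Tuple[float, float]  # one (x, y) observation
Context = List[Example]


def implicit_predict(model: Callable[[Context, float], float],
                     context: Context, query: float) -> float:
    """Standard setup: the predictor sees the raw context and the query together."""
    return model(context, query)


def explicit_predict(encoder: Callable[[Context], float],
                     head: Callable[[float, float], float],
                     context: Context, query: float) -> float:
    """Bottleneck setup: the context is compressed to a latent z before the query is used."""
    z = encoder(context)    # depends on the context only
    return head(z, query)   # depends on z and the query only


# Hypothetical stand-ins for learned networks, assuming 1-D data y = w * x.
def toy_encoder(context: Context) -> float:
    """Least-squares slope through the origin; returns 0.0 for a degenerate context."""
    sxx = sum(x * x for x, _ in context)
    sxy = sum(x * y for x, y in context)
    return sxy / sxx if sxx else 0.0


def toy_head(z: float, query: float) -> float:
    """Apply the inferred slope to the query input."""
    return z * query


context = [(1.0, 2.0), (2.0, 4.0)]  # generated with w = 2
prediction = explicit_predict(toy_encoder, toy_head, context, 3.0)
print(prediction)
```

Because `toy_head` never receives the raw context, any shortcut that compares the query directly against individual context examples is ruled out by construction, which is the role the summary attributes to the bottleneck.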

Findings

1. Bottleneck effectively extracts task latents from context.
2. No significant out-of-distribution performance gain from latent inference.
3. Transformers struggle to utilize inferred latents for robust prediction.

Abstract

Large autoregressive models like Transformers can solve tasks through in-context learning (ICL) without learning new weights, suggesting avenues for efficiently solving new tasks. For many tasks, e.g., linear regression, the data factorizes: examples are independent given a task latent that generates the data, e.g., linear coefficients. While an optimal predictor leverages this factorization by inferring task latents, it is unclear if Transformers implicitly do so or if they instead exploit heuristics and statistical shortcuts enabled by attention layers. Both scenarios have inspired active ongoing work. In this paper, we systematically investigate the effect of explicitly inferring task latents. We minimally modify the Transformer architecture with a bottleneck designed to prevent shortcuts in favor of more structured solutions, and then compare performance against standard…
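The linear regression example in the abstract can be made concrete with a small simulation. The sketch below assumes a 1-D task latent `w` drawn from a standard normal prior and Gaussian observation noise; these distributional choices, and the function names `sample_task`, `sample_context`, and `posterior_mean`, are illustrative assumptions rather than the paper's experimental setup. Under those assumptions the examples are independent given `w`, and the posterior mean of `w` has a closed form that a latent-inferring predictor could use.

```python
import random
from typing import List, Tuple

Example = Tuple[float, float]  # one (x, y) observation


def sample_task(rng: random.Random) -> float:
    """Draw the task latent w from an assumed N(0, 1) prior."""
    return rng.gauss(0.0, 1.0)


def sample_context(w: float, n: int, noise_std: float,
                   rng: random.Random) -> List[Example]:
    """Draw n examples that are independent given w: y = w * x + noise."""
    examples = []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)
        y = w * x + rng.gauss(0.0, noise_std)
        examples.append((x, y))
    return examples


def posterior_mean(context: List[Example], noise_var: float) -> float:
    """Posterior mean of w under the N(0, 1) prior and Gaussian noise.

    Equals sum(x * y) / (sum(x * x) + noise_var); falls back to the
    prior mean 0.0 when the denominator is zero.
    """
    sxx = sum(x * x for x, _ in context)
    sxy = sum(x * y for x, y in context)
    denom = sxx + noise_var
    return sxy / denom if denom > 0 else 0.0


rng = random.Random(0)
w = sample_task(rng)
context = sample_context(w, n=50, noise_std=0.1, rng=rng)
w_hat = posterior_mean(context, noise_var=0.1 ** 2)

query = 1.5
print("true w:", w, "inferred w:", w_hat, "prediction:", w_hat * query)
```

Here the prediction for a new query depends on the context only through `w_hat`, which is what "leveraging the factorization" amounts to in this toy case.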

Code & Models

Repositories

ericelmoznino/explicit_implicit_icl (PyTorch, official)

Videos

Does learning the right latent variables necessarily improve in-context learning? · SlidesLive

Taxonomy

Topics: Intelligent Tutoring Systems and Adaptive Learning · Statistics Education and Methodologies

Methods: Attention Is All You Need · Linear Layer · Byte Pair Encoding · Label Smoothing · Adam · Residual Connection · Position-Wise Feed-Forward Layer · Multi-Head Attention · Dropout · Dense Connections