A Bayesian Perspective on the Role of Epistemic Uncertainty for Delayed Generalization in In-Context Learning

Abdessamed Qchohi; Simone Rossi

arXiv:2604.12434 · stat.ML · April 15, 2026

TL;DR

This paper investigates how epistemic uncertainty relates to delayed generalization in in-context learning. It finds that a sharp collapse in epistemic uncertainty coincides with grokking, and it links the uncertainty dynamics to spectral properties in a simplified model.

Contribution

The paper takes a Bayesian perspective on how predictive uncertainty evolves in transformers during training, offering theoretical and empirical insight into delayed generalization and grokking.

Findings

1. Epistemic uncertainty sharply collapses at grokking.

2. Uncertainty dynamics are linked to spectral properties in a simplified model.

3. Uncertainty serves as a label-free diagnostic of generalization.
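
To make the idea of a label-free uncertainty diagnostic concrete, here is a minimal sketch of one common way to separate epistemic from aleatoric uncertainty given several posterior samples of a classifier's predictive distribution. This summary does not state which estimator the paper uses, so the entropy-based (mutual information) decomposition, the function names, and the toy inputs below are all assumptions for illustration.

```python
import numpy as np

def entropy(p, axis=-1, eps=1e-12):
    """Shannon entropy in nats along `axis` (probabilities clipped for stability)."""
    p = np.clip(p, eps, 1.0)
    return -np.sum(p * np.log(p), axis=axis)

def uncertainty_decomposition(probs):
    """Split predictive uncertainty using posterior samples.

    probs: array of shape (S, N, C) with S posterior samples,
           N inputs, and C classes (each row sums to 1).
    Returns (total, aleatoric, epistemic), each of shape (N,).
    Assumed decomposition: epistemic = H[mean prediction] - mean H[prediction].
    """
    probs = np.asarray(probs, dtype=float)
    total = entropy(probs.mean(axis=0))      # entropy of the averaged prediction
    aleatoric = entropy(probs).mean(axis=0)  # average entropy of each sample
    epistemic = total - aleatoric            # disagreement between samples
    return total, aleatoric, epistemic

# Hypothetical toy input: 2 posterior samples, 2 inputs, 2 classes.
# Input 0: the samples disagree confidently. Input 1: both samples are equally unsure.
toy_probs = np.array([
    [[1.0, 0.0], [0.5, 0.5]],
    [[0.0, 1.0], [0.5, 0.5]],
])
total, aleatoric, epistemic = uncertainty_decomposition(toy_probs)
```

No labels enter the computation: the epistemic term is large only when posterior samples disagree with one another, which is why a quantity of this kind can be tracked on unlabeled inputs during training.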

Abstract

In-context learning enables transformers to adapt to new tasks from a few examples at inference time, while grokking highlights that this generalization can emerge abruptly only after prolonged training. We study task generalization and grokking in in-context learning using a Bayesian perspective, asking what enables the delayed transition from memorization to generalization. Concretely, we consider modular arithmetic tasks in which a transformer must infer a latent linear function solely from in-context examples and analyze how predictive uncertainty evolves during training. We combine approximate Bayesian techniques to estimate the posterior distribution and we study how uncertainty behaves across training and under changes in task diversity, context length, and context noise. We find that epistemic uncertainty collapses sharply when the model groks, making uncertainty a practical…
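
The task setup described in the abstract can be sketched as a small data sampler. The abstract only says that the model must infer a latent linear function in modular arithmetic from in-context examples; the specific form y = (a*x1 + b*x2) mod p, the modulus, the context length, and every name below are assumptions made for illustration, not the authors' configuration.

```python
import numpy as np

def sample_icl_sequence(rng, p=7, n_context=8):
    """Sample one in-context sequence for an assumed modular linear task.

    A latent pair (a, b) is drawn per sequence and never shown to the model;
    it must be inferred from the (x1, x2, y) context rows alone.
    """
    a, b = rng.integers(0, p, size=2)                 # latent task parameters
    xs = rng.integers(0, p, size=(n_context + 1, 2))  # context inputs plus one query
    ys = (a * xs[:, 0] + b * xs[:, 1]) % p            # assumed linear map mod p
    context = np.column_stack([xs[:-1], ys[:-1]])     # rows of (x1, x2, y)
    query, target = xs[-1], ys[-1]
    return (int(a), int(b)), context, query, int(target)

rng = np.random.default_rng(0)
(a, b), context, query, target = sample_icl_sequence(rng, p=7, n_context=8)
```

Under these assumptions, task diversity would correspond to how many distinct (a, b) pairs appear during training, and context length to `n_context`.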
