Distinct mechanisms underlying in-context learning in transformers

Cole Gibson; Wenping Cui; Gautam Reddy

arXiv:2604.12151·cs.LG·April 15, 2026

TL;DR

This paper provides a detailed mechanistic analysis of in-context learning in transformers trained on Markov chains, revealing four algorithmic phases and distinct subcircuits responsible for adaptive computation.

Contribution

It characterizes the four phases of in-context learning in transformers and identifies the underlying subcircuits and conditions influencing their operation.

Findings

01

Transformers exhibit four distinct algorithmic phases during in-context learning.

02

Two boundaries, $K_1^*$ and $K_2^*$, depend on data diversity $K = |S|$ and delineate the memorization and generalization phases.

03

Theoretical analysis explains the transition from 1-point to 2-point generalization and the loss landscape features.

Abstract

Modern distributed networks, notably transformers, acquire a remarkable ability (termed "in-context learning") to adapt their computation to input statistics, such that a fixed network can be applied to data from a broad range of systems. Here, we provide a complete mechanistic characterization of this behavior in transformers trained on a finite set $S$ of discrete Markov chains. The transformer displays four algorithmic phases, characterized by whether the network memorizes or generalizes, and whether it uses 1-point or 2-point statistics. We show that the four phases are implemented by multi-layer subcircuits that exemplify two qualitatively distinct mechanisms for implementing context-adaptive computations. Minimal models isolate the key features of both motifs. Memorization and generalization phases are delineated by two boundaries that depend on data diversity, $K = |S|$. The…
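
To make the distinction between 1-point and 2-point statistics concrete, the following is a minimal sketch, not the paper's model or code. It samples a sequence from one Markov chain and forms two in-context next-symbol estimates: one from symbol frequencies (1-point) and one from transition counts out of the last symbol (2-point). The function names, the uniform initial state, the Dirichlet-sampled transition matrix, and the additive smoothing parameter `alpha` are all assumptions made for illustration.

```python
import numpy as np

def sample_chain(P, length, rng):
    """Sample a state sequence from a Markov chain with row-stochastic matrix P."""
    n = P.shape[0]
    seq = np.empty(length, dtype=int)
    seq[0] = rng.integers(n)  # assumption: uniform initial state
    for t in range(1, length):
        seq[t] = rng.choice(n, p=P[seq[t - 1]])
    return seq

def one_point_predict(seq, n, alpha=1.0):
    """Next-symbol distribution from smoothed symbol frequencies (1-point statistics)."""
    counts = np.bincount(seq, minlength=n) + alpha
    return counts / counts.sum()

def two_point_predict(seq, n, alpha=1.0):
    """Next-symbol distribution from smoothed transition counts out of the
    last observed symbol (2-point statistics)."""
    counts = np.full((n, n), alpha, dtype=float)
    # Accumulate one count per observed (previous, next) pair.
    np.add.at(counts, (seq[:-1], seq[1:]), 1.0)
    row = counts[seq[-1]]
    return row / row.sum()

rng = np.random.default_rng(0)
n_states = 3
# Hypothetical chain: each row of P is drawn from a flat Dirichlet.
P = rng.dirichlet(np.ones(n_states), size=n_states)
seq = sample_chain(P, 2000, rng)

p1 = one_point_predict(seq, n_states)
p2 = two_point_predict(seq, n_states)
print("true row for last symbol:", P[seq[-1]])
print("1-point estimate:        ", p1)
print("2-point estimate:        ", p2)
```

Both estimators here use only the context sequence, so they illustrate in-context estimation in general. A predictor that instead relies on the finite training set $S$ would need access to the chains in $S$, which this sketch does not model.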
