Layerwise Dynamics for In-Context Classification in Transformers

Patrick Lutz; Themistoklis Haris; Arjun Chandra; Aditya Gangrade; Venkatesh Saligrama

arXiv:2604.11613·cs.LG·April 20, 2026

TL;DR

This paper makes the inference-time computation of transformers performing in-context classification identifiable, and extracts from the trained models an explicit depth-indexed update rule that can provably amplify class separation.

Contribution

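It introduces an identifiable transformer model that enforces feature- and label-permutation equivariance at every layer, and extracts from it what is, to the authors' knowledge, the first explicit, end-to-end identified recursion inside a softmax transformer for classification.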
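To illustrate why an equivariance constraint produces highly structured weights, the sketch below checks feature-permutation equivariance for a single linear map. It is not the paper's parameterization: the names `linear_layer`, `is_feature_equivariant`, `W_structured`, and `W_generic` are hypothetical, and the example relies only on the standard fact that a matrix commutes with every permutation matrix exactly when it has the form a*I + b*(all-ones matrix).

```python
import numpy as np

def linear_layer(X, W):
    # Hypothetical stand-in for one weight matrix acting on feature columns.
    return X @ W

def is_feature_equivariant(W, X, perm):
    # Build the permutation matrix P so that X @ P reorders the feature columns.
    P = np.eye(W.shape[0])[:, perm]
    # Equivariance: permuting the input features permutes the output the same way.
    return np.allclose(linear_layer(X @ P, W), linear_layer(X, W) @ P)

d = 4
rng = np.random.default_rng(0)
X = rng.normal(size=(5, d))      # 5 illustrative points with d features
perm = np.array([2, 0, 3, 1])    # an arbitrary feature permutation

# Structured weights: a * identity + b * all-ones commutes with every permutation.
W_structured = 1.5 * np.eye(d) + 0.3 * np.ones((d, d))
# Unconstrained weights: a random matrix generally does not commute with P.
W_generic = rng.normal(size=(d, d))

structured_ok = is_feature_equivariant(W_structured, X, perm)
generic_ok = is_feature_equivariant(W_generic, X, perm)
```

Under these assumptions, the constraint collapses a d-by-d weight matrix to two free scalars, which is one simple sense in which equivariant weights become easy to read off.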

Findings

01 Derives an explicit depth-indexed recursion for the transformer's layerwise dynamics.

02 Shows that attention matrices formed from mixed feature-label Gram structure drive coupled updates of training points, labels, and the test probe.

03 Proves that the dynamics can amplify class separation and yield robust expected class alignment.

Abstract

Transformers can perform in-context classification from a few labeled examples, yet the inference-time algorithm remains opaque. We study multi-class linear classification in the hard no-margin regime and make the computation identifiable by enforcing feature- and label-permutation equivariance at every layer. This enables interpretability while maintaining functional equivalence and yields highly structured weights. From these models we extract an explicit depth-indexed recursion: an end-to-end identified, emergent update rule inside a softmax transformer, to our knowledge the first of its kind. Attention matrices formed from mixed feature-label Gram structure drive coupled updates of training points, labels, and the test probe. The resulting dynamics implement a geometry-driven algorithmic motif, which can provably amplify class separation and yields robust expected class alignment.
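The coupled updates described in the abstract can be pictured with a toy recursion. The sketch below is an assumption-laden illustration, not the recursion extracted in the paper: the function names, the scalars `alpha`, `beta`, and `eta`, the residual form of the update, and the choice to append the test probe with an all-zero label row are all hypothetical. It only shows the general shape of a softmax attention matrix built from a mixed feature-label Gram matrix that updates features and labels together across depth.

```python
import numpy as np

def softmax_rows(scores):
    # Numerically stable row-wise softmax.
    shifted = scores - scores.max(axis=1, keepdims=True)
    weights = np.exp(shifted)
    return weights / weights.sum(axis=1, keepdims=True)

def layer_step(Z_feat, Z_lab, alpha=1.0, beta=1.0, eta=0.5):
    # Mixed Gram matrix: feature inner products plus label inner products.
    # alpha, beta, eta are hypothetical scalars, not values from the paper.
    gram = alpha * Z_feat @ Z_feat.T + beta * Z_lab @ Z_lab.T
    attn = softmax_rows(gram)
    # Coupled residual updates: the same attention matrix moves both blocks.
    new_feat = Z_feat + eta * attn @ Z_feat
    new_lab = Z_lab + eta * attn @ Z_lab
    return new_feat, new_lab, attn

def run_recursion(X, Y, x_test, depth=4):
    # Stack the test probe under the training points; its label row starts at zero.
    num_classes = Y.shape[1]
    Z_feat = np.vstack([X, x_test[None, :]])
    Z_lab = np.vstack([Y, np.zeros((1, num_classes))])
    attn = None
    for _ in range(depth):
        Z_feat, Z_lab, attn = layer_step(Z_feat, Z_lab)
    return Z_feat, Z_lab, attn

# Illustrative usage with random data (not data from the paper).
rng = np.random.default_rng(0)
X = rng.normal(size=(6, 3))                 # 6 training points, 3 features
Y = np.eye(3)[[0, 1, 2, 0, 1, 2]]           # one-hot labels for 3 classes
x_test = rng.normal(size=3)
Z_feat, Z_lab, attn = run_recursion(X, Y, x_test, depth=3)
probe_scores = Z_lab[-1]                    # label mass accumulated by the test probe
```

In this toy version the last row of `Z_lab` plays the role of the prediction: the probe starts with no label information and accumulates it from the training points it attends to.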
