Understanding In-context Learning of Addition via Activation Subspaces

Xinyan Hu; Kayo Yin; Michael I. Jordan; Jacob Steinhardt; Lijie Chen

arXiv:2505.05145·cs.LG·October 10, 2025

TL;DR

This paper investigates how transformer language models perform in-context learning of addition. It shows that a few attention heads encode the addition rule in low-dimensional subspaces, and it introduces methods for localizing and analyzing these mechanisms.

Contribution

The paper introduces a novel optimization and analysis framework that localizes and interprets the in-context learning mechanism in transformer models, with a focus on attention-head subspaces.

Findings

01

A few attention heads encode addition in low-dimensional subspaces.

02

Identified a self-correction mechanism in the model's in-context learning.

03

Reduced the mechanism on Llama-3-8B-instruct to three attention heads with interpretable six-dimensional subspaces.
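As a rough illustration of what it means to restrict a head's output to a low-dimensional subspace, the sketch below projects a vector onto the span of a few basis directions and reports how much of its squared norm survives. It is a generic linear-algebra example, not the authors' procedure: the helper names (`orthonormalize`, `project`, `fraction_retained`) and the toy vectors are hypothetical and stand in for real head activations.

```python
import math


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def orthonormalize(vectors, tol=1e-12):
    """Gram-Schmidt: return an orthonormal basis for the span of `vectors`."""
    basis = []
    for v in vectors:
        w = list(v)
        for b in basis:
            c = dot(w, b)
            w = [wi - c * bi for wi, bi in zip(w, b)]
        norm = math.sqrt(dot(w, w))
        if norm > tol:  # skip directions already covered by the basis
            basis.append([wi / norm for wi in w])
    return basis


def project(x, basis):
    """Project x onto the subspace spanned by an orthonormal basis."""
    out = [0.0] * len(x)
    for b in basis:
        c = dot(x, b)
        out = [oi + c * bi for oi, bi in zip(out, b)]
    return out


def fraction_retained(x, basis):
    """Share of the squared norm of x that lies inside the subspace."""
    p = project(x, basis)
    return dot(p, p) / dot(x, x)


# Toy data (hypothetical): a 4-dimensional "activation" and a 2-dimensional subspace.
basis = orthonormalize([[1.0, 1.0, 0.0, 0.0], [1.0, -1.0, 0.0, 0.0]])
activation = [3.0, 4.0, 0.0, 12.0]
projected = project(activation, basis)
retained = fraction_retained(activation, basis)
```

In an interpretability setting, a quantity like `retained` (or, better, task accuracy after replacing the activation with `projected`) is one way to judge whether a small subspace captures what a head contributes.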

Abstract

To perform few-shot learning, language models extract signals from a few input-label pairs, aggregate these into a learned prediction rule, and apply this rule to new inputs. How is this implemented in the forward pass of modern transformer models? To explore this question, we study a structured family of few-shot learning tasks for which the true prediction rule is to add an integer $k$ to the input. We introduce a novel optimization method that localizes the model's few-shot ability to only a few attention heads. We then perform an in-depth analysis of individual heads, via dimensionality reduction and decomposition. As an example, on Llama-3-8B-instruct, we reduce its mechanism on our tasks to just three attention heads with six-dimensional subspaces, where four dimensions track the unit digit with trigonometric functions at periods $2$, $5$, and $10$, and two dimensions track…
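To make the add-$k$ task family and the idea of tracking a unit digit with trigonometric functions more concrete, here is a minimal Python sketch. Several things in it are assumptions for illustration only: the `x -> y` prompt format, the input range, and the use of one plain cosine/sine pair per period. That naive feature set has six entries and is not necessarily the four-dimensional basis the authors identify; it only shows why periods $2$, $5$, and $10$ suffice to pin down a unit digit.

```python
import math
import random

PERIODS = (2, 5, 10)  # the periods named in the abstract


def make_add_k_prompt(k, n_shots, query, seed=0):
    """Build a few-shot prompt whose hidden rule is x -> x + k.

    The "x -> y" line format and the input range are hypothetical choices.
    """
    rng = random.Random(seed)
    lines = []
    for _ in range(n_shots):
        x = rng.randint(1, 100)
        lines.append(f"{x} -> {x + k}")
    lines.append(f"{query} ->")  # the model is expected to continue with query + k
    return "\n".join(lines)


def trig_features(k):
    """Cosine and sine of 2*pi*k/T for each period T in PERIODS."""
    feats = []
    for period in PERIODS:
        angle = 2 * math.pi * k / period
        feats.extend((math.cos(angle), math.sin(angle)))
    return feats


def decode_unit_digit(feats):
    """Return the digit 0-9 whose trig features are closest to `feats`."""
    def squared_distance(d):
        return sum((a - b) ** 2 for a, b in zip(feats, trig_features(d)))
    return min(range(10), key=squared_distance)


prompt = make_add_k_prompt(k=3, n_shots=4, query=10)
digit = decode_unit_digit(trig_features(37))
```

Because every period in `PERIODS` divides 10, `trig_features(k)` depends only on `k mod 10`, so `decode_unit_digit(trig_features(37))` returns `7`.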

Peer Reviews

Decision·Submitted to ICLR 2026

Reviewer 01 · Rating 4 · Confidence 3

Strengths

1. The paper provides a detailed analysis of how ICL emerges in transformers, moving beyond descriptive observations to a mechanistic understanding of specific heads and subspaces.
2. The authors identify a very small subset of attention heads responsible for ICL, demonstrating that task-specific behavior can be localized within a large network.
3. Intervention experiments strengthen the causal claims about which heads and subspaces are responsible for ICL.
4. The paper connects abstract mechani

Weaknesses

1. The analysis is restricted to synthetic add-k tasks, which may not generalize to more complex or natural ICL tasks such as language understanding or reasoning.
2. The paper localizes ICL to attention heads but largely ignores contributions from feed-forward networks (FFNs) [1] or other layers, leaving a partial picture of the mechanism.
3. The projection of head outputs into low-dimensional trigonometric subspaces assumes well-behaved linear relationships, which may not hold in more complex o

Reviewer 02 · Rating 6 · Confidence 4

Strengths

* The authors present a well-motivated study of how LLMs learn to perform a single task in-context.
* The study is extremely in-depth and thorough.
* The authors present clear evidence for their model, according to which a few heads represent the parity, unit digit, and magnitude of $k$.
* The authors also present a useful method for finding the heads that a model uses to solve a task in-context.

Weaknesses

* The paper is dense to read, and some of the explanations in the text would benefit from accompanying figures. This holds especially for Sections 3, 4, and 5, which contain few figures.
* Not really a major weakness, but the paper only covers one task. While the analysis of how the model solves this task is very detailed, it is not obvious how these insights will generalize to how ICL may work in more general setups. For instance, do the circuits analyzed here also cover $k$-subtrac

Reviewer 03 · Rating 4 · Confidence 3

Strengths

The methodology is precise and seemingly reproducible, combining causal interventions with low-dimensional analysis rather than relying on correlations. The discovery that only three heads encode nearly all ICL function is striking and empirically well supported. The identification of a structured six-dimensional subspace gives a clear, interpretable geometry to addition in LLMs. The extractor-aggregator relation and observed self-correction behavior offer new insight into how contextual infor

Weaknesses

- I disagree with the discussion in 132-138. A "likely output", in my understanding, refers to two words that belong to a similar topic and thus have a closer semantic relationship. Since $x_q$ and $k$ are both numbers, they would also be semantically closer than $x_q$ and singer.
- Your activation patching resembles the study of task-vector arithmetic for factual recall in Merullo et al. (2024), which also leverages task vectors; please provide a comparison.
- Your localization optimization me

Taxonomy

Topics: Machine Learning and Algorithms

Methods: Softmax · Attention Is All You Need