Function Induction and Task Generalization: An Interpretability Study with Off-by-One Addition
Qinyuan Ye, Robin Jia, Xiang Ren

TL;DR
This study investigates how large language models internally generalize to unseen tasks. It identifies a function induction mechanism that involves multiple attention heads and is reused across a range of tasks.
Contribution
The paper introduces the concept of function induction, a higher-level abstraction of the induction head mechanism, and demonstrates its role in task generalization and reuse across different tasks.
Findings
Identifies a function induction mechanism explaining generalization from addition to off-by-one addition.
Shows multiple attention heads collaboratively induce the +1 function.
Demonstrates the reuse of this mechanism in diverse tasks like QA and algorithmic problems.
Abstract
Large language models demonstrate the intriguing ability to perform unseen tasks via in-context learning. However, it remains unclear what mechanisms inside the model drive such task-level generalization. In this work, we approach this question through the lens of off-by-one addition (i.e., 1+1=3, 2+2=5, 3+3=?), a two-step, counterfactual task with an unexpected +1 function as a second step. Leveraging circuit-style interpretability techniques such as path patching, we analyze the models' internal computations behind their performance and present three key findings. First, we identify a mechanism that explains the model's generalization from standard addition to off-by-one addition. It resembles the induction head mechanism described in prior work, yet operates at a higher level of abstraction; we therefore term it "function induction" in this work. Second, we show that the induction of…
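To make the task format concrete, here is a minimal sketch of how an off-by-one addition prompt could be built. The function name, the newline separator, and the operand range are illustrative assumptions; the paper's exact prompt template is not given in this summary.

```python
import random


def build_off_by_one_prompt(n_shots=4, low=0, high=9, seed=0):
    """Build a few-shot prompt in which every demonstration answer is a+b+1.

    Hypothetical helper for illustration only; formatting choices
    (newline-separated "a+b=c" lines) are assumptions, not the paper's template.
    Returns (prompt, standard_sum, off_by_one_sum) for the final query.
    """
    rng = random.Random(seed)
    lines = []
    for _ in range(n_shots):
        a, b = rng.randint(low, high), rng.randint(low, high)
        # Demonstrations show the counterfactual answer: the true sum plus one.
        lines.append(f"{a}+{b}={a + b + 1}")

    # The final line is the query; the model must complete the answer.
    a, b = rng.randint(low, high), rng.randint(low, high)
    lines.append(f"{a}+{b}=")
    prompt = "\n".join(lines)
    return prompt, a + b, a + b + 1


if __name__ == "__main__":
    prompt, standard, off_by_one = build_off_by_one_prompt()
    print(prompt)
    print("standard addition answer:", standard)
    print("off-by-one answer:", off_by_one)
```

A model that merely performs standard addition would produce the first returned answer, while a model that has induced the extra +1 step from the demonstrations would produce the second.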
Peer Reviews
Decision: ICLR 2026 Poster
[S1] This paper is well written and easy to follow, even without much prior knowledge of this domain.
[S2] The experiments and analysis are simple yet have effective coverage and depth.
[S3] The paper presents analyses of the tasks where function-induction heads appear, suggesting that these heads may be responsible for the ability of LLMs to identify the off-by-one function and other new functions during in-context learning.
[W1] It is unclear how well these results generalize across different models. Most analyses focus on the Gemma2-9B model; function-induction heads are identified in other models but not validated as thoroughly as in Gemma2 (see the Mistral/Llama results in Appendix D.2). Moreover, lines of mechanistic interpretability research (esp. induction heads) have usually started with simple toy models (e.g., few-layer attention networks) to clearly point
- The path patching results are quite compelling. It is very surprising to me that replacing the heads from the contrast task with those of the base task leads to any sensible behavior at all, much less reversion to correct base task performance.
- I appreciate that the authors presented several other tasks in section 5. The results are a bit more mixed on the other tasks, but are nonetheless consistent with the narrative presented in the paper.
- There was not enough description of the differences between the FI head and the FV head. All that is reported is that the heads appear at different layers. The paper should describe conceptual differences between the solution concepts. If there aren’t major conceptual differences, this is a concern. The fact that the heads appear at different layers could be explained by other factors, like different models.
- I would quibble with “Off-by-one addition is likely an unseen task to these language
- The off-by-one task is simple yet counterfactual, enabling precise mechanistic tracing. The experiments are carefully designed to isolate the function-induction process.
- The analysis spans six modern LLMs (Gemma-2, Llama-2/3, Mistral, Qwen-2.5, Phi-4), confirming the generality of findings.
- The introduction of Function Induction (FI) heads extends prior induction-head results to function-level abstraction, a conceptually and methodologically significant advance.
- The paper provides one
- Ambiguous motivation: In Line 41, the paper states that “our understanding is still limited, especially regarding more complex generalization scenarios involving unexpected elements or newly defined concepts in the task.” However, the actual experiments focus on a very simple and synthetic task (off-by-one addition). It is therefore difficult to claim that the study meaningfully addresses “more complex generalization scenarios.”
- Unclear mechanism of Group 1 (consolidation heads): The disc
Taxonomy
Topics: Reinforcement Learning in Robotics
