Temporal Dependencies in In-Context Learning: The Role of Induction Heads
Anooshka Bajaj, Deven Mahesh Mistry, Sahaj Singh Maini, Yash Aggarwal, Billy Dickson, Zoran Tiganj

TL;DR
This paper investigates how induction heads in large language models support temporal context processing, showing through ablation that they play an important role in serial-recall-like behavior during in-context learning.
Contribution
The paper provides ablation evidence that induction heads are mechanistically linked to temporal dependencies and ordered retrieval in transformer-based language models.
Findings
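Several open-source LLMs show a serial-recall-like pattern: they assign peak probability to the token that immediately follows a repeated token in the input.
Removing heads with high induction scores substantially reduces the +1 lag bias and impairs serial recall performance; ablating random heads does not produce the same reduction.
Induction heads play an important role in temporal context processing during in-context learning.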
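To make the "+1 lag bias" concrete, here is a minimal sketch of how a lag profile could be computed. It assumes a setup this summary does not specify: each trial presents a list of items, one item is repeated as a cue, and the model's probability for every list item is recorded. The function name `lag_profile`, the array layout, and the toy numbers are all hypothetical and not taken from the paper.

```python
import numpy as np

def lag_profile(probs, cue_positions, max_lag):
    """Average probability assigned to list items as a function of lag.

    probs[t, j]      : probability given to list item j on trial t (assumed layout)
    cue_positions[t] : list position of the item repeated as the cue on trial t
    lag              : j - cue_positions[t]; lag 0 (the cue itself) is skipped
    """
    probs = np.asarray(probs, dtype=float)
    n_trials, n_items = probs.shape
    sums = np.zeros(2 * max_lag + 1)
    counts = np.zeros(2 * max_lag + 1)
    for t in range(n_trials):
        for j in range(n_items):
            lag = j - cue_positions[t]
            if lag == 0 or abs(lag) > max_lag:
                continue
            sums[lag + max_lag] += probs[t, j]
            counts[lag + max_lag] += 1
    lags = np.arange(-max_lag, max_lag + 1)
    # Lags with no observations (including lag 0) are reported as NaN.
    with np.errstate(invalid="ignore", divide="ignore"):
        means = np.where(counts > 0, sums / counts, np.nan)
    return lags, means

# Toy data (made up for illustration): two trials over a 5-item list.
toy_probs = [
    [0.05, 0.05, 0.10, 0.70, 0.10],  # cue at position 2
    [0.10, 0.60, 0.20, 0.05, 0.05],  # cue at position 0
]
toy_cues = [2, 0]
lags, means = lag_profile(toy_probs, toy_cues, max_lag=2)
peak_lag = int(lags[np.nanargmax(means)])
```

In this toy example the profile peaks at lag +1, which is the serial-recall-like signature described above. A head ablation that weakens the bias would show up as a lower value at that lag relative to the unablated profile.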
Abstract
Large language models (LLMs) exhibit strong in-context learning capabilities, but how they track and retrieve information from context remains underexplored. Drawing on the free recall paradigm in cognitive science (where participants recall list items in any order), we show that several open-source LLMs consistently display a serial-recall-like pattern, assigning peak probability to tokens that immediately follow a repeated token in the input sequence. Through systematic ablation experiments, we show that induction heads, specialized attention heads that attend to the token following a previous occurrence of the current token, play an important role in this phenomenon. Removing heads with a high induction score substantially reduces the +1 lag bias, whereas ablating random heads does not reproduce the same reduction. We also show that removing heads with high induction scores impairs…
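The abstract's definition of an induction head can be illustrated with a small scoring sketch. It assumes a common diagnostic setup that this summary does not confirm for the paper: a block of tokens is shown twice in a row, and a head is scored by how much attention each token in the second copy pays to the position right after that token's earlier occurrence. The function name `induction_score` and the toy attention matrices are hypothetical.

```python
import numpy as np

def induction_score(attn, block_len):
    """Mean attention from each token in the second copy of a repeated block
    to the position that followed its earlier occurrence.

    attn[q, k] : attention weight from query position q to key position k
                 for one head, on a sequence of length 2 * block_len whose
                 second half repeats the first half (assumed setup).
    """
    attn = np.asarray(attn, dtype=float)
    queries = np.arange(block_len, 2 * block_len)
    targets = queries - block_len + 1  # one step after the earlier occurrence
    return float(attn[queries, targets].mean())

block_len = 3
seq_len = 2 * block_len

# Idealized induction head: every second-copy token puts all of its
# attention on the token after its earlier occurrence.
ideal = np.zeros((seq_len, seq_len))
for q in range(block_len):
    ideal[q, q] = 1.0
for q in range(block_len, seq_len):
    ideal[q, q - block_len + 1] = 1.0

# Baseline head: uniform causal attention over all earlier positions.
uniform = np.zeros((seq_len, seq_len))
for q in range(seq_len):
    uniform[q, : q + 1] = 1.0 / (q + 1)

ideal_score = induction_score(ideal, block_len)
uniform_score = induction_score(uniform, block_len)
```

Under these assumptions the idealized head scores 1.0 and the uniform baseline scores far lower, so ranking heads by such a score gives one plausible way to pick the "high induction score" heads for ablation and to compare them against randomly chosen heads.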
