Rethinking Associative Memory Mechanism in Induction Head

Shuo Wang; Issei Sato

arXiv:2412.11459·cs.CL·July 9, 2025

TL;DR

This paper investigates how a two-layer transformer captures in-context information and balances it with pretrained bigram knowledge, providing theoretical analysis and experimental validation of associative memory mechanisms in in-context learning.

Contribution

It offers a theoretical analysis of transformer attention weights and logits in the context of associative memory, complemented by experiments with specially designed prompts.
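As a rough illustration of what "associative memory" means for a weight matrix, the sketch below stores key-to-value pairs as a sum of outer products and recalls a value by a matrix-vector product. This is the generic textbook construction, not the paper's specific analysis; the names `build_memory` and `recall`, the random embeddings, and the dimensions are all assumptions made for the example.

```python
import numpy as np

def build_memory(keys, values):
    """Store pairs as W = sum_i v_i u_i^T (keys u_i, values v_i are rows)."""
    return values.T @ keys  # shape (d, d)

def recall(W, key, candidates):
    """Return the index of the candidate embedding best aligned with W @ key."""
    scores = candidates @ (W @ key)
    return int(np.argmax(scores))

rng = np.random.default_rng(0)
d, n = 256, 8  # embedding dimension and number of tokens (illustrative values)

# Random high-dimensional embeddings are nearly orthonormal, which is what
# makes the outer-product memory recall cleanly.
emb = rng.standard_normal((n, d)) / np.sqrt(d)

# Hypothetical association to store: token i maps to token (i + 1) % n.
keys = emb
values = emb[(np.arange(n) + 1) % n]

W = build_memory(keys, values)
recalled = [recall(W, emb[i], emb) for i in range(n)]
print(recalled)
```

Because the embeddings are only approximately orthogonal, recall is exact here only because `d` is much larger than `n`; with many more stored pairs the cross-talk between them would grow.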

Findings

01. Transformers encode in-context information and bigram knowledge in attention weights.

02. Theoretical predictions align with experimental results on prompt outputs.

03. The analysis offers insights into how the model balances in-context learning against pretrained knowledge.

Abstract

The induction head mechanism is part of the computational circuits for in-context learning (ICL) that enable large language models (LLMs) to adapt to new tasks without fine-tuning. Most existing work explains the training dynamics behind acquiring such a powerful mechanism. However, the model's ability to coordinate in-context information over long contexts with global knowledge acquired during pretraining remains poorly understood. This paper investigates how a two-layer transformer thoroughly captures in-context information and balances it with pretrained bigram knowledge in next-token prediction, from the viewpoint of associative memory. We theoretically analyze the representation of weight matrices in attention layers and the resulting logits when a transformer is given prompts generated by a bigram model. In the experiments, we design specific prompts to evaluate whether the outputs…
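To make "prompts generated by a bigram model" concrete, here is a minimal sketch of sampling a token sequence where each token depends only on the previous one. The transition matrix, vocabulary size, and the function name `sample_bigram_prompt` are assumptions for illustration; the paper's actual data-generating setup is not given in this excerpt.

```python
import numpy as np

def sample_bigram_prompt(transition, start, length, rng):
    """Sample `length` tokens; row t of `transition` is P(next token | current = t)."""
    tokens = [start]
    for _ in range(length - 1):
        probs = transition[tokens[-1]]
        tokens.append(int(rng.choice(len(probs), p=probs)))
    return tokens

rng = np.random.default_rng(0)
vocab_size = 5  # illustrative value

# Hypothetical bigram model: each row is drawn from a flat Dirichlet,
# so every row is a valid probability distribution over the vocabulary.
transition = rng.dirichlet(np.ones(vocab_size), size=vocab_size)

prompt = sample_bigram_prompt(transition, start=0, length=12, rng=rng)
print(prompt)
```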

Taxonomy

Topics: Intelligent Tutoring Systems and Adaptive Learning

Methods: ALIGN