Emergence of Episodic Memory in Transformers: Characterizing Changes in Temporal Structure of Attention Scores During Training

Deven Mahesh Mistry; Anooshka Bajaj; Yash Aggarwal; Sahaj Singh Maini; Zoran Tiganj

arXiv:2502.06902·cs.LG·February 12, 2025

TL;DR

This paper examines how transformer models develop, over the course of training, temporal biases resembling those of human episodic memory, and identifies induction heads as the driver of the temporal contiguity effect.

Contribution

It characterizes the emergence of episodic memory-like effects in transformers and identifies induction heads as key to this temporal organization.

Findings

01

Transformers exhibit effects like temporal contiguity, primacy, and recency.

02

Induction head ablation eliminates the contiguity effect.

03

Transformers show tendencies toward in-context serial recall.

Abstract

We investigate in-context temporal biases in attention heads and transformer outputs. Using cognitive science methodologies, we analyze attention scores and outputs of GPT-2 models of varying sizes. Across attention heads, we observe effects characteristic of human episodic memory, including temporal contiguity, primacy, and recency. Transformer outputs demonstrate a tendency toward in-context serial recall. Importantly, this effect is eliminated after the ablation of the induction heads, which are the driving force behind the contiguity effect. Our findings offer insights into how transformers organize information temporally during in-context learning, shedding light on their similarities and differences with human memory and learning.
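
To make the idea of a temporal contiguity effect in attention scores concrete, here is a minimal sketch of one way such an analysis could be set up. It is not the authors' code, and the paper's exact procedure may differ. It assumes a prompt consisting of a list of tokens presented twice and a single head's attention matrix for that prompt, and it averages the attention from each token in the second presentation to the tokens of the first presentation, binned by lag (serial position of the attended token minus serial position of the query token). The names `attention_by_lag` and `toy_induction_attention` are hypothetical, and the toy matrix is synthetic, not model output.

```python
import numpy as np


def attention_by_lag(attn, list_len):
    """Mean attention from the second list presentation to the first, by lag.

    attn[i, j] is the attention from query position i to key position j for
    a prompt made of the same list of `list_len` tokens presented twice.
    Lag = (serial position of attended token) - (serial position of query).
    """
    attn = np.asarray(attn, dtype=float)
    if attn.shape != (2 * list_len, 2 * list_len):
        raise ValueError("attn must be square with side 2 * list_len")

    sums, counts = {}, {}
    for p in range(list_len):            # serial position of the query token
        query = list_len + p             # its index in the second presentation
        for q in range(list_len):        # serial position in the first presentation
            lag = q - p
            sums[lag] = sums.get(lag, 0.0) + attn[query, q]
            counts[lag] = counts.get(lag, 0) + 1
    return {lag: sums[lag] / counts[lag] for lag in sorted(sums)}


def toy_induction_attention(list_len):
    """Synthetic (assumed) induction-like pattern: each repeated token attends
    to the token that followed its first occurrence (clamped at the list end)."""
    attn = np.zeros((2 * list_len, 2 * list_len))
    for p in range(list_len):
        target = min(p + 1, list_len - 1)
        attn[list_len + p, target] = 1.0
    return attn


if __name__ == "__main__":
    profile = attention_by_lag(toy_induction_attention(5), list_len=5)
    for lag, value in profile.items():
        print(f"lag {lag:+d}: {value:.2f}")
```

In this toy case the profile peaks at lag +1, the forward-asymmetric signature usually associated with induction heads. With a real model, `attn` would be replaced by one head's attention weights for a repeated-list prompt.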

Taxonomy

Topics: Cognitive Abilities and Testing · Neural and Behavioral Psychology Studies · Memory Processes and Influences

Methods: Attention Is All You Need · Cosine Annealing · Linear Layer · Multi-Head Attention · Position-Wise Feed-Forward Layer · Adam · Softmax · Dropout · Absolute Position Encodings