Linking In-context Learning in Transformers to Human Episodic Memory

Li Ji-An; Corey Y. Zhou; Marcus K. Benna; Marcelo G. Mattar

arXiv:2405.14992 · cs.CL · November 1, 2024 · 2 cites

TL;DR

This paper draws parallels between self-attention in Transformer models and human episodic memory. It shows that induction heads behave much like the contextual maintenance and retrieval (CMR) model of episodic memory, and that these CMR-like heads contribute to in-context learning.

Contribution

It identifies and characterizes CMR-like attention heads in Transformers, linking them to human episodic memory and demonstrating their causal role in in-context learning.

Findings

01. CMR-like heads often emerge in the intermediate and late layers of LLMs.

02. Ablating CMR-like heads impairs in-context learning performance.

03. CMR-like heads qualitatively mirror human memory biases.

Abstract

Understanding connections between artificial and biological intelligent systems can reveal fundamental principles of general intelligence. While many artificial intelligence models have a neuroscience counterpart, such connections are largely missing in Transformer models and the self-attention mechanism. Here, we examine the relationship between interacting attention heads and human episodic memory. We focus on induction heads, which contribute to in-context learning in Transformer-based large language models (LLMs). We demonstrate that induction heads are behaviorally, functionally, and mechanistically similar to the contextual maintenance and retrieval (CMR) model of human episodic memory. Our analyses of LLMs pre-trained on extensive text data show that CMR-like heads often emerge in the intermediate and late layers, qualitatively mirroring human memory biases. The ablation of…
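The abstract does not spell out what an induction head does. As commonly described in the interpretability literature, such a head looks back for an earlier occurrence of the current token and promotes the token that followed it. The toy function below is a minimal sketch of that rule on a plain token list; `induction_predict` is a hypothetical name used for illustration only, and it is not the paper's method or a model of real attention weights.

```python
from typing import Optional, Sequence


def induction_predict(tokens: Sequence[str]) -> Optional[str]:
    """Toy induction rule (illustrative assumption, not the paper's code).

    Find the most recent earlier occurrence of the last token and return
    the token that followed it, or None if there is no earlier occurrence.
    """
    if not tokens:
        return None
    current = tokens[-1]
    # Scan earlier positions from the most recent to the oldest.
    for i in range(len(tokens) - 2, -1, -1):
        if tokens[i] == current:
            # Copy forward whatever came right after the matched token.
            return tokens[i + 1]
    return None


if __name__ == "__main__":
    print(induction_predict(["A", "B", "C", "A"]))
```

On the sequence A, B, C, A the sketch returns B: the final A matches the first A, and B is the token that followed it. This one-step-forward bias is loosely analogous to the tendency in human free recall to retrieve the item studied just after the one just recalled, which is the kind of behavior the CMR comparison is about.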

Code & Models

Repositories

corxyz/icl-cmr (official)

Videos

Linking In-context Learning in Transformers to Human Episodic Memory · SlidesLive

Taxonomy

Topics: Neural Networks and Applications · EEG and Brain-Computer Interfaces

Methods: Attention Is All You Need · Linear Layer · Byte Pair Encoding · Label Smoothing · Adam · Residual Connection · Position-Wise Feed-Forward Layer · Multi-Head Attention · Dropout · Dense Connections