Associative memory inspires improvements for in-context learning using a novel attention residual stream architecture

Thomas F Burns; Tomoki Fukai; Christopher J Earls

arXiv:2412.15113 · cs.NE · August 5, 2025

TL;DR

This paper introduces a residual stream architecture, inspired by associative memory models, that lets information flow directly between attention heads. With it, in-context learning abilities emerge earlier in training and performance improves.

Contribution

The paper proposes a new residual stream architecture inspired by associative memory models. With this architecture, in-context learning abilities appear earlier in training, and performance improves in both small and larger language models.

Findings

01 In-context learning abilities manifest faster during training.

02 Improved performance in larger models, with the architecture applied to attention head values.

03 Effective in small models with 8 million parameters.

Abstract

Large language models (LLMs) demonstrate an impressive ability to utilise information within the context of their input sequences to appropriately respond to data unseen by the LLM during its training procedure. This ability is known as in-context learning (ICL). Humans and non-human animals demonstrate similar abilities; however, their neural architectures differ substantially from LLMs. Despite this, a critical component within LLMs, the attention mechanism, resembles modern associative memory models, widely used in and influenced by the computational neuroscience community to model biological memory systems. Using this connection, we introduce an associative memory model capable of performing ICL. We use this as inspiration for a novel residual stream architecture which allows information to directly flow between attention heads. We test this architecture during training within a…
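
The resemblance between attention and associative memory noted in the abstract can be illustrated with a minimal sketch. This is not the authors' model, and it is not taken from their repository. It shows only the standard observation that softmax attention over stored key-value pairs behaves like one retrieval step in an associative memory: a noisy cue is matched against stored keys, and the value paired with the best match is recalled. The function names, the sharpness parameter `beta`, and the toy data are all assumptions made for illustration.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def retrieve(query, keys, values, beta=1.0):
    """One attention-style retrieval step (illustrative, hypothetical helper).

    Scores each stored key by its dot product with the query, turns the
    scores into weights with a softmax scaled by `beta`, and returns the
    weighted average of the stored values together with the weights.
    """
    scores = [beta * sum(q * k for q, k in zip(query, key)) for key in keys]
    weights = softmax(scores)
    dim = len(values[0])
    recalled = [sum(w * v[d] for w, v in zip(weights, values)) for d in range(dim)]
    return recalled, weights


# Toy memory (assumed data): three orthogonal keys, each paired with a value.
keys = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
values = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]

# A corrupted version of the first key acts as the retrieval cue.
noisy_query = [0.9, 0.1, 0.0]
recalled, weights = retrieve(noisy_query, keys, values, beta=20.0)
print(recalled, weights)
```

With a large `beta` the weights concentrate on the closest stored key, so the recalled vector is close to the value stored with the first key. With a small `beta` the output blends several stored values instead.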

Code & Models

Repositories

tfburns/amicl-and-residual-attention-streams (JAX, official)

Taxonomy

Topics: Neural Networks and Applications · Data Stream Mining Techniques · Online Learning and Analytics

Methods: Linear Layer · Dropout · Multi-Head Attention · Adam · Layer Normalization · Position-Wise Feed-Forward Layer · Label Smoothing · Residual Connection · Softmax · Attention Is All You Need