Hybrid Associative Memories

Leon Lufkin; Tomás Figliolia; Beren Millidge; Kamesh Krishnamurthy

arXiv:2603.22325 · cs.LG · March 30, 2026

TL;DR

The paper introduces the Hybrid Associative Memory (HAM) layer, which combines RNNs and self-attention to store and retrieve sequence information efficiently. Its memory growth is controllable, with the stated aim of better performance at lower memory and compute cost.

Contribution

It proposes a novel HAM layer that leverages the strengths of RNNs and self-attention, enabling data-dependent, controllable memory growth for sequence modeling.

Findings

01

HAM layers outperform RNNs and Transformers while using less KV cache than full self-attention.

02

Controlling KV-cache growth trades off between performance and memory cost.

03

HAM achieves competitive results on sequence tasks.

Abstract

Recurrent neural networks (RNNs) and self-attention are both widely used sequence-mixing layers that maintain an internal memory. However, this memory is constructed using two orthogonal mechanisms: RNNs compress the entire past into a fixed-size state, whereas self-attention stores every past time step, growing its state (the KV cache) linearly with the sequence length. This results in orthogonal strengths and weaknesses. Self-attention layers excel at retrieving information in the context but have large memory and computational costs, while RNNs are more efficient but degrade over longer contexts and underperform on precise recall tasks. Prior work combining these mechanisms has focused primarily on naively interleaving them to reduce computational cost without regard to their complementary mechanisms. We propose the Hybrid Associative Memory (HAM) layer, which combines…
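
The contrast the abstract draws between the two memory mechanisms can be illustrated with a minimal sketch. This is not the HAM layer, whose details are not given in the text above. It assumes a simple linear associative memory (a sum of outer products) as a stand-in for the RNN state and an unbatched, single-head softmax read as a stand-in for self-attention. All function names are hypothetical and chosen for illustration only.

```python
import math


def rnn_memory_update(state, key, value):
    """Assumed stand-in for an RNN memory: add outer(value, key) to a
    fixed-size state matrix. The state shape never changes."""
    for i, v in enumerate(value):
        for j, k in enumerate(key):
            state[i][j] += v * k
    return state


def rnn_read(state, query):
    """Read from the fixed-size state with a matrix-vector product."""
    return [sum(row[j] * query[j] for j in range(len(query))) for row in state]


def attention_read(keys, values, query):
    """Softmax read over an explicit KV cache (one entry per time step)."""
    scores = [sum(k_j * q_j for k_j, q_j in zip(k, query)) for k in keys]
    top = max(scores)  # subtract the max for numerical stability
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    dim = len(values[0])
    return [sum(w * v[i] for w, v in zip(weights, values)) / total
            for i in range(dim)]


def memory_footprint(seq_len, d):
    """Number of stored scalars for each mechanism at width d."""
    return {"rnn_state": d * d, "kv_cache": 2 * seq_len * d}
```

In this sketch, `memory_footprint(1000, 4)` reports 16 scalars for the fixed-size state and 8000 for the KV cache. That is the linear growth the abstract refers to. With orthonormal keys the fixed-size state recalls stored values exactly. Once the number of stored pairs exceeds the key dimension, reads start to interfere, which is one way to see why compressed memories struggle on precise recall.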
