Hybrid Associative Memories
Leon Lufkin, Tomás Figliolia, Beren Millidge, Kamesh Krishnamurthy

TL;DR
The paper introduces the Hybrid Associative Memory (HAM) layer, combining RNNs and self-attention to efficiently store and retrieve sequence information with controllable memory growth, improving performance and reducing costs.
Contribution
It proposes a novel HAM layer that leverages the strengths of RNNs and self-attention, enabling data-dependent, controllable memory growth for sequence modeling.
Findings
HAM layers outperform RNNs and Transformers with less KV-cache usage.
Controlling KV-cache growth trades off between performance and memory cost.
HAM achieves competitive results on sequence tasks.
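The findings describe a knob that trades KV-cache size against performance. The abstract is cut off before it states HAM's actual rule for deciding which tokens enter the cache, so what follows is only a toy Python sketch of data-dependent cache growth under an assumed rule, not the paper's method. The fixed-size state is updated with the delta rule (a substitute technique chosen for illustration). A (key, value) pair is also appended to a KV cache only when the state recalls its value with an error above a hypothetical threshold `tau`. All function names are hypothetical.

```python
def read(S, q):
    """Fixed-size state readout S q (a d_v vector)."""
    return [sum(row[j] * q[j] for j in range(len(q))) for row in S]

def hybrid_write(S, cache, k, v, tau):
    """Toy write rule (hypothetical, NOT the paper's HAM rule).

    The fixed-size state S is always updated with a delta-rule step
    S <- S + (v - S k) k^T. The (k, v) pair is additionally appended to the
    KV cache only when the state's recall error for v exceeds tau.
    """
    pred = read(S, k)
    err = max(abs(p - x) for p, x in zip(pred, v))
    if err > tau:
        cache.append((k, v))  # the cache grows only for poorly recalled pairs
    return [[S[i][j] + (v[i] - pred[i]) * k[j] for j in range(len(k))]
            for i in range(len(v))]

def cache_size(stream, d_k, d_v, tau):
    """Process the stream and return how many pairs ended up in the KV cache."""
    S = [[0.0] * d_k for _ in range(d_v)]
    cache = []
    for k, v in stream:
        S = hybrid_write(S, cache, k, v, tau)
    return len(cache)

e0, e1 = [1.0, 0.0], [0.0, 1.0]
stream = [(e0, [1.0]), (e1, [2.0]), (e0, [1.0]), (e0, [3.0]), (e1, [2.5])]
for tau in (0.0, 1.0, 3.0):
    print(tau, cache_size(stream, d_k=2, d_v=1, tau=tau))
# 0.0 4
# 1.0 2
# 3.0 0
```

In this toy, raising `tau` shrinks the cache (lower memory cost) but leaves more pairs to the lossy fixed-size state, while lowering it approaches a full KV cache that stores nearly every step.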
Abstract
Recurrent neural networks (RNNs) and self-attention are both widely used sequence-mixing layers that maintain an internal memory. However, this memory is constructed using two orthogonal mechanisms: RNNs compress the entire past into a fixed-size state, whereas self-attention's state stores every past time step, growing its state (the KV cache) linearly with the sequence length. This results in orthogonal strengths and weaknesses. Self-attention layers excel at retrieving information in the context but have large memory and computational costs, while RNNs are more efficient but degrade over longer contexts and underperform on precise recall tasks. Prior work combining these mechanisms has focused primarily on naively interleaving them to reduce computational cost, without regard to their complementary mechanisms. We propose the Hybrid Associative Memory (HAM) layer, which combines…
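To make the contrast between the two memories concrete, here is a minimal Python sketch. It assumes the RNN's state is a linear-attention-style outer-product memory, which is one common choice and not necessarily the one used in the paper; all function names are hypothetical. The RNN folds each (key, value) pair into a fixed d_v x d_k matrix, while softmax attention keeps every pair in a KV cache that grows with the sequence length.

```python
import math

def linear_rnn_step(S, k, v):
    """Fold (k, v) into the fixed d_v x d_k state: S <- S + v k^T."""
    return [[S[i][j] + v[i] * k[j] for j in range(len(k))] for i in range(len(v))]

def linear_rnn_read(S, q):
    """Retrieve S q, a d_v vector; the cost does not depend on sequence length."""
    return [sum(row[j] * q[j] for j in range(len(q))) for row in S]

def softmax_attention_read(keys, values, q, beta=1.0):
    """Softmax attention over every cached (key, value) pair."""
    scores = [beta * sum(kj * qj for kj, qj in zip(k, q)) for k in keys]
    m = max(scores)  # subtract the max for numerical stability
    weights = [math.exp(s - m) for s in scores]
    total = sum(weights)
    d_v = len(values[0])
    return [sum(w * v[i] for w, v in zip(weights, values)) / total for i in range(d_v)]

def memory_floats(T, d_k, d_v):
    """Floats stored after T steps: (fixed RNN state, growing KV cache)."""
    return d_v * d_k, T * (d_k + d_v)

# Store 4 (key, value) pairs with orthogonal one-hot keys.
d_k, d_v = 4, 2
S = [[0.0] * d_k for _ in range(d_v)]
keys, values = [], []  # the KV cache: one entry appended per step
for t in range(d_k):
    k = [1.0 if j == t else 0.0 for j in range(d_k)]
    v = [float(t), float(t)]
    S = linear_rnn_step(S, k, v)
    keys.append(k)
    values.append(v)

q = [0.0, 0.0, 1.0, 0.0]  # query matching the third stored key
print(linear_rnn_read(S, q))                               # [2.0, 2.0]
print(softmax_attention_read(keys, values, q, beta=20.0))  # approximately [2.0, 2.0]
print(memory_floats(T=1000, d_k=64, d_v=64))               # (4096, 128000)
```

With at most d_k orthogonal keys, both memories recall the stored value (attention only approximately, via a sharp softmax). Once more than d_k pairs are stored, the keys can no longer all be orthogonal, so reads from the fixed-size S start to mix values, which is one way to see why fixed-size states struggle with precise recall while the KV cache keeps growing.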
