Blending Complementary Memory Systems in Hybrid Quadratic-Linear Transformers

Kazuki Irie; Morris Yau; Samuel J. Gershman

arXiv:2506.00744·cs.LG·October 24, 2025

Open Access

TL;DR

This paper introduces hybrid memory architectures that combine key-value memory (softmax attention) with fast weight memory in transformers, exploiting their complementary strengths on language modeling, algorithmic, and reinforcement learning tasks.

Contribution

The paper proposes and compares three methods to effectively blend quadratic and linear transformer memory systems, demonstrating improved performance across multiple tasks.

Findings

01. Hybrid memory systems outperform individual components in language modeling.

02. Hybrid approaches enable processing of longer sequences with better recall.

03. Experimental results show improved performance in reinforcement learning environments.

Abstract

We develop hybrid memory architectures for general-purpose sequence processing neural networks that combine key-value memory using softmax attention (KV-memory) with fast weight memory through dynamic synaptic modulation (FW-memory) -- the core principles of quadratic and linear transformers, respectively. These two memory systems have complementary but individually limited properties: KV-memory offers precise retrieval but is constrained by quadratic complexity in sequence length, while FW-memory supports arbitrarily long sequences and enables more expressive computation but sacrifices precise recall. We propose and compare three methods to blend these two systems into a single memory system, differing in how and when input information is delivered to each system, to leverage the strengths of both. We conduct experiments on general language modeling and retrieval tasks by training…
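To make the contrast in the abstract concrete, here is a minimal sketch of the two memory systems in isolation, written in plain Python. It is an illustration under stated assumptions, not the authors' implementation: the class names `KVMemory` and `FWMemory` are hypothetical, the fast weight update shown is the simplest additive outer-product rule (the paper's FW-memory may use a different update), and none of the three blending methods is reproduced here because the excerpt does not describe them.

```python
import math


def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


class KVMemory:
    """Hypothetical key-value memory read with softmax attention.

    Every (key, value) pair is stored, so the state grows with the
    sequence, and each read scans all stored pairs.
    """

    def __init__(self):
        self.keys = []
        self.values = []

    def write(self, key, value):
        self.keys.append(list(key))
        self.values.append(list(value))

    def read(self, query):
        # Attention weights over all stored keys, then a weighted sum of values.
        weights = softmax([dot(query, k) for k in self.keys])
        dim = len(self.values[0])
        return [sum(w * v[i] for w, v in zip(weights, self.values))
                for i in range(dim)]

    def size(self):
        return len(self.keys)


class FWMemory:
    """Hypothetical fast weight memory: a fixed-size matrix W.

    Assumption: plain additive update W += value * key^T. The state
    size is independent of how many pairs have been written.
    """

    def __init__(self, d_key, d_value):
        self.W = [[0.0] * d_key for _ in range(d_value)]

    def write(self, key, value):
        for i, v in enumerate(value):
            for j, k in enumerate(key):
                self.W[i][j] += v * k

    def read(self, query):
        # Retrieval is a single matrix-vector product W @ query.
        return [dot(row, query) for row in self.W]

    def size(self):
        return len(self.W) * len(self.W[0])
```

In this sketch, `KVMemory.size()` grows by one with every write while `FWMemory.size()` stays constant. That is the trade-off the abstract describes: a read over the KV store costs time proportional to the number of stored pairs (hence quadratic cost over a whole sequence), whereas the fast weight matrix compresses all writes into fixed storage, so retrieval is exact only when the stored keys do not interfere with one another.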

Taxonomy

Topics: Advanced Memory and Neural Computing · Parallel Computing and Optimization Techniques · Advanced Data Storage Technologies