RAM-Net: Expressive Linear Attention with Selectively Addressable Memory
Kaicheng Xiao, Haotian Li, Liran Dong, Guoliang Xing

TL;DR
RAM-Net introduces a high-dimensional sparse memory architecture with selective addressing, enabling exponential scaling of state size and improved expressivity in linear attention models, while maintaining computational efficiency.
Contribution
The paper presents RAM-Net, a novel linear attention architecture with explicit sparse memory addressing, bridging the gap between expressivity and efficiency in sequence modeling.
Findings
Outperforms state-of-the-art baselines on long-range retrieval tasks
Achieves competitive results on language modeling benchmarks
Captures dependencies more effectively while requiring less computation
Abstract
While linear attention architectures offer efficient inference, compressing unbounded history into a fixed-size memory inherently limits expressivity and causes information loss. To address this limitation, we introduce Random Access Memory Network (RAM-Net), a novel architecture designed to bridge the gap between the representational capacity of full attention and the memory efficiency of linear models. The core of RAM-Net maps inputs to high-dimensional sparse vectors serving as explicit addresses, allowing the model to selectively access a massive memory state. This design enables exponential state size scaling without additional parameters, which significantly mitigates signal interference and enhances retrieval fidelity. Moreover, the inherent sparsity ensures exceptional computational efficiency, as state updates are confined to minimal entries. Extensive experiments demonstrate…
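The addressing idea in the abstract can be illustrated with a toy memory. This is a minimal sketch under stated assumptions, not the authors' implementation: the grouped-argmax addressing, the random projection `W`, and the names `address`, `SparseMemory`, `groups`, and `base` are all hypothetical choices made for illustration. It shows how a projection with only `groups * base` rows can address `base ** groups` slots, and how each write touches a single row of the state.

```python
import numpy as np


def address(x, W, groups):
    """Map an input vector to one flat slot index (hypothetical scheme).

    The projection W @ x is split into `groups` chunks. The argmax of each
    chunk is one digit of a mixed-radix address, so the number of slots
    grows as base ** groups while W has only groups * base rows.
    """
    logits = (W @ x).reshape(groups, -1)
    base = logits.shape[1]
    slot = 0
    for digit in logits.argmax(axis=1):
        slot = slot * base + int(digit)
    return slot


class SparseMemory:
    """Toy memory whose reads and writes touch exactly one slot."""

    def __init__(self, dim_in, dim_val, groups=3, base=8, seed=0):
        rng = np.random.default_rng(seed)
        # Assumption: a fixed random projection stands in for a learned one.
        self.W = rng.standard_normal((groups * base, dim_in))
        self.groups = groups
        self.num_slots = base ** groups
        self.state = np.zeros((self.num_slots, dim_val))

    def write(self, k, v):
        slot = address(k, self.W, self.groups)
        self.state[slot] += v  # only one row of the state is updated
        return slot

    def read(self, q):
        return self.state[address(q, self.W, self.groups)]


mem = SparseMemory(dim_in=16, dim_val=4)
rng = np.random.default_rng(1)
k, v = rng.standard_normal(16), rng.standard_normal(4)
slot = mem.write(k, v)
touched = int(mem.state.any(axis=1).sum())  # rows changed by the write
recovered = mem.read(k)                     # querying with the same key
```

With the default `groups=3` and `base=8`, the memory has 512 slots but the projection has only 24 rows. Adding a fourth group would multiply the slot count by 8 while adding just 8 rows, which is the kind of scaling the abstract describes.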
Taxonomy
Topics: Advanced Neural Network Applications · Topic Modeling · Domain Adaptation and Few-Shot Learning
