# Gated Associative Memory: A Parallel O(N) Architecture for Efficient Sequence Modeling

**Author:** Rishiraj Acharya

arXiv: 2509.00605 · 2025-09-03

## TL;DR

The paper introduces the Gated Associative Memory (GAM) network, a linear-time, fully parallel sequence modeling architecture that fuses local and global context. It trains faster than a standard Transformer and matches or beats it in perplexity.

## Contribution

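The paper presents GAM, a sequence model that replaces self-attention with two parallel pathways: a causal convolution for local context and an associative memory retrieval for global context, combined by a per-token gate. The resulting block scales linearly with sequence length and trains faster than the baselines tested.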

## Key findings

- GAM trains faster than both the Transformer and Mamba baselines.
- GAM reaches superior or competitive final validation perplexity on WikiText-2 and TinyStories.
- GAM's gating mechanism effectively combines local and global context for each token.

## Abstract

The Transformer architecture, underpinned by the self-attention mechanism, has become the de facto standard for sequence modeling tasks. However, its core computational primitive scales quadratically with sequence length (O(N^2)), creating a significant bottleneck for processing long contexts. In this paper, we propose the Gated Associative Memory (GAM) network, a novel, fully parallel architecture for sequence modeling that exhibits linear complexity (O(N)) with respect to sequence length. The GAM block replaces the self-attention layer with two parallel pathways: a causal convolution to efficiently capture local, position-dependent context, and a parallel associative memory retrieval mechanism to model global, content-based patterns. These pathways are dynamically fused using a gating mechanism, allowing the model to flexibly combine local and global information for each token. We implement GAM from scratch and conduct a rigorous comparative analysis against a standard Transformer model and a modern linear-time baseline (Mamba) on the WikiText-2 benchmark, as well as against the Transformer on the TinyStories dataset. Our experiments demonstrate that GAM is consistently faster, outperforming both baselines on training speed, and achieves a superior or competitive final validation perplexity across all datasets, establishing it as a promising and efficient alternative for sequence modeling.
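The token-mixing part of the GAM block described in the abstract can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the author's implementation. The following are assumptions: the global pathway is modeled as a bank of learnable memory slots queried with a softmax over slots, the convolution is depthwise with a `kernel[k][c]` layout, and the gate is two sigmoid-activated matrices. All names (`causal_depthwise_conv`, `memory_retrieval`, `gam_mixer`) are hypothetical, and normalization, projections, residual connections, and the feed-forward sublayer are omitted.

```python
import math
import random

Vector = list[float]
Matrix = list[list[float]]


def matvec(w: Matrix, x: Vector) -> Vector:
    """Multiply an (out x in) weight matrix by an input vector."""
    return [sum(wi * xi for wi, xi in zip(row, x)) for row in w]


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def softmax(z: Vector) -> Vector:
    m = max(z)  # subtract the max for numerical stability
    e = [math.exp(v - m) for v in z]
    total = sum(e)
    return [v / total for v in e]


def causal_depthwise_conv(xs: list[Vector], kernel: Matrix) -> list[Vector]:
    """Local pathway: channel c at step t mixes only steps t, t-1, ..., t-K+1.

    kernel[k][c] weights the token k steps in the past for channel c
    (hypothetical layout chosen for this sketch).
    """
    d = len(xs[0])
    out = []
    for t in range(len(xs)):
        y = [0.0] * d
        for k, taps in enumerate(kernel):
            if t - k < 0:
                break  # implicit left zero-padding: no access to future tokens
            for c in range(d):
                y[c] += taps[c] * xs[t - k][c]
        out.append(y)
    return out


def memory_retrieval(xs: list[Vector], memory: Matrix) -> list[Vector]:
    """Global pathway (assumed form): each token queries a bank of learnable slots.

    Scores are dot products with every slot, normalized by a softmax over slots;
    the output is the weighted sum of slots. Tokens are processed independently,
    so this step is parallel across the sequence.
    """
    out = []
    for x in xs:
        weights = softmax([sum(a * b for a, b in zip(x, slot)) for slot in memory])
        out.append([sum(w * slot[c] for w, slot in zip(weights, memory))
                    for c in range(len(x))])
    return out


def gam_mixer(xs: list[Vector], kernel: Matrix, memory: Matrix,
              w_gate_local: Matrix, w_gate_global: Matrix) -> list[Vector]:
    """Fuse both pathways with per-token, per-channel sigmoid gates (assumed form)."""
    local = causal_depthwise_conv(xs, kernel)
    glob = memory_retrieval(xs, memory)
    fused = []
    for x, lv, gv in zip(xs, local, glob):
        g_local = [sigmoid(z) for z in matvec(w_gate_local, x)]
        g_global = [sigmoid(z) for z in matvec(w_gate_global, x)]
        fused.append([a * l + b * g for a, l, b, g in zip(g_local, lv, g_global, gv)])
    return fused


# Toy usage with random weights: 6 tokens, model width 4, 3 conv taps, 5 memory slots.
random.seed(0)


def rand(rows: int, cols: int) -> Matrix:
    return [[random.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]


d_model, seq_len = 4, 6
xs = rand(seq_len, d_model)
kernel, memory = rand(3, d_model), rand(5, d_model)
w_local, w_global = rand(d_model, d_model), rand(d_model, d_model)
ys = gam_mixer(xs, kernel, memory, w_local, w_global)
print(len(ys), len(ys[0]))  # → 6 4
```

In this sketch the convolution costs O(N·K·d) for K taps and the retrieval costs O(N·M·d) for M slots. For fixed K and M, the cost grows linearly in N, and no N×N score matrix is ever formed. The memory slots do not depend on the sequence, and the gates are computed per token. As a result, only the convolution mixes information across positions, and its left-only taps keep the whole mixer causal.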

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/2509.00605/full.md

## Figures

4 figures with captions in the complete paper: https://tomesphere.com/paper/2509.00605/full.md

## References

12 references — full list in the complete paper: https://tomesphere.com/paper/2509.00605/full.md

---
Source: https://tomesphere.com/paper/2509.00605