Conditional Memory via Scalable Lookup: A New Axis of Sparsity for Large Language Models
Xin Cheng, Wangding Zeng, Damai Dai, Qinyu Chen, Bingxuan Wang, Zhenda Xie, Kezhao Huang, Xingkai Yu, Zhewen Hao, Yukun Li, Han Zhang, Huishuai Zhang, Dongyan Zhao, Wenfeng Liang

TL;DR
This paper introduces Engram, a scalable conditional memory module for large language models. Engram combines static memory lookup with neural computation and improves knowledge retrieval, reasoning, and long-context understanding. A U-shaped sparsity allocation law determines how capacity is balanced between the two.
Contribution
The paper proposes Engram, a scalable memory module with O(1) lookup. It also formulates the Sparsity Allocation problem and reveals a U-shaped scaling law for the trade-off between neural computation (MoE) and static memory (Engram) under a fixed parameter and compute budget.
Findings
Engram scales to 27B parameters and outperforms a strictly iso-parameter, iso-FLOPs MoE baseline.
Significant gains in reasoning and domain-specific tasks.
Enhanced long-context retrieval and efficiency through deterministic addressing.
Abstract
While Mixture-of-Experts (MoE) scales capacity via conditional computation, Transformers lack a native primitive for knowledge lookup, forcing them to inefficiently simulate retrieval through computation. To address this, we introduce conditional memory as a complementary sparsity axis, instantiated via Engram, a module that modernizes classic N-gram embedding for O(1) lookup. By formulating the Sparsity Allocation problem, we uncover a U-shaped scaling law that optimizes the trade-off between neural computation (MoE) and static memory (Engram). Guided by this law, we scale Engram to 27B parameters, achieving superior performance over a strictly iso-parameter and iso-FLOPs MoE baseline. Most notably, while the memory module is expected to aid knowledge retrieval (e.g., MMLU +3.4; CMMLU +4.0), we observe even larger gains in general reasoning (e.g., BBH +5.0; ARC-Challenge +3.7) and…
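To make "N-gram embedding for O(1) lookup" concrete, here is a minimal sketch of a generic hashed N-gram embedding table. This is not Engram's implementation, because the page does not describe its hashing, training, or fusion details. The class name `HashedNGramMemory`, the polynomial hash, and the parameters `num_slots`, `dim`, and `n` are hypothetical choices for illustration. The sketch shows one property: each position's N-gram is hashed deterministically to a table row, so the cost of a lookup depends on `n` and not on the table size.

```python
import numpy as np


class HashedNGramMemory:
    """Toy hashed N-gram embedding table (hypothetical illustration, not Engram's code)."""

    def __init__(self, num_slots: int, dim: int, n: int = 2, seed: int = 0):
        self.num_slots = num_slots
        self.n = n
        rng = np.random.default_rng(seed)
        # One embedding row per hash bucket; in a real model these rows would be trained.
        self.table = rng.standard_normal((num_slots, dim)).astype(np.float32)

    def _slot(self, ngram: tuple) -> int:
        # Deterministic polynomial hash of the token IDs. The work is O(n),
        # independent of num_slots, so lookup cost does not grow with table size.
        h = 0
        for tok in ngram:
            h = (h * 1_000_003 + tok + 1) % self.num_slots
        return h

    def lookup(self, token_ids: list) -> np.ndarray:
        """Return one memory vector per position, keyed on the N-gram ending there."""
        out = np.zeros((len(token_ids), self.table.shape[1]), dtype=np.float32)
        for i in range(len(token_ids)):
            if i + 1 < self.n:
                continue  # not enough left context yet; leave this row as zeros
            ngram = tuple(token_ids[i + 1 - self.n : i + 1])
            out[i] = self.table[self._slot(ngram)]
        return out


if __name__ == "__main__":
    mem = HashedNGramMemory(num_slots=1 << 16, dim=8, n=2)
    vecs = mem.lookup([5, 17, 5, 17])
    # Positions 1 and 3 both end with the bigram (5, 17), so they read the same row.
    print(np.array_equal(vecs[1], vecs[3]))
```

Because the address is computed directly from the tokens, the same N-gram always retrieves the same row. This matches the "deterministic addressing" idea mentioned above. A production design would probably also need some way to reduce hash collisions and a learned way to combine the retrieved vector with the model's hidden state. Those pieces are left out of this sketch.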
