AllMem: A Memory-centric Recipe for Efficient Long-context Modeling

Ziming Wang; Xiang Wang; Kailong Peng; Lang Qin; Juan Gabriel Kostelec; Christos Sourmpis; Axel Laborieux; Qinghai Guo

arXiv:2602.13680 · cs.AI · February 17, 2026



TL;DR

AllMem introduces a hybrid architecture combining sliding window attention with memory networks, enabling efficient long-context modeling in LLMs with reduced computational costs and mitigated catastrophic forgetting.
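Sliding Window Attention limits each token to the most recent W positions, and that limit is what keeps attention cost bounded as the context grows. The minimal single-head NumPy sketch below builds a causal sliding-window mask and applies it inside scaled dot-product attention. The function names and the unbatched, single-head shapes are illustrative assumptions, not code from the paper.

```python
import numpy as np

def sliding_window_mask(seq_len: int, window: int) -> np.ndarray:
    """True where query i may attend key j: causal and within the last `window` positions (itself included)."""
    i = np.arange(seq_len)[:, None]
    j = np.arange(seq_len)[None, :]
    return (j <= i) & (j > i - window)

def sliding_window_attention(q: np.ndarray, k: np.ndarray, v: np.ndarray, window: int) -> np.ndarray:
    """Single-head scaled dot-product attention restricted to a causal sliding window.

    q, k, v have shape (seq_len, d). The diagonal is always unmasked, so every row
    has at least one finite score and the softmax is well defined.
    """
    seq_len, d = q.shape
    scores = q @ k.T / np.sqrt(d)
    scores = np.where(sliding_window_mask(seq_len, window), scores, -np.inf)
    scores = scores - scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)                               # masked entries become exactly 0
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    q, k, v = (rng.standard_normal((8, 4)) for _ in range(3))
    out = sliding_window_attention(q, k, v, window=3)
    print(out.shape)  # (8, 4)
```

Each query touches at most `window` keys, so the per-token cost depends on the window size rather than on the total sequence length.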

Contribution

The paper presents a novel hybrid architecture and a memory-efficient fine-tuning method that allow pre-trained LLMs to effectively handle ultra-long contexts with minimal performance loss.
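The abstract describes replacing standard attention layers of a pre-trained model with memory-augmented sliding-window layers and fine-tuning them memory-efficiently. One plausible way to picture such a conversion is sketched below in NumPy. It reuses the pre-trained query/key/value/output projections as frozen weights, adds a small set of new memory parameters as the only trainable tensors, and blends the two paths with a gate initialised to zero, so the converted layer starts out as plain sliding-window attention. The class `HybridLayer`, the zero-initialised tanh gate, the frozen/trainable split, and the small MLP used as a stand-in memory readout are all hypothetical choices, not the paper's design.

```python
import numpy as np

def causal_window_attention(q, k, v, window):
    """Single-head causal sliding-window attention; q, k, v have shape (n, d)."""
    n, d = q.shape
    i, j = np.arange(n)[:, None], np.arange(n)[None, :]
    scores = np.where((j <= i) & (j > i - window), q @ k.T / np.sqrt(d), -np.inf)
    w = np.exp(scores - scores.max(axis=-1, keepdims=True))
    return (w / w.sum(axis=-1, keepdims=True)) @ v

class HybridLayer:
    """Hypothetical memory-augmented sliding-window layer built from a pre-trained attention layer."""

    def __init__(self, pretrained: dict, window: int, hidden: int, rng: np.random.Generator):
        d = pretrained["W_q"].shape[0]
        self.window = window
        # Assumption: pre-trained projections are reused unchanged and kept frozen.
        self.frozen = {name: w.copy() for name, w in pretrained.items()}
        # Assumption: only these newly added tensors receive gradients during fine-tuning.
        self.trainable = {
            "W1": rng.standard_normal((d, hidden)) * 0.02,
            "W2": rng.standard_normal((hidden, d)) * 0.02,
            "gate": np.zeros(()),  # zero init: layer initially equals pure sliding-window attention
        }

    def __call__(self, x: np.ndarray) -> np.ndarray:
        f, t = self.frozen, self.trainable
        q, k, v = x @ f["W_q"], x @ f["W_k"], x @ f["W_v"]
        local = causal_window_attention(q, k, v, self.window)
        # Placeholder memory readout (a small ReLU MLP on the queries), not the paper's memory.
        memory = np.maximum(q @ t["W1"], 0.0) @ t["W2"]
        return (local + np.tanh(t["gate"]) * memory) @ f["W_o"]

    def trainable_fraction(self) -> float:
        n_train = sum(p.size for p in self.trainable.values())
        n_frozen = sum(p.size for p in self.frozen.values())
        return n_train / (n_train + n_frozen)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    d = 16
    pretrained = {n: rng.standard_normal((d, d)) / np.sqrt(d) for n in ("W_q", "W_k", "W_v", "W_o")}
    layer = HybridLayer(pretrained, window=4, hidden=8, rng=rng)
    print(layer(rng.standard_normal((10, d))).shape, layer.trainable_fraction())
```

Under these assumptions only the new tensors need gradients and optimizer state, which is where the memory savings of such a fine-tuning scheme would come from.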

Findings

1. Achieves near-lossless performance on LongBench (37k context) with a 4k sliding window.

2. Outperforms full attention on InfiniteBench at 128k context.

3. Reduces computational and memory overhead during long-sequence inference.
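The inference savings come from a simple scaling argument: full attention must cache keys and values for every past token, while a sliding window caches at most W of them. The sketch below computes key/value cache sizes for a hypothetical model shape (32 layers, 8 key/value heads of dimension 128, 2-byte values). These dimensions are assumptions chosen for illustration, not the configuration used in the paper, and the calculation ignores the fixed-size memory state that the hybrid layers add.

```python
# Hypothetical model shape (not taken from the paper).
CONFIG = dict(n_layers=32, n_kv_heads=8, head_dim=128, bytes_per_value=2)

def kv_cache_bytes(cached_tokens, n_layers, n_kv_heads, head_dim, bytes_per_value):
    """Bytes needed to store keys and values for `cached_tokens` positions."""
    per_token = 2 * n_layers * n_kv_heads * head_dim * bytes_per_value  # 2 = one K and one V
    return per_token * cached_tokens

def cache_gib(context_len, window=None, cfg=CONFIG):
    """Full attention caches every token; a sliding window caches at most `window` tokens."""
    cached = context_len if window is None else min(context_len, window)
    return kv_cache_bytes(cached, **cfg) / 2**30

if __name__ == "__main__":
    for ctx in (37_000, 131_072):
        print(f"{ctx:>7} tokens: full {cache_gib(ctx):6.2f} GiB, 4k window {cache_gib(ctx, 4096):.2f} GiB")
```

For this assumed shape, a 131,072-token context needs 16 GiB of cache under full attention versus 0.5 GiB with a 4,096-token window. Attention compute per generated token scales the same way: with the window size rather than with the context length.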

Abstract

Large Language Models (LLMs) encounter significant performance bottlenecks in long-sequence tasks due to the computational complexity and memory overhead inherent in the self-attention mechanism. To address these challenges, we introduce AllMem, a novel and efficient hybrid architecture that integrates Sliding Window Attention (SWA) with non-linear Test-Time Training (TTT) memory networks. AllMem enables models to effectively scale to ultra-long contexts while mitigating catastrophic forgetting. This approach not only overcomes the representation constraints typical of linear memory models but also significantly reduces the computational and memory footprint during long-sequence inference. Furthermore, we implement a Memory-Efficient Fine-Tuning strategy to replace standard attention layers in pre-trained models with memory-augmented sliding window layers. This…
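In Test-Time Training, the memory is itself a small network whose weights are updated by gradient descent on a self-supervised loss as tokens stream in, so its state has a fixed size regardless of sequence length. The NumPy sketch below assumes a two-layer ReLU MLP as the memory, a per-token key-to-value reconstruction loss, and plain SGD with a fixed learning rate. These choices and all function names are illustrative assumptions, not the update rule specified in the paper.

```python
import numpy as np

def init_memory(d, hidden, rng):
    """Hypothetical 2-layer MLP memory f(x) = relu(x @ W1) @ W2."""
    return [rng.standard_normal((d, hidden)) / np.sqrt(d),
            rng.standard_normal((hidden, d)) / np.sqrt(hidden)]

def memory_read(params, x):
    """Query the memory: evaluate the MLP at x."""
    W1, W2 = params
    return np.maximum(x @ W1, 0.0) @ W2

def memory_write(params, k, v, lr):
    """One test-time SGD step on the loss 0.5 * ||f(k) - v||^2.

    Returns (updated_params, loss_before_the_step).
    """
    W1, W2 = params
    z = k @ W1                 # (hidden,)
    h = np.maximum(z, 0.0)     # the ReLU makes the memory non-linear
    err = h @ W2 - v           # (d,)
    loss = 0.5 * float(err @ err)
    grad_W2 = np.outer(h, err)
    grad_z = (W2 @ err) * (z > 0)
    grad_W1 = np.outer(k, grad_z)
    return [W1 - lr * grad_W1, W2 - lr * grad_W2], loss

def ttt_scan(keys, queries, values, hidden=32, lr=0.01, seed=0):
    """Process a sequence token by token: write (k_t, v_t) into the memory, then read it with q_t."""
    rng = np.random.default_rng(seed)
    params = init_memory(keys.shape[1], hidden, rng)
    outputs = []
    for k, q, v in zip(keys, queries, values):
        params, _ = memory_write(params, k, v, lr)
        outputs.append(memory_read(params, q))
    return np.stack(outputs), params
```

Without the ReLU, f would be a linear map of the key, and with a single weight matrix the gradient step reduces to the classic delta rule; the hidden non-linearity is what separates this kind of memory from the linear memory models the abstract contrasts it with.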


Taxonomy

Topics: Topic Modeling · Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning