G-MemLLM: Gated Latent Memory Augmentation for Long-Context Reasoning in Large Language Models
Xun Xu

TL;DR
G-MemLLM introduces a gated latent memory system for large language models, significantly improving long-context reasoning and factual consistency across various model scales and benchmarks.
Contribution
The paper proposes G-MemLLM, a novel memory-augmented architecture with a gated update mechanism, enhancing long-term knowledge retention in LLMs.
Findings
13.3% accuracy boost on ZsRE for Llama 3.1-8B
8.56 point increase in Answer F1 on HotpotQA for GPT-2
6.89 point increase in Supporting Fact F1 on HotpotQA for Llama 3.1-8B
Abstract
Large Language Models (LLMs) have demonstrated remarkable capabilities in natural language understanding, yet they remain constrained by the finite capacity of their context windows and the inherent difficulty of maintaining long-term factual consistency during multi-hop reasoning. While existing methods utilize context compression or recurrent tokens, they often suffer from "context rot" or the dilution of information over long horizons. In this paper, we propose G-MemLLM, a memory-augmented architecture that integrates a frozen LLM backbone with a trainable Latent Memory Bank. Our key innovation is a GRU-style gated update logic that allows the model to selectively update, preserve, or overwrite latent memory slots, preventing the vanishing gradients of knowledge common in recurrent systems. We evaluate G-MemLLM across scales, from GPT-2 (124M) to Llama 3.1 (8B),…
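The gated update described in the abstract can be illustrated with a small sketch. The abstract is truncated here and does not give the paper's exact equations, so the following is only a generic GRU-style interpolation under stated assumptions: the function name `gated_update`, the scalar gate parameters `w_mem`, `w_cand`, and `bias`, and the toy slot values are all hypothetical, and a real model would learn full weight matrices rather than scalars. Each memory entry is blended with a candidate value through a sigmoid gate z, as in m_new = (1 - z) * m + z * c.

```python
import math


def sigmoid(x):
    """Logistic function, mapping any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def gated_update(memory, candidate, w_mem, w_cand, bias):
    """GRU-style gated interpolation over latent memory slots.

    memory, candidate: lists of slots, each slot a list of floats.
    w_mem, w_cand, bias: hypothetical scalar gate parameters (an
    assumption made for brevity, not the paper's parameterization).
    """
    updated = []
    for slot, cand in zip(memory, candidate):
        new_slot = []
        for m, c in zip(slot, cand):
            # Gate near 0 preserves the old value; near 1 overwrites it.
            z = sigmoid(w_mem * m + w_cand * c + bias)
            new_slot.append((1.0 - z) * m + z * c)
        updated.append(new_slot)
    return updated


# Toy example: two memory slots of dimension 2.
memory = [[1.0, -1.0], [0.5, 0.5]]
candidate = [[0.0, 0.0], [2.0, -2.0]]

# A strongly negative bias closes the gate: memory is preserved.
preserved = gated_update(memory, candidate, 0.0, 0.0, -20.0)
# A strongly positive bias opens the gate: memory is overwritten.
overwritten = gated_update(memory, candidate, 0.0, 0.0, 20.0)
# A zero bias gives z = 0.5: an even blend of old and new.
blended = gated_update(memory, candidate, 0.0, 0.0, 0.0)
```

The three calls correspond to the "preserve", "overwrite", and "update" behaviours named in the abstract. In this sketch they are selected by hand through the bias, whereas a trained gate would choose between them per slot based on its inputs.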
Taxonomy
Topics: Topic Modeling · Artificial Intelligence in Healthcare and Education · Multimodal Machine Learning Applications
