NGM: A Plug-and-Play Training-Free Memory Module for LLMs

Yuwen Qu; Wenhui Dong; Chenyang Si; Caifeng Shan

arXiv:2605.16893·cs.AI·May 19, 2026

TL;DR

NGM is a training-free, plug-and-play memory module for LLMs that enhances knowledge retrieval and task performance without additional training or complex retrieval pipelines.

Contribution

The paper introduces NGM, a novel memory module that directly constructs N-gram representations from pretrained embeddings, eliminating the need for training or extra memory components.

Findings

01. NGM improves average performance by 0.5 to 1.2 points across benchmarks.

02. Significant gains on code generation and knowledge-intensive tasks.

03. Enhances multimodal benchmark performance.

Abstract

Recent studies introduce conditional memory modules that decouple knowledge storage from neural computation, enabling more direct knowledge access. Compared to MoE, which relies on dynamic computation paths, explicit lookup provides a more efficient knowledge retrieval mechanism. However, these approaches still depend on learned memory embeddings, requiring additional training and limiting flexibility. To address this, we propose N-gram Memory (NGM), a training-free, plug-and-play module composed of a Causal N-Gram Encoder and a Cosine-Gated Memory Injector. The Causal N-Gram Encoder directly averages the pretrained token embeddings of the backbone model to construct N-gram representations, thereby eliminating the need to train separate N-gram embeddings from scratch. This design requires neither an additional memory table nor a retrieval pipeline. The Cosine-Gated Memory Injector then…
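
The averaging step described for the Causal N-Gram Encoder can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `causal_ngram_average`, the toy embeddings, and the choice to shorten the window at the start of the sequence (rather than pad it) are all assumptions made for illustration. The only behavior taken from the abstract is that N-gram representations are built by averaging pretrained token embeddings; "causal" is interpreted here as using only the current token and the tokens before it.

```python
from typing import List, Sequence

Vector = List[float]


def causal_ngram_average(
    token_embeddings: Sequence[Sequence[float]], n: int
) -> List[Vector]:
    """Hypothetical helper: for each position t, average the embeddings of
    the up-to-n tokens ending at t (positions max(0, t - n + 1) .. t).

    No future token is ever read, so the result is causal. Early positions
    use a shorter window; that boundary handling is an assumption.
    """
    if n < 1:
        raise ValueError("n must be >= 1")

    ngram_reprs: List[Vector] = []
    for t in range(len(token_embeddings)):
        start = max(0, t - n + 1)
        window = token_embeddings[start : t + 1]
        dim = len(window[0])
        # Element-wise mean over the window.
        mean = [sum(vec[d] for vec in window) / len(window) for d in range(dim)]
        ngram_reprs.append(mean)
    return ngram_reprs


# Toy stand-in for rows of a backbone's pretrained embedding table
# (assumed values, 3 tokens with 2-dimensional embeddings).
toy_embeddings = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
bigrams = causal_ngram_average(toy_embeddings, n=2)
print(bigrams)  # [[1.0, 0.0], [0.5, 0.5], [0.5, 1.0]]
```

Because the representations are computed from embeddings the backbone already has, nothing in this sketch is trained or stored in a separate table, which matches the training-free claim in the abstract. The sketch covers only the averaging step and does not model how the resulting vectors are injected into the model.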

Code & Models

Repositories

pioneerqyw/NGM (GitHub)
