TL;DR
NGM (N-gram Memory) is a training-free, plug-and-play memory module for LLMs that improves knowledge retrieval and task performance without a separate memory table or retrieval pipeline.
Contribution
The paper introduces NGM, a memory module that builds N-gram representations by averaging the backbone model's pretrained token embeddings, so it needs neither additional training nor a separate memory table.
Findings
NGM improves average performance by 0.5 to 1.2 points across benchmarks.
It yields significant gains on code generation and knowledge-intensive tasks.
It also improves performance on multimodal benchmarks.
Abstract
Recent studies introduce conditional memory modules that decouple knowledge storage from neural computation, enabling more direct knowledge access. Compared to MoE, which relies on dynamic computation paths, explicit lookup provides a more efficient knowledge retrieval mechanism. However, these approaches still depend on learned memory embeddings, requiring additional training and limiting flexibility. To address this, we propose N-gram Memory (NGM), a training-free, plug-and-play module composed of a Causal N-Gram Encoder and a Cosine-Gated Memory Injector. The Causal N-Gram Encoder directly averages the pretrained token embeddings of the backbone model to construct N-gram representations, thereby eliminating the need to train separate N-gram embeddings from scratch. This design requires neither an additional memory table nor a retrieval pipeline. The Cosine-Gated Memory Injector then…
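The averaging step attributed to the Causal N-Gram Encoder can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `causal_ngram_average`, the toy embeddings, and the choice to shorten the window at the start of the sequence (rather than pad) are all assumptions made for illustration. The Cosine-Gated Memory Injector is not sketched, since the excerpt above cuts off before describing it.

```python
from typing import List

Vector = List[float]


def causal_ngram_average(embeddings: List[Vector], n: int) -> List[Vector]:
    """Average each token's embedding with those of up to n-1 preceding tokens.

    Hypothetical helper: position t only sees tokens t-n+1..t (causal), and
    windows near the start are simply shorter. That boundary rule is an
    assumption, not something stated in the abstract.
    """
    if n < 1:
        raise ValueError("n must be >= 1")
    out: List[Vector] = []
    for t in range(len(embeddings)):
        start = max(0, t - n + 1)          # never look before the sequence start
        window = embeddings[start:t + 1]   # current token plus its predecessors
        dim = len(window[0])
        out.append([sum(v[d] for v in window) / len(window) for d in range(dim)])
    return out


# Toy 2-dimensional "pretrained" embeddings for a 3-token sequence (made up).
token_embeddings = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
bigrams = causal_ngram_average(token_embeddings, n=2)
```

Because the representations are derived from embeddings the backbone already has, nothing in this step is learned, which is the sense in which the abstract calls the module training-free.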
