Memory Decoder: A Pretrained, Plug-and-Play Memory for Large Language Models

Jiaqi Cao; Jiarui Wang; Rubin Wei; Qipeng Guo; Kai Chen; Bowen Zhou; Zhouhan Lin

arXiv:2508.09874·cs.CL·October 24, 2025

TL;DR

Memory Decoder is a pretrained, plug-and-play memory module that enables efficient domain adaptation for large language models without retraining the entire model, improving performance in specialized domains.

Contribution

Introduces Memory Decoder, a novel pretrained memory component that allows seamless, efficient domain adaptation for large language models without altering their original parameters.

Findings

1. Reduces perplexity by an average of 6.17 points across domains.
2. Enables effective adaptation of Qwen and Llama models to biomedicine, finance, and law.
3. Does not require model-specific modifications or costly retraining.

Abstract

Large Language Models (LLMs) have shown strong abilities in general language tasks, yet adapting them to specific domains remains a challenge. Current methods such as Domain Adaptive Pretraining (DAPT) require costly full-parameter training and suffer from catastrophic forgetting. Meanwhile, Retrieval-Augmented Generation (RAG) introduces substantial inference latency due to expensive nearest-neighbor searches and longer contexts. This paper introduces Memory Decoder, a plug-and-play pretrained memory that enables efficient domain adaptation without changing the original model's parameters. Memory Decoder employs a small transformer decoder that learns to imitate the behavior of an external non-parametric retriever. Once trained, Memory Decoder can be seamlessly integrated with any pretrained language model that shares the same tokenizer, requiring no model-specific modifications.…
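
The truncated abstract above does not say how the memory module's predictions are combined with the base model's. As a minimal sketch, assume both models emit logits over the same vocabulary (which the shared-tokenizer requirement makes possible) and that the two next-token distributions are mixed by linear interpolation. The function names, the weight `alpha`, and the toy logits below are illustrative assumptions, not taken from the paper's code.

```python
import numpy as np


def softmax(logits):
    """Numerically stable softmax over the last axis."""
    shifted = logits - np.max(logits, axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)


def interpolate_next_token(base_logits, memory_logits, alpha=0.5):
    """Mix two next-token distributions defined over the same vocabulary.

    alpha is a hypothetical mixing weight: 0.0 keeps only the base model,
    1.0 keeps only the memory module.
    """
    if base_logits.shape != memory_logits.shape:
        raise ValueError("both models must share one vocabulary (same tokenizer)")
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    p_base = softmax(base_logits)
    p_memory = softmax(memory_logits)
    return alpha * p_memory + (1.0 - alpha) * p_base


# Toy example with a four-token vocabulary (made-up logits).
base_logits = np.array([2.0, 1.0, 0.1, -1.0])    # base model prefers token 0
memory_logits = np.array([0.0, 3.0, 0.5, -2.0])  # memory module prefers token 1
mixed = interpolate_next_token(base_logits, memory_logits, alpha=0.6)
```

Because the base model's weights are never touched in this scheme, setting `alpha` to 0.0 recovers the original model's distribution exactly, which is one way to read the "without changing the original model's parameters" claim.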

Code & Models

Models

Datasets

Clover-Hill/MemoryDecoder-domain-data
dataset· 92 dl

Videos

Memory Decoder: A Pretrained, Plug-and-Play Memory for Large Language Models· slideslive