Memory Decoder: A Pretrained, Plug-and-Play Memory for Large Language Models
Jiaqi Cao, Jiarui Wang, Rubin Wei, Qipeng Guo, Kai Chen, Bowen Zhou, Zhouhan Lin

TL;DR
Memory Decoder is a pretrained, plug-and-play memory module that enables efficient domain adaptation for large language models without retraining the entire model, improving performance in specialized domains.
Contribution
Introduces Memory Decoder, a novel pretrained memory component that allows seamless, efficient domain adaptation for large language models without altering their original parameters.
Findings
Reduces perplexity by an average of 6.17 points across domains.
Enables effective adaptation of Qwen and Llama models to biomedicine, finance, and law.
Does not require model-specific modifications or costly retraining.
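For readers unfamiliar with the metric behind the first finding: perplexity is the exponential of the average negative log-likelihood a model assigns to each token, so lower is better. The sketch below is illustrative only. The function name and the input values are made up for the example and are not taken from the paper's evaluation.

```python
import math

def perplexity(token_log_probs):
    """Perplexity = exp(mean negative log-likelihood per token).

    token_log_probs: natural-log probabilities the model assigned to each
    reference token (hypothetical inputs, not data from the paper).
    """
    if not token_log_probs:
        raise ValueError("need at least one token log-probability")
    avg_nll = -sum(token_log_probs) / len(token_log_probs)
    return math.exp(avg_nll)

# A model that assigns probability 0.25 to every token is as uncertain as a
# uniform choice among 4 options, so its perplexity is approximately 4.
uniform_log_probs = [math.log(0.25)] * 8
ppl_uniform = perplexity(uniform_log_probs)
```

Read this way, a perplexity reduction means the adapted model is, on average, less uncertain about each next token in the domain text.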
Abstract
Large Language Models (LLMs) have shown strong abilities in general language tasks, yet adapting them to specific domains remains a challenge. Current methods like Domain Adaptive Pretraining (DAPT) require costly full-parameter training and suffer from catastrophic forgetting. Meanwhile, Retrieval-Augmented Generation (RAG) introduces substantial inference latency due to expensive nearest-neighbor searches and longer contexts. This paper introduces Memory Decoder, a plug-and-play pretrained memory that enables efficient domain adaptation without changing the original model's parameters. Memory Decoder employs a small transformer decoder that learns to imitate the behavior of an external non-parametric retriever. Once trained, Memory Decoder can be seamlessly integrated with any pretrained language model that shares the same tokenizer, requiring no model-specific modifications.…
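The abstract shown here is truncated and does not say how the memory's output is combined with the base model's. The sketch below assumes one plausible scheme, linear interpolation of the two next-token distributions, as in kNN-LM-style methods. The function names, the weight `alpha`, and the toy logits are hypothetical and do not come from the authors' code. It also illustrates why a shared tokenizer matters: the two distributions must range over the same vocabulary to be mixed index by index.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def interpolate_next_token(base_logits, memory_logits, alpha=0.5):
    """Mix two next-token distributions over the same vocabulary.

    alpha is a hypothetical weight on the memory module's distribution;
    (1 - alpha) goes to the frozen base model. Both logit vectors must be
    indexed by the same tokenizer vocabulary.
    """
    if len(base_logits) != len(memory_logits):
        raise ValueError("vocabulary sizes differ: a shared tokenizer is required")
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    p_base = softmax(base_logits)
    p_mem = softmax(memory_logits)
    return [alpha * pm + (1.0 - alpha) * pb for pb, pm in zip(p_base, p_mem)]

# Toy 4-token vocabulary: the base model prefers token 0, while the
# (hypothetical) domain memory strongly prefers token 1.
base_logits = [2.0, 1.0, 0.1, -1.0]
memory_logits = [0.5, 3.0, 0.0, -2.0]
mixed = interpolate_next_token(base_logits, memory_logits, alpha=0.6)
```

Because the base model's parameters are never touched in this scheme, setting `alpha` to 0 recovers the original model's distribution exactly.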
Code & Models
- 🤗 Clover-Hill/MemoryDecoder-gpt2-small (model) · 260 dl · ♡ 3
- 🤗 Clover-Hill/gpt2-xl-finetuned-wikitext103 (model) · 57 dl
- 🤗 Clover-Hill/MemoryDecoder-Qwen-biomed (model) · 58 dl
- 🤗 Clover-Hill/MemoryDecoder-Qwen-finance (model) · 15 dl
- 🤗 Clover-Hill/MemoryDecoder-Qwen-law (model)
- 🤗 Clover-Hill/MemoryDecoder-Llama-biomed (model) · 1 dl
- 🤗 Clover-Hill/MemoryDecoder-Llama-law (model)
- 🤗 Clover-Hill/MemoryDecoder-Llama-finance (model)