Augmenting Language Models with Long-Term Memory
Weizhi Wang, Li Dong, Hao Cheng, Xiaodong Liu, Xifeng Yan, Jianfeng Gao, Furu Wei

TL;DR
This paper introduces LongMem, a framework that enables large language models to memorize and utilize long-term context by decoupling memory retrieval from the core model, significantly enhancing long-context modeling and in-context learning.
Contribution
The paper proposes a novel decoupled architecture with a frozen backbone LLM and an adaptive side-network for long-term memory, enabling unlimited-length context handling and improved long-form memory in language models.
Findings
Outperforms existing long-context models on the ChapterBreak benchmark.
Enlarges memory to 65k tokens for in-context learning.
Achieves significant improvements in memory-augmented in-context learning.
Abstract
Existing large language models (LLMs) can only afford fixed-size inputs due to the input length limit, preventing them from utilizing rich long-context information from past inputs. To address this, we propose a framework, Language Models Augmented with Long-Term Memory (LongMem), which enables LLMs to memorize long history. We design a novel decoupled network architecture with the original backbone LLM frozen as a memory encoder and an adaptive residual side-network as a memory retriever and reader. Such a decoupled memory design can easily cache and update long-term past contexts for memory retrieval without suffering from memory staleness. Enhanced with memory-augmented adaptation training, LongMem can thus memorize long past context and use long-term memory for language modeling. The proposed memory retrieval module can handle unlimited-length context in its memory bank to benefit…
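The decoupled design described in the abstract (a frozen encoder writes past context into a memory bank, and a separate component retrieves from it) can be illustrated with a toy sketch. Everything below is an assumption made for illustration and is not taken from the paper: the names `MemoryBank` and `frozen_encoder`, the FIFO eviction policy, the dot-product scoring, and the toy embedding function are all hypothetical stand-ins, not LongMem's actual implementation.

```python
from collections import deque
import heapq


class MemoryBank:
    """Bounded cache of (key, value) vector pairs.

    Hypothetical sketch: FIFO eviction and dot-product scoring are
    assumptions for illustration, not details confirmed by the source.
    """

    def __init__(self, capacity):
        # A deque with maxlen drops the oldest entry automatically when full.
        self.entries = deque(maxlen=capacity)

    def write(self, key, value):
        # Cache a copy of the encoded pair for later retrieval.
        self.entries.append((list(key), list(value)))

    def retrieve(self, query, k):
        """Return the k cached pairs whose keys score highest against query."""
        def score(entry):
            return sum(q * x for q, x in zip(query, entry[0]))
        return heapq.nlargest(k, self.entries, key=score)


def frozen_encoder(token_id, dim=4):
    """Toy stand-in for a frozen backbone: a deterministic (key, value) pair."""
    key = [float((token_id * (i + 1)) % 7) for i in range(dim)]
    value = [float(token_id)] * dim
    return key, value


# Usage: encode four "past tokens" into a bank that only holds three.
bank = MemoryBank(capacity=3)
for token_id in [1, 2, 3, 4]:
    bank.write(*frozen_encoder(token_id))

# The retriever side queries the cache without touching the encoder.
top = bank.retrieve(query=[1.0, 0.0, 0.0, 0.0], k=1)
```

Because the encoder here never changes, entries written earlier remain comparable with entries written later, which is one way to read the abstract's point about avoiding memory staleness.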
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Multimodal Machine Learning Applications
