Structured Memory Mechanisms for Stable Context Representation in Large Language Models
Yue Xing, Tao Yang, Yijiashun Qi, Minggu Wei, Yu Cheng, Honghui Xin

TL;DR
This paper introduces a memory-augmented architecture for large language models that enhances long-term context understanding, retention, and retrieval, leading to improved performance in long-text and multi-turn tasks.
Contribution
The paper proposes a memory mechanism built from explicit memory units, gated writing, attention-based reading, and a dynamic forgetting function. It is trained with a joint objective that adds constraints on memory writing and forgetting to the main task loss.
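To make the components named above concrete, here is a minimal NumPy sketch of a slot-based memory with an attention-based read, a gated write, and a per-slot forgetting gate. It is an illustration under stated assumptions, not the authors' implementation: the class name `SlotMemory`, the parameter vectors `w_write` and `w_forget`, the scaled dot-product addressing, and the exact update rule are all hypothetical choices that the listing does not specify.

```python
import numpy as np


def softmax(x):
    # Numerically stable softmax over a 1-D array.
    e = np.exp(x - np.max(x))
    return e / e.sum()


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


class SlotMemory:
    """Hypothetical memory with K explicit slots of dimension d."""

    def __init__(self, num_slots, dim, seed=0):
        rng = np.random.default_rng(seed)
        self.memory = np.zeros((num_slots, dim))
        # Random stand-ins for parameters a real model would learn (assumption).
        self.w_write = rng.normal(scale=0.1, size=dim)
        self.w_forget = rng.normal(scale=0.1, size=dim)

    def read(self, query):
        # Attention-based read: softmax over scaled dot-product scores.
        scores = self.memory @ query / np.sqrt(query.shape[0])
        weights = softmax(scores)
        return weights @ self.memory, weights

    def write(self, hidden):
        # Address slots by similarity to the incoming hidden state.
        _, address = self.read(hidden)
        # Scalar write gate in (0, 1): how much of `hidden` to store.
        write_gate = sigmoid(self.w_write @ hidden)
        # Per-slot forget gate in (0, 1): how much old content to decay.
        forget_gate = sigmoid(self.memory @ self.w_forget)
        # Decay addressed slots, then add the gated new content.
        decay = 1.0 - address * forget_gate
        self.memory = (decay[:, None] * self.memory
                       + write_gate * address[:, None] * hidden[None, :])
        return write_gate, forget_gate


mem = SlotMemory(num_slots=4, dim=8)
state = np.ones(8)
mem.write(state)                 # store one hidden state
context, attn = mem.read(state)  # retrieve a context vector for the same query
```

In this sketch forgetting is applied only to the slots being addressed, so slots that are never attended keep their content. That is one of several plausible designs; a global time-based decay would be another.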
Findings
Improves text generation consistency and stability in multi-turn dialogues.
Enhances accuracy in cross-context reasoning tasks.
Mitigates semantic drift and context loss in long-text processing.
Abstract
This paper addresses the limitations of large language models in understanding long-term context. It proposes a model architecture equipped with a long-term memory mechanism to improve the retention and retrieval of semantic information across paragraphs and dialogue turns. The model integrates explicit memory units, gated writing mechanisms, and attention-based reading modules. A forgetting function is introduced to enable dynamic updates of memory content, enhancing the model's ability to manage historical information. To further improve the effectiveness of memory operations, the study designs a joint training objective. This combines the main task loss with constraints on memory writing and forgetting. It guides the model to learn better memory strategies during task execution. Systematic evaluation across multiple subtasks shows that the model achieves clear advantages in text…
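One hedged way to write the joint training objective described in the abstract is as a weighted sum. The additive form, the weights $\lambda_{w}$ and $\lambda_{f}$, and the names of the individual terms are assumptions made for illustration; the abstract states only that the main task loss is combined with constraints on memory writing and forgetting.

```latex
\mathcal{L}_{\text{total}}
  = \mathcal{L}_{\text{task}}
  + \lambda_{w}\,\mathcal{L}_{\text{write}}
  + \lambda_{f}\,\mathcal{L}_{\text{forget}},
\qquad \lambda_{w},\ \lambda_{f} \ge 0
```

Under this reading, setting both weights to zero recovers plain task training, and larger weights put more pressure on the model to regulate what it stores and discards.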
Taxonomy
Topics: Topic Modeling · Multimodal Machine Learning Applications · Advanced Graph Neural Networks
