Learning to Remember: End-to-End Training of Memory Agents for Long-Context Reasoning
Kehao Zhang, Shangtong Gui, Sheng Yang, Wei Chen, Yang Feng

TL;DR
This paper introduces UMA, an end-to-end reinforcement learning framework that unifies memory management and question answering, significantly improving long-horizon reasoning and dynamic learning in long-context tasks.
Contribution
UMA is a novel memory agent that combines a core summary and a structured Memory Bank for proactive memory consolidation, trained end-to-end for long-context reasoning.
Findings
Outperforms baselines on long-horizon reasoning tasks
Effective in dynamic state tracking and evidence aggregation
Maintains competitive performance on standard retrieval benchmarks
Abstract
Long-context LLMs and Retrieval-Augmented Generation (RAG) systems process information passively, deferring state tracking, contradiction resolution, and evidence aggregation to query time, which becomes brittle under ultra-long streams with frequent updates. We propose the Unified Memory Agent (UMA), an end-to-end reinforcement learning framework that unifies memory operations and question answering within a single policy. UMA maintains a dual memory representation: a compact core summary for global context and a structured Memory Bank that supports explicit CRUD (create, update, delete, reorganize) over key-value entries, enabling proactive consolidation during streaming. To evaluate long-horizon memory behavior, we introduce Ledger-QA, a diagnostic benchmark for continuous state tracking where answers are latent values derived from accumulated updates rather than local span…
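To make the dual memory representation concrete, the following is a minimal sketch, not the authors' implementation. The class name `DualMemory`, the helper `ingest`, the `balance:<account>` key scheme, and the toy transaction stream are all hypothetical and chosen only for illustration. In UMA the memory operations are selected by a learned policy; here a hand-written rule stands in for that policy, so the sketch shows only the data structure and the idea of consolidating state while streaming.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class DualMemory:
    """Hypothetical dual memory: a core summary plus a key-value Memory Bank."""

    core_summary: str = ""
    bank: dict[str, Any] = field(default_factory=dict)

    def create(self, key: str, value: Any) -> None:
        # Add a new entry; refuse to overwrite an existing one silently.
        if key in self.bank:
            raise KeyError(f"entry already exists: {key}")
        self.bank[key] = value

    def update(self, key: str, value: Any) -> None:
        # Overwrite an existing entry, e.g. when a newer fact supersedes it.
        if key not in self.bank:
            raise KeyError(f"no such entry: {key}")
        self.bank[key] = value

    def delete(self, key: str) -> None:
        # Drop an entry that is no longer valid.
        del self.bank[key]

    def reorganize(self, old_keys: list[str], new_key: str, value: Any) -> None:
        # Replace several entries with one consolidated entry.
        for key in old_keys:
            del self.bank[key]
        self.bank[new_key] = value


def ingest(memory: DualMemory, account: str, delta: int) -> None:
    """Hand-written stand-in for the learned policy: fold one ledger
    update into the running balance as soon as it arrives."""
    key = f"balance:{account}"
    if key in memory.bank:
        memory.update(key, memory.bank[key] + delta)
    else:
        memory.create(key, delta)


# Toy stream of (account, amount) updates, assumed for illustration only.
stream = [("alice", 50), ("bob", 10), ("alice", -20)]

memory = DualMemory(core_summary="Ledger of account transactions.")
for account, delta in stream:
    ingest(memory, account, delta)

print(memory.bank["balance:alice"])  # 30
```

The final balance of 30 appears in no single update, which mirrors the kind of latent, accumulated answer the abstract attributes to Ledger-QA: a system that only retrieves spans at query time would have to find and sum every relevant update, whereas the consolidated entry can be read directly.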
Taxonomy
Topics: Topic Modeling · Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning
