Learning to Remember: End-to-End Training of Memory Agents for Long-Context Reasoning

Kehao Zhang; Shangtong Gui; Sheng Yang; Wei Chen; Yang Feng

arXiv:2602.18493 · cs.LG · February 24, 2026

TL;DR

This paper introduces UMA, an end-to-end reinforcement learning framework that unifies memory management and question answering, significantly improving long-horizon reasoning and dynamic learning in long-context tasks.

Contribution

UMA is a novel memory agent that combines a core summary and a structured Memory Bank for proactive memory consolidation, trained end-to-end for long-context reasoning.

Findings

01 Outperforms baselines on long-horizon reasoning tasks

02 Effective in dynamic state tracking and evidence aggregation

03 Maintains competitive performance on standard retrieval benchmarks

Abstract

Long-context LLMs and Retrieval-Augmented Generation (RAG) systems process information passively, deferring state tracking, contradiction resolution, and evidence aggregation to query time, which becomes brittle under ultra-long streams with frequent updates. We propose the Unified Memory Agent (UMA), an end-to-end reinforcement learning framework that unifies memory operations and question answering within a single policy. UMA maintains a dual memory representation: a compact core summary for global context and a structured Memory Bank that supports explicit CRUD (create, update, delete, reorganize) over key-value entries, enabling proactive consolidation during streaming. To evaluate long-horizon memory behavior, we introduce Ledger-QA, a diagnostic benchmark for continuous state tracking where answers are latent values derived from accumulated updates rather than local span…
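
The dual memory described in the abstract can be pictured with a small sketch. This is not the authors' implementation: the class `MemoryState`, the function `consolidate`, and the `balance:` key scheme are hypothetical names chosen for illustration, and a hand-written rule stands in for the learned policy that UMA trains with reinforcement learning. It only shows what create, update, delete, and reorganize over key-value entries might look like, and how a ledger-style answer can be a value accumulated from updates rather than a span in the text.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class MemoryState:
    """Hypothetical dual memory: a short core summary plus a key-value bank."""
    core_summary: str = ""
    bank: Dict[str, int] = field(default_factory=dict)

    def create(self, key: str, value: int) -> None:
        if key in self.bank:
            raise KeyError(f"entry already exists: {key}")
        self.bank[key] = value

    def update(self, key: str, value: int) -> None:
        if key not in self.bank:
            raise KeyError(f"no such entry: {key}")
        self.bank[key] = value

    def delete(self, key: str) -> None:
        del self.bank[key]

    def reorganize(self, old_keys: List[str], new_key: str, merged: int) -> None:
        """Replace several entries with one consolidated entry."""
        for key in old_keys:
            del self.bank[key]
        self.bank[new_key] = merged


def consolidate(memory: MemoryState, events: List[Tuple[str, int]]) -> None:
    """Fold a chunk of (account, delta) events into memory at write time.

    Assumption: a fixed rule replaces the learned policy for illustration.
    """
    for account, delta in events:
        key = f"balance:{account}"
        if key in memory.bank:
            memory.update(key, memory.bank[key] + delta)
        else:
            memory.create(key, delta)
    memory.core_summary = f"tracking {len(memory.bank)} account(s)"


memory = MemoryState()
consolidate(memory, [("alice", 100), ("bob", 50), ("alice", -30)])
alice_balance = memory.bank["balance:alice"]  # accumulated from two updates

# Reorganize: fold the per-account entries into one aggregate entry.
total = sum(memory.bank.values())
memory.reorganize(list(memory.bank), "balance:total", total)
```

Because the balance is consolidated as each event arrives, a later question about it reads one entry instead of re-aggregating the whole stream at query time, which is the contrast with passive processing that the abstract draws.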

Taxonomy

Topics: Topic Modeling · Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning