Mem-T: Densifying Rewards for Long-Horizon Memory Agents
Yanwei Yue, Boci Peng, Xuanbo Fan, Jiaxin Guo, Qiankun Li, Yan Zhang

TL;DR
Mem-T is an autonomous memory agent built on a lightweight hierarchical memory database and trained with MoT-GRPO, a novel tree-guided reinforcement learning framework, enabling efficient and effective long-horizon memory operations over streaming inputs.
Contribution
The paper presents Mem-T, a new autonomous memory agent that dynamically updates and retrieves memories, and MoT-GRPO, a reinforcement learning method that turns sparse terminal rewards into dense, step-wise training signals for long-horizon memory tasks.
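The summary does not describe the schema of Mem-T's hierarchical memory database, so the following is only a toy sketch of what dynamic updates and multi-turn retrieval over a two-level store could look like. The topic-to-entries layout, the word-overlap scoring, and the names `HierarchicalMemory` and `multi_turn_retrieve` are all hypothetical. In Mem-T the agent itself decides which operations to issue, whereas here they are called by hand.

```python
from collections import defaultdict


class HierarchicalMemory:
    """Toy two-level store: topic -> list of memory entries (hypothetical layout)."""

    def __init__(self):
        self.topics = defaultdict(list)

    @staticmethod
    def _overlap(query, text):
        """Crude relevance score: number of shared lowercase words."""
        return len(set(query.lower().split()) & set(text.lower().split()))

    def add(self, topic, text):
        self.topics[topic].append(text)

    def update(self, topic, old, new):
        """Replace an existing entry in place, or append it if absent."""
        entries = self.topics[topic]
        if old in entries:
            entries[entries.index(old)] = new
        else:
            entries.append(new)

    def delete(self, topic, text):
        if text in self.topics.get(topic, []):
            self.topics[topic].remove(text)

    def retrieve(self, query, k=2):
        """Coarse-to-fine lookup: choose the best topic, then rank its entries."""
        if not self.topics:
            return []
        best = max(
            self.topics,
            key=lambda t: self._overlap(query, t + " " + " ".join(self.topics[t])),
        )
        ranked = sorted(self.topics[best], key=lambda e: self._overlap(query, e), reverse=True)
        return ranked[:k]


def multi_turn_retrieve(memory, queries, k=2):
    """Issue several retrieval turns and merge the hits without duplicates."""
    hits = []
    for q in queries:
        for hit in memory.retrieve(q, k):
            if hit not in hits:
                hits.append(hit)
    return hits


# Streaming inputs arrive one at a time; an update rewrites a stale fact in place.
mem = HierarchicalMemory()
mem.add("travel", "user visited kyoto in april")
mem.add("travel", "user plans a trip to lisbon")
mem.add("food", "user is vegetarian")
mem.update("food", "user is vegetarian", "user is vegan since 2023")
print(multi_turn_retrieve(mem, ["where did the user travel in april", "is the user vegan"], k=1))
# ['user visited kyoto in april', 'user is vegan since 2023']
```

A practical store would more likely score relevance with embedding similarity; the point here is only the coarse-to-fine shape, where a cheap topic-level choice narrows the set of entries that need ranking.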
Findings
Mem-T outperforms existing frameworks such as A-Mem and Mem0 by up to 14.92%.
Mem-T reduces inference tokens per query by approximately 24.45% compared to GAM.
Mem-T achieves a favorable accuracy-efficiency trade-off.
Abstract
Memory agents, which depart from predefined memory-processing pipelines by endogenously managing the processing, storage, and retrieval of memories, have garnered increasing attention for their autonomy and adaptability. However, existing training paradigms remain constrained: agents often traverse long-horizon sequences of memory operations before receiving sparse and delayed rewards, which hinders truly end-to-end optimization of memory management policies. To address this limitation, we introduce Mem-T, an autonomous memory agent that interfaces with a lightweight hierarchical memory database to perform dynamic updates and multi-turn retrieval over streaming inputs. To effectively train long-horizon memory management capabilities, we further propose MoT-GRPO, a tree-guided reinforcement learning framework that transforms sparse terminal feedback into dense, step-wise supervision via…
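The abstract is cut off before it says exactly how MoT-GRPO densifies rewards, so the sketch below only illustrates the general idea under stated assumptions and is not the paper's algorithm. It assumes rollouts are organized as a tree of memory operations (the `Node` class and the `insert`/`discard`/`retrieve`/`skip` action labels are hypothetical). It also assumes that each intermediate node's value is the mean terminal reward of the leaves below it, and that each node's step-wise advantage is its value normalized against its siblings, in the spirit of GRPO's group-relative advantage.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from statistics import mean, pstdev
from typing import Optional


@dataclass
class Node:
    """One memory operation in a rollout tree (hypothetical structure)."""
    action: str
    children: list[Node] = field(default_factory=list)
    reward: Optional[float] = None  # terminal reward; set only on leaves
    value: float = 0.0              # filled in by backup()
    advantage: float = 0.0          # filled in by sibling_advantages()


def backup(node: Node) -> list[float]:
    """Set each node's value to the mean terminal reward of the leaves beneath it."""
    if not node.children:
        node.value = node.reward
        return [node.reward]
    leaf_rewards: list[float] = []
    for child in node.children:
        leaf_rewards.extend(backup(child))
    node.value = mean(leaf_rewards)
    return leaf_rewards


def sibling_advantages(node: Node, eps: float = 1e-6) -> None:
    """Normalize child values within each sibling group (GRPO-style), recursively."""
    if len(node.children) > 1:
        values = [c.value for c in node.children]
        mu, sigma = mean(values), pstdev(values)
        for c in node.children:
            c.advantage = (c.value - mu) / (sigma + eps)
    for c in node.children:
        sibling_advantages(c, eps)


# Toy tree: two first-step memory operations, each followed by two continuations.
root = Node("root", children=[
    Node("insert", children=[Node("retrieve", reward=1.0), Node("skip", reward=0.0)]),
    Node("discard", children=[Node("retrieve", reward=0.0), Node("skip", reward=0.0)]),
])
backup(root)
sibling_advantages(root)
for step in root.children:
    print(step.action, round(step.value, 2), round(step.advantage, 2))
# insert 0.5 1.0
# discard 0.0 -1.0
```

In the toy tree, the `insert` branch earns a positive advantage at the first step even though reward only arrives at the leaves, which is the kind of intermediate credit a tree of rollouts can expose. Sibling groups whose leaves all receive the same reward get zero advantage, as in standard group normalization.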
