TL;DR
MemPO introduces a self-managing memory policy for long-horizon agents, enabling autonomous memory summarization and management to improve performance and reduce token usage.
Contribution
It presents MemPO, a novel algorithm that lets agents autonomously optimize their memory management in line with task objectives.
Findings
MemPO achieves a 25.98% F1 score improvement over the base model.
Reduces token consumption by approximately 70%.
Outperforms previous state-of-the-art methods in experiments.
Abstract
Long-horizon agents face a context that keeps growing as they interact with the environment, which degrades performance and stability. Existing methods typically introduce an external memory module and retrieve relevant information from the stored memory. This prevents the model itself from proactively managing its memory content and aligning it with the agent's overarching task objectives. To address these limitations, we propose the self-memory policy optimization algorithm (MemPO), which enables the agent (policy model) to autonomously summarize and manage its memory during interaction with the environment. By improving the credit assignment mechanism based on memory effectiveness, the policy model can selectively retain crucial information, significantly reducing token consumption while preserving task performance. Extensive experiments and analyses confirm that MemPO…
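The abstract contrasts an ever-growing interaction history with an agent that summarizes and manages its own memory. The Python sketch below illustrates only that context layout, under loudly stated assumptions. The context is split into the task, an agent-written memory summary, and a few recent raw turns. Once the raw turns exceed a budget, they are folded into the summary and discarded. The names `SelfManagedContext`, `keep_facts`, `max_recent`, the "FACT" tagging rule, and the example task are all hypothetical. The summarizer is a fixed rule standing in for the policy model, and the sketch does not model MemPO's training or its memory-effectiveness credit assignment.

```python
from dataclasses import dataclass, field
from typing import Callable, List

# A summarizer maps (old memory, raw turns) -> new memory (hypothetical signature).
Summarizer = Callable[[str, List[str]], str]


@dataclass
class SelfManagedContext:
    """Hypothetical context layout: task + agent-written memory + recent raw turns."""
    task: str
    memory: str = ""
    recent: List[str] = field(default_factory=list)
    max_recent: int = 2  # hypothetical budget of raw turns kept verbatim

    def observe(self, turn: str, summarize: Summarizer) -> None:
        self.recent.append(turn)
        if len(self.recent) > self.max_recent:
            # Stand-in for the policy: fold old memory and raw turns into a
            # new summary, then drop the raw turns from the context.
            self.memory = summarize(self.memory, self.recent)
            self.recent = []

    def render(self) -> str:
        parts = [f"Task: {self.task}"]
        if self.memory:
            parts.append(f"Memory: {self.memory}")
        parts.extend(self.recent)
        return "\n".join(parts)


def keep_facts(memory: str, turns: List[str]) -> str:
    """Toy rule-based summarizer: keep old entries plus sentences starting with 'FACT'."""
    kept = [s for s in memory.split(" | ") if s]
    for turn in turns:
        kept += [s.strip() for s in turn.split(".") if s.strip().startswith("FACT")]
    return " | ".join(kept)


def n_tokens(text: str) -> int:
    """Whitespace split as a crude proxy for a real tokenizer."""
    return len(text.split())


if __name__ == "__main__":
    task = "Find the capital of the country where X was born."  # hypothetical task
    turns = [f"Step {i}: searched page {i}, lots of irrelevant text here. FACT {i} found"
             for i in range(10)]

    # Baseline: append every turn to the context.
    full_history = "\n".join([f"Task: {task}"] + turns)

    # Self-managed: periodically rewrite memory and discard raw turns.
    ctx = SelfManagedContext(task)
    for t in turns:
        ctx.observe(t, keep_facts)

    print("full history tokens:", n_tokens(full_history))
    print("managed context tokens:", n_tokens(ctx.render()))
```

In this toy, the summary still grows with each retained fact, but far more slowly than the raw history. What to keep is hard-coded here. Per the abstract, MemPO instead trains the policy to make that choice through credit assignment based on memory effectiveness.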
