Dynamic Mixture of Latent Memories for Self-Evolving Agents

Dianzhi Yu; Vireo Zhang; Hongru Wang; Yanyu Chen; Minda Hu; Wanghan Xu; Siki Chen; Philip Torr; Zhenfei Yin; Irwin King

arXiv:2605.21951·cs.LG·May 22, 2026

TL;DR

The paper introduces MoLEM, a dynamic mixture-of-experts framework for self-evolving agents that continually acquire knowledge without forgetting, demonstrated across math, science, and code tasks.

Contribution

MoLEM is a generative mixture-of-latent-memory approach that enables continual learning without catastrophic forgetting: the base model stays frozen, and new knowledge is internalized into additional expert modules.

Findings

01  MoLEM improves average accuracy by 10.40% over the baseline.

02  MoLEM preserves previously acquired knowledge across multiple training stages.

03  Competing methods do not consistently outperform the baseline.

Abstract

Achieving self-evolution in intelligent agents requires the continual accumulation of new knowledge across changing task sequences without forgetting previously acquired abilities. Existing approaches either internalize knowledge by updating model parameters, which induces catastrophic forgetting, or rely on external memory, which fails to genuinely enhance the model's intrinsic capabilities. We propose MoLEM, a generative mixture of latent memory framework based on a dynamic mixture-of-experts (MoE). We treat multiple experts as independent carriers to generate memory. A router selects and weights experts through key-query matching, and the aggregated latent memory is injected into the reasoning process. The base model for reasoning remains entirely frozen, with all experiential knowledge internalized into the additional modules, avoiding catastrophic forgetting. For continual…
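
The routing step described in the abstract (key-query matching, expert weighting, aggregation into one latent memory) can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes dot-product scoring, top-k selection, and softmax weighting, none of which the abstract specifies, and it stands in fixed vectors for the memories that the experts would generate. All function and variable names are hypothetical.

```python
import math
from typing import List, Tuple

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    """Dot product, used here as the (assumed) key-query matching score."""
    return sum(x * y for x, y in zip(a, b))


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax over a non-empty list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def route(query: Vector, keys: List[Vector], top_k: int) -> List[Tuple[int, float]]:
    """Select the top_k experts whose keys best match the query.

    Returns (expert_index, weight) pairs; the weights sum to 1.
    """
    scores = [dot(query, key) for key in keys]
    ranked = sorted(range(len(keys)), key=lambda i: scores[i], reverse=True)
    chosen = ranked[:top_k]
    weights = softmax([scores[i] for i in chosen])
    return list(zip(chosen, weights))


def aggregate_memory(
    query: Vector,
    keys: List[Vector],
    expert_memories: List[Vector],
    top_k: int = 2,
) -> Vector:
    """Weighted sum of the selected experts' memory vectors.

    Assumption: each expert's memory is a fixed vector here; in the paper
    the experts generate their memory.
    """
    memory = [0.0] * len(expert_memories[0])
    for index, weight in route(query, keys, top_k):
        for d, value in enumerate(expert_memories[index]):
            memory[d] += weight * value
    return memory


if __name__ == "__main__":
    # Toy example: three experts with 2-dimensional keys and memories.
    example_query = [1.0, 0.0]
    example_keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]]
    example_memories = [[2.0, 0.0], [0.0, 5.0], [4.0, 0.0]]
    print(aggregate_memory(example_query, example_keys, example_memories))
```

In this sketch the aggregated vector is the only thing passed on to the reasoning model, whose own parameters are never touched. That mirrors the frozen-base-model design the abstract describes, although how the latent memory is actually injected is not shown here.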
