Useful Memories Become Faulty When Continuously Updated by LLMs
Dylan Zhang, Yanshan Lin, Zhengkun Wu, Yihang Sun, Bingxuan Li, Dianqi Li, Hao Peng

TL;DR
This paper shows that memory banks continuously consolidated by LLMs can become faulty over time, and argues that reliable agent memory depends on preserving raw episodic data and explicitly controlling consolidation.
Contribution
The paper shows that current LLM-based memory consolidation often introduces faults, even when the underlying experiences are useful, and proposes preserving raw episodes and managing consolidation explicitly.
Findings
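As consolidation proceeds, memory utility first rises, then degrades, and can fall below the no-memory baseline.
Even when consolidating from ground-truth solutions, GPT-5.4 fails on 54% of a set of ARC-AGI problems it had previously solved without memory.
Preserving raw episodes and controlling consolidation improve accuracy.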
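To make the 54% figure concrete, here is a minimal sketch of how a regression rate of this kind could be computed: the fraction of problems solved without memory that are no longer solved once consolidated memory is in use. The function name `regression_rate` and the problem IDs are hypothetical, and the paper's exact metric definition is not given in this summary, so treat this as an illustration under that assumption.

```python
def regression_rate(solved_without_memory, solved_with_memory):
    """Fraction of previously solved problems that fail once memory is used.

    Assumed definition for illustration; not taken from the paper.
    """
    baseline = set(solved_without_memory)
    if not baseline:
        raise ValueError("no previously solved problems to compare against")
    # Problems solved at baseline but not solved with consolidated memory.
    regressed = baseline - set(solved_with_memory)
    return len(regressed) / len(baseline)


# Toy, made-up problem IDs purely for illustration (not data from the paper).
before = {"p1", "p2", "p3", "p4"}
after = {"p1", "p5"}
rate = regression_rate(before, after)
```

Note that problems newly solved with memory (such as `p5` in the toy data) do not offset the rate, since it is measured only over the previously solved set.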
Abstract
Learning from past experience benefits from two complementary forms of memory: episodic traces -- raw trajectories of what happened -- and consolidated abstractions distilled across many episodes into reusable, schema-like lessons. Recent agentic-memory systems pursue the consolidated form: an LLM rewrites past trajectories into a textual memory bank that it continuously updates with new interactions, promising self-improving agents without parameter updates. Yet we find that such consolidated memories produced by today's LLMs are often faulty even when derived from useful experiences. As consolidation proceeds, memory utility first rises, then degrades, and can fall below the no-memory baseline. More surprisingly, even when consolidating from ground-truth solutions, GPT-5.4 fails on 54% of a set of ARC-AGI problems it had previously solved without memory. We trace the regression to the…
