TL;DR
This paper introduces a diagnostic framework for analyzing memory management in LLM agents, revealing that retrieval quality impacts performance more than write strategies, with raw chunk storage being surprisingly effective.
Contribution
It provides a systematic analysis of how write and retrieval strategies affect LLM agent memory performance, highlighting the dominance of retrieval methods.
Findings
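Retrieval method is the dominant factor: on LoCoMo, average accuracy spans 20 points across retrieval methods (57.1% to 77.2%) but only 3-8 points across write strategies.
Raw chunked storage, which requires zero LLM calls, performs as well as or better than the more expensive lossy write strategies (fact extraction and summarization).
Failures arise mostly at the retrieval stage rather than in how the model uses the memories it retrieves.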
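One plausible way to quantify "which factor dominates" is to average accuracy over one factor and measure the spread across the other. The sketch below assumes that reading; the summary does not say exactly how the paper computes its spans, and the function name `factor_spread` and the toy numbers are hypothetical, not the paper's results.

```python
from statistics import mean

def factor_spread(grid):
    """Return (write_spread, retrieval_spread) in accuracy points.

    grid maps (write_strategy, retrieval_method) -> accuracy in percent.
    Each spread is max minus min of the per-level mean accuracy.
    """
    writes = sorted({w for w, _ in grid})
    retrievals = sorted({r for _, r in grid})
    # Mean accuracy of each write strategy, averaged over retrieval methods.
    write_means = [mean(grid[(w, r)] for r in retrievals) for w in writes]
    # Mean accuracy of each retrieval method, averaged over write strategies.
    retr_means = [mean(grid[(w, r)] for w in writes) for r in retrievals]
    return max(write_means) - min(write_means), max(retr_means) - min(retr_means)

# Toy 2x2 grid with made-up values, only to show the calculation.
toy = {
    ("raw", "bm25"): 50, ("raw", "cosine"): 70,
    ("summary", "bm25"): 46, ("summary", "cosine"): 66,
}
write_spread, retrieval_spread = factor_spread(toy)
```

In this toy grid the retrieval spread is five times the write spread, which is the kind of gap the findings describe.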
Abstract
Memory-augmented LLM agents store and retrieve information from prior interactions, yet the relative importance of how memories are written versus how they are retrieved remains unclear. We introduce a diagnostic framework that analyzes how performance differences manifest across write strategies, retrieval methods, and memory utilization behavior, and apply it to a 3x3 study crossing three write strategies (raw chunks, Mem0-style fact extraction, MemGPT-style summarization) with three retrieval methods (cosine, BM25, hybrid reranking). On LoCoMo, retrieval method is the dominant factor: average accuracy spans 20 points across retrieval methods (57.1% to 77.2%) but only 3-8 points across write strategies. Raw chunked storage, which requires zero LLM calls, matches or outperforms expensive lossy alternatives, suggesting that current memory pipelines may discard useful context that…
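To make one cell of the 3x3 grid concrete, here is a minimal sketch of raw chunk storage paired with BM25 retrieval. It is not the paper's implementation: the chunk size, whitespace tokenizer, BM25 parameters (`k1`, `b`), and all function names are assumptions chosen for illustration, and the cosine and hybrid reranking retrievers are not shown.

```python
import math
from collections import Counter

def chunk_turns(turns, size=2):
    """Raw-chunk write: group consecutive turns verbatim, with no LLM calls."""
    return [" ".join(turns[i:i + size]) for i in range(0, len(turns), size)]

def tokenize(text):
    # Naive lowercase whitespace tokenizer (an assumption for this sketch).
    return text.lower().split()

def bm25_scores(query, chunks, k1=1.5, b=0.75):
    """Score each chunk against the query with Okapi BM25.

    Assumes chunks is non-empty and contains at least one token overall.
    """
    docs = [tokenize(c) for c in chunks]
    n = len(docs)
    avg_len = sum(len(d) for d in docs) / n
    # Document frequency: number of chunks containing each term.
    df = Counter(t for d in docs for t in set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for t in tokenize(query):
            if t not in tf:
                continue
            idf = math.log((n - df[t] + 0.5) / (df[t] + 0.5) + 1.0)
            norm = tf[t] + k1 * (1 - b + b * len(d) / avg_len)
            score += idf * tf[t] * (k1 + 1) / norm
        scores.append(score)
    return scores

def retrieve(query, chunks, top_k=1):
    """Return the top_k chunks ranked by BM25 score."""
    scores = bm25_scores(query, chunks)
    ranked = sorted(range(len(chunks)), key=lambda i: scores[i], reverse=True)
    return [chunks[i] for i in ranked[:top_k]]
```

Because the write step only concatenates turns, nothing from the original conversation is discarded before retrieval, which is the property the abstract contrasts with lossy fact extraction and summarization.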
