MedMemoryBench: Benchmarking Agent Memory in Personalized Healthcare
Yihao Wang, Haoran Xu, Renjie Gu, Yixuan Ye, Xinyi Chen, Xinyu Mu, Yuan Gao, Chunxiao Guo, Peng Wei, Jinjie Gu, Huan Li, Ke Chen, Lidan Shou

TL;DR
MedMemoryBench introduces a realistic benchmarking framework for medical agent memory, emphasizing long-term clinical tracking, dynamic evaluation, and exposing bottlenecks in current architectures for healthcare applications.
Contribution
The paper presents a benchmark that pairs a realistic, long-horizon medical dataset with a streaming evaluation protocol, and it highlights critical memory saturation issues in existing models.
Findings
Mainstream memory architectures show severe bottlenecks in medical reasoning.
Memory saturation degrades the robustness of both retrieval and reasoning.
The benchmark is intended to support the development of more robust medical agents.
Abstract
The large-scale deployment of personalized healthcare agents demands memory mechanisms that are exceptionally precise, safe, and capable of long-term clinical tracking. However, existing benchmarks primarily focus on daily open-domain conversations, failing to capture the high-stakes complexity of real-world medical applications. Motivated by the stringent production requirements of an industry-leading health management agent serving tens of millions of active users, we introduce MedMemoryBench. We develop a human-agent collaborative pipeline to synthesize highly realistic, long-horizon medical trajectories based on clinically grounded, synthetic patient archetypes. This process yields a massive, expertly validated dataset comprising approximately 2,000 sessions and 16,000 interaction turns. Crucially, MedMemoryBench departs from traditional static evaluations by pioneering an…
