Evaluating Long-Horizon Memory for Multi-Party Collaborative Dialogues
Chuanrui Hu, Tong Li, Xingze Gao, Hongda Chen, Yi Bai, Dannong Xu, Tianwei Lin, Xiaohong Li, Yunyun Han, Jian Pei, and Yafeng Deng

TL;DR
This paper introduces EverMemBench, a novel benchmark for evaluating long-term memory in multi-party collaborative dialogues, highlighting current limitations and guiding future development of more capable LLM memory systems.
Contribution
The paper presents EverMemBench, the first benchmark designed specifically for long-horizon, multi-party collaborative memory, with 2400 QA pairs spanning three evaluation dimensions.
Findings
Current systems struggle with multi-hop reasoning in multi-party contexts (26% accuracy).
Temporal reasoning requires explicit version semantics beyond timestamps.
Memory awareness is limited because retrieval methods miss information that is only implicitly relevant.
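To make the finding about version semantics concrete, here is a minimal toy sketch. It is not the paper's method or data: the `MemoryEntry` fields, the two lookup functions, and the example messages are all hypothetical. It shows one way a timestamp-only lookup can return a stale decision when an old message is reposted late in another channel, while a lookup keyed on an explicit version number does not.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class MemoryEntry:
    # All fields are illustrative assumptions, not a schema from the paper.
    decision_id: str   # which decision this message is about
    version: int       # explicit revision number of that decision
    content: str
    timestamp: int     # when the message appeared in its channel
    channel: str


def latest_by_timestamp(entries: List[MemoryEntry], decision_id: str) -> MemoryEntry:
    """Timestamp-only baseline: the most recently seen message wins."""
    candidates = [e for e in entries if e.decision_id == decision_id]
    return max(candidates, key=lambda e: e.timestamp)


def current_by_version(entries: List[MemoryEntry], decision_id: str) -> MemoryEntry:
    """Version-aware lookup: the highest revision wins; timestamp only breaks ties."""
    candidates = [e for e in entries if e.decision_id == decision_id]
    return max(candidates, key=lambda e: (e.version, e.timestamp))


# Hypothetical conversation: the launch date is revised at t=2, but the
# original plan is reposted in another channel at t=3.
ENTRIES = [
    MemoryEntry("launch-date", 1, "Launch on March 1", 1, "planning"),
    MemoryEntry("launch-date", 2, "Launch moved to April 1", 2, "planning"),
    MemoryEntry("launch-date", 1, "Launch on March 1", 3, "marketing"),
]

if __name__ == "__main__":
    print(latest_by_timestamp(ENTRIES, "launch-date").content)
    print(current_by_version(ENTRIES, "launch-date").content)
```

In this toy case the timestamp baseline returns the reposted March plan, whereas the version-aware lookup returns the April revision. Real dialogues rarely carry explicit version numbers, so a memory system would have to infer them, which this sketch does not attempt.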
Abstract
Long-term conversational memory in practical LLM applications is inherently collaborative: information is produced by multiple participants, scattered across groups and channels, revised over time, and implicitly grounded in roles and social context. Yet there is currently no established benchmark that evaluates memory under interaction patterns resembling real-world deployment, as existing benchmarks largely focus on dyadic or single-topic dialogues. In this paper, we introduce EverMemBench, the first benchmark designed for long-horizon collaborative memory, built from multi-party, multi-group conversations spanning over one million tokens with dense cross-topic interleaving, temporally evolving decisions, and role-conditioned personas. EverMemBench evaluates memory systems using 2400 QA pairs across three dimensions essential for real applications: fine-grained recall, memory…
Taxonomy
Topics: Topic Modeling · Machine Learning in Healthcare · Persona Design and Applications
