GroupMemBench: Benchmarking LLM Agent Memory in Multi-Party Conversations

Jingbo Yang; Kwei-Herng Lai; Xiaowen Wang; Shiyu Chang; Yaar Harari; Evgeniy Gabrilovich

arXiv:2605.14498 · cs.CL · May 19, 2026


TL;DR

This paper introduces GroupMemBench, a benchmark for evaluating the memory systems of LLM agents in multi-party conversations, and shows that current memory systems leave significant gaps.

Contribution

It presents a novel benchmark that captures group dynamics, speaker-grounded belief tracking, and audience-adapted language: three properties of group memory that existing single-user benchmarks leave unmeasured.

Findings

1. Leading memory systems achieve only 46.0% accuracy.

2. Knowledge-update accuracy is 27.1%.

3. A simple BM25 baseline outperforms most memory systems.
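The third finding compares memory systems against a BM25 baseline. For readers unfamiliar with BM25, here is a minimal, stdlib-only sketch of the standard Okapi BM25 ranking function applied to stored conversation snippets. The tokenizer, the class name BM25, and the parameters k1=1.5 and b=0.75 are common defaults or illustrative assumptions; this page does not say how the paper configures its baseline.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Lowercase and keep alphanumeric runs (a simplifying assumption).
    return re.findall(r"[a-z0-9]+", text.lower())


class BM25:
    """Okapi BM25 over a list of memory snippets (e.g., past chat messages)."""

    def __init__(self, docs: list[str], k1: float = 1.5, b: float = 0.75):
        self.docs = docs
        self.k1, self.b = k1, b
        self.tfs = [Counter(tokenize(d)) for d in docs]  # term frequencies per snippet
        self.lens = [sum(tf.values()) for tf in self.tfs]  # snippet lengths in tokens
        self.avgdl = sum(self.lens) / len(docs) if docs else 0.0
        # Document frequency: number of snippets containing each term.
        df = Counter(term for tf in self.tfs for term in tf)
        n = len(docs)
        # IDF with +1 inside the log so every weight stays non-negative.
        self.idf = {t: math.log((n - f + 0.5) / (f + 0.5) + 1.0) for t, f in df.items()}

    def score(self, query: str, i: int) -> float:
        tf, dl = self.tfs[i], self.lens[i]
        total = 0.0
        for term in tokenize(query):
            if term not in tf:
                continue
            f = tf[term]
            # Saturating term frequency with length normalization controlled by b.
            denom = f + self.k1 * (1 - self.b + self.b * dl / self.avgdl)
            total += self.idf[term] * f * (self.k1 + 1) / denom
        return total

    def top_k(self, query: str, k: int = 3) -> list[tuple[int, float]]:
        # Return (snippet index, score) pairs, highest score first.
        scored = [(i, self.score(query, i)) for i in range(len(self.docs))]
        scored.sort(key=lambda pair: pair[1], reverse=True)
        return scored[:k]


memory = [
    "alice: the launch moved to friday",
    "bob: I prefer the blue logo",
    "alice: budget review is on monday",
]
index = BM25(memory)
print(index.top_k("when is the launch", k=1))  # the launch message (index 0) ranks first
```

Because BM25 needs only term statistics and no learned components, it makes a useful floor: a memory system that extracts and summarizes information should at least match lexical retrieval over the raw messages.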

Abstract

Large Language Model (LLM) agents increasingly serve as personal assistants and workplace collaborators, where their utility depends on memory systems that extract, retrieve, and apply information across long-running conversations. However, both existing memory systems and benchmarks are built around the dyadic, single-user setup, even though real deployments routinely span groups and channels with multiple users interacting with the agent and with each other. This mismatch leaves three properties of group memory unmeasured: (i) group dynamics that go beyond concatenated one-on-one chats, (ii) speaker-grounded belief tracking, where per-user memory modeling is needed, and (iii) audience-adapted language, where Theory-of-Mind shifts produce role-specific vocabulary. We introduce GroupMemBench, a benchmark that exposes all three. A graph-grounded synthesis pipeline produces…
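Speaker-grounded belief tracking implies keeping memory per user rather than in one pooled store. As a hypothetical illustration only (the class SpeakerMemory, the topic keys, and the turn-ordering rule are assumptions, not the paper's design), a per-speaker store can record each user's latest statement on a topic, so that one user's update does not overwrite what another user still believes:

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class Belief:
    value: str
    turn: int  # conversation turn at which the speaker stated it


@dataclass
class SpeakerMemory:
    # Hypothetical layout: beliefs[speaker][topic] -> latest Belief from that speaker.
    beliefs: Dict[str, Dict[str, Belief]] = field(default_factory=dict)

    def observe(self, speaker: str, topic: str, value: str, turn: int) -> None:
        per_user = self.beliefs.setdefault(speaker, {})
        prev = per_user.get(topic)
        # Knowledge update: a later statement by the same speaker replaces an earlier one;
        # statements arriving with an older turn number are ignored.
        if prev is None or turn >= prev.turn:
            per_user[topic] = Belief(value, turn)

    def belief_of(self, speaker: str, topic: str) -> Optional[str]:
        b = self.beliefs.get(speaker, {}).get(topic)
        return b.value if b else None

    def disagreements(self, topic: str) -> Dict[str, str]:
        # Each speaker's current value for the topic, to surface cross-user conflicts.
        return {s: t[topic].value for s, t in self.beliefs.items() if topic in t}


mem = SpeakerMemory()
mem.observe("alice", "launch_date", "Thursday", turn=1)
mem.observe("bob", "launch_date", "Thursday", turn=2)
mem.observe("alice", "launch_date", "Friday", turn=5)  # alice updates; bob has not
print(mem.disagreements("launch_date"))  # {'alice': 'Friday', 'bob': 'Thursday'}
```

A pooled store that kept only the most recent fact would answer "Friday" for everyone; keying by speaker lets the agent answer what each participant currently believes.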
