MemGround: Long-Term Memory Evaluation Kit for Large Language Models in Gamified Scenarios

Yihang Ding; Wanke Xia; Yiting Zhao; Jinbo Su; Jialiang Yang; Zhengbo Zhang; Ke Wang; Wenming Yang

arXiv:2604.14158 · cs.CL · April 17, 2026

TL;DR

MemGround introduces a comprehensive benchmark for evaluating long-term memory in large language models within gamified, interactive scenarios, addressing limitations of static evaluation methods.

Contribution

MemGround proposes a three-tier hierarchical framework and a multi-dimensional metric suite for assessing dynamic memory capabilities in LLMs during complex interactions.

Findings

01. State-of-the-art LLMs struggle with sustained dynamic tracking.

02. Models have difficulty with temporal event association.

03. Complex reasoning from long-term evidence remains challenging.

Abstract

Current evaluations of long-term memory in LLMs are fundamentally static. By fixating on simple retrieval and short-context inference, they neglect the multifaceted nature of complex memory systems, such as dynamic state tracking and hierarchical reasoning in continuous interactions. To overcome these limitations, we propose MemGround, a rigorous long-term memory benchmark natively grounded in rich, gamified interactive scenarios. To systematically assess these capabilities, MemGround introduces a three-tier hierarchical framework that evaluates Surface State Memory, Temporal Associative Memory, and Reasoning-Based Memory through specialized interactive tasks. Furthermore, to comprehensively quantify both memory utilization and behavioral trajectories, we propose a multi-dimensional metric suite comprising Question-Answer Score (QA Overall), Memory Fragments Unlocked (MFU), Memory…
