MemSim: A Bayesian Simulator for Evaluating Memory of LLM-based Personal Assistants

Zeyu Zhang; Quanyu Dai; Luyu Chen; Zeren Jiang; Rui Li; Jieming Zhu; Xu Chen; Yi Xie; Zhenhua Dong; Ji-Rong Wen

arXiv:2409.20163 · cs.AI · October 1, 2024

TL;DR

MemSim is a Bayesian simulation framework that automatically generates reliable evaluation datasets for objectively assessing the memory capabilities of LLM-based personal assistants. It targets the difficulty of constructing reliable questions and answers from user messages.

Contribution

We introduce MemSim, a Bayesian simulator with a causal generation mechanism for automatic, scalable, and reliable memory evaluation of LLM-based agents.

Findings

01 MemSim effectively generates diverse evaluation datasets.

02 The MemDaily dataset enables benchmarking of memory mechanisms.

03 Our experiments demonstrate the simulator's reliability in assessing memory performance.

Abstract

LLM-based agents have been widely applied as personal assistants, capable of memorizing information from user messages and responding to personal queries. However, an objective and automatic evaluation of their memory capability is still lacking, largely because it is difficult to construct reliable questions and answers (QAs) from user messages. In this paper, we propose MemSim, a Bayesian simulator designed to automatically construct reliable QAs from generated user messages while preserving their diversity and scalability. Specifically, we introduce the Bayesian Relation Network (BRNet) and a causal generation mechanism to mitigate the impact of LLM hallucinations on factual information, facilitating the automatic creation of an evaluation dataset. Based on MemSim, we generate a dataset in the daily-life scenario, named MemDaily, and conduct extensive experiments to…

Code & Models

Repositories

nuster1128/memsim (Official)

Taxonomy

Topics: AI in Service Interactions · Business Process Modeling and Analysis · Multi-Agent Systems and Negotiation