HaluMem: Evaluating Hallucinations in Memory Systems of Agents
Ding Chen, Simin Niu, Kehang Li, Peng Liu, Xiangping Zheng, Bo Tang, Xinchi Li, Feiyu Xiong, Zhiyu Li

TL;DR
This paper introduces HaluMem, a benchmark for evaluating hallucinations in the memory systems of AI agents at each operational stage. It finds that hallucinations arise mainly during memory extraction and updating and then propagate to question answering, reducing overall reliability.
Contribution
HaluMem is the first operation-level benchmark for memory hallucinations, providing datasets and evaluation tasks to localize hallucination sources in memory systems.
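A minimal sketch of what scoring a single operational stage could look like, here the extraction stage. It is an illustration only, not HaluMem's actual metric or data format: the names `StageScore` and `score_stage`, the exact-string matching, and the example memories are all hypothetical assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StageScore:
    precision: float       # share of produced memories found in the gold set
    recall: float          # share of gold memories that were produced
    fabricated: frozenset  # produced but absent from gold (candidate fabrications)
    omitted: frozenset     # present in gold but not produced (omissions)


def score_stage(produced: set[str], gold: set[str]) -> StageScore:
    """Compare one stage's output with gold memory points by exact match.

    Exact string matching is an assumption made for brevity; a real
    benchmark would need a more tolerant comparison.
    """
    hits = produced & gold
    # Convention assumed here: an empty set scores 1.0 rather than dividing by zero.
    precision = len(hits) / len(produced) if produced else 1.0
    recall = len(hits) / len(gold) if gold else 1.0
    return StageScore(
        precision=precision,
        recall=recall,
        fabricated=frozenset(produced - gold),
        omitted=frozenset(gold - produced),
    )


# Hypothetical gold memory points and extraction output for one dialogue.
gold = {"lives in Berlin", "allergic to peanuts"}
extracted = {"lives in Berlin", "owns a dog"}
score = score_stage(extracted, gold)
```

Scoring each stage separately, rather than only the final answer, is what lets an error be attributed to the stage that introduced it: here the extraction output contains one unsupported memory and misses one gold memory before any question is asked.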
Findings
Hallucinations mainly occur during memory extraction and updating stages.
Existing memory systems tend to generate and accumulate hallucinations.
Hallucinations propagate errors to the question answering stage.
Abstract
Memory systems are key components that enable AI systems such as LLMs and AI agents to achieve long-term learning and sustained interaction. However, during memory storage and retrieval, these systems frequently exhibit memory hallucinations, including fabrication, errors, conflicts, and omissions. Existing evaluations of memory hallucinations are primarily end-to-end question answering, which makes it difficult to localize the operational stage within the memory system where hallucinations arise. To address this, we introduce the Hallucination in Memory Benchmark (HaluMem), the first operation-level hallucination evaluation benchmark tailored to memory systems. HaluMem defines three evaluation tasks (memory extraction, memory updating, and memory question answering) to comprehensively reveal hallucination behaviors across different operational stages of interaction. To support…
Taxonomy
Topics: Ferroelectric and Negative Capacitance Devices · Personal Information Management and User Behavior · Cognitive Functions and Memory
