MemEvoBench: Benchmarking Safety Risks from Memory Misevolution in LLM Agents
Weiwei Xie, Shaoxiong Guo, Fan Zhang, Tian Xia, Xue Yang, Lizhuang Ma, Junchi Yan, Qibing Ren

TL;DR
MemEvoBench is a benchmark for evaluating the safety risks of memory misevolution in LLM agents, covering adversarial memory injection, noisy tool outputs, and biased feedback over long-horizon interactions.
Contribution
This paper introduces MemEvoBench, which the authors describe as the first benchmark for long-horizon memory safety in LLM agents, with QA-style tasks spanning 7 domains and 36 risk types and workflow-style tasks adapted from 20 Agent-SafetyBench environments.
Findings
Memory misevolution significantly degrades safety in LLM agents.
Static prompt defenses are ineffective against memory-related safety risks.
Memory biases can cause substantial behavioral drift in LLM agents.
Abstract
Equipping Large Language Models (LLMs) with persistent memory enhances interaction continuity and personalization but introduces new safety risks. Specifically, contaminated or biased memory accumulation can trigger abnormal agent behaviors. Existing evaluation methods have not yet established a standardized framework for measuring memory misevolution. This phenomenon refers to the gradual behavioral drift resulting from repeated exposure to misleading information. To address this gap, we introduce MemEvoBench, the first benchmark evaluating long-horizon memory safety in LLM agents against adversarial memory injection, noisy tool outputs, and biased feedback. The framework consists of QA-style tasks across 7 domains and 36 risk types, complemented by workflow-style tasks adapted from 20 Agent-SafetyBench environments with noisy tool returns. Both settings employ mixed benign and…
