Hidden in Memory: Sleeper Memory Poisoning in LLM Agents
Sidharth Pulipaka, Stanislau Hlebik, Leonidas Raghav, Sahar Abdelnabi, Vyas Raina, Ivaxi Sheth, Mario Fritz

TL;DR
This paper introduces and evaluates sleeper memory poisoning, a security threat in which adversaries manipulate the persistent memory of LLM assistants to influence future interactions, and reports high attack success rates across models.
Contribution
It presents the concept of sleeper memory poisoning, analyzes its effectiveness, and highlights its potential as a long-term attack surface in stateful LLM agents.
Findings
Poisoned memories were added at rates of up to 99.8% on GPT-5.5 and 95% on Kimi-K2.6.
Poisoned memories caused attacker-intended actions in 60-89% of cases.
The attack can remain dormant and re-emerge across multiple conversations.
Abstract
Large language models are increasingly augmented with persistent memory, allowing assistants to store user-specific information across sessions for personalization and continuity. This statefulness introduces a new security risk: adversarial content can corrupt what an assistant remembers and thereby influence future interactions. We propose and study sleeper memory poisoning, a delayed attack in which an adversary manipulates external context, such as a document, webpage, or repository, to cause the assistant to store a fabricated memory about the user. Unlike conventional prompt injection, the attack can remain dormant and re-emerge across multiple later conversations. We evaluate the full attack pipeline: whether poisoned memories are written, later retrieved, and ultimately used to steer the following conversations. Across stateful LLM assistants, poisoned memories were added up to…
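The abstract describes a three-stage pipeline: a poisoned memory is written, later retrieved, and finally used to steer a conversation. A minimal sketch of how per-stage rates for such a pipeline could be tallied is given below. The `Trial` record, its field names, and the choice of conditional rates are assumptions made for illustration; they are not taken from the paper, and the example data is a toy set, not the reported results.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Trial:
    """One hypothetical attack attempt (field names are illustrative)."""
    written: bool    # the assistant stored the fabricated memory
    retrieved: bool  # the memory was surfaced in a later conversation
    acted: bool      # the assistant took the attacker-intended action


def rate(hits: int, total: int) -> float:
    """Return hits / total, or 0.0 when there is nothing to divide by."""
    return hits / total if total else 0.0


def pipeline_rates(trials: list[Trial]) -> dict[str, float]:
    """Compute stage-wise conditional rates and the end-to-end rate."""
    # Each stage only counts trials that passed every earlier stage.
    written = [t for t in trials if t.written]
    retrieved = [t for t in written if t.retrieved]
    acted = [t for t in retrieved if t.acted]
    return {
        "write_rate": rate(len(written), len(trials)),
        "retrieval_rate_given_write": rate(len(retrieved), len(written)),
        "action_rate_given_retrieval": rate(len(acted), len(retrieved)),
        "end_to_end_rate": rate(len(acted), len(trials)),
    }


# Toy data for illustration only.
toy_trials = [
    Trial(written=True, retrieved=True, acted=True),
    Trial(written=True, retrieved=True, acted=False),
    Trial(written=True, retrieved=False, acted=False),
    Trial(written=False, retrieved=False, acted=False),
]
rates = pipeline_rates(toy_trials)
```

Reporting each stage conditionally on the previous one keeps a low end-to-end rate attributable to a specific stage, which is one plausible reason to evaluate the pipeline stage by stage.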
