Trojan Hippo: Weaponizing Agent Memory for Data Exfiltration

Debeshee Das; Julien Piet; Darya Kaviani; Luca Beurer-Kellner; Florian Tramèr; David Wagner

arXiv:2605.01970 · cs.CR · May 18, 2026

TL;DR

Trojan Hippo is a persistent memory attack on LLM agents: an attacker plants a dormant payload in the agent's long-term memory, and the payload activates when the user later discusses sensitive topics and exfiltrates personal data. The evaluation also highlights security-utility tradeoffs in defenses.

Contribution

This work systematically evaluates Trojan Hippo attacks across diverse memory architectures and defenses, introducing a dynamic evaluation framework and capability-aware analysis.

Findings

01. Achieves an 85-100% attack success rate against top models

02. Defenses can reduce attack success rates to 0-5%

03. The security-utility tradeoff varies significantly with defense deployment

Abstract

Memory systems enable otherwise-stateless LLM agents to persist user information across sessions, but also introduce a new attack surface. We characterize the Trojan Hippo attack, a class of persistent memory attacks that operates in a more realistic threat model than prior memory poisoning work: the attacker plants a dormant payload into an agent's long-term memory via a single untrusted tool call (e.g., a crafted email), which activates only when the user later discusses sensitive topics such as finance, health, or identity, and exfiltrates high-value personal data to the attacker. While anecdotal demonstrations of such attacks have appeared against deployed systems, no prior work systematically evaluates them across heterogeneous memory architectures and defenses. We introduce a dynamic evaluation framework comprising two components: (1) an OpenEvolve-based adaptive red-teaming…
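
The dormant-then-triggered behavior described in the abstract can be illustrated with a toy model. The following is a minimal sketch, not the paper's framework: the `ToyMemory` class, the keyword lists, the `source` provenance label, and the placeholder entry text are all assumptions introduced for illustration. It only shows that an entry written from an untrusted tool call stays out of the context during unrelated sessions and is retrieved once the user raises a sensitive topic.

```python
from dataclasses import dataclass, field

# Hypothetical keyword lists. The abstract names finance, health, and identity
# as example sensitive topics; the specific keywords are assumptions.
SENSITIVE_KEYWORDS = {
    "finance": {"bank", "loan", "tax"},
    "health": {"diagnosis", "prescription"},
    "identity": {"passport", "ssn"},
}


@dataclass
class MemoryEntry:
    text: str
    source: str        # "user" or "tool" (assumed provenance label)
    topics: frozenset  # topics under which the entry is retrieved


@dataclass
class ToyMemory:
    entries: list = field(default_factory=list)

    def write(self, text, source, topics):
        self.entries.append(MemoryEntry(text, source, frozenset(topics)))

    def retrieve(self, message):
        """Return entries whose topics match keywords found in the message."""
        words = set(message.lower().split())
        active = {t for t, kws in SENSITIVE_KEYWORDS.items() if words & kws}
        return [e for e in self.entries if e.topics & active]


def untrusted_in_context(memory, message):
    """Return the tool-sourced entries that would reach the agent's context."""
    return [e for e in memory.retrieve(message) if e.source == "tool"]


memory = ToyMemory()
memory.write("User prefers short answers", "user", ["general"])
# A single untrusted tool call (e.g., reading an email) writes one entry.
memory.write("<placeholder for attacker-controlled text>", "tool",
             ["finance", "health", "identity"])

benign = untrusted_in_context(memory, "recommend a pasta recipe")
sensitive = untrusted_in_context(memory, "help me with my bank loan")
print(len(benign), len(sensitive))
```

In this sketch the untrusted entry never surfaces in the unrelated session, so nothing looks wrong until the sensitive session begins. Tracking the `source` of each entry, as done here, is one assumed way to make such entries visible for filtering; the source does not state which defenses the paper evaluates.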
