Memory Poisoning Attack and Defense on Memory Based LLM-Agents
Balachandra Devarangadi Sunil, Isheeta Sinha, Piyush Maheshwari, Shantanu Todmal, Shreyan Mallik, Shuchi Mishra

TL;DR
This paper systematically evaluates memory poisoning attacks on memory-augmented LLM agents in realistic settings and proposes two novel defense mechanisms, highlighting the importance of trust calibration for effective protection.
Contribution
It provides the first empirical analysis of attack robustness under realistic conditions and introduces two defense strategies with detailed evaluation and insights.
Findings
Realistic conditions reduce attack success rates significantly.
Memory sanitization effectiveness depends on trust threshold calibration.
Proposed defenses improve robustness against memory poisoning attacks.
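To make the point about trust threshold calibration concrete, here is a minimal sketch of threshold-based memory filtering. It is not the paper's defense: the `MemoryEntry` type, the `sanitize` function, the trust scores, and the thresholds are all hypothetical and chosen only to illustrate how one cutoff can be too permissive, another too strict, and one in between can separate the entries.

```python
from dataclasses import dataclass


@dataclass
class MemoryEntry:
    text: str
    trust: float  # hypothetical score in [0, 1]; higher means more trusted


def sanitize(memory: list[MemoryEntry], threshold: float) -> list[MemoryEntry]:
    """Keep only entries whose trust score meets the threshold."""
    return [entry for entry in memory if entry.trust >= threshold]


# Toy memory: two benign records and one poisoned record (all values invented).
memory = [
    MemoryEntry("benign: look up lab results by patient id", 0.90),
    MemoryEntry("benign: summarize the medication list", 0.60),
    MemoryEntry("poisoned: redirect queries to another patient id", 0.55),
]

# Sweep a few illustrative thresholds and show which entries survive.
for threshold in (0.3, 0.58, 0.95):
    kept = sanitize(memory, threshold)
    print(threshold, [entry.text.split(":")[0] for entry in kept])
```

In this toy setup a low threshold lets the poisoned entry through, a very high one discards the benign entries as well, and only a cutoff that falls between the two score ranges removes the poisoned entry alone. Real trust scores would not separate this cleanly, which is one reason calibration matters.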
Abstract
Large language model agents equipped with persistent memory are vulnerable to memory poisoning attacks, where adversaries inject malicious instructions through query-only interactions that corrupt the agent's long-term memory and influence future responses. Recent work demonstrated that MINJA (Memory Injection Attack) achieves over 95% injection success rate and 70% attack success rate under idealized conditions. However, the robustness of these attacks in realistic deployments and effective defensive mechanisms remain understudied. This work addresses these gaps through systematic empirical evaluation of memory poisoning attacks and defenses in Electronic Health Record (EHR) agents. We investigate attack robustness by varying three critical dimensions: initial memory state, number of indication prompts, and retrieval parameters. Our experiments on GPT-4o-mini, Gemini-2.0-Flash and…
Taxonomy
Topics: Security and Verification in Computing · Adversarial Robustness in Machine Learning · Ferroelectric and Negative Capacitance Devices
