ER-MIA: Black-Box Adversarial Memory Injection Attacks on Long-Term Memory-Augmented Large Language Models
Mitchell Piehl, Zhaohan Xi, Zuobin Xiong, Pan He, Muchao Ye

TL;DR
This paper systematically studies black-box adversarial attacks on long-term memory-augmented large language models and shows that their similarity-based retrieval mechanisms are a fundamental security vulnerability.
Contribution
It introduces ER-MIA, a unified framework for black-box adversarial memory injection attacks on the similarity-based retrieval mechanism of long-term memory-augmented LLMs, validated by experiments across multiple LLMs and long-term memory systems.
Findings
Attacks achieve high success rates under minimal attacker assumptions
The vulnerability persists across different memory designs
Security risks are significant and widespread
Abstract
Large language models (LLMs) are increasingly augmented with long-term memory systems to overcome finite context windows and enable persistent reasoning across interactions. However, recent research finds that LLMs become more vulnerable because memory provides extra attack surfaces. In this paper, we present the first systematic study of black-box adversarial memory injection attacks that target the similarity-based retrieval mechanism in long-term memory-augmented LLMs. We introduce ER-MIA, a unified framework that exposes this vulnerability and formalizes two realistic attack settings: content-based attacks and question-targeted attacks. In these settings, ER-MIA includes an arsenal of composable attack primitives and ensemble attacks that achieve high success rates under minimal attacker assumptions. Extensive experiments across multiple LLMs and long-term memory systems demonstrate…
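To make the retrieval vulnerability concrete, the following is a minimal toy sketch of how a similarity-ranked memory store can be steered by an injected entry. It is not ER-MIA's method: the abstract does not detail the attack primitives, so everything here is an illustrative assumption, including the bag-of-words `embed` function (a stand-in for a real embedding model), the `retrieve` helper, and the example memories.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy stand-in for an embedding model: lowercase token counts."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(memory, query, k=1):
    """Return the k stored memories most similar to the query."""
    q = embed(query)
    return sorted(memory, key=lambda m: cosine(embed(m), q), reverse=True)[:k]


# Hypothetical benign memory store and user question.
memory = [
    "The user said their flight to Denver leaves on Friday morning.",
    "The user prefers vegetarian restaurants.",
]
question = "When does the user's flight to Denver leave?"

before = retrieve(memory, question)

# Hypothetical injected entry: it echoes the anticipated question so that it
# scores higher than the genuine memory, then carries a false answer.
injected = (
    "When does the user's flight to Denver leave? "
    "The flight to Denver leaves on Monday night."
)
memory.append(injected)

after = retrieve(memory, question)

print("top-1 before injection:", before[0])
print("top-1 after injection: ", after[0])
```

In this toy setup the injected entry displaces the genuine memory at rank 1 purely because it overlaps more with the question. The attacker only writes to memory and never touches model weights or retriever internals, which is consistent with the black-box setting the abstract describes.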
Taxonomy
Topics: Adversarial Robustness in Machine Learning · Topic Modeling · Advanced Graph Neural Networks
