Visual Memory Injection Attacks for Multi-Turn Conversations
Christian Schlarmann, Matthias Hein

TL;DR
This paper introduces a novel stealthy attack method called Visual Memory Injection (VMI) that manipulates large vision-language models during multi-turn conversations through manipulated images, highlighting security vulnerabilities.
Contribution
The paper presents the first multi-turn attack method on LVLMs using visual memory injection, demonstrating its effectiveness and raising awareness of security issues in long-context models.
Findings
VMI can manipulate LVLM outputs after multiple conversation turns.
The attack remains stealthy under normal prompts.
The method is effective across several open-weight LVLMs.
Abstract
Generative large vision-language models (LVLMs) have recently achieved impressive performance gains, and their user base is growing rapidly. However, the security of LVLMs, in particular in a long-context multi-turn setting, is largely underexplored. In this paper, we consider the realistic scenario in which an attacker uploads a manipulated image to the web/social media. A benign user downloads this image and uses it as input to the LVLM. Our novel stealthy Visual Memory Injection (VMI) attack is designed such that on normal prompts the LVLM exhibits nominal behavior, but once the user gives a triggering prompt, the LVLM outputs a specific prescribed target message to manipulate the user, e.g. for adversarial marketing or political persuasion. Compared to previous work that focused on single-turn attacks, VMI is effective even after a long multi-turn conversation with the user. We…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsAdversarial Robustness in Machine Learning · Multimodal Machine Learning Applications · Generative Adversarial Networks and Image Synthesis
