Leave My Images Alone: Preventing Multi-Modal Large Language Models from Analyzing Images via Visual Prompt Injection
Zedian Shao, Hongbin Liu, Yuepeng Hu, Neil Zhenqiang Gong

TL;DR
This paper introduces ImageProtector, a method for protecting images from analysis by multi-modal large language models through imperceptible visual perturbations that induce refusal responses.
Contribution
It proposes a proactive, user-side visual prompt injection technique to prevent MLLMs from extracting sensitive information from images.
Findings
ImageProtector effectively induces refusal responses in six MLLMs across four datasets.
Countermeasures like Gaussian noise, DiffPure, and adversarial training partially mitigate ImageProtector.
Mitigation methods often degrade model accuracy and efficiency.
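As an illustration of the simplest countermeasure listed above, the following is a minimal sketch of Gaussian-noise preprocessing that an adversary might apply to an image before passing it to an MLLM. The function name `add_gaussian_noise`, the noise level `sigma`, and the assumption that pixels are floats in [0, 1] are all hypothetical; the summary does not state the settings used in the paper.

```python
import numpy as np

def add_gaussian_noise(image, sigma=0.05, seed=None):
    """Return a noisy copy of `image` (float array with values in [0, 1]).

    sigma is an illustrative noise level, not a value taken from the paper.
    """
    rng = np.random.default_rng(seed)
    # Add zero-mean Gaussian noise independently to every pixel.
    noisy = image + rng.normal(0.0, sigma, size=image.shape)
    # Keep the result a valid image by clipping back into [0, 1].
    return np.clip(noisy, 0.0, 1.0)
```

A larger `sigma` is more likely to disrupt a small perturbation but also removes more of the image's genuine detail, which is consistent with the finding that mitigation tends to cost accuracy.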
Abstract
Multi-modal large language models (MLLMs) have emerged as powerful tools for analyzing Internet-scale image data, offering significant benefits but also raising critical safety and societal concerns. In particular, open-weight MLLMs may be misused to extract sensitive information from personal images at scale, such as identities, locations, or other private details. In this work, we propose ImageProtector, a user-side method that proactively protects images before sharing by embedding a carefully crafted, nearly imperceptible perturbation that acts as a visual prompt injection attack on MLLMs. As a result, when an adversary analyzes a protected image with an MLLM, the MLLM is consistently induced to generate a refusal response such as "I'm sorry, I can't help with that request." We empirically demonstrate the effectiveness of ImageProtector across six MLLMs and four datasets.…
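The idea of a bounded perturbation that steers a model toward a chosen output can be sketched with projected sign-gradient descent under an L-infinity budget. This is only a toy under stated assumptions, not the authors' algorithm: the summary does not say how ImageProtector optimizes its perturbation, and a linear softmax classifier (`weights`) stands in for the MLLM, with class index `target` playing the role of the refusal response. The names `protect_image`, `eps`, `step`, and `iters` are hypothetical.

```python
import numpy as np

def softmax(z):
    z = z - z.max()  # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum()

def protect_image(image, weights, target, eps=8 / 255, step=1 / 255, iters=100):
    """Perturb `image` (floats in [0, 1]) so a toy linear model favors `target`.

    weights: array of shape (num_classes, image.size), a stand-in for an MLLM.
    eps:     L-infinity budget that keeps the change visually small (assumed).
    """
    delta = np.zeros_like(image)
    onehot = np.zeros(weights.shape[0])
    onehot[target] = 1.0
    for _ in range(iters):
        probs = softmax(weights @ (image + delta).ravel())
        # Gradient of the cross-entropy loss toward `target` w.r.t. the pixels.
        grad = (weights.T @ (probs - onehot)).reshape(image.shape)
        # Signed descent step, then projection back onto the eps-ball.
        delta = np.clip(delta - step * np.sign(grad), -eps, eps)
        # Keep the perturbed image inside the valid pixel range.
        delta = np.clip(image + delta, 0.0, 1.0) - image
    return image + delta
```

With a real MLLM the gradient would come from backpropagating the loss on the refusal text through the model rather than from a closed-form expression, but the budget-and-project structure would be the same.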
