Empirical Analysis of Large Vision-Language Models against Goal Hijacking via Visual Prompt Injection
Subaru Kimura, Ryota Tanaka, Shumpei Miyawaki, Jun Suzuki, Keisuke, Sakaguchi

TL;DR
This paper investigates the security vulnerabilities of large vision-language models, revealing that visual prompt injection can hijack their tasks with a significant success rate, highlighting a critical security concern.
Contribution
It introduces a novel visual prompt injection method called goal hijacking via visual prompt injection (GHVPI) and evaluates its effectiveness on GPT-4V.
Findings
GPT-4V has a 15.8% success rate in GHVPI attacks.
Successful GHVPI requires high character recognition and instruction-following abilities.
VPI poses an unignorable security risk for LVLMs.
Abstract
We explore visual prompt injection (VPI) that maliciously exploits the ability of large vision-language models (LVLMs) to follow instructions drawn onto the input image. We propose a new VPI method, "goal hijacking via visual prompt injection" (GHVPI), that swaps the execution task of LVLMs from an original task to an alternative task designated by an attacker. The quantitative analysis indicates that GPT-4V is vulnerable to the GHVPI and demonstrates a notable attack success rate of 15.8%, which is an unignorable security risk. Our analysis also shows that successful GHVPI requires high character recognition capability and instruction-following ability in LVLMs.
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsMultimodal Machine Learning Applications
