Empirical Analysis of Large Vision-Language Models against Goal Hijacking via Visual Prompt Injection

Subaru Kimura; Ryota Tanaka; Shumpei Miyawaki; Jun Suzuki; Keisuke Sakaguchi

arXiv:2408.03554 · cs.CL · August 8, 2024

Open Access

TL;DR

This paper investigates security vulnerabilities of large vision-language models and shows that visual prompt injection, in which instructions are drawn onto the input image, can hijack the model's task. Against GPT-4V, the attack succeeds 15.8% of the time.

Contribution

It introduces a novel visual prompt injection method called goal hijacking via visual prompt injection (GHVPI) and evaluates its effectiveness on GPT-4V.

Findings

01 GHVPI attacks against GPT-4V succeed 15.8% of the time.

02 A successful GHVPI attack requires the target model to have strong character recognition and instruction-following abilities.

03 VPI poses a non-negligible security risk for LVLMs.

Abstract

We explore visual prompt injection (VPI) that maliciously exploits the ability of large vision-language models (LVLMs) to follow instructions drawn onto the input image. We propose a new VPI method, "goal hijacking via visual prompt injection" (GHVPI), that swaps the execution task of LVLMs from an original task to an alternative task designated by an attacker. The quantitative analysis indicates that GPT-4V is vulnerable to the GHVPI and demonstrates a notable attack success rate of 15.8%, which is an unignorable security risk. Our analysis also shows that successful GHVPI requires high character recognition capability and instruction-following ability in LVLMs.
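
The attack success rate quoted above is a ratio of hijacked responses to attack attempts. The following is a minimal sketch of that arithmetic, not the authors' evaluation code. The `Trial` record, its `hijacked` label (assumed to come from some judge that decides whether the model carried out the attacker's task instead of the original one), and the toy data are all hypothetical and introduced only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Trial:
    """One attack attempt. Hypothetical record, not from the paper."""
    image_id: str
    hijacked: bool  # True if the model performed the attacker's task


def attack_success_rate(trials):
    """Return the fraction of trials labeled as hijacked."""
    if not trials:
        raise ValueError("need at least one trial")
    return sum(t.hijacked for t in trials) / len(trials)


# Toy labels for illustration only; these are not the paper's data.
toy_trials = [
    Trial("img-0", False),
    Trial("img-1", True),
    Trial("img-2", False),
    Trial("img-3", False),
]

print(f"{attack_success_rate(toy_trials):.1%}")  # prints 25.0%
```

Under this definition, a rate of 15.8% would mean that roughly one in six attack attempts redirected the model to the attacker's task.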

Taxonomy

Topics: Multimodal Machine Learning Applications