VisInject: Disruption != Injection -- A Dual-Dimension Evaluation of Universal Adversarial Attacks on Vision-Language Models

Pang Liu; Yingjie Lao

arXiv:2605.01449 · cs.CR · May 5, 2026

TL;DR

This paper introduces a dual-axis evaluation framework for universal adversarial attacks on vision-language models. It separates Influence (the model's output was perturbed) from Precise Injection (the attacker's chosen target concept was actually emitted) and shows a significant gap between the two.

Contribution

It proposes a dual-axis assessment method for adversarial attacks that pairs a deterministic drift score for Influence with a 4-tier ordinal rating for Precise Injection, and provides a comprehensive dataset and analysis of attack effectiveness.

Findings

01. Most pairs show influence without successful injection.

02. Zero detectable drift in BLIP-2 at specified perturbation levels.

03. Significant divergence between influence and injection success rates.

Abstract

Universal adversarial attacks on aligned multimodal large language models are increasingly reported with attack success rates in the 60-80% range, suggesting that the visual modality is highly vulnerable to imperceptible perturbations as a prompt-injection channel. We argue that this number conflates two distinct events: (i) the model's output was perturbed (Influence), and (ii) the attacker's chosen target concept was actually emitted (Precise Injection). We compose two existing techniques -- Universal Adversarial Attack and AnyAttack -- under an $L_\infty$ budget of 16/255, and we add a dual-axis evaluation: a deterministic Ratcliff-Obershelp drift score for Influence (programmatic baseline) plus a 4-tier ordinal scale (none/weak/partial/confirmed) for Precise Injection. The judge is DeepSeek-V4-Pro in thinking mode, calibrated against Claude Opus 4.7 with Cohen's $\kappa = 0.77$ on the…
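
The Influence axis can be illustrated with a minimal sketch. It assumes the drift score is one minus a Ratcliff-Obershelp similarity between the model's output on the clean image and its output on the perturbed image. The abstract does not give the exact formula, so that definition, the function name `drift_score`, and the sample captions are all hypothetical. Python's standard `difflib.SequenceMatcher` is used as a stand-in because its `ratio()` is based on Ratcliff-Obershelp "gestalt pattern matching".

```python
from difflib import SequenceMatcher


def drift_score(clean_output: str, attacked_output: str) -> float:
    """Return a drift value in [0, 1] between two model outputs.

    Assumption (not confirmed by the paper): drift = 1 - similarity,
    where similarity is difflib's Ratcliff-Obershelp-style ratio.
    0.0 means the outputs are identical; 1.0 means no shared characters.
    """
    # autojunk=False disables difflib's popularity heuristic so the
    # score is fully deterministic for long outputs as well.
    matcher = SequenceMatcher(None, clean_output, attacked_output, autojunk=False)
    return 1.0 - matcher.ratio()


# Hypothetical captions, for illustration only.
clean = "a dog sitting on a couch"
attacked = "a dog sitting on a sofa next to a lamp"

print(drift_score(clean, clean))     # 0.0, identical outputs show no drift
print(drift_score(clean, attacked))  # strictly between 0 and 1
```

A nonzero drift only shows that the output changed. Whether the attacker's target concept actually appears is a separate question, which the abstract assigns to the 4-tier Precise Injection rating.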
