Transferable Adversarial Attacks on Black-Box Vision-Language Models

Kai Hu; Weichen Yu; Li Zhang; Alexander Robey; Andy Zou; Chengming Xu; Haoqi Hu; Matt Fredrikson

arXiv:2505.01050·cs.CV·May 5, 2025

TL;DR

This paper demonstrates that black-box vision-language models like GPT-4o, Claude, and Gemini are highly vulnerable to transferable targeted adversarial attacks, which can manipulate their visual interpretations and responses.

Contribution

The paper provides a comprehensive analysis of adversarial transferability to proprietary VLLMs and introduces universal perturbations that can induce specific misinterpretations across multiple models.

Findings

01 Targeted adversarial examples transfer effectively to proprietary VLLMs.

02 Universal perturbations can induce consistent misinterpretations across models.

03 Vulnerabilities are prevalent in state-of-the-art vision-language models.

Abstract

Vision Large Language Models (VLLMs) are increasingly deployed to offer advanced capabilities on inputs comprising both text and images. While prior research has shown that adversarial attacks can transfer from open-source to proprietary black-box models in text-only and vision-only contexts, the extent and effectiveness of such vulnerabilities remain underexplored for VLLMs. We present a comprehensive analysis demonstrating that targeted adversarial examples are highly transferable to widely used proprietary VLLMs such as GPT-4o, Claude, and Gemini. We show that attackers can craft perturbations to induce specific attacker-chosen interpretations of visual information, such as misinterpreting hazardous content as safe, overlooking sensitive or restricted material, or generating detailed incorrect responses aligned with the attacker's intent. Furthermore, we discover that universal…
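To make the idea of a targeted, norm-bounded perturbation concrete, the following is a minimal sketch of generic targeted projected gradient descent (PGD) on a toy linear softmax classifier. It is not the paper's method, which this excerpt does not describe. The surrogate weights `W`, the input `x`, the budget `eps`, the step size `alpha`, and the function names are all hypothetical, and the toy classifier only stands in for an open-source surrogate model. The sketch shows the white-box crafting step only; it does not demonstrate transfer to any black-box model.

```python
import math

def softmax(z):
    # Numerically stable softmax over a list of logits.
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]

def logits(W, x):
    # Toy linear "surrogate model": one weight row per class (hypothetical).
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def predict(W, x):
    z = logits(W, x)
    return z.index(max(z))

def targeted_pgd(W, x, target, eps=0.2, alpha=0.02, steps=50):
    """Search for delta with max |delta_j| <= eps that pushes the toy
    surrogate toward `target` by descending the target cross-entropy."""
    delta = [0.0] * len(x)
    for _ in range(steps):
        x_adv = [xi + di for xi, di in zip(x, delta)]
        p = softmax(logits(W, x_adv))
        # Gradient of cross-entropy(target) w.r.t. the input: W^T (p - onehot).
        err = [pk - (1.0 if k == target else 0.0) for k, pk in enumerate(p)]
        grad = [sum(err[k] * W[k][j] for k in range(len(W)))
                for j in range(len(x))]
        for j in range(len(x)):
            sign = 1.0 if grad[j] > 0 else -1.0 if grad[j] < 0 else 0.0
            d = delta[j] - alpha * sign           # signed descent step
            d = max(-eps, min(eps, d))            # project onto the L-inf ball
            d = max(-x[j], min(1.0 - x[j], d))    # keep x + delta in [0, 1]
            delta[j] = d
    return delta

# Hypothetical two-class example with four "pixels" in [0, 1].
W = [[1.0, -1.0, 0.5, 0.0],
     [-1.0, 1.0, 0.0, 0.5]]
x = [0.6, 0.4, 0.5, 0.5]
delta = targeted_pgd(W, x, target=1)
x_adv = [xi + di for xi, di in zip(x, delta)]
```

In this toy setup the clean input is classified as class 0 and the perturbed input as the chosen class 1, while every component of `delta` stays within `eps`. With a real surrogate the hand-derived gradient would be replaced by automatic differentiation, and success against the surrogate would say nothing by itself about a black-box target.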

Taxonomy

Topics: Adversarial Robustness in Machine Learning

Methods: Sparse Evolutionary Training