Effective Black-Box Multi-Faceted Attacks Breach Vision Large Language Model Guardrails

Yijun Yang; Lichao Wang; Xiao Yang; Lanqing Hong; Jun Zhu

arXiv:2502.05772 · cs.CV · February 11, 2025

TL;DR

This paper introduces MultiFaceted Attack, a black-box attack framework that bypasses the multi-layered safety defenses of vision large language models: alignment training, safety system prompts, and content moderation.

Contribution

The paper presents a multi-faceted attack framework that combines three complementary facets (Visual Attack, Alignment Breaking Attack, and Adversarial Signature) to breach the safety mechanisms of VLLMs, and reports high success rates against commercial models.

Findings

01  Achieves 61.56% attack success rate on eight VLLMs

02  Surpasses state-of-the-art methods by at least 42.18%

03  Effectively exploits multimodal and alignment vulnerabilities

Abstract

Vision Large Language Models (VLLMs) integrate visual data processing, expanding their real-world applications, but also increasing the risk of generating unsafe responses. In response, leading companies have implemented Multi-Layered safety defenses, including alignment training, safety system prompts, and content moderation. However, their effectiveness against sophisticated adversarial attacks remains largely unexplored. In this paper, we propose MultiFaceted Attack, a novel attack framework designed to systematically bypass Multi-Layered Defenses in VLLMs. It comprises three complementary attack facets: Visual Attack that exploits the multimodal nature of VLLMs to inject toxic system prompts through images; Alignment Breaking Attack that manipulates the model's alignment mechanism to prioritize the generation of contrasting responses; and Adversarial Signature that deceives content…

Taxonomy

Topics: Adversarial Robustness in Machine Learning