Multi-Faceted Attack: Exposing Cross-Model Vulnerabilities in Defense-Equipped Vision-Language Models

Yijun Yang, Lichao Wang, Jianping Zhang, Chi Harold Liu, Lanqing Hong, Qiang Xu

arXiv:2511.16110 · cs.CR · November 21, 2025

TL;DR

This paper introduces Multi-Faceted Attack (MFA), a systematic framework that exposes safety vulnerabilities in defense-equipped vision-language models (VLMs). MFA bypasses content filters and exploits weaknesses in shared visual representations, reaching a 58.5% success rate in bypassing defenses.

Contribution

The paper presents MFA, a novel attack framework that exposes safety vulnerabilities in modern VLMs, transfers broadly across models, and outperforms existing attack methods.

Findings

1. MFA achieves a 58.5% success rate in bypassing defenses.

2. Adversarial images transfer broadly across models, indicating shared vulnerabilities.

3. MFA significantly outperforms existing attack methods.

Abstract

The growing misuse of Vision-Language Models (VLMs) has led providers to deploy multiple safeguards, including alignment tuning, system prompts, and content moderation. However, the real-world robustness of these defenses against adversarial attacks remains underexplored. We introduce Multi-Faceted Attack (MFA), a framework that systematically exposes general safety vulnerabilities in leading defense-equipped VLMs such as GPT-4o, Gemini-Pro, and Llama-4. The core component of MFA is the Attention-Transfer Attack (ATA), which hides harmful instructions inside a meta task with competing objectives. We provide a theoretical perspective based on reward hacking to explain why this attack succeeds. To improve cross-model transferability, we further introduce a lightweight transfer-enhancement algorithm combined with a simple repetition strategy that jointly bypasses both input-level and…

Videos

Multi-Faceted Attack: Exposing Cross-Model Vulnerabilities in Defense-Equipped Vision-Language Models (Underline)

Taxonomy

Topics: Adversarial Robustness in Machine Learning · Hate Speech and Cyberbullying Detection · Ethics and Social Impacts of AI