Multi-Faceted Attack: Exposing Cross-Model Vulnerabilities in Defense-Equipped Vision-Language Models
Yijun Yang, Lichao Wang, Jianping Zhang, Chi Harold Liu, Lanqing Hong, Qiang Xu

TL;DR
This paper introduces Multi-Faceted Attack (MFA), a framework that systematically exposes safety vulnerabilities in defense-equipped vision-language models. MFA bypasses content filters and exploits weaknesses in the visual representations that models share, reaching a reported 58.5% success rate.
Contribution
The paper presents MFA, an attack framework that exposes safety vulnerabilities in modern VLMs, transfers broadly across models, and outperforms existing attack methods.
Findings
MFA achieves a 58.5% success rate in bypassing defenses.
Adversarial images transfer broadly across models, indicating shared vulnerabilities.
MFA significantly outperforms existing attack methods.
Abstract
The growing misuse of Vision-Language Models (VLMs) has led providers to deploy multiple safeguards, including alignment tuning, system prompts, and content moderation. However, the real-world robustness of these defenses against adversarial attacks remains underexplored. We introduce Multi-Faceted Attack (MFA), a framework that systematically exposes general safety vulnerabilities in leading defense-equipped VLMs such as GPT-4o, Gemini-Pro, and Llama-4. The core component of MFA is the Attention-Transfer Attack (ATA), which hides harmful instructions inside a meta task with competing objectives. We provide a theoretical perspective based on reward hacking to explain why this attack succeeds. To improve cross-model transferability, we further introduce a lightweight transfer-enhancement algorithm combined with a simple repetition strategy that jointly bypasses both input-level and…
Taxonomy
Topics: Adversarial Robustness in Machine Learning · Hate Speech and Cyberbullying Detection · Ethics and Social Impacts of AI
