Multi-Paradigm Collaborative Adversarial Attack Against Multi-Modal Large Language Models
Yuanbo Li, Tianyang Xu, Cong Hu, Tao Zhou, Xiao-Jun Wu, Josef Kittler

TL;DR
This paper introduces MPCAttack, a novel adversarial attack framework that enhances transferability against multi-modal large language models by collaboratively optimizing features from both visual and textual modalities.
Contribution
It proposes a multi-paradigm collaborative attack method that aggregates visual and textual features for more effective adversarial attacks on MLLMs, surpassing existing methods.
Findings
MPCAttack outperforms state-of-the-art attack methods across multiple benchmarks.
The approach effectively balances multi-modal feature importance during optimization.
Experimental results show improved attack success rates on multiple MLLMs.
Abstract
The rapid progress of Multi-Modal Large Language Models (MLLMs) has significantly advanced downstream applications. However, this progress also exposes serious transferable adversarial vulnerabilities. Existing adversarial attacks against MLLMs typically rely on surrogate models trained within a single learning paradigm and optimise independently in their respective feature spaces. This setting restricts the richness of feature representations, which limits the search space and impedes the diversity of adversarial perturbations. To address this, we propose a novel Multi-Paradigm Collaborative Attack (MPCAttack) framework to boost the transferability of adversarial examples against MLLMs. In principle, MPCAttack aggregates semantic representations, from both visual images and language texts, to facilitate joint adversarial…
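The idea of optimising one perturbation jointly against several surrogate feature spaces can be illustrated with a minimal sketch. This is not the MPCAttack algorithm itself, whose details are not given in the text above. It assumes toy linear maps standing in for surrogate encoders trained under different paradigms, an equal-weight sum of cosine similarities as the joint objective, and a standard sign-gradient update under an L-infinity budget. The names joint_feature_attack, cosine_grad_wrt_input, encoders and targets are hypothetical.

```python
import numpy as np

def cosine(a, b):
    """Cosine similarity between two 1-D feature vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

def cosine_grad_wrt_input(W, x, t):
    """Gradient of cos(W @ x, t) with respect to x for a linear encoder W."""
    f = W @ x
    nf = np.linalg.norm(f) + 1e-12
    nt = np.linalg.norm(t) + 1e-12
    grad_f = t / (nf * nt) - (f @ t) * f / (nf ** 3 * nt)
    return W.T @ grad_f

def joint_feature_attack(x_clean, encoders, targets, eps=8 / 255, step=1 / 255, iters=50):
    """Sign-gradient ascent on the summed cosine similarity over all encoders.

    Assumption: every encoder contributes with equal weight; a real method
    may balance the feature spaces differently.
    """
    x_adv = x_clean.copy()
    for _ in range(iters):
        # Aggregate gradients from every surrogate feature space.
        grad = sum(cosine_grad_wrt_input(W, x_adv, t) for W, t in zip(encoders, targets))
        x_adv = x_adv + step * np.sign(grad)
        # Project back into the L-infinity ball and the valid pixel range.
        x_adv = np.clip(x_adv, x_clean - eps, x_clean + eps)
        x_adv = np.clip(x_adv, 0.0, 1.0)
    return x_adv

# Toy data: flattened "images" with values in [0, 1].
rng = np.random.default_rng(0)
dim = 64
eps = 8 / 255
x_clean = rng.uniform(0.0, 1.0, dim)
x_target = rng.uniform(0.0, 1.0, dim)

# Three random linear maps stand in for surrogates from different paradigms.
encoders = [rng.normal(size=(16, dim)) for _ in range(3)]
targets = [W @ x_target for W in encoders]

x_adv = joint_feature_attack(x_clean, encoders, targets, eps=eps)

before = sum(cosine(W @ x_clean, t) for W, t in zip(encoders, targets))
after = sum(cosine(W @ x_adv, t) for W, t in zip(encoders, targets))
print(f"summed similarity to target features: {before:.4f} -> {after:.4f}")
```

In this sketch the single perturbation is driven by the summed gradient, so no one feature space dictates the update direction alone; that is the only property of the joint setting it is meant to show.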
Taxonomy
Topics: Adversarial Robustness in Machine Learning · Hate Speech and Cyberbullying Detection · Topic Modeling
