Multi-Paradigm Collaborative Adversarial Attack Against Multi-Modal Large Language Models

Yuanbo Li; Tianyang Xu; Cong Hu; Tao Zhou; Xiao-Jun Wu; Josef Kittler

arXiv:2603.04846·cs.CV·March 24, 2026

TL;DR

This paper introduces MPCAttack, a novel adversarial attack framework that enhances transferability against multi-modal large language models by collaboratively optimizing features from both visual and textual modalities.

Contribution

The paper proposes a multi-paradigm collaborative attack that aggregates visual and textual features to craft adversarial examples against MLLMs, and reports that it surpasses existing attack methods.

Findings

01. MPCAttack outperforms state-of-the-art attack methods on various benchmarks.

02. The approach effectively balances multi-modal feature importance during optimization.

03. Experimental results show improved attack success rates on multiple MLLMs.

Abstract

The rapid progress of Multi-Modal Large Language Models (MLLMs) has significantly advanced downstream applications. However, this progress also exposes serious transferable adversarial vulnerabilities. Existing adversarial attacks against MLLMs typically rely on surrogate models trained within a single learning paradigm and perform independent optimisation in their respective feature spaces. This straightforward setting restricts the richness of feature representations, limiting the search space and thus impeding the diversity of adversarial perturbations. To address this, we propose a novel Multi-Paradigm Collaborative Attack (MPCAttack) framework to boost the transferability of adversarial examples against MLLMs. In principle, MPCAttack aggregates semantic representations, from both visual images and language texts, to facilitate joint adversarial…
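
The abstract is truncated before it describes the actual optimisation, so the following is only a generic illustration of the idea it names: perturbing an input so that features from several surrogate encoders move jointly towards chosen targets. It is a minimal sketch of a sign-gradient attack under an L-infinity budget, not MPCAttack itself. The toy linear encoders, the names encoders, targets and weights, the fixed weighting, and the eps, step and iters values are all assumptions made for illustration; clipping to a valid pixel range is omitted.

```python
import math
import random


def matvec(W, x):
    """Apply a toy linear encoder W (a list of rows) to the vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def matTvec(W, g):
    """Multiply the transpose of W by g (backpropagation through matvec)."""
    return [sum(W[i][j] * g[i] for i in range(len(W))) for j in range(len(W[0]))]


def norm(v):
    return math.sqrt(sum(a * a for a in v))


def cosine(a, b):
    return sum(p * q for p, q in zip(a, b)) / (norm(a) * norm(b))


def cosine_grad(f, t):
    """Gradient of cosine(f, t) with respect to f."""
    c, nf, nt = cosine(f, t), norm(f), norm(t)
    return [(ti / nt - c * fi / nf) / nf for fi, ti in zip(f, t)]


def joint_objective(x, encoders, targets, weights):
    """Weighted sum of feature/target cosine similarities over all surrogates."""
    return sum(w * cosine(matvec(W, x), t)
               for W, t, w in zip(encoders, targets, weights))


def collaborative_attack(x, encoders, targets, weights,
                         eps=0.1, step=0.02, iters=20):
    """Sign-gradient ascent on the joint objective, keeping the best iterate."""
    delta = [0.0] * len(x)
    best_x, best_val = list(x), joint_objective(x, encoders, targets, weights)
    for _ in range(iters):
        x_adv = [xi + di for xi, di in zip(x, delta)]
        # Aggregate the gradients of every surrogate into one update direction.
        grad = [0.0] * len(x)
        for W, t, w in zip(encoders, targets, weights):
            g_x = matTvec(W, cosine_grad(matvec(W, x_adv), t))
            grad = [g + w * gx for g, gx in zip(grad, g_x)]
        # Step along the gradient sign, then project back into the eps-ball.
        delta = [max(-eps, min(eps, d + step * ((g > 0) - (g < 0))))
                 for d, g in zip(delta, grad)]
        x_adv = [xi + di for xi, di in zip(x, delta)]
        val = joint_objective(x_adv, encoders, targets, weights)
        if val > best_val:
            best_x, best_val = x_adv, val
    return best_x


# Hypothetical toy setup: two random "surrogates" standing in for encoders
# trained under different paradigms, each with its own target feature.
random.seed(0)
dim, feat = 8, 4
x = [random.uniform(0.0, 1.0) for _ in range(dim)]
encoders = [[[random.gauss(0, 1) for _ in range(dim)] for _ in range(feat)]
            for _ in range(2)]
targets = [[random.gauss(0, 1) for _ in range(feat)] for _ in range(2)]
weights = [0.5, 0.5]
x_adv = collaborative_attack(x, encoders, targets, weights)
```

Summing the per-encoder gradients before taking the sign is what makes the update joint rather than independent per surrogate. In practice the encoders would be real vision and language models and the gradients would come from automatic differentiation.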

Taxonomy

Topics: Adversarial Robustness in Machine Learning · Hate Speech and Cyberbullying Detection · Topic Modeling