Universal Adversarial Attacks against Closed-Source MLLMs via Target-View Routed Meta Optimization
Hui Lu, Yi Yu, Yiming Yang, Chenyu Yi, Xueyi Ke, Qixing Zhang, Bingquan Shen, Alex Kot, Xudong Jiang

TL;DR
This paper introduces MCRMO-Attack, a universal targeted adversarial attack on closed-source multimodal large language models (MLLMs) that raises attack success rates on unknown commercial models.
Contribution
The paper proposes a meta-optimization framework that combines multi-crop aggregation, token routing, and cross-target prior learning to make universal attacks more effective.
Findings
Improves attack success rate by 23.7% on GPT-4o
Improves attack success rate by 19.9% on Gemini-2.0
Outperforms existing universal attack baselines
Abstract
Targeted adversarial attacks on closed-source multimodal large language models (MLLMs) have been increasingly explored under black-box transfer, yet prior methods are predominantly sample-specific and offer limited reusability across inputs. We instead study a more stringent setting, Universal Targeted Transferable Adversarial Attacks (UTTAA), where a single perturbation must consistently steer arbitrary inputs toward a specified target across unknown commercial MLLMs. Naively adapting existing sample-wise attacks to this universal setting faces three core difficulties: (i) target supervision becomes high-variance due to target-crop randomness, (ii) token-wise matching is unreliable because universality suppresses image-specific cues that would otherwise anchor alignment, and (iii) few-source per-target adaptation is highly initialization-sensitive, which can degrade the attainable…
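To make the universal setting concrete, here is a minimal NumPy sketch, not the authors' implementation. It shows two ideas named in the abstract: one shared perturbation applied to arbitrary inputs under an L-infinity budget, and averaging the embeddings of several random target crops so the supervision signal depends less on any single crop. The function names, the budget `eps=16/255`, and the `toy_embed` stand-in for a surrogate image encoder are all assumptions made for illustration.

```python
import numpy as np


def apply_universal_perturbation(images, delta, eps=16 / 255):
    """Add one shared perturbation to every image in a batch.

    images: float array in [0, 1], shape (N, H, W, C)
    delta:  float array, shape (H, W, C), broadcast over the batch
    eps:    assumed L-infinity budget (illustrative value)
    """
    delta = np.clip(delta, -eps, eps)         # project onto the L-inf ball
    return np.clip(images + delta, 0.0, 1.0)  # keep pixels in the valid range


def random_crop(image, size, rng):
    """Return a random square crop of side `size` from an (H, W, C) image."""
    h, w = image.shape[:2]
    top = rng.integers(0, h - size + 1)
    left = rng.integers(0, w - size + 1)
    return image[top:top + size, left:left + size]


def aggregated_target_embedding(target, embed, n_crops, size, rng):
    """Average the embeddings of several random crops of the target image."""
    feats = [embed(random_crop(target, size, rng)) for _ in range(n_crops)]
    return np.mean(feats, axis=0)


def toy_embed(crop):
    """Hypothetical stand-in for a surrogate image encoder: per-channel mean."""
    return crop.mean(axis=(0, 1))
```

In a real attack, `delta` would be optimized so that the embeddings of the perturbed images move toward the aggregated target embedding. The sketch covers only the forward pieces and leaves out the optimization loop, token routing, and meta-learned initialization described in the paper.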
