Adversarial Attacks Against MLLMs via Progressive Resolution Processing and Adaptive Feature Alignment
Haobo Wang, Xiaorong Ma, Weiqi Luo, Xiaojun Jia, Jiwu Huang

TL;DR
This paper introduces PRAF-Attack, a novel targeted transfer-based attack framework for Multimodal Large Language Models that enhances transferability by utilizing multi-scale features and progressive resolution refinement.
Contribution
The paper proposes an adaptive feature alignment and progressive resolution processing strategy that improves attack transferability against black-box MLLMs and outperforms seven state-of-the-art baselines.
Findings
PRAF-Attack outperforms seven state-of-the-art baselines in transferability.
The method effectively leverages intermediate-layer features for better robustness.
Progressive resolution processing enhances attack success across multiple scales.
Abstract
Adversarial perturbations can mislead Multimodal Large Language Models (MLLMs) into recognizing a benign image as a specific target object, posing serious risks in safety-critical scenarios such as autonomous driving and medical diagnosis. This makes transfer-based targeted attacks crucial for understanding and improving black-box MLLM robustness. Existing transfer-based targeted attack methods typically rely on the final global features of the surrogate encoder and anchor optimization to original-resolution target crops, which limits their transferability and robustness. To address these challenges, we propose Progressive Resolution Processing and Adaptive Feature Alignment (PRAF-Attack), a targeted transfer-based attack framework that integrates multi-scale global semantic guidance with robust intermediate-layer local alignment. Unlike prior methods that align only the surrogate…
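The two ideas named in the abstract, a resolution schedule and feature alignment across several encoder layers, can be illustrated with a small pure-Python sketch. This is not the authors' implementation: the function names (`resolution_schedule`, `alignment_loss`), the linear spacing of resolutions, the cosine-based loss, and the per-layer weights are all assumptions made for illustration, and the feature vectors stand in for whatever a surrogate encoder would produce.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # treat a zero vector as carrying no directional information
    return dot / (norm_u * norm_v)


def resolution_schedule(start, end, stages):
    """Hypothetical progressive schedule: linearly spaced image side lengths.

    The linear spacing is an assumption; the source does not state how
    resolutions progress between stages.
    """
    if stages < 1:
        raise ValueError("stages must be at least 1")
    if stages == 1:
        return [end]
    step = (end - start) / (stages - 1)
    return [round(start + i * step) for i in range(stages)]


def alignment_loss(adv_feats, tgt_feats, weights):
    """Weighted mean of (1 - cosine similarity) over encoder layers.

    adv_feats / tgt_feats: one feature vector per layer (final and
    intermediate). weights: hypothetical per-layer importance values;
    an adaptive scheme would update them during optimization.
    """
    total = sum(weights)
    return sum(
        w * (1.0 - cosine_similarity(a, t))
        for a, t, w in zip(adv_feats, tgt_feats, weights)
    ) / total


if __name__ == "__main__":
    # Three stages between two illustrative side lengths.
    print(resolution_schedule(112, 224, 3))
    # Layer 1 is already aligned, layer 2 is orthogonal to the target.
    print(alignment_loss([[1, 0], [1, 0]], [[1, 0], [0, 1]], [1, 3]))
```

In an actual attack, a loss of this kind would be minimized over the perturbation at each resolution in the schedule, with features recomputed by the surrogate encoder after every update; that optimization loop is omitted here because the truncated abstract does not specify it.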
