Adversarial-Guided Diffusion for Multimodal LLM Attacks
Chengwei Xia, Fan Ma, Ruijie Quan, Kun Zhan, Yi Yang

TL;DR
This paper introduces adversarial-guided diffusion (AGD), a method that generates targeted adversarial images for multimodal large language models (MLLMs) by injecting adversarial guidance into the noise of the reverse diffusion process. This improves attack success and makes the attack more robust to defenses.
Contribution
The paper proposes a novel adversarial-guided diffusion approach that embeds target semantics into the noise component of the reverse diffusion process. This improves both attack efficacy against multimodal LLMs and robustness to defenses.
Findings
AGD outperforms state-of-the-art attack methods.
AGD demonstrates increased robustness to defenses like low-pass filtering.
Extensive experiments validate the effectiveness of AGD in various scenarios.
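The finding on low-pass filtering has a simple frequency-domain intuition. A perturbation concentrated at high frequencies is almost entirely erased by a low-pass filter, while a perturbation whose energy is spread across the whole spectrum keeps part of it. The toy sketch below illustrates this under assumed settings; it is not the paper's experiment. An ideal FFT low-pass filter stands in for the defense, a ±0.03 checkerboard stands in for a high-frequency perturbation, and white Gaussian noise stands in for a full-spectrum one. The names `low_pass` and `retained_energy` are hypothetical helpers.

```python
import numpy as np

def low_pass(img: np.ndarray, keep_fraction: float = 0.5) -> np.ndarray:
    """Ideal low-pass filter (assumed defense): zero every 2-D FFT coefficient
    whose radial frequency exceeds keep_fraction of the Nyquist limit (0.5)."""
    h, w = img.shape
    fy = np.fft.fftfreq(h)[:, None]  # cycles per pixel, in [-0.5, 0.5)
    fx = np.fft.fftfreq(w)[None, :]
    mask = np.sqrt(fx**2 + fy**2) <= keep_fraction * 0.5
    return np.real(np.fft.ifft2(np.fft.fft2(img) * mask))

def retained_energy(perturbation: np.ndarray, keep_fraction: float = 0.5) -> float:
    """Fraction of the perturbation's energy that survives the filter."""
    filtered = low_pass(perturbation, keep_fraction)
    return float(np.sum(filtered**2) / np.sum(perturbation**2))

rng = np.random.default_rng(0)
n = 64
yy, xx = np.mgrid[0:n, 0:n]
# High-frequency perturbation: all of its energy sits at the Nyquist frequency.
checkerboard = 0.03 * np.where((xx + yy) % 2 == 0, 1.0, -1.0)
# Full-spectrum perturbation: white Gaussian noise with the same scale.
white = 0.03 * rng.standard_normal((n, n))

print(f"checkerboard energy kept: {retained_energy(checkerboard):.3f}")
print(f"white-noise energy kept:  {retained_energy(white):.3f}")
```

The checkerboard is removed completely. The white noise keeps roughly the share of frequency bins that lie inside the cutoff, about a fifth here. This gap is the intuition behind embedding the adversarial signal in full-spectrum diffusion noise.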
Abstract
This paper addresses the challenge of using a diffusion model to generate adversarial images that deceive multimodal large language models (MLLMs) into producing targeted responses without significantly distorting the clean image. To address this challenge, we propose an adversarial-guided diffusion (AGD) approach for adversarial attacks on MLLMs. We introduce adversarial-guided noise to ensure attack efficacy. A key observation in our design is that, unlike most traditional adversarial attacks, which embed high-frequency perturbations directly into the clean image, AGD injects target semantics into the noise component of the reverse diffusion. Since the added noise in a diffusion model spans the entire frequency spectrum, the adversarial signal embedded within it also inherits this full-spectrum property. Importantly, during reverse diffusion, the adversarial image is formed…
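The abstract says AGD places target semantics in the noise term of the reverse diffusion rather than adding a perturbation to the image itself. The toy sketch below illustrates that idea only. It is not the authors' algorithm, and every component is a hypothetical stand-in:

* a linear "encoder" `W`;
* a squared-distance loss toward a random "target embedding";
* `toy_denoiser_mean`, which pulls the sample back toward the clean image;
* a start from a noised clean image;
* the guidance rule itself, which takes one gradient step on the sampled noise `z` at each reverse step.

```python
import numpy as np

rng = np.random.default_rng(0)
d, k = 64, 8                                  # toy image and embedding dimensions
W = rng.standard_normal((k, d)) / np.sqrt(d)  # hypothetical linear "image encoder"
clean = rng.standard_normal(d)                # stand-in for the clean image
target = rng.standard_normal(k)               # stand-in for the target embedding

def target_loss(x: np.ndarray) -> float:
    """Squared distance between the encoded image and the target embedding."""
    r = W @ x - target
    return 0.5 * float(r @ r)

def target_grad(x: np.ndarray) -> np.ndarray:
    """Analytic gradient of target_loss with respect to x."""
    return W.T @ (W @ x - target)

def toy_denoiser_mean(x: np.ndarray, beta: float) -> np.ndarray:
    """Hypothetical stand-in for a diffusion model's posterior mean: it pulls
    x back toward the clean image, as reconstruction from noise would."""
    return (1.0 - beta) * x + beta * clean

def reverse_diffusion(x_T, betas, scale, seed):
    """Ancestral-style sampling in which guidance modifies only the noise z."""
    noise_rng = np.random.default_rng(seed)
    x = x_T
    for beta in betas:
        mean = toy_denoiser_mean(x, beta)
        sigma = np.sqrt(beta)
        z = noise_rng.standard_normal(d)
        # Assumed guidance rule: since x_prev = mean + sigma * z, the gradient
        # of the loss w.r.t. z is sigma * grad_x; step z against it.
        z = z - scale * sigma * target_grad(mean + sigma * z)
        x = mean + sigma * z
    return x

betas = np.linspace(0.1, 0.001, 50)
x_T = clean + 0.5 * rng.standard_normal(d)  # assumed start: a noised clean image
plain = reverse_diffusion(x_T, betas, scale=0.0, seed=1)
adv = reverse_diffusion(x_T, betas, scale=5.0, seed=1)
print("target loss, unguided:", round(target_loss(plain), 3))
print("target loss, guided:  ", round(target_loss(adv), 3))
print("distance to clean, guided:", round(float(np.linalg.norm(adv - clean)), 3))
```

Both runs use the same noise seed, so any difference comes from the guidance term. Because the guidance only reshapes `z`, the adversarial signal travels through the same channel as the diffusion noise. In this toy, the denoiser mean keeps pulling the sample toward the clean image.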
Taxonomy
Topics: Adversarial Robustness in Machine Learning
