Semantic-Aligned Adversarial Evolution Triangle for High-Transferability Vision-Language Attack
Xiaojun Jia, Sensen Gao, Qing Guo, Ke Ma, Yihao Huang, Simeng Qin, Yang Liu, Ivor Tsang, and Xiaochun Cao

TL;DR
This paper introduces an adversarial attack method for vision-language models that improves the transferability of multimodal adversarial examples by using an adversarial evolution triangle and a semantic-aligned feature space.
Contribution
It proposes the adversarial evolution triangle and semantic-aligned feature space to improve the transferability of multimodal adversarial examples in vision-language models.
Findings
Enhanced transferability of adversarial examples across models.
Outperforms state-of-the-art attack methods in experiments.
Reduces feature redundancy to improve attack effectiveness.
Abstract
Vision-language pre-training (VLP) models excel at interpreting both images and text but remain vulnerable to multimodal adversarial examples (AEs). Advancing the generation of transferable AEs, which succeed across unseen models, is key to developing more robust and practical VLP models. Previous approaches augment image-text pairs to enhance diversity within the adversarial example generation process, aiming to improve transferability by expanding the contrast space of image-text features. However, these methods focus solely on diversity around the current AEs, yielding limited gains in transferability. To address this issue, we propose to increase the diversity of AEs by leveraging the intersection regions along the adversarial trajectory during optimization. Specifically, we propose sampling from adversarial evolution triangles composed of clean, historical, and current adversarial…
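The triangle sampling mentioned in the abstract can be illustrated with a small sketch. The abstract is truncated here, so the details below are assumptions rather than the authors' procedure: the function name `sample_from_triangle`, the uniform Dirichlet weights, and the toy image tensors are all hypothetical. The sketch only shows one plausible way to draw a point inside a triangle whose vertices are a clean image, a historical adversarial example, and the current adversarial example.

```python
import numpy as np


def sample_from_triangle(clean, historical, current, rng):
    """Draw one point inside the triangle spanned by three images.

    Assumption: weights are drawn uniformly over the simplex with
    Dirichlet(1, 1, 1); the paper may use a different sampling scheme.
    """
    # Barycentric weights: non-negative and summing to 1.
    weights = rng.dirichlet(np.ones(3))
    # A convex combination of the vertices always lies inside the triangle.
    sample = (weights[0] * clean
              + weights[1] * historical
              + weights[2] * current)
    return sample, weights


# Toy data standing in for images with pixel values in [0, 1].
rng = np.random.default_rng(0)
epsilon = 8 / 255  # hypothetical perturbation budget
clean = rng.random((3, 4, 4))
historical = np.clip(clean + rng.uniform(-epsilon, epsilon, clean.shape), 0.0, 1.0)
current = np.clip(clean + rng.uniform(-epsilon, epsilon, clean.shape), 0.0, 1.0)

sample, weights = sample_from_triangle(clean, historical, current, rng)
```

Because the sample is a convex combination, each of its pixels lies between the smallest and largest value of that pixel across the three vertices, so it stays inside any perturbation budget that all three vertices already respect.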
Taxonomy
Topics: Adversarial Robustness in Machine Learning · Digital Media Forensic Detection
