Towards Highly Transferable Vision-Language Attack via Semantic-Augmented Dynamic Contrastive Interaction
Yuanbo Li, Tianyang Xu, Cong Hu, Tao Zhou, Xiao-Jun Wu, Josef Kittler

TL;DR
This paper introduces SADCA, a novel attack method that enhances the transferability of adversarial examples in vision-language models by using semantic augmentation and dynamic contrastive interactions, leading to more effective cross-model attacks.
Contribution
The paper proposes SADCA, a semantic-augmented dynamic contrastive attack that improves transferability of adversarial examples in vision-language pre-training models through progressive, semantically guided perturbations.
Findings
SADCA outperforms existing attack methods in transferability across multiple datasets and models.
Semantic augmentation increases diversity and generalization of adversarial examples.
Dynamic contrastive interactions reinforce semantic inconsistency, enhancing attack effectiveness.
Abstract
With the rapid advancement and widespread application of vision-language pre-training (VLP) models, their vulnerability to adversarial attacks has become a critical concern. Adversarial examples can typically be designed to be transferable, attacking not only different models but also diverse tasks. However, existing attacks on vision-language models mainly rely on static cross-modal interactions and focus solely on disrupting positive image-text pairs, resulting in limited cross-modal disruption and poor transferability. To address this issue, we propose a Semantic-Augmented Dynamic Contrastive Attack (SADCA) that enhances adversarial transferability through progressive and semantically guided perturbation. SADCA progressively disrupts cross-modal alignment through dynamic interactions between adversarial images and texts. This is accomplished by SADCA…
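The abstract contrasts attacks that only disrupt positive image-text pairs with a contrastive formulation. The toy sketch below is not SADCA and is not taken from the paper; it only illustrates, under strong assumptions, what an L-infinity-bounded image perturbation that pushes an image embedding away from its paired text and toward a negative text could look like. The linear "encoder" `W`, the function names, and the values of `eps`, `alpha`, and `steps` are all hypothetical.

```python
import numpy as np

def cosine(u, v):
    """Cosine similarity between two 1-D vectors."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

def cosine_grad(u, v):
    """Gradient of cosine(u, v) with respect to u."""
    nu, nv = np.linalg.norm(u), np.linalg.norm(v)
    return v / (nu * nv) - (u @ v) * u / (nu ** 3 * nv)

def contrastive_pgd(x, W, t_pos, t_neg, eps=0.1, alpha=0.02, steps=10):
    """Sign-gradient ascent on cos(Wx, t_neg) - cos(Wx, t_pos).

    x     : clean input (stand-in for an image), shape (d,)
    W     : hypothetical linear image encoder, shape (k, d)
    t_pos : embedding of the paired (positive) text, shape (k,)
    t_neg : embedding of an unrelated (negative) text, shape (k,)
    The result stays inside the L-infinity ball of radius eps around x.
    """
    x_adv = x.copy()
    for _ in range(steps):
        u = W @ x_adv
        # Ascent direction in embedding space: closer to the negative text,
        # further from the positive text.
        g_u = cosine_grad(u, t_neg) - cosine_grad(u, t_pos)
        # Chain rule through the linear encoder.
        g_x = W.T @ g_u
        x_adv = x_adv + alpha * np.sign(g_x)
        # Project back onto the eps-ball around the clean input.
        x_adv = np.clip(x_adv, x - eps, x + eps)
    return x_adv

if __name__ == "__main__":
    x = np.array([1.0, 0.0])
    W = np.eye(2)
    t_pos = np.array([1.0, 0.0])
    t_neg = np.array([0.0, 1.0])
    x_adv = contrastive_pgd(x, W, t_pos, t_neg)
    print("positive similarity:", cosine(W @ x, t_pos), "->", cosine(W @ x_adv, t_pos))
    print("negative similarity:", cosine(W @ x, t_neg), "->", cosine(W @ x_adv, t_neg))
```

A real VLP attack would replace `W` with a surrogate model's image encoder and obtain gradients by automatic differentiation; the progressive image-text interaction and semantic augmentation that the abstract attributes to SADCA are not modeled here.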
Taxonomy
Topics: Adversarial Robustness in Machine Learning · Multimodal Machine Learning Applications · Generative Adversarial Networks and Image Synthesis
