Universal Adversarial Perturbations for Vision-Language Pre-trained Models
Peng-Fei Zhang, Zi Huang, Guangdong Bai

TL;DR
This paper introduces a novel black-box method called ETU for generating universal adversarial perturbations that effectively and transferably attack vision-language pre-trained models across various tasks and datasets.
Contribution
The paper proposes ETU, a new method for creating universal adversarial perturbations that are highly transferable and effective against multiple VLP models and tasks.
Findings
ETU achieves high transferability of adversarial attacks across models.
The proposed data augmentation method ScMix enhances attack effectiveness.
Experiments demonstrate the method's success on various datasets and tasks.
Abstract
Vision-language pre-trained (VLP) models have been the foundation of numerous vision-language tasks. Given their prevalence, it becomes imperative to assess their adversarial robustness, especially when deploying them in security-critical real-world applications. Traditionally, adversarial perturbations generated for this assessment target specific VLP models, datasets, and/or downstream tasks. This practice suffers from low transferability and additional computation costs when transitioning to new scenarios. In this work, we thoroughly investigate whether VLP models are commonly sensitive to imperceptible perturbations of a specific pattern for the image modality. To this end, we propose a novel black-box method to generate Universal Adversarial Perturbations (UAPs), called the Effective and Transferable Universal Adversarial Attack (ETU), aiming to mislead a variety of…
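To make the idea of a universal perturbation concrete, here is a minimal sketch of how one fixed perturbation can be added to every image while staying small. It is not the ETU method, whose details are not given in this excerpt. The L-infinity bound, the budget of 8/255, the tiny 2x2 grayscale images, and the helper names project_linf and apply_uap are all assumptions chosen for illustration.

```python
from typing import List

# Hypothetical representation: a tiny grayscale image with pixels in [0, 1].
Image = List[List[float]]


def project_linf(delta: Image, epsilon: float) -> Image:
    """Clamp every entry of the perturbation to [-epsilon, epsilon]."""
    return [[max(-epsilon, min(epsilon, d)) for d in row] for row in delta]


def apply_uap(image: Image, delta: Image) -> Image:
    """Add the shared perturbation, then clip back to the valid pixel range."""
    return [
        [max(0.0, min(1.0, p + d)) for p, d in zip(img_row, d_row)]
        for img_row, d_row in zip(image, delta)
    ]


# Assumed perturbation budget; the excerpt does not state the one used.
epsilon = 8 / 255

# One perturbation, projected once, reused for every input image.
delta = project_linf([[0.5, -0.5], [0.01, 0.0]], epsilon)

images = [
    [[0.0, 1.0], [0.5, 0.5]],
    [[0.99, 0.2], [0.3, 0.7]],
]
adversarial = [apply_uap(img, delta) for img in images]
```

The point of the sketch is only that delta is computed once and shared across inputs, in contrast to per-image attacks that optimize a separate perturbation for each sample.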
Taxonomy
Topics: Adversarial Robustness in Machine Learning · Topic Modeling · Multimodal Machine Learning Applications
