Adversarial Prompt Tuning for Vision-Language Models
Jiaming Zhang, Xingjun Ma, Xin Wang, Lingyu Qiu, Jiaqi Wang, Yu-Gang Jiang, Jitao Sang

TL;DR
This paper proposes Adversarial Prompt Tuning (AdvPT), a novel method that enhances the robustness of vision-language models against adversarial image attacks by using learnable text prompts, without extensive model retraining.
Contribution
AdvPT introduces a new paradigm of adversarial prompt tuning that improves VLMs' resistance to attacks through textual input modifications, without altering model architecture.
Findings
AdvPT increases robustness against both white-box and black-box adversarial attacks.
AdvPT enhances defense when combined with existing image-processing techniques.
Experimental results demonstrate significant improvements in adversarial resistance.
Abstract
With the rapid advancement of multimodal learning, pre-trained Vision-Language Models (VLMs) such as CLIP have demonstrated remarkable capacities in bridging the gap between visual and language modalities. However, these models remain vulnerable to adversarial attacks, particularly in the image modality, presenting considerable security risks. This paper introduces Adversarial Prompt Tuning (AdvPT), a novel technique to enhance the adversarial robustness of image encoders in VLMs. AdvPT innovatively leverages learnable text prompts and aligns them with adversarial image embeddings, to address the vulnerabilities inherent in VLMs without the need for extensive parameter training or modification of the model architecture. We demonstrate that AdvPT improves resistance against white-box and black-box adversarial attacks and exhibits a synergistic effect when combined with existing…
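As a rough illustration of the idea described in the abstract (only text-side parameters are trained, against fixed adversarial image embeddings, while the image encoder stays untouched), the toy sketch below may help. It is not the authors' implementation. The embeddings are made-up 2-D vectors, the learnable per-class text features stand in for the output of a frozen text encoder applied to a learnable prompt, and the names `tune_text_features`, `mean_loss`, `predict`, `scale`, and `lr` are hypothetical.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def mean_loss(text_feats, adv_embeds, labels, scale):
    """Mean cross-entropy of softmax(scale * <image, text_k>) over the samples."""
    total = 0.0
    for x, y in zip(adv_embeds, labels):
        probs = softmax([scale * dot(x, t) for t in text_feats])
        total -= math.log(probs[y])
    return total / len(labels)


def tune_text_features(adv_embeds, labels, num_classes, scale=5.0, lr=0.05, steps=200):
    """Gradient descent on the text-side features only.

    Assumption: adv_embeds are precomputed embeddings of adversarial images
    from a frozen image encoder; they are never updated here.
    """
    dim = len(adv_embeds[0])
    n = len(labels)
    text_feats = [[0.0] * dim for _ in range(num_classes)]  # learnable, zero init
    for _ in range(steps):
        grads = [[0.0] * dim for _ in range(num_classes)]
        for x, y in zip(adv_embeds, labels):
            probs = softmax([scale * dot(x, t) for t in text_feats])
            for k in range(num_classes):
                # d(loss)/d(logit_k) = p_k - 1[k == y]; chain rule through scale * <x, t_k>
                coeff = scale * (probs[k] - (1.0 if k == y else 0.0)) / n
                for j in range(dim):
                    grads[k][j] += coeff * x[j]
        for k in range(num_classes):
            for j in range(dim):
                text_feats[k][j] -= lr * grads[k][j]
    return text_feats


def predict(text_feats, x):
    """Index of the class whose text feature is most similar to x."""
    scores = [dot(x, t) for t in text_feats]
    return scores.index(max(scores))


# Hypothetical adversarial image embeddings (toy values) and their true labels.
adv_embeds = [[0.9, 0.1], [0.8, 0.3], [0.2, 0.9], [0.1, 0.8]]
labels = [0, 0, 1, 1]

loss_before = mean_loss([[0.0, 0.0], [0.0, 0.0]], adv_embeds, labels, scale=5.0)
text_feats = tune_text_features(adv_embeds, labels, num_classes=2)
loss_after = mean_loss(text_feats, adv_embeds, labels, scale=5.0)
```

The design point the sketch tries to convey is that the image side is treated as fixed data, so the only trainable quantities live on the text side. In a real VLM the learnable quantity would be prompt context vectors passed through the frozen text encoder rather than the text features themselves.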
Taxonomy
Topics: Adversarial Robustness in Machine Learning
Methods: Contrastive Language-Image Pre-training
