Adversarial Prompt Tuning for Vision-Language Models

Jiaming Zhang; Xingjun Ma; Xin Wang; Lingyu Qiu; Jiaqi Wang; Yu-Gang Jiang; Jitao Sang

arXiv:2311.11261 · cs.CV · August 20, 2024 · 1 citation

TL;DR

This paper proposes Adversarial Prompt Tuning (AdvPT), a novel method that enhances the robustness of vision-language models against adversarial image attacks by using learnable text prompts, without extensive model retraining.

Contribution

AdvPT introduces a new paradigm of adversarial prompt tuning that improves VLMs' resistance to attacks through textual input modifications, without altering model architecture.

Findings

01 AdvPT increases robustness against both white-box and black-box adversarial attacks.

02 AdvPT strengthens defense further when combined with existing image-processing techniques.

03 Experimental results demonstrate significant improvements in adversarial resistance.

Abstract

With the rapid advancement of multimodal learning, pre-trained Vision-Language Models (VLMs) such as CLIP have demonstrated remarkable capacities in bridging the gap between visual and language modalities. However, these models remain vulnerable to adversarial attacks, particularly in the image modality, presenting considerable security risks. This paper introduces Adversarial Prompt Tuning (AdvPT), a novel technique to enhance the adversarial robustness of image encoders in VLMs. AdvPT innovatively leverages learnable text prompts and aligns them with adversarial image embeddings, to address the vulnerabilities inherent in VLMs without the need for extensive parameter training or modification of the model architecture. We demonstrate that AdvPT improves resistance against white-box and black-box adversarial attacks and exhibits a synergistic effect when combined with existing…
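The core idea in the abstract, learning only a text prompt so that it aligns with adversarial image embeddings while both encoders stay frozen, can be illustrated with a small dependency-free sketch. This is not the authors' implementation (see the official repository for that). Everything here is an assumption made for illustration: the tanh "text encoder", the hand-picked vectors standing in for adversarial image embeddings, the cross-entropy objective, and the names `text_features`, `loss_and_grad`, and `tune_prompt`.

```python
import math

def text_features(ctx, class_embs):
    """Toy frozen text encoder (assumption): tanh(context + class embedding)."""
    return [[math.tanh(c + e) for c, e in zip(ctx, emb)] for emb in class_embs]

def softmax(logits):
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def loss_and_grad(ctx, class_embs, adv_img_embs, labels, scale=2.0):
    """Mean cross-entropy over image-text logits and its gradient w.r.t. ctx only."""
    feats = text_features(ctx, class_embs)
    grad = [0.0] * len(ctx)
    total = 0.0
    for img, y in zip(adv_img_embs, labels):
        # Similarity between one (fixed) image embedding and every class prompt.
        logits = [scale * sum(i * t for i, t in zip(img, f)) for f in feats]
        probs = softmax(logits)
        total -= math.log(probs[y])
        for c, f in enumerate(feats):
            err = probs[c] - (1.0 if c == y else 0.0)
            for j in range(len(ctx)):
                # d tanh(x)/dx = 1 - tanh(x)^2
                grad[j] += err * scale * img[j] * (1.0 - f[j] ** 2)
    n = len(labels)
    return total / n, [g / n for g in grad]

def tune_prompt(class_embs, adv_img_embs, labels, steps=100, lr=0.05):
    """Gradient descent on the shared context vector; nothing else is updated."""
    ctx = [0.0] * len(class_embs[0])
    history = []
    for _ in range(steps):
        loss, grad = loss_and_grad(ctx, class_embs, adv_img_embs, labels)
        history.append(loss)
        ctx = [c - lr * g for c, g in zip(ctx, grad)]
    return ctx, history

# Hypothetical placeholder data: in a real setup these would come from
# attacking the frozen image encoder and caching its output embeddings.
CLASS_EMBS = [[1.0, 0.0, 0.2], [0.0, 1.0, -0.6]]
ADV_IMG_EMBS = [[0.9, 0.6, 0.1], [0.5, 0.7, -0.2], [0.8, 0.5, 0.3], [0.4, 0.9, 0.0]]
LABELS = [0, 1, 0, 1]

ctx, history = tune_prompt(CLASS_EMBS, ADV_IMG_EMBS, LABELS)
print(f"loss: {history[0]:.4f} -> {history[-1]:.4f}")
```

The point of the sketch is the division of labor: the image embeddings and class embeddings are never modified, and the only trainable quantity is the context vector `ctx` shared across classes, which mirrors the abstract's claim that no extensive parameter training or architectural change is needed.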

Code & Models

Repositories

jiamingzhang94/adversarial-prompt-tuning (PyTorch, official)

Taxonomy

Topics: Adversarial Robustness in Machine Learning

Methods: Contrastive Language-Image Pre-training