TAPT: Test-Time Adversarial Prompt Tuning for Robust Inference in Vision-Language Models

Xin Wang; Kai Chen; Jiaming Zhang; Jingjing Chen; Xingjun Ma

arXiv:2411.13136·cs.CV·November 21, 2024

TL;DR

This paper introduces TAPT, a test-time adversarial prompt tuning method that significantly improves the robustness of CLIP against visual adversarial attacks without sacrificing much accuracy on clean data.

Contribution

TAPT is a novel unsupervised, test-time defense approach that learns bimodal prompts to enhance CLIP's robustness against adversarial perturbations.

Findings

01 Increases CLIP's adversarial robustness by at least 48.9% against AutoAttack.

02 Maintains high performance on clean examples.

03 Outperforms existing adversarial prompt tuning methods.

Abstract

Large pre-trained Vision-Language Models (VLMs) such as CLIP have demonstrated excellent zero-shot generalizability across various downstream tasks. However, recent studies have shown that the inference performance of CLIP can be greatly degraded by small adversarial perturbations, especially those applied to its visual modality, which poses significant safety threats. To mitigate this vulnerability, we propose a novel defense method called Test-Time Adversarial Prompt Tuning (TAPT) to enhance the inference robustness of CLIP against visual adversarial attacks. TAPT is a test-time defense method that learns defensive bimodal (textual and visual) prompts to robustify the inference process of CLIP. Specifically, it is an unsupervised method that optimizes the defensive prompts for each test sample by minimizing a multi-view entropy and aligning adversarial-clean distributions. We evaluate the…
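The abstract does not spell out the multi-view entropy objective, so the following is only a minimal sketch of one common formulation in test-time prompt tuning: the entropy of the class distribution averaged over several augmented views of a single test image. The function names and the example logits are hypothetical, and the sketch only computes the quantity; it does not tune prompts or implement the adversarial-clean alignment term.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of class logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def multi_view_entropy(view_logits):
    """Entropy of the class distribution averaged over augmented views.

    view_logits: one list of class logits per augmented view.
    Assumption: this averaged (marginal) form is a common choice in
    test-time prompt tuning, not necessarily the exact TAPT loss.
    """
    probs = [softmax(logits) for logits in view_logits]
    n_views = len(probs)
    n_classes = len(probs[0])
    # Average the per-view distributions class by class.
    avg = [sum(p[c] for p in probs) / n_views for c in range(n_classes)]
    # Shannon entropy of the averaged distribution (natural log).
    return -sum(p * math.log(p) for p in avg if p > 0.0)


# Hypothetical logits for three views of one image over three classes.
agreeing_views = [[4.0, 0.0, 0.0], [3.5, 0.2, 0.1], [4.2, 0.1, 0.0]]
conflicting_views = [[4.0, 0.0, 0.0], [0.0, 4.0, 0.0], [0.0, 0.0, 4.0]]

low = multi_view_entropy(agreeing_views)
high = multi_view_entropy(conflicting_views)
```

Views that agree on one class yield a low value, while views that disagree push it toward the maximum of log(number of classes). A test-time method in this family would adjust the prompts to reduce this quantity for each test sample.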

Taxonomy

Topics: Adversarial Robustness in Machine Learning · Explainable Artificial Intelligence (XAI)

Methods: Contrastive Language-Image Pre-training