Neural Antidote: Class-Wise Prompt Tuning for Purifying Backdoors in CLIP
Jiawei Kong, Hao Fang, Sihang Guo, Chenxi Qing, Kuofeng Gao, Bin Chen, Shu-Tao Xia, Ke Xu

TL;DR
This paper introduces Class-wise Backdoor Prompt Tuning (CBPT), a method that tunes text prompts to defend CLIP models against backdoor attacks, purifying poisoned features while maintaining clean accuracy.
Contribution
The paper proposes a new prompt-based defense mechanism, CBPT, that effectively mitigates backdoor threats in CLIP models without degrading clean accuracy.
Findings
CBPT reduces the attack success rate to 0.39% on average.
CBPT maintains an average clean accuracy of 58.83%.
Extensive experiments validate CBPT's effectiveness against multiple backdoor attacks.
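The two metrics reported above can be illustrated with a minimal sketch. It assumes the common definitions: clean accuracy is the fraction of clean inputs classified correctly, and attack success rate is the fraction of triggered inputs (excluding those whose true class is already the target) that are classified as the attacker's target class. The function names and toy data are hypothetical, and the paper's exact evaluation protocol may differ, for example by counting all triggered inputs.

```python
def clean_accuracy(preds, labels):
    """Fraction of clean inputs whose predicted class matches the true class."""
    correct = sum(p == y for p, y in zip(preds, labels))
    return correct / len(labels)


def attack_success_rate(triggered_preds, true_labels, target_class):
    """Fraction of triggered inputs predicted as the target class.

    Assumption: inputs whose true class already equals the target are
    skipped, since predicting the target for them is not an attack success.
    """
    pairs = [(p, y) for p, y in zip(triggered_preds, true_labels)
             if y != target_class]
    hits = sum(p == target_class for p, _ in pairs)
    return hits / len(pairs)


# Toy data (hypothetical): five samples, attacker's target class is 3.
labels = [0, 1, 2, 0, 1]
clean_preds = [0, 1, 2, 2, 1]       # predictions on clean inputs
triggered_preds = [3, 1, 3, 0, 3]   # predictions on the same inputs with a trigger

print(clean_accuracy(clean_preds, labels))
print(attack_success_rate(triggered_preds, labels, target_class=3))
```

A successful defense aims to push the second number toward zero while keeping the first as high as possible.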
Abstract
While pre-trained Vision-Language Models (VLMs) such as CLIP exhibit impressive representational capabilities for multimodal data, recent studies have revealed their vulnerability to backdoor attacks. To alleviate the threat, existing defense strategies primarily focus on fine-tuning the entire suspicious model. However, the large number of model parameters makes it difficult to reach a stable and consistent optimization direction, which limits the resistance of these defenses against state-of-the-art attacks and often degrades clean accuracy. To address this challenge, we propose Class-wise Backdoor Prompt Tuning (CBPT), an efficient and effective defense mechanism that operates on text prompts to indirectly purify poisoned CLIP. Specifically, we first employ contrastive learning with carefully crafted positive and negative samples to effectively invert the backdoor…
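The abstract is cut off before it specifies the contrastive objective, so the following is only a generic sketch of contrastive learning with positive and negative samples, not the paper's loss. It uses the standard InfoNCE form over cosine similarities. The function names, the plain-list embeddings, and the temperature value are all illustrative assumptions.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def info_nce(anchor, positive, negatives, temperature=0.1):
    """Generic InfoNCE loss (an assumption, not the paper's exact objective).

    The loss is small when the anchor is close to the positive sample
    and far from every negative sample.
    """
    pos = math.exp(cosine(anchor, positive) / temperature)
    neg = sum(math.exp(cosine(anchor, n) / temperature) for n in negatives)
    return -math.log(pos / (pos + neg))


# Toy 2-D embeddings (hypothetical).
anchor = [1.0, 0.0]
aligned = info_nce(anchor, positive=[1.0, 0.0], negatives=[[0.0, 1.0]])
misaligned = info_nce(anchor, positive=[0.0, 1.0], negatives=[[1.0, 0.0]])
print(aligned < misaligned)
```

Minimizing such a loss pulls the anchor embedding toward its positive sample and pushes it away from the negatives; how the positives and negatives are crafted is the part the truncated abstract does not show.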
