CLIP-Guided Backdoor Defense through Entropy-Based Poisoned Dataset Separation
Binyan Xu, Fan Yang, Xilin Dai, Di Tang, Kehuan Zhang

TL;DR
This paper introduces CGD, a CLIP-guided backdoor defense that uses a publicly available CLIP model to separate likely clean from likely poisoned training inputs and then retrains the victim DNN under CLIP's guidance. Across a range of backdoor attacks, it reduces attack success rates to near zero with minimal impact on clean accuracy.
Contribution
The paper presents a novel CLIP-guided approach for backdoor defense that is both efficient and effective against diverse attacks, including clean-label and clean-image backdoors.
Findings
CGD reduces attack success rates to below 1%.
Maintains clean accuracy, with a drop of at most 0.3%.
Outperforms existing backdoor defenses in experiments spanning 4 datasets and 11 attack types.
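The two metrics in these findings, attack success rate (ASR) and clean accuracy (CA), can be computed roughly as in the minimal sketch below. It assumes the common all-to-one convention, in which ASR is measured only on triggered test inputs whose true label differs from the attacker's target class; the page does not state the paper's exact protocol, and the function names are hypothetical.

```python
import numpy as np

def clean_accuracy(preds, labels):
    """CA: fraction of clean test inputs the model classifies correctly."""
    return float(np.mean(np.asarray(preds) == np.asarray(labels)))

def attack_success_rate(triggered_preds, labels, target_class):
    """ASR under an assumed all-to-one convention: among test inputs whose true
    label is not the target class, the fraction the model assigns to the
    target class once the trigger is applied."""
    triggered_preds = np.asarray(triggered_preds)
    labels = np.asarray(labels)
    keep = labels != target_class  # inputs already in the target class are excluded
    if not keep.any():
        return 0.0
    return float(np.mean(triggered_preds[keep] == target_class))

labels = [0, 1, 2, 0]
print(clean_accuracy([0, 1, 2, 1], labels))                       # → 0.75
print(attack_success_rate([0, 0, 2, 0], labels, target_class=0))  # → 0.5
```

Both functions return fractions; multiply by 100 to compare against percentages such as "below 1%".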
Abstract
Deep Neural Networks (DNNs) are susceptible to backdoor attacks, in which adversaries poison the training data to implant a backdoor into the victim model. Current defenses that operate on poisoned data often suffer from high computational cost or low effectiveness against advanced attacks such as clean-label and clean-image backdoors. To address these limitations, we introduce CLIP-Guided backdoor Defense (CGD), an efficient and effective method that mitigates various backdoor attacks. CGD uses a publicly accessible CLIP model to identify inputs that are likely to be clean or poisoned. It then retrains the model on these inputs, using CLIP's logits as guidance to effectively neutralize the backdoor. Experiments on 4 datasets and 11 attack types demonstrate that CGD reduces attack success rates (ASRs) to below 1% while maintaining clean accuracy (CA) with a maximum drop of only 0.3%, outperforming existing…
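The abstract describes the two stages of CGD only at a high level, and the title says the separation is entropy-based. The sketch below is a hypothetical illustration of that pipeline, not the authors' implementation. It assumes CLIP zero-shot logits (one column per class prompt) have already been computed for every training image, scores each sample by the cross-entropy of its dataset label under CLIP's distribution as a stand-in for the paper's entropy criterion, treats the lowest-scoring fraction as likely clean and the highest-scoring fraction as likely poisoned, and models "CLIP's logits as guidance" as a standard knowledge-distillation loss. The names `split_dataset` and `guided_loss` and the parameters `clean_frac`, `poison_frac`, `alpha`, and `temperature` are all assumptions.

```python
import numpy as np

# Hypothetical sketch, not the authors' code. Inputs are assumed to be
# precomputed CLIP zero-shot logits (rows = training images, columns =
# class prompts) and the dataset's possibly poisoned integer labels.

def softmax(logits, temperature=1.0):
    """Row-wise softmax with an optional temperature."""
    z = np.asarray(logits, dtype=float) / temperature
    z = z - z.max(axis=1, keepdims=True)  # subtract row max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def label_entropy_score(clip_logits, labels):
    """Cross-entropy of each dataset label under CLIP's zero-shot distribution.
    Assumed criterion: a high score means CLIP disagrees with the label."""
    probs = softmax(clip_logits)
    return -np.log(probs[np.arange(len(labels)), labels] + 1e-12)

def split_dataset(clip_logits, labels, clean_frac=0.5, poison_frac=0.1):
    """Return indices of likely clean (lowest scores) and likely poisoned
    (highest scores) samples. Both fractions are hypothetical knobs."""
    scores = label_entropy_score(clip_logits, labels)
    order = np.argsort(scores, kind="stable")  # most CLIP-consistent first
    n = len(order)
    likely_clean = order[: int(clean_frac * n)]
    likely_poisoned = order[n - int(poison_frac * n):]
    return likely_clean, likely_poisoned

def guided_loss(student_logits, clip_logits, labels, alpha=0.5, temperature=2.0):
    """Assumed form of CLIP guidance: a standard distillation loss mixing
    cross-entropy on the labels with KL(CLIP || student) at temperature T."""
    labels = np.asarray(labels)
    p_student = softmax(student_logits)
    ce = -np.log(p_student[np.arange(len(labels)), labels] + 1e-12).mean()
    p_clip = softmax(clip_logits, temperature)
    p_student_t = softmax(student_logits, temperature)
    kl = (p_clip * (np.log(p_clip + 1e-12) - np.log(p_student_t + 1e-12))).sum(axis=1).mean()
    return (1 - alpha) * ce + alpha * temperature**2 * kl

# Toy example: sample 3 is labeled 1, but CLIP confidently predicts class 0.
clip_logits = np.array([[5.0, 0, 0], [0, 5.0, 0], [0, 0, 5.0], [5.0, 0, 0]])
labels = np.array([0, 1, 2, 1])
likely_clean, likely_poisoned = split_dataset(clip_logits, labels, clean_frac=0.5, poison_frac=0.25)
print(likely_poisoned)  # → [3]
```

In this sketch, samples whose labels CLIP strongly disagrees with are flagged, which targets label-flipping poisons. Clean-label attacks keep correct labels, so the paper's actual entropy criterion presumably differs; the scoring function is the piece to swap out.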
Taxonomy
Topics: Adversarial Robustness in Machine Learning · Security and Verification in Computing · Network Security and Intrusion Detection
Methods: Contrastive Language-Image Pre-training
