BadCLIP: Dual-Embedding Guided Backdoor Attack on Multimodal Contrastive Learning
Siyuan Liang, Mingli Zhu, Aishan Liu, Baoyuan Wu, Xiaochun Cao, Ee-Chien Chang

TL;DR
This paper introduces BadCLIP, a novel backdoor attack on multimodal contrastive learning models like CLIP, which remains effective even against state-of-the-art defenses by using a dual-embedding guided approach.
Contribution
The paper proposes a dual-embedding guided backdoor attack framework that is resistant to detection and fine-tuning defenses in multimodal contrastive learning models.
Findings
Outperforms state-of-the-art baselines by +45.3% attack success rate (ASR) under defenses.
Remains effective against backdoor detection and model fine-tuning defenses.
Successfully attacks downstream tasks in rigorous scenarios.
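ASR, the metric behind the +45.3% figure, is the standard backdoor measure. It is the fraction of trigger-stamped inputs that the model assigns to the attacker's target label. The sketch below assumes the common convention of excluding samples whose true class is already the target. This summary does not state the paper's exact evaluation protocol. The function name `attack_success_rate` is hypothetical.

```python
from typing import Sequence


def attack_success_rate(
    predictions: Sequence[int],
    true_labels: Sequence[int],
    target_label: int,
) -> float:
    """Fraction of triggered inputs predicted as the target label.

    Assumes the common convention of skipping samples whose true class
    already equals the target, since those would count as trivial hits.
    """
    pairs = [(p, y) for p, y in zip(predictions, true_labels) if y != target_label]
    if not pairs:
        raise ValueError("no non-target samples to evaluate")
    hits = sum(1 for p, _ in pairs if p == target_label)
    return hits / len(pairs)
```

For example, with predictions `[3, 3, 1, 3]`, true labels `[0, 1, 1, 3]` and target `3`, the last sample is excluded. Two of the remaining three are hits, so the ASR is 2/3.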
Abstract
Studying backdoor attacks is valuable for model copyright protection and for enhancing defenses. While existing backdoor attacks have successfully infected multimodal contrastive learning (MCL) models such as CLIP, they can be easily countered by specialized backdoor defenses for MCL models. This paper reveals a threat in this practical scenario: backdoor attacks can remain effective even after defenses are applied. It introduces the BadCLIP attack, which is resistant to backdoor detection and model fine-tuning defenses. To achieve this, we draw motivation from Bayes' rule and propose a dual-embedding guided framework for backdoor attacks. Specifically, we ensure that visual trigger patterns approximate the textual target semantics in the embedding space, making it challenging to detect the subtle parameter variations induced by backdoor learning on such natural…
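The abstract's core idea is a visual trigger whose embedding approximates the textual target semantics. This can be illustrated with a toy optimization in which a masked patch is adjusted by projected gradient ascent to raise the mean cosine similarity between patched-image embeddings and a target text embedding.

This is only a minimal sketch under loud assumptions, not BadCLIP's actual objective or code:
- A random linear map `W` stands in for CLIP's image encoder.
- A random vector stands in for the target text embedding.
- The function names and hyperparameters (`steps`, `lr`, `seed`) are hypothetical.

```python
import numpy as np


def cosine_and_grad(e: np.ndarray, t: np.ndarray):
    """Cosine similarity of each row of e with t, and d(cos)/d(e) per row."""
    e_norm = np.linalg.norm(e, axis=1, keepdims=True)   # (n, 1)
    t_unit = t / np.linalg.norm(t)                        # (d,)
    cos = (e @ t_unit)[:, None] / e_norm                  # (n, 1)
    # d/de [e.t_u / ||e||] = t_u / ||e|| - cos * e / ||e||^2
    grad = t_unit[None, :] / e_norm - cos * e / e_norm**2
    return cos.ravel(), grad


def optimize_trigger(images, mask, W, target_text_emb, steps=200, lr=0.5, seed=0):
    """Toy stand-in: tune patch pixels so patched-image embeddings align with text.

    images: (n, p) flattened images in [0, 1]; mask: (p,) with 1 on patch pixels;
    W: (d, p) hypothetical linear "image encoder"; target_text_emb: (d,).
    """
    rng = np.random.default_rng(seed)
    delta = rng.uniform(0.0, 1.0, size=images.shape[1])  # initial trigger pixels
    for _ in range(steps):
        patched = (1 - mask) * images + mask * delta      # stamp trigger into patch
        emb = patched @ W.T                               # (n, d) image embeddings
        _, g_emb = cosine_and_grad(emb, target_text_emb)
        # Chain rule through the linear encoder; only patch pixels get gradient.
        g_delta = (g_emb @ W).mean(axis=0) * mask
        delta = np.clip(delta + lr * g_delta, 0.0, 1.0)   # ascent step, keep valid pixels
    patched = (1 - mask) * images + mask * delta
    cos, _ = cosine_and_grad(patched @ W.T, target_text_emb)
    return delta, float(cos.mean())


rng = np.random.default_rng(1)
n, p, d = 16, 64, 8
images = rng.uniform(0.0, 1.0, size=(n, p))
mask = np.zeros(p)
mask[:16] = 1.0                                            # trigger occupies 16 pixels
W = rng.normal(size=(d, p))
target = rng.normal(size=d)

_, start_cos = optimize_trigger(images, mask, W, target, steps=0)
trigger, end_cos = optimize_trigger(images, mask, W, target, steps=200)
```

Running with `steps=0` gives the alignment of the random initial trigger, so comparing it with `end_cos` shows the effect of the optimization. A real attack would backpropagate through CLIP's encoders instead of a fixed linear map. According to the abstract, BadCLIP uses a dual-embedding guided objective rather than this single cosine term.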
Taxonomy
Topics: Adversarial Robustness in Machine Learning · Hate Speech and Cyberbullying Detection · Domain Adaptation and Few-Shot Learning
Methods: Contrastive Language-Image Pre-training · ALIGN · Contrastive Learning
