BadCLIP: Dual-Embedding Guided Backdoor Attack on Multimodal Contrastive Learning

Siyuan Liang; Mingli Zhu; Aishan Liu; Baoyuan Wu; Xiaochun Cao; Ee-Chien Chang

arXiv:2311.12075 · cs.CV · March 5, 2024 · 2 cites

TL;DR

This paper introduces BadCLIP, a novel backdoor attack on multimodal contrastive learning models like CLIP, which remains effective even against state-of-the-art defenses by using a dual-embedding guided approach.

Contribution

The paper proposes a dual-embedding guided backdoor attack framework that is resistant to detection and fine-tuning defenses in multimodal contrastive learning models.

Findings

01. Outperforms state-of-the-art baselines by +45.3% attack success rate (ASR) when defenses are applied.

02. Remains effective against backdoor detection and model fine-tuning defenses.

03. Successfully attacks downstream tasks in rigorous scenarios.
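
The ASR figure above refers to attack success rate. As a rough illustration of how such a number is commonly computed, the sketch below counts the fraction of triggered inputs that the model assigns to the attacker's target label. Excluding samples whose true label is already the target is one common convention and is an assumption here; the paper's exact evaluation protocol is not given on this page, and the function name `attack_success_rate` is hypothetical.

```python
from typing import Sequence


def attack_success_rate(
    preds_on_triggered: Sequence[int],
    true_labels: Sequence[int],
    target_label: int,
) -> float:
    """Fraction of triggered inputs classified as the target label.

    Assumption: samples whose true label already equals the target are
    excluded, since predicting the target for them is not evidence of a
    working backdoor.
    """
    if len(preds_on_triggered) != len(true_labels):
        raise ValueError("predictions and labels must have the same length")

    # Keep only predictions for samples that do not belong to the target class.
    eligible = [
        pred
        for pred, label in zip(preds_on_triggered, true_labels)
        if label != target_label
    ]
    if not eligible:
        raise ValueError("no non-target samples to evaluate")

    hits = sum(1 for pred in eligible if pred == target_label)
    return hits / len(eligible)


# Hypothetical toy data: four triggered images, target class 2.
example_asr = attack_success_rate([2, 2, 0, 2], [0, 1, 1, 2], target_label=2)
```

In the toy call, the fourth sample is dropped because its true class is the target, and two of the remaining three predictions hit the target.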

Abstract

Studying backdoor attacks is valuable for model copyright protection and for strengthening defenses. While existing backdoor attacks have successfully infected multimodal contrastive learning (MCL) models such as CLIP, they can be easily countered by specialized backdoor defenses for MCL models. This paper reveals the threat that, in this practical scenario, backdoor attacks can remain effective even after defenses are applied, and it introduces the BadCLIP attack, which is resistant to backdoor detection and model fine-tuning defenses. To achieve this, we draw motivation from Bayes' rule and propose a dual-embedding guided framework for backdoor attacks. Specifically, we ensure that visual trigger patterns approximate the textual target semantics in the embedding space, making it challenging to detect the subtle parameter variations induced by backdoor learning on such natural…
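
The idea of making a visual trigger approximate the textual target semantics in embedding space can be illustrated with a toy example. The sketch below is not the authors' implementation: a random linear map `W` stands in for an image encoder, `text_emb` stands in for the target text embedding, and every name (`optimize_trigger`, `mask`, `delta`) is hypothetical. It only shows the general pattern of optimizing the pixels inside a patch so that the encoded image moves toward a fixed text embedding under cosine similarity.

```python
import numpy as np


def cosine(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two 1-D vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


def optimize_trigger(W, image, mask, text_emb, steps=300, lr=0.5):
    """Projected gradient ascent on cos(W @ x, text_emb) over patch pixels.

    W        : toy linear "image encoder" (assumption, stands in for CLIP)
    image    : flattened clean image with values in [0, 1]
    mask     : 1 where the trigger patch may change pixels, 0 elsewhere
    text_emb : fixed target text embedding
    """
    delta = image.copy()  # start the patch from the clean pixels
    t_norm = np.linalg.norm(text_emb)
    for _ in range(steps):
        x = image * (1.0 - mask) + delta * mask
        z = W @ x
        z_norm = np.linalg.norm(z)
        cos = z @ text_emb / (z_norm * t_norm)
        # Gradient of the cosine similarity with respect to z.
        grad_z = text_emb / (z_norm * t_norm) - cos * z / z_norm**2
        # Chain rule through the linear encoder; only patch pixels move.
        grad_delta = mask * (W.T @ grad_z)
        # Ascent step, then project back into the valid pixel range.
        delta = np.clip(delta + lr * grad_delta, 0.0, 1.0)
    return image * (1.0 - mask) + delta * mask


# Toy setup with made-up dimensions: 64 "pixels", 16-dimensional embeddings.
rng = np.random.default_rng(0)
W = rng.normal(size=(16, 64))
image = rng.uniform(0.0, 1.0, size=64)
text_emb = rng.normal(size=16)
mask = np.zeros(64)
mask[:16] = 1.0  # the first 16 pixels form the trigger patch

before = cosine(W @ image, text_emb)
triggered = optimize_trigger(W, image, mask, text_emb)
after = cosine(W @ triggered, text_emb)
```

With a real model, the linear map would be replaced by the actual image encoder and the gradient would come from automatic differentiation; the abstract is truncated on this page, so any further components of the dual-embedding objective are not represented here.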

Code & Models

Repositories

LiangSiyuan21/BadCLIP
pytorch

Taxonomy

Topics: Adversarial Robustness in Machine Learning · Hate Speech and Cyberbullying Detection · Domain Adaptation and Few-Shot Learning

Methods: Contrastive Language-Image Pre-training · ALIGN · Contrastive Learning