Unlearning Backdoor Threats: Enhancing Backdoor Defense in Multimodal Contrastive Learning via Local Token Unlearning
Siyuan Liang, Kuanrong Liu, Jiajun Gong, Jiawei Liang, Yuan Xun, Ee-Chien Chang, and Xiaochun Cao

TL;DR
This paper proposes a novel, low-cost defense method for multimodal contrastive learning models against backdoor attacks by identifying suspicious samples and applying localized token unlearning to remove malicious behaviors while maintaining high accuracy.
Contribution
It introduces a token-based localized unlearning approach that effectively mitigates backdoor threats with minimal impact on clean model performance.
Findings
Reduces backdoor attack success rate significantly.
Preserves high clean accuracy after unlearning.
Requires only a small set of poisoned samples for effective defense.
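The two quantities behind these findings, attack success rate (ASR) and clean accuracy, can be sketched as follows. This is a minimal illustration and not the authors' evaluation code: the function names are hypothetical, and excluding samples whose true label already equals the target label when computing ASR is a common convention assumed here, not something this page states.

```python
from typing import Sequence


def clean_accuracy(preds: Sequence[int], labels: Sequence[int]) -> float:
    """Fraction of clean (trigger-free) inputs predicted correctly."""
    if len(preds) != len(labels):
        raise ValueError("preds and labels must have the same length")
    if len(labels) == 0:
        raise ValueError("need at least one sample")
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)


def attack_success_rate(
    triggered_preds: Sequence[int],
    true_labels: Sequence[int],
    target_label: int,
) -> float:
    """Fraction of triggered inputs that the model assigns to the attacker's target.

    Assumption: samples whose true label is already the target are skipped,
    since predicting the target for them is not evidence of a backdoor.
    """
    if len(triggered_preds) != len(true_labels):
        raise ValueError("triggered_preds and true_labels must have the same length")
    pairs = [(p, y) for p, y in zip(triggered_preds, true_labels) if y != target_label]
    if len(pairs) == 0:
        raise ValueError("no samples with a non-target true label")
    return sum(p == target_label for p, _ in pairs) / len(pairs)
```

A successful defense in the sense described above would lower `attack_success_rate` on triggered inputs while leaving `clean_accuracy` on clean inputs close to its pre-defense value.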
Abstract
Multimodal contrastive learning has emerged as a powerful paradigm for building high-quality features using the complementary strengths of various data modalities. However, the open nature of such systems inadvertently increases the possibility of backdoor attacks. These attacks subtly embed malicious behaviors within the model during training, which can be activated by specific triggers in the inference phase, posing significant security risks. Although existing fine-tuning countermeasures reduce the adverse impacts of such attacks, they often degrade clean accuracy and require the construction of extensive clean training pairs. In this paper, we explore the possibility of a lower-cost defense from the perspective of model unlearning, that is, whether the model can be made to quickly unlearn backdoor threats (UBT) by constructing…
Taxonomy
Topics: Network Security and Intrusion Detection · Speech Recognition and Synthesis · Fire Detection and Safety Systems
Methods: Sparse Evolutionary Training · Contrastive Learning
