DeDe: Detecting Backdoor Samples for SSL Encoders via Decoders

Sizai Hou; Songze Li; Duanyi Yao

arXiv:2411.16154·cs.LG·March 21, 2025

TL;DR

DeDe detects backdoor samples for SSL encoders by training a decoder on the encoder's embeddings: when a triggered input activates the backdoor mapping, the decoded output differs markedly from the input. The method targets stealthy backdoors in both contrastive learning and CLIP models.

Contribution

The paper introduces DeDe, a decoder-based detection mechanism for SSL encoders that flags triggered inputs by measuring the discrepancy between each input and its decoded output.
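
The discrepancy check described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes the decoded outputs are already available as arrays with the same shape as the inputs, uses per-sample mean squared error as the discrepancy measure, and picks the threshold as a quantile of errors on known-clean samples. The function names (`reconstruction_error`, `calibrate_threshold`, `flag_backdoor_samples`), the MSE metric, and the quantile rule are all assumptions made for illustration.

```python
import numpy as np


def reconstruction_error(inputs, decoded):
    """Per-sample mean squared error between inputs and decoder outputs.

    Assumption: both arrays have shape (n_samples, ...) and match exactly.
    """
    inputs = np.asarray(inputs, dtype=float)
    decoded = np.asarray(decoded, dtype=float)
    # Flatten everything except the sample axis, then average squared diffs.
    diff = (inputs - decoded).reshape(len(inputs), -1)
    return (diff ** 2).mean(axis=1)


def calibrate_threshold(clean_errors, quantile=0.95):
    """Hypothetical rule: threshold = a high quantile of clean-sample errors."""
    return float(np.quantile(clean_errors, quantile))


def flag_backdoor_samples(inputs, decoded, threshold):
    """Flag samples whose decoded output deviates from the input by more
    than the threshold (True = suspected triggered input)."""
    return reconstruction_error(inputs, decoded) > threshold
```

A typical use under these assumptions would be to compute errors on a small clean auxiliary set, call `calibrate_threshold` on them, and then apply `flag_backdoor_samples` to incoming inputs at test time.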

Findings

01 DeDe achieves high detection accuracy against various backdoor attacks.

02 It outperforms existing detection methods in empirical evaluations.

03 DeDe works effectively on both contrastive learning and CLIP models.

Abstract

Self-supervised learning (SSL) is pervasively exploited in training high-quality upstream encoders with a large amount of unlabeled data. However, it is found to be susceptible to backdoor attacks merely via polluting a small portion of training data. The victim encoders associate triggered inputs with target embeddings, e.g., mapping a triggered cat image to an airplane embedding, such that the downstream tasks inherit unintended behaviors when the trigger is activated. Emerging backdoor attacks have shown great threats across different SSL paradigms such as contrastive learning and CLIP, yet limited research is devoted to defending against such attacks, and existing defenses fall short in detecting advanced stealthy backdoors. To address the limitations, we propose a novel detection mechanism, DeDe, which detects the activation of backdoor mappings caused by triggered inputs on victim…

Code & Models

Repositories

jsrdcht/SSL-Backdoor (PyTorch)

Taxonomy

Topics: Advanced Data Storage Technologies · Algorithms and Data Compression · Digital Rights Management and Security

Methods: Contrastive Learning · Contrastive Language-Image Pre-training