Can Distillation Mitigate Backdoor Attacks in Pre-trained Encoders?

Tingxu Han; Wei Song; Weisong Sun; Ziqi Ding; Yebo Feng; Chunrong Fang; Jun Li; Hanwei Qian; Zhenyu Chen; Yang Liu

arXiv:2403.03846 · cs.LG · February 2, 2026 · 1 citation

TL;DR

This paper explores knowledge distillation as a defense against backdoor attacks on self-supervised learning (SSL) pre-trained encoders, reducing the attack success rate (ASR) from 80.87% to 27.51% at the cost of a 6.35% drop in accuracy.
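To make the headline numbers concrete, the sketch below computes ASR under its usual definition: the fraction of trigger-stamped inputs that the downstream model assigns to the attacker's target label. This definition is an assumption here; the paper's exact protocol (for example, whether inputs already belonging to the target class are excluded) may differ. The helper names `attack_success_rate` and `relative_reduction` are hypothetical.

```python
def attack_success_rate(predictions, target_label):
    """Hypothetical helper: fraction of trigger-stamped inputs that the
    downstream model classifies as the attacker's target label."""
    if not predictions:
        raise ValueError("need at least one prediction")
    hits = sum(1 for p in predictions if p == target_label)
    return hits / len(predictions)


def relative_reduction(before, after):
    """Hypothetical helper: relative drop of a metric, e.g. ASR before
    versus after distillation."""
    return (before - after) / before


# Toy example: 10 triggered test inputs, attacker's target class is 3.
preds = [3, 3, 1, 3, 0, 3, 3, 2, 3, 3]
print(attack_success_rate(preds, target_label=3))  # 0.7

# Reported ASR values (in percent): 80.87 before, 27.51 after distillation.
print(round(relative_reduction(80.87, 27.51), 3))  # 0.66
```

In other words, the reported drop from 80.87% to 27.51% removes roughly two thirds of the attack's success, while about one in four triggered inputs still reaches the target label.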

Contribution

It introduces a novel application of distillation to remove backdoors from poisoned encoders in SSL, demonstrating its effectiveness through extensive experiments.

Findings

01

Distillation reduces attack success rate from 80.87% to 27.51%.

02

Only a 6.35% drop in model accuracy.

03

Best performance achieved with fine-tuned teachers and attention-based losses.
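The finding above credits attention-based losses. As an illustration only, the sketch below follows the widely used attention-transfer form of Zagoruyko & Komodakis: each feature map is collapsed into a spatial attention map by summing |activation|^p over channels, both maps are L2-normalized, and the loss is the L2 distance between them. Whether the paper uses exactly this formulation is an assumption, and the function names are hypothetical; feature maps are represented as plain lists of channels, each a flat list of spatial values.

```python
import math


def attention_map(feature_map, p=2):
    """Spatial attention: sum over channels of |activation| ** p.
    feature_map is a list of channels, each a flat list of positions."""
    n_positions = len(feature_map[0])
    return [sum(abs(ch[i]) ** p for ch in feature_map)
            for i in range(n_positions)]


def l2_normalize(vec):
    """Scale a vector to unit L2 norm (left unchanged if all zeros)."""
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm > 0 else list(vec)


def attention_transfer_loss(student_fm, teacher_fm, p=2):
    """L2 distance between normalized student and teacher attention maps."""
    q_s = l2_normalize(attention_map(student_fm, p))
    q_t = l2_normalize(attention_map(teacher_fm, p))
    return math.sqrt(sum((s - t) ** 2 for s, t in zip(q_s, q_t)))


teacher = [[1.0, 0.0, 2.0, 0.0], [0.0, 1.0, 0.0, 2.0]]
scaled = [[3 * v for v in ch] for ch in teacher]  # same pattern, 3x larger
other = [[0.0, 0.0, 0.0, 1.0], [0.0, 0.0, 0.0, 1.0]]

print(attention_transfer_loss(scaled, teacher))  # ~0: same spatial focus
print(attention_transfer_loss(other, teacher))   # clearly positive
```

Because of the normalization, the loss compares where the student and teacher focus rather than the raw magnitudes of their activations, which is why the scaled copy of the teacher scores essentially zero.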

Abstract

Self-Supervised Learning (SSL) has become a prominent paradigm for pre-training encoders that learn general-purpose representations from unlabeled data; these encoders are then released on third-party platforms for a broad range of downstream deep learning tasks. However, SSL is vulnerable to backdoor attacks, in which an adversary trains and distributes poisoned pre-trained encoders to contaminate the downstream models. In this paper, we study a defense mechanism based on distillation against poisoned encoders in SSL. Traditionally, distillation transfers knowledge from a pre-trained teacher model to a student model, enabling the student to replicate or refine the teacher's learned representations. We repurpose distillation to extract benign knowledge and remove backdoors from a poisoned pre-trained encoder, producing a clean and reliable pre-trained model. We conduct extensive experiments to evaluate the…
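One intuition for why distillation can shed a backdoor: the student is trained only to match the teacher's features on clean data, so behavior that clean inputs never activate receives no training signal. The toy below is a minimal sketch under strong simplifying assumptions and is not the authors' method: the "poisoned teacher" is a hypothetical linear map plus a hidden shift triggered by a third input coordinate, and the student is a linear map fitted by gradient descent on mean squared error. Real encoders are deep networks, and the reported ASR of 27.51% shows that removal is partial in practice.

```python
import random


def matvec(w, x):
    """Multiply a weight matrix (list of rows) by a vector."""
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in w]


def poisoned_teacher(x):
    """Hypothetical poisoned encoder: linear in x[0], x[1], plus a hidden
    +10 shift whenever the trigger coordinate x[2] is switched on."""
    base = matvec([[1.0, -2.0], [0.5, 3.0]], x[:2])
    if x[2] > 0.5:  # backdoor trigger present
        return [b + 10.0 for b in base]
    return base


def distill(teacher, clean_inputs, in_dim, out_dim, lr=0.1, epochs=500):
    """Fit a linear student to the frozen teacher's features on clean data
    by full-batch gradient descent on the mean squared error."""
    student_w = [[0.0] * in_dim for _ in range(out_dim)]
    n = len(clean_inputs)
    targets = [teacher(x) for x in clean_inputs]  # teacher is never updated
    for _ in range(epochs):
        grad = [[0.0] * in_dim for _ in range(out_dim)]
        for x, t in zip(clean_inputs, targets):
            pred = matvec(student_w, x)
            for i in range(out_dim):
                err = pred[i] - t[i]
                for j in range(in_dim):
                    grad[i][j] += 2.0 * err * x[j] / n
        for i in range(out_dim):
            for j in range(in_dim):
                student_w[i][j] -= lr * grad[i][j]
    return student_w


rng = random.Random(0)
# The defender's clean data never contains the trigger (x[2] == 0).
clean = [[rng.uniform(-1, 1), rng.uniform(-1, 1), 0.0] for _ in range(50)]
student = distill(poisoned_teacher, clean, in_dim=3, out_dim=2)

triggered = [0.3, -0.4, 1.0]
print(poisoned_teacher(triggered))  # clean features shifted by +10
print(matvec(student, triggered))   # close to the clean features
```

The student's weights on the trigger coordinate stay at their initial zero because every clean input has `x[2] == 0`, so the gradient for those weights is always zero; the student reproduces the teacher's benign mapping without inheriting the shift.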

Code & Models

Repositories

wssun/sslbackdoormitigation
PyTorch · Official
