Can Distillation Mitigate Backdoor Attacks in Pre-trained Encoders?
Tingxu Han, Wei Song, Weisong Sun, Ziqi Ding, Yebo Feng, Chunrong Fang, Jun Li, Hanwei Qian, Zhenyu Chen, Yang Liu

TL;DR
This paper explores using knowledge distillation as a defense mechanism to reduce backdoor attack success rates in self-supervised learning pre-trained encoders, achieving significant mitigation with minimal accuracy loss.
Contribution
The paper repurposes knowledge distillation to remove backdoors from poisoned pre-trained encoders in SSL and demonstrates its effectiveness through extensive experiments.
Findings
Distillation reduces the attack success rate from 80.87% to 27.51%.
This mitigation comes at the cost of a 6.35% drop in model accuracy.
The best performance is achieved with fine-tuned teacher models and attention-based distillation losses.
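To put the first finding in perspective, the short calculation below works out the absolute and relative reduction in attack success rate from the two figures quoted above. The variable names are illustrative only, and the accuracy figure is left out because the summary does not say how that drop is measured.

```python
# Attack success rates (ASR) quoted in the findings, in percent.
asr_before = 80.87
asr_after = 27.51

# Absolute reduction, in percentage points.
absolute_drop = asr_before - asr_after

# Relative reduction, as a fraction of the original ASR.
relative_drop = absolute_drop / asr_before

print(f"absolute drop: {absolute_drop:.2f} points")  # absolute drop: 53.36 points
print(f"relative drop: {relative_drop:.1%}")         # relative drop: 66.0%
```

In other words, roughly two thirds of the originally successful attacks no longer succeed after distillation, although a 27.51% success rate still leaves a residual backdoor effect.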
Abstract
Self-Supervised Learning (SSL) has become a prominent paradigm for pre-training encoders to learn general-purpose representations from unlabeled data and releasing them on third-party platforms for broad downstream deep learning tasks. However, SSL is vulnerable to backdoor attacks, where an adversary may train and distribute poisoned pre-trained encoders to contaminate the downstream models. In this paper, we study a defense mechanism based on distillation against poisoned encoders in SSL. Traditionally, distillation transfers knowledge from a pre-trained teacher model to a student model, enabling the student to replicate or refine the teacher's learned representations. We repurpose distillation to extract benign knowledge and remove backdoors from a poisoned pre-trained encoder to produce a clean and reliable pre-trained model. We conduct extensive experiments to evaluate the…
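As a rough illustration of what an attention-based distillation loss can look like, the sketch below follows the widely used attention-transfer idea: each feature map is collapsed into a normalised spatial attention map, and the student is penalised for deviating from the teacher's map. This page does not specify the paper's actual loss, so this is an assumed, generic formulation, not the authors' implementation. The function names `attention_map` and `attention_distillation_loss` and the toy feature maps are hypothetical, and plain Python lists stand in for the tensors a deep learning framework would use.

```python
import math

def attention_map(features):
    """Collapse a C x H x W feature map (nested lists) into a flat,
    L2-normalised spatial attention map of length H * W."""
    channels = len(features)
    height = len(features[0])
    width = len(features[0][0])
    # Sum of squared activations over channels at each spatial position.
    flat = [
        sum(features[c][i][j] ** 2 for c in range(channels))
        for i in range(height)
        for j in range(width)
    ]
    norm = math.sqrt(sum(v * v for v in flat))
    return [v / norm for v in flat] if norm > 0 else flat

def attention_distillation_loss(student_features, teacher_features):
    """Mean squared difference between student and teacher attention maps.
    Channel counts may differ; spatial sizes must match."""
    s = attention_map(student_features)
    t = attention_map(teacher_features)
    if len(s) != len(t):
        raise ValueError("student and teacher maps must share spatial size")
    return sum((a - b) ** 2 for a, b in zip(s, t)) / len(s)

# Toy 2 x 2 feature maps (hypothetical values).
teacher = [[[1.0, 0.0], [0.0, 2.0]], [[0.5, 0.0], [0.0, 1.0]]]  # 2 channels
student_same = [[[2.0, 0.0], [0.0, 4.0]]]  # 1 channel, same spatial pattern
student_diff = [[[0.0, 3.0], [3.0, 0.0]]]  # 1 channel, opposite pattern

loss_same = attention_distillation_loss(student_same, teacher)
loss_diff = attention_distillation_loss(student_diff, teacher)
print(loss_same, loss_diff)
```

Because the maps are normalised and summed over channels, the loss depends only on where each network focuses spatially, not on activation scale or channel count. That is why the student with the same spatial pattern gets a loss near zero while the one attending to the opposite positions is penalised.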
Taxonomy
Topics: Sensor Technology and Measurement Systems · Fault Detection and Control Systems · Scientific Measurement and Uncertainty Evaluation
