BeDKD: Backdoor Defense Based on Directional Mapping Module and Adversarial Knowledge Distillation
Zhengxian Wu, Juan Wen, Wanli Peng, Yinghan Zhou, Changtong Dou, Yiming Xue

TL;DR
BeDKD introduces a novel backdoor defense method that effectively reduces attack success rates by using a directional mapping module and adversarial knowledge distillation, even with limited clean data.
Contribution
The paper proposes BeDKD, a new backdoor defense approach combining directional mapping and adversarial knowledge distillation to improve detection and mitigation with minimal data.
Findings
Reduces attack success rate by 98% on multiple datasets
Outperforms existing state-of-the-art defenses
Maintains high clean accuracy
Abstract
Although existing backdoor defenses have gained success in mitigating backdoor attacks, they still face substantial challenges. In particular, most of them rely on large amounts of clean data to weaken the backdoor mapping but generally struggle with residual trigger effects, resulting in persistently high attack success rates (ASR). Therefore, in this paper, we propose a novel Backdoor defense method based on Directional mapping module and adversarial Knowledge Distillation (BeDKD), which balances the trade-off between defense effectiveness and model performance using a small amount of clean and poisoned data. We first introduce a directional mapping module to identify poisoned data, which destroys clean mapping while keeping backdoor mapping on a small set of flipped clean data. Then, the adversarial knowledge distillation is designed to…
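The two quantities the summary leans on, attack success rate (ASR) and clean accuracy, are usually computed as shown in the sketch below. This is a minimal illustration of the common definitions, not the authors' evaluation code. The function names are hypothetical, and excluding samples whose true label already equals the target label is an assumed convention that the abstract does not state.

```python
from typing import Sequence


def clean_accuracy(preds: Sequence[int], labels: Sequence[int]) -> float:
    """Fraction of clean (trigger-free) inputs that are classified correctly."""
    if len(preds) != len(labels) or len(labels) == 0:
        raise ValueError("preds and labels must be non-empty and the same length")
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)


def attack_success_rate(
    triggered_preds: Sequence[int],
    true_labels: Sequence[int],
    target_label: int,
) -> float:
    """Fraction of triggered inputs that the model assigns to the attacker's target label.

    Assumption: samples whose true label is already the target are skipped,
    since predicting the target for them is not evidence of a backdoor.
    """
    if len(triggered_preds) != len(true_labels):
        raise ValueError("triggered_preds and true_labels must be the same length")
    pairs = [(p, y) for p, y in zip(triggered_preds, true_labels) if y != target_label]
    if not pairs:
        raise ValueError("no samples with a non-target true label")
    return sum(p == target_label for p, _ in pairs) / len(pairs)


# Toy usage with made-up predictions (not results from the paper).
cacc = clean_accuracy([0, 1, 1, 0], [0, 1, 0, 0])             # 3 of 4 correct
asr = attack_success_rate([1, 1, 0, 1], [0, 0, 0, 1], 1)      # 2 of 3 non-target samples hijacked
print(f"clean accuracy = {cacc:.2f}, ASR = {asr:.2f}")
```

Under these definitions a successful defense drives ASR toward zero while leaving clean accuracy close to that of the undefended model, which is the trade-off the abstract describes.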
Taxonomy
Topics: Adversarial Robustness in Machine Learning · Security and Verification in Computing · Advanced Malware Detection Techniques
