Backdoor Defense with Machine Unlearning
Yang Liu, Mingyuan Fan, Cen Chen, Ximeng Liu, Zhuo Ma, Li Wang, Jianfeng Ma

TL;DR
This paper introduces BAERASE, a machine unlearning-based method for effectively erasing backdoors in neural networks by recovering trigger patterns and reversing the attack, outperforming existing defenses.
Contribution
BAERASE is a novel backdoor defense method that does not require full training data and achieves higher erasure effectiveness than prior techniques.
Findings
Reduces attack success rates by 99% on benchmarks
Outperforms fine-tuning and pruning methods
Effective against multiple backdoor attack types
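The findings above are stated in terms of attack success rate, which this summary does not define. A minimal sketch of the metric follows, assuming the common definition: the fraction of trigger-stamped inputs whose true class is not the attacker's target but which the model still assigns to the target. The function and parameter names are illustrative and do not come from the paper.

```python
def attack_success_rate(predictions, true_labels, target_label):
    """Share of triggered inputs from non-target classes that land on the target.

    Assumed definition (not taken from the paper): inputs whose true label
    already equals the target are skipped, because classifying them as the
    target is not evidence of a working backdoor.
    """
    hits = 0
    total = 0
    for pred, true in zip(predictions, true_labels):
        if true == target_label:
            continue  # cannot tell a backdoor hit from a correct prediction
        total += 1
        if pred == target_label:
            hits += 1
    return hits / total if total else 0.0


# Hypothetical predictions on four triggered inputs, with target label 0.
asr = attack_success_rate([0, 0, 1, 0], [1, 2, 1, 0], target_label=0)
```

Computing this value on the same triggered inputs before and after a defense is applied gives the kind of reduction the findings describe.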
Abstract
Backdoor injection attacks are an emerging threat to the security of neural networks, yet few effective defenses against them exist. In this paper, we propose BAERASE, a novel method that erases the backdoor injected into a victim model through machine unlearning. Specifically, BAERASE implements backdoor defense in two key steps. First, trigger pattern recovery extracts the trigger patterns that have infected the victim model. Here, the trigger pattern recovery problem is equivalent to extracting an unknown noise distribution from the victim model, which can be easily resolved with an entropy-maximization-based generative model. Subsequently, BAERASE leverages the recovered trigger patterns to reverse the backdoor injection procedure and induce the victim model to erase the polluted memories through a newly designed…
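The second step, reversing the backdoor injection with recovered triggers, can be illustrated on a toy model. The abstract is cut off before it names the unlearning mechanism, so the sketch below substitutes plain gradient ascent on the backdoor loss, interleaved with ordinary descent on a few clean samples. That substitution, the two-feature logistic model, the early-stopping rule, and every name and number here are assumptions made for illustration, not the paper's algorithm.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def logit(params, x):
    w, b = params
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def predict(params, x):
    return 1 if logit(params, x) > 0 else 0


def step(params, x, y, lr, ascent=False):
    """One cross-entropy gradient step on (x, y).

    ascent=True moves the parameters up the loss surface (forgetting the
    pair); ascent=False is ordinary descent (learning the pair).
    """
    w, b = params
    err = sigmoid(logit(params, x)) - y  # d(loss)/d(logit)
    sign = 1.0 if ascent else -1.0
    return [wi + sign * lr * err * xi for wi, xi in zip(w, x)], b + sign * lr * err


def unlearn_backdoor(params, triggered, target, clean,
                     lr=0.1, max_steps=1000, stop_prob=0.1):
    """Assumed stand-in for the unlearning step; returns new parameters."""
    def p_target(x):
        p = sigmoid(logit(params, x))
        return p if target == 1 else 1.0 - p

    for _ in range(max_steps):
        # Stop early so gradient ascent does not run away once the
        # triggered inputs are no longer pulled toward the target label.
        if all(p_target(x) < stop_prob for x in triggered):
            break
        for x in triggered:   # forget: (triggered input, target label)
            params = step(params, x, target, lr, ascent=True)
        for x, y in clean:    # retain: correct behavior on clean data
            params = step(params, x, y, lr)
    return params


# Toy victim: feature 0 carries the real signal, feature 1 is the trigger.
backdoored = ([2.0, 6.0], 0.0)
clean_data = [([-1.0, 0.0], 0), ([1.0, 0.0], 1)]
triggered_inputs = [[-1.0, 1.0]]  # class-0 input stamped with the trigger
cleaned = unlearn_backdoor(backdoored, triggered_inputs, 1, clean_data)
```

In this toy setup the triggered input is classified as the target before unlearning and as its true class afterwards, while both clean samples keep their labels. The early stop matters because unconstrained gradient ascent has no minimum to settle in and would otherwise keep degrading the weights.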
Taxonomy
Topics: Adversarial Robustness in Machine Learning · Generative Adversarial Networks and Image Synthesis · Anomaly Detection Techniques and Applications
Methods: Pruning
