Backdoor Defense with Machine Unlearning

Yang Liu; Mingyuan Fan; Cen Chen; Ximeng Liu; Zhuo Ma; Li Wang; Jianfeng Ma

arXiv:2201.09538 · cs.CR · January 25, 2022 · 1 citation

TL;DR

This paper introduces BAERASE, a machine unlearning-based method for effectively erasing backdoors in neural networks by recovering trigger patterns and reversing the attack, outperforming existing defenses.

Contribution

BAERASE is a novel backdoor defense method that does not require full training data and achieves higher erasure effectiveness than prior techniques.

Findings

01 Reduces attack success rates by 99% on benchmarks

02 Outperforms fine-tuning and pruning methods

03 Effective against multiple backdoor attack types

Abstract

Backdoor injection attacks are an emerging threat to the security of neural networks; however, effective defenses against them remain limited. In this paper, we propose BAERASE, a novel method that erases the backdoor injected into a victim model through machine unlearning. Specifically, BAERASE implements backdoor defense in two key steps. First, trigger pattern recovery extracts the trigger patterns that have infected the victim model. Here, the trigger pattern recovery problem is equivalent to extracting an unknown noise distribution from the victim model, which can be readily solved with an entropy-maximization-based generative model. Subsequently, BAERASE leverages these recovered trigger patterns to reverse the backdoor injection procedure and induce the victim model to erase the polluted memories through a newly designed…
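
To make the second step more concrete, here is a minimal toy sketch of unlearning a backdoor by gradient ascent. It is not the BAERASE algorithm: the abstract is cut off before the unlearning procedure is described, so everything below is an assumption for illustration. The model is a linear softmax classifier in NumPy, the trigger is assumed to be already recovered (the generative recovery step is skipped and the true trigger is used as a stand-in), and all names (`stamp`, `unlearn_backdoor`, `stop_below`, and so on) are hypothetical. The update descends on the clean loss while ascending on the loss that ties triggered inputs to the attacker's target label, and stops once the target label is rarely predicted.

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)  # stabilise the exponentials
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def ce_grad(W, X, y):
    """Gradient of the mean cross-entropy of a linear softmax model w.r.t. W."""
    p = softmax(X @ W)
    p[np.arange(len(y)), y] -= 1.0
    return X.T @ p / len(y)

def stamp(X, trigger):
    """Add the trigger pattern to every row of X (toy additive trigger)."""
    return X + trigger

def accuracy(W, X, y):
    return float(((X @ W).argmax(axis=1) == y).mean())

def attack_success_rate(W, X, y, trigger, target):
    """Fraction of non-target samples classified as `target` once stamped."""
    mask = y != target
    pred = (stamp(X[mask], trigger) @ W).argmax(axis=1)
    return float((pred == target).mean())

def unlearn_backdoor(W, X, y, trigger, target,
                     lr=0.2, alpha=1.0, stop_below=0.05, max_steps=1000):
    """Assumed unlearning rule: descend on clean loss, ascend on backdoor loss."""
    mask = y != target
    X_trig = stamp(X[mask], trigger)
    y_tgt = np.full(mask.sum(), target)
    for _ in range(max_steps):
        # Stop once triggered inputs are rarely assigned the target label,
        # so the ascent term cannot grow without bound.
        if softmax(X_trig @ W)[:, target].mean() < stop_below:
            break
        g_clean = ce_grad(W, X, y)             # keep clean behaviour
        g_backdoor = ce_grad(W, X_trig, y_tgt)  # the association to erase
        W = W - lr * (g_clean - alpha * g_backdoor)
    return W

# --- Toy data: two clean features, one trigger feature, one bias feature ---
rng = np.random.default_rng(0)
n, target = 200, 1
y = np.repeat([0, 1], n // 2)
X = np.zeros((n, 4))
X[:, :2] = (2 * y[:, None] - 1) + 0.3 * rng.standard_normal((n, 2))
X[:, 3] = 1.0
trigger = np.array([0.0, 0.0, 3.0, 0.0])

# --- Inject a backdoor: stamp some class-0 samples and relabel them ---
X_train = np.vstack([X, stamp(X[:50], trigger)])
y_train = np.concatenate([y, np.full(50, target)])
W = np.zeros((4, 2))
for _ in range(300):
    W = W - 0.5 * ce_grad(W, X_train, y_train)

asr_before = attack_success_rate(W, X, y, trigger, target)
W_clean = unlearn_backdoor(W, X, y, trigger, target)
asr_after = attack_success_rate(W_clean, X, y, trigger, target)
acc_after = accuracy(W_clean, X, y)
print(f"ASR before={asr_before:.2f} after={asr_after:.2f} clean acc={acc_after:.2f}")
```

The early-stopping threshold is a design choice of this sketch: plain gradient ascent on a cross-entropy loss is unbounded, so some form of bound or penalty is needed to keep clean accuracy intact. How BAERASE handles this is not stated in the truncated abstract.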

Taxonomy

Topics: Adversarial Robustness in Machine Learning · Generative Adversarial Networks and Image Synthesis · Anomaly Detection Techniques and Applications

Methods: Pruning