Adversarial Neuron Pruning Purifies Backdoored Deep Models
Dongxian Wu, Yisen Wang

TL;DR
This paper introduces Adversarial Neuron Pruning (ANP), which removes backdoors from deep neural networks by pruning the neurons that are most sensitive to adversarial perturbation. It works even when only a small amount of clean data is available, which makes it suited to models returned from outsourced training.
Contribution
The paper presents a novel neuron pruning technique that exploits the sensitivity of backdoored DNNs to adversarial neuron perturbations in order to purify the model and remove its backdoor.
Findings
ANP effectively removes backdoors with minimal clean data
Backdoored DNNs are more sensitive to adversarial neuron perturbations
ANP maintains model performance while purifying backdoors
Abstract
As deep neural networks (DNNs) grow larger, their requirements for computational resources become huge, which makes outsourcing training more popular. Training on a third-party platform, however, introduces the risk that a malicious trainer will return backdoored DNNs, which behave normally on clean samples but output targeted misclassifications whenever a trigger appears at test time. Without any knowledge of the trigger, it is difficult to distinguish or recover benign DNNs from backdoored ones. In this paper, we first identify an unexpected sensitivity of backdoored DNNs, that is, they collapse much more easily and tend to predict the target label on clean samples when their neurons are adversarially perturbed. Based on these observations, we propose a novel model repairing method, termed Adversarial Neuron Pruning (ANP), which prunes some sensitive neurons to…
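The sensitivity described in the abstract can be illustrated with a small sketch. The excerpt does not state how neurons are perturbed or how the neurons to prune are selected, so everything below is an assumption made for illustration: a toy one-hidden-layer ReLU network in pure Python, a multiplicative perturbation (1 + delta_j) on each hidden neuron's output with |delta_j| <= eps, a single sign-of-gradient step as the adversary, and a binary mask that zeroes pruned neurons. The names `forward`, `mean_loss`, `delta_gradient`, and `adversarial_delta` are hypothetical helpers, not the authors' code.

```python
import math

def forward(params, x, delta=None, mask=None):
    """One-hidden-layer ReLU net. Hidden neuron j is scaled by
    mask[j] * (1 + delta[j]) (an assumed form of neuron perturbation)."""
    W1, b1, W2, b2 = params
    n_hidden = len(W1)
    if delta is None:
        delta = [0.0] * n_hidden
    if mask is None:
        mask = [1.0] * n_hidden
    hidden = []
    for j in range(n_hidden):
        z = sum(w * xi for w, xi in zip(W1[j], x)) + b1[j]
        hidden.append(mask[j] * (1.0 + delta[j]) * max(z, 0.0))
    logits = [sum(W2[k][j] * hidden[j] for j in range(n_hidden)) + b2[k]
              for k in range(len(W2))]
    return hidden, logits

def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def mean_loss(params, X, y, delta=None, mask=None):
    """Mean cross-entropy over the (small) clean set."""
    total = 0.0
    for x, label in zip(X, y):
        _, logits = forward(params, x, delta, mask)
        total -= math.log(softmax(logits)[label])
    return total / len(X)

def delta_gradient(params, X, y, mask=None):
    """Gradient of the mean cross-entropy w.r.t. delta, evaluated at delta = 0.
    At delta = 0, d logit_k / d delta_j = W2[k][j] * hidden[j]."""
    W2 = params[2]
    n_hidden = len(params[0])
    grad = [0.0] * n_hidden
    for x, label in zip(X, y):
        hidden, logits = forward(params, x, None, mask)
        probs = softmax(logits)
        for j in range(n_hidden):
            for k in range(len(W2)):
                err = probs[k] - (1.0 if k == label else 0.0)
                grad[j] += err * W2[k][j] * hidden[j] / len(X)
    return grad

def adversarial_delta(grad, eps):
    """One sign-gradient step: push each neuron by +/- eps to raise the loss."""
    return [eps * ((g > 0) - (g < 0)) for g in grad]

# Toy values (hypothetical): 2 inputs, 3 hidden neurons, 2 classes.
params = (
    [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]],  # W1
    [0.0, 0.0, -0.9],                      # b1
    [[1.0, 0.0, 0.0], [0.0, 1.0, 4.0]],    # W2
    [0.0, 0.0],                            # b2
)
X = [[2.0, 0.0], [0.0, 2.0]]
y = [0, 1]

delta = adversarial_delta(delta_gradient(params, X, y), eps=0.4)
print("clean loss:    ", mean_loss(params, X, y))
print("perturbed loss:", mean_loss(params, X, y, delta=delta))
print("loss with neuron 2 pruned:", mean_loss(params, X, y, mask=[1.0, 1.0, 0.0]))
```

The gap between the clean and perturbed loss is one way to quantify how easily a model collapses under neuron perturbation. The mask argument only shows how pruning would be applied once a set of neurons has been chosen; the selection rule itself is not given in the excerpt and is deliberately left out of this sketch.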
Taxonomy
Topics: Adversarial Robustness in Machine Learning · Anomaly Detection Techniques and Applications · Advanced Neural Network Applications
Methods: Test · Pruning
