Towards Stable Backdoor Purification through Feature Shift Tuning
Rui Min, Zeyu Qin, Li Shen, Minhao Cheng

TL;DR
This paper proposes Feature Shift Tuning (FST), a simple and effective method for backdoor defense in neural networks that disentangles backdoor and clean features, ensuring stable purification across diverse attack scenarios with minimal tuning cost.
Contribution
The paper introduces FST, a novel feature disentanglement approach for backdoor purification that outperforms existing methods in stability and efficiency, especially at low poisoning rates.
Findings
FST achieves consistent backdoor removal across various attack settings.
FST requires only 10 epochs of tuning, which keeps the tuning cost low.
Disentangling features improves the effectiveness of backdoor defenses.
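The summary above does not say how the feature shift is induced. As a rough illustration only, the sketch below assumes one plausible form: the usual classification loss on clean tuning data plus a penalty on the inner product between the tuned classifier weights and the original (possibly backdoored) ones, with the weight norm held fixed. The function names, the `alpha` coefficient, and the norm rescaling step are assumptions made for this example and are not taken from the text.

```python
import numpy as np

def softmax_cross_entropy(logits, labels):
    """Mean cross-entropy over a batch, computed in a numerically stable way."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(labels)), labels].mean()

def shift_penalized_loss(features, labels, w, w_orig, alpha):
    """Task loss plus alpha * <w, w_orig> (hypothetical shift penalty).

    features: (batch, dim) outputs of the feature extractor on clean data
    w, w_orig: (classes, dim) tuned and original linear-head weights
    """
    logits = features @ w.T
    penalty = np.sum(w * w_orig)  # large when w stays aligned with w_orig
    return softmax_cross_entropy(logits, labels) + alpha * penalty

def rescale_to_norm(w, target_norm):
    """Rescale w to a fixed Frobenius norm so the penalty cannot be
    reduced simply by growing or shrinking the weights."""
    return w * (target_norm / np.linalg.norm(w))
```

In a tuning loop under these assumptions, one would take a gradient step on `shift_penalized_loss` and then call `rescale_to_norm` with the norm of `w_orig`, so that the head is pushed away from its original direction rather than merely rescaled.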
Abstract
It has been widely observed that deep neural networks (DNNs) are vulnerable to backdoor attacks, in which attackers can maliciously manipulate model behavior by tampering with a small set of training samples. Although a line of defense methods has been proposed to mitigate this threat, they either require complicated modifications to the training process or rely heavily on the specific model architecture, which makes them hard to deploy in real-world applications. Therefore, in this paper, we instead start with fine-tuning, one of the most common and easy-to-deploy backdoor defenses, through comprehensive evaluations against diverse attack scenarios. Observations made through initial experiments show that, in contrast to the promising defensive results at high poisoning rates, vanilla tuning methods completely fail in low poisoning rate scenarios. Our analysis shows that with the low…
Taxonomy
Topics: Adversarial Robustness in Machine Learning · Advanced Malware Detection Techniques · Network Security and Intrusion Detection
