Towards Understanding How Self-training Tolerates Data Backdoor Poisoning
Soumyadeep Pal, Ren Wang, Yuguang Yao, Sijia Liu

TL;DR
This paper investigates how self-training on additional unlabeled data, combined with strong data augmentation, can mitigate backdoor attacks on machine learning models. It shows that vanilla self-training alone does not mitigate them, and it explores combining self-training with self-supervised learning.
Contribution
It introduces a self-training approach that applies strong but proper data augmentations in the pseudo-labeling stage to defend against backdoor attacks, and it explores combining self-supervised learning with self-training for enhanced robustness.
Findings
Self-training with data augmentation effectively reduces backdoor attack success.
Vanilla self-training is ineffective against backdoor attacks.
Combining self-supervised learning with self-training further improves defense.
Abstract
Recent studies on backdoor attacks in model training have shown that polluting a small portion of the training data is sufficient to produce incorrect, manipulated predictions on poisoned test-time data while maintaining high clean accuracy in downstream tasks. The stealthiness of backdoor attacks has imposed tremendous defense challenges in today's machine learning paradigm. In this paper, we explore the potential of self-training via additional unlabeled data for mitigating backdoor attacks. We begin by conducting a pilot study to show that vanilla self-training is not effective in backdoor mitigation. Spurred by that, we propose to defend against backdoor attacks by leveraging strong but proper data augmentations in the self-training pseudo-labeling stage. We find that the new self-training regime helps in defending against backdoor attacks to a great extent. Its effectiveness is demonstrated…
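To make the pseudo-labeling idea in the abstract concrete, here is a minimal toy sketch in Python with NumPy. It is not the authors' implementation: the nearest-centroid "model", the noise-plus-masking `strong_augment` function, the confidence `threshold`, and every function name are hypothetical stand-ins chosen for illustration, and the sketch does not simulate a backdoor trigger. It only shows the general shape of one self-training round in which unlabeled data is strongly augmented before the teacher assigns pseudo-labels.

```python
import numpy as np


def fit_centroids(x, y, num_classes):
    """Toy 'model' (an assumption): one mean vector per class."""
    return np.stack([x[y == c].mean(axis=0) for c in range(num_classes)])


def predict_proba(centroids, x):
    """Softmax over negative squared distances to each class centroid."""
    d2 = ((x[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    logits = -d2
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    p = np.exp(logits)
    return p / p.sum(axis=1, keepdims=True)


def strong_augment(x, rng, noise_std=0.5, drop_prob=0.3):
    """Hypothetical strong augmentation: Gaussian noise plus random feature masking."""
    noisy = x + rng.normal(0.0, noise_std, size=x.shape)
    mask = rng.random(x.shape) >= drop_prob  # keep each feature with prob 1 - drop_prob
    return noisy * mask


def self_train(x_lab, y_lab, x_unlab, num_classes, rng, threshold=0.8):
    """One self-training round with augmentation in the pseudo-labeling stage."""
    # 1. Fit the teacher on the labeled (possibly poisoned) data.
    teacher = fit_centroids(x_lab, y_lab, num_classes)
    # 2. Strongly augment the unlabeled data, then pseudo-label it.
    x_aug = strong_augment(x_unlab, rng)
    probs = predict_proba(teacher, x_aug)
    pseudo = probs.argmax(axis=1)
    # 3. Keep only confident pseudo-labels (thresholding is an assumption).
    keep = probs.max(axis=1) >= threshold
    # 4. Fit the student on labeled plus confidently pseudo-labeled data.
    x_student = np.concatenate([x_lab, x_aug[keep]])
    y_student = np.concatenate([y_lab, pseudo[keep]])
    student = fit_centroids(x_student, y_student, num_classes)
    return student, pseudo, keep
```

In a realistic setting the centroid model would be replaced by a neural network and the augmentation by an image-level transform; which augmentations count as "strong but proper" is a question the paper itself addresses and this sketch does not.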
Taxonomy
Topics: Adversarial Robustness in Machine Learning · Anomaly Detection Techniques and Applications
