Less is More: Culling the Training Set to Improve Robustness of Deep Neural Networks
Yongshuai Liu, Jiyu Chen, Hao Chen

TL;DR
This paper explores how removing outliers from training data can enhance the robustness of deep neural networks against adversarial attacks, and proposes a framework that detects adversarial examples by comparing the outputs of the original model and a model trained on the sanitized data.
Contribution
It introduces two methods for removing outliers from the training set to obtain a sanitized model, and a detection framework that flags adversarial examples based on the difference between the outputs of the original and sanitized models.
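The abstract does not spell out the two outlier-removal methods, so the following is only a minimal Python sketch of the general idea under an assumed rule: within each class, drop the fraction of training points farthest from the class mean in some feature space (for example raw pixels or penultimate-layer activations). The function name remove_outliers, the centroid-distance criterion, and the drop_fraction parameter are hypothetical stand-ins, not the authors' method. The returned indices would then be used to retrain a sanitized model.

```python
import numpy as np


def remove_outliers(features, labels, drop_fraction=0.05):
    """Return sorted indices of training points to keep.

    Hypothetical stand-in rule (not the paper's method): within each class,
    drop the drop_fraction of points farthest from that class's mean vector.
    """
    features = np.asarray(features, dtype=float)
    labels = np.asarray(labels)
    keep = []
    for c in np.unique(labels):
        idx = np.flatnonzero(labels == c)          # positions of class-c points
        centroid = features[idx].mean(axis=0)      # class mean in feature space
        dist = np.linalg.norm(features[idx] - centroid, axis=1)
        n_drop = int(np.floor(drop_fraction * len(idx)))
        order = np.argsort(dist)                   # ascending: farthest points last
        kept = order[: len(idx) - n_drop]          # discard the n_drop farthest
        keep.extend(idx[kept].tolist())
    return np.sort(np.array(keep, dtype=int))


# Toy 2-D data: each class has one point far from the rest.
X = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [5.0, 5.0],
              [10.0, 10.0], [10.1, 10.0], [10.0, 10.1], [-3.0, 10.0]])
y = np.array([0, 0, 0, 0, 1, 1, 1, 1])
kept = remove_outliers(X, y, drop_fraction=0.25)
print(kept.tolist())  # -> [0, 1, 2, 4, 5, 6]
```

Dropping per class rather than globally keeps every label represented in the sanitized training set, which is one reasonable design choice among several.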
Findings
Outliers in the training set improve generalization but weaken robustness.
The proposed method detects adversarial examples with over 94% accuracy.
Sanitized models improve robustness against adversarial attacks.
Abstract
Deep neural networks are vulnerable to adversarial examples. Prior defenses attempted to make deep networks more robust by either changing the network architecture or augmenting the training set with adversarial examples, but both have inherent limitations. Motivated by recent research that shows outliers in the training set have a high negative influence on the trained model, we studied the relationship between model robustness and the quality of the training set. We first show that outliers give the model better generalization ability but weaker robustness. Next, we propose an adversarial example detection framework, in which we design two methods for removing outliers from the training set to obtain the sanitized model, and then detect adversarial examples by calculating the difference between the outputs of the original and the sanitized models. We evaluated the framework on both MNIST and…
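The detection step compares the original and the sanitized model on the same input. The abstract only says the framework uses the difference of their outputs; the Python sketch below assumes that difference is the L1 distance between softmax probability vectors, and that an input is flagged when the distance exceeds a threshold calibrated on clean data. The functions output_difference, calibrate_threshold and detect, the L1 metric, and the false-positive-rate calibration are illustrative assumptions, not the paper's exact procedure.

```python
import numpy as np


def softmax(logits):
    z = np.asarray(logits, dtype=float)
    z = z - z.max(axis=-1, keepdims=True)  # shift for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def output_difference(logits_original, logits_sanitized):
    """One score per input: L1 distance between the two softmax outputs.

    Assumption: L1 distance; the exact metric is not given in this summary.
    """
    p = softmax(logits_original)
    q = softmax(logits_sanitized)
    return np.abs(p - q).sum(axis=-1)


def calibrate_threshold(clean_scores, false_positive_rate=0.05):
    """Threshold chosen so roughly false_positive_rate of clean scores exceed it."""
    return float(np.quantile(clean_scores, 1.0 - false_positive_rate))


def detect(logits_original, logits_sanitized, threshold):
    """True where the two models disagree enough to flag the input as adversarial."""
    return output_difference(logits_original, logits_sanitized) > threshold


# Toy logits for 3 inputs over 2 classes (illustrative numbers, not results).
orig = np.array([[4.0, 0.0], [4.0, 0.0], [4.0, 0.0]])
sani = np.array([[4.0, 0.0], [3.5, 0.0], [0.0, 4.0]])
flags = detect(orig, sani, threshold=0.5)
print(flags.tolist())  # -> [False, False, True]
```

Only the third input, on which the two models predict opposite classes, is flagged. The premise, as the abstract frames it, is that the original and sanitized models respond differently to adversarial inputs; in practice the threshold would come from calibrate_threshold applied to scores on held-out clean data.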
Taxonomy
Topics: Adversarial Robustness in Machine Learning
