Explaining and Harnessing Adversarial Examples
Ian J. Goodfellow, Jonathon Shlens, Christian Szegedy

TL;DR
This paper argues that neural networks are vulnerable to adversarial examples primarily because of their linear nature. It also introduces a simple, fast method for generating such examples and uses them in adversarial training to improve robustness.
Contribution
It provides a new linearity-based explanation for adversarial vulnerability and presents a fast method for generating adversarial examples; training on these examples improves model robustness.
Findings
Linear nature explains adversarial vulnerability
New quantitative support for the explanation
Adversarial training reduces test error on MNIST
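The linearity argument in the first finding can be illustrated with a short worked derivation. This is a sketch for a single linear unit, not the paper's full analysis. The symbols are assumptions introduced here for illustration: w is a weight vector of dimension n, x is an input, η is a perturbation bounded by ε in the max norm, and m is the average magnitude of the entries of w.

```latex
\begin{align*}
\tilde{x} &= x + \eta, \qquad \|\eta\|_\infty \le \epsilon \\
w^\top \tilde{x} &= w^\top x + w^\top \eta \\
\eta = \epsilon\,\operatorname{sign}(w)
  \;&\Longrightarrow\;
  w^\top \eta = \epsilon \sum_{i=1}^{n} |w_i| = \epsilon \|w\|_1 = \epsilon\, m\, n
\end{align*}
```

The per-coordinate change stays at most ε, yet the change in activation grows linearly with the input dimension n, so many imperceptibly small changes can add up to a large change in the output.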
Abstract
Several machine learning models, including neural networks, consistently misclassify adversarial examples: inputs formed by applying small but intentionally worst-case perturbations to examples from the dataset, such that the perturbed input results in the model outputting an incorrect answer with high confidence. Early attempts at explaining this phenomenon focused on nonlinearity and overfitting. We argue instead that the primary cause of neural networks' vulnerability to adversarial perturbation is their linear nature. This explanation is supported by new quantitative results while giving the first explanation of the most intriguing fact about them: their generalization across architectures and training sets. Moreover, this view yields a simple and fast method of generating adversarial examples. Using this approach to provide examples for adversarial training, we reduce the test set…
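The "simple and fast method" mentioned in the abstract is commonly known as the fast gradient sign method: perturb the input by ε times the sign of the gradient of the loss with respect to the input. The following is a minimal sketch on a logistic regression model with random weights, not the paper's experimental setup. The function names (`sigmoid`, `loss_grad_wrt_input`, `fgsm`), the dimension 784, the weight scale, and ε = 0.1 are all illustrative assumptions.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def loss_grad_wrt_input(w, b, x, y):
    """Gradient of binary cross-entropy w.r.t. the input x.

    For logistic regression with label y in {0, 1}:
    dL/dx = (sigmoid(w.x + b) - y) * w
    """
    p = sigmoid(w @ x + b)
    return (p - y) * w

def fgsm(w, b, x, y, eps):
    """One-step perturbation: x + eps * sign(grad_x loss)."""
    return x + eps * np.sign(loss_grad_wrt_input(w, b, x, y))

# Hypothetical toy setup: random weights stand in for a trained model.
rng = np.random.default_rng(0)
n = 784                      # assumed input dimension (illustrative)
eps = 0.1                    # assumed max-norm budget (illustrative)
w = rng.normal(scale=0.1, size=n)
b = 0.0
x = rng.uniform(0.0, 1.0, size=n)

# Use the model's own prediction as the label, so the attack pushes
# the logit away from the currently predicted class.
y = 1.0 if (w @ x + b) > 0 else 0.0

x_adv = fgsm(w, b, x, y, eps)

logit_clean = w @ x + b
logit_adv = w @ x_adv + b
print("max per-coordinate change:", np.max(np.abs(x_adv - x)))
print("logit shift:", logit_adv - logit_clean)
print("eps * ||w||_1:", eps * np.abs(w).sum())
```

For this linear model the magnitude of the logit shift equals ε times the L1 norm of w, which is the dimension-dependent growth that the linearity explanation relies on. The sketch does not clip `x_adv` back to a valid input range; a real image pipeline would likely need that step.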
Videos
Adversarial Machine Learning explained! | With examples. (YouTube)
Taxonomy
Topics: Adversarial Robustness in Machine Learning · Explainable Artificial Intelligence (XAI) · Anomaly Detection Techniques and Applications
