Ensemble Adversarial Training: Attacks and Defenses
Florian Tramèr, Alexey Kurakin, Nicolas Papernot, Ian Goodfellow, Dan Boneh, Patrick McDaniel

TL;DR
This paper analyzes the limitations of single-step adversarial training, introduces ensemble adversarial training to improve robustness against black-box attacks, and demonstrates its effectiveness on ImageNet, where the resulting model won the NIPS 2017 defense competition.
Contribution
It identifies why single-step adversarial training converges to a degenerate minimum and proposes ensemble adversarial training as a novel method to enhance robustness against transfer attacks.
Findings
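Adversarial training with single-step attacks remains vulnerable to black-box attacks that transfer perturbations computed on undefended models.
Ensemble adversarial training improves robustness to black-box attacks on ImageNet.
The authors' most robust model won the first round of the NIPS 2017 competition on Defenses against Adversarial Attacks.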
Abstract
Adversarial examples are perturbed inputs designed to fool machine learning models. Adversarial training injects such examples into training data to increase robustness. To scale this technique to large datasets, perturbations are crafted using fast single-step methods that maximize a linear approximation of the model's loss. We show that this form of adversarial training converges to a degenerate global minimum, wherein small curvature artifacts near the data points obfuscate a linear approximation of the loss. The model thus learns to generate weak perturbations, rather than defend against strong ones. As a result, we find that adversarial training remains vulnerable to black-box attacks, where we transfer perturbations computed on undefended models, as well as to a powerful novel single-step attack that escapes the non-smooth vicinity of the input data via a small random step. We…
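The two single-step attacks named in the abstract can be sketched on a toy model. The block below is a minimal illustration, not the authors' implementation: it assumes a hypothetical logistic-regression classifier with labels in {-1, +1}, and the names `loss_and_grad`, `fgsm`, `rand_fgsm`, `eps`, and `alpha` are chosen here for illustration. `fgsm` takes one step of size `eps` along the sign of the input gradient, which maximizes a linear approximation of the loss under an L-infinity budget. `rand_fgsm` first takes a small random step of size `alpha` and then spends the remaining budget `eps - alpha` on a gradient-sign step computed at the shifted point. Because the toy model is linear in its input, there is no curvature to escape, so the sketch only shows the mechanics of the random step, not its benefit.

```python
import math
import random


def sign(v):
    """Return -1, 0, or +1 according to the sign of v."""
    return (v > 0) - (v < 0)


def loss_and_grad(w, b, x, y):
    """Logistic loss log(1 + exp(-y * z)) and its gradient w.r.t. the input x."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    loss = math.log1p(math.exp(-y * z))
    # d loss / d x = -y * sigmoid(-y * z) * w, with sigmoid(-y*z) = 1 / (1 + exp(y*z))
    coeff = -y / (1.0 + math.exp(y * z))
    return loss, [coeff * wi for wi in w]


def fgsm(w, b, x, y, eps):
    """Single gradient-sign step of size eps (L-infinity budget eps)."""
    _, grad = loss_and_grad(w, b, x, y)
    return [xi + eps * sign(gi) for xi, gi in zip(x, grad)]


def rand_fgsm(w, b, x, y, eps, alpha, rng):
    """Random step of size alpha, then a gradient-sign step of size eps - alpha."""
    x_rand = [xi + alpha * sign(rng.gauss(0.0, 1.0)) for xi in x]
    _, grad = loss_and_grad(w, b, x_rand, y)
    return [xi + (eps - alpha) * sign(gi) for xi, gi in zip(x_rand, grad)]


# Toy values, chosen only for illustration.
w, b = [2.0, -1.0, 0.5], 0.1
x, y = [0.5, -0.3, 0.2], 1
x_fgsm = fgsm(w, b, x, y, eps=0.3)
x_rand = rand_fgsm(w, b, x, y, eps=0.3, alpha=0.15, rng=random.Random(0))
print(loss_and_grad(w, b, x, y)[0], loss_and_grad(w, b, x_fgsm, y)[0])
```

Both attacks keep the total perturbation within `eps` per coordinate, since the random step and the gradient step together move each coordinate by at most `alpha + (eps - alpha)`.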
Taxonomy
Topics: Adversarial Robustness in Machine Learning · Anomaly Detection Techniques and Applications · Domain Adaptation and Few-Shot Learning
