Ensemble Adversarial Training: Attacks and Defenses

Florian Tramèr; Alexey Kurakin; Nicolas Papernot; Ian Goodfellow; Dan Boneh; Patrick McDaniel

arXiv:1705.07204 · stat.ML · April 28, 2020 · 1.1k cites

TL;DR

This paper analyzes the limitations of standard adversarial training, introduces ensemble adversarial training to improve robustness against black-box attacks, and demonstrates its effectiveness on ImageNet, where the resulting model won the NIPS 2017 defense competition.

Contribution

It identifies why single-step adversarial training converges to a degenerate minimum and proposes ensemble adversarial training as a novel method to enhance robustness against transfer attacks.

Findings

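01 Models trained with standard single-step adversarial training remain vulnerable to black-box attacks that transfer perturbations computed on undefended models.

02 Ensemble adversarial training improves robustness to black-box attacks on ImageNet.

03 The authors' most robust model won the first round of the NIPS 2017 competition on Defenses against Adversarial Attacks.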

Abstract

Adversarial examples are perturbed inputs designed to fool machine learning models. Adversarial training injects such examples into training data to increase robustness. To scale this technique to large datasets, perturbations are crafted using fast single-step methods that maximize a linear approximation of the model's loss. We show that this form of adversarial training converges to a degenerate global minimum, wherein small curvature artifacts near the data points obfuscate a linear approximation of the loss. The model thus learns to generate weak perturbations, rather than defend against strong ones. As a result, we find that adversarial training remains vulnerable to black-box attacks, where we transfer perturbations computed on undefended models, as well as to a powerful novel single-step attack that escapes the non-smooth vicinity of the input data via a small random step. We…
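As a rough illustration of the two single-step attacks the abstract describes, the sketch below uses a toy logistic-regression model in plain Python. The model, the helper names (`logistic_loss`, `loss_gradient`, `fgsm`, `rand_fgsm`), and the parameters `eps` and `alpha` are assumptions made for this example and are not taken from the paper's code. `fgsm` takes one step of size `eps` along the sign of the input gradient, which maximizes a linear approximation of the loss under an L-infinity budget. `rand_fgsm` first takes a small random step of size `alpha` and then a gradient-sign step of size `eps - alpha`, so the total perturbation stays within `eps`.

```python
import math
import random


def sign(v):
    """Return -1, 0, or 1 according to the sign of v."""
    return (v > 0) - (v < 0)


def logistic_loss(w, b, x, y):
    """Binary cross-entropy of a toy (hypothetical) logistic model; y is 0 or 1."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    p = 1.0 / (1.0 + math.exp(-z))
    return -(y * math.log(p) + (1 - y) * math.log(1.0 - p))


def loss_gradient(w, b, x, y):
    """Gradient of the logistic loss with respect to the input x."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    p = 1.0 / (1.0 + math.exp(-z))
    return [(p - y) * wi for wi in w]


def fgsm(w, b, x, y, eps):
    """Single gradient-sign step of size eps (linear approximation of the loss)."""
    g = loss_gradient(w, b, x, y)
    return [xi + eps * sign(gi) for xi, gi in zip(x, g)]


def rand_fgsm(w, b, x, y, eps, alpha, rng):
    """Small random step of size alpha, then a gradient-sign step of size eps - alpha."""
    assert 0 <= alpha < eps
    # Random step: move each coordinate by +/- alpha to leave the immediate
    # neighborhood of the data point before computing the gradient.
    x_rand = [xi + alpha * sign(rng.gauss(0.0, 1.0)) for xi in x]
    g = loss_gradient(w, b, x_rand, y)
    return [xi + (eps - alpha) * sign(gi) for xi, gi in zip(x_rand, g)]


if __name__ == "__main__":
    w, b = [1.0, -2.0], 0.0          # assumed toy weights
    x, y = [0.5, 0.5], 1             # assumed toy input and label
    x_adv = fgsm(w, b, x, y, eps=0.1)
    x_radv = rand_fgsm(w, b, x, y, eps=0.1, alpha=0.05, rng=random.Random(0))
    print("clean loss:", logistic_loss(w, b, x, y))
    print("FGSM loss: ", logistic_loss(w, b, x_adv, y))
    print("R+FGSM loss:", logistic_loss(w, b, x_radv, y))
```

On a linear toy model like this one the gradient sign is the same everywhere, so the random step brings no benefit. It would only matter for models whose loss surface is sharply curved near the data points, which is the situation the abstract describes.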

Taxonomy

Topics: Adversarial Robustness in Machine Learning · Anomaly Detection Techniques and Applications · Domain Adaptation and Few-Shot Learning