TL;DR
This paper introduces adversarial logit pairing, a defense that encourages a model's logits for a clean example and its adversarial counterpart to be similar. It reports state-of-the-art ImageNet accuracy against PGD white box attacks (27.9%) and ties the best prior defense against black box attacks.
Contribution
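The paper implements adversarial training at ImageNet scale and develops logit pairing, a method that encourages the logits for pairs of examples to be similar. Pairing clean examples with their adversarial counterparts improves accuracy on adversarial examples over vanilla adversarial training.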
Findings
Achieves 27.9% accuracy against PGD white box attacks on ImageNet.
Damages the previous state-of-the-art black box defense on ImageNet (Tramèr et al., 2018), dropping its accuracy from 66.6% to 47.1%.
Logit pairing on clean examples only is competitive with adversarial training in terms of accuracy on two datasets.
Abstract
In this paper, we develop improved techniques for defending against adversarial examples at scale. First, we implement the state of the art version of adversarial training at unprecedented scale on ImageNet and investigate whether it remains effective in this setting - an important open scientific question (Athalye et al., 2018). Next, we introduce enhanced defenses using a technique we call logit pairing, a method that encourages logits for pairs of examples to be similar. When applied to clean examples and their adversarial counterparts, logit pairing improves accuracy on adversarial examples over vanilla adversarial training; we also find that logit pairing on clean examples only is competitive with adversarial training in terms of accuracy on two datasets. Finally, we show that adversarial logit pairing achieves the state of the art defense on ImageNet against PGD white box attacks,…
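The abstract describes logit pairing only as encouraging the logits for pairs of examples to be similar. As a minimal sketch of that idea, the code below adds a similarity penalty between clean and adversarial logits to an ordinary classification loss. The mean squared difference as the distance, the weight `lam`, the equal weighting of clean and adversarial cross-entropy, and all function names are illustrative assumptions, not details taken from the text above.

```python
import math


def softmax_cross_entropy(logits, label):
    """Cross-entropy of one example computed from raw logits (stable log-sum-exp)."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(v - m) for v in logits))
    return log_z - logits[label]


def logit_pairing_penalty(logits_a, logits_b):
    """Mean squared difference between two logit vectors.

    The choice of distance is an assumption made for this sketch.
    """
    return sum((a - b) ** 2 for a, b in zip(logits_a, logits_b)) / len(logits_a)


def adversarial_logit_pairing_loss(clean_logits, adv_logits, labels, lam=0.5):
    """Classification loss on clean and adversarial inputs plus a pairing term.

    clean_logits, adv_logits: lists of logit vectors, one per example, where
    adv_logits[i] comes from the adversarial counterpart of example i.
    lam: hypothetical weight on the pairing penalty.
    """
    n = len(labels)
    # Average cross-entropy over both the clean and the adversarial copy.
    ce = sum(
        softmax_cross_entropy(c, y) + softmax_cross_entropy(a, y)
        for c, a, y in zip(clean_logits, adv_logits, labels)
    ) / (2 * n)
    # Average pairing penalty over the batch.
    pairing = sum(
        logit_pairing_penalty(c, a) for c, a in zip(clean_logits, adv_logits)
    ) / n
    return ce + lam * pairing
```

When the clean and adversarial logits are identical, the pairing term is zero and the loss reduces to plain cross-entropy. The further apart the two logit vectors are, the more the penalty contributes, scaled by `lam`. Generating the adversarial examples themselves (for instance with PGD) is outside the scope of this sketch.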
