Adversarial Logit Pairing

Harini Kannan; Alexey Kurakin; Ian Goodfellow

arXiv:1803.06373 · cs.LG · March 20, 2018

TL;DR

This paper introduces adversarial logit pairing, a defense that improves robustness to adversarial examples on ImageNet. It reports state-of-the-art accuracy against PGD white box attacks and also evaluates black box attacks.

Contribution

The paper scales adversarial training to ImageNet and introduces logit pairing, a method that encourages the logits of paired examples (for instance, a clean image and its adversarial counterpart) to be similar.

Findings

01

Achieves 27.9% accuracy against PGD white box attacks on ImageNet.

02

Damages the prior state-of-the-art black box defense on ImageNet (Tramèr et al., 2018), dropping its accuracy from 66.6% to 47.1%.

03

Improves accuracy on adversarial examples over vanilla adversarial training when logits of clean and adversarial examples are paired.

Abstract

In this paper, we develop improved techniques for defending against adversarial examples at scale. First, we implement the state of the art version of adversarial training at unprecedented scale on ImageNet and investigate whether it remains effective in this setting - an important open scientific question (Athalye et al., 2018). Next, we introduce enhanced defenses using a technique we call logit pairing, a method that encourages logits for pairs of examples to be similar. When applied to clean examples and their adversarial counterparts, logit pairing improves accuracy on adversarial examples over vanilla adversarial training; we also find that logit pairing on clean examples only is competitive with adversarial training in terms of accuracy on two datasets. Finally, we show that adversarial logit pairing achieves the state of the art defense on ImageNet against PGD white box attacks,…
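The idea of pairing logits can be illustrated with a small loss function. The sketch below is not the authors' implementation: the function names, the squared L2 distance between logit vectors, the equal weighting of clean and adversarial cross-entropy, and the `pairing_weight` value are all assumptions made for illustration. It takes logits that a model has already produced for a batch of clean examples and their adversarial counterparts, and adds a penalty that grows as the two sets of logits diverge.

```python
import math


def softmax_cross_entropy(logits, label):
    """Cross-entropy of one logit vector against an integer class label."""
    m = max(logits)  # subtract the max for numerical stability
    log_norm = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_norm - logits[label]


def pairing_penalty(logits_a, logits_b):
    """Squared L2 distance between two logit vectors (assumed distance)."""
    return sum((a - b) ** 2 for a, b in zip(logits_a, logits_b))


def alp_loss(clean_batch, adv_batch, labels, pairing_weight=0.5):
    """Classification loss on clean and adversarial logits plus a pairing term.

    clean_batch, adv_batch: lists of logit vectors, aligned example by example.
    pairing_weight: hypothetical coefficient on the pairing term.
    """
    n = len(labels)
    # Average cross-entropy over clean and adversarial examples (assumed 50/50 mix).
    ce = sum(
        softmax_cross_entropy(c, y) + softmax_cross_entropy(a, y)
        for c, a, y in zip(clean_batch, adv_batch, labels)
    ) / (2 * n)
    # Average distance between each clean example's logits and its adversarial pair.
    pair = sum(pairing_penalty(c, a) for c, a in zip(clean_batch, adv_batch)) / n
    return ce + pairing_weight * pair


clean = [[2.0, 0.0, -1.0], [0.5, 1.5, 0.0]]
adv = [[1.0, 0.5, -1.0], [0.5, 1.0, 0.5]]
print(alp_loss(clean, adv, labels=[0, 1]))
```

With `pairing_weight=0` the function reduces to a plain mix of clean and adversarial cross-entropy; raising the weight trades some of that classification objective for logits that agree between each clean example and its adversarial version.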
