Provably Minimally-Distorted Adversarial Examples

Nicholas Carlini; Guy Katz; Clark Barrett; David L. Dill

arXiv:1709.10207·cs.LG·February 21, 2018·101 cites

TL;DR

This paper introduces a formal verification method to generate provably minimally distorted adversarial examples, demonstrating its effectiveness in evaluating and improving neural network robustness.

Contribution

It presents a novel verification technique to construct minimal-distortion adversarial examples and applies it to assess and enhance existing defenses.

Findings

01

Successfully constructed minimal-distortion adversarial examples for neural networks.

02

Proved that adversarial retraining increases the distortion needed for attacks by a factor of 4.2.

03

Noted that most recently proposed defenses were quickly shown to be vulnerable to later attacks, despite their claimed robustness.

Abstract

The ability to deploy neural networks in real-world, safety-critical systems is severely limited by the presence of adversarial examples: slightly perturbed inputs that are misclassified by the network. In recent years, several techniques have been proposed for increasing robustness to adversarial examples, and yet most of these have been quickly shown to be vulnerable to future attacks. For example, over half of the defenses proposed by papers accepted at ICLR 2018 have already been broken. We propose to address this difficulty through formal verification techniques. We show how to construct provably minimally distorted adversarial examples: given an arbitrary neural network and input sample, we can construct adversarial examples which we prove are of minimal distortion. Using this approach, we demonstrate that one of the recent ICLR defense proposals, adversarial retraining,…
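The idea of a provably minimal distortion can be illustrated with a small sketch. It is not the authors' implementation: it assumes a hypothetical exact oracle `exists_adversarial(delta)` that answers whether any input within L-infinity distance `delta` of the sample is misclassified, and it narrows the minimal distortion by bisection. A toy linear classifier stands in for the neural network so that the oracle can be written in closed form; the names `linear_oracle` and `minimal_distortion` are illustrative only.

```python
from typing import Callable, Sequence, Tuple


def linear_oracle(w: Sequence[float], b: float,
                  x: Sequence[float]) -> Callable[[float], bool]:
    """Exact check for a toy linear classifier sign(w.x + b).

    Assumption: this closed-form check stands in for a sound and
    complete verifier queried on a real neural network.
    """
    margin = abs(sum(wi * xi for wi, xi in zip(w, x)) + b)
    l1_norm = sum(abs(wi) for wi in w)

    def exists_adversarial(delta: float) -> bool:
        # Inside an L-infinity ball of radius delta, the score can move
        # by at most delta * ||w||_1, so the decision boundary is
        # reachable exactly when that amount covers the margin.
        return delta * l1_norm >= margin

    return exists_adversarial


def minimal_distortion(exists_adversarial: Callable[[float], bool],
                       hi: float, tol: float = 1e-6) -> Tuple[float, float]:
    """Bisect on delta; return (lo, hi) bracketing the minimal distortion."""
    if not exists_adversarial(hi):
        raise ValueError("no adversarial example within the initial bound")
    lo = 0.0
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if exists_adversarial(mid):
            hi = mid   # an adversarial example exists at radius mid
        else:
            lo = mid   # proven: no adversarial example within radius mid
    return lo, hi


if __name__ == "__main__":
    oracle = linear_oracle(w=[2.0, -1.0], b=0.5, x=[1.0, 1.0])
    print(minimal_distortion(oracle, hi=4.0))
```

For this toy classifier the minimal distortion is known analytically as |w·x + b| / ||w||_1 = 1.5 / 3 = 0.5, so the returned bracket can be checked directly. With a real network, each oracle call would be a verification query. The lower end of the bracket is what makes the result a proof and not merely the best attack found so far.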

Code & Models

Repositories

huanzhang12/ATLA_robust_RL
pytorch

Taxonomy

Topics: Adversarial Robustness in Machine Learning