Provable defenses against adversarial examples via the convex outer adversarial polytope

Eric Wong; J. Zico Kolter

arXiv:1711.00851·cs.LG·June 12, 2018·713 cites

TL;DR

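This paper introduces a method for training deep ReLU networks that are provably robust to norm-bounded adversarial perturbations. It bounds the worst-case loss using a convex outer approximation of the reachable activations and optimizes that bound through the dual problem. On previously unseen inputs, the resulting certificate detects every adversarial example, at the cost of also flagging some non-adversarial ones.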

Contribution

The authors develop a novel convex outer approximation technique and a dual optimization approach that enables training neural networks with guaranteed robustness bounds against adversarial perturbations.

Findings

01

Achieved less than 5.8% test error on MNIST with provable robustness against bounded $\ell_\infty$ attacks.

02

Developed an efficient optimization method using deep network duals for robust training.

03

Provided publicly available code for reproducibility.

Abstract

We propose a method to learn deep ReLU-based classifiers that are provably robust against norm-bounded adversarial perturbations on the training data. For previously unseen examples, the approach is guaranteed to detect all adversarial examples, though it may flag some non-adversarial examples as well. The basic idea is to consider a convex outer approximation of the set of activations reachable through a norm-bounded perturbation, and we develop a robust optimization procedure that minimizes the worst case loss over this outer region (via a linear program). Crucially, we show that the dual problem to this linear program can be represented itself as a deep network similar to the backpropagation network, leading to very efficient optimization approaches that produce guaranteed bounds on the robust loss. The end result is that by executing a few more forward and backward passes through a…
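The convex outer approximation described in the abstract can be illustrated on a single layer. The sketch below is not the authors' released code. It assumes a tiny hand-picked linear layer (`W`, `b`, `x`, and `eps` are hypothetical values) and uses the standard linear upper bound on a ReLU whose pre-activation range `[l, u]` straddles zero. It computes the pre-activation bounds under an $\ell_\infty$ perturbation, then checks by random sampling that every perturbed activation stays inside the relaxed region.

```python
import random


def relu(z):
    return max(0.0, z)


def first_layer_bounds(W, b, x, eps):
    """Bounds on each entry of W(x + delta) + b over all ||delta||_inf <= eps."""
    lower, upper = [], []
    for row, bias in zip(W, b):
        center = sum(w * xi for w, xi in zip(row, x)) + bias
        radius = eps * sum(abs(w) for w in row)  # eps times the row's l1 norm
        lower.append(center - radius)
        upper.append(center + radius)
    return lower, upper


def relaxed_relu_upper(z, l, u):
    """Upper face of the convex relaxation of ReLU on the interval [l, u]."""
    if u <= 0:   # unit is always inactive
        return 0.0
    if l >= 0:   # unit is always active
        return z
    return u * (z - l) / (u - l)  # chord from (l, 0) to (u, u)


# Hypothetical toy layer and input, chosen so that one unit crosses zero.
W = [[1.0, -2.0], [0.5, 0.5]]
b = [0.0, -1.0]
x = [0.5, 0.2]
eps = 0.1

lower, upper = first_layer_bounds(W, b, x, eps)

# Sample perturbations and count points that escape the outer approximation.
random.seed(0)
violations = 0
for _ in range(1000):
    x_pert = [xi + random.uniform(-eps, eps) for xi in x]
    for j, (row, bias) in enumerate(zip(W, b)):
        z = sum(w * v for w, v in zip(row, x_pert)) + bias
        inside = lower[j] - 1e-12 <= z <= upper[j] + 1e-12
        below = relu(z) <= relaxed_relu_upper(z, lower[j], upper[j]) + 1e-12
        if not (inside and below):
            violations += 1

print("pre-activation bounds:", list(zip(lower, upper)))
print("violations:", violations)
```

Sampling only spot-checks containment and proves nothing. The guarantee comes from the fact that the chord lies above the ReLU on `[l, u]`. The relaxed set is larger than the true set of reachable activations, which is consistent with the abstract's note that some non-adversarial examples may be flagged.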

Taxonomy

Topics: Adversarial Robustness in Machine Learning