Provable defenses against adversarial examples via the convex outer adversarial polytope
Eric Wong, J. Zico Kolter

TL;DR
This paper introduces a method for training deep ReLU networks that are provably robust to norm-bounded adversarial attacks, using a convex outer approximation and its dual. On unseen inputs the method is guaranteed to detect all such adversarial examples, though it may also flag some non-adversarial inputs.
Contribution
The authors develop a novel convex outer approximation technique and a dual optimization approach that enables training neural networks with guaranteed robustness bounds against adversarial perturbations.
Findings
Achieved less than 5.8% test error on MNIST with provable robustness against bounded $\ell_\infty$ attacks.
Developed an efficient optimization method using deep network duals for robust training.
Provided publicly available code for reproducibility.
Abstract
We propose a method to learn deep ReLU-based classifiers that are provably robust against norm-bounded adversarial perturbations on the training data. For previously unseen examples, the approach is guaranteed to detect all adversarial examples, though it may flag some non-adversarial examples as well. The basic idea is to consider a convex outer approximation of the set of activations reachable through a norm-bounded perturbation, and we develop a robust optimization procedure that minimizes the worst case loss over this outer region (via a linear program). Crucially, we show that the dual problem to this linear program can be represented itself as a deep network similar to the backpropagation network, leading to very efficient optimization approaches that produce guaranteed bounds on the robust loss. The end result is that by executing a few more forward and backward passes through a…
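The outer-approximation idea in the abstract can be illustrated on a single layer. The following is a minimal sketch in plain Python, not the authors' released code: the function names `first_layer_bounds` and `relu_upper_envelope`, and the toy weights, are assumptions made for illustration. It computes elementwise bounds on the first-layer pre-activations under an $\ell_\infty$ perturbation of radius `eps`, then evaluates the upper edge of the standard convex hull of a ReLU over those bounds.

```python
# Illustrative sketch only; names and toy values are hypothetical,
# not taken from the paper's code release.

def first_layer_bounds(W, b, x, eps):
    """Bounds on each entry of W x' + b over all x' with ||x' - x||_inf <= eps."""
    lower, upper = [], []
    for row, bias in zip(W, b):
        center = sum(w * xi for w, xi in zip(row, x)) + bias
        # The worst-case shift of a linear function over an l_inf ball
        # is eps times the l_1 norm of the weight row.
        radius = eps * sum(abs(w) for w in row)
        lower.append(center - radius)
        upper.append(center + radius)
    return lower, upper


def relu_upper_envelope(z, l, u):
    """Upper edge of the convex hull of ReLU restricted to z in [l, u]."""
    if u <= 0:   # unit is always inactive
        return 0.0
    if l >= 0:   # unit is always active
        return z
    # Unstable unit: the chord joining (l, 0) and (u, u).
    return u * (z - l) / (u - l)


# Toy example (assumed values).
W = [[1.0, -2.0], [0.5, 0.5]]
b = [0.0, -1.0]
x = [1.0, 1.0]
eps = 0.1
lower, upper = first_layer_bounds(W, b, x, eps)
```

In this toy case the first unit is inactive for every allowed perturbation, while the second straddles zero and is the kind of unit that the convex relaxation has to over-approximate. Deeper layers and the dual network described in the abstract are beyond what this sketch covers.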
Taxonomy
Topics: Adversarial Robustness in Machine Learning
