Splitting the Difference on Adversarial Training
Matan Levi, Aryeh Kontorovich

TL;DR
This paper introduces an adversarial training method that splits each class into "clean" and "adversarial" subclasses, which simplifies the decision boundaries and yields high robustness while keeping natural accuracy near optimal.
Contribution
The work proposes a new approach to adversarial training by treating perturbed examples as separate classes, improving robustness while maintaining near-optimal natural accuracy.
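The class-splitting idea can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes a hypothetical labeling convention in which a clean example of class y keeps label y and a perturbed example gets label y + K (K being the original number of classes), and it assumes that predictions over the 2K subclasses are merged back to K classes by summing each clean/adversarial pair. The function names `split_labels` and `merge_predictions` are made up for illustration.

```python
import numpy as np

def split_labels(y, is_adversarial, num_classes):
    """Map original labels in [0, K) to labels in [0, 2K).

    Assumed convention (not taken from the paper): clean examples keep
    label y, perturbed examples are assigned label y + K.
    """
    y = np.asarray(y)
    is_adversarial = np.asarray(is_adversarial, dtype=bool)
    return np.where(is_adversarial, y + num_classes, y)

def merge_predictions(probs_2k, num_classes):
    """Collapse 2K-way probabilities back to the K original classes.

    Assumed rule: add the probability of each class's clean subclass to
    that of its adversarial subclass, then take the argmax.
    """
    probs_2k = np.asarray(probs_2k, dtype=float)
    merged = probs_2k[..., :num_classes] + probs_2k[..., num_classes:]
    return merged.argmax(axis=-1)

# Three examples from a 3-class problem; the last two are perturbed.
labels = split_labels([0, 2, 1], [False, True, True], num_classes=3)

# One 6-way probability vector whose mass for class 1 is split
# between the clean (index 1) and adversarial (index 4) subclasses.
pred = merge_predictions([[0.10, 0.20, 0.30, 0.05, 0.30, 0.05]], num_classes=3)
```

Under these assumptions the classifier head has 2K outputs during training, while evaluation still reports accuracy over the K original classes.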
Findings
Achieves 95.01% natural accuracy on CIFAR-10.
Provides theoretical justification for the class-splitting approach.
Demonstrates robustness across multiple tasks.
Abstract
The existence of adversarial examples points to a basic weakness of deep neural networks. One of the most effective defenses against such examples, adversarial training, entails training models with some degree of robustness, usually at the expense of a degraded natural accuracy. Most adversarial training methods aim to learn a model that finds, for each class, a common decision boundary encompassing both the clean and perturbed examples. In this work, we take a fundamentally different approach by treating the perturbed examples of each class as a separate class to be learned, effectively splitting each class into two classes: "clean" and "adversarial." This split doubles the number of classes to be learned, but at the same time considerably simplifies the decision boundaries. We provide a theoretical plausibility argument that sheds some light on the conditions under which our approach…
Taxonomy
Topics: Adversarial Robustness in Machine Learning · Explainable Artificial Intelligence (XAI) · Anomaly Detection Techniques and Applications
