Splitting the Difference on Adversarial Training

Matan Levi; Aryeh Kontorovich

arXiv:2310.02480 · cs.LG · October 5, 2023 · 1 citation

TL;DR

This paper introduces a novel adversarial training method that splits each class into 'clean' and 'adversarial' subclasses, simplifying decision boundaries and achieving high robustness without sacrificing natural accuracy.

Contribution

The work proposes a new approach to adversarial training by treating perturbed examples as separate classes, improving robustness while maintaining near-optimal natural accuracy.

Findings

01. Achieves 95.01% natural accuracy on CIFAR-10.

02. Provides theoretical justification for the class-splitting approach.

03. Demonstrates robustness across multiple tasks.

Abstract

The existence of adversarial examples points to a basic weakness of deep neural networks. One of the most effective defenses against such examples, adversarial training, entails training models with some degree of robustness, usually at the expense of a degraded natural accuracy. Most adversarial training methods aim to learn a model that finds, for each class, a common decision boundary encompassing both the clean and perturbed examples. In this work, we take a fundamentally different approach by treating the perturbed examples of each class as a separate class to be learned, effectively splitting each class into two classes: "clean" and "adversarial." This split doubles the number of classes to be learned, but at the same time considerably simplifies the decision boundaries. We provide a theoretical plausibility argument that sheds some light on the conditions under which our approach…
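The class split described in the abstract can be illustrated with a small sketch. It is not taken from the authors' repository: the names `split_label` and `collapse_prediction` are hypothetical, placing the "adversarial" copy of class c at index c + K is an assumed layout, and collapsing a 2K-way prediction by summing the clean and adversarial probabilities of each original class is an assumption, since the abstract does not say how predictions are mapped back.

```python
import math

NUM_CLASSES = 10  # K original classes, e.g. CIFAR-10 (assumed for illustration)


def split_label(label, is_adversarial, num_classes=NUM_CLASSES):
    """Map an original label in [0, K) to the doubled 2K-way label space.

    Assumed layout: clean class c keeps index c, and its adversarial
    counterpart is placed at index c + K.
    """
    if not 0 <= label < num_classes:
        raise ValueError("label out of range")
    return label + num_classes if is_adversarial else label


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def collapse_prediction(logits, num_classes=NUM_CLASSES):
    """Reduce 2K logits to one of the K original classes.

    Assumption: the probabilities of the clean and adversarial halves of
    each original class are summed before taking the argmax.
    """
    if len(logits) != 2 * num_classes:
        raise ValueError("expected 2 * num_classes logits")
    probs = softmax(logits)
    merged = [probs[c] + probs[c + num_classes] for c in range(num_classes)]
    return max(range(num_classes), key=merged.__getitem__)
```

Under these assumptions, a perturbed example of class 3 would be trained against target 13, while at test time a high score on either output 3 or output 13 counts toward original class 3. Whatever the exact mapping, the model's final layer has 2K outputs instead of K, which is the doubling the abstract refers to.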

Code & Models

Repositories

matanle51/splitting-the-difference-on-adversarial-training
PyTorch · Official

Taxonomy

Topics: Adversarial Robustness in Machine Learning · Explainable Artificial Intelligence (XAI) · Anomaly Detection Techniques and Applications