GAT: Generative Adversarial Training for Adversarial Example Detection and Robust Classification
Xuwang Yin, Soheil Kolouri, Gustavo K. Rohde

TL;DR
This paper introduces GAT, an adversarial example detection method that combines one-vs-rest binary classifiers with generative modeling and is designed to hold up under white-box attacks.
Contribution
The paper proposes GAT, a principled adversarial detection framework combining one-vs-rest classifiers and generative modeling, providing robust detection against white-box adversarial attacks.
Findings
Competitive detection performance demonstrated
Effective against norm-constrained white-box attacks
Generative approach enhances detection robustness
Abstract
The vulnerabilities of deep neural networks against adversarial examples have become a significant concern for deploying these models in sensitive domains. Devising a definitive defense against such attacks has proven challenging, and methods that rely on detecting adversarial samples are only valid when the attacker is oblivious to the detection mechanism. In this paper, we propose a principled adversarial example detection method that can withstand norm-constrained white-box attacks. Inspired by one-versus-the-rest classification, in a K-class classification problem, we train K binary classifiers, where the i-th binary classifier is used to distinguish between clean data of class i and adversarially perturbed samples of other classes. At test time, we first use a trained classifier to get the predicted label (say k) of the input, and then use the k-th binary classifier to…
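The test-time routing described in the abstract can be sketched roughly as follows. The abstract is cut off before it says what the k-th binary classifier decides, so this sketch assumes its score is compared against a threshold to flag the input. The names `route_and_detect`, `toy_classify`, `toy_detectors`, and `threshold` are hypothetical, and the distance-based stand-ins are toy functions, not the trained networks from the paper.

```python
import numpy as np

def route_and_detect(x, classify, detectors, threshold=0.0):
    """Route x to the detector of its predicted class.

    classify: callable returning a class index k for x.
    detectors: list of K callables, one score function per class.
    ASSUMPTION: a score below `threshold` means "flag as adversarial".
    """
    k = int(classify(x))             # step 1: predicted label from the base classifier
    score = float(detectors[k](x))   # step 2: score from the k-th binary classifier
    return k, score < threshold

# Toy stand-ins (hypothetical): two classes centred at (0, 0) and (4, 4).
centers = np.array([[0.0, 0.0], [4.0, 4.0]])

def toy_classify(x):
    # Nearest-centre classifier.
    return np.argmin(np.linalg.norm(centers - x, axis=1))

# Each toy detector scores high near its own class centre and low far from it.
toy_detectors = [
    (lambda x, c=c: 1.0 - np.linalg.norm(x - c)) for c in centers
]

clean = np.array([0.1, -0.2])      # close to the class-0 centre
suspicious = np.array([2.1, 2.1])  # far from both centres

print(route_and_detect(clean, toy_classify, toy_detectors))
print(route_and_detect(suspicious, toy_classify, toy_detectors))
```

Only the detector for the predicted class is consulted, so each input costs one classifier pass plus one detector pass regardless of K.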
Taxonomy
Topics: Adversarial Robustness in Machine Learning
