TL;DR
This paper introduces ContraNet, an adversarial example (AE) detection method that flags inputs whose semantic information contradicts the discriminative features extracted by the target model. It detects AEs, including those crafted by adaptive attackers, outperforms existing defenses, and further improves robustness when combined with adversarial training.
Contribution
ContraNet is a new detection framework that models semantic contradiction to identify adversarial examples, demonstrating superior performance against adaptive attacks and compatibility with adversarial training.
Findings
ContraNet significantly outperforms existing AE detection methods.
It remains effective against adaptive attacks, in which the attacker knows the defense mechanism and crafts AEs accordingly.
Combining ContraNet with adversarial training further improves defense.
Abstract
Adversarial examples (AEs) pose severe threats to the applications of deep neural networks (DNNs) to safety-critical domains, e.g., autonomous driving. While there has been a vast body of AE defense solutions, to the best of our knowledge, they all suffer from some weaknesses, e.g., defending against only a subset of AEs or causing a relatively high accuracy loss for legitimate inputs. Moreover, most existing solutions cannot defend against adaptive attacks, wherein attackers are knowledgeable about the defense mechanisms and craft AEs accordingly. In this paper, we propose a novel AE detection framework based on the very nature of AEs, i.e., their semantic information is inconsistent with the discriminative features extracted by the target DNN model. To be specific, the proposed solution, namely ContraNet, models such contradiction by first taking both the input and the inference…
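The contradiction idea described in the abstract can be illustrated with a toy sketch. This is not ContraNet itself: the abstract is truncated before it describes the full pipeline, so everything below is an assumption made for illustration. The sketch assumes a label-conditioned synthesis step (here a fixed prototype per class, standing in for a learned generator), a deliberately brittle linear classifier, and a Euclidean distance with a hand-picked threshold as the comparison. All names (`PROTOTYPES`, `classify`, `synthesize`, `contradiction_score`, `is_adversarial`) are hypothetical.

```python
import math

# Hypothetical class prototypes standing in for a label-conditioned generator.
PROTOTYPES = {
    0: [1.0, 1.0, 0.0, 0.0],
    1: [0.0, 0.0, 1.0, 1.0],
}

# Hypothetical brittle classifier: it relies almost entirely on the last
# feature, so a small change to that feature flips the prediction.
WEIGHTS = [0.1, 0.1, 0.1, 5.0]
BIAS = -0.5

# Hand-picked for this toy data; a real detector would calibrate this.
THRESHOLD = 0.5


def classify(x):
    """Return the predicted label (0 or 1) of the toy linear classifier."""
    score = sum(w * v for w, v in zip(WEIGHTS, x)) + BIAS
    return 1 if score > 0 else 0


def synthesize(x, label):
    """Produce what an input with this label 'should' look like.

    Assumption: a real system would condition on both the input and the
    label; this stand-in ignores x and returns the class prototype.
    """
    return list(PROTOTYPES[label])


def contradiction_score(x):
    """Distance between the input and the synthesis for its predicted label."""
    label = classify(x)
    return math.dist(x, synthesize(x, label))


def is_adversarial(x):
    """Flag inputs whose content contradicts the model's own prediction."""
    return contradiction_score(x) > THRESHOLD


if __name__ == "__main__":
    clean = [0.9, 1.0, 0.1, 0.0]      # looks like class 0, predicted 0
    perturbed = [1.0, 1.0, 0.0, 0.2]  # looks like class 0, predicted 1
    print(classify(clean), is_adversarial(clean))
    print(classify(perturbed), is_adversarial(perturbed))
```

In this toy setup the perturbed input still resembles class 0 but is predicted as class 1, so the synthesis for label 1 is far from the input and the detector flags it. The clean input stays close to the synthesis for its own label and passes.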
Taxonomy
Methods: Autoencoders
