What You See is Not What the Network Infers: Detecting Adversarial Examples Based on Semantic Contradiction

Yijun Yang; Ruiyuan Gao; Yu Li; Qiuxia Lai; Qiang Xu

arXiv:2201.09650·cs.CR·January 25, 2022

TL;DR

This paper introduces ContraNet, an adversarial example (AE) detection method that uses semantic contradiction to identify AEs, including those crafted by adaptive attackers. It outperforms existing defenses and becomes more robust when combined with adversarial training.

Contribution

ContraNet is a new detection framework that models semantic contradiction to identify adversarial examples, demonstrating superior performance against adaptive attacks and compatibility with adversarial training.

Findings

1. ContraNet outperforms existing AE detection methods significantly.
2. It remains effective against adaptive, knowledgeable attacks.
3. Combining ContraNet with adversarial training further improves defense.

Abstract

Adversarial examples (AEs) pose severe threats to the applications of deep neural networks (DNNs) to safety-critical domains, e.g., autonomous driving. While there has been a vast body of AE defense solutions, to the best of our knowledge, they all suffer from some weaknesses, e.g., defending against only a subset of AEs or causing a relatively high accuracy loss for legitimate inputs. Moreover, most existing solutions cannot defend against adaptive attacks, wherein attackers are knowledgeable about the defense mechanisms and craft AEs accordingly. In this paper, we propose a novel AE detection framework based on the very nature of AEs, i.e., their semantic information is inconsistent with the discriminative features extracted by the target DNN model. To be specific, the proposed solution, namely ContraNet, models such contradiction by first taking both the input and the inference…
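The abstract is cut off before it describes ContraNet's pipeline, so the following is not the authors' method. It is only a toy illustration of the general idea stated above: flag an input when its content disagrees with the label the model inferred. The prototype table, the `synthesize` stand-in for a label-conditioned generator, the Euclidean distance, and the threshold are all assumptions made for this sketch.

```python
import math

# Hypothetical class prototypes; a stand-in for a label-conditioned generator.
PROTOTYPES = {
    "cat": [1.0, 0.0],
    "dog": [0.0, 1.0],
}

def synthesize(label):
    """Return what an input with this label is expected to look like (toy)."""
    return PROTOTYPES[label]

def distance(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def is_contradictory(x, predicted_label, threshold=0.5):
    """Flag x when it is far from the synthesis for the predicted label."""
    return distance(x, synthesize(predicted_label)) > threshold

sample = [0.9, 0.1]  # content resembles the "cat" prototype
print(is_contradictory(sample, "cat"))  # predicted label agrees with content
print(is_contradictory(sample, "dog"))  # predicted label contradicts content
```

In this toy setting, an input whose content resembles one class but is labeled as another by the classifier lands far from the synthesis for that label and is flagged; a real detector would need a learned generator and a learned similarity measure in place of these stand-ins.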

Code & Models

Repositories

cure-lab/contranet (PyTorch, official)

Taxonomy

Methods: Autoencoders