Where Classification Fails, Interpretation Rises

Chanh Nguyen; Georgi Georgiev; Yujie Ji; Ting Wang

arXiv:1712.00558 · cs.LG · December 5, 2017

Open Access

TL;DR

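This paper introduces an adversarial input detection framework that contrasts a model's interpretation of an input with its classification. It builds on the defining property of adversarial inputs: they deceive deep neural networks yet are barely discernible to humans. The aim is a detector that is harder for adaptive adversaries to circumvent than pattern-based methods.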

Contribution

The paper proposes an interpretability-based detection approach. Rather than relying on carefully engineered patterns, as most existing detectors do, it contrasts an input's interpretation with its classification.

Findings

01. Effective detection across multiple benchmark datasets

02. Robust against adaptive adversarial attacks

03. Opens new directions in adversarial input detection

Abstract

An intriguing property of deep neural networks is their inherent vulnerability to adversarial inputs, which significantly hinders their application in security-critical domains. Most existing detection methods attempt to use carefully engineered patterns to distinguish adversarial inputs from their genuine counterparts; these patterns, however, can often be circumvented by adaptive adversaries. In this work, we take a completely different route by leveraging the definition of adversarial inputs: while deceptive to deep neural networks, they are barely discernible to human vision. Building upon recent advances in interpretable models, we construct a new detection framework that contrasts an input's interpretation against its classification. We validate the efficacy of this framework through extensive experiments using benchmark datasets and attacks. We believe that this work opens a new…
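The abstract gives no implementation details, so the following is only a toy sketch of what "contrasting an input's interpretation against its classification" could look like. It is not the authors' framework. The linear classifier, the gradient-times-input saliency used as the interpreter, the per-class reference profiles, the cosine threshold, and every name below are assumptions chosen for illustration. The toy weights deliberately include one fragile feature (index 4) that clean inputs never use, so that a small perturbation can flip the label while producing an atypical interpretation.

```python
import math

def predict(weights, x):
    """Return the index of the class with the highest linear score."""
    scores = [sum(w * v for w, v in zip(row, x)) for row in weights]
    return max(range(len(scores)), key=scores.__getitem__)

def interpret(weights, x, label):
    """Gradient-times-input saliency for `label` (a stand-in interpreter)."""
    return [w * v for w, v in zip(weights[label], x)]

def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(p * q for p, q in zip(a, b))
    norm = math.sqrt(sum(p * p for p in a)) * math.sqrt(sum(q * q for q in b))
    return dot / norm if norm else 0.0

def class_profiles(weights, clean_inputs):
    """Average the saliency maps of clean inputs, grouped by predicted class."""
    sums, counts = {}, {}
    for x in clean_inputs:
        label = predict(weights, x)
        saliency = interpret(weights, x, label)
        acc = sums.get(label, [0.0] * len(x))
        sums[label] = [a + s for a, s in zip(acc, saliency)]
        counts[label] = counts.get(label, 0) + 1
    return {k: [v / counts[k] for v in total] for k, total in sums.items()}

def is_suspicious(weights, profiles, x, threshold=0.5):
    """Flag x when its interpretation disagrees with its predicted class.

    The threshold of 0.5 is an arbitrary illustrative value.
    """
    label = predict(weights, x)
    reference = profiles.get(label)
    if reference is None:  # no clean examples were seen for this class
        return True
    return cosine(interpret(weights, x, label), reference) < threshold

# Hypothetical two-class model over five features.
# Feature 4 carries a large class-1 weight but is ~0 in all clean data.
WEIGHTS = [
    [1.0, 1.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 1.0, 5.0],
]
CLEAN_INPUTS = [
    [1.0, 0.9, 0.1, 0.0, 0.0],
    [0.9, 1.0, 0.0, 0.1, 0.0],
    [0.0, 0.1, 1.0, 0.9, 0.0],
    [0.1, 0.0, 0.9, 1.0, 0.0],
]
PROFILES = class_profiles(WEIGHTS, CLEAN_INPUTS)

GENUINE = [0.05, 0.0, 1.0, 1.0, 0.0]    # ordinary class-1 input
PERTURBED = [1.0, 0.9, 0.1, 0.0, 0.4]   # class-0 input nudged on feature 4

if __name__ == "__main__":
    for name, x in [("genuine", GENUINE), ("perturbed", PERTURBED)]:
        print(name, predict(WEIGHTS, x), is_suspicious(WEIGHTS, PROFILES, x))
```

In this toy setting both inputs are classified as class 1, but only the perturbed one draws its evidence from the fragile feature, so its saliency map diverges from the class-1 profile and it is flagged. A realistic detector would replace the linear model and hand-set threshold with a trained network, a proper interpretation method, and a learned decision rule.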

Taxonomy

Topics: Adversarial Robustness in Machine Learning · Anomaly Detection Techniques and Applications · Domain Adaptation and Few-Shot Learning