Adversarial Examples Are Not Easily Detected: Bypassing Ten Detection Methods

Nicholas Carlini; David Wagner

arXiv:1705.07263·cs.LG·November 2, 2017·330 cites

TL;DR

This paper demonstrates that existing adversarial example detection methods can be bypassed, revealing that adversarial examples are more difficult to detect than previously believed and challenging assumptions about their intrinsic properties.

Contribution

It provides a comprehensive survey of ten detection methods, shows they can all be defeated with new loss functions, and offers guidelines for evaluating future defenses.

Findings

01. All ten detection methods can be bypassed.

02. Adversarial examples are significantly harder to detect than previously appreciated.

03. Properties previously believed to be intrinsic to adversarial examples are in fact not intrinsic.

Abstract

Neural networks are known to be vulnerable to adversarial examples: inputs that are close to natural inputs but classified incorrectly. In order to better understand the space of adversarial examples, we survey ten recent proposals that are designed for detection and compare their efficacy. We show that all can be defeated by constructing new loss functions. We conclude that adversarial examples are significantly harder to detect than previously appreciated, and the properties believed to be intrinsic to adversarial examples are in fact not. Finally, we propose several simple guidelines for evaluating future proposed defenses.
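To make the idea of "constructing new loss functions" concrete, the sketch below shows one simple way an attack objective might account for a detector as well as a classifier. It is an illustrative assumption, not the paper's exact formulation: the function name `combined_attack_loss`, the hinge-style terms, and the weight `c` are all hypothetical. The loss reaches zero only when the input is classified as the attacker's target class and the detector's score falls at or below zero (taken here to mean "judged natural").

```python
import numpy as np

def combined_attack_loss(class_logits, detector_logit, target, c=1.0):
    """Hypothetical attack objective against a classifier plus a detector.

    class_logits   : classifier logits for one input, one entry per class
    detector_logit : detector score; > 0 is assumed to mean "flagged as adversarial"
    target         : index of the class the attacker wants predicted
    c              : assumed weight balancing the two terms
    """
    class_logits = np.asarray(class_logits, dtype=float)

    # Misclassification term: positive while some other class still
    # outscores the target class, zero once the target has the top logit.
    others = np.delete(class_logits, target)
    classify_term = max(others.max() - class_logits[target], 0.0)

    # Evasion term: positive while the detector still flags the input.
    detect_term = max(float(detector_logit), 0.0)

    return classify_term + c * detect_term

# Target class 0 is not yet the top class and the detector still fires.
print(combined_attack_loss([1.0, 3.0, 2.0], 0.5, target=0))   # 2.5
# Target class 0 wins and the detector is silent, so the loss is zero.
print(combined_attack_loss([5.0, 1.0, 2.0], -1.0, target=0))  # 0.0
```

In an actual attack, a loss of this kind would be minimized over the input perturbation with a gradient-based optimizer, typically alongside a penalty on the perturbation's size. That step is omitted here.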

Taxonomy

Topics: Adversarial Robustness in Machine Learning · Explainable Artificial Intelligence (XAI)