Early Methods for Detecting Adversarial Images

Dan Hendrycks; Kevin Gimpel

arXiv:1608.00530 · cs.LG · March 27, 2017 · 122 cites

TL;DR

This paper introduces three methods for detecting adversarial images fed to machine learning classifiers. It shows that adversarial examples place abnormal emphasis on lower-ranked PCA components and discusses how robust these detectors are to adversaries who try to bypass them.

Contribution

The paper presents three detectors for adversarial images, centered on a PCA-based analysis, and argues that adversaries trying to bypass them are forced to make their adversarial images less pathological.

Findings

01

Adversarial images place abnormal emphasis on lower-ranked PCA components.

02

The proposed detectors can distinguish adversarial images from clean ones.

03

To bypass detection, adversaries must make their adversarial images less pathological.

Abstract

Many machine learning classifiers are vulnerable to adversarial perturbations. An adversarial perturbation modifies an input to change a classifier's prediction without causing the input to seem substantially different to human perception. We deploy three methods to detect adversarial images. Adversaries trying to bypass our detectors must make the adversarial image less pathological or they will fail trying. Our best detection method reveals that adversarial images place abnormal emphasis on the lower-ranked principal components from PCA. Other detectors and a colorful saliency map are in an appendix.
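
The abstract's main finding, that adversarial images put unusual weight on lower-ranked principal components, can be illustrated with a small NumPy sketch. This is not the authors' detector (their code is in the hendrycks/fooling repository). It assumes one simple statistic: fit PCA on clean data, whiten the coefficients, and measure the variance of the coefficients ranked below the top k components. The names fit_pca and tail_score, the value of k, the synthetic data, and the random ±0.3 perturbation (a stand-in only; real attacks are optimized, not random) are all hypothetical choices made for illustration.

```python
import numpy as np

def fit_pca(clean):
    """Fit PCA to clean, flattened inputs of shape (n_samples, n_features).

    Returns the mean, the principal axes as rows (highest to lowest
    variance), and the standard deviation of the data along each axis."""
    mean = clean.mean(axis=0)
    # numpy returns singular values in descending order, so the rows of vt
    # are already ranked from highest- to lowest-variance component.
    _, s, vt = np.linalg.svd(clean - mean, full_matrices=False)
    std = s / np.sqrt(len(clean) - 1)
    return mean, vt, std

def tail_score(x, mean, vt, std, k):
    """Hypothetical statistic: variance of the whitened PCA coefficients
    ranked below the top k components. Larger means more tail emphasis."""
    whitened = ((x - mean) @ vt.T) / std  # project, then scale to unit variance
    return whitened[:, k:].var(axis=1)

rng = np.random.default_rng(0)
d, k = 20, 10
scales = np.linspace(3.0, 0.05, d)  # synthetic data with a decaying spectrum
train = rng.standard_normal((2000, d)) * scales
held_out = rng.standard_normal((500, d)) * scales
# Stand-in perturbation (assumption): a small random +/-0.3 sign pattern.
perturbed = held_out + 0.3 * rng.choice([-1.0, 1.0], size=held_out.shape)

mean, vt, std = fit_pca(train)
clean_scores = tail_score(held_out, mean, vt, std, k)
pert_scores = tail_score(perturbed, mean, vt, std, k)

# Flag any input scoring above the 95th percentile of clean scores.
threshold = np.percentile(clean_scores, 95)
print("clean flagged:", np.mean(clean_scores > threshold))
print("perturbed flagged:", np.mean(pert_scores > threshold))
```

Whitening divides each coefficient by the clean-data spread along that axis, so low-variance components are amplified: even a small perturbation that leaks energy into them raises the score sharply, which is the intuition behind the finding under these assumptions.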

Code & Models

Repositories

hendrycks/fooling (TensorFlow, official)

Taxonomy

Topics: Adversarial Robustness in Machine Learning · Anomaly Detection Techniques and Applications · Cell Image Analysis Techniques

Methods: Principal Components Analysis