When Not to Classify: Anomaly Detection of Attacks (ADA) on DNN Classifiers at Test Time
David J. Miller, Yujia Wang, George Kesidis

TL;DR
This paper introduces an unsupervised anomaly detection method for identifying adversarial attacks on DNN classifiers at test time. It argues that in certain attack scenarios, detecting the attack is more useful than correctly classifying the attacked input.
Contribution
It proposes a novel unsupervised anomaly detector that models the densities of deep-layer DNN features and exploits relationships between classes. The detector outperforms previous detection methods on standard image datasets.
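This summary does not spell out the detector's statistic. As a rough illustration of the general idea of scoring deep-layer features against densities learned from clean data, the sketch below fits one diagonal Gaussian per class to clean deep-layer feature vectors. It then flags a test input whose features are unlikely under the class the DNN predicted. This is an assumption-laden simplification, not the authors' ADA statistic: the paper's density models and its use of class relationships are not reproduced here. The sketch also shows the detect-then-classify policy described in the abstract, where a decision is released only when no attack is detected. All function names, the toy features, and the threshold are hypothetical.

```python
import math
from collections import defaultdict
from typing import Dict, List, Optional, Sequence, Tuple

Vector = Sequence[float]
# Hypothetical per-class model: (mean vector, variance vector) of a diagonal Gaussian.
Params = Dict[int, Tuple[List[float], List[float]]]


def fit_class_gaussians(features: List[Vector], labels: List[int],
                        eps: float = 1e-6) -> Params:
    """Fit one diagonal Gaussian per class to clean deep-layer features (assumed model)."""
    by_class: Dict[int, List[Vector]] = defaultdict(list)
    for x, y in zip(features, labels):
        by_class[y].append(x)
    params: Params = {}
    for y, xs in by_class.items():
        n, d = len(xs), len(xs[0])
        mean = [sum(x[j] for x in xs) / n for j in range(d)]
        # eps keeps variances strictly positive so the log-density is defined.
        var = [sum((x[j] - mean[j]) ** 2 for x in xs) / n + eps for j in range(d)]
        params[y] = (mean, var)
    return params


def log_density(x: Vector, mean: List[float], var: List[float]) -> float:
    """Log-likelihood of x under a diagonal Gaussian with the given mean and variance."""
    return -0.5 * sum(math.log(2 * math.pi * v) + (xi - m) ** 2 / v
                      for xi, m, v in zip(x, mean, var))


def anomaly_score(x: Vector, predicted_class: int, params: Params) -> float:
    """Hypothetical statistic: negative log-likelihood under the DNN's predicted class.
    Higher means the features look less like clean examples of that class."""
    mean, var = params[predicted_class]
    return -log_density(x, mean, var)


def detect_then_classify(x: Vector, predicted_class: int, params: Params,
                         threshold: float) -> Optional[int]:
    """Return None if an attack is flagged (no decision released), else the class."""
    if anomaly_score(x, predicted_class, params) > threshold:
        return None
    return predicted_class


# Toy 2-D "deep-layer features" for two classes (made-up illustration data).
train_x = [[0.0, 0.1], [0.2, -0.1], [-0.1, 0.0], [5.0, 5.1], [5.2, 4.9], [4.9, 5.0]]
train_y = [0, 0, 0, 1, 1, 1]
params = fit_class_gaussians(train_x, train_y)

# Clean input whose features match its predicted class 0: decision released.
print(detect_then_classify([0.05, 0.0], 0, params, threshold=10.0))  # → 0
# Features lie in the class-1 region, yet the DNN says class 0: flagged.
print(detect_then_classify([5.0, 5.0], 0, params, threshold=10.0))  # → None
```

In practice the threshold would be chosen on clean validation data, for example to fix the false-positive rate. A single Gaussian per class is a deliberate simplification; mixture models are a common richer choice for multimodal class densities.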
Findings
Outperforms previous detection methods on MNIST and CIFAR-10.
Achieves high detection ROC AUC against multiple attack strategies.
Remains effective even under a fully white-box attack scenario.
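The findings report detection performance as ROC AUC. For a detector that outputs a scalar anomaly score, ROC AUC equals the probability that a randomly chosen attacked input scores higher than a randomly chosen clean input, with ties counted as one half. The sketch below computes it directly from that pairwise definition. The scores are made up, and the paper's exact evaluation protocol is not described in this summary.

```python
from typing import Sequence


def roc_auc(clean_scores: Sequence[float], attack_scores: Sequence[float]) -> float:
    """ROC AUC via its pairwise (Mann-Whitney) form:
    P(attacked score > clean score), counting ties as 1/2."""
    if not clean_scores or not attack_scores:
        raise ValueError("need at least one clean and one attacked score")
    wins = 0.0
    for a in attack_scores:
        for c in clean_scores:
            if a > c:
                wins += 1.0
            elif a == c:
                wins += 0.5
    return wins / (len(attack_scores) * len(clean_scores))


# Hypothetical anomaly scores: 10 of the 12 (attacked, clean) pairs are ranked correctly.
auc = roc_auc([0.1, 0.4, 0.35, 0.8], [0.9, 0.5, 0.7])
print(auc)
```

The double loop is O(nm) and meant only to make the definition explicit. A sort-based rank computation gives the same value in O((n+m) log(n+m)) for large evaluation sets.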
Abstract
A significant threat to the recent, wide deployment of machine learning-based systems, including deep neural networks (DNNs), is adversarial learning attacks. We analyze possible test-time evasion-attack mechanisms and show that, in some important cases, when the image has been attacked, correctly classifying it has no utility: i) when the image to be attacked is (even arbitrarily) selected from the attacker's cache; ii) when the sole recipient of the classifier's decision is the attacker. Moreover, in some application domains and scenarios it is highly actionable to detect the attack irrespective of correctly classifying in the face of it (with classification still performed if no attack is detected). We hypothesize that, even if human-imperceptible, adversarial perturbations are machine-detectable. We propose a purely unsupervised anomaly detector (AD) that, unlike previous works: i)…
