Closeness and Uncertainty Aware Adversarial Examples Detection in Adversarial Machine Learning
Omer Faruk Tuna, Ferhat Ozgur Catak, M. Taner Eskil

TL;DR
This paper proposes and evaluates two complementary metrics, based on uncertainty estimates and deep feature subspace analysis, for detecting adversarial examples in DNNs, achieving high ROC-AUC scores across multiple datasets.
Contribution
It introduces a novel subspace-based detection method and combines it with uncertainty estimates to improve adversarial sample detection in neural networks.
Findings
Combined metrics achieve up to 99% ROC-AUC
Effective across multiple datasets and attack algorithms
Uncertainty and feature subspace methods complement each other
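The ROC-AUC figure above measures how well a scalar detection score ranks adversarial inputs above clean ones. As a minimal sketch, assuming a detector that outputs one score per input (higher meaning "more likely adversarial"), the metric can be computed from pairwise comparisons. The function name `roc_auc` and the score values are hypothetical and chosen only for illustration; they are not results from the paper.

```python
import numpy as np

def roc_auc(scores_adv, scores_clean):
    """ROC-AUC as the probability that a random adversarial score
    exceeds a random clean score (ties count as one half)."""
    adv = np.asarray(scores_adv, dtype=float)[:, None]      # shape (A, 1)
    clean = np.asarray(scores_clean, dtype=float)[None, :]  # shape (1, C)
    wins = (adv > clean).sum()
    ties = (adv == clean).sum()
    return (wins + 0.5 * ties) / (adv.size * clean.size)

# Hypothetical detection scores, for illustration only.
adv_scores = [0.9, 0.8, 0.4]
clean_scores = [0.1, 0.3, 0.5]
auc = roc_auc(adv_scores, clean_scores)
print(f"ROC-AUC: {auc:.3f}")
```

A value of 1.0 would mean every adversarial input scores above every clean input, while 0.5 corresponds to chance-level ranking.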
Abstract
While state-of-the-art Deep Neural Network (DNN) models are considered robust to random perturbations, these architectures have been shown to be highly vulnerable to deliberately crafted perturbations, even when those perturbations are quasi-imperceptible. These vulnerabilities make it challenging to deploy DNN models in security-critical areas. In recent years, many research studies have been conducted to develop new attack methods and new defense techniques that enable more robust and reliable models. In this work, we explore and assess the use of different types of metrics for detecting adversarial samples. We first leverage moment-based predictive uncertainty estimates of a DNN classifier obtained using Monte-Carlo Dropout Sampling. We also introduce a new method that operates in the subspace of deep features extracted by the model. We verified the effectiveness of…
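The Monte-Carlo Dropout Sampling step mentioned in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes a toy two-layer network with random weights, keeps dropout active at inference time, and uses the variance of the softmax outputs across stochastic passes as one possible moment-based uncertainty score. The names `mc_dropout_predict`, `W1`, `W2`, `p_drop`, and `n_samples` are hypothetical.

```python
import numpy as np

def softmax(z):
    # Subtract the row-wise max for numerical stability.
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def mc_dropout_predict(x, W1, W2, p_drop=0.5, n_samples=50, rng=None):
    """Run n_samples stochastic forward passes with dropout enabled.

    Returns the mean class probabilities, shape (N, C), and a per-input
    uncertainty score, shape (N,), computed as the variance of the
    softmax outputs over passes, summed over classes.
    """
    rng = np.random.default_rng(0) if rng is None else rng
    probs = []
    for _ in range(n_samples):
        h = np.maximum(x @ W1, 0.0)            # hidden layer with ReLU
        mask = rng.random(h.shape) >= p_drop   # fresh dropout mask per pass
        h = h * mask / (1.0 - p_drop)          # inverted-dropout scaling
        probs.append(softmax(h @ W2))
    probs = np.stack(probs)                    # shape (T, N, C)
    mean_probs = probs.mean(axis=0)
    uncertainty = probs.var(axis=0).sum(axis=-1)
    return mean_probs, uncertainty

# Toy model and inputs (assumed for illustration; no real data).
rng = np.random.default_rng(42)
W1 = rng.normal(size=(4, 16))
W2 = rng.normal(size=(16, 3))
x = rng.normal(size=(5, 4))
mean_probs, uncertainty = mc_dropout_predict(x, W1, W2, rng=rng)
print(mean_probs.shape, uncertainty.shape)
```

In a detection setting, inputs whose uncertainty score exceeds a chosen threshold would be flagged as suspicious; how that threshold is chosen is outside the scope of this sketch.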
Taxonomy
Topics: Adversarial Robustness in Machine Learning · Anomaly Detection Techniques and Applications
Methods: Dropout
