Revisiting Model's Uncertainty and Confidences for Adversarial Example Detection
Ahmed Aldahdooh, Wassim Hamidouche, and Olivier Déforges

TL;DR
This paper introduces SFAD, an unsupervised ensemble method that improves adversarial example detection by leveraging model uncertainty and feature map analysis, outperforming existing techniques especially against black- and gray-box attacks.
Contribution
The paper proposes SFAD, a novel unsupervised ensemble detection mechanism using SelectiveNet and feature maps, achieving superior robustness against various adversarial attacks.
Findings
Outperforms state-of-the-art detection methods against black- and gray-box attacks.
Achieves comparable performance to top methods against white-box attacks.
Fully robust against High Confidence Attacks on MNIST, partially robust on CIFAR10.
Abstract
Security-sensitive applications that rely on Deep Neural Networks (DNNs) are vulnerable to small perturbations crafted to generate Adversarial Examples (AEs). AEs are imperceptible to humans and cause the DNN to misclassify them. Many defense and detection techniques have been proposed. The model's confidences and Dropout, a popular way to estimate the model's uncertainty, have been used for AE detection, but they have shown limited success against black- and gray-box attacks. Moreover, state-of-the-art detection techniques have been designed for specific attacks or broken by others, need knowledge about the attacks, are not consistent, increase model parameter overhead, are time-consuming, or add latency at inference time. To trade off these factors, we revisit the model's uncertainty and confidences and propose a novel unsupervised ensemble AE detection mechanism that 1) uses…
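The abstract names softmax confidence and Dropout-based uncertainty as signals that earlier detectors relied on. As a rough illustration of that baseline idea only (this is not SFAD, and not the authors' code), the sketch below runs several stochastic forward passes through a toy two-layer network with dropout left on, then flags inputs whose mean confidence is low or whose predictive entropy is high. The network, the function names (`mc_dropout_probs`, `uncertainty_scores`, `flag_suspicious`), and the thresholds are all assumptions made for the example.

```python
import numpy as np

def softmax(z):
    # Numerically stable softmax over the last axis.
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def mc_dropout_probs(x, w1, w2, rng, n_passes=50, drop_rate=0.5):
    """Run n_passes stochastic forward passes with dropout kept active.

    Toy two-layer ReLU network (hypothetical, for illustration only).
    Returns an array of shape (n_passes, batch, classes).
    """
    outs = []
    for _ in range(n_passes):
        h = np.maximum(x @ w1, 0.0)                # hidden activations
        mask = rng.random(h.shape) >= drop_rate    # fresh dropout mask per pass
        h = h * mask / (1.0 - drop_rate)           # inverted-dropout scaling
        outs.append(softmax(h @ w2))
    return np.stack(outs)

def uncertainty_scores(probs):
    """Mean-prediction confidence and predictive entropy per input."""
    mean_p = probs.mean(axis=0)
    confidence = mean_p.max(axis=-1)
    entropy = -(mean_p * np.log(mean_p + 1e-12)).sum(axis=-1)
    return confidence, entropy

def flag_suspicious(confidence, entropy, conf_thr=0.9, ent_thr=0.5):
    """Flag inputs with low confidence or high entropy.

    The thresholds are placeholders; in practice they would be tuned
    on clean validation data.
    """
    return (confidence < conf_thr) | (entropy > ent_thr)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.normal(size=(4, 8))        # 4 toy inputs with 8 features
    w1 = rng.normal(size=(8, 16))      # random, untrained weights
    w2 = rng.normal(size=(16, 3))
    probs = mc_dropout_probs(x, w1, w2, rng)
    conf, ent = uncertainty_scores(probs)
    print(flag_suspicious(conf, ent))
```

Because the detector only thresholds statistics of the model's own outputs, it needs no adversarial examples at training time, which is the sense in which such approaches are unsupervised. The abstract notes that these signals alone showed limited success against black- and gray-box attacks.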
Taxonomy
Topics: Adversarial Robustness in Machine Learning · Anomaly Detection Techniques and Applications
Methods: Autoencoders · Dropout
