Adversarial vulnerability of powerful near out-of-distribution detection
Stanislav Fort

TL;DR
This paper reveals that current state-of-the-art out-of-distribution detection methods for neural networks are highly vulnerable to small, targeted adversarial perturbations, compromising their reliability especially in near OOD scenarios.
Contribution
The study demonstrates severe adversarial vulnerabilities in leading OOD detection techniques, evaluates the robustness of several post-processing methods (MSP, Mahalanobis distance, and Relative Mahalanobis distance), and finds that ensembling and the Relative Mahalanobis distance improve resilience.
Findings
Small, targeted perturbations to the input pixels can flip an image's assignment from in-distribution to out-of-distribution, and vice versa.
Ensemble methods and Relative Mahalanobis distance improve robustness against adversarial attacks.
Zero-shot OOD detection with CLIP also suffers from significant adversarial vulnerabilities.
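To make the first finding concrete, here is a minimal sketch of how a sign-gradient step on the input can push a Maximum of Softmax Probabilities (MSP) score up or down. It is an illustration under stated assumptions, not the paper's attack: the linear classifier, the single-step sign-gradient update (in the style of FGSM), and the names `msp`, `msp_input_gradient`, and `sign_step` are all hypothetical stand-ins for a real network and attack procedure.

```python
import numpy as np


def softmax(z):
    """Numerically stable softmax over a 1-D logit vector."""
    e = np.exp(z - z.max())
    return e / e.sum()


def msp(W, b, x):
    """MSP score of a toy linear classifier; low values are read as 'more OOD'."""
    return softmax(W @ x + b).max()


def msp_input_gradient(W, b, x):
    """Gradient of the top-class softmax probability with respect to the input x."""
    p = softmax(W @ x + b)
    k = int(p.argmax())
    # d p_k / d z_j = p_k * (1[j == k] - p_j)
    dp_dz = -p[k] * p
    dp_dz[k] += p[k]
    return W.T @ dp_dz


def sign_step(W, b, x, eps, raise_confidence):
    """One sign-gradient step of size eps that raises or lowers the MSP score."""
    direction = 1.0 if raise_confidence else -1.0
    return x + direction * eps * np.sign(msp_input_gradient(W, b, x))


# Hypothetical two-class toy problem (not from the paper).
W = np.eye(2)
b = np.zeros(2)
x = np.array([2.0, 0.0])

x_low = sign_step(W, b, x, eps=0.5, raise_confidence=False)   # looks "more OOD"
x_high = sign_step(W, b, x, eps=0.5, raise_confidence=True)   # looks "more in-distribution"
print(msp(W, b, x_low), msp(W, b, x), msp(W, b, x_high))
```

The same idea applies in both directions: lowering the score makes an in-distribution input look OOD, while raising it makes an OOD input look in-distribution. A real attack on a deep network would obtain the gradient by automatic differentiation and typically take several smaller steps.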
Abstract
There has been significant progress in detecting out-of-distribution (OOD) inputs in neural networks recently, primarily due to the use of large models pretrained on large datasets, and an emerging use of multi-modality. We show a severe adversarial vulnerability of even the strongest current OOD detection techniques. With a small, targeted perturbation to the input pixels, we can easily change the image assignment from in-distribution to out-of-distribution, and vice versa. In particular, we demonstrate severe adversarial vulnerability on the challenging near OOD CIFAR-100 vs CIFAR-10 task, as well as on the far OOD CIFAR-100 vs SVHN. We study the adversarial robustness of several post-processing techniques, including the simple baseline of Maximum of Softmax Probabilities (MSP), the Mahalanobis distance, and the newly proposed Relative Mahalanobis distance. By comparing…
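The abstract names the Mahalanobis and Relative Mahalanobis distances without defining them. The sketch below follows the formulation commonly used in the OOD literature, which is an assumption here since the truncated abstract does not spell it out: class-conditional Gaussians with a shared covariance are fit to in-distribution features, and the relative variant subtracts the distance under a single class-agnostic "background" Gaussian. Function names and the synthetic features are hypothetical; higher scores mean "more OOD".

```python
import numpy as np


def fit_gaussians(feats, labels):
    """Fit class means with a shared covariance, plus one background Gaussian."""
    classes = np.unique(labels)
    means = np.stack([feats[labels == c].mean(axis=0) for c in classes])
    # Center each feature vector by its own class mean for the shared covariance.
    centered = feats - means[np.searchsorted(classes, labels)]
    shared_prec = np.linalg.pinv(centered.T @ centered / len(feats))
    mean0 = feats.mean(axis=0)
    prec0 = np.linalg.pinv(np.cov(feats, rowvar=False, bias=True))
    return means, shared_prec, mean0, prec0


def sq_mahalanobis(x, mean, prec):
    """Squared Mahalanobis distance; broadcasts over leading dimensions."""
    d = x - mean
    return np.einsum("...i,ij,...j->...", d, prec, d)


def mahalanobis_score(x, means, shared_prec):
    """Distance to the closest class Gaussian, for x of shape (n, dim)."""
    return sq_mahalanobis(x[:, None, :], means[None, :, :], shared_prec).min(axis=1)


def relative_mahalanobis_score(x, means, shared_prec, mean0, prec0):
    """Class distance minus background distance, minimized over classes."""
    d_class = sq_mahalanobis(x[:, None, :], means[None, :, :], shared_prec)
    d_background = sq_mahalanobis(x, mean0, prec0)
    return (d_class - d_background[:, None]).min(axis=1)


# Synthetic 2-D "features" for two classes (illustrative only).
rng = np.random.default_rng(0)
feats = np.concatenate([rng.normal(0.0, 1.0, (500, 2)), rng.normal(10.0, 1.0, (500, 2))])
labels = np.repeat([0, 1], 500)
params = fit_gaussians(feats, labels)

queries = np.array([[0.0, 0.0], [100.0, -100.0]])  # near a class mean, far from both
md = mahalanobis_score(queries, params[0], params[1])
rmd = relative_mahalanobis_score(queries, *params)
print(md, rmd)
```

In practice the features would come from a pretrained network's penultimate layer rather than from synthetic Gaussians. Because the background term does not depend on the class, the relative score equals the plain Mahalanobis score minus the background distance.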
Taxonomy
Topics: Adversarial Robustness in Machine Learning · Advanced Neural Network Applications · Integrated Circuits and Semiconductor Failure Analysis
Methods: Softmax · Contrastive Language-Image Pre-training
