Adversarial vulnerability of powerful near out-of-distribution detection

Stanislav Fort

arXiv:2201.07012 · cs.LG · January 19, 2022 · 1 citation

TL;DR

This paper reveals that current state-of-the-art out-of-distribution detection methods for neural networks are highly vulnerable to small, targeted adversarial perturbations, compromising their reliability especially in near OOD scenarios.

Contribution

The study demonstrates the severe adversarial vulnerabilities of leading OOD detection techniques and evaluates the robustness of various post-processing methods, proposing ensemble and Relative Mahalanobis approaches for improved resilience.

Findings

01 Adversarial perturbations can invert in-distribution and out-of-distribution classifications.

02 Ensemble methods and Relative Mahalanobis distance improve robustness against adversarial attacks.

03 Zero-shot OOD detection with CLIP also suffers from significant adversarial vulnerabilities.
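
To make the first finding concrete, here is a minimal sketch of how a small, bounded input perturbation can move a Maximum of Softmax Probabilities (MSP) score in either direction. Everything in it is an assumption for illustration: the linear softmax model (`W`, `b`), the random stand-in input `x`, the budget `eps`, and the signed-gradient update are hypothetical choices, not the models or the attack used in the paper.

```python
import numpy as np

def softmax(z):
    z = z - z.max()                  # stabilise the exponentials
    e = np.exp(z)
    return e / e.sum()

def msp(W, b, x):
    """Maximum softmax probability; a high value reads as 'in-distribution'."""
    return softmax(W @ x + b).max()

def msp_grad(W, b, x):
    """Analytic gradient of the MSP with respect to x for a linear model."""
    p = softmax(W @ x + b)
    k = int(p.argmax())
    onehot = np.zeros_like(p)
    onehot[k] = 1.0
    return W.T @ (p[k] * (onehot - p))

def perturb(W, b, x, eps=0.05, steps=10, direction=-1.0):
    """Signed-gradient steps inside an L-infinity ball of radius eps.

    direction=-1 pushes the MSP down (towards 'OOD'), +1 pushes it up.
    The best iterate seen so far is returned, so the score never moves
    the wrong way relative to the clean input.
    """
    x_adv, best = x.copy(), x.copy()
    step = eps / steps
    for _ in range(steps):
        x_adv = x_adv + direction * step * np.sign(msp_grad(W, b, x_adv))
        x_adv = np.clip(x_adv, x - eps, x + eps)
        if direction * msp(W, b, x_adv) > direction * msp(W, b, best):
            best = x_adv.copy()
    return best

rng = np.random.default_rng(0)
W = 0.1 * rng.normal(size=(10, 64))  # hypothetical 10-class linear model
b = np.zeros(10)
x = rng.normal(size=64)              # stand-in for a flattened image

x_down = perturb(W, b, x, direction=-1.0)
x_up = perturb(W, b, x, direction=+1.0)
print("MSP lowered / clean / raised:",
      msp(W, b, x_down), msp(W, b, x), msp(W, b, x_up))
```

With a fixed detection threshold on the MSP, moving the score across that threshold is what flips an input between the in-distribution and out-of-distribution decisions.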

Abstract

There has been significant progress in detecting out-of-distribution (OOD) inputs in neural networks recently, primarily due to the use of large models pretrained on large datasets, and an emerging use of multi-modality. We show a severe adversarial vulnerability of even the strongest current OOD detection techniques. With a small, targeted perturbation to the input pixels, we can easily change the image assignment from in-distribution to out-of-distribution, and vice versa. In particular, we demonstrate severe adversarial vulnerability on the challenging near OOD CIFAR-100 vs CIFAR-10 task, as well as on the far OOD CIFAR-100 vs SVHN. We study the adversarial robustness of several post-processing techniques, including the simple baseline of Maximum of Softmax Probabilities (MSP), the Mahalanobis distance, and the newly proposed Relative Mahalanobis distance. By comparing…
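
The abstract names the Mahalanobis and Relative Mahalanobis scores without defining them. The sketch below follows the commonly used formulation as an assumption: class-conditional Gaussians with a shared covariance for the Mahalanobis distance, and that distance minus the distance to a single class-agnostic Gaussian for the relative variant. The function names, the ridge term, and the 2-D toy features are hypothetical; in practice the features would come from a pretrained network's embedding layer.

```python
import numpy as np

def fit(features, labels):
    """Fit class means with a shared covariance, plus one class-agnostic Gaussian."""
    classes, idx = np.unique(labels, return_inverse=True)
    means = np.stack([features[idx == k].mean(axis=0)
                      for k in range(len(classes))])
    ridge = 1e-6 * np.eye(features.shape[1])   # keeps the inverses well defined
    centered = features - means[idx]
    prec = np.linalg.inv(centered.T @ centered / len(features) + ridge)
    mean0 = features.mean(axis=0)
    centered0 = features - mean0
    prec0 = np.linalg.inv(centered0.T @ centered0 / len(features) + ridge)
    return means, prec, mean0, prec0

def sq_mahalanobis(z, mean, prec):
    """Squared Mahalanobis distance; `mean` may hold one mean or one per class."""
    diff = z - mean
    return np.einsum("...i,ij,...j->...", diff, prec, diff)

def ood_scores(z, params):
    """Both scores for one feature vector z; higher means 'more OOD'."""
    means, prec, mean0, prec0 = params
    md_k = sq_mahalanobis(z, means, prec)      # one distance per class
    md_0 = sq_mahalanobis(z, mean0, prec0)     # distance to the background fit
    return {"mahalanobis": float(md_k.min()),
            "relative_mahalanobis": float((md_k - md_0).min())}

# Toy 2-D "features": two in-distribution classes centred near (+3, 0) and (-3, 0).
rng = np.random.default_rng(0)
features = np.concatenate([rng.normal(size=(200, 2)) + [3.0, 0.0],
                           rng.normal(size=(200, 2)) + [-3.0, 0.0]])
labels = np.repeat([0, 1], 200)
params = fit(features, labels)

near = params[0][0]              # exactly the fitted mean of class 0
far = np.array([0.0, 8.0])       # a point away from both classes
scores_near = ood_scores(near, params)
scores_far = ood_scores(far, params)
print(scores_near, scores_far)
```

Because the background distance does not depend on the class, the relative score equals the plain Mahalanobis score minus that background term; the subtraction discounts directions in feature space that vary a lot across the whole training set.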

Code & Models

Repositories

stanislavfort/adversaries_to_ood_detection
jax · Official

Taxonomy

Topics: Adversarial Robustness in Machine Learning · Advanced Neural Network Applications · Integrated Circuits and Semiconductor Failure Analysis

Methods: Softmax · Contrastive Language-Image Pre-training