The Limitations of Model Uncertainty in Adversarial Settings
Kathrin Grosse, David Pfaff, Michael Thomas Smith, Michael Backes

TL;DR
This paper examines the limitations of Bayesian neural network (BNN) uncertainty measures in adversarial settings. It shows that both confidence and uncertainty can look unsuspicious even when the model's prediction is wrong, which makes them unreliable indicators of model reliability.
Contribution
The study shows that BNN uncertainty measures can be unreliable under adversarial perturbations. Because these measures are highly non-smooth, it uses a smooth Gaussian process classifier (GPC) as a substitute to analyze their behavior.
Findings
Uncertainty and confidence can be misleading in adversarial scenarios.
Bayesian neural network uncertainty measures can appear unsuspicious even when predictions are wrong.
For most tasks, the features that influence uncertainty differ subtly from those that influence confidence.
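The findings above hinge on how "confidence" and "uncertainty" are read off a BNN. The sketch below is a minimal illustration, assuming the BNN is queried with several stochastic forward passes (for example MC dropout) and assuming two common definitions: confidence as the largest mean class probability, and uncertainty as the variance of that class's probability across samples. These definitions, the function names, and the detector thresholds are hypothetical choices for illustration, not necessarily the paper's. It shows why a wrong prediction can pass unnoticed: if all samples agree, confidence is high and variance is low.

```python
from statistics import mean, pvariance


def predictive_summary(sample_probs):
    """Summarize Monte Carlo samples of class probabilities.

    sample_probs: list of probability vectors, one per stochastic forward
    pass of a (hypothetical) BNN, e.g. MC dropout or weight samples.
    Returns (predicted_class, confidence, uncertainty), where
      confidence  = largest mean class probability (assumed definition)
      uncertainty = variance, across samples, of the predicted class's
                    probability (assumed definition)
    """
    n_classes = len(sample_probs[0])
    mean_probs = [mean(p[c] for p in sample_probs) for c in range(n_classes)]
    pred = max(range(n_classes), key=lambda c: mean_probs[c])
    confidence = mean_probs[pred]
    uncertainty = pvariance([p[pred] for p in sample_probs])
    return pred, confidence, uncertainty


def is_suspicious(confidence, uncertainty, conf_min=0.7, unc_max=0.01):
    """Hypothetical detector: flag inputs with low confidence or high uncertainty.

    The thresholds are illustrative, not taken from the paper.
    """
    return confidence < conf_min or uncertainty > unc_max


# Samples that agree on class 1 with about 0.9 probability. If the true
# label is 0 (e.g. an adversarial input), the prediction is wrong, yet
# neither measure raises a flag.
samples = [[0.10, 0.90], [0.12, 0.88], [0.08, 0.92]]
pred, conf, unc = predictive_summary(samples)
print(pred, round(conf, 3), round(unc, 6), is_suspicious(conf, unc))
```

Here the detector only fires when the samples disagree or the mean prediction is weak; an attack that moves all samples consistently toward the wrong class would evade both thresholds.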
Abstract
Machine learning models are vulnerable to adversarial examples: minor perturbations to input samples intended to deliberately cause misclassification. While an obvious security threat, adversarial examples also yield insights into the applied model itself. We investigate adversarial examples in the context of Bayesian neural networks' (BNNs') uncertainty measures. As these measures are highly non-smooth, we use a smooth Gaussian process classifier (GPC) as a substitute. We show that both confidence and uncertainty can be unsuspicious even if the output is wrong. Intriguingly, we find subtle differences in the features influencing uncertainty and confidence for most tasks.
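The abstract notes that BNN uncertainty measures are highly non-smooth, so a smooth substitute is used. Smoothness matters because gradient-based attacks rely on a usable input gradient. The sketch below crafts a fast gradient sign method (FGSM) perturbation against a toy logistic-regression model. The logistic model is an assumed stand-in for the GPC, and FGSM is one common attack, not necessarily the one used in the paper; the weights, input, and step size are made up for illustration.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict_proba(w, b, x):
    """P(class 1 | x) under a smooth logistic model (stand-in for the GPC)."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


def fgsm_step(w, b, x, y, eps):
    """One fast-gradient-sign step that increases the cross-entropy loss.

    For logistic regression the input gradient of the loss for label y is
    (p - y) * w, so only the sign of each component is needed.
    """
    p = predict_proba(w, b, x)
    grad = [(p - y) * wi for wi in w]
    sign = [(g > 0) - (g < 0) for g in grad]
    return [xi + eps * s for xi, s in zip(x, sign)]


# Toy substitute model and a correctly classified input with true label 1.
w, b = [2.0, -1.0], 0.0
x, y = [0.5, 0.5], 1
x_adv = fgsm_step(w, b, x, y, eps=0.3)
print(predict_proba(w, b, x), predict_proba(w, b, x_adv))
```

Moving each feature by only 0.3 pushes the class-1 probability from above 0.5 to below it, flipping the prediction. A non-smooth model would give little or no gradient signal for the same step, which is why a smooth substitute is useful for crafting such perturbations.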
Taxonomy
Topics: Adversarial Robustness in Machine Learning · Advanced Malware Detection Techniques
Methods: Gaussian Process
