Adversarial Phenomenon in the Eyes of Bayesian Deep Learning
Ambrish Rawat, Martin Wistuba, Maria-Irina Nicolae

TL;DR
This paper investigates how Bayesian neural networks handle adversarial examples, showing they exhibit increased uncertainty in such cases, which could aid in detecting adversarial attacks.
Contribution
The study compares how different Bayesian neural networks respond to white-box and black-box adversarial attacks, random Gaussian noise, and clean test data, and highlights their potential for adversarial example detection.
Findings
Bayesian neural networks show increased uncertainty on adversarial examples.
Their uncertainty under adversarial perturbations resembles their uncertainty under random Gaussian perturbations.
Bayesian neural networks can therefore be considered for detecting adversarial examples.
Abstract
Deep Learning models are vulnerable to adversarial examples, i.e. images obtained via deliberate imperceptible perturbations, such that the model misclassifies them with high confidence. However, class confidence by itself is an incomplete picture of uncertainty. We therefore use principled Bayesian methods to capture model uncertainty in prediction for observing adversarial misclassification. We provide an extensive study with different Bayesian neural networks attacked in both white-box and black-box setups. The behaviour of the networks for noise, attacks and clean test data is compared. We observe that Bayesian neural networks are uncertain in their predictions for adversarial perturbations, a behaviour similar to the one observed for random Gaussian perturbations. Thus, we conclude that Bayesian neural networks can be considered for detecting adversarial examples.
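To make the distinction between class confidence and model uncertainty concrete, the following is a minimal sketch, not the authors' implementation. It assumes the Bayesian network is queried T times with different sampled weights (for example via Monte Carlo dropout or posterior samples), giving an array of softmax outputs. The abstract does not name the uncertainty measures used; predictive entropy and mutual information are chosen here as common examples, and the function name `uncertainty_from_samples` and the toy inputs are hypothetical.

```python
import numpy as np


def uncertainty_from_samples(probs):
    """Summarise T stochastic softmax outputs (hypothetical helper).

    probs: array of shape (T, N, C) with T sampled forward passes,
           N inputs and C classes; each row along C sums to 1.
    Returns per-input confidence, predictive entropy and mutual information.
    """
    probs = np.asarray(probs, dtype=float)
    eps = 1e-12  # avoids log(0)

    # Average the sampled predictions to approximate the predictive distribution.
    mean_probs = probs.mean(axis=0)                      # shape (N, C)

    # Class confidence: the largest averaged class probability.
    confidence = mean_probs.max(axis=1)                  # shape (N,)

    # Entropy of the averaged prediction (total uncertainty).
    predictive_entropy = -(mean_probs * np.log(mean_probs + eps)).sum(axis=1)

    # Average entropy of the individual samples.
    expected_entropy = -(probs * np.log(probs + eps)).sum(axis=2).mean(axis=0)

    # Mutual information: the part of the uncertainty caused by
    # disagreement between the sampled networks.
    mutual_information = predictive_entropy - expected_entropy
    return confidence, predictive_entropy, mutual_information


# Toy inputs (assumed values): T=2 samples, N=1 input, C=2 classes.
agree = np.array([[[0.9, 0.1]], [[0.9, 0.1]]])         # samples agree
disagree = np.array([[[0.99, 0.01]], [[0.01, 0.99]]])  # samples disagree

conf_agree, ent_agree, mi_agree = uncertainty_from_samples(agree)
conf_disagree, ent_disagree, mi_disagree = uncertainty_from_samples(disagree)

print("agree:    confidence", conf_agree, "mutual information", mi_agree)
print("disagree: confidence", conf_disagree, "mutual information", mi_disagree)
```

In this toy case each individual sample in `disagree` is highly confident, yet the samples contradict each other, so the mutual information is clearly positive while it is close to zero for `agree`. A detector along these lines would flag inputs whose uncertainty exceeds a threshold calibrated on clean data; the choice of threshold is left open here.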
Taxonomy
Topics: Adversarial Robustness in Machine Learning · Anomaly Detection Techniques and Applications · Machine Learning and Algorithms
