Attacking Bayes: On the Adversarial Robustness of Bayesian Neural Networks
Yunzhen Feng, Tim G. J. Rudner, Nikolaos Tsilivis, Julia Kempe

TL;DR
This paper critically examines the claimed adversarial robustness of Bayesian neural networks (BNNs), showing that they are vulnerable to adversarial attacks despite previous assertions of inherent robustness.
Contribution
The study systematically tests state-of-the-art BNNs and uncovers their susceptibility to adversarial attacks, challenging prior claims of robustness.
Findings
BNNs are highly susceptible to adversarial attacks
Previous claims of BNN robustness are flawed
Uncertainty-based detection methods are ineffective against attacks
Abstract
Adversarial examples have been shown to cause neural networks to fail on a wide range of vision and language tasks, but recent work has claimed that Bayesian neural networks (BNNs) are inherently robust to adversarial perturbations. In this work, we examine this claim. To study the adversarial robustness of BNNs, we investigate whether it is possible to successfully break state-of-the-art BNN inference methods and prediction pipelines using even relatively unsophisticated attacks for three tasks: (1) label prediction under the posterior predictive mean, (2) adversarial example detection with Bayesian predictive uncertainty, and (3) semantic shift detection. We find that BNNs trained with state-of-the-art approximate inference methods, and even BNNs trained with Hamiltonian Monte Carlo, are highly susceptible to adversarial attacks. We also identify various conceptual and experimental…
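The first two tasks in the abstract can be illustrated with a minimal toy sketch: attacking label prediction under the posterior predictive mean, then checking whether predictive uncertainty would flag the result. The sketch assumes a 2-D Bayesian logistic regression whose posterior is represented by 50 hypothetical samples (a stand-in for HMC or approximate-inference samples). It also assumes an L∞ projected gradient descent (PGD) attack on the cross-entropy of the Monte Carlo predictive mean, and uses predictive entropy as the quantity a simple uncertainty-based detector might threshold. The model, the `eps`/`step`/`iters` values, and every helper name are illustrative assumptions, not the paper's models, attacks, or detectors. Because the posterior samples are fixed here, the input gradient of the predictive mean is exact.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def dot(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))

def predictive_mean(samples, x):
    """Monte Carlo posterior predictive p(y=1 | x): average of per-sample probabilities."""
    return sum(sigmoid(dot(w, x) + b) for w, b in samples) / len(samples)

def input_grad_of_mean(samples, x):
    """Gradient of the Monte Carlo predictive mean with respect to the input x."""
    g = [0.0] * len(x)
    for w, b in samples:
        p = sigmoid(dot(w, x) + b)
        for j, wj in enumerate(w):
            g[j] += p * (1.0 - p) * wj / len(samples)
    return g

def loss_grad(samples, x, y):
    """Gradient of the cross-entropy -log p(y | x) under the predictive mean."""
    p = min(max(predictive_mean(samples, x), 1e-12), 1.0 - 1e-12)
    scale = -1.0 / p if y == 1 else 1.0 / (1.0 - p)
    return [scale * gj for gj in input_grad_of_mean(samples, x)]

def sign(v):
    return (v > 0) - (v < 0)

def pgd_linf(samples, x, y, eps, step, iters):
    """L-infinity PGD: signed gradient ascent on the loss, projected onto the eps-ball around x."""
    x_adv = list(x)
    for _ in range(iters):
        g = loss_grad(samples, x_adv, y)
        x_adv = [xa + step * sign(gj) for xa, gj in zip(x_adv, g)]
        # Project back onto the L-infinity ball of radius eps around the clean input.
        x_adv = [min(max(xa, xi - eps), xi + eps) for xa, xi in zip(x_adv, x)]
    return x_adv

def predictive_entropy(p):
    """Entropy (in nats) of the Bernoulli predictive distribution."""
    p = min(max(p, 1e-12), 1.0 - 1e-12)
    return -(p * math.log(p) + (1.0 - p) * math.log(1.0 - p))

# Hypothetical posterior samples (weights, bias) for a 2-D Bayesian logistic
# regression; a stand-in for samples from HMC or an approximate posterior.
rng = random.Random(0)
samples = [([rng.gauss(2.0, 0.3), rng.gauss(2.0, 0.3)], rng.gauss(0.0, 0.3))
           for _ in range(50)]

x_clean, y_true = [0.5, 0.5], 1
x_adv = pgd_linf(samples, x_clean, y_true, eps=1.5, step=0.1, iters=30)

p_clean = predictive_mean(samples, x_clean)
p_adv = predictive_mean(samples, x_adv)
print(f"clean: p(y=1)={p_clean:.3f}, entropy={predictive_entropy(p_clean):.3f}")
print(f"adv:   p(y=1)={p_adv:.3f}, entropy={predictive_entropy(p_adv):.3f}")
```

In this toy setup the attack flips the predicted label and leaves the adversarial input with lower predictive entropy than the clean input. A detector that flags only high-entropy inputs would therefore let it through. This only mirrors, in miniature, the failure modes the abstract describes; it is not a reproduction of the paper's experiments.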
Taxonomy
Topics: Adversarial Robustness in Machine Learning
