Resilience of Bayesian Layer-Wise Explanations under Adversarial Attacks
Ginevra Carbone, Guido Sanguinetti, Luca Bortolussi

TL;DR
This paper shows that Bayesian Neural Networks yield considerably more stable explanations under adversarial attacks than deterministic models, and supports the claim with both empirical and theoretical analysis.
Contribution
It empirically establishes that Bayesian explanations remain stable under adversarial attacks and explains this robustness theoretically in terms of the geometry of the data manifold.
Findings
Bayesian explanations are more stable under adversarial perturbations.
Deterministic explanations are brittle even when attacks do not change predictions.
Bayesian methods enhance interpretability and robustness of neural network predictions.
Abstract
We consider the problem of the stability of saliency-based explanations of Neural Network predictions under adversarial attacks in a classification task. Saliency interpretations of deterministic Neural Networks are remarkably brittle even when the attacks fail, i.e. for attacks that do not change the classification label. We empirically show that interpretations provided by Bayesian Neural Networks are considerably more stable under adversarial perturbations of the inputs and even under direct attacks to the explanations. By leveraging recent results, we also provide a theoretical explanation of this result in terms of the geometry of the data manifold. Additionally, we discuss the stability of the interpretations of high level representations of the inputs in the internal layers of a Network. Our results demonstrate that Bayesian methods, in addition to being more robust to…
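The comparison described in the abstract can be sketched in a few lines of NumPy. This is only an illustrative toy, not the authors' implementation. It makes several assumptions that do not come from the paper: gradient-times-input saliency stands in for the layer-wise explanations, a set of Gaussian-jittered weights stands in for posterior samples of a Bayesian Neural Network, a single signed-gradient step stands in for the adversarial attack, and stability is scored by the overlap of the top-k most relevant input features. All function names (`saliency`, `bayesian_saliency`, `topk_overlap`, `signed_gradient_step`) are hypothetical.

```python
import numpy as np

def forward(x, W1, W2):
    """Tiny one-hidden-layer network: returns hidden activations and logits."""
    h = np.tanh(W1 @ x)
    return h, W2 @ h

def input_gradient(x, W1, W2, label):
    """Analytic gradient of the logit of `label` with respect to the input."""
    h, _ = forward(x, W1, W2)
    return W1.T @ ((1.0 - h ** 2) * W2[label])

def saliency(x, W1, W2, label):
    """Gradient-times-input saliency (a stand-in for a layer-wise explanation)."""
    return input_gradient(x, W1, W2, label) * x

def bayesian_saliency(x, weight_samples, label):
    """Average the saliency map over a list of (W1, W2) weight samples."""
    maps = [saliency(x, W1, W2, label) for W1, W2 in weight_samples]
    return np.mean(maps, axis=0)

def topk_overlap(e1, e2, k):
    """Fraction of shared indices among the k largest-magnitude entries."""
    top1 = set(np.argsort(-np.abs(e1))[:k])
    top2 = set(np.argsort(-np.abs(e2))[:k])
    return len(top1 & top2) / k

def signed_gradient_step(x, W1, W2, label, eps):
    """One signed-gradient step that lowers the logit of `label`."""
    return x - eps * np.sign(input_gradient(x, W1, W2, label))

rng = np.random.default_rng(0)
d, hidden, classes, k, eps = 64, 16, 3, 10, 0.05

# Deterministic network with random weights (assumption: no training here).
W1 = rng.normal(size=(hidden, d)) / np.sqrt(d)
W2 = rng.normal(size=(classes, hidden)) / np.sqrt(hidden)

# Stand-in for posterior samples: Gaussian jitter around (W1, W2).
samples = [
    (W1 + 0.1 * rng.normal(size=W1.shape), W2 + 0.1 * rng.normal(size=W2.shape))
    for _ in range(50)
]

x = rng.normal(size=d)
label = int(np.argmax(forward(x, W1, W2)[1]))
x_adv = signed_gradient_step(x, W1, W2, label, eps)

# Stability = overlap between explanations of the clean and perturbed input.
det_overlap = topk_overlap(
    saliency(x, W1, W2, label), saliency(x_adv, W1, W2, label), k
)
bayes_overlap = topk_overlap(
    bayesian_saliency(x, samples, label), bayesian_saliency(x_adv, samples, label), k
)
print(f"deterministic top-{k} overlap: {det_overlap:.2f}")
print(f"averaged      top-{k} overlap: {bayes_overlap:.2f}")
```

Because the weights are random and untrained, the two printed scores need not reproduce the paper's finding. The sketch only shows how the two kinds of explanation and a stability score could be set up side by side.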
Taxonomy
Topics: Adversarial Robustness in Machine Learning · Explainable Artificial Intelligence (XAI) · Anomaly Detection Techniques and Applications
