Resilience of Bayesian Layer-Wise Explanations under Adversarial Attacks

Ginevra Carbone; Guido Sanguinetti; Luca Bortolussi

arXiv:2102.11010·cs.LG·May 6, 2022

TL;DR

This paper shows that saliency explanations from Bayesian Neural Networks are considerably more stable under adversarial attacks than those from deterministic networks, and supports the finding with both empirical results and a theoretical analysis.

Contribution

It shows empirically that Bayesian explanations remain stable under adversarial attacks and explains this robustness theoretically in terms of the geometry of the data manifold.

Findings

1. Bayesian explanations are more stable under adversarial perturbations.
2. Deterministic explanations are brittle even when attacks do not change predictions.
3. Bayesian methods enhance interpretability and robustness of neural network predictions.

Abstract

We consider the problem of the stability of saliency-based explanations of Neural Network predictions under adversarial attacks in a classification task. Saliency interpretations of deterministic Neural Networks are remarkably brittle even when the attacks fail, i.e. for attacks that do not change the classification label. We empirically show that interpretations provided by Bayesian Neural Networks are considerably more stable under adversarial perturbations of the inputs and even under direct attacks to the explanations. By leveraging recent results, we also provide a theoretical explanation of this result in terms of the geometry of the data manifold. Additionally, we discuss the stability of the interpretations of high level representations of the inputs in the internal layers of a Network. Our results demonstrate that Bayesian methods, in addition to being more robust to…
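The comparison described in the abstract can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' code: it swaps layer-wise relevance propagation for a simpler gradient-times-input saliency, uses a single signed-gradient (FGSM-style) step as the attack, and draws random weights as a stand-in for samples from a trained posterior. All function names are hypothetical.

```python
import math
import random

def make_net(rng, n_in=4, n_hidden=8):
    """Draw weights for a one-hidden-layer tanh network with scalar output."""
    w1 = [[rng.gauss(0, 1) for _ in range(n_in)] for _ in range(n_hidden)]
    w2 = [rng.gauss(0, 1) for _ in range(n_hidden)]
    return w1, w2

def forward(net, x):
    w1, w2 = net
    hidden = [math.tanh(sum(w * xi for w, xi in zip(row, x))) for row in w1]
    return sum(v * h for v, h in zip(w2, hidden))

def input_gradient(net, x):
    """Analytic gradient of the scalar output with respect to the input."""
    w1, w2 = net
    grad = [0.0] * len(x)
    for row, v in zip(w1, w2):
        pre = sum(w * xi for w, xi in zip(row, x))
        d = v * (1.0 - math.tanh(pre) ** 2)  # derivative through tanh
        for i, w in enumerate(row):
            grad[i] += d * w
    return grad

def mean_gradient(nets, x):
    """Gradient of the output averaged over a list of networks."""
    grads = [input_gradient(net, x) for net in nets]
    return [sum(g[i] for g in grads) / len(nets) for i in range(len(x))]

def saliency(nets, x):
    """Gradient-times-input relevance (assumption: a stand-in for LRP)."""
    return [g * xi for g, xi in zip(mean_gradient(nets, x), x)]

def fgsm(nets, x, eps):
    """One signed-gradient step of size eps against the averaged output."""
    g = mean_gradient(nets, x)
    sign = lambda v: (v > 0) - (v < 0)
    return [xi + eps * sign(gi) for xi, gi in zip(x, g)]

def cosine(a, b):
    dot = sum(p * q for p, q in zip(a, b))
    na = math.sqrt(sum(p * p for p in a))
    nb = math.sqrt(sum(q * q for q in b))
    return dot / (na * nb) if na > 0 and nb > 0 else 0.0

rng = random.Random(0)
# Assumption: random draws stand in for posterior samples of a trained BNN.
posterior_samples = [make_net(rng) for _ in range(50)]
deterministic = posterior_samples[:1]  # a single weight sample
x = [0.5, -1.0, 0.25, 2.0]
eps = 0.1

x_adv_det = fgsm(deterministic, x, eps)
x_adv_bayes = fgsm(posterior_samples, x, eps)

# Stability score: cosine similarity between clean and attacked saliency.
det_stability = cosine(saliency(deterministic, x),
                       saliency(deterministic, x_adv_det))
bayes_stability = cosine(saliency(posterior_samples, x),
                         saliency(posterior_samples, x_adv_bayes))
print("deterministic:", det_stability, "bayesian average:", bayes_stability)
```

A score near 1 means the explanation barely moved under the attack. Because the weights here are untrained random draws, the printed numbers illustrate only the measurement procedure and say nothing about the paper's results.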

Code & Models

Repositories

ginevracoal/BayesianRelevance (PyTorch, official)

Taxonomy

Topics: Adversarial Robustness in Machine Learning · Explainable Artificial Intelligence (XAI) · Anomaly Detection Techniques and Applications