Extracting Explanations, Justification, and Uncertainty from Black-Box Deep Neural Networks
Paul Ardis, Arjuna Flenner

TL;DR
This paper introduces a Bayesian method to extract explanations, justifications, and uncertainty estimates from black-box deep neural networks, enhancing their interpretability and reliability without retraining.
Contribution
A novel, efficient Bayesian approach that provides explanations and uncertainty measures for any black-box DNN without retraining.
Findings
Improves interpretability of DNNs
Enhances reliability in anomaly detection
Applicable to out-of-distribution detection
Abstract
Deep Neural Networks (DNNs) do not inherently compute or exhibit empirically justified task confidence. In mission-critical applications, it is important to understand both the DNN's reasoning and its supporting evidence. In this paper, we propose a novel Bayesian approach to extract explanations, justifications, and uncertainty estimates from DNNs. Our approach is efficient in both memory and computation, and can be applied to any black-box DNN without retraining, including for anomaly detection and out-of-distribution detection tasks. We validate our approach on the CIFAR-10 dataset and show that it can significantly improve the interpretability and reliability of DNNs.
Taxonomy
Topics: Explainable Artificial Intelligence (XAI)
