Explainability as statistical inference
Hugo Henri Joseph Senetaire, Damien Garreau, Jes Frellsen, Pierre-Alexandre Mattei

TL;DR
This paper introduces a novel approach to interpretability by framing it as a statistical inference problem, enabling flexible, model-agnostic explanations through a deep probabilistic model that can be learned efficiently.
Contribution
It presents a general deep probabilistic framework for interpretability, unifying existing methods and introducing new datasets for evaluation of feature importance.
Findings
Multiple imputation yields more reasonable interpretations.
The proposed model can be adapted to any predictor architecture.
Several existing interpretability methods are special cases of this framework.
Abstract
A wide variety of model explanation approaches have been proposed in recent years, all guided by very different rationales and heuristics. In this paper, we take a new route and cast interpretability as a statistical inference problem. We propose a general deep probabilistic model designed to produce interpretable predictions. The model parameters can be learned via maximum likelihood, and the method can be adapted to any predictor network architecture and any type of prediction problem. Our method is a case of amortized interpretability models, where a neural network is used as a selector to allow for fast interpretation at inference time. Several popular interpretability methods are shown to be particular cases of regularised maximum likelihood for our general model. We propose new datasets with ground-truth selection, which allow for the evaluation of feature importance maps.…
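To make the abstract's ingredients concrete (a selector, a predictor, a likelihood, and multiple imputation of unselected features), here is a minimal NumPy sketch. It is not the authors' implementation. The linear selector and predictor, the weight names `W_sel` and `W_pred`, the donor-row imputation scheme, and all sizes are hypothetical choices made for illustration. The sketch only evaluates a Monte Carlo estimate of the log-likelihood for one example and leaves out training entirely.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def selector_probs(x, W_sel):
    # Hypothetical selector: per-feature Bernoulli selection probabilities.
    return sigmoid(x @ W_sel)

def predictor_probs(x, W_pred):
    # Hypothetical predictor: softmax class probabilities from a linear map.
    logits = x @ W_pred
    logits = logits - logits.max(axis=-1, keepdims=True)
    e = np.exp(logits)
    return e / e.sum(axis=-1, keepdims=True)

def impute(x, mask, x_train, rng):
    # Assumed imputation scheme: unselected features are copied from a
    # randomly drawn training row; selected features are kept as-is.
    donor = x_train[rng.integers(len(x_train))]
    return np.where(mask == 1, x, donor)

def log_likelihood_estimate(x, y, W_sel, W_pred, x_train, rng,
                            n_masks=8, n_imputations=5):
    # Monte Carlo estimate of log p(y | x): average the predictor's
    # probability of y over sampled masks and, for each mask, over
    # several imputations of the unselected features.
    pi = selector_probs(x, W_sel)
    total = 0.0
    for _ in range(n_masks):
        mask = (rng.random(x.shape) < pi).astype(int)
        for _ in range(n_imputations):
            x_tilde = impute(x, mask, x_train, rng)
            total += predictor_probs(x_tilde, W_pred)[y]
    return np.log(total / (n_masks * n_imputations)), pi

# Toy usage with random weights and data (all values are illustrative).
d, k = 6, 3
x_train = rng.normal(size=(100, d))
W_sel = rng.normal(size=(d, d))
W_pred = rng.normal(size=(d, k))
ll, pi = log_likelihood_estimate(x_train[0], 1, W_sel, W_pred, x_train, rng)
```

In this sketch, `pi` plays the role of a feature importance map: each entry is the probability that the selector keeps the corresponding feature. Averaging over several imputations per mask, rather than filling unselected features with a single constant, is what "multiple imputation" refers to here.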
Taxonomy
Topics: Explainable Artificial Intelligence (XAI) · Adversarial Robustness in Machine Learning · Machine Learning and Data Classification
