Training Deep Models to be Explained with Fewer Examples
Tomoharu Iwata, Yuya Yoshikawa

TL;DR
This paper introduces a training method for deep models whose predictions can be explained faithfully with fewer training examples, improving interpretability without sacrificing accuracy.
Contribution
It proposes a novel training approach that jointly optimizes prediction accuracy and explanation simplicity using a sparse regularizer, applicable to any neural network-based model.
Findings
Improves faithfulness of explanations with fewer examples
Maintains high predictive performance
Applicable to various neural network models
Abstract
Although deep models achieve high predictive performance, it is difficult for humans to understand their predictions. Explainability is important in real-world applications to justify the reliability of these models. Many example-based explanation methods have been proposed, such as representer point selection, where an explanation model defined by a set of training examples is used to explain a prediction model. To improve interpretability, it is important to reduce the number of examples in the explanation model. However, explanations with fewer examples can be unfaithful, since it is difficult to approximate a prediction model well with such an example-based explanation model. An explanation is unfaithful when the predictions of the explanation model differ from those of the prediction model. We propose a method for training deep models such that their predictions are…
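The trade-off described in the abstract, between the number of examples in an explanation and its faithfulness, can be illustrated with a small sketch. This is not the authors' training method: it fits a sparse example-based explanation post hoc to a fixed set of predictions, whereas the paper trains the prediction model itself. The function names, the inner-product similarity, and the use of ISTA (proximal gradient descent with an L1 penalty) are all assumptions made for illustration.

```python
import numpy as np


def explanation_predict(weights, train_feats, query_feats):
    """Example-based explanation: weighted sum of similarities to training examples.

    Assumption: similarity is a plain inner product between feature vectors.
    """
    sims = query_feats @ train_feats.T  # shape (n_query, n_train)
    return sims @ weights               # shape (n_query,)


def faithfulness_gap(model_preds, expl_preds):
    """Mean squared difference between prediction-model and explanation-model outputs."""
    return float(np.mean((model_preds - expl_preds) ** 2))


def soft_threshold(w, lam):
    """Proximal operator of lam * ||w||_1; sets small weights to exactly zero."""
    return np.sign(w) * np.maximum(np.abs(w) - lam, 0.0)


def fit_sparse_explanation(train_feats, query_feats, model_preds, lam=0.1, steps=500):
    """ISTA for: minimize 0.5 * mean((S w - y)^2) + lam * ||w||_1 over example weights w.

    Hypothetical helper: one weight per training example; a zero weight means
    that example is not used in the explanation.
    """
    sims = query_feats @ train_feats.T
    n = sims.shape[0]
    # Step size is the inverse Lipschitz constant of the smooth part's gradient.
    lipschitz = np.linalg.norm(sims, 2) ** 2 / n
    step = 1.0 / max(lipschitz, 1e-12)
    w = np.zeros(train_feats.shape[0])
    for _ in range(steps):
        grad = sims.T @ (sims @ w - model_preds) / n
        w = soft_threshold(w - step * grad, step * lam)
    return w
```

Raising `lam` leaves fewer training examples with nonzero weight, and `faithfulness_gap` then shows how far the simpler explanation drifts from the predictions it is meant to explain. Under the assumptions above, this is the tension that the proposed training approach is described as reducing.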
Taxonomy
Topics: Explainable Artificial Intelligence (XAI) · Machine Learning and Data Classification · Topic Modeling
