Training Deep Models to be Explained with Fewer Examples

Tomoharu Iwata; Yuya Yoshikawa

arXiv:2112.03508 · stat.ML · December 8, 2021

Open Access

TL;DR

This paper introduces a training method for deep models whose predictions can be faithfully explained with fewer training examples, improving interpretability while maintaining predictive accuracy.

Contribution

It proposes a novel training approach that jointly optimizes prediction accuracy and explanation simplicity using a sparse regularizer, applicable to any neural network-based model.
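The joint objective described above can be illustrated with a minimal sketch. The summary here does not give the paper's actual loss, so everything specific in the code is an assumption made for illustration: the function name `joint_objective`, the squared-error faithfulness term, the L1 penalty standing in for the sparse regularizer, and the coefficients `faith_coef` and `sparse_coef`.

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)  # subtract the row max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def joint_objective(pred_logits, expl_logits, labels, example_weights,
                    faith_coef=1.0, sparse_coef=0.1):
    """Hypothetical joint loss: accuracy + faithfulness + sparsity.

    pred_logits:     (n, c) outputs of the prediction model
    expl_logits:     (n, c) outputs of the example-based explanation model
    labels:          (n,) integer class labels
    example_weights: weights the explanation model puts on training examples
    """
    n = labels.shape[0]
    probs = softmax(pred_logits)
    # Prediction accuracy term: cross-entropy of the prediction model.
    accuracy_loss = -np.log(probs[np.arange(n), labels] + 1e-12).mean()
    # Faithfulness term (assumed form): the explanation model should match
    # the prediction model's outputs.
    faithfulness_loss = np.mean((pred_logits - expl_logits) ** 2)
    # Sparsity term (assumed form): an L1 penalty pushes example weights
    # toward zero, so fewer examples carry the explanation.
    sparsity_penalty = np.abs(example_weights).sum()
    return accuracy_loss + faith_coef * faithfulness_loss + sparse_coef * sparsity_penalty
```

In an actual training loop this quantity would be minimized with an automatic-differentiation framework; NumPy is used here only to keep the arithmetic visible.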

Findings

01 Improves faithfulness of explanations with fewer examples

02 Maintains high predictive performance

03 Applicable to various neural network models

Abstract

Although deep models achieve high predictive performance, it is difficult for humans to understand the predictions they make. Explainability is important for real-world applications because it helps justify a model's reliability. Many example-based explanation methods have been proposed, such as representer point selection, in which an explanation model defined by a set of training examples is used to explain a prediction model. To improve interpretability, it is important to reduce the number of examples in the explanation model. However, explanations with fewer examples can be unfaithful, since it is difficult to approximate a prediction model well with such an example-based explanation model. An unfaithful explanation is one where the predictions of the explanation model differ from those of the prediction model. We propose a method for training deep models such that their predictions are…
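To make the faithfulness problem concrete, the following sketch builds an example-based explanation model and measures how often it disagrees with a reference model once it is pruned to a few examples. It assumes a representer-point-style form in which each class score is a weighted sum of feature similarities to training examples. The names `explanation_scores` and `unfaithfulness`, the inner-product similarity, and the rule of keeping the `k` examples with the largest absolute weights are illustrative assumptions, not the paper's procedure.

```python
import numpy as np

def explanation_scores(alpha, train_feats, test_feats, k=None):
    """Class scores from an example-based explanation model (assumed form).

    alpha:       (n_train, n_classes) per-example weights
    train_feats: (n_train, d) features of the training examples
    test_feats:  (n_test, d) features of the inputs to explain
    k:           if given, keep only the k examples with the largest total |weight|
    """
    if k is not None:
        importance = np.abs(alpha).sum(axis=1)
        keep = np.argsort(importance)[-k:]       # indices of the k heaviest examples
        alpha, train_feats = alpha[keep], train_feats[keep]
    similarity = test_feats @ train_feats.T      # (n_test, n_kept) inner products
    return similarity @ alpha                    # (n_test, n_classes)

def unfaithfulness(pred_scores, expl_scores):
    """Fraction of inputs where the two models predict different classes."""
    return np.mean(pred_scores.argmax(axis=1) != expl_scores.argmax(axis=1))
```

Passing the prediction model's class scores as `pred_scores` and the pruned explanation model's scores as `expl_scores` gives a disagreement rate; for a model not trained with explanation in mind, that rate would be expected to rise as `k` shrinks.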

Taxonomy

Topics: Explainable Artificial Intelligence (XAI) · Machine Learning and Data Classification · Topic Modeling