Explain Yourself, Briefly! Self-Explaining Neural Networks with Concise Sufficient Reasons

Shahaf Bassan; Ron Eliav; Shlomit Gur

arXiv:2502.03391·cs.LG·March 4, 2025

TL;DR

This paper introduces a self-supervised training method called sufficient subset training (SST) that enables neural networks to generate concise, faithful explanations as part of their predictions, overcoming computational and reliability issues of previous post-hoc methods.

Contribution

The paper presents SST, a novel training approach that integrates explanation generation into neural network training, improving efficiency and faithfulness of minimal sufficient reasons.

Findings

1. SST produces more succinct explanations than existing methods.
2. Models trained with SST maintain comparable predictive accuracy.
3. SST significantly reduces explanation generation time.

Abstract

*Minimal sufficient reasons* represent a prevalent form of explanation: the smallest subset of input features which, when held constant at their corresponding values, ensure that the prediction remains unchanged. Previous *post-hoc* methods attempt to obtain such explanations but face two main limitations: (1) Obtaining these subsets poses a computational challenge, leading most scalable methods to converge towards suboptimal, less meaningful subsets; (2) These methods heavily rely on sampling out-of-distribution input assignments, potentially resulting in counterintuitive behaviors. To tackle these limitations, we propose in this work a self-supervised training approach, which we term *sufficient subset training* (SST). Using SST, we train models to generate concise sufficient reasons for their predictions as an integral part of their output. Our results indicate that our framework…
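
To make the definition of a minimal sufficient reason concrete, the sketch below checks sufficiency by brute force on a toy classifier. It is not the paper's SST method and not taken from the linked repository: the `predict` function, the binary feature domain, and the helper names `is_sufficient` and `minimal_sufficient_reason` are all assumptions made for illustration. The exhaustive search also shows why obtaining these subsets is computationally hard, since it enumerates every subset and every completion of the remaining features.

```python
from itertools import combinations, product


def predict(x):
    """Hypothetical toy classifier over 3 binary features: x0 AND (x1 OR x2)."""
    return int(x[0] and (x[1] or x[2]))


def is_sufficient(f, x, subset, domain=(0, 1)):
    """Return True if fixing the features in `subset` to their values in `x`
    keeps f's prediction unchanged for every assignment of the other features."""
    free = [i for i in range(len(x)) if i not in subset]
    target = f(x)
    for values in product(domain, repeat=len(free)):
        z = list(x)
        for i, v in zip(free, values):
            z[i] = v
        if f(z) != target:
            return False
    return True


def minimal_sufficient_reason(f, x, domain=(0, 1)):
    """Return a smallest-cardinality sufficient subset via exhaustive search.

    The full feature set is always sufficient, so the search always returns.
    """
    n = len(x)
    for k in range(n + 1):  # try smaller subsets first
        for subset in combinations(range(n), k):
            if is_sufficient(f, x, set(subset), domain):
                return set(subset)


x = (1, 1, 0)
print(minimal_sufficient_reason(predict, x))  # → {0, 1}
```

For the input `(1, 1, 0)`, fixing features 0 and 1 guarantees the prediction stays at 1 whatever feature 2 is, while no single feature does. For `(0, 1, 1)`, feature 0 alone is sufficient, because `x0 = 0` forces the output to 0.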

Code & Models

Repositories

ibm/sax (PyTorch, official)

Taxonomy

Topics: Explainable Artificial Intelligence (XAI)