Selective Explanations
Lucas Monteiro Paes, Dennis Wei, Flavio P. Calmon

TL;DR
This paper introduces selective explanations, a method that identifies low-quality feature attributions from amortized explainers and enhances them, balancing efficiency and explanation accuracy in machine learning models.
Contribution
It proposes a novel approach to detect and improve low-quality explanations from amortized explainers, bridging the gap between efficiency and accuracy.
Findings
Selective explanations effectively identify low-quality attributions.
The method improves low-quality explanations using a technique called explanations with initial guess.
Practitioners can control the fraction of explanations improved.
Abstract
Feature attribution methods explain black-box machine learning (ML) models by assigning importance scores to input features. These methods can be computationally expensive for large ML models. To address this challenge, there have been increasing efforts to develop amortized explainers, where a machine learning model is trained to predict feature attribution scores with only one inference. Despite their efficiency, amortized explainers can produce inaccurate predictions and misleading explanations. In this paper, we propose selective explanations, a novel feature attribution method that (i) detects when amortized explainers generate low-quality explanations and (ii) improves these explanations using a technique called explanations with initial guess. Our selective explanation method allows practitioners to specify the fraction of samples that receive explanations with initial guess,…
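The two steps in the abstract (flag likely low-quality amortized explanations, then improve only those) can be illustrated with a minimal sketch. Everything named here is a hypothetical stand-in rather than the paper's implementation: `uncertainty` is assumed to be some per-sample score where higher means the amortized explanation is less trustworthy, `refine` is assumed to be a costlier attribution routine, and blending the two with a fixed weight `lam` is one simple assumed way to use the amortized output as an initial guess.

```python
from typing import Callable, List, Sequence, Tuple


def select_low_quality(uncertainty: Sequence[float], fraction: float) -> List[int]:
    """Return indices of the `fraction` of samples with the highest uncertainty."""
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must lie in [0, 1]")
    k = round(fraction * len(uncertainty))
    # Rank sample indices from most to least uncertain and keep the top k.
    ranked = sorted(range(len(uncertainty)), key=lambda i: uncertainty[i], reverse=True)
    return sorted(ranked[:k])


def selective_explanations(
    amortized: Sequence[Sequence[float]],
    refine: Callable[[int], Sequence[float]],
    uncertainty: Sequence[float],
    fraction: float,
    lam: float = 0.5,
) -> Tuple[List[List[float]], List[int]]:
    """Keep amortized attributions for most samples; refine only the selected ones.

    `refine(i)` is a hypothetical costly explainer for sample i. The blend
    lam * amortized + (1 - lam) * refined is an assumption of this sketch.
    """
    selected = select_low_quality(uncertainty, fraction)
    out = [list(a) for a in amortized]  # copy so the inputs are not mutated
    for i in selected:
        costly = refine(i)
        out[i] = [lam * a + (1.0 - lam) * c for a, c in zip(amortized[i], costly)]
    return out, selected
```

With `fraction=0.0` this reduces to the plain amortized explainer, and with `fraction=1.0` every sample pays the cost of the refinement step, so `fraction` is the knob that trades compute for explanation quality.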
Taxonomy
Topics: Explainable Artificial Intelligence (XAI) · Multimodal Machine Learning Applications · Advanced Neural Network Applications
