Knowing What You Cannot Explain: Learning to Reject Low-Quality Explanations
Luca Stradiotti, Dario Pesenti, Stefano Teso, Jesse Davis

TL;DR
This paper introduces LtX, a framework that lets models abstain from making a prediction when they cannot provide a high-quality explanation for it, with the aim of improving trust and reliability in AI systems.
Contribution
The paper proposes a novel approach for rejecting low-quality explanations: it learns a rejector from both machine-generated and human explanation-quality labels. It also releases a new dataset to support future research.
Findings
REX outperforms existing LtR strategies and baselines based on explanation-quality metrics.
The approach effectively identifies low-quality explanations using combined labels.
A new dataset with 1050 human-annotated explanations is provided.
Abstract
Learning to Reject (LtR) frameworks allow ML models to abstain from uncertain predictions and promote user trust. However, since current LtR strategies focus solely on predictive performance, they completely neglect explanation quality. Low-quality explanations -- whether they inaccurately reflect the model's reasoning or fail to satisfy users -- can severely compromise trust assessments and induce over-reliance on incorrect predictions. We argue that models should abstain from making a prediction when they cannot offer a satisfactory explanation for it and introduce a framework for learning to reject low-quality explanations (LtX) in which predictors are equipped with a rejector that evaluates the explanation quality. Focusing on popular attribution techniques, we propose REX (REjector of low-quality eXplanations), which learns a rejector from explanation quality labels combining…
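The abstract describes the LtX setup only at a high level: a predictor is paired with a rejector that scores the quality of each explanation and triggers abstention when that quality is too low. The Python sketch below illustrates this control flow under assumptions that are not taken from the paper. The attribution features (top-feature share and normalized entropy), the logistic-regression rejector, the 0.5 threshold, the toy linear predictor, and the synthetic quality labels are all hypothetical choices made for illustration. The sketch does not model how REX actually builds or combines its machine and human labels.

```python
import math
from dataclasses import dataclass
from typing import Callable, List, Optional, Sequence, Tuple

Vector = List[float]


def explanation_features(attr: Sequence[float]) -> Vector:
    """Hypothetical summary features of an attribution vector (not REX's features)."""
    mags = [abs(a) for a in attr]
    total = sum(mags) or 1.0  # avoid division by zero for all-zero attributions
    probs = [m / total for m in mags]
    # Concentration: share of attribution mass on the single most important feature.
    top_share = max(probs)
    # Diffuseness: entropy of the attribution mass, normalized to [0, 1].
    entropy = -sum(p * math.log(p) for p in probs if p > 0)
    norm_entropy = entropy / math.log(len(probs)) if len(probs) > 1 else 0.0
    return [top_share, norm_entropy]


def sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_rejector(feats: List[Vector], labels: List[int],
                   lr: float = 0.5, epochs: int = 2000) -> Tuple[Vector, float]:
    """Fit a logistic-regression quality classifier by full-batch gradient descent.

    labels[i] is 1 if explanation i was judged acceptable, 0 otherwise.
    """
    w, b, n = [0.0] * len(feats[0]), 0.0, len(feats)
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * len(w), 0.0
        for x, y in zip(feats, labels):
            err = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) - y
            grad_w = [g + err * xi for g, xi in zip(grad_w, x)]
            grad_b += err
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


@dataclass
class RejectingModel:
    """Predictor paired with a rejector that scores explanation quality."""
    predict: Callable[[Vector], int]
    explain: Callable[[Vector], Vector]
    weights: Vector
    bias: float
    threshold: float = 0.5

    def quality(self, x: Vector) -> float:
        f = explanation_features(self.explain(x))
        return sigmoid(sum(w * v for w, v in zip(self.weights, f)) + self.bias)

    def __call__(self, x: Vector) -> Optional[int]:
        # Abstain (return None) when the explanation is predicted to be low quality.
        if self.quality(x) < self.threshold:
            return None
        return self.predict(x)


# Toy usage: a hypothetical linear classifier explained by gradient x input.
W = [2.0, 1.0, 1.0]


def toy_predict(x: Vector) -> int:
    return int(sum(wi * xi for wi, xi in zip(W, x)) > 0)


def toy_explain(x: Vector) -> Vector:
    # For a linear model, gradient x input is the per-feature contribution w_i * x_i.
    return [wi * xi for wi, xi in zip(W, x)]


# Synthetic labels: concentrated attributions = acceptable (1), diffuse = low quality (0).
train_attrs = [[5.0, 0.1, 0.1], [3.0, 0.2, 0.1], [1.0, 1.0, 1.0], [0.9, 1.0, 1.1]]
train_labels = [1, 1, 0, 0]
rej_w, rej_b = train_rejector([explanation_features(a) for a in train_attrs], train_labels)
model = RejectingModel(toy_predict, toy_explain, rej_w, rej_b)

print(model([3.0, 0.05, 0.05]))  # 1 (concentrated attribution, prediction returned)
print(model([1.0, 2.0, 2.0]))    # None (diffuse attribution, model abstains)
```

Keeping the rejector separate from the predictor mirrors the abstract's framing, in which an existing predictor is equipped with a rejector; in this sketch any other quality classifier could replace the logistic regression without touching `toy_predict` or `toy_explain`.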
Taxonomy
Topics: Explainable Artificial Intelligence (XAI) · Artificial Intelligence in Healthcare and Education · Adversarial Robustness in Machine Learning
