TL;DR
This paper introduces a new, efficient method for computing contrastive explanations in machine learning models, focusing on plausibility and optimality, and formalizes the relationship between contrastive and counterfactual explanations.
Contribution
It proposes a mathematical framework and a two-phase algorithm for efficiently generating contrastive explanations, emphasizing guarantees on uniqueness and optimality.
Findings
Developed a formalization of contrastive and counterfactual explanations
Designed a 2-phase algorithm for computing pertinent positives
Achieved efficient computation with guarantees on uniqueness and optimality of the explanation
Abstract
With the increasing deployment of machine learning systems in practice, transparency and explainability have become serious issues. Contrastive explanations are considered to be useful and intuitive, in particular when it comes to explaining decisions to lay people, since they mimic the way in which humans explain. Yet, so far, comparatively little research has addressed computationally feasible technologies that allow guarantees on uniqueness and optimality of the explanation and that enable an easy incorporation of additional constraints. Here, we will focus on specific types of models rather than black-box technologies. We study the relation of contrastive and counterfactual explanations and propose mathematical formalizations as well as a 2-phase algorithm for efficiently computing (plausible) pertinent positives of many standard machine learning models.
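To make the notion of a pertinent positive concrete, here is a minimal illustrative sketch. It is not the paper's 2-phase algorithm and does not model plausibility. It assumes a binary linear classifier, an all-zero baseline for "absent" features, and a simple greedy selection; the function names `predict` and `pertinent_positive_linear` are hypothetical and chosen only for this example.

```python
import numpy as np

def predict(w, b, x):
    """Binary linear classifier (assumed for illustration): +1 if w.x + b > 0, else -1."""
    return 1 if float(w @ x) + b > 0 else -1

def pertinent_positive_linear(w, b, x):
    """Greedily keep a small subset of the features of x (all others set to the
    assumed zero baseline) such that the prediction stays the same as for x."""
    w = np.asarray(w, dtype=float)
    x = np.asarray(x, dtype=float)
    target = predict(w, b, x)
    # How strongly each feature pushes the score toward the predicted class.
    contrib = target * w * x
    x_pp = np.zeros_like(x)
    for i in np.argsort(-contrib):          # most supportive features first
        if predict(w, b, x_pp) == target:
            return x_pp                     # enough features are present
        if contrib[i] <= 0:
            break                           # remaining features do not help
        x_pp[i] = x[i]
    return x_pp if predict(w, b, x_pp) == target else None

# Hypothetical toy model and input.
w = np.array([2.0, -1.0, 0.5])
b = -1.0
x = np.array([1.0, 1.0, 1.0])
x_pp = pertinent_positive_linear(w, b, x)
print(x_pp)
```

In this toy case the first feature alone suffices to keep the original prediction, so the other two are zeroed out. For other model classes or for plausibility constraints, a different formulation would be needed.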
