Inverse Classification for Comparison-based Interpretability in Machine Learning
Thibault Laugel, Marie-Jeanne Lesot, Christophe Marsala, Xavier Renard, Marcin Detyniecki

TL;DR
This paper introduces an instance-based method for post-hoc interpretability that explains a classifier's prediction by finding the minimal changes needed to alter it, without requiring any information about the classifier's internals or about the training and test data.
Contribution
It proposes a novel approach using Growing Spheres to generate close neighbors with different classifications, enhancing understanding of black-box models.
Findings
Explains individual predictions without information about the classifier itself or the data it was trained on
Illustrated experimentally on two datasets
Provides insights into decision boundaries
Abstract
In the context of post-hoc interpretability, this paper addresses the task of explaining the prediction of a classifier, considering the case where no information is available, neither on the classifier itself, nor on the processed data (neither the training nor the test data). It proposes an instance-based approach whose principle consists in determining the minimal changes needed to alter a prediction: given a data point whose classification must be explained, the proposed method consists in identifying a close neighbour classified differently, where the closeness definition integrates a sparsity constraint. This principle is implemented using observation generation in the Growing Spheres algorithm. Experimental results on two datasets illustrate the relevance of the proposed approach that can be used to gain knowledge about the classifier.
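The principle described in the abstract can be illustrated with a short Python sketch. This is a simplified illustration under stated assumptions, not the authors' implementation: it assumes the classifier is available only as a hypothetical black-box function `predict` that maps an array of points to labels, it assumes closeness is measured with the Euclidean distance, and the names `sample_in_layer`, `find_enemy`, `sparsify`, as well as the parameters `step`, `n` and `max_layers`, are invented here. The sketch generates observations in spherical layers of growing radius around the point to explain until one is classified differently, then resets as many coordinates as possible to their original values to favour sparse changes.

```python
import numpy as np

def sample_in_layer(center, r_in, r_out, n, rng):
    """Draw n points uniformly from the layer r_in <= ||z - center|| <= r_out."""
    d = center.shape[0]
    # Random directions: normalised Gaussian vectors are uniform on the sphere.
    directions = rng.normal(size=(n, d))
    directions /= np.linalg.norm(directions, axis=1, keepdims=True)
    # Radii drawn so that points are uniform in volume, not in radius.
    radii = rng.uniform(r_in ** d, r_out ** d, size=n) ** (1.0 / d)
    return center + directions * radii[:, None]

def find_enemy(predict, x, step=0.1, n=500, max_layers=1000, seed=0):
    """Grow layers around x until a generated point is classified differently."""
    rng = np.random.default_rng(seed)
    label = predict(x[None, :])[0]
    r_in = 0.0
    for _ in range(max_layers):
        r_out = r_in + step
        candidates = sample_in_layer(x, r_in, r_out, n, rng)
        enemies = candidates[predict(candidates) != label]
        if len(enemies) > 0:
            # Keep the closest differently classified point found in this layer.
            dists = np.linalg.norm(enemies - x, axis=1)
            return enemies[np.argmin(dists)]
        r_in = r_out
    return None  # no differently classified point found within the search budget

def sparsify(predict, x, enemy):
    """Reset coordinates of the enemy to those of x while the class stays different."""
    label = predict(x[None, :])[0]
    e = enemy.copy()
    # Try the smallest changes first (an assumption of this sketch).
    for i in np.argsort(np.abs(e - x)):
        trial = e.copy()
        trial[i] = x[i]
        if predict(trial[None, :])[0] != label:
            e = trial
    return e

# Toy usage: a hypothetical black box that only looks at the first feature.
predict = lambda X: (X[:, 0] > 1.0).astype(int)
x = np.zeros(3)
enemy = find_enemy(predict, x)
explanation = sparsify(predict, x, enemy)
print("changed features:", np.flatnonzero(explanation - x))
```

In this toy case the explanation is the difference between `explanation` and `x`: only the features that had to move for the prediction to change remain non-zero. The sketch queries `predict` only for labels, which matches the setting where nothing is known about the classifier itself or about its training data.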
Taxonomy
Topics: Explainable Artificial Intelligence (XAI) · Adversarial Robustness in Machine Learning · Machine Learning and Data Classification
