An Active Approach for Model Interpretation
Jialin Lu, Martin Ester

TL;DR
This paper introduces an active learning-based method for model interpretation that generates synthetic instances and queries classifiers to produce more faithful and simpler rule-based explanations.
Contribution
It proposes the Active Decision Set Induction (ADS) algorithm, integrating active querying with local search to improve interpretability of machine learning models.
Findings
The method achieves more faithful interpretations with simpler models.
Active querying enhances the quality of model explanations.
The method outperforms traditional passive approaches.
Abstract
Model interpretation, or explanation of a machine learning classifier, aims to extract generalizable knowledge from a trained classifier into a human-understandable format, for various purposes such as model assessment, debugging and trust. From a computational viewpoint, it is formulated as approximating the target classifier using a simpler interpretable model, such as rule models like a decision set/list/tree. Often, this approximation is handled as standard supervised learning and the only difference is that the labels are provided by the target classifier instead of ground truth. This paradigm is particularly popular because there exists a variety of well-studied supervised algorithms for learning an interpretable classifier. However, we argue that this paradigm is suboptimal for it does not utilize the unique property of the model interpretation problem, that is, the ability to…
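The contrast the abstract draws can be illustrated with a toy sketch. This is not the ADS algorithm from the paper. It is a minimal example under stated assumptions: `target_classifier` is a hypothetical black box, the interpretable model is a single threshold rule, and the "active" step simply samples synthetic instances near the current rule boundary and asks the classifier to label them. All function names and parameters here are illustrative.

```python
import random

def target_classifier(x):
    # Hypothetical black box standing in for a trained classifier (assumption).
    return int(x[0] > 0.6)

def fit_rule(points, labels):
    # Exhaustively search for the one-condition rule "if x[f] > t then 1 else 0"
    # that agrees with the most labels. Returns (feature index, threshold).
    best = None
    for f in range(len(points[0])):
        for t in sorted({p[f] for p in points}):
            agree = sum(int(p[f] > t) == y for p, y in zip(points, labels))
            if best is None or agree > best[0]:
                best = (agree, f, t)
    return best[1], best[2]

def fidelity(rule, points):
    # Fraction of points on which the rule matches the target classifier.
    f, t = rule
    return sum(int(p[f] > t) == target_classifier(p) for p in points) / len(points)

rng = random.Random(0)

def sample(n):
    return [(rng.random(), rng.random()) for _ in range(n)]

# Passive paradigm: a fixed dataset, labeled by the classifier instead of ground truth.
train = sample(20)
labels = [target_classifier(p) for p in train]
test_points = sample(2000)
passive_rule = fit_rule(train, labels)
passive_fid = fidelity(passive_rule, test_points)

# Active variant: generate synthetic instances near the current rule boundary,
# query the classifier for their labels, and refit the rule.
active_rule = passive_rule
for _ in range(5):
    f, t = active_rule
    synthetic = []
    for _ in range(10):
        x = [rng.random(), rng.random()]
        x[f] = t + rng.uniform(-0.1, 0.1)  # perturb around the current threshold
        synthetic.append(tuple(x))
    train += synthetic
    labels += [target_classifier(p) for p in synthetic]
    active_rule = fit_rule(train, labels)
active_fid = fidelity(active_rule, test_points)

print("passive rule:", passive_rule, "fidelity:", passive_fid)
print("active rule: ", active_rule, "fidelity:", active_fid)
```

In this toy setting the queried labels can only tighten the threshold toward the classifier's true boundary, so fidelity on the held-out points does not decrease. Real classifiers and rule models offer no such guarantee, which is why a query strategy has to be designed with care.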
Taxonomy
Topics: Machine Learning and Algorithms · Machine Learning and Data Classification · Explainable Artificial Intelligence (XAI)
