Comparing Feature Importance and Rule Extraction for Interpretability on Text Data
Gianluigi Lopardo, Damien Garreau

TL;DR
This paper compares feature importance and rule extraction interpretability methods on text data, shows that different methods can produce significantly different explanations even for simple models, and introduces a new way to compare these explanations.
Contribution
It introduces a novel approach to quantitatively compare explanations from different interpretability methods applied to text data models.
Findings
Different interpretability methods can produce contrasting explanations.
The proposed comparison approach quantifies differences between explanation methods.
Explanations can vary significantly even for simple models.
Abstract
Complex machine learning algorithms are used more and more often in critical tasks involving text data, leading to the development of interpretability methods. Among local methods, two families have emerged: those computing importance scores for each feature and those extracting simple logical rules. In this paper we show that using different methods can lead to unexpectedly different explanations, even when applied to simple models for which we would expect qualitative coincidence. To quantify this effect, we propose a new approach to compare explanations produced by different methods.
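To make the idea of comparing the two families concrete, here is a minimal sketch of one possible comparison, not necessarily the measure proposed in the paper. It assumes a feature importance explanation is given as a word-to-score mapping and a rule explanation as a set of words, and it scores their agreement as the Jaccard similarity between the rule's words and the top-k words ranked by absolute importance. The function names and the example scores are hypothetical and chosen for illustration only.

```python
from typing import Dict, Optional, Set


def top_k_features(importances: Dict[str, float], k: int) -> Set[str]:
    """Return the k words with the largest absolute importance score."""
    ranked = sorted(importances, key=lambda w: abs(importances[w]), reverse=True)
    return set(ranked[:k])


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard similarity |a ∩ b| / |a ∪ b|; two empty sets count as identical."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def compare_explanations(
    importances: Dict[str, float],
    rule_words: Set[str],
    k: Optional[int] = None,
) -> float:
    """Agreement between a feature importance explanation and a rule.

    Assumption: by default k is the number of words in the rule, so both
    explanations select the same number of words.
    """
    if k is None:
        k = len(rule_words)
    return jaccard(top_k_features(importances, k), set(rule_words))


# Hypothetical importance scores for the words of one short review.
example_importances = {"great": 0.8, "movie": 0.05, "not": -0.6, "the": 0.01}

# A rule built on the two most important words agrees fully with the scores.
agree = compare_explanations(example_importances, {"great", "not"})

# A rule that picks a low-importance word instead agrees only partially.
partial = compare_explanations(example_importances, {"great", "movie"})
```

A score of 1 means the rule uses exactly the top-ranked words, and a score near 0 means the two explanations highlight largely different words for the same prediction.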
Taxonomy
Topics: Explainable Artificial Intelligence (XAI) · Topic Modeling · Machine Learning and Data Classification
