Evaluating Input Feature Explanations through a Unified Diagnostic Evaluation Framework
Jingyi Sun, Pepa Atanasova, Isabelle Augenstein

TL;DR
This paper introduces a unified framework for comparing different input feature explanation methods in machine learning, revealing that interactive span explanations generally outperform other types across various diagnostic properties.
Contribution
The authors develop a comprehensive framework for directly comparing highlight and interactive explanations, enabling systematic evaluation across multiple diagnostic properties.
Findings
Interactive span explanations outperform other explanation types in most diagnostic properties.
Different explanation methods have distinct strengths depending on the diagnostic property.
The study highlights the need for further research to improve and combine explanation techniques.
Abstract
Explaining the decision-making process of machine learning models is crucial for ensuring their reliability and transparency for end users. One popular explanation form highlights key input features, such as i) tokens (e.g., Shapley Values and Integrated Gradients), ii) interactions between tokens (e.g., Bivariate Shapley and Attention-based methods), or iii) interactions between spans of the input (e.g., Louvain Span Interactions). However, these explanation types have only been studied in isolation, making it difficult to judge their respective applicability. To bridge this gap, we develop a unified framework, comprising four diagnostic properties, that facilitates an automated and direct comparison between highlight and interactive explanations. We conduct an extensive analysis across these three types of input feature explanations -- each utilizing three different explanation…
Taxonomy
Topics: Face and Expression Recognition
