The Disagreement Problem in Explainable Machine Learning: A Practitioner's Perspective
Satyapriya Krishna, Tessa Han, Alex Gu, Steven Wu, Shahin Jabbari, Himabindu Lakkaraju

TL;DR
This paper investigates the disagreement problem in explainable machine learning: it analyzes how often explanations differ across methods and how practitioners resolve these disagreements, and it highlights the need for more principled evaluation frameworks.
Contribution
It formalizes the notion of disagreement between explanations, introduces a quantitative framework for measuring it, and provides empirical and user-study insights into explanation disagreements in practice.
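To make the idea of quantifying disagreement concrete, here is a minimal sketch of two top-k agreement scores between feature-attribution explanations of the same prediction. It is an illustration under stated assumptions, not the paper's implementation: the function names, the choice of k, and the attribution values (`lime_like`, `shap_like`) are hypothetical and were made up for this example.

```python
from typing import List, Sequence


def top_k_features(attributions: Sequence[float], k: int) -> List[int]:
    """Return indices of the k features with the largest absolute attribution."""
    if not 0 < k <= len(attributions):
        raise ValueError("k must be between 1 and the number of features")
    order = sorted(
        range(len(attributions)),
        key=lambda i: abs(attributions[i]),
        reverse=True,
    )
    return order[:k]


def feature_agreement(a: Sequence[float], b: Sequence[float], k: int) -> float:
    """Fraction of the top-k features that the two explanations share."""
    shared = set(top_k_features(a, k)) & set(top_k_features(b, k))
    return len(shared) / k


def rank_agreement(a: Sequence[float], b: Sequence[float], k: int) -> float:
    """Fraction of top-k positions where both explanations place the same feature."""
    ranks_a = top_k_features(a, k)
    ranks_b = top_k_features(b, k)
    return sum(x == y for x, y in zip(ranks_a, ranks_b)) / k


# Hypothetical attributions for one prediction over four features.
lime_like = [0.50, -0.30, 0.05, 0.10]
shap_like = [0.40, 0.02, -0.35, 0.15]

print(feature_agreement(lime_like, shap_like, k=2))
print(rank_agreement(lime_like, shap_like, k=2))
```

Both scores lie in [0, 1], with 1 meaning the two explanations agree fully on the top-k features. Comparing only the top-k features is a design choice in this sketch: it reflects the assumption that practitioners mostly look at the few most important features rather than the full attribution vector.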
Findings
Explanation methods often disagree significantly.
Practitioners rely on ad hoc heuristics to resolve disagreements.
Disagreements may lead to misleading explanations in high-stakes decisions.
Abstract
As various post hoc explanation methods are increasingly being leveraged to explain complex models in high-stakes settings, it becomes critical to develop a deeper understanding of whether and when the explanations output by these methods disagree with each other, and how such disagreements are resolved in practice. However, there is little to no research that provides answers to these critical questions. In this work, we formalize and study the disagreement problem in explainable machine learning. More specifically, we define the notion of disagreement between explanations, analyze how often such disagreements occur in practice, and how practitioners resolve these disagreements. We first conduct interviews with data scientists to understand what constitutes disagreement between explanations generated by different methods for the same model prediction, and introduce a novel quantitative…
Taxonomy
Topics: Explainable Artificial Intelligence (XAI) · Machine Learning and Data Classification · Adversarial Robustness in Machine Learning
Methods: High-Order Consensuses
