The Disagreement Problem in Explainable Machine Learning: A Practitioner's Perspective

Satyapriya Krishna; Tessa Han; Alex Gu; Steven Wu; Shahin Jabbari; Himabindu Lakkaraju

arXiv:2202.01602 · cs.LG · April 18, 2025 · 42 cites

Open Access · 1 Repo

TL;DR

This paper investigates the disagreement problem in explainable machine learning: it analyzes how often explanations differ across methods and how practitioners resolve these disagreements, and it highlights the need for more principled evaluation frameworks.

Contribution

The paper formalizes the notion of disagreement between explanations, introduces a quantitative framework, and provides empirical and user-study insights into explanation disagreements in practice.

Findings

01

Explanation methods often disagree significantly.

02

Practitioners rely on ad hoc heuristics to resolve disagreements.

03

Disagreements may lead to misleading explanations in high-stakes decisions.

Abstract

As various post hoc explanation methods are increasingly being leveraged to explain complex models in high-stakes settings, it becomes critical to develop a deeper understanding of whether and when the explanations output by these methods disagree with each other, and how such disagreements are resolved in practice. However, there is little to no research that provides answers to these critical questions. In this work, we formalize and study the disagreement problem in explainable machine learning. More specifically, we define the notion of disagreement between explanations, analyze how often such disagreements occur in practice, and how practitioners resolve these disagreements. We first conduct interviews with data scientists to understand what constitutes disagreement between explanations generated by different methods for the same model prediction, and introduce a novel quantitative…
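To make the idea of "disagreement between explanations" concrete, here is a minimal sketch of one plausible way to score how much two feature-attribution explanations of the same prediction agree. It is an illustration under stated assumptions, not the authors' framework: the function names (`top_k_features`, `feature_agreement`, `sign_agreement`), the top-k overlap definitions, and the attribution values are all hypothetical.

```python
from typing import Sequence, Set


def top_k_features(attributions: Sequence[float], k: int) -> Set[int]:
    """Indices of the k features with the largest absolute attribution."""
    ranked = sorted(
        range(len(attributions)),
        key=lambda i: abs(attributions[i]),
        reverse=True,
    )
    return set(ranked[:k])


def _sign(x: float) -> int:
    """Return -1, 0, or 1 depending on the sign of x."""
    return (x > 0) - (x < 0)


def feature_agreement(a: Sequence[float], b: Sequence[float], k: int) -> float:
    """Fraction of the top-k features that both explanations share."""
    return len(top_k_features(a, k) & top_k_features(b, k)) / k


def sign_agreement(a: Sequence[float], b: Sequence[float], k: int) -> float:
    """Fraction of the top-k features that are shared AND have the same sign."""
    shared = top_k_features(a, k) & top_k_features(b, k)
    same_sign = [i for i in shared if _sign(a[i]) == _sign(b[i])]
    return len(same_sign) / k


# Hypothetical attributions from two explanation methods for one prediction.
expl_a = [0.50, -0.30, 0.10, 0.05]
expl_b = [0.40, 0.05, -0.35, 0.10]

print(feature_agreement(expl_a, expl_b, k=2))
print(sign_agreement(expl_a, expl_b, k=3))
```

With these made-up values, the two explanations share only one of their top two features, and of the two features they share in the top three, one carries opposite signs. Scores well below 1 on measures of this kind are what a practitioner would experience as two methods disagreeing about why the model made a prediction.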

Peer Reviews

No public reviews on file for this paper yet.

Code & Models

Repositories

grobruegge/vitexplcomp
pytorch

Videos

No videos yet.

Taxonomy

Topics: Explainable Artificial Intelligence (XAI) · Machine Learning and Data Classification · Adversarial Robustness in Machine Learning

Methods: High-Order Consensuses