On Evaluating Explanation Utility for Human-AI Decision Making in NLP
Fateme Hashemi Chaleshtori, Atreya Ghosal, Alexander Gill, Purbid Bambroo, Ana Marasović

TL;DR
This paper critically evaluates the effectiveness of explanations in human-AI decision-making within NLP, emphasizing the need for application-grounded assessments, dataset selection criteria, and rethinking human-AI teaming strategies.
Contribution
It reviews existing evaluation metrics, establishes dataset selection criteria, and demonstrates the limited impact of explanations on decision speed and accuracy in NLP tasks.
Findings
Only 4 out of over 50 datasets available for explainability research in NLP meet the selection criteria
Explanations do not speed up decision-making without accuracy loss
Revisiting human-AI teaming and automatic deferral is necessary
Abstract
Is explainability a false promise? This debate has emerged from the insufficient evidence that explanations help people in situations they are introduced for. More human-centered, application-grounded evaluations of explanations are needed to settle this. Yet, with no established guidelines for such studies in NLP, researchers accustomed to standardized proxy evaluations must discover appropriate measurements, tasks, datasets, and sensible models for human-AI teams in their studies. To aid with this, we first review existing metrics suitable for application-grounded evaluation. We then establish criteria to select appropriate datasets, and using them, we find that only 4 out of over 50 datasets available for explainability research in NLP meet them. We then demonstrate the importance of reassessing the state of the art to form and study human-AI teams: teaming people with models for…
Taxonomy
Topics: Explainable Artificial Intelligence (XAI) · Semantic Web and Ontologies · AI-based Problem Solving and Planning
