On Evaluating Explanation Utility for Human-AI Decision Making in NLP

Fateme Hashemi Chaleshtori; Atreya Ghosal; Alexander Gill; Purbid Bambroo; Ana Marasović

arXiv:2407.03545 · cs.CL · November 6, 2024 · 2 cites

TL;DR

This paper critically evaluates the effectiveness of explanations in human-AI decision-making within NLP, emphasizing the need for application-grounded assessments, dataset selection criteria, and rethinking human-AI teaming strategies.

Contribution

It reviews existing evaluation metrics, establishes dataset selection criteria, and demonstrates the limited impact of explanations on decision speed and accuracy in NLP tasks.

Findings

01

Only 4 of the more than 50 explainability datasets in NLP meet the proposed selection criteria

02

Explanations do not speed up decision-making without accuracy loss

03

Revisiting human-AI teaming and automatic deferral is necessary
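Automatic deferral can be illustrated with a minimal sketch. It assumes a simple confidence-threshold rule (the model answers when its confidence reaches a threshold, otherwise the item goes to the person), which is one common deferral scheme and not necessarily the one studied in the paper. The `Instance` record, the `deferral_outcome` function, the toy items, and the threshold values are all hypothetical.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class Instance:
    """One test item in a hypothetical human-AI deferral study."""
    model_label: int
    model_confidence: float  # model's probability for its own predicted label
    human_label: int         # label the person gives if the item is deferred
    gold_label: int


def deferral_outcome(items: Sequence[Instance], threshold: float) -> tuple[float, float]:
    """Return (team accuracy, deferral rate) under a confidence-threshold rule.

    Assumption: the model answers when its confidence is at least `threshold`;
    otherwise the item is deferred to the human.
    """
    if not items:
        raise ValueError("need at least one item")
    correct = 0
    deferred = 0
    for item in items:
        if item.model_confidence >= threshold:
            final = item.model_label
        else:
            final = item.human_label
            deferred += 1
        correct += int(final == item.gold_label)
    return correct / len(items), deferred / len(items)


if __name__ == "__main__":
    items = [
        Instance(1, 0.95, 0, 1),  # confident, correct model
        Instance(0, 0.55, 1, 1),  # unsure, wrong model; human is right
        Instance(1, 0.90, 1, 0),  # confident but wrong model
        Instance(0, 0.40, 0, 0),  # unsure model; human is right
    ]
    for t in (0.0, 0.8, 1.01):
        acc, rate = deferral_outcome(items, t)
        print(f"threshold={t}: team accuracy={acc:.2f}, deferred={rate:.2f}")
```

With these four toy items, a threshold of 0.8 gives a team accuracy of 0.75, while always trusting the model (threshold 0.0) or always deferring (threshold above 1) each gives 0.50. Sweeping the threshold in this way is one simple check of whether the team can outperform either member alone.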

Abstract

Is explainability a false promise? This debate has emerged from the insufficient evidence that explanations help people in situations they are introduced for. More human-centered, application-grounded evaluations of explanations are needed to settle this. Yet, with no established guidelines for such studies in NLP, researchers accustomed to standardized proxy evaluations must discover appropriate measurements, tasks, datasets, and sensible models for human-AI teams in their studies. To aid with this, we first review existing metrics suitable for application-grounded evaluation. We then establish criteria to select appropriate datasets, and using them, we find that only 4 out of over 50 datasets available for explainability research in NLP meet them. We then demonstrate the importance of reassessing the state of the art to form and study human-AI teams: teaming people with models for…
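The kinds of measurements an application-grounded study might record can be sketched as follows. This is a hypothetical example, not the paper's list of metrics: it assumes each trial logs the study condition, the participant's final decision, the AI's recommendation, the gold label, and the decision time. From these it computes team accuracy, mean decision time, and two commonly used reliance rates (accepting a wrong AI recommendation, overriding a correct one). The `Trial` record, the `summarize` function, and the condition names are illustrative.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class Trial:
    """One decision made by a participant in a hypothetical user study."""
    condition: str      # e.g. "no_expl" or "expl" (illustrative labels)
    human_final: int    # participant's final decision
    ai_prediction: int  # AI recommendation shown to the participant
    gold: int           # correct answer
    seconds: float      # time taken to decide


def _rate(flags: list[bool]) -> float:
    """Fraction of True values; NaN when there is nothing to count."""
    return sum(flags) / len(flags) if flags else float("nan")


def summarize(trials: Sequence[Trial]) -> dict[str, dict[str, float]]:
    """Per-condition team accuracy, mean decision time, and reliance rates."""
    by_condition: dict[str, list[Trial]] = {}
    for t in trials:
        by_condition.setdefault(t.condition, []).append(t)

    summary: dict[str, dict[str, float]] = {}
    for cond, ts in by_condition.items():
        ai_wrong = [t for t in ts if t.ai_prediction != t.gold]
        ai_right = [t for t in ts if t.ai_prediction == t.gold]
        summary[cond] = {
            "accuracy": _rate([t.human_final == t.gold for t in ts]),
            "mean_seconds": sum(t.seconds for t in ts) / len(ts),
            # over-reliance: share of wrong AI advice the participant accepted
            "over_reliance": _rate([t.human_final == t.ai_prediction for t in ai_wrong]),
            # under-reliance: share of correct AI advice the participant overrode
            "under_reliance": _rate([t.human_final != t.ai_prediction for t in ai_right]),
        }
    return summary
```

Comparing `accuracy` and `mean_seconds` between a condition with explanations and one without gives a simple speed and accuracy contrast; a real study would also need an adequate number of participants and an appropriate statistical test.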

Code & Models

Repositories

utahnlp/nlp-explanation-utility-guideline
PyTorch · Official

Videos

On Evaluating Explanation Utility for Human-AI Decision Making in NLP · Underline

Taxonomy

Topics: Explainable Artificial Intelligence (XAI) · Semantic Web and Ontologies · AI-based Problem Solving and Planning
