On the Interaction of Belief Bias and Explanations

Ana Valeria Gonzalez; Anna Rogers; Anders Søgaard

arXiv:2106.15355·cs.CL·June 30, 2021

TL;DR

This paper examines how belief bias influences human evaluation of explainability methods in NLP, demonstrating that accounting for prior beliefs can significantly alter conclusions about method performance.

Contribution

The paper shows how belief bias distorts human evaluation of explanations and proposes ways to control for it when assessing NLP explainability methods.

Findings

01. Belief bias affects human judgments of explanation quality.

02. Controlling for prior beliefs changes evaluation outcomes.

03. Simple methods can mitigate belief bias in human assessments.

Abstract

A myriad of explainability methods have been proposed in recent years, but there is little consensus on how to evaluate them. While automatic metrics allow for quick benchmarking, it isn't clear how such metrics reflect human interaction with explanations. Human evaluation is of paramount importance, but previous protocols fail to account for belief biases affecting human performance, which may lead to misleading conclusions. We provide an overview of belief bias, its role in human evaluation, and ideas for NLP practitioners on how to account for it. For two experimental paradigms, we present a case study of gradient-based explainability introducing simple ways to account for humans' prior beliefs: models of varying quality and adversarial examples. We show that conclusions about the highest performing methods change when introducing such controls, pointing to the importance of…
