F1 is Not Enough! Models and Evaluation Towards User-Centered Explainable Question Answering

Hendrik Schuff; Heike Adel; Ngoc Thang Vu

arXiv:2010.06283 · cs.CL · October 14, 2020

TL;DR

This paper highlights the limitations of current explainable question answering models and evaluation metrics. It proposes a hierarchical model and a regularization term that strengthen answer-explanation coupling, together with new scores that quantify this coupling and better reflect user needs.

Contribution

It introduces a hierarchical model and a new regularization term that strengthen answer-explanation coupling in explainable QA systems, plus two evaluation scores that quantify this coupling.

Findings

01. The proposed models improve users' ability to judge whether an answer is correct.
02. F1 alone is insufficient to estimate a model's practical usefulness.
03. The new coupling scores align better with user experience.

Abstract

Explainable question answering systems predict an answer together with an explanation showing why the answer has been selected. The goal is to enable users to assess the correctness of the system and understand its reasoning process. However, we show that current models and evaluation settings have shortcomings regarding the coupling of answer and explanation which might cause serious issues in user experience. As a remedy, we propose a hierarchical model and a new regularization term to strengthen the answer-explanation coupling as well as two evaluation scores to quantify the coupling. We conduct experiments on the HotpotQA benchmark data set and perform a user study. The user study shows that our models increase the ability of the users to judge the correctness of the system and that scores like F1 are not enough to estimate the usefulness of a model in a practical setting with human…
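To make the abstract's point concrete, the sketch below contrasts a SQuAD-style token-level answer F1 with a simple answer-in-explanation check. Both functions are illustrative assumptions, not the authors' code: `answer_f1` is a simplified version of the usual extractive-QA F1 (the official HotpotQA evaluation differs in details), and `answer_in_explanation` is a hypothetical coupling indicator, not either of the two scores proposed in the paper.

```python
import re
import string
from collections import Counter

PUNCT = set(string.punctuation)


def normalize(text: str) -> list[str]:
    """SQuAD-style normalization: lowercase, drop punctuation,
    drop articles, then split on whitespace."""
    text = "".join(ch for ch in text.lower() if ch not in PUNCT)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return text.split()


def answer_f1(prediction: str, gold: str) -> float:
    """Token-level F1 between predicted and gold answer strings
    (simplified; assumption, not the official HotpotQA script)."""
    pred_toks, gold_toks = normalize(prediction), normalize(gold)
    # Multiset intersection counts each shared token at most min(count) times.
    overlap = sum((Counter(pred_toks) & Counter(gold_toks)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_toks)
    recall = overlap / len(gold_toks)
    return 2 * precision * recall / (precision + recall)


def answer_in_explanation(prediction: str, facts: list[str]) -> bool:
    """Hypothetical coupling check (NOT the paper's score): True if the
    normalized answer occurs as a contiguous token span in any predicted fact."""
    ans = normalize(prediction)
    if not ans:
        return False
    for fact in facts:
        toks = normalize(fact)
        for i in range(len(toks) - len(ans) + 1):
            if toks[i:i + len(ans)] == ans:
                return True
    return False


if __name__ == "__main__":
    # Correct answer, but the predicted explanation never mentions it.
    pred, gold = "Paris", "Paris"
    facts = ["The Eiffel Tower was completed in 1889."]
    print(answer_f1(pred, gold))             # 1.0
    print(answer_in_explanation(pred, facts))  # False
```

In this toy case the answer F1 is perfect, yet the explanation offers a user no grounds for trusting the answer. F1 alone cannot detect this kind of answer-explanation decoupling.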

Code & Models

Repositories

boschresearch/f1-is-not-enough (PyTorch, official)