ExpMRC: Explainability Evaluation for Machine Reading Comprehension
Yiming Cui, Ting Liu, Wanxiang Che, Zhigang Chen, Shijin Wang

TL;DR
This paper introduces ExpMRC, a new benchmark for evaluating the explainability of machine reading comprehension systems, emphasizing the importance of providing both answers and explanations, and demonstrating current models' limitations.
Contribution
The paper presents ExpMRC, a comprehensive benchmark with evidence annotations across multiple datasets, and evaluates state-of-the-art models' ability to generate explanations alongside answers.
Findings
Models lag behind human performance in explainability
Unsupervised evidence extraction approaches show promise
ExpMRC is a challenging benchmark for future research
Abstract
Achieving human-level performance on some Machine Reading Comprehension (MRC) datasets is no longer challenging with the help of powerful Pre-trained Language Models (PLMs). However, to further improve the reliability of MRC systems, especially in real-life applications, a system needs to provide both an answer prediction and an explanation for it. In this paper, we propose a new benchmark called ExpMRC for evaluating the explainability of MRC systems. ExpMRC contains four subsets, SQuAD, CMRC 2018, RACE+, and C3, with additional annotations of the answer's evidence. MRC systems are required to give not only the correct answer but also its explanation. We use state-of-the-art pre-trained language models to build baseline systems and adopt various unsupervised approaches to extract evidence without a human-annotated training set. The experimental results show that these…
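To make the task concrete, here is a minimal sketch of what an unsupervised evidence extractor could look like: given a passage, a question, and a predicted answer, it returns the passage sentence that shares the most words with the question and answer. This is an illustrative heuristic only, not necessarily the method used in the paper; the function names, the naive sentence splitter, and the token-overlap scoring are all assumptions made for this example.

```python
import re


def split_sentences(passage):
    # Naive splitter (assumption): break after ., ! or ? followed by whitespace.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", passage) if s.strip()]


def tokenize(text):
    # Lowercased set of word tokens; a real system would use a proper tokenizer.
    return set(re.findall(r"\w+", text.lower()))


def extract_evidence(passage, question, answer):
    """Return the sentence with the largest token overlap with question + answer.

    Hypothetical baseline for illustration; ties go to the earliest sentence.
    Returns an empty string if the passage has no sentences.
    """
    query = tokenize(question) | tokenize(answer)
    best_sentence, best_score = "", -1
    for sentence in split_sentences(passage):
        score = len(query & tokenize(sentence))
        if score > best_score:
            best_sentence, best_score = sentence, score
    return best_sentence


if __name__ == "__main__":
    passage = (
        "Paris is the capital of France. "
        "The tower was completed in 1889. "
        "Many tourists visit each year."
    )
    print(extract_evidence(passage, "When was the Eiffel Tower completed?", "1889"))
```

No annotated evidence is needed to run a heuristic like this, which is the sense in which such approaches are unsupervised. A benchmark of this kind would then compare the returned sentence against the human-annotated evidence.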
