R4C: A Benchmark for Evaluating RC Systems to Get the Right Answer for the Right Reason

Naoya Inoue; Pontus Stenetorp; Kentaro Inui

arXiv:1910.04601·cs.CL·May 5, 2020

TL;DR

R4C is a benchmark task and dataset for reading comprehension (RC) in which systems must give not only answers but also derivations that justify them. It targets the annotation artifacts and biases that systems exploit in existing datasets, so that progress can be measured more reliably.

Contribution

The paper presents R4C, a dataset of RC questions annotated with derivations, together with a scalable crowdsourcing framework for annotating RC datasets with such derivations.

Findings

01 Automatic evaluation metrics that use multiple reference derivations are reliable.

02 R4C assesses different skills from an existing benchmark.

03 The dataset contains 4.6k questions, each annotated with 3 reference derivations (13.8k derivations in total).

Abstract

Recent studies have revealed that reading comprehension (RC) systems learn to exploit annotation artifacts and other biases in current datasets. This prevents the community from reliably measuring the progress of RC systems. To address this issue, we introduce R4C, a new task for evaluating RC systems' internal reasoning. R4C requires giving not only answers but also derivations: explanations that justify predicted answers. We present a reliable, crowdsourced framework for scalably annotating RC datasets with derivations. We create and publicly release the R4C dataset, the first, quality-assured dataset consisting of 4.6k questions, each of which is annotated with 3 reference derivations (i.e. 13.8k derivations). Experiments show that our automatic evaluation metrics using multiple reference derivations are reliable, and that R4C assesses different skills from an existing benchmark.
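
The abstract says predicted derivations are scored automatically against multiple reference derivations, but this page does not give the metric itself. The following is a minimal sketch of the general multi-reference idea only, not the paper's actual metric. It assumes, purely for illustration, that a derivation is a set of (head, relation, tail) steps, that steps are compared by exact match, and that a prediction receives its best F1 over the available references. The names `step_f1` and `multi_reference_score` and the sample data are hypothetical.

```python
from typing import FrozenSet, List, Tuple

# Assumption: a derivation step is a (head, relation, tail) triple.
# The page does not specify the representation; this is illustrative only.
Step = Tuple[str, str, str]
Derivation = FrozenSet[Step]


def step_f1(predicted: Derivation, reference: Derivation) -> float:
    """F1 over exactly matching steps between one prediction and one reference."""
    overlap = len(predicted & reference)
    if overlap == 0:
        return 0.0
    precision = overlap / len(predicted)
    recall = overlap / len(reference)
    return 2 * precision * recall / (precision + recall)


def multi_reference_score(predicted: Derivation, references: List[Derivation]) -> float:
    """Score a prediction by its best match among several reference derivations."""
    return max((step_f1(predicted, ref) for ref in references), default=0.0)


# Hypothetical example: one question with three reference derivations.
predicted = frozenset({
    ("Band A", "is", "a rock band"),
    ("Band A", "was formed in", "City B"),
})
references = [
    frozenset({("Band A", "is", "a rock band"), ("Band A", "is from", "City B")}),
    frozenset({("Band A", "is", "a rock band"), ("Band A", "was formed in", "City B")}),
    frozenset({("Band A", "plays", "rock music")}),
]

print(multi_reference_score(predicted, references))  # prints 1.0
```

Taking the maximum over references means a prediction is not penalized for phrasing its justification like one annotator rather than another, which is one plausible reason to collect 3 derivations per question. Exact matching is a deliberate simplification here; a softer step-level similarity could be substituted in `step_f1` without changing the rest of the sketch.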

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Multimodal Machine Learning Applications