MRQA 2019 Shared Task: Evaluating Generalization in Reading Comprehension

Adam Fisch; Alon Talmor; Robin Jia; Minjoon Seo; Eunsol Choi; Danqi Chen

arXiv:1910.09753·cs.CL·December 24, 2019·21 cites

TL;DR

The MRQA 2019 shared task evaluated how well reading comprehension systems generalize to unseen datasets, using 18 question answering datasets converted to a unified format. The best system reached an average F1 of 72.5 on the 12 held-out datasets, 10.7 absolute points above the BERT-based baseline.

Contribution

This paper introduces a unified framework for evaluating generalization in reading comprehension across 18 question answering datasets and reports the results of the ten submitted systems.

Findings

01

The best system achieved an average F1 score of 72.5 on the 12 held-out datasets, 10.7 absolute points above the BERT-based baseline.

02

Participating teams explored data sampling, multi-task learning, adversarial training, and ensembling.

03

All 18 datasets were adapted into a single unified format, enabling cross-dataset evaluation.

Abstract

We present the results of the Machine Reading for Question Answering (MRQA) 2019 shared task on evaluating the generalization capabilities of reading comprehension systems. In this task, we adapted and unified 18 distinct question answering datasets into the same format. Among them, six datasets were made available for training, six datasets were made available for development, and the final six were hidden for final evaluation. Ten teams submitted systems, which explored various ideas including data sampling, multi-task learning, adversarial training and ensembling. The best system achieved an average F1 score of 72.5 on the 12 held-out datasets, 10.7 absolute points higher than our initial baseline based on BERT.
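The headline number in the abstract is an F1 score averaged over datasets. As a rough illustration of how such a figure can be computed, the sketch below assumes SQuAD-style token-level F1 (lowercasing, punctuation and article stripping, best match over multiple gold answers) and an unweighted mean across datasets. The function names and the normalization details are assumptions made for this example; they are not taken from the abstract and may differ from the official evaluation script.

```python
import re
import string
from collections import Counter


def normalize_answer(text: str) -> str:
    """Lowercase, drop punctuation and English articles, collapse whitespace."""
    punctuation = set(string.punctuation)
    text = "".join(ch for ch in text.lower() if ch not in punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def token_f1(prediction: str, gold: str) -> float:
    """Token-overlap F1 between one predicted and one gold answer (0.0 to 1.0)."""
    pred_tokens = normalize_answer(prediction).split()
    gold_tokens = normalize_answer(gold).split()
    if not pred_tokens or not gold_tokens:
        # Both empty counts as a match; exactly one empty counts as a miss.
        return float(pred_tokens == gold_tokens)
    overlap = sum((Counter(pred_tokens) & Counter(gold_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)


def dataset_f1(predictions: dict, gold_answers: dict) -> float:
    """Mean over questions of the best F1 against any gold answer, scaled to 0-100.

    A question with no prediction is scored as an empty answer.
    """
    scores = [
        max(token_f1(predictions.get(qid, ""), gold) for gold in golds)
        for qid, golds in gold_answers.items()
    ]
    return 100.0 * sum(scores) / len(scores)


def macro_average_f1(per_dataset_f1: dict) -> float:
    """Unweighted mean of per-dataset F1 scores (assumed averaging scheme)."""
    return sum(per_dataset_f1.values()) / len(per_dataset_f1)
```

Under these assumptions, each held-out dataset contributes equally to the average regardless of its size, so a small dataset on which a system does poorly lowers the overall score as much as a large one would.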

Code & Models

Repositories

mrqa/MRQA-Shared-Task-2019
Official

Datasets

mrqa-workshop/mrqa
dataset · 793 dl

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Multimodal Machine Learning Applications

Methods: Linear Layer · Weight Decay · Residual Connection · Adam · Layer Normalization · Softmax · Attention Is All You Need · Dropout · Multi-Head Attention