Question and Answer Test-Train Overlap in Open-Domain Question Answering Datasets

Patrick Lewis; Pontus Stenetorp; Sebastian Riedel

arXiv:2008.02637·cs.CL·August 7, 2020

TL;DR

This paper measures how much open-domain QA benchmarks test memorization versus generalization. It finds substantial overlap between training and test data and shows that models perform much worse on test questions that cannot be answered by recalling training data.

Contribution

The study analyzes test-train overlap in three popular open-domain QA datasets and compares model performance on test questions that overlap with the training data against performance on novel questions.

Findings

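01. 60-70% of test-time answers also appear somewhere in the training sets.

02. 30% of test questions have a near-duplicate paraphrase in the corresponding training set.

03. Models perform dramatically worse on questions that cannot be memorized from the training sets, with a mean absolute performance difference of 63% between repeated and non-repeated data.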
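Finding 01 counts a test answer as present in the training set when it matches some training answer. A minimal sketch of that measurement follows, assuming SQuAD-style answer normalization (lowercasing, stripping punctuation and the articles "a", "an", "the", collapsing whitespace) and exact string match after normalization. The helper names `normalize_answer` and `answer_overlap_rate` are our own, and the authors' exact matching rules may differ.

```python
import re
import string

_PUNCT = set(string.punctuation)


def normalize_answer(s: str) -> str:
    """SQuAD-style normalization (assumed): lowercase, drop punctuation and articles, squash whitespace."""
    s = s.lower()
    s = "".join(ch for ch in s if ch not in _PUNCT)
    s = re.sub(r"\b(a|an|the)\b", " ", s)
    return " ".join(s.split())


def answer_overlap_rate(test_answers, train_answers):
    """Fraction of test questions with at least one gold answer that matches a training answer.

    test_answers: list of lists of gold answer strings, one inner list per test question.
    train_answers: iterable of answer strings drawn from the training set.
    """
    train_set = {normalize_answer(a) for a in train_answers}
    hits = sum(
        any(normalize_answer(a) in train_set for a in answers)
        for answers in test_answers
    )
    return hits / len(test_answers) if test_answers else 0.0
```

For example, with test answers `[["The Beatles"], ["Paris", "paris, France"], ["Mount Everest"]]` and training answers `["beatles", "London", "Everest"]`, only the first question overlaps, so the rate is 1/3. Exact match after normalization is deliberately strict: "Mount Everest" does not count as overlapping with "Everest".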

Abstract

Ideally Open-Domain Question Answering models should exhibit a number of competencies, ranging from simply memorizing questions seen at training time, to answering novel question formulations with answers seen during training, to generalizing to completely novel questions with novel answers. However, single aggregated test set scores do not show the full picture of what capabilities models truly have. In this work, we perform a detailed study of the test sets of three popular open-domain benchmark datasets with respect to these competencies. We find that 60-70% of test-time answers are also present somewhere in the training sets. We also find that 30% of test-set questions have a near-duplicate paraphrase in their corresponding training sets. Using these findings, we evaluate a variety of popular open-domain models to obtain greater insight into what extent they can actually generalize,…
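The evaluation idea in the abstract, scoring models separately on test questions that do and do not overlap with training data, can be sketched as below. This is an illustration under stated assumptions: `split_accuracy` is a hypothetical helper that takes per-question correctness (for example exact match) and a boolean overlap label per question. It is not the API of the QA-Overlap repository, and it does not claim to reproduce how the paper aggregates its 63% figure.

```python
from typing import Dict, List, Sequence


def split_accuracy(correct: Sequence[bool], overlap: Sequence[bool]) -> Dict[str, float]:
    """Accuracy (in %) on the overlapping subset, the non-overlapping subset, and their absolute gap.

    correct: per-question correctness flags (hypothetical input, e.g. exact match).
    overlap: per-question flags marking questions that overlap with the training data.
    """
    if len(correct) != len(overlap):
        raise ValueError("correct and overlap must have the same length")

    def acc(flags: List[bool]) -> float:
        # Percentage of True flags; NaN for an empty subset.
        return 100.0 * sum(flags) / len(flags) if flags else float("nan")

    seen = [c for c, o in zip(correct, overlap) if o]
    novel = [c for c, o in zip(correct, overlap) if not o]
    seen_acc, novel_acc = acc(seen), acc(novel)
    return {"overlap": seen_acc, "no_overlap": novel_acc, "abs_gap": abs(seen_acc - novel_acc)}
```

With `correct = [True, True, True, False, True, False, False, False]` and `overlap = [True, True, True, True, False, False, False, False]`, the sketch reports 75.0 on the overlapping subset, 25.0 on the rest, and an absolute gap of 50.0 points. A single aggregate score over all eight questions would be 50.0 and would hide that gap.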

Code & Models

Repositories

facebookresearch/QA-Overlap (official)

Taxonomy

Methods: Linear Layer · Layer Normalization · Attention Is All You Need · Dropout · Adam · Multi-Head Attention · Softmax · Dense Connections · Residual Connection