SelQA: A New Benchmark for Selection-based Question Answering

Tomasz Jurczyk; Michael Zhai; Jinho D. Choi

arXiv:1606.08513 · cs.CL · October 31, 2016

TL;DR

SelQA introduces a new, challenging benchmark dataset for selection-based question answering, emphasizing diversity and reduced word overlap to improve system evaluation and development.

Contribution

The paper presents SelQA, a novel dataset with an annotation scheme that enhances diversity and reduces word co-occurrence, facilitating better training and evaluation of QA systems.

Findings

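01. Strong baseline results are established for answer sentence selection and answer triggering.

02. A multi-task crowdsourcing annotation scheme is developed for building question answering datasets.

03. The dataset covers the ten most prevalent topics in the English Wikipedia.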

Abstract

This paper presents a new selection-based question answering dataset, SelQA. The dataset consists of questions generated through crowdsourcing and sentence-length answers that are drawn from the ten most prevalent topics in the English Wikipedia. We introduce a corpus annotation scheme that enhances the generation of large, diverse, and challenging datasets by explicitly aiming to reduce word co-occurrences between the questions and answers. Our annotation scheme is composed of a series of crowdsourcing tasks designed to use crowdsourcing more effectively in the creation of question answering datasets in various domains. Several systems are compared on the tasks of answer sentence selection and answer triggering, providing strong baseline results for future work to improve upon.
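To make the two tasks and the role of word co-occurrence concrete, the following is a minimal illustrative sketch, not one of the systems compared in the paper. It assumes a simple Jaccard word-overlap score; the function names, the toy question, and the threshold value are all hypothetical. Answer sentence selection picks the best-scoring candidate sentence, while answer triggering may also decide that no candidate answers the question.

```python
import re


def tokenize(text):
    """Lowercase the text and return its set of alphanumeric word tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def overlap(question, sentence):
    """Jaccard overlap between the word sets of a question and a sentence."""
    q, s = tokenize(question), tokenize(sentence)
    if not q or not s:
        return 0.0
    return len(q & s) / len(q | s)


def select_answer(question, candidates):
    """Answer sentence selection: index of the highest-overlap candidate."""
    scores = [overlap(question, c) for c in candidates]
    return max(range(len(candidates)), key=scores.__getitem__)


def trigger_answer(question, candidates, threshold=0.2):
    """Answer triggering: return a candidate index, or None when no candidate
    reaches the threshold (a hypothetical value chosen for illustration)."""
    if not candidates:
        return None
    best = select_answer(question, candidates)
    return best if overlap(question, candidates[best]) >= threshold else None


# Toy data, invented for this sketch (not taken from SelQA).
question = "Who wrote Hamlet?"
candidates = [
    "Hamlet was written by William Shakespeare.",
    "Paris is the capital of France.",
]
best_index = select_answer(question, candidates)
triggered = trigger_answer(question, candidates, threshold=0.2)
```

In this toy case the correct sentence shares only one word with the question, so selection still finds it but a threshold of 0.2 rejects it. A dataset built to reduce word co-occurrence between questions and answers makes overlap heuristics like this one less reliable, which is the sense in which such a benchmark is more challenging.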

Code & Models

Repositories

emorynlp/selqa (Official)