GooAQ: Open Question Answering with Diverse Answer Types

Daniel Khashabi; Amos Ng; Tushar Khot; Ashish Sabharwal; Hannaneh Hajishirzi; Chris Callison-Burch

arXiv:2104.08727·cs.CL·September 14, 2021

TL;DR

GooAQ is a large-scale dataset from Google search responses that captures diverse answer types, aiming to improve question-answering models' ability to generate varied and natural answers.

Contribution

The paper introduces GooAQ, a novel dataset with diverse answer types collected from Google, and benchmarks T5 models, highlighting the importance of pre-training for complex answer generation.

Findings

01. LM performance improves with annotated data for short answers

02. Pre-training is crucial for generating coherent long responses

03. GooAQ enables research on diverse answer type question answering

Abstract

While day-to-day questions come with a variety of answer types, the current question-answering (QA) literature has failed to adequately address the answer diversity of questions. To this end, we present GooAQ, a large-scale dataset with a variety of answer types. This dataset contains over 5 million questions and 3 million answers collected from Google. GooAQ questions are collected semi-automatically from the Google search engine using its autocomplete feature. This results in naturalistic questions of practical interest that are nonetheless short and expressed using simple language. GooAQ answers are mined from Google's responses to our collected questions, specifically from the answer boxes in the search results. This yields a rich space of answer types, containing both textual answers (short and long) as well as more structured ones such as collections. We benchmark T5 models on…
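Because the abstract distinguishes short answers, long answers, and collections, a first step when working with such data is often to bucket records by answer type. The following is a minimal sketch of that step, not the authors' tooling: the records are made up for illustration, and the field names (`question`, `answer`, `answer_type`) and the labels `short`, `long`, and `collection` are assumptions rather than the dataset's documented schema.

```python
from collections import Counter, defaultdict

# Illustrative records only: the field names and the "short" / "long" /
# "collection" labels are assumptions, not the dataset's documented schema.
RECORDS = [
    {"question": "how many days are in a leap year",
     "answer": "366", "answer_type": "short"},
    {"question": "what is a leap year",
     "answer": "A leap year is a calendar year with one extra day added.",
     "answer_type": "long"},
    {"question": "which months have 30 days",
     "answer": ["April", "June", "September", "November"],
     "answer_type": "collection"},
    {"question": "how many minutes are in an hour",
     "answer": "60", "answer_type": "short"},
]


def count_by_type(records):
    """Return a Counter mapping each answer type to its number of records."""
    return Counter(r["answer_type"] for r in records)


def group_by_type(records):
    """Return a dict mapping each answer type to its records, in input order."""
    groups = defaultdict(list)
    for r in records:
        groups[r["answer_type"]].append(r)
    return dict(groups)


if __name__ == "__main__":
    # Print one line per answer type with its record count.
    for answer_type, n in sorted(count_by_type(RECORDS).items()):
        print(answer_type, n)
```

Splitting by type first makes it possible to evaluate each answer type separately, since short answers and long responses are usually judged by different criteria.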

Code & Models

Repositories

allenai/gooaq
tf · Official

Datasets

allenai/gooaq
dataset · 134 downloads
