GooAQ: Open Question Answering with Diverse Answer Types
Daniel Khashabi, Amos Ng, Tushar Khot, Ashish Sabharwal, Hannaneh Hajishirzi, Chris Callison-Burch

TL;DR
GooAQ is a large-scale dataset from Google search responses that captures diverse answer types, aiming to improve question-answering models' ability to generate varied and natural answers.
Contribution
The paper introduces GooAQ, a novel dataset with diverse answer types collected from Google, and benchmarks T5 models, highlighting the importance of pre-training for complex answer generation.
Findings
LM performance improves with annotated data for short answers
Pre-training is crucial for generating coherent long responses
GooAQ enables research on diverse answer type question answering
Abstract
While day-to-day questions come with a variety of answer types, the current question-answering (QA) literature has failed to adequately address the answer diversity of questions. To this end, we present GooAQ, a large-scale dataset with a variety of answer types. This dataset contains over 5 million questions and 3 million answers collected from Google. GooAQ questions are collected semi-automatically from the Google search engine using its autocomplete feature. This results in naturalistic questions of practical interest that are nonetheless short and expressed using simple language. GooAQ answers are mined from Google's responses to our collected questions, specifically from the answer boxes in the search results. This yields a rich space of answer types, containing both textual answers (short and long) as well as more structured ones such as collections. We benchmark T5 models on…
Code & Models
- 🤗 sentence-transformers/all-MiniLM-L6-v2 (model) · 200.9M dl · ♡ 4639
- 🤗 sentence-transformers/all-mpnet-base-v2 (model) · 28.7M dl · ♡ 1266
- 🤗 Hum-Works/lodestone-base-4096-v1 (model) · 112 dl · ♡ 12
- 🤗 arredondos/my_sentence_transformer (model) · 1 dl
- 🤗 flax-sentence-embeddings/all_datasets_v3_MiniLM-L12 (model) · 5 dl · ♡ 2
- 🤗 flax-sentence-embeddings/all_datasets_v3_MiniLM-L6 (model) · 3 dl
- 🤗 flax-sentence-embeddings/all_datasets_v3_distilroberta-base (model) · 1 dl · ♡ 2
- 🤗 flax-sentence-embeddings/all_datasets_v3_mpnet-base (model) · 596 dl · ♡ 13
- 🤗 flax-sentence-embeddings/all_datasets_v3_roberta-large (model) · 24 dl · ♡ 13
- 🤗 flax-sentence-embeddings/all_datasets_v4_MiniLM-L12 (model) · 2 dl · ♡ 2
