MilkQA: a Dataset of Consumer Questions for the Task of Answer Selection

Marcelo Criscuolo; Erick Rocha Fonseca; Sandra Maria Aluísio; Ana Carolina Sperança-Criscuolo

arXiv:1801.03460 · cs.CL · January 11, 2018

TL;DR

MilkQA is a new Portuguese dataset of consumer dairy questions designed to evaluate answer selection models, highlighting challenges posed by real-world, lengthy, and linguistically diverse questions.

Contribution

The paper introduces MilkQA, a novel dataset for answer selection in consumer questions, and evaluates multiple models, revealing the dataset's complexity and the need for advanced approaches.

Findings

1. MilkQA contains 2,657 question-answer pairs from real consumer questions.

2. Answer selection models struggle with MilkQA's linguistic complexity and question length.

3. One model achieves reasonable results but requires high computational resources.
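To make the answer selection task concrete, the sketch below ranks a small pool of candidate answers for each question and reports mean reciprocal rank (MRR), a metric commonly used for this task. Everything in it is an assumption for illustration: the token-overlap scorer is a trivial baseline rather than any model evaluated in the paper, the helper names are hypothetical, and the English toy sentences are invented and not taken from MilkQA.

```python
from typing import Callable, List, Sequence


def jaccard(question: str, answer: str) -> float:
    """Token-overlap (Jaccard) score between a question and a candidate answer."""
    q = set(question.lower().split())
    a = set(answer.lower().split())
    union = q | a
    return len(q & a) / len(union) if union else 0.0


def rank_of_correct(
    question: str,
    candidates: Sequence[str],
    correct_index: int,
    score: Callable[[str, str], float] = jaccard,
) -> int:
    """Return the 1-based rank of the correct answer after sorting by score."""
    order = sorted(
        range(len(candidates)),
        key=lambda i: score(question, candidates[i]),
        reverse=True,
    )
    return order.index(correct_index) + 1


def mean_reciprocal_rank(ranks: Sequence[int]) -> float:
    """Average of 1/rank over all questions; 1.0 means every answer ranked first."""
    return sum(1.0 / r for r in ranks) / len(ranks)


# Hypothetical toy pool and questions (not MilkQA data).
pool: List[str] = [
    "store pasteurized milk in the refrigerator below 5 degrees",
    "goats should be vaccinated every year",
    "cheese ripening takes several weeks in a cool room",
]
examples = [
    ("how should i store pasteurized milk at home", 0),
    ("how long does cheese ripening take", 2),
]

ranks = [rank_of_correct(q, pool, idx) for q, idx in examples]
mrr = mean_reciprocal_rank(ranks)
```

A lexical baseline like this depends on shared vocabulary between question and answer, which is one plausible reason long, informally written consumer questions are harder than short factoid questions; a learned scorer can be swapped in through the `score` parameter.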

Abstract

We introduce MilkQA, a question answering dataset from the dairy domain dedicated to the study of consumer questions. The dataset contains 2,657 pairs of questions and answers, written in the Portuguese language and originally collected by the Brazilian Agricultural Research Corporation (Embrapa). All questions were motivated by real situations and written by thousands of authors with very different backgrounds and levels of literacy, while answers were elaborated by specialists from Embrapa's customer service. Our dataset was filtered and anonymized by three human annotators. Consumer questions are a challenging kind of question that is usually employed as a form of seeking information. Although several question answering datasets are available, most of such resources are not suitable for research on answer selection models for consumer questions. We aim to fill this gap by making…

Code & Models

Datasets

eduagarcia/MilkQA
dataset · 26 downloads