MilkQA: a Dataset of Consumer Questions for the Task of Answer Selection
Marcelo Criscuolo, Erick Rocha Fonseca, Sandra Maria Aluísio, Ana Carolina Sperança-Criscuolo

TL;DR
MilkQA is a new Portuguese dataset of consumer dairy questions designed to evaluate answer selection models, highlighting challenges posed by real-world, lengthy, and linguistically diverse questions.
Contribution
The paper introduces MilkQA, a novel dataset for answer selection in consumer questions, and evaluates multiple models, revealing the dataset's complexity and the need for advanced approaches.
Findings
MilkQA contains 2,657 question-answer pairs from real consumer questions.
Answer selection models struggle with MilkQA's linguistic complexity and question length.
One model achieves reasonable results but requires high computational resources.
Abstract
We introduce MilkQA, a question answering dataset from the dairy domain dedicated to the study of consumer questions. The dataset contains 2,657 pairs of questions and answers, written in the Portuguese language and originally collected by the Brazilian Agricultural Research Corporation (Embrapa). All questions were motivated by real situations and written by thousands of authors with very different backgrounds and levels of literacy, while answers were elaborated by specialists from Embrapa's customer service. Our dataset was filtered and anonymized by three human annotators. Consumer questions are a challenging kind of question that is usually employed as a form of seeking information. Although several question answering datasets are available, most of such resources are not suitable for research on answer selection models for consumer questions. We aim to fill this gap by making…
