The Lab vs The Crowd: An Investigation into Data Quality for Neural Dialogue Models
José Lopes, Francisco J. Chiyah Garcia, Helen Hastie

TL;DR
This paper compares dialogue models trained on lab-collected and crowd-sourced data for the same interaction task. It finds that lab data reaches similar accuracy with less than half as much data, which highlights the trade-offs between the two collection methods.
Contribution
It provides an empirical comparison of dialogue data collected in the lab and through crowd-sourcing, showing that lab data is more data-efficient and discussing the trade-offs of each collection method.
Findings
Models trained on lab data reach similar accuracy with less than half as many dialogues as models trained on crowd-sourced data.
Crowd-sourced data is more cost-effective and faster to collect.
Trade-offs exist between data quality and collection efficiency.
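One way to make the "less than half the data" comparison concrete is to read off, from each setting's learning curve, the smallest number of training dialogues at which a target accuracy is reached. The sketch below is illustrative only: the function name `dialogues_needed`, the target accuracy, and every number in the two curves are hypothetical placeholders, not results or methods taken from the paper.

```python
from typing import Optional, Sequence, Tuple


def dialogues_needed(
    curve: Sequence[Tuple[int, float]], target: float
) -> Optional[int]:
    """Return the smallest training-set size whose accuracy meets the target.

    `curve` is a list of (number_of_dialogues, accuracy) points.
    Returns None if the target is never reached.
    """
    for n_dialogues, accuracy in sorted(curve):
        if accuracy >= target:
            return n_dialogues
    return None


# Hypothetical learning curves (placeholder values, NOT from the paper).
lab_curve = [(10, 0.50), (20, 0.62), (40, 0.70)]
crowd_curve = [(10, 0.40), (20, 0.50), (40, 0.58), (80, 0.66), (100, 0.70)]

target = 0.70  # assumed target accuracy for the comparison
lab_n = dialogues_needed(lab_curve, target)
crowd_n = dialogues_needed(crowd_curve, target)

# Fraction of the crowd-sourced data volume that the lab setting needs.
ratio = lab_n / crowd_n
print(f"lab: {lab_n}, crowd: {crowd_n}, ratio: {ratio:.2f}")
```

A ratio below 0.5 on real learning curves would correspond to the finding stated above. Whether the extra crowd-sourced dialogues are still cheaper overall depends on per-dialogue cost and collection time, which this sketch does not model.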
Abstract
Challenges around collecting and processing quality data have hampered progress in data-driven dialogue models. Previous approaches are moving away from costly, resource-intensive lab settings, where collection is slow but where the data is deemed of high quality. The advent of crowd-sourcing platforms, such as Amazon Mechanical Turk, has provided researchers with an alternative cost-effective and rapid way to collect data. However, the collection of fluid, natural spoken or textual interaction can be challenging, particularly between two crowd-sourced workers. In this study, we compare the performance of dialogue models for the same interaction task but collected in two different settings: in the lab vs. crowd-sourced. We find that fewer lab dialogues are needed to reach similar accuracy, less than half the amount of lab data as crowd-sourced data. We discuss the advantages and…
Taxonomy
Topics: Topic Modeling · Speech and dialogue systems · Natural Language Processing Techniques
