Taskmaster-1: Toward a Realistic and Diverse Dialog Dataset
Bill Byrne, Karthik Krishnamoorthi, Chinnadhurai Sankar, Arvind Neelakantan, Daniel Duckworth, Semih Yavuz, Ben Goodrich, Amit Dubey, Andy Cedilnik, Kyu-Young Kim

TL;DR
The paper introduces Taskmaster-1, a large, diverse, and realistic dataset of goal-oriented dialogs across six domains. It was collected with two procedures (spoken "Wizard of Oz" conversations and written self-dialogs) to provide better training data for dialog systems.
Contribution
It presents a new dataset of diverse, realistic dialogs collected with two novel procedures, and it provides baseline models and evaluations for dialog system research.
Findings
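The dataset contains 13,215 task-based dialogs across six domains.
The dialogs are more realistic and diverse than those in existing datasets, because workers are not restricted to detailed scripts or a small knowledge base.
Baseline neural models are provided as reference benchmarks on the dataset.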
Abstract
A significant barrier to progress in data-driven approaches to building dialog systems is the lack of high-quality, goal-oriented conversational data. To help satisfy this elementary requirement, we introduce the initial release of the Taskmaster-1 dataset, which includes 13,215 task-based dialogs comprising six domains. Two procedures were used to create this collection, each with unique advantages. The first involves a two-person, spoken "Wizard of Oz" (WOz) approach in which trained agents and crowdsourced workers interact to complete the task, while the second is "self-dialog," in which crowdsourced workers write the entire dialog themselves. We do not restrict the workers to detailed scripts or to a small knowledge base; hence, we observe that our dataset contains more realistic and diverse conversations in comparison to existing datasets. We offer several baseline models including…
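The two collection procedures differ mainly in who authors each side of the conversation. The sketch below models that difference with a hypothetical record type; the class names, field names, and the method labels "woz" and "self_dialog" are assumptions for illustration and are not taken from the released data files.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import List

# Hypothetical schema for illustration only; not the format of the released files.
VALID_METHODS = {"woz", "self_dialog"}


@dataclass
class Utterance:
    speaker: str  # e.g. "USER" or "ASSISTANT"
    text: str


@dataclass
class Dialog:
    domain: str
    method: str  # "woz" (two people, spoken) or "self_dialog" (one worker writes all turns)
    utterances: List[Utterance] = field(default_factory=list)

    def __post_init__(self) -> None:
        # Reject labels outside the two procedures described in the abstract.
        if self.method not in VALID_METHODS:
            raise ValueError(f"unknown collection method: {self.method!r}")

    def num_authors(self) -> int:
        # WOz: a crowdsourced user and a trained agent each produce one side.
        # Self-dialog: a single crowdsourced worker writes both sides.
        return 2 if self.method == "woz" else 1


def count_by_method(dialogs: List[Dialog]) -> Counter:
    """Tally dialogs per collection procedure."""
    return Counter(d.method for d in dialogs)
```

Keeping the procedure as an explicit field makes it easy to compare models trained on WOz dialogs, self-dialogs, or both.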
Taxonomy
Topics: Topic Modeling · Speech and dialogue systems · Natural Language Processing Techniques
Methods: Sigmoid Activation · Tanh Activation · Long Short-Term Memory · Sequence to Sequence
