Towards Scalable Multi-domain Conversational Agents: The Schema-Guided Dialogue Dataset
Abhinav Rastogi, Xiaoxue Zang, Srinivas Sunkara, Raghav Gupta, Pranav Khaitan

TL;DR
This paper introduces the Schema-Guided Dialogue dataset, a large-scale multi-domain dataset for training and evaluating virtual assistants, and proposes a schema-guided paradigm enabling scalable, zero-shot support for new services.
Contribution
It presents a new large-scale dataset and a schema-guided approach that allows dialogue systems to support many domains and APIs with minimal training data.
Findings
The dataset contains over 16,000 multi-domain conversations across 16 domains.
The schema-guided paradigm enables zero-shot generalization to new APIs.
The proposed model performs competitively on dialogue state tracking tasks.
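To make the schema-guided idea above concrete, here is a minimal sketch of how a service might be described by a schema whose intents and slots carry natural-language descriptions. The class names, field names, and the `ExampleFlights` service are assumptions made for illustration; they are not taken from the paper or the dataset files. The point is only that a model reading descriptions, rather than fixed label ids, could be handed a new service without retraining.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


# Hypothetical schema containers; field names are illustrative assumptions.
@dataclass
class Slot:
    name: str
    description: str
    is_categorical: bool = False
    possible_values: List[str] = field(default_factory=list)


@dataclass
class Intent:
    name: str
    description: str
    required_slots: List[str] = field(default_factory=list)


@dataclass
class ServiceSchema:
    service_name: str
    description: str
    slots: List[Slot]
    intents: List[Intent]


def schema_element_descriptions(schema: ServiceSchema) -> List[Tuple[str, str]]:
    """Return (element_name, text) pairs that a model could encode
    in place of a fixed, per-domain label vocabulary."""
    pairs = []
    for intent in schema.intents:
        pairs.append((intent.name, f"{schema.description} {intent.description}"))
    for slot in schema.slots:
        pairs.append((slot.name, f"{schema.description} {slot.description}"))
    return pairs


# A made-up service, invented purely for this example.
flights = ServiceSchema(
    service_name="ExampleFlights",
    description="Search for and book flights.",
    slots=[
        Slot("origin", "City the flight departs from."),
        Slot("seat_class", "Cabin class of the ticket.", True, ["economy", "business"]),
    ],
    intents=[Intent("SearchFlight", "Find flights between two cities.", ["origin"])],
)

pairs = schema_element_descriptions(flights)
for name, text in pairs:
    print(name, "->", text)
```

Under this sketch, supporting another service would mean writing one more `ServiceSchema` instance; the function that produces model inputs stays unchanged.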
Abstract
Virtual assistants such as Google Assistant, Alexa and Siri provide a conversational interface to a large number of services and APIs spanning multiple domains. Such systems need to support an ever-increasing number of services with possibly overlapping functionality. Furthermore, some of these services have little to no training data available. Existing public datasets for task-oriented dialogue do not sufficiently capture these challenges since they cover few domains and assume a single static ontology per domain. In this work, we introduce the Schema-Guided Dialogue (SGD) dataset, containing over 16k multi-domain conversations spanning 16 domains. Our dataset exceeds the existing task-oriented dialogue corpora in scale, while also highlighting the challenges associated with building large-scale virtual assistants. It provides a challenging testbed for a number of tasks including…
