Towards Scalable Multi-domain Conversational Agents: The Schema-Guided Dialogue Dataset

Abhinav Rastogi; Xiaoxue Zang; Srinivas Sunkara; Raghav Gupta; Pranav Khaitan

arXiv:1909.05855 · cs.CL · January 30, 2020

TL;DR

This paper introduces the Schema-Guided Dialogue dataset, a large-scale multi-domain dataset for training and evaluating virtual assistants, and proposes a schema-guided paradigm enabling scalable, zero-shot support for new services.

Contribution

The paper presents a new large-scale dataset and a schema-guided approach that allows dialogue systems to support many domains and APIs with minimal training data.
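Under the schema-guided paradigm, each service is described by a schema: natural-language descriptions of the service, its intents, and its slots. The minimal Python sketch below models such a schema as dataclasses. The field names (`service_name`, `slots`, `intents`, `required_slots`, `optional_slots`, `is_categorical`, `possible_values`) are modeled on the released SGD schema files as we understand them and should be treated as an assumption; the two restaurant services and the `validate` helper are hypothetical illustrations.

```python
from dataclasses import dataclass, field


@dataclass
class Slot:
    name: str
    description: str  # natural-language description a model can read
    is_categorical: bool = False
    possible_values: list[str] = field(default_factory=list)


@dataclass
class Intent:
    name: str
    description: str
    required_slots: list[str] = field(default_factory=list)
    optional_slots: dict[str, str] = field(default_factory=dict)  # slot -> default value


@dataclass
class ServiceSchema:
    service_name: str
    description: str
    slots: list[Slot]
    intents: list[Intent]

    def slot_names(self) -> set[str]:
        return {slot.name for slot in self.slots}

    def validate(self) -> None:
        """Raise ValueError if an intent refers to a slot the service does not declare."""
        known = self.slot_names()
        for intent in self.intents:
            # set(dict) yields the optional slot names (the dict keys)
            unknown = (set(intent.required_slots) | set(intent.optional_slots)) - known
            if unknown:
                raise ValueError(f"{intent.name} uses undeclared slots: {sorted(unknown)}")


# Two hypothetical services with overlapping functionality but different slot names.
restaurants_a = ServiceSchema(
    service_name="Restaurants_A",
    description="Find and reserve tables at restaurants",
    slots=[
        Slot("city", "City where the restaurant is located"),
        Slot("party_size", "Number of people in the reservation",
             is_categorical=True, possible_values=["1", "2", "3", "4"]),
    ],
    intents=[Intent("ReserveTable", "Reserve a table at a restaurant",
                    required_slots=["city", "party_size"])],
)

restaurants_b = ServiceSchema(
    service_name="Restaurants_B",
    description="Book restaurant reservations",
    slots=[
        Slot("location", "City in which to book a table"),
        Slot("num_people", "How many guests will attend",
             is_categorical=True, possible_values=["1", "2", "3", "4", "5", "6"]),
    ],
    intents=[Intent("BookRestaurant", "Make a restaurant booking",
                    required_slots=["location"],
                    optional_slots={"num_people": "2"})],
)

for schema in (restaurants_a, restaurants_b):
    schema.validate()
```

The two services expose the same capability under different slot names (`city` vs. `location`). A model conditioned on the descriptions, rather than on a fixed slot inventory, can in principle serve both without a separate ontology per service.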

Findings

1. The dataset contains over 16,000 multi-domain conversations across 16 domains.
2. The schema-guided paradigm enables zero-shot generalization to new APIs.
3. The proposed model performs competitively in dialogue state tracking tasks.
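Dialogue state tracking is commonly scored with joint goal accuracy: the fraction of user turns in which the entire predicted dialogue state matches the reference. The Python sketch below computes an exact-match version under simplifying assumptions: each state is a flat slot-to-value dictionary, and the example dialogue is hypothetical. The SGD evaluation is, as we understand it, more involved (for example, per-service states and fuzzy matching of non-categorical values), so this is not a reimplementation of the official metric.

```python
def joint_goal_accuracy(predicted: list[dict[str, str]],
                        gold: list[dict[str, str]]) -> float:
    """Fraction of turns whose predicted state matches the gold state exactly.

    Each state is simplified to a slot -> value mapping; a turn counts as
    correct only if every slot has the right value and no extra slots are
    predicted.
    """
    if len(predicted) != len(gold):
        raise ValueError("predicted and gold must cover the same turns")
    if not gold:
        return 0.0
    correct = sum(1 for p, g in zip(predicted, gold) if p == g)
    return correct / len(gold)


# Hypothetical three-turn dialogue: the tracker gets the party size wrong in turn 2.
gold_states = [
    {"city": "Paris"},
    {"city": "Paris", "party_size": "2"},
    {"city": "Paris", "party_size": "2", "time": "19:00"},
]
predicted_states = [
    {"city": "Paris"},
    {"city": "Paris", "party_size": "3"},
    {"city": "Paris", "party_size": "2", "time": "19:00"},
]

print(joint_goal_accuracy(predicted_states, gold_states))  # → 0.6666666666666666
```

Because the whole state must match, a single wrong slot value fails the entire turn, which makes joint goal accuracy a strict metric.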

Abstract

Virtual assistants such as Google Assistant, Alexa and Siri provide a conversational interface to a large number of services and APIs spanning multiple domains. Such systems need to support an ever-increasing number of services with possibly overlapping functionality. Furthermore, some of these services have little to no training data available. Existing public datasets for task-oriented dialogue do not sufficiently capture these challenges since they cover few domains and assume a single static ontology per domain. In this work, we introduce the Schema-Guided Dialogue (SGD) dataset, containing over 16k multi-domain conversations spanning 16 domains. Our dataset exceeds the existing task-oriented dialogue corpora in scale, while also highlighting the challenges associated with building large-scale virtual assistants. It provides a challenging testbed for a number of tasks including…
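The abstract notes that some services have little to no training data. The schema-guided approach conditions the model on the schema's natural-language descriptions, so an unseen service can be handled by comparing user utterances against its descriptions. The Python sketch below illustrates that idea for intent prediction only. It deliberately replaces the learned encoder (the paper builds on BERT) with a toy bag-of-words cosine similarity; `embed`, `cosine`, `rank_intents`, and the ride-sharing intents are hypothetical.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy stand-in for a learned encoder: a lowercase bag-of-words vector."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank_intents(utterance: str, intents: dict[str, str]) -> list[tuple[str, float]]:
    """Score each intent of a (possibly unseen) service by description similarity."""
    u = embed(utterance)
    scored = [(name, cosine(u, embed(desc))) for name, desc in intents.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Intents of a hypothetical service the model never saw during training.
ride_intents = {
    "BookRide": "book a ride to a destination",
    "GetFare": "check the fare for a ride",
}

ranking = rank_intents("I need to book a ride to the airport", ride_intents)
print(ranking[0][0])  # → BookRide
```

Nothing in `rank_intents` is tied to a particular service; only the descriptions change, which is the property that zero-shot transfer relies on. A bag-of-words scorer cannot handle paraphrases, which is why a pretrained encoder is the natural substitute.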
