RiSAWOZ: A Large-Scale Multi-Domain Wizard-of-Oz Dataset with Rich Semantic Annotations for Task-Oriented Dialogue Modeling
Jun Quan, Shian Zhang, Qian Cao, Zizhong Li, Deyi Xiong

TL;DR
RiSAWOZ is a large-scale, richly annotated Chinese multi-domain dialogue dataset designed to advance task-oriented dialogue modeling, including coreference and ellipsis resolution, with benchmark results for various dialogue tasks.
Contribution
This paper introduces RiSAWOZ, the largest multi-domain Chinese Wizard-of-Oz dataset with comprehensive semantic and discourse annotations, addressing the shortage of multi-domain data for task-oriented dialogue research.
Findings
Contains 11.2K human-to-human dialogues with more than 150K utterances across 12 domains.
Provides detailed annotations including dialogue goals, states, acts, and discourse phenomena.
Reports benchmark results for intent detection, slot filling, state tracking, and coreference resolution.
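The annotation layers listed above (goal, domain, dialogue states, and acts on both the user and system side) can be pictured as a nested record. The following is a minimal sketch only: the class names, field names, and example values are hypothetical and are not the dataset's actual schema or file format.

```python
from dataclasses import dataclass, field


@dataclass
class Turn:
    """One user/system exchange. All field names here are hypothetical."""
    user_utterance: str
    system_utterance: str
    # Dialogue state as (domain, slot) -> value; this layout is an assumption.
    state: dict = field(default_factory=dict)
    user_acts: list = field(default_factory=list)
    system_acts: list = field(default_factory=list)


@dataclass
class Dialogue:
    """A dialogue with a natural-language goal and its domain labels."""
    goal: str
    domains: list
    turns: list = field(default_factory=list)

    def is_multi_domain(self) -> bool:
        # A dialogue counts as multi-domain if it covers more than one domain.
        return len(set(self.domains)) > 1


def multi_domain_share(dialogues) -> float:
    """Fraction of dialogues that span more than one domain."""
    if not dialogues:
        return 0.0
    return sum(d.is_multi_domain() for d in dialogues) / len(dialogues)


# Illustrative placeholder content, not taken from the dataset.
example = Dialogue(
    goal="Find a hotel, then book a restaurant nearby.",
    domains=["hotel", "restaurant"],
    turns=[
        Turn(
            user_utterance="I need a hotel with parking.",
            system_utterance="Which area would you like to stay in?",
            state={("hotel", "parking"): "yes"},
            user_acts=["inform"],
            system_acts=["request"],
        )
    ],
)
```

Running `multi_domain_share` over a loaded corpus would give the proportion of multi-domain dialogues, assuming each dialogue carries its domain labels in this form.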
Abstract
In order to alleviate the shortage of multi-domain data and to capture discourse phenomena for task-oriented dialogue modeling, we propose RiSAWOZ, a large-scale multi-domain Chinese Wizard-of-Oz dataset with Rich Semantic Annotations. RiSAWOZ contains 11.2K human-to-human (H2H) multi-turn semantically annotated dialogues, with more than 150K utterances spanning over 12 domains, which is larger than all previous annotated H2H conversational datasets. Both single- and multi-domain dialogues are constructed, accounting for 65% and 35%, respectively. Each dialogue is labeled with comprehensive dialogue annotations, including dialogue goal in the form of natural language description, domain, dialogue states and acts at both the user and system side. In addition to traditional dialogue annotations, we especially provide linguistic annotations on discourse phenomena, e.g., ellipsis and…
Taxonomy
Topics: Topic Modeling · Speech and dialogue systems · Natural Language Processing Techniques
