MultiWOZ 2.4: A Multi-Domain Task-Oriented Dialogue Dataset with Essential Annotation Corrections to Improve State Tracking Evaluation
Fanghua Ye, Jarana Manotumruksa, Emine Yilmaz

TL;DR
MultiWOZ 2.4 improves the annotation quality of the MultiWOZ dataset's validation and test sets, enabling more accurate evaluation of dialogue state tracking models and fostering advancements in task-oriented dialogue systems.
Contribution
This paper introduces MultiWOZ 2.4, which corrects the noisy state annotations in the validation and test sets of MultiWOZ 2.1 so that dialogue state tracking models can be evaluated more accurately.
Findings
All eight benchmarked state-of-the-art dialogue state tracking models score much higher on MultiWOZ 2.4 than on MultiWOZ 2.1.
Refined annotations lead to more reliable state tracking evaluations.
The training annotations are unchanged from MultiWOZ 2.1, so the benchmark measures how models trained on noisy labels perform against cleaner evaluation labels.
Abstract
The MultiWOZ 2.0 dataset has greatly stimulated the research of task-oriented dialogue systems. However, its state annotations contain substantial noise, which hinders a proper evaluation of model performance. To address this issue, massive efforts were devoted to correcting the annotations. Three improved versions (i.e., MultiWOZ 2.1-2.3) have then been released. Nonetheless, there are still plenty of incorrect and inconsistent annotations. This work introduces MultiWOZ 2.4, which refines the annotations in the validation set and test set of MultiWOZ 2.1. The annotations in the training set remain unchanged (same as MultiWOZ 2.1) to elicit robust and noise-resilient model training. We benchmark eight state-of-the-art dialogue state tracking models on MultiWOZ 2.4. All of them demonstrate much higher performance than on MultiWOZ 2.1.
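To make the evaluation issue concrete, here is a minimal sketch of joint goal accuracy, a metric commonly used for dialogue state tracking, in which a turn counts as correct only if the whole predicted state matches the reference. The function name, the slot names, and the toy data are illustrative assumptions and are not taken from the paper or the dataset's tooling. The sketch shows how a single faulty reference label makes a correct prediction count as an error.

```python
from typing import Dict, List

# Hypothetical state representation: "domain-slot" -> value.
State = Dict[str, str]


def joint_goal_accuracy(predictions: List[State], references: List[State]) -> float:
    """Fraction of turns whose predicted state equals the reference state exactly."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not references:
        return 0.0
    correct = sum(1 for pred, ref in zip(predictions, references) if pred == ref)
    return correct / len(references)


# Invented toy data: the model's predictions are right on both turns.
predictions = [
    {"hotel-area": "north"},
    {"hotel-area": "north", "hotel-stars": "4"},
]

# A noisy reference that misses the "hotel-stars" slot on the second turn.
noisy_refs = [
    {"hotel-area": "north"},
    {"hotel-area": "north"},
]

# The corrected reference includes the missing slot.
clean_refs = [
    {"hotel-area": "north"},
    {"hotel-area": "north", "hotel-stars": "4"},
]

noisy_score = joint_goal_accuracy(predictions, noisy_refs)
clean_score = joint_goal_accuracy(predictions, clean_refs)
print(noisy_score, clean_score)
```

In this toy case the same predictions score 0.5 against the noisy labels and 1.0 against the corrected ones, which is the kind of distortion that corrected validation and test annotations are meant to remove.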
Taxonomy
Topics: Speech and dialogue systems · Topic Modeling · Context-Aware Activity Recognition Systems
