Annotation Inconsistency and Entity Bias in MultiWOZ
Kun Qian, Ahmad Beirami, Zhouhan Lin, Ankita De, Alborz Geramifard, Zhou Yu, Chinnadhurai Sankar

TL;DR
This paper identifies annotation inconsistencies and entity biases in the MultiWOZ dataset, proposes automated corrections, and evaluates their impact on dialog state tracking performance, revealing significant effects of data quality and entity bias.
Contribution
The paper introduces an automated method to correct annotation inconsistencies and creates a new test set with unseen entities, showing how both issues affect dialog state tracking.
Findings
Correcting annotation inconsistencies improves joint goal accuracy (JGA) by 7-10%.
Entity bias affects model performance, with a 29% drop on the unseen-entity test set.
Dataset quality significantly impacts dialog modeling results.
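For readers unfamiliar with the metric, JGA is commonly computed as the fraction of dialog turns whose predicted state matches the gold state on every slot. The sketch below illustrates that common definition only; the function name, the dict-based state representation, and the toy slot values are assumptions for illustration and are not taken from the paper's evaluation code.

```python
from typing import Dict, List

# Assumed representation: one dialog state per turn, mapping slot name -> value.
DialogState = Dict[str, str]


def joint_goal_accuracy(predicted: List[DialogState], gold: List[DialogState]) -> float:
    """Fraction of turns where the predicted state equals the gold state exactly."""
    if len(predicted) != len(gold):
        raise ValueError("predicted and gold must contain the same number of turns")
    if not gold:
        return 0.0
    # A turn counts as correct only if every slot-value pair matches.
    correct = sum(1 for p, g in zip(predicted, gold) if p == g)
    return correct / len(gold)


# Toy example (made-up values, not drawn from MultiWOZ).
gold_states = [
    {"train-destination": "cambridge"},
    {"train-destination": "cambridge", "train-day": "monday"},
    {"train-destination": "cambridge", "train-day": "monday", "train-people": "2"},
]
predicted_states = [
    {"train-destination": "cambridge"},
    {"train-destination": "cambridge", "train-day": "tuesday"},  # one wrong slot
    {"train-destination": "cambridge", "train-day": "monday", "train-people": "2"},
]

jga = joint_goal_accuracy(predicted_states, gold_states)
```

Because a single wrong slot makes the whole turn count as incorrect, inconsistently annotated slots in the gold data can penalize a model even when its prediction is reasonable.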
Abstract
MultiWOZ is one of the most popular multi-domain task-oriented dialog datasets, containing 10K+ annotated dialogs covering eight domains. It has been widely accepted as a benchmark for various dialog tasks, e.g., dialog state tracking (DST), natural language generation (NLG), and end-to-end (E2E) dialog modeling. In this work, we identify an overlooked issue with dialog state annotation inconsistencies in the dataset, where a slot type is tagged inconsistently across similar dialogs leading to confusion for DST modeling. We propose an automated correction for this issue, which is present in a whopping 70% of the dialogs. Additionally, we notice that there is significant entity bias in the dataset (e.g., "cambridge" appears in 50% of the destination cities in the train domain). The entity bias can potentially lead to named entity memorization in generative models, which may go unnoticed…
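One simple way to quantify the kind of entity bias described above is to compute the share of each value among all occurrences of a slot. The following is a minimal sketch under stated assumptions: the helper name and the toy value list are hypothetical, and it does not reproduce the paper's analysis or read the actual MultiWOZ files.

```python
from collections import Counter
from typing import Dict, Iterable


def value_shares(values: Iterable[str]) -> Dict[str, float]:
    """Return the fraction of occurrences for each normalized slot value."""
    # Normalize case and whitespace so "Cambridge" and "cambridge " are merged.
    counts = Counter(v.strip().lower() for v in values)
    total = sum(counts.values())
    # most_common() orders values from most to least frequent.
    return {value: count / total for value, count in counts.most_common()}


# Toy destination values (made up for illustration, not taken from MultiWOZ).
toy_destinations = ["cambridge", "Cambridge", "ely", "norwich"]
shares = value_shares(toy_destinations)
top_value, top_share = next(iter(shares.items()))
```

A slot whose top value covers a large share of the training data is a candidate for memorization, which is why evaluating on entities unseen during training is informative.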
Taxonomy
Topics: Topic Modeling · Speech and dialogue systems · Natural Language Processing Techniques
Methods: Dynamic Sparse Training
