Sources of Noise in Dialogue and How to Deal with Them
Derek Chen, Zhou Yu

TL;DR
This paper categorizes noise in dialogue systems, examines how different models respond to various noise types, and introduces a specialized data cleaning method to improve dialogue system robustness.
Contribution
It provides the first comprehensive taxonomy of dialogue noise and demonstrates a targeted denoising algorithm tailored for conversational data.
Findings
Models are robust to label errors but sensitive to dialogue-specific noise.
Dialogue noise significantly impacts system performance.
The proposed data cleaning algorithm improves dialogue system robustness.
Abstract
Training dialogue systems often entails dealing with noisy training examples and unexpected user inputs. Despite their prevalence, there is currently no accurate survey of dialogue noise, nor is there a clear sense of the impact of each noise type on task performance. This paper addresses this gap by first constructing a taxonomy of noise encountered by dialogue systems. In addition, we run a series of experiments to show how different models behave when subjected to varying levels and types of noise. Our results reveal that models are quite robust to label errors commonly tackled by existing denoising algorithms, but that performance suffers from dialogue-specific noise. Driven by these observations, we design a data cleaning algorithm specialized for conversational settings and apply it as a proof-of-concept for targeted dialogue denoising.
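As a rough illustration of what "varying levels of noise" can mean for the label-error case, the sketch below corrupts a fixed fraction of class labels before training. It is not the authors' procedure: the function name inject_label_noise, its parameters, and the uniform-replacement scheme are assumptions made here for illustration, and the intent names are made up.

```python
import random
from typing import List, Sequence


def inject_label_noise(
    labels: Sequence[str],
    label_set: Sequence[str],
    noise_rate: float,
    seed: int = 0,
) -> List[str]:
    """Return a copy of `labels` in which a fixed fraction is replaced.

    Hypothetical helper, not taken from the paper: each selected label is
    swapped for a different label drawn uniformly from `label_set`.
    """
    if not 0.0 <= noise_rate <= 1.0:
        raise ValueError("noise_rate must lie in [0, 1]")
    if len(set(label_set)) < 2:
        raise ValueError("label_set needs at least two distinct labels")

    rng = random.Random(seed)  # local generator keeps runs reproducible
    noisy = list(labels)
    n_flip = round(noise_rate * len(noisy))

    # Choose distinct positions to corrupt, then replace each with a wrong label.
    for idx in rng.sample(range(len(noisy)), n_flip):
        alternatives = [lab for lab in label_set if lab != noisy[idx]]
        noisy[idx] = rng.choice(alternatives)
    return noisy


if __name__ == "__main__":
    # Made-up intent labels for a toy dialogue dataset.
    intents = ["book_flight", "cancel", "greet"]
    clean = ["greet", "book_flight", "cancel", "greet", "book_flight",
             "cancel", "greet", "book_flight", "cancel", "greet"]
    for rate in (0.0, 0.1, 0.3):
        noisy = inject_label_noise(clean, intents, rate, seed=1)
        changed = sum(a != b for a, b in zip(clean, noisy))
        print(f"noise_rate={rate}: {changed} of {len(clean)} labels changed")
```

Training the same model on the output at several noise rates and comparing task accuracy gives one curve per noise type. The dialogue-specific noise the abstract refers to would need its own corruption functions, which are not sketched here.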
Taxonomy
Topics: Speech and dialogue systems · Speech and Audio Processing · Speech Recognition and Synthesis
