TL;DR
This paper introduces a training method that simulates automatic speech recognition (ASR) errors to make spoken language understanding (SLU) more robust in conversational dialogue systems, with large gains on the DSTC10 challenge.
Contribution
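It proposes an error simulation and self-correction approach: confusion networks from an ASR decoder, obtained without human transcriptions, drive a simulator that corrupts clean training text, and the SLU model learns to correct those errors jointly with its classification objective. The approach outperforms the baseline on knowledge-seeking turn detection, knowledge cluster classification, and knowledge selection.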
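The joint objective described above can be written schematically as follows. This is a sketch under stated assumptions, not the paper's formula: the symbols x (clean text), x̃ (simulated noisy text), y (target label), and the weighting factor λ are all introduced here for illustration, and the summary does not say how the two terms are balanced.

```latex
% Assumed notation: x = clean text, \tilde{x} = text corrupted by the error simulator,
% y = classification target, \lambda = hypothetical weighting hyperparameter.
\mathcal{L}(\theta)
  = \underbrace{\mathcal{L}_{\text{cls}}\bigl(y \mid \tilde{x};\, \theta\bigr)}_{\text{target classification}}
  + \lambda \,
    \underbrace{\mathcal{L}_{\text{corr}}\bigl(x \mid \tilde{x};\, \theta\bigr)}_{\text{self-correction}}
```

Both terms are computed on the corrupted input, so the model is trained to recover the clean text while still predicting the correct label.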
Findings
Knowledge-seeking turn detection (KTD) F1 rises from 0.9433 to 0.9904.
Knowledge cluster classification Recall@1 rises from 0.7924 to 0.9333.
Knowledge selection Recall@1 rises from 0.7358 to 0.7806.
Abstract
Performance of spoken language understanding (SLU) can be degraded with automatic speech recognition (ASR) errors. We propose a novel approach to improve SLU robustness by randomly corrupting clean training text with an ASR error simulator, followed by self-correcting the errors and minimizing the target classification loss in a joint manner. In the proposed error simulator, we leverage confusion networks generated from an ASR decoder without human transcriptions to generate a variety of error patterns for model training. We evaluate our approach on the DSTC10 challenge targeted for knowledge-grounded task-oriented conversational dialogues with ASR errors. Experimental results show the effectiveness of our proposed approach, boosting the knowledge-seeking turn detection (KTD) F1 significantly from 0.9433 to 0.9904. Knowledge cluster classification is boosted from 0.7924 to 0.9333 in…
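To illustrate the idea of corrupting clean text with error patterns taken from confusion networks, here is a minimal Python sketch. It is not the authors' implementation. It assumes a confusion network is a list of bins, each bin a list of (word, posterior) pairs, and that the top-scoring word in a bin can stand in for the reference because no human transcription is available. The function names `build_confusion_table` and `corrupt`, the `error_rate` parameter, and the toy data are all hypothetical.

```python
import random
from collections import defaultdict


def build_confusion_table(confusion_networks):
    """Map each top-1 word to its competing alternatives and summed posteriors.

    Assumption: the highest-posterior word in a bin acts as a stand-in
    for the reference word, since no human transcription is used.
    """
    table = defaultdict(lambda: defaultdict(float))
    for network in confusion_networks:
        for bin_ in network:
            top_word, _ = max(bin_, key=lambda pair: pair[1])
            for word, posterior in bin_:
                if word != top_word:
                    table[top_word][word] += posterior
    return {word: dict(alts) for word, alts in table.items()}


def corrupt(tokens, table, error_rate, rng):
    """Replace each token with a sampled confusable word with probability error_rate."""
    noisy = []
    for token in tokens:
        alternatives = table.get(token)
        if alternatives and rng.random() < error_rate:
            words = list(alternatives)
            weights = [alternatives[w] for w in words]
            noisy.append(rng.choices(words, weights=weights, k=1)[0])
        else:
            # Tokens with no observed confusions are left unchanged.
            noisy.append(token)
    return noisy


# Toy confusion network (hypothetical data, not from DSTC10).
toy_networks = [
    [
        [("is", 0.7), ("as", 0.3)],
        [("there", 0.6), ("their", 0.4)],
        [("wifi", 0.5), ("wi", 0.3), ("why", 0.2)],
    ]
]

table = build_confusion_table(toy_networks)
clean = ["is", "there", "wifi"]
noisy = corrupt(clean, table, error_rate=0.5, rng=random.Random(0))
print(clean, "->", noisy)
```

In a training loop, the noisy token sequence would be the model input, with the clean sequence and the class label as the two targets. This sketch only substitutes words; deletions and insertions, which real confusion networks also encode, are left out for brevity.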
