Contrastive Learning for Improving ASR Robustness in Spoken Language Understanding
Ya-Hsin Chang, Yun-Nung Chen

TL;DR
This paper proposes a contrastive learning approach to enhance the robustness of spoken language understanding systems against ASR errors, combining supervised contrastive learning and self-distillation to improve generalization.
Contribution
It introduces a novel contrastive learning framework that improves ASR robustness in SLU by integrating supervised contrastive learning with self-distillation during model fine-tuning.
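The summary does not give the exact fine-tuning losses. The sketch below assumes two things. First, the supervised contrastive term follows the standard formulation of Khosla et al. (2020), in which utterances with the same label in a batch act as positives for one another. Second, self-distillation is a KL term toward soft predictions from a teacher, assumed here to be an earlier snapshot of the same model. All function names, as well as the weights `alpha` and `beta`, are hypothetical.

```python
import math
from typing import List, Sequence

Vector = List[float]


def _normalize(v: Sequence[float]) -> Vector:
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]


def _log_softmax(logits: Sequence[float]) -> Vector:
    # Subtract the max before exponentiating for numerical stability.
    m = max(logits)
    lse = m + math.log(sum(math.exp(l - m) for l in logits))
    return [l - lse for l in logits]


def supcon_loss(feats: List[Vector], labels: List[int], tau: float = 0.1) -> float:
    """Supervised contrastive loss (Khosla et al., 2020): for each anchor,
    same-label items are positives; the denominator covers all other items."""
    z = [_normalize(f) for f in feats]
    losses = []
    for i in range(len(z)):
        others = [a for a in range(len(z)) if a != i]
        positives = [p for p in others if labels[p] == labels[i]]
        if not positives:
            continue  # an anchor with no same-label partner contributes nothing
        sims = [sum(x * y for x, y in zip(z[i], z[a])) / tau for a in others]
        log_probs = dict(zip(others, _log_softmax(sims)))
        losses.append(-sum(log_probs[p] for p in positives) / len(positives))
    return sum(losses) / len(losses) if losses else 0.0


def self_distill_loss(student_logits: Vector, teacher_logits: Vector, T: float = 2.0) -> float:
    """KL(teacher || student) on temperature-softened distributions, scaled by
    T^2 as in standard knowledge distillation. The teacher is an assumption."""
    log_p_t = _log_softmax([l / T for l in teacher_logits])
    log_p_s = _log_softmax([l / T for l in student_logits])
    return T * T * sum(math.exp(lt) * (lt - ls) for lt, ls in zip(log_p_t, log_p_s))


def cross_entropy(logits: Vector, label: int) -> float:
    return -_log_softmax(logits)[label]


def fine_tune_loss(feats: List[Vector], logits: List[Vector], teacher_logits: List[Vector],
                   labels: List[int], alpha: float = 1.0, beta: float = 1.0) -> float:
    """Hypothetical combined objective: CE + alpha * SupCon + beta * self-distillation."""
    n = len(labels)
    ce = sum(cross_entropy(l, y) for l, y in zip(logits, labels)) / n
    kd = sum(self_distill_loss(s, t) for s, t in zip(logits, teacher_logits)) / n
    return ce + alpha * supcon_loss(feats, labels) + beta * kd
```

The supervised contrastive loss is lower when same-label features cluster together. The distillation term is zero when the student matches the teacher exactly.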
Findings
Improved performance on three benchmark datasets.
Enhanced robustness to ASR errors compared to baseline models.
Effective combination of supervised contrastive learning and self-distillation.
Abstract
Spoken language understanding (SLU) is an essential task for machines to understand human speech for better interactions. However, errors from the automatic speech recognizer (ASR) usually hurt the understanding performance. In reality, ASR systems may not be easy to adjust for the target scenarios. Therefore, this paper focuses on learning utterance representations that are robust to ASR errors using a contrastive objective, and further strengthens the generalization ability by combining supervised contrastive learning and self-distillation in model fine-tuning. Experiments on three benchmark datasets demonstrate the effectiveness of our proposed approach.
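The abstract does not say how positive pairs are built for the contrastive objective. One plausible reading, assumed here, treats the embedding of an utterance's manual transcript and the embedding of its ASR hypothesis as a positive pair, with the other utterances in the batch serving as negatives (an InfoNCE-style loss). This minimal sketch operates on precomputed embedding vectors. The names `cosine` and `info_nce`, and the temperature value, are hypothetical.

```python
import math
from typing import List

Vector = List[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def info_nce(clean: List[Vector], asr: List[Vector], temperature: float = 0.1) -> float:
    """Hypothetical InfoNCE loss: clean[i] and asr[i] encode the same utterance
    (positive pair); every asr[j] with j != i in the batch is a negative."""
    assert clean and len(clean) == len(asr), "need a non-empty batch of paired views"
    total = 0.0
    for i, anchor in enumerate(clean):
        logits = [cosine(anchor, cand) / temperature for cand in asr]
        m = max(logits)  # log-sum-exp with max subtraction for stability
        log_denom = m + math.log(sum(math.exp(l - m) for l in logits))
        total += log_denom - logits[i]  # -log softmax probability of the positive
    return total / len(clean)
```

Minimizing this loss pulls each ASR hypothesis toward the representation of its own transcript and away from other utterances. The intent is that the encoder becomes less sensitive to recognition errors.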
Taxonomy
Topics: Speech Recognition and Synthesis · Speech and dialogue systems · Domain Adaptation and Few-Shot Learning
Methods: Contrastive Learning
