Contrastive Learning for Improving ASR Robustness in Spoken Language Understanding

Ya-Hsin Chang; Yun-Nung Chen

arXiv:2205.00693 · cs.CL · June 28, 2022

TL;DR

This paper proposes a contrastive learning approach to enhance the robustness of spoken language understanding systems against ASR errors, combining supervised contrastive learning and self-distillation to improve generalization.

Contribution

It introduces a novel contrastive learning framework that improves ASR robustness in SLU by integrating supervised contrastive learning with self-distillation during model fine-tuning.

Findings

01. Significant performance gains on three benchmark datasets.
02. Enhanced robustness to ASR errors compared to baseline models.
03. Effective combination of supervised contrastive learning and self-distillation.

Abstract

Spoken language understanding (SLU) is an essential task for machines to understand human speech for better interactions. However, errors from the automatic speech recognizer (ASR) usually hurt the understanding performance. In reality, ASR systems may not be easy to adjust for the target scenarios. Therefore, this paper focuses on learning utterance representations that are robust to ASR errors using a contrastive objective, and further strengthens the generalization ability by combining supervised contrastive learning and self-distillation in model fine-tuning. Experiments on three benchmark datasets demonstrate the effectiveness of our proposed approach.
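
To make the "contrastive objective" concrete, here is a minimal sketch of a generic supervised contrastive loss, in which utterances sharing a label (for example, a clean transcript and its ASR hypothesis with the same intent) are pulled together and all others are pushed apart. This is an illustration under stated assumptions, not the authors' implementation: the function names, the toy vectors, and the temperature value of 0.1 are hypothetical, and the exact loss used in the paper may differ.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length (assumes the vector is non-zero)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def supervised_contrastive_loss(embeddings, labels, temperature=0.1):
    """Generic supervised contrastive loss over a batch of embeddings.

    For each anchor i, every other item with the same label is a positive;
    all other items in the batch form the softmax denominator.
    Anchors without any positive are skipped. The temperature default is
    an illustrative assumption, not a value taken from the paper.
    """
    z = [l2_normalize(v) for v in embeddings]
    n = len(z)
    total, num_anchors = 0.0, 0
    for i in range(n):
        positives = [p for p in range(n) if p != i and labels[p] == labels[i]]
        if not positives:
            continue
        # Scaled cosine similarity between the anchor and every other item.
        logits = {a: dot(z[i], z[a]) / temperature for a in range(n) if a != i}
        # Log-sum-exp with max subtraction for numerical stability.
        max_logit = max(logits.values())
        log_denom = max_logit + math.log(
            sum(math.exp(v - max_logit) for v in logits.values())
        )
        # Average negative log-probability over the anchor's positives.
        total += -sum(logits[p] - log_denom for p in positives) / len(positives)
        num_anchors += 1
    return total / num_anchors if num_anchors else 0.0


# Hypothetical toy batch: items 0 and 1 stand in for a clean transcript and
# its ASR hypothesis for one intent; items 2 and 3 for another intent.
toy_embeddings = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
toy_labels = [0, 0, 1, 1]
print(supervised_contrastive_loss(toy_embeddings, toy_labels))
```

The loss is small when same-label embeddings already sit close together and grows when they are scattered, which is the property a representation robust to ASR errors would need.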

Code & Models

Repositories

miulab/spokencse
PyTorch · Official

Taxonomy

Topics: Speech Recognition and Synthesis · Speech and dialogue systems · Domain Adaptation and Few-Shot Learning

Methods: Contrastive Learning