Interventional Speech Noise Injection for ASR Generalizable Spoken Language Understanding

Yeonjoon Jung; Jaeseong Lee; Seungtaek Choi; Dohyeon Lee; Minsoo Kim; Seung-won Hwang

arXiv:2410.15609·cs.CL·October 22, 2024

TL;DR

This paper introduces a novel speech noise injection method to improve spoken language understanding models' robustness against diverse ASR errors, enhancing their generalizability across different speech recognition systems.

Contribution

The paper proposes a less biased noise augmentation technique that introduces plausible ASR noises, improving SLU model robustness and generalizability.

Findings

01 Enhanced SLU robustness against unseen ASR systems.

02 More diverse and plausible noise augmentation improves performance.

03 Effective in reducing ASR error impact on SLU models.

Abstract

Recently, pre-trained language models (PLMs) have been increasingly adopted in spoken language understanding (SLU). However, automatic speech recognition (ASR) systems frequently produce inaccurate transcriptions, leading to noisy inputs for SLU models, which can significantly degrade their performance. To address this, our objective is to train SLU models to withstand ASR errors by exposing them to noises commonly observed in ASR systems, referred to as ASR-plausible noises. Speech noise injection (SNI) methods have pursued this objective by introducing ASR-plausible noises, but we argue that these methods are inherently biased towards specific ASR systems, or ASR-specific noises. In this work, we propose a novel and less biased augmentation method of introducing the noises that are plausible to any ASR system, by cutting off the non-causal effect of noises. Experimental results and…
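
To make the idea of noise injection concrete, here is a minimal sketch of text-level augmentation in its most generic form: words in a clean transcript are randomly swapped for ASR-style misrecognitions before the SLU model is trained on them. This is not the method proposed in the paper. The confusion table, the function name `inject_noise`, and the `noise_prob` parameter are all hypothetical and chosen only for illustration. A table built from one ASR system's errors would carry exactly the ASR-specific bias the abstract describes.

```python
import random

# Hypothetical confusion table: clean word -> ASR-style misrecognitions.
# These entries are illustrative assumptions, not data from the paper.
CONFUSIONS = {
    "read": ["red", "reed"],
    "two": ["to", "too"],
    "flight": ["fright", "light"],
    "weather": ["whether"],
}


def inject_noise(text, noise_prob=0.3, rng=None):
    """Swap each word that has a known confusion with probability noise_prob."""
    rng = rng or random.Random()
    noisy = []
    for word in text.split():
        candidates = CONFUSIONS.get(word.lower())
        if candidates and rng.random() < noise_prob:
            # Pick one plausible misrecognition uniformly at random.
            noisy.append(rng.choice(candidates))
        else:
            # Words without a known confusion are always kept unchanged.
            noisy.append(word)
    return " ".join(noisy)


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so repeated runs give the same output
    print(inject_noise("book two flight tickets and check the weather", rng=rng))
```

Passing a seeded `random.Random` keeps the augmentation reproducible. Setting `noise_prob` to 0 returns the transcript untouched, which makes it easy to compare a model trained on clean and noisy inputs.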

Taxonomy

Topics: Speech Recognition and Synthesis · Speech and dialogue systems · Speech and Audio Processing