Interventional Speech Noise Injection for ASR Generalizable Spoken Language Understanding
Yeonjoon Jung, Jaeseong Lee, Seungtaek Choi, Dohyeon Lee, Minsoo Kim, Seung-won Hwang

TL;DR
This paper introduces a novel speech noise injection method to improve spoken language understanding models' robustness against diverse ASR errors, enhancing their generalizability across different speech recognition systems.
Contribution
The paper proposes a less biased noise augmentation technique that introduces plausible ASR noises, improving SLU model robustness and generalizability.
Findings
Enhanced SLU robustness against unseen ASR systems.
More diverse and plausible noise augmentation improves performance.
Effective in reducing ASR error impact on SLU models.
Abstract
Recently, pre-trained language models (PLMs) have been increasingly adopted in spoken language understanding (SLU). However, automatic speech recognition (ASR) systems frequently produce inaccurate transcriptions, leading to noisy inputs for SLU models, which can significantly degrade their performance. To address this, our objective is to train SLU models to withstand ASR errors by exposing them to noises commonly observed in ASR systems, referred to as ASR-plausible noises. Speech noise injection (SNI) methods have pursued this objective by introducing ASR-plausible noises, but we argue that these methods are inherently biased towards specific ASR systems, or ASR-specific noises. In this work, we propose a novel and less biased augmentation method of introducing the noises that are plausible to any ASR system, by cutting off the non-causal effect of noises. Experimental results and…
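As a rough illustration of the general idea of speech noise injection described in the abstract (exposing an SLU model to ASR-like errors at training time), the sketch below corrupts clean transcripts with sound-alike word substitutions and adds the corrupted copies to the training set. It is not the paper's interventional method. The `CONFUSIONS` table, the function names `inject_noise` and `augment`, and the `noise_prob` and `copies` parameters are all hypothetical and chosen only for this example.

```python
import random

# Hypothetical confusion table: words an ASR system might plausibly
# mistake for one another because they sound alike. Illustrative only;
# these entries are assumptions, not data from the paper.
CONFUSIONS = {
    "to": ["two", "too"],
    "for": ["four"],
    "flight": ["fight", "light"],
    "there": ["their"],
    "book": ["look"],
}


def inject_noise(text, noise_prob=0.3, rng=None):
    """Replace each confusable word with a sound-alike with probability noise_prob."""
    rng = rng or random.Random()
    noisy = []
    for word in text.split():
        candidates = CONFUSIONS.get(word.lower())
        if candidates and rng.random() < noise_prob:
            noisy.append(rng.choice(candidates))
        else:
            noisy.append(word)
    return " ".join(noisy)


def augment(examples, copies=2, noise_prob=0.3, seed=0):
    """Return the clean (text, label) pairs plus `copies` noisy variants of each."""
    rng = random.Random(seed)
    augmented = list(examples)
    for text, label in examples:
        for _ in range(copies):
            augmented.append((inject_noise(text, noise_prob, rng), label))
    return augmented


if __name__ == "__main__":
    train = [("book a flight to boston", "BookFlight")]
    for text, label in augment(train):
        print(label, "|", text)
```

A fixed table like this bakes in one particular error pattern, which is a toy version of the bias toward ASR-specific noises that the abstract argues against. A less biased approach would need noise that stays plausible across ASR systems.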
Taxonomy
Topics: Speech Recognition and Synthesis · Speech and dialogue systems · Speech and Audio Processing
