How Bad Are Artifacts?: Analyzing the Impact of Speech Enhancement Errors on ASR
Kazuma Iwamoto, Tsubasa Ochiai, Marc Delcroix, Rintaro Ikeshita, Hiroshi Sato, Shoko Araki, Shigeru Katagiri

TL;DR
This paper analyzes how artifacts from speech enhancement impact automatic speech recognition, identifying artifacts as the main degradation source and proposing methods to mitigate their effect to improve ASR accuracy.
Contribution
It introduces an orthogonal projection-based decomposition to isolate noise and artifact components of SE errors and demonstrates that mitigating artifacts significantly enhances ASR performance.
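The decomposition can be illustrated with a minimal NumPy sketch. It assumes the clean speech and noise references are available as time-aligned 1-D arrays and projects the enhanced signal onto the unshifted sources only; the paper's OPD may use a richer subspace (for example, time-shifted copies of the sources), so this is an approximation for intuition, not the authors' implementation. The function name `opd_decompose` and its arguments are hypothetical.

```python
import numpy as np

def opd_decompose(enhanced, speech, noise):
    """Split an enhanced signal into target, noise-error, and artifact parts.

    Assumption: projections use only the unshifted speech and noise sources.
    """
    # Target: projection of the enhanced signal onto the speech source alone.
    s_target = (enhanced @ speech) / (speech @ speech) * speech

    # Projection onto the subspace spanned by speech and noise (least squares).
    basis = np.stack([speech, noise], axis=1)  # shape (T, 2)
    coeffs, *_ = np.linalg.lstsq(basis, enhanced, rcond=None)
    projected = basis @ coeffs

    # Noise error: the part explained by the sources beyond the speech target.
    e_noise = projected - s_target
    # Artifact error: whatever no linear combination of speech and noise explains.
    e_artif = enhanced - projected
    return s_target, e_noise, e_artif
```

By construction the three parts sum back to the enhanced signal, and the artifact part is orthogonal to both the speech and the noise source.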
Findings
Artifact components are the main cause of ASR degradation.
Adding a scaled version of the observed signal to the enhanced signal (observation adding, OA) improves ASR by increasing the signal-to-artifact ratio.
Mitigating artifacts leads to substantial ASR performance gains.
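The effect of adding the observed signal can be sketched in a few lines of NumPy. The sketch assumes the observed signal is the sum of the speech and noise sources, so it lies entirely in the subspace those sources span; adding it therefore leaves the artifact component untouched while raising the energy of the non-artifact part. The ratio computed here (energy inside the speech/noise subspace over energy outside it) is an illustrative stand-in and may differ from the paper's exact signal-to-artifact definition. The function names and the scale `alpha` are hypothetical.

```python
import numpy as np

def observation_adding(enhanced, observed, alpha):
    """Add a scaled copy of the observed (noisy) signal to the enhanced signal."""
    return enhanced + alpha * observed

def artifact_ratio_db(signal, speech, noise):
    """Energy ratio (dB) of the part of `signal` lying in span{speech, noise}
    to the residual outside that span (the artifact part).

    Assumption: an illustrative ratio, not necessarily the paper's definition.
    """
    basis = np.stack([speech, noise], axis=1)  # shape (T, 2)
    coeffs, *_ = np.linalg.lstsq(basis, signal, rcond=None)
    projected = basis @ coeffs
    artifact = signal - projected
    return 10.0 * np.log10(np.sum(projected**2) / np.sum(artifact**2))

# Synthetic example: an "enhanced" signal with a residual artifact component.
rng = np.random.default_rng(0)
speech = rng.standard_normal(16000)
noise = rng.standard_normal(16000)
artifact = rng.standard_normal(16000)

observed = speech + noise
enhanced = speech + 0.1 * noise + 0.3 * artifact

before = artifact_ratio_db(enhanced, speech, noise)
after = artifact_ratio_db(observation_adding(enhanced, observed, 0.5), speech, noise)
print(f"ratio before OA: {before:.2f} dB, after OA: {after:.2f} dB")
```

The trade-off is that OA also reintroduces noise, which is why the scale of the added observation matters in practice.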
Abstract
It is challenging to improve automatic speech recognition (ASR) performance in noisy conditions with single-channel speech enhancement (SE). In this paper, we investigate the causes of ASR performance degradation by decomposing the SE errors using orthogonal projection-based decomposition (OPD). OPD decomposes the SE errors into noise and artifact components. The artifact component is defined as the SE error signal that cannot be represented as a linear combination of speech and noise sources. We propose manually scaling the error components to analyze their impact on ASR. We experimentally identify the artifact component as the main cause of performance degradation, and we find that mitigating the artifact can greatly improve ASR performance. Furthermore, we demonstrate that the simple observation adding (OA) technique (i.e., adding a scaled version of the observed signal to the…
Taxonomy
Topics: Speech and Audio Processing · Speech Recognition and Synthesis · Indoor and Outdoor Localization Technologies
