NAaLoss: Rethinking the objective of speech enhancement
Kuan-Hsun Ho, En-Lun Yu, Jeih-weih Hung, and Berlin Chen

TL;DR
This paper introduces NAaLoss, a novel loss function for speech enhancement that reduces artifacts and noise, thereby improving automatic speech recognition performance without compromising perceptual quality.
Contribution
The study proposes NAaLoss, a new artifact- and noise-aware loss function that enables speech enhancement models to better distinguish speech, artifacts, and noise, improving ASR outcomes.
Findings
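NAaLoss significantly improves ASR performance in most of the tested setups, which cover two SE models (simple/advanced), clean and noisy inputs, and ASR systems with and without noise robustness.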
It preserves speech quality and intelligibility while reducing artifacts.
Visualization confirms reduced artifacts and better noise handling.
Abstract
Reducing noise interference is crucial for automatic speech recognition (ASR) in a real-world scenario. However, most single-channel speech enhancement (SE) generates "processing artifacts" that negatively affect ASR performance. Hence, in this study, we suggest a Noise- and Artifacts-aware loss function, NAaLoss, to ameliorate the influence of artifacts from a novel perspective. NAaLoss considers the loss of estimation, de-artifact, and noise ignorance, enabling the learned SE to individually model speech, artifacts, and noise. We examine two SE models (simple/advanced) learned with NAaLoss under various input scenarios (clean/noisy) using two configurations of the ASR system (with/without noise robustness). Experiments reveal that NAaLoss significantly improves the ASR performance of most setups while preserving the quality of SE toward perception and intelligibility. Furthermore, we…
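The abstract names three loss components (estimation, de-artifact, and noise ignorance) but the excerpt here does not give their formulas. The following is only a minimal sketch of how a three-term objective of this kind could be combined, not the paper's definition. It assumes, hypothetically, that the estimation term compares the enhanced noisy input with the clean target, the de-artifact term penalizes any change the model makes to already-clean speech, and the noise-ignorance term pushes the output for a noise-only input toward silence. The names `three_term_loss`, `se_model`, `w_art`, and `w_noise`, and the use of mean squared error, are illustrative choices.

```python
import numpy as np


def mse(a, b):
    """Mean squared error between two equally shaped arrays."""
    return float(np.mean((a - b) ** 2))


def three_term_loss(se_model, clean, noise, w_art=1.0, w_noise=1.0):
    """Hypothetical three-term objective; NOT the paper's exact formulation.

    se_model : callable mapping a waveform array to an enhanced waveform array
    clean    : clean speech waveform
    noise    : additive noise waveform with the same shape as `clean`
    """
    noisy = clean + noise

    # Estimation term (assumed): enhanced noisy speech should match clean speech.
    l_est = mse(se_model(noisy), clean)

    # De-artifact term (assumed): clean input should pass through unchanged,
    # so any deviation is treated as a processing artifact.
    l_art = mse(se_model(clean), clean)

    # Noise-ignorance term (assumed): a noise-only input should map to silence.
    l_noise = mse(se_model(noise), np.zeros_like(noise))

    total = l_est + w_art * l_art + w_noise * l_noise
    return total, (l_est, l_art, l_noise)


if __name__ == "__main__":
    clean = np.array([1.0, -1.0, 0.5, 0.0])
    noise = np.array([0.1, 0.1, -0.1, -0.1])
    # An identity "model" adds no artifacts but removes no noise either.
    total, terms = three_term_loss(lambda x: x, clean, noise)
    print(total, terms)
```

With the identity stand-in, the de-artifact term is zero while the estimation and noise-ignorance terms both equal the noise power, which illustrates how separate terms can expose different failure modes of an SE model.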
Taxonomy
Topics: Speech and Audio Processing · Speech Recognition and Synthesis · Indoor and Outdoor Localization Technologies
