A consolidated view of loss functions for supervised deep learning-based speech enhancement
Sebastian Braun, Ivan Tashev

TL;DR
This paper systematically evaluates spectral loss functions for real-time deep speech enhancement. Combining magnitude-only and phase-aware losses improves performance, compressing the spectral values helps further, and different losses excel in different aspects.
Contribution
It provides a comprehensive analysis of spectral loss functions for online speech enhancement, clarifying their individual and combined effects on performance.
Findings
Combining magnitude-only and phase-aware losses improves speech enhancement.
Using compressed spectral values significantly enhances results.
Linear domain losses like mean absolute error excel in phase-sensitive improvements.
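As a rough illustration of the first two findings, the sketch below blends a magnitude-only term with a phase-aware term, both computed on power-law compressed spectra. This is a minimal NumPy sketch under stated assumptions, not the authors' confirmed formulation: the function name `compressed_spectral_loss`, the compression exponent `c=0.3`, and the blending weight `lam=0.5` are hypothetical choices made for illustration only.

```python
import numpy as np

def compressed_spectral_loss(S_hat, S, c=0.3, lam=0.5):
    """Blend a magnitude-only and a phase-aware loss on compressed spectra.

    S_hat, S : complex STFT arrays of identical shape (estimate, target).
    c        : compression exponent (assumed value, for illustration).
    lam      : weight of the phase-aware term in [0, 1] (assumed value).
    """
    # Power-law compress the magnitudes.
    mag_hat_c = np.abs(S_hat) ** c
    mag_c = np.abs(S) ** c

    # Re-attach the original phases to the compressed magnitudes.
    S_hat_c = mag_hat_c * np.exp(1j * np.angle(S_hat))
    S_c = mag_c * np.exp(1j * np.angle(S))

    # Magnitude-only term: ignores phase differences entirely.
    mag_term = np.mean((mag_c - mag_hat_c) ** 2)
    # Phase-aware term: penalizes both magnitude and phase errors.
    complex_term = np.mean(np.abs(S_c - S_hat_c) ** 2)

    return (1.0 - lam) * mag_term + lam * complex_term

# Usage: a pure 90-degree phase rotation leaves the magnitude term at zero,
# so only the phase-aware term contributes to the loss.
S = np.array([[1 + 0j, 0 + 2j]])
S_rot = S * 1j
loss_mag_only = compressed_spectral_loss(S_rot, S, lam=0.0)
loss_blended = compressed_spectral_loss(S_rot, S, lam=0.5)
```

Setting `lam=0` recovers a purely magnitude-based loss and `lam=1` a purely phase-aware one, which makes it straightforward to compare the individual and combined objectives within one training setup.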
Abstract
Deep learning-based speech enhancement for real-time applications has recently made large advances. Due to the lack of a tractable perceptual optimization target, many myths around training losses have emerged, whereas the contribution of the loss functions to success has in many cases not been investigated in isolation from other factors such as network architecture, features, or training procedures. In this work, we investigate a wide variety of spectral loss functions for a recurrent neural network architecture suitable for online frame-by-frame processing. We relate magnitude-only with phase-aware losses, ratios, correlation metrics, and compressed metrics. Our results reveal that combining magnitude-only with phase-aware objectives always leads to improvements, even when the phase is not enhanced. Furthermore, using compressed spectral values also yields a significant improvement.…
