Components Loss for Neural Networks in Mask-Based Speech Enhancement
Ziyi Xu, Samy Elshamy, Ziyue Zhao, Tim Fingscheidt

TL;DR
This paper introduces a novel components loss function for neural network training in mask-based speech enhancement, improving speech quality and residual noise naturalness over traditional loss functions.
Contribution
The paper proposes a new components loss (CL) that separately controls speech preservation, noise suppression, and residual noise naturalness, enhancing speech enhancement performance.
Findings
Higher PESQ scores and larger SNR improvement than the baseline losses on seen noise types.
More natural residual noise and improved perceptual speech quality.
Enhanced performance on unseen noise types.
Abstract
Estimating time-frequency domain masks for single-channel speech enhancement using deep learning methods has recently become a popular research field with promising results. In this paper, we propose a novel components loss (CL) for the training of neural networks for mask-based speech enhancement. During the training process, the proposed CL offers separate control over preservation of the speech component quality, suppression of the residual noise component, and preservation of a naturally sounding residual noise component. We illustrate the potential of the proposed CL by evaluating a standard convolutional neural network (CNN) for mask-based speech enhancement. The new CL obtains a better and more balanced performance in almost all employed instrumental quality metrics over the baseline losses, the latter comprising the conventional mean squared error (MSE) loss and also…
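The three-way control described in the abstract can be illustrated with a minimal sketch. This is not the paper's exact formulation, which this summary does not reproduce. The function name `components_loss`, the weights `alpha` and `beta`, the use of magnitude spectrograms, and the specific form of the naturalness term (comparing per-frame normalized spectral shapes) are all assumptions made for illustration. The idea it shows is that the estimated mask is applied to the clean speech and to the noise separately, so each filtered component can be penalized on its own.

```python
import numpy as np

def components_loss(mask, speech_mag, noise_mag, alpha=0.3, beta=0.2, eps=1e-8):
    """Illustrative three-term components loss (hypothetical form).

    mask, speech_mag, noise_mag: arrays of shape (frames, bins).
    alpha, beta: assumed weights for noise suppression and naturalness.
    """
    # Apply the same mask to each component of the noisy mixture separately.
    filtered_speech = mask * speech_mag
    filtered_noise = mask * noise_mag

    # Term 1 (speech preservation): distortion of the filtered speech component.
    speech_term = np.mean((filtered_speech - speech_mag) ** 2)

    # Term 2 (noise suppression): power of the residual noise component.
    noise_term = np.mean(filtered_noise ** 2)

    # Term 3 (naturalness, assumed form): the residual noise should keep the
    # per-frame spectral shape of the original noise, whatever its level.
    def spectral_shape(x):
        norm = np.sqrt(np.sum(x ** 2, axis=-1, keepdims=True))
        return x / (norm + eps)

    natural_term = np.mean((spectral_shape(filtered_noise) - spectral_shape(noise_mag)) ** 2)

    return (1.0 - alpha - beta) * speech_term + alpha * noise_term + beta * natural_term
```

In this sketch, raising `alpha` favors stronger noise suppression at the cost of speech distortion, while `beta` penalizes masks that change the spectral shape of the residual noise. In a training setup the same arithmetic would be written in a differentiable framework, with `mask` being the network output.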
Taxonomy
Topics: Speech and Audio Processing · Hearing Loss and Rehabilitation · Advanced Adaptive Filtering Techniques
