On Loss Functions for Supervised Monaural Time-Domain Speech Enhancement

Morten Kolbæk; Zheng-Hua Tan; Søren Holdt Jensen; Jesper Jensen

arXiv:1909.01019 · cs.SD · January 31, 2020

TL;DR

This paper investigates how different loss functions affect the performance of deep learning-based monaural time-domain speech enhancement, highlighting the advantages of perceptually inspired losses and SI-SDR for better speech quality.

Contribution

It provides a comprehensive analysis of loss functions in time-domain speech enhancement, emphasizing perceptual losses, the importance of learning rate, and the effectiveness of SI-SDR as a general-purpose loss.

Findings

01

Perceptually inspired loss functions may improve speech quality for human listeners.

02

The learning rate significantly impacts training effectiveness in speech enhancement models.

03

SI-SDR-based loss performs well across multiple evaluation metrics.
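
To make the third finding concrete, here is a minimal sketch of an SI-SDR loss next to plain time-domain MSE. It uses the commonly cited definition of SI-SDR (project the estimate onto the target, then compare the energy of the scaled target with the energy of the residual). The function names `si_sdr`, `si_sdr_loss`, and `mse`, the `eps` stabilizer, and the pure-Python list representation are assumptions made for illustration; they are not taken from the paper, and a real training setup would compute the same quantities on tensors in a deep learning framework.

```python
import math


def si_sdr(estimate, target, eps=1e-12):
    """Scale-invariant SDR in dB between two equal-length waveforms.

    Illustrative sketch only: eps is a hypothetical stabilizer that
    guards against division by zero and log of zero.
    """
    if len(estimate) != len(target):
        raise ValueError("estimate and target must have the same length")
    # Optimal scaling of the target: alpha = <estimate, target> / ||target||^2
    dot = sum(e * t for e, t in zip(estimate, target))
    target_energy = sum(t * t for t in target)
    alpha = dot / (target_energy + eps)
    # Scaled target and the residual left in the estimate
    scaled = [alpha * t for t in target]
    signal_energy = sum(s * s for s in scaled)
    residual_energy = sum((e - s) ** 2 for e, s in zip(estimate, scaled))
    return 10.0 * math.log10((signal_energy + eps) / (residual_energy + eps))


def si_sdr_loss(estimate, target):
    """Negative SI-SDR, so that minimizing the loss maximizes SI-SDR."""
    return -si_sdr(estimate, target)


def mse(estimate, target):
    """Plain time-domain mean-square error, for comparison."""
    return sum((e - t) ** 2 for e, t in zip(estimate, target)) / len(target)


if __name__ == "__main__":
    # Synthetic stand-ins for a clean target and a noisy estimate.
    target = [math.sin(0.01 * n) for n in range(2000)]
    estimate = [t + 0.1 * math.cos(0.37 * n) for n, t in enumerate(target)]
    louder = [2.0 * e for e in estimate]
    print("SI-SDR (dB):", si_sdr(estimate, target), si_sdr(louder, target))
    print("MSE:", mse(estimate, target), mse(louder, target))
```

Rescaling the estimate leaves SI-SDR essentially unchanged but changes the MSE, which is the practical difference between the two objectives: SI-SDR ignores the overall gain of the output, while MSE penalizes it.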

Abstract

Many deep learning-based speech enhancement algorithms are designed to minimize the mean-square error (MSE) in some transform domain between a predicted and a target speech signal. However, optimizing for MSE does not necessarily guarantee high speech quality or intelligibility, which is the ultimate goal of many speech enhancement algorithms. Additionally, little is known about the impact of the loss function on the emerging class of time-domain deep learning-based speech enhancement systems. We study how popular loss functions influence the performance of deep learning-based speech enhancement systems. First, we demonstrate that perceptually inspired loss functions might be advantageous if the receiver is the human auditory system. Furthermore, we show that the learning rate is a crucial design parameter even for adaptive gradient-based optimizers, which has been generally…
