End-to-End Multi-Task Denoising for joint SDR and PESQ Optimization

Jaeyoung Kim; Mostafa El-Khamy; Jungwon Lee

arXiv:1901.09146 · cs.SD · March 10, 2023 · 24 cites

TL;DR

This paper introduces an end-to-end speech denoising method that operates directly in the time domain and jointly optimizes SDR and PESQ through loss functions designed for each metric, leading to improved speech quality.

Contribution

It proposes a novel end-to-end framework that addresses the spectrum mismatch by optimizing directly in the time domain and the metric mismatch by introducing new loss functions for SDR and PESQ.
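
To make the metric mismatch concrete, here is a minimal Python sketch. It assumes NumPy and the plain SDR definition 10·log10(||s||^2 / ||s - ŝ||^2). The exact SDR and PESQ losses proposed in the paper are not reproduced here, and the helper names sdr_db, neg_sdr_loss and mse_loss are hypothetical. For a single utterance, this SDR is a monotone function of the squared error. The two objectives therefore only disagree once the loss is averaged over a batch of utterances with different energies, which is the case the toy batch below constructs.

```python
import numpy as np

def sdr_db(ref, est, eps=1e-12):
    """Plain per-utterance SDR in dB: 10*log10(||s||^2 / ||s - s_hat||^2)."""
    num = np.sum(ref ** 2, axis=-1)
    den = np.sum((ref - est) ** 2, axis=-1)
    return 10.0 * np.log10((num + eps) / (den + eps))

def neg_sdr_loss(ref, est):
    """Hypothetical batch loss: negative mean SDR over utterances (lower is better)."""
    return -np.mean(sdr_db(ref, est))

def mse_loss(ref, est):
    """Conventional MSE over every sample in the batch."""
    return np.mean((ref - est) ** 2)

# Toy batch: one loud and one quiet "utterance", 4 samples each.
ref = np.array([[10.0] * 4, [1.0] * 4])
est_a = ref + np.array([[1.0], [0.1]])  # per-sample errors: 1.0 (loud), 0.1 (quiet)
est_b = ref + np.array([[0.5], [0.5]])  # per-sample error 0.5 on both utterances

print(mse_loss(ref, est_a), mse_loss(ref, est_b))          # MSE prefers est_b
print(neg_sdr_loss(ref, est_a), neg_sdr_loss(ref, est_b))  # SDR loss prefers est_a
```

Plain MSE prefers est_b (0.25 vs. 0.505) because the error on the loud utterance dominates it. The negative-SDR loss prefers est_a (about -20 dB vs. about -16 dB) because each utterance contributes on its own relative scale.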

Findings

1. Significant SDR and PESQ improvements over existing methods.
2. Effective mitigation of spectrum and metric mismatches.
3. Enhanced speech quality in denoising tasks.

Abstract

Supervised learning based on deep neural networks has recently achieved substantial improvements in speech enhancement. Denoising networks learn a mapping from noisy speech either directly to clean speech or to a spectrum mask, which is the ratio between the clean and noisy spectra. In either case, the network is optimized by minimizing the mean squared error (MSE) between the ground-truth labels and the time-domain or spectral output. However, existing schemes suffer from one of two critical issues: spectrum mismatch or metric mismatch. The spectrum mismatch is a well-known issue: in general, a spectrum modified after the short-time Fourier transform (STFT) cannot be fully recovered after the inverse short-time Fourier transform (ISTFT). The metric mismatch is that the conventional MSE metric is sub-optimal for maximizing our target metrics, signal-to-distortion ratio (SDR) and perceptual evaluation of speech quality (PESQ).…
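
The spectrum mismatch can be checked numerically. The sketch below makes several illustrative assumptions, none taken from the paper:

- It uses NumPy.
- It uses a periodic Hann window with 256-sample frames and 50% overlap.
- It uses a least-squares overlap-add inverse.
- It substitutes synthetic white noise for real speech.
- It uses a ratio mask |S|/|Y| that is clipped to [0, 1].

The sketch applies this mask to a noisy STFT, inverts the result, and takes the STFT again. The unmodified STFT survives this round trip up to floating-point error. The masked STFT does not, because a masked spectrum is generally not the STFT of any time-domain signal.

```python
import numpy as np

N_FFT, HOP = 256, 128               # assumed frame length and hop (50% overlap)
WIN = np.hanning(N_FFT + 1)[:-1]    # periodic Hann window

def stft(x):
    """Windowed frames of x -> one-sided spectrum, shape (frames, N_FFT // 2 + 1)."""
    n_frames = 1 + (len(x) - N_FFT) // HOP
    frames = np.stack([x[i * HOP : i * HOP + N_FFT] * WIN for i in range(n_frames)])
    return np.fft.rfft(frames, axis=-1)

def istft(X):
    """Least-squares overlap-add inverse: sum(w * frame) / sum(w^2)."""
    frames = np.fft.irfft(X, n=N_FFT, axis=-1)
    length = N_FFT + HOP * (X.shape[0] - 1)
    num, den = np.zeros(length), np.zeros(length)
    for i, frame in enumerate(frames):
        num[i * HOP : i * HOP + N_FFT] += frame * WIN
        den[i * HOP : i * HOP + N_FFT] += WIN ** 2
    return num / np.maximum(den, 1e-8)  # floor only matters where the window is 0

def inconsistency(X):
    """Relative distance between X and the STFT of its own ISTFT."""
    return np.linalg.norm(stft(istft(X)) - X) / np.linalg.norm(X)

rng = np.random.default_rng(0)
length = N_FFT + HOP * 63
clean = rng.standard_normal(length)  # synthetic stand-in for clean speech
noise = rng.standard_normal(length)  # synthetic stand-in for additive noise
S, Y = stft(clean), stft(clean + noise)

mask = np.minimum(np.abs(S) / (np.abs(Y) + 1e-12), 1.0)  # clipped ratio mask (assumption)
print("unmodified noisy STFT:", inconsistency(Y))         # ~0 (float round-off)
print("masked noisy STFT:    ", inconsistency(mask * Y))  # clearly nonzero
```

A loss computed on the masked spectrum therefore scores a spectrum that no waveform can produce. This is the gap that optimizing directly in the time domain avoids.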

Code & Models

Repositories

audiolabs/torch-pesq (PyTorch)

Taxonomy

Topics: Speech and Audio Processing · Advanced Adaptive Filtering Techniques · Music and Audio Processing