Diffusion-based speech enhancement with a weighted generative-supervised learning loss

Jean-Eudes Ayilo (MULTISPEECH); Mostafa Sadeghi (MULTISPEECH); Romain Serizel (MULTISPEECH)

arXiv:2309.10457·cs.CV·September 20, 2023

Open Access

TL;DR

This paper augments the training objective of a diffusion-based speech enhancement model with a weighted supervised MSE loss between the estimated enhanced speech and the clean reference. The authors report experimental results that support the approach.

Contribution

The paper proposes augmenting diffusion-based speech enhancement with a weighted MSE loss to better incorporate ground-truth speech information during training.

Findings

01

Improved speech quality over baseline diffusion models

02

Integrating a supervised loss into diffusion training improves enhancement performance

03

Experimental results validate the proposed method's effectiveness

Abstract

Diffusion-based generative models have recently gained attention in speech enhancement (SE), providing an alternative to conventional supervised methods. These models transform clean speech training samples into Gaussian noise centered at noisy speech, and subsequently learn a parameterized model to reverse this process, conditionally on noisy speech. Unlike supervised methods, generative-based SE approaches usually rely solely on an unsupervised loss, which may result in less efficient incorporation of conditioned noisy speech. To address this issue, we propose augmenting the original diffusion training objective with a mean squared error (MSE) loss, measuring the discrepancy between estimated enhanced speech and ground-truth clean speech at each reverse process iteration. Experimental results demonstrate the effectiveness of our proposed methodology.
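The two ingredients described in the abstract, a forward process whose mean drifts from clean speech toward noisy speech and a training loss that adds a weighted MSE term to the generative objective, can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: the function names, the exponential interpolation, the `sigma * sqrt(t)` noise schedule, the noise-prediction form of the generative term, and the additive weighting by `alpha` are all assumptions made for the example, and the paper's exact formulation may differ.

```python
import math
import random


def perturb(clean, noisy, t, gamma=1.5, sigma=0.5, rng=random):
    """Sample x_t from a Gaussian whose mean moves from clean toward noisy as t grows.

    The interpolation weight and the noise schedule are hypothetical choices.
    """
    w = math.exp(-gamma * t)      # weight on the clean signal, equal to 1 at t = 0
    std = sigma * math.sqrt(t)    # assumed noise schedule, equal to 0 at t = 0
    z = [rng.gauss(0.0, 1.0) for _ in clean]
    x_t = [w * c + (1.0 - w) * n + std * e for c, n, e in zip(clean, noisy, z)]
    return x_t, z


def mse(a, b):
    """Mean squared error between two equal-length sequences."""
    return sum((u - v) ** 2 for u, v in zip(a, b)) / len(a)


def weighted_loss(pred_noise, true_noise, enhanced, clean, alpha=0.5):
    """Generative term plus an alpha-weighted supervised MSE term (assumed form)."""
    generative = mse(pred_noise, true_noise)   # stand-in for the diffusion objective
    supervised = mse(enhanced, clean)          # estimated enhanced speech vs. ground truth
    return generative + alpha * supervised
```

In this sketch, `alpha = 0` recovers a purely generative objective, while larger values put more weight on matching the ground-truth clean speech. In practice `pred_noise` and `enhanced` would both come from the trained network at a given diffusion step; here they are plain inputs so the example stays self-contained.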

Taxonomy

Topics: Speech and Audio Processing · Speech Recognition and Synthesis · Infant Health and Development

Methods: Diffusion