Single and Few-step Diffusion for Generative Speech Enhancement

Bunlong Lay; Jean-Marie Lemercier; Julius Richter; Timo Gerkmann

arXiv:2309.09677 · eess.AS · January 17, 2024

TL;DR

This paper introduces a two-stage training method for diffusion-based speech enhancement that cuts the number of function evaluations needed at inference from 60 to 5 while maintaining performance, and that outperforms the baseline diffusion model when only a few function evaluations are used.

Contribution

The paper proposes a two-stage training approach that enables single-step or few-step diffusion for speech enhancement, reducing the number of function evaluations needed at inference and keeping performance steady as that number is lowered.

Findings

01. Achieves performance comparable to the baseline with only 5 function evaluations instead of 60.

02. Maintains steady performance as the number of function evaluations is reduced.

03. Outperforms the baseline diffusion model when few function evaluations are used.

Abstract

Diffusion models have shown promising results in speech enhancement, using a task-adapted diffusion process for the conditional generation of clean speech given a noisy mixture. However, at test time, the neural network used for score estimation is called multiple times to solve the iterative reverse process. This results in a slow inference process and causes discretization errors that accumulate over the sampling trajectory. In this paper, we address these limitations through a two-stage training approach. In the first stage, we train the diffusion model the usual way using the generative denoising score matching loss. In the second stage, we compute the enhanced signal by solving the reverse process and compare the resulting estimate to the clean speech target using a predictive loss. We show that using this second training stage enables achieving the same performance as the baseline…
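The cost and discretization error that the abstract attributes to the iterative reverse process can be illustrated with a toy sketch. This is not the paper's speech enhancement process or its trained network: it assumes a hypothetical one-dimensional Gaussian data distribution with a variance-exploding noise schedule sigma(t) = t, for which the score is known in closed form, and it uses a plain Euler solver on the probability flow ODE. All names and constants (`MU`, `S`, `T`, `euler_reverse`, and so on) are illustrative assumptions. Each solver step calls the score function once, so the step count equals the number of function evaluations (NFE).

```python
import math

# Toy setup (all values are assumptions for illustration only).
MU, S, T = 0.0, 0.5, 1.0  # data mean, data std, and start time of the reverse process


def score(x, t):
    """Closed-form score of N(MU, S^2 + t^2); stands in for a trained score network."""
    return -(x - MU) / (S**2 + t**2)


def euler_reverse(x_T, num_steps):
    """Euler integration of the probability flow ODE dx/dt = -t * score(x, t) from T to 0."""
    h = T / num_steps
    x, t, nfe = x_T, T, 0
    for _ in range(num_steps):
        drift = -t * score(x, t)  # one function evaluation per step
        nfe += 1
        x -= h * drift            # step backward in time
        t -= h
    return x, nfe


def exact_reverse(x_T):
    """Analytic solution of the same ODE at t = 0, used as the reference."""
    return MU + (x_T - MU) * math.sqrt(S**2 / (S**2 + T**2))


def discretization_errors(x_T=2.0, step_counts=(60, 5, 1)):
    """Return {steps: (nfe, absolute error vs. the analytic solution)}."""
    ref = exact_reverse(x_T)
    results = {}
    for n in step_counts:
        x0, nfe = euler_reverse(x_T, n)
        results[n] = (nfe, abs(x0 - ref))
    return results


if __name__ == "__main__":
    for n, (nfe, err) in discretization_errors().items():
        print(f"steps={n:2d}  NFE={nfe:2d}  |error|={err:.4f}")
```

In this toy setting the error grows as the step count drops from 60 to 5 to 1, which mirrors the trade-off the abstract describes for an unmodified sampler. The second training stage described above is aimed at exactly this regime, by comparing the solver's output to the clean target during training.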

Code & Models

Repositories

sp-uhh/sgmse_crp (PyTorch, official)

Taxonomy

Topics: Speech and Audio Processing · Speech Recognition and Synthesis · Acoustic Wave Phenomena Research

Methods: Diffusion · Denoising Score Matching