Schrödinger Bridge for Generative Speech Enhancement

Ante Jukić; Roman Korostik; Jagadeesh Balam; Boris Ginsburg

arXiv:2407.16074·eess.AS·July 24, 2024


TL;DR

This paper introduces a novel generative speech enhancement model based on Schrödinger bridge theory, which outperforms diffusion models in speech quality and ASR performance while being more computationally efficient.

Contribution

The paper develops a Schrödinger bridge-based model for speech enhancement, formulating a data-to-data process between the clean and noisy speech distributions that improves quality and efficiency over existing diffusion models.
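For reference, a tractable Gaussian Schrödinger bridge between paired endpoints has a closed-form marginal. The following is a sketch that assumes the formulation used in prior SB text-to-speech work, with $x_0$ the clean and $x_1$ the noisy speech coefficients; the drift $f$ and diffusion $g$ schedules used in this paper are not specified in this summary. With forward SDE $dx = f(t)\,x\,dt + g(t)\,dw$, define $\alpha_t = \exp\int_0^t f(\tau)\,d\tau$, $\bar\alpha_t = \alpha_t/\alpha_1$, $\sigma_t^2 = \int_0^t g^2(\tau)/\alpha_\tau^2\,d\tau$ and $\bar\sigma_t^2 = \sigma_1^2 - \sigma_t^2$:

$$
x_t = \frac{\alpha_t \bar\sigma_t^2}{\sigma_1^2}\,x_0 + \frac{\bar\alpha_t \sigma_t^2}{\sigma_1^2}\,x_1 + \frac{\alpha_t \bar\sigma_t \sigma_t}{\sigma_1}\,\varepsilon, \qquad \varepsilon \sim \mathcal{N}(0, I), \quad t \in [0, 1]
$$

With zero drift ($f = 0$) and constant $g^2 = c$, this reduces to a Brownian bridge, $x_t = (1-t)\,x_0 + t\,x_1 + \sqrt{c\,t(1-t)}\,\varepsilon$, which starts exactly at the clean speech and ends exactly at the noisy observation.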

Findings

01. Outperforms diffusion-based models in speech quality metrics.

02. Reduces word error rate by a relative 20% for denoising and 6% for dereverberation, compared with the best baseline model.

03. Achieves better quality with fewer sampling steps and lower computational cost.
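According to the abstract, the WER figures are relative reductions against the best baseline model. This short Python sketch shows how a relative reduction is computed. The function name and the WER values are hypothetical and are not taken from the paper.

```python
def relative_wer_reduction(wer_baseline: float, wer_model: float) -> float:
    """Relative WER reduction of a model w.r.t. a baseline, as a fraction."""
    return (wer_baseline - wer_model) / wer_baseline


# Hypothetical WERs in percent (not results from the paper).
print(f"{relative_wer_reduction(10.0, 8.0):.0%}")  # 20%
```

So a 20% relative reduction from a 10% baseline WER corresponds to a 2-point absolute drop, not a 20-point one.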

Abstract

This paper proposes a generative speech enhancement model based on the Schrödinger bridge (SB). The proposed model employs a tractable SB to formulate a data-to-data process between the clean speech distribution and the observed noisy speech distribution. The model is trained with a data prediction loss, aiming to recover the complex-valued clean speech coefficients, and an auxiliary time-domain loss is used to improve training of the model. The effectiveness of the proposed SB-based model is evaluated in two different speech enhancement tasks: speech denoising and speech dereverberation. The experimental results demonstrate that the proposed SB-based model outperforms diffusion-based models in terms of speech quality metrics and ASR performance, e.g., resulting in a relative word error rate reduction of 20% for denoising and 6% for dereverberation compared to the best baseline model. The…
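A minimal NumPy sketch of one training step under several stated assumptions: the tractable SB is taken as its simplest special case, a Brownian bridge with zero drift and constant diffusion `c`; a non-overlapping rectangular-window STFT stands in for the real analysis/synthesis transform; the data-prediction loss is squared error on the complex coefficients; and the auxiliary time-domain loss is an L1 waveform loss with a hypothetical weight. All names (`stft`, `istft`, `bridge_marginal`, `training_loss`, `FRAME`, `c`, `time_weight`) are hypothetical, and the network that would estimate the clean coefficients from the bridge state, the noisy coefficients and `t` is replaced by an identity placeholder.

```python
import numpy as np

FRAME = 256  # hypothetical frame length (rectangular window, no overlap)


def stft(x: np.ndarray) -> np.ndarray:
    """Simplified STFT: non-overlapping frames, rfft per frame (complex output)."""
    frames = x[: len(x) // FRAME * FRAME].reshape(-1, FRAME)
    return np.fft.rfft(frames, axis=-1)


def istft(X: np.ndarray) -> np.ndarray:
    """Exact inverse of the simplified `stft` above."""
    return np.fft.irfft(X, n=FRAME, axis=-1).reshape(-1)


def bridge_marginal(x0, x1, t, c, rng):
    """Sample x_t of a Brownian bridge from clean x0 (t=0) to noisy x1 (t=1).

    Mean (1 - t) x0 + t x1, variance c t (1 - t); the noise is circular
    complex Gaussian with unit variance (an assumption of this sketch).
    """
    std = np.sqrt(c * t * (1.0 - t))
    eps = (rng.standard_normal(x0.shape) + 1j * rng.standard_normal(x0.shape)) / np.sqrt(2.0)
    return (1.0 - t) * x0 + t * x1 + std * eps


def training_loss(x0_hat, x0, time_weight=0.001):
    """Data-prediction loss on complex coefficients plus a weighted time-domain L1 loss."""
    data_loss = np.mean(np.abs(x0_hat - x0) ** 2)
    time_loss = np.mean(np.abs(istft(x0_hat) - istft(x0)))
    return data_loss + time_weight * time_loss


# Usage with synthetic stand-in signals.
rng = np.random.default_rng(0)
clean = rng.standard_normal(4 * FRAME)
noisy = clean + 0.3 * rng.standard_normal(clean.shape)
X0, X1 = stft(clean), stft(noisy)

t = rng.uniform(0.0, 1.0)              # random bridge time
Xt = bridge_marginal(X0, X1, t, c=0.5, rng=rng)
X0_hat = Xt                            # placeholder for a network's estimate of X0
loss = training_loss(X0_hat, X0)
print(f"t={t:.3f}, loss={loss:.4f}")
```

Because the target in this sketch is the clean coefficients themselves rather than a score or noise term, the estimate can be mapped back to a waveform directly, which is what makes adding a time-domain loss term straightforward.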
