Few-step Adversarial Schrödinger Bridge for Generative Speech Enhancement

Seungu Han; Sungho Lee; Juheon Lee; Kyogu Lee

arXiv:2506.01460 · cs.SD · June 3, 2025

Open Access

TL;DR

This paper introduces a novel approach combining the Schrödinger Bridge and GANs for speech enhancement, significantly reducing sampling steps while maintaining high-quality denoising and dereverberation performance.

Contribution

It presents a new method that integrates the Schrödinger Bridge with GANs to improve efficiency and performance in generative speech enhancement tasks.

Findings

01. Outperforms existing models with only one inference step

02. Maintains high-quality speech enhancement at low SNRs

03. Reduces sampling steps from over 50 to just 1

Abstract

Deep generative models have recently been employed for speech enhancement to generate perceptually valid clean speech on large-scale datasets. Several diffusion models have been proposed, and more recently, a tractable Schrödinger Bridge has been introduced to transport between the clean and noisy speech distributions. However, these models often suffer from an iterative reverse process and require a large number of sampling steps (more than 50). Our investigation reveals that the performance of baseline models significantly degrades when the number of sampling steps is reduced, particularly under low-SNR conditions. We propose integrating Schrödinger Bridge with GANs to effectively mitigate this issue, achieving high-quality outputs on full-band datasets while substantially reducing the required sampling steps. Experimental results demonstrate that our proposed model outperforms…

Taxonomy

Topics: Speech and Audio Processing · Speech Recognition and Synthesis · Infant Health and Development