Noise-robust Speech Separation with Fast Generative Correction

Helin Wang; Jesus Villalba; Laureano Moro-Velazquez; Jiarui Hai; Thomas Thebaud; Najim Dehak

arXiv:2406.07461 · eess.AS · June 12, 2024 · Interspeech

TL;DR

This paper introduces a diffusion-based generative correction method to improve speech separation in noisy environments, achieving state-of-the-art results and strong generalization across datasets.

Contribution

It presents a novel generative correction approach using a diffusion model to enhance discriminative speech separators, especially in noisy conditions.

Findings

01 Achieves state-of-the-art performance on the noisy Libri2Mix dataset.

02 Improves SI-SNR by 22-35% relative to SepFormer.

03 Demonstrates robustness and generalization across different noise conditions.

Abstract

Speech separation, the task of isolating multiple speech sources from a mixed audio signal, remains challenging in noisy environments. In this paper, we propose a generative correction method to enhance the output of a discriminative separator. By leveraging a generative corrector based on a diffusion model, we refine the separation process for single-channel mixture speech by removing noise and perceptually unnatural distortions. Furthermore, we optimize the generative model using a predictive loss to streamline the diffusion model's reverse process into a single step and rectify any errors introduced by the reverse process. Our method achieves state-of-the-art performance on the in-domain Libri2Mix noisy dataset and on out-of-domain WSJ with a variety of noises, improving SI-SNR by 22-35% relative to SepFormer and demonstrating robustness and strong generalization capabilities.
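
The headline metric above is SI-SNR (scale-invariant signal-to-noise ratio). As a rough illustration of what that number measures, here is a minimal NumPy sketch of the commonly used definition, together with one possible reading of "relative improvement" as a percentage change in the dB value. The function names `si_snr` and `relative_improvement`, the `eps` constant, and the toy signals are assumptions for illustration only; they are not taken from the paper or its repository, and the paper's exact evaluation protocol may differ.

```python
import numpy as np


def si_snr(estimate, target, eps=1e-8):
    """Scale-invariant SNR in dB between a 1-D estimate and a 1-D reference."""
    estimate = np.asarray(estimate, dtype=np.float64)
    target = np.asarray(target, dtype=np.float64)
    # Remove the mean so a DC offset does not affect the score.
    estimate = estimate - estimate.mean()
    target = target - target.mean()
    # Project the estimate onto the target; the rest counts as error.
    scale = np.dot(estimate, target) / (np.dot(target, target) + eps)
    projection = scale * target
    residual = estimate - projection
    ratio = (np.dot(projection, projection) + eps) / (np.dot(residual, residual) + eps)
    return 10.0 * np.log10(ratio)


def relative_improvement(baseline_db, improved_db):
    """Percentage change of a dB score with respect to a baseline score.

    Assumption: this is only one way to read "X% relative to a baseline".
    """
    return 100.0 * (improved_db - baseline_db) / baseline_db


# Toy example: a sine wave stands in for clean speech (hypothetical data).
rng = np.random.default_rng(0)
t = np.arange(16000) / 16000.0
clean = np.sin(2.0 * np.pi * 220.0 * t)
noisy = clean + 0.1 * rng.standard_normal(clean.shape)

score = si_snr(noisy, clean)
rescaled_score = si_snr(3.0 * noisy, clean)  # same score: the metric ignores gain
```

Because the estimate is projected onto the reference before the ratio is taken, rescaling the estimate leaves the score essentially unchanged, which is why the metric suits separation systems whose output gain is arbitrary.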

Code & Models

Repositories

WangHelin1997/Fast-GeCo
PyTorch · Official

Models

westbrook/Fast-GeCo (Hugging Face model)

Taxonomy

Topics: Speech and Audio Processing · Speech Recognition and Synthesis · Blind Source Separation Techniques

Methods: Attention Is All You Need · Dense Connections · Softmax · Linear Layer · Multi-Head Attention · Position-Wise Feed-Forward Layer · Residual Connection · Parameterized ReLU · Layer Normalization