Noise-robust Speech Separation with Fast Generative Correction
Helin Wang, Jesus Villalba, Laureano Moro-Velazquez, Jiarui Hai, Thomas Thebaud, Najim Dehak

TL;DR
This paper introduces a diffusion-based generative correction method to improve speech separation in noisy environments, achieving state-of-the-art results and strong generalization across datasets.
Contribution
It presents a novel generative correction approach using a diffusion model to enhance discriminative speech separators, especially in noisy conditions.
Findings
Achieves state-of-the-art performance on the in-domain noisy Libri2Mix dataset.
Improves SI-SNR by 22-35% relative to SepFormer.
Demonstrates robustness and generalization to out-of-domain WSJ data with a variety of noises.
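For readers unfamiliar with the metric, the following is a minimal sketch of the commonly used definition of scale-invariant signal-to-noise ratio (SI-SNR), together with one plausible way to compute a relative improvement. It is not taken from the paper's code: the function names si_snr and relative_improvement are hypothetical, the summary does not state exactly how the 22-35% figure is derived, and the dB values in the usage lines are made-up illustrations rather than reported results.

```python
import numpy as np

def si_snr(estimate, target, eps=1e-8):
    """Scale-invariant SNR in dB between two 1-D signals (standard definition)."""
    # Remove the mean so the measure ignores DC offsets.
    estimate = estimate - estimate.mean()
    target = target - target.mean()
    # Project the estimate onto the target to get the "signal" part.
    scale = np.dot(estimate, target) / (np.dot(target, target) + eps)
    projection = scale * target
    # Whatever is left over counts as noise/distortion.
    noise = estimate - projection
    ratio = (np.dot(projection, projection) + eps) / (np.dot(noise, noise) + eps)
    return 10.0 * np.log10(ratio)

def relative_improvement(baseline_db, improved_db):
    """Percent improvement over a baseline score (assumed definition)."""
    return 100.0 * (improved_db - baseline_db) / baseline_db

# Illustrative usage with synthetic signals and made-up scores.
rng = np.random.default_rng(0)
clean = rng.standard_normal(16000)
noisy_estimate = clean + 0.1 * rng.standard_normal(16000)
score = si_snr(noisy_estimate, clean)
gain = relative_improvement(10.0, 12.2)  # hypothetical baseline vs. improved dB
```

Because the estimate is projected onto the target before the ratio is taken, rescaling the estimate leaves the score unchanged, which is why the metric is preferred over plain SNR for separation outputs whose gain is arbitrary.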
Abstract
Speech separation, the task of isolating multiple speech sources from a mixed audio signal, remains challenging in noisy environments. In this paper, we propose a generative correction method to enhance the output of a discriminative separator. By leveraging a generative corrector based on a diffusion model, we refine the separation process for single-channel mixture speech by removing noise and perceptually unnatural distortions. Furthermore, we optimize the generative model using a predictive loss to streamline the diffusion model's reverse process into a single step and rectify any errors introduced by the reverse process. Our method achieves state-of-the-art performance on the in-domain Libri2Mix noisy dataset and on out-of-domain WSJ with a variety of noises, improving SI-SNR by 22-35% relative to SepFormer and demonstrating robustness and strong generalization capabilities.
Taxonomy
Topics: Speech and Audio Processing · Speech Recognition and Synthesis · Blind Source Separation Techniques
Methods: Attention Is All You Need · Dense Connections · Softmax · Linear Layer · Multi-Head Attention · Position-Wise Feed-Forward Layer · Residual Connection · Parameterized ReLU · Layer Normalization
