Diffusion-based Generative Speech Source Separation

Robin Scheibler; Youna Ji; Soo-Whan Chung; Jaeuk Byun; Soyeon Choe; Min-Seok Choi

arXiv:2210.17327 · eess.AS · November 3, 2022 · 1 citation

TL;DR

This paper introduces DiffSep, a diffusion-based approach to single-channel speech source separation built on score matching of a stochastic differential equation (SDE). It shows promising results on speech separation and competitive results on speech enhancement.
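Score matching trains a network to output the score ∇_x log p_t(x), the gradient of the log-density of the noisy data. When the perturbation kernel is Gaussian, the regression target is available in closed form. The sketch below, in plain Python, assumes an isotropic Gaussian kernel N(μ, σ²I) purely for illustration; the function names are hypothetical, and the kernel actually used by DiffSep is defined in the paper and may differ.

```python
import math


def gaussian_log_density(x, mean, sigma):
    """Log-density of an isotropic Gaussian N(mean, sigma^2 I) evaluated at x."""
    d = len(x)
    sq = sum((xi - mi) ** 2 for xi, mi in zip(x, mean))
    return -0.5 * sq / sigma**2 - d * math.log(sigma) - 0.5 * d * math.log(2.0 * math.pi)


def gaussian_score(x, mean, sigma):
    """Score (gradient of the log-density w.r.t. x): -(x - mean) / sigma^2."""
    return [-(xi - mi) / sigma**2 for xi, mi in zip(x, mean)]


def dsm_loss(score_estimate, z, sigma):
    """Denoising score-matching loss for a sample x = mean + sigma * z:
    squared distance between the estimate and the conditional score -z / sigma."""
    return sum((s + zi / sigma) ** 2 for s, zi in zip(score_estimate, z))


# Perturb a clean point, then evaluate the loss of the analytic (perfect) estimator.
mean, sigma, z = [0.0, 0.5], 0.7, [0.4, -1.1]
x = [m + sigma * zi for m, zi in zip(mean, z)]
print(dsm_loss(gaussian_score(x, mean, sigma), z, sigma))  # close to 0
```

In training, `gaussian_score` would be replaced by the network's output, and the loss is averaged over random clean points, noise draws, and diffusion times.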

Contribution

The paper presents a new diffusion-mixing process for source separation and a modified training strategy that handles model mismatch and source permutation ambiguity, extending score-based generative modeling to speech separation and enhancement.
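A diffusion-mixing process can be pictured with a toy forward SDE. The sketch below assumes, for illustration only, dx = -γ(x - Px) dt + σ dw, where x stacks the K sources and P replaces each source by the mixture divided by K; the drift, the constants γ and σ, and all function names are hypothetical and not taken from the paper. Under this assumption the mean moves from the separated sources toward mixture/K, while the sum of the sources (the mixture itself) is preserved at every time.

```python
import math
import random


def project_to_mixture(x):
    """Hypothetical projection P: replace every source by the average of all
    sources, i.e. by mixture / K, where the mixture is the sum of the K sources."""
    k, n = len(x), len(x[0])
    avg = [sum(src[i] for src in x) / k for i in range(n)]
    return [list(avg) for _ in range(k)]


def forward_mean(x0, t, gamma):
    """Closed-form mean of the assumed SDE at time t: P x0 + exp(-gamma t) (x0 - P x0).
    Equals the sources at t = 0 and tends to mixture / K as t grows."""
    decay = math.exp(-gamma * t)
    return [[p + decay * (s - p) for s, p in zip(src, psrc)]
            for src, psrc in zip(x0, project_to_mixture(x0))]


def simulate_forward(x0, t_end, gamma, sigma, steps, rng):
    """Euler-Maruyama simulation of dx = -gamma (x - P x) dt + sigma dw."""
    dt = t_end / steps
    x = [list(src) for src in x0]
    for _ in range(steps):
        x = [[s - gamma * (s - p) * dt + sigma * math.sqrt(dt) * rng.gauss(0.0, 1.0)
              for s, p in zip(src, psrc)]
             for src, psrc in zip(x, project_to_mixture(x))]
    return x


# Two sources of three samples each; their mixture is [4.0, -2.0, 0.0].
x0 = [[1.0, -2.0, 0.5], [3.0, 0.0, -0.5]]
print(forward_mean(x0, 5.0, 2.0))  # close to [[2.0, -1.0, 0.0], [2.0, -1.0, 0.0]]
```

With σ = 0 the simulation is deterministic and tracks `forward_mean`; with σ > 0 it produces noisy samples whose mean follows the same path, because the drift is linear.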

Findings

01. Effective separation on the WSJ0 2mix dataset
02. Competitive speech enhancement on the VoiceBank-DEMAND dataset
03. Evidence of the potential of diffusion models for speech processing

Abstract

We propose DiffSep, a new single-channel source separation method based on score-matching of a stochastic differential equation (SDE). We craft a tailored continuous-time diffusion-mixing process starting from the separated sources and converging to a Gaussian distribution centered on their mixture. This formulation lets us apply the machinery of score-based generative modelling. First, we train a neural network to approximate the score function of the marginal probabilities of the diffusion-mixing process. Then, we use it to solve the reverse-time SDE that progressively separates the sources starting from their mixture. We propose a modified training strategy to handle model mismatch and source permutation ambiguity. Experiments on the WSJ0 2mix dataset demonstrate the potential of the method. Furthermore, the method is also suitable for speech enhancement and shows performance…
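The sampling step described in the abstract, solving a reverse-time SDE that starts from the mixture, can be illustrated end to end on a toy problem. The sketch below assumes a toy forward process dx = -γ(x - Px) dt + σ dw, with P replacing each source by mixture/K, and integrates the standard reverse-time SDE dx = [f(x) - σ²∇_x log p_t(x)] dt + σ dw̄ with Euler-Maruyama. In place of a trained network it uses a hypothetical oracle score that knows the true sources; this oracle, `sample_prior`, and all other names are illustrative assumptions. In DiffSep the score comes from a neural network, and the oracle here also sidesteps the permutation ambiguity that the paper's training strategy addresses.

```python
import math
import random


def proj(x):
    """Hypothetical projection P: replace every source by mixture / K."""
    k, n = len(x), len(x[0])
    avg = [sum(src[i] for src in x) / k for i in range(n)]
    return [list(avg) for _ in range(k)]


def variances(t, gamma, sigma):
    """Variances of the assumed forward SDE dx = -gamma (x - P x) dt + sigma dw
    from a fixed start: sigma^2 t along P (no drift), Ornstein-Uhlenbeck across it."""
    v_p = sigma**2 * t
    v_q = sigma**2 * (1.0 - math.exp(-2.0 * gamma * t)) / (2.0 * gamma)
    return v_p, v_q


def oracle_score(x, x0, t, gamma, sigma):
    """Exact score of p_t(x | x0) under the assumed process. It needs the true
    sources x0, so it only stands in for a trained score network in this toy."""
    v_p, v_q = variances(t, gamma, sigma)
    decay = math.exp(-gamma * t)
    mu = [[p + decay * (s - p) for s, p in zip(a, b)] for a, b in zip(x0, proj(x0))]
    r = [[xi - mi for xi, mi in zip(a, b)] for a, b in zip(x, mu)]
    # Covariance is v_p * P + v_q * (I - P), so its inverse acts separately on each part.
    return [[-p / v_p - (ri - p) / v_q for ri, p in zip(a, b)]
            for a, b in zip(r, proj(r))]


def sample_prior(mixture, k, t_end, gamma, sigma, rng):
    """Starting point: mixture / K for every source plus Gaussian noise with the
    forward covariance at t_end (v_p along P, v_q orthogonal to it)."""
    v_p, v_q = variances(t_end, gamma, sigma)
    z = [[rng.gauss(0.0, 1.0) for _ in mixture] for _ in range(k)]
    return [[m / k + math.sqrt(v_p) * p + math.sqrt(v_q) * (zi - p)
             for m, zi, p in zip(mixture, a, b)] for a, b in zip(z, proj(z))]


def reverse_sde(x_start, score_fn, t_end, gamma, sigma, steps, rng):
    """Euler-Maruyama on the reverse-time SDE
    dx = [f(x) - sigma^2 * score(x, t)] dt + sigma dw_bar, run from t_end down to 0,
    with forward drift f(x) = -gamma (x - P x)."""
    dt = t_end / steps
    x = [list(src) for src in x_start]
    for i in range(steps, 0, -1):
        t = i * dt
        score = score_fn(x, t)
        x = [[xi - (-gamma * (xi - p) - sigma**2 * s) * dt
              + sigma * math.sqrt(dt) * rng.gauss(0.0, 1.0)
              for xi, p, s in zip(a, b, c)]
             for a, b, c in zip(x, proj(x), score)]
    return x


# Two sources of three samples; only their mixture [4.0, -2.0, 0.0] seeds the start.
sources = [[1.0, -2.0, 0.5], [3.0, 0.0, -0.5]]
rng = random.Random(0)
start = sample_prior([4.0, -2.0, 0.0], 2, 1.0, 2.0, 0.5, rng)
estimate = reverse_sde(start, lambda x, t: oracle_score(x, sources, t, 2.0, 0.5),
                       1.0, 2.0, 0.5, 1000, rng)
```

Starting from noise around mixture/K, the trajectory is pulled toward the sources as t approaches 0, and the remaining error is dominated by the last step's injected noise, whose standard deviation is σ√Δt.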

Code & Models

Repositories

fakufaku/diffusion-separation (PyTorch)

Taxonomy

Topics: Speech and Audio Processing · Speech Recognition and Synthesis · Music and Audio Processing