ArrayDPS: Unsupervised Blind Speech Separation with a Diffusion Prior

Zhongweiyang Xu; Xulin Fan; Zhong-Qiu Wang; Xilin Jiang; Romit Roy Choudhury

arXiv:2505.05657 · eess.AS · June 18, 2025

TL;DR

ArrayDPS is an unsupervised, array-agnostic speech separation method built on a diffusion prior. It separates sources without prior knowledge of the microphone array and outperforms unsupervised baseline methods in SDR.

Contribution

The paper presents ArrayDPS, a novel unsupervised diffusion-based approach for blind speech separation that does not require microphone array details.

Findings

01 Outperforms baseline unsupervised methods in SDR
02 Comparable to supervised methods in separation quality
03 Operates without microphone array information

Abstract

Blind Speech Separation (BSS) aims to separate multiple speech sources from audio mixtures recorded by a microphone array. The problem is challenging because it is a blind inverse problem, i.e., the microphone array geometry, the room impulse response (RIR), and the speech sources are all unknown. We propose ArrayDPS to solve the BSS problem in an unsupervised, array-agnostic, and generative manner. The core idea builds on diffusion posterior sampling (DPS), but unlike DPS, where the likelihood is tractable, ArrayDPS must approximate the likelihood by formulating a separate optimization problem. The solution to this optimization problem approximates the room acoustics and the relative transfer functions between microphones. These approximations, along with the diffusion priors, are updated iteratively through the ArrayDPS sampling process and ultimately yield separated voice sources. We only need a simple…
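To make the DPS idea concrete, here is a minimal toy sketch of posterior sampling in the tractable case that ArrayDPS generalizes. It is not the authors' implementation. It assumes a known mixing matrix `A` (two sources summed into one channel), a standard Gaussian prior standing in for a learned speech diffusion prior, and a deterministic Euler sampler instead of the stochastic sampler typically used with DPS. All names (`prior_score`, `denoise`, `likelihood_score`, `sample`) and all parameter values are hypothetical and chosen only for illustration.

```python
import numpy as np

def prior_score(x, sigma):
    # Score of N(0, (1 + sigma^2) I): an assumed stand-in for a learned diffusion prior.
    return -x / (1.0 + sigma**2)

def denoise(x, sigma):
    # Tweedie estimate of the clean signal: x0_hat = x + sigma^2 * score.
    return x + sigma**2 * prior_score(x, sigma)

def likelihood_score(x, sigma, y, A, sigma_y):
    # DPS approximation: p(y | x_t) is replaced by N(y; A x0_hat(x_t), sigma_y^2 I).
    # The toy denoiser is linear, so its Jacobian is simply 1 / (1 + sigma^2).
    residual = y - A @ denoise(x, sigma)
    return (A.T @ residual) / (sigma_y**2 * (1.0 + sigma**2))

def sample(y, A, sigma_y, x_init, sigmas, guided=True):
    x = x_init.copy()
    for s_cur, s_next in zip(sigmas[:-1], sigmas[1:]):
        score = prior_score(x, s_cur)
        if guided:
            score = score + likelihood_score(x, s_cur, y, A, sigma_y)
        # Euler step of the probability-flow ODE dx/dsigma = -sigma * score.
        x = x + (s_cur - s_next) * s_cur * score
    return x

rng = np.random.default_rng(0)
n = 16
A = np.hstack([np.eye(n), np.eye(n)])   # assumed known mixing: y = s1 + s2
sources = rng.standard_normal(2 * n)    # stacked toy "sources" [s1; s2]
sigma_y = 0.5                           # assumed measurement noise level
y = A @ sources + sigma_y * rng.standard_normal(n)

sigmas = np.geomspace(10.0, 0.01, 200)  # decreasing noise schedule
x_init = sigmas[0] * rng.standard_normal(2 * n)

x_guided = sample(y, A, sigma_y, x_init, sigmas, guided=True)
x_free = sample(y, A, sigma_y, x_init, sigmas, guided=False)

res_guided = np.linalg.norm(y - A @ x_guided)
res_free = np.linalg.norm(y - A @ x_free)
print(f"mixture residual with guidance: {res_guided:.3f}, without: {res_free:.3f}")
```

In this toy, the likelihood term can be written in closed form because `A` is given. According to the abstract, ArrayDPS has no such matrix available: the room acoustics and relative transfer functions are unknown, so the likelihood has to be approximated by solving a separate optimization problem during sampling.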

Code & Models

Repositories

arraydps/arraydps (PyTorch, official)

Videos

ArrayDPS: Unsupervised Blind Speech Separation with a Diffusion Prior · SlidesLive

Taxonomy

Topics: Speech and Audio Processing · Blind Source Separation Techniques · Speech Recognition and Synthesis

Methods: Diffusion