ArrayDPS-Refine: Generative Refinement of Discriminative Multi-Channel Speech Enhancement

Zhongweiyang Xu; Ashutosh Pandey; Juan Azcarreta; Zhaoheng Ni; Sanjeel Parekh; Buye Xu

arXiv:2603.24385 · eess.AS · March 26, 2026

Open Access

TL;DR

ArrayDPS-Refine is a training-free, generative method that refines the outputs of discriminative multi-channel speech enhancement models using a clean speech diffusion prior, improving performance across various models without retraining.

Contribution

The paper presents ArrayDPS-Refine, a novel, training-free generative approach that refines discriminative speech enhancement outputs with a diffusion prior, applicable to any model and array configuration.

Findings

01 Consistently improves various discriminative models' performance
02 Effective across waveform and STFT domain models
03 No retraining required for the enhancement models

Abstract

Multi-channel speech enhancement aims to recover clean speech from noisy multi-channel recordings. Most deep learning methods employ discriminative training, which can lead to non-linear distortions from regression-based objectives, especially under challenging environmental noise conditions. Inspired by ArrayDPS for unsupervised multi-channel source separation, we introduce ArrayDPS-Refine, a method designed to enhance the outputs of discriminative models using a clean speech diffusion prior. ArrayDPS-Refine is training-free, generative, and array-agnostic. It first estimates the noise spatial covariance matrix (SCM) from the enhanced speech produced by a discriminative model, then uses this estimated noise SCM for diffusion posterior sampling. This approach allows direct refinement of any discriminative model's output without retraining. Our results show that ArrayDPS-Refine…
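The first step described in the abstract, estimating a noise spatial covariance matrix (SCM) from the enhanced speech, can be illustrated with a minimal sketch. The abstract does not give the exact estimator, so everything below is an assumption: the function name `estimate_noise_scm` is hypothetical, the noise is approximated as the residual between the noisy mixture and a multi-channel speech estimate, and the SCM is taken as a plain time average per frequency bin. A discriminative model that outputs only a single reference channel would need an extra step to obtain the multi-channel speech image, which is not shown here.

```python
import numpy as np


def estimate_noise_scm(mixture_stft, speech_image_stft):
    """Time-averaged noise SCM per frequency bin (illustrative assumption).

    Both inputs are complex STFTs of shape (channels, frames, freqs).
    Returns an array of shape (freqs, channels, channels).
    """
    if mixture_stft.shape != speech_image_stft.shape:
        raise ValueError("mixture and speech estimate must have the same shape")

    # Assumption: whatever the speech estimate does not explain is noise.
    residual = mixture_stft - speech_image_stft
    num_frames = residual.shape[1]

    # scm[f, c, d] = (1 / T) * sum_t residual[c, t, f] * conj(residual[d, t, f])
    return np.einsum("ctf,dtf->fcd", residual, residual.conj()) / num_frames


# Toy example with synthetic data (not real recordings).
rng = np.random.default_rng(0)
shape = (4, 50, 8)  # 4 channels, 50 frames, 8 frequency bins
speech = rng.standard_normal(shape) + 1j * rng.standard_normal(shape)
noise = 0.1 * (rng.standard_normal(shape) + 1j * rng.standard_normal(shape))
mixture = speech + noise

scm = estimate_noise_scm(mixture, speech)
print(scm.shape)  # (8, 4, 4)
```

In this toy setup the speech estimate is perfect, so the residual equals the true noise. With a real discriminative model the residual also contains estimation error, which is one reason the estimator used in the paper may differ from this sketch.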

Taxonomy

Topics: Speech and Audio Processing · Speech Recognition and Synthesis · Voice and Speech Disorders