Combining Deterministic Enhanced Conditions with Dual-Streaming Encoding for Diffusion-Based Speech Enhancement

Hao Shi; Xugang Lu; Kazuki Shimada; Tatsuya Kawahara

arXiv:2505.13983·cs.SD·October 8, 2025

TL;DR

This paper investigates using features enhanced by deterministic speech enhancement models as conditions for diffusion-based speech enhancement, and introduces a dual-streaming encoding model that combines those conditions to improve enhancement performance and stability.

Contribution

The paper proposes the DERDM-SE model that effectively combines coarse- and fine-grained deterministic features with dual-stream encoding for improved diffusion-based speech enhancement.

Findings

01 Enhanced speech quality on the CHiME4 dataset

02 More stable diffusion performance compared to existing models

03 Deterministic features improve objective evaluation scores

Abstract

Diffusion-based speech enhancement (SE) models need to incorporate correct prior knowledge as reliable conditions to generate accurate predictions. However, providing reliable conditions using noisy features is challenging. One solution is to use features enhanced by deterministic methods as conditions. However, the information distortion and loss caused by deterministic methods might affect the diffusion process. In this paper, we first investigate the effects of using different deterministic SE models as conditions for diffusion. We validate two conditions depending on whether the noisy feature was used as part of the condition: one using only the deterministic feature (deterministic-only), and the other using both deterministic and noisy features (deterministic-noisy). Preliminary investigation found that using deterministic enhanced conditions improves hearing experiences on real…
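The two condition types named in the abstract can be illustrated with a minimal sketch. It assumes the condition is formed by stacking features along the channel axis, which the abstract does not specify; the function name `build_condition`, the array shapes, and the stand-in "enhanced" feature are all hypothetical and are not taken from the authors' code.

```python
import numpy as np


def build_condition(noisy, enhanced, mode):
    """Assemble a conditioning input for a diffusion model (illustrative only).

    noisy, enhanced: arrays of shape (channels, freq_bins, frames).
    mode: "deterministic-only" or "deterministic-noisy".
    Channel-wise concatenation is an assumption, not the paper's stated method.
    """
    if noisy.shape != enhanced.shape:
        raise ValueError("noisy and enhanced features must share a shape")
    if mode == "deterministic-only":
        # Condition on the deterministic model's output alone.
        return enhanced
    if mode == "deterministic-noisy":
        # Condition on both the deterministic output and the noisy input.
        return np.concatenate([enhanced, noisy], axis=0)
    raise ValueError(f"unknown mode: {mode!r}")


rng = np.random.default_rng(0)
noisy = rng.standard_normal((2, 256, 100))  # e.g. real/imag parts of a spectrogram
enhanced = 0.5 * noisy                      # stand-in for a deterministic SE output

cond_only = build_condition(noisy, enhanced, "deterministic-only")
cond_both = build_condition(noisy, enhanced, "deterministic-noisy")
print(cond_only.shape)  # (2, 256, 100)
print(cond_both.shape)  # (4, 256, 100)
```

In this sketch the deterministic-noisy condition keeps the unprocessed input alongside the enhanced feature, so information that the deterministic model distorted or removed remains available to the diffusion model.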

Code & Models

Repositories

hshi-speech/repair-diffusion
PyTorch · Official

Taxonomy

Topics: Speech and Audio Processing · Advanced Adaptive Filtering Techniques · Speech Recognition and Synthesis