MDDM: A Multi-view Discriminative Enhanced Diffusion-based Model for Speech Enhancement
Nan Xu, Zhaolong Huang, Xiaonan Zhi

TL;DR
The paper introduces MDDM, a novel multi-view diffusion-based model for speech enhancement that leverages features from time, frequency, and noise domains to improve speech quality with reduced computational cost.
Contribution
MDDM is presented as the first model to integrate multi-view features (time, frequency, and noise) with a diffusion-based approach for speech enhancement, reducing the number of sampling steps while maintaining performance.
Findings
MDDM outperforms existing methods on public and real-world datasets.
It achieves performance competitive with other diffusion-based methods while using fewer sampling steps.
Subjective and objective metrics confirm its effectiveness.
Abstract
With the development of deep learning, speech enhancement has improved greatly in terms of speech quality. Previous methods typically focus on discriminative supervised learning or generative modeling, which tend to introduce speech distortions or incur high computational cost. In this paper, we propose MDDM, a Multi-view Discriminative enhanced Diffusion-based Model. Specifically, we take the features of three domains (time, frequency, and noise) as inputs to a discriminative prediction network, which generates a preliminary spectrogram. The discriminative output is then converted to clean speech through several inference sampling steps. Because the distribution of the discriminative output overlaps with that of the clean target, fewer sampling steps can achieve performance competitive with other diffusion-based methods. Experiments conducted on a public dataset and a…
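The two-stage idea in the abstract (a discriminative estimate first, then a few sampling steps) can be illustrated with a toy sketch. This is not the authors' implementation: the abstract does not specify the noise schedule, the sampler, or the network, so the linear schedule, the deterministic DDIM-style update, and the names `linear_alpha_bar`, `few_step_refine`, `denoise_fn`, and `start_step` are all assumptions made for illustration. The denoiser is replaced by an oracle stand-in so the example runs without a trained model.

```python
import numpy as np


def linear_alpha_bar(num_steps, beta_start=1e-4, beta_end=0.02):
    """Cumulative signal-retention factors for an assumed linear beta schedule."""
    betas = np.linspace(beta_start, beta_end, num_steps)
    return np.cumprod(1.0 - betas)


def few_step_refine(prelim, denoise_fn, alpha_bar, start_step, num_sampling_steps, rng):
    """Refine a preliminary spectrogram with a few deterministic reverse steps.

    Rather than starting from pure noise at the last diffusion step, the
    preliminary estimate is noised only up to `start_step`, so only a short
    reverse trajectory is needed (hypothetical scheme, not taken from the paper).
    """
    # Evenly spaced step indices from start_step down to 0.
    steps = np.linspace(start_step, 0, num_sampling_steps + 1).round().astype(int)

    # Diffuse the discriminative output forward to the starting noise level.
    a_start = alpha_bar[steps[0]]
    x = np.sqrt(a_start) * prelim + np.sqrt(1.0 - a_start) * rng.standard_normal(prelim.shape)

    for t, t_prev in zip(steps[:-1], steps[1:]):
        a_t, a_prev = alpha_bar[t], alpha_bar[t_prev]
        x0_hat = denoise_fn(x, t)  # model's estimate of the clean spectrogram
        eps_hat = (x - np.sqrt(a_t) * x0_hat) / np.sqrt(1.0 - a_t)  # implied noise
        x = np.sqrt(a_prev) * x0_hat + np.sqrt(1.0 - a_prev) * eps_hat  # DDIM-style update
    return x


# Toy data: a random "clean" spectrogram and a slightly distorted estimate of it,
# standing in for the output of the discriminative prediction network.
rng = np.random.default_rng(0)
clean = np.abs(rng.standard_normal((8, 16)))
prelim = clean + 0.1 * rng.standard_normal(clean.shape)

alpha_bar = linear_alpha_bar(200)

# Oracle denoiser (assumption): a trained network would go here.
refined = few_step_refine(
    prelim,
    denoise_fn=lambda x, t: clean,
    alpha_bar=alpha_bar,
    start_step=50,
    num_sampling_steps=5,
    rng=rng,
)
```

The sketch uses 5 reverse steps starting at step 50 of a 200-step schedule. The closer the preliminary estimate already is to the clean target, the lower `start_step` can be, which is the intuition the abstract gives for needing fewer sampling steps.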
Taxonomy
Topics: Speech and Audio Processing · Infant Health and Development · Speech Recognition and Synthesis
