Pre-training Feature Guided Diffusion Model for Speech Enhancement
Yiyuan Yang, Niki Trigoni, Andrew Markham

TL;DR
This paper presents a novel pretraining feature-guided diffusion model for speech enhancement that integrates spectral features and leverages DDIM for efficient sampling, achieving state-of-the-art results with improved efficiency and robustness.
Contribution
The paper introduces a new diffusion model for speech enhancement that combines spectral feature guidance and DDIM sampling to improve performance and efficiency.
Findings
Achieves state-of-the-art results on two public datasets with different SNRs.
Outperforms baseline models in efficiency and robustness.
Maintains low computational demands while enhancing speech quality.
Abstract
Speech enhancement significantly improves the clarity and intelligibility of speech in noisy environments, improving communication and listening experiences. In this paper, we introduce a novel pretraining feature-guided diffusion model tailored for efficient speech enhancement, addressing the limitations of existing discriminative and generative models. By integrating spectral features into a variational autoencoder (VAE) and leveraging pre-trained features for guidance during the reverse process, coupled with the utilization of the deterministic discrete integration method (DDIM) to streamline sampling steps, our model improves efficiency and speech enhancement quality. Demonstrating state-of-the-art results on two public datasets with different SNRs, our model outshines other baselines in efficiency and robustness. The proposed method not only optimizes performance but also enhances…
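The abstract credits DDIM with reducing the number of sampling steps. As a rough illustration of why that is possible, the sketch below implements the standard deterministic DDIM update (denoising diffusion implicit models with the noise parameter set to zero) on a sub-sampled timestep grid. It is not the authors' implementation: the names `ddim_step`, `ddim_sample`, and `eps_model`, and the schedule array `alpha_bars`, are assumptions made for illustration, and the paper's VAE and pre-trained feature guidance are omitted entirely.

```python
import numpy as np

def ddim_step(x_t, eps_pred, alpha_bar_t, alpha_bar_prev):
    """One deterministic DDIM update (eta = 0) from step t to an earlier step."""
    # Estimate the clean signal implied by the current noise prediction.
    x0_pred = (x_t - np.sqrt(1.0 - alpha_bar_t) * eps_pred) / np.sqrt(alpha_bar_t)
    # Re-noise that estimate to the earlier noise level, reusing the same noise.
    return np.sqrt(alpha_bar_prev) * x0_pred + np.sqrt(1.0 - alpha_bar_prev) * eps_pred

def ddim_sample(eps_model, x_T, alpha_bars, num_steps):
    """Run the reverse process on a sub-sampled grid of num_steps timesteps.

    eps_model(x, t) is a hypothetical noise predictor; alpha_bars[t] is the
    cumulative product of (1 - beta) up to step t.
    """
    T = len(alpha_bars)
    # Evenly spaced timesteps from T-1 down to 0 (assumes num_steps <= T).
    timesteps = np.linspace(T - 1, 0, num_steps).round().astype(int)
    x = x_T
    for i, t in enumerate(timesteps):
        a_t = alpha_bars[t]
        # After the last timestep, jump to the noise-free level (alpha_bar = 1).
        a_prev = alpha_bars[timesteps[i + 1]] if i + 1 < len(timesteps) else 1.0
        x = ddim_step(x, eps_model(x, t), a_t, a_prev)
    return x
```

Because each update is deterministic and depends only on the cumulative noise levels at the two endpoints, the sampler can skip most of the training timesteps, for example 20 steps out of a 1000-step schedule, which is the source of the efficiency gain that DDIM-style sampling offers.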
Taxonomy
Topics: Speech and Audio Processing
Methods: Diffusion
