Pre-training Feature Guided Diffusion Model for Speech Enhancement

Yiyuan Yang; Niki Trigoni; Andrew Markham

arXiv:2406.07646·cs.SD·June 13, 2024

TL;DR

This paper presents a pre-training feature-guided diffusion model for speech enhancement. It integrates spectral features into a variational autoencoder (VAE), uses pre-trained features to guide the reverse process, and uses DDIM to reduce the number of sampling steps, reporting state-of-the-art results on two public datasets with improved efficiency and robustness.

Contribution

The paper introduces a diffusion model for speech enhancement that combines spectral features in a VAE, pre-trained feature guidance during the reverse process, and DDIM sampling to improve both enhancement quality and efficiency.

Findings

1. Achieves state-of-the-art results on two public datasets with different SNRs.
2. Outperforms baseline models in efficiency and robustness.
3. Maintains low computational demands while enhancing speech quality.

Abstract

Speech enhancement significantly improves the clarity and intelligibility of speech in noisy environments, benefiting communication and listening experiences. In this paper, we introduce a novel pre-training feature-guided diffusion model tailored for efficient speech enhancement, addressing the limitations of existing discriminative and generative models. By integrating spectral features into a variational autoencoder (VAE), leveraging pre-trained features for guidance during the reverse process, and using denoising diffusion implicit model (DDIM) sampling to reduce the number of sampling steps, our model improves both efficiency and speech enhancement quality. It achieves state-of-the-art results on two public datasets with different SNRs and outperforms other baselines in efficiency and robustness. The proposed method not only optimizes performance but also enhances…
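To make the role of DDIM in reducing sampling steps concrete, here is a minimal sketch of generic deterministic DDIM sampling (the eta = 0 case) in NumPy. It is not the paper's implementation: the linear beta schedule, the function names (`make_alpha_bar`, `ddim_step`, `ddim_sample`), and the `eps_model` callable are assumptions chosen for illustration, and the paper's conditioning on noisy speech, its VAE, and its pre-trained feature guidance are not modeled here.

```python
import numpy as np

def make_alpha_bar(num_train_steps=1000, beta_start=1e-4, beta_end=0.02):
    """Cumulative signal-retention terms for an assumed linear beta schedule."""
    betas = np.linspace(beta_start, beta_end, num_train_steps)
    return np.cumprod(1.0 - betas)

def ddim_step(x_t, eps_pred, alpha_bar_t, alpha_bar_prev):
    """One deterministic DDIM update (eta = 0) from step t to an earlier step."""
    # Estimate the clean signal implied by the current noise prediction.
    x0_pred = (x_t - np.sqrt(1.0 - alpha_bar_t) * eps_pred) / np.sqrt(alpha_bar_t)
    # Re-noise that estimate to the earlier step, reusing the same predicted noise.
    return np.sqrt(alpha_bar_prev) * x0_pred + np.sqrt(1.0 - alpha_bar_prev) * eps_pred

def ddim_sample(x_T, eps_model, alpha_bar, num_sample_steps=20):
    """Run DDIM over a short subsequence of the training timesteps.

    eps_model(x, t) is a hypothetical noise predictor; in a real system it
    would be a trained network, possibly with extra conditioning inputs.
    """
    timesteps = np.linspace(len(alpha_bar) - 1, 0, num_sample_steps).round().astype(int)
    x = x_T
    for i, t in enumerate(timesteps):
        # After the last visited step, target the noise-free signal (alpha_bar = 1).
        alpha_bar_prev = alpha_bar[timesteps[i + 1]] if i + 1 < len(timesteps) else 1.0
        x = ddim_step(x, eps_model(x, t), alpha_bar[t], alpha_bar_prev)
    return x
```

Because the update is deterministic, the sampler can visit only a subset of the training timesteps (20 of 1000 in the default above), which is where the reduction in sampling cost comes from. With a perfect noise predictor the procedure recovers the clean signal exactly; with a learned predictor the quality depends on the model and on how many steps are kept.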

Taxonomy

Topics: Speech and Audio Processing

Methods: Diffusion