Conditional Latent Diffusion-Based Speech Enhancement Via Dual Context Learning

Shengkui Zhao; Zexu Pan; Kun Zhou; Yukun Ma; Chong Zhang; Bin Ma

arXiv:2501.10052·cs.SD·January 20, 2025

TL;DR

This paper introduces a conditional latent diffusion model with dual-context learning for speech enhancement, reducing complexity and improving generalization to unseen noise environments by operating in a low-dimensional latent space.

Contribution

It proposes a novel combination of a variational autoencoder and a conditional latent diffusion model with dual-context learning for more efficient and robust speech enhancement.

Findings

01 Outperforms existing diffusion-based methods in speech enhancement tasks.

02 Requires fewer iterative steps for effective denoising.

03 Shows superior generalization to out-of-domain noise datasets.

Abstract

Recently, the application of diffusion probabilistic models has advanced speech enhancement through generative approaches. However, existing diffusion-based methods have focused on the generation process in high-dimensional waveform or spectral domains, leading to increased generation complexity and slower inference speeds. Additionally, these methods have primarily modelled clean speech distributions, with limited exploration of noise distributions, thereby constraining the discriminative capability of diffusion models for speech enhancement. To address these issues, we propose a novel approach that integrates a conditional latent diffusion model (cLDM) with dual-context learning (DCL). Our method utilizes a variational autoencoder (VAE) to compress mel-spectrograms into a low-dimensional latent space. We then apply cLDM to transform the latent representations of both clean speech and…
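
The abstract's core idea, running diffusion on a compressed latent instead of the full spectrogram, can be illustrated with a minimal sketch. Everything below is an assumption for illustration only: `toy_encode` is a hypothetical average-pooling stand-in for the VAE encoder, the 80-bin frame and 8-dimensional latent are arbitrary sizes, and the linear noise schedule is the common DDPM default rather than anything stated in the paper. The sketch shows only the standard forward noising step q(z_t | z_0) applied in the latent space; it does not implement the paper's conditioning or dual-context learning.

```python
import math
import random


def linear_beta_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Linearly spaced noise variances (a common DDPM default, assumed here)."""
    if num_steps == 1:
        return [beta_start]
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]


def cumulative_alphas(betas):
    """alpha_bar_t = product over s <= t of (1 - beta_s)."""
    out, prod = [], 1.0
    for b in betas:
        prod *= 1.0 - b
        out.append(prod)
    return out


def toy_encode(mel_frame, latent_dim):
    """Hypothetical stand-in for a VAE encoder: average-pool mel bins."""
    group = len(mel_frame) // latent_dim
    return [
        sum(mel_frame[i * group:(i + 1) * group]) / group
        for i in range(latent_dim)
    ]


def q_sample(z0, t, alpha_bars, rng):
    """Draw z_t ~ N(sqrt(alpha_bar_t) * z_0, (1 - alpha_bar_t) * I)."""
    a = alpha_bars[t]
    return [
        math.sqrt(a) * z + math.sqrt(1.0 - a) * rng.gauss(0.0, 1.0)
        for z in z0
    ]


rng = random.Random(0)
mel_frame = [rng.random() for _ in range(80)]  # one mel frame (assumed 80 bins)
z0 = toy_encode(mel_frame, latent_dim=8)       # compressed latent (assumed 8 dims)
alpha_bars = cumulative_alphas(linear_beta_schedule(50))
z_t = q_sample(z0, t=49, alpha_bars=alpha_bars, rng=rng)
print(len(mel_frame), len(z0), len(z_t))
```

In this toy setting each diffusion step touches 8 values per frame instead of 80, which is the kind of saving the abstract attributes to working in a low-dimensional latent space.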

Taxonomy

Topics: Speech and Audio Processing · Speech Recognition and Synthesis · Music and Audio Processing