GALD-SE: Guided Anisotropic Lightweight Diffusion for Efficient Speech Enhancement

Chengzhong Wang; Jianjun Gu; Dingding Yao; Junfeng Li; Yonghong Yan

arXiv:2409.15101 · cs.SD · January 23, 2025

TL;DR

This paper introduces GALD-SE, a diffusion-based speech enhancement method that adds noise with anisotropic guidance during the diffusion process. The guidance reduces computational cost and improves robustness, and the model achieves state-of-the-art results with approximately 4.5 million parameters.

Contribution

It proposes a novel anisotropic guidance approach in diffusion models that preserves clean speech clues, significantly lowering computational costs and enhancing performance in noisy conditions.

Findings

01. Achieves state-of-the-art speech enhancement results.
02. Uses only about 4.5 million parameters.
03. Performs well in highly noisy environments.

Abstract

Speech enhancement aims to improve the intelligibility and quality of speech across diverse noise conditions. Recently, diffusion models have gained considerable attention in speech enhancement, achieving competitive results. Current diffusion-based methods blur the signal with isotropic Gaussian noise and recover clean speech from the prior. However, these methods often suffer from a substantial computational burden. We argue that this inefficiency stems partly from overlooking the fact that speech enhancement is not purely a generative task: it primarily involves reducing noise and completing missing information, while the clean clues in the original mixture do not need to be regenerated. In this paper, we propose a method that introduces noise with anisotropic guidance during the diffusion process, allowing the neural network to preserve clean clues within noisy…

Code & Models

Repositories

wangchengzhong/galdse
PyTorch · Official

Taxonomy

Topics: Speech and Audio Processing · Speech Recognition and Synthesis