WaveLLDM: Design and Development of a Lightweight Latent Diffusion Model for Speech Enhancement and Restoration

Kevin Putra Santoso; Rizka Wakhidatus Sholikah; Raden Venantius Hari Ginardi

arXiv:2508.21153 · cs.SD · September 1, 2025

Open Access

TL;DR

WaveLLDM introduces a lightweight latent diffusion approach for speech enhancement that reduces computational load while maintaining high-quality audio reconstruction, offering a promising foundation for future improvements.

Contribution

The paper presents a novel architecture combining neural audio codecs with latent diffusion in a compressed space, improving efficiency over traditional diffusion models for audio restoration.

Findings

01. Achieves low Log-Spectral Distance scores (0.48-0.60) indicating accurate spectral reconstruction.

02. Demonstrates good adaptability to unseen data in speech enhancement tasks.

03. Underperforms in perceptual quality metrics compared to state-of-the-art methods.
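
For readers unfamiliar with the Log-Spectral Distance named in the first finding, the sketch below shows one common way the metric is computed: the root-mean-square difference between the log power spectra of a reference and an estimate, averaged over frames. This is not the paper's evaluation code. The function name `log_spectral_distance`, the frame settings (`n_fft`, `hop`), the Hann window, the base-10 logarithm, and the `eps` floor are all assumptions chosen for illustration.

```python
import numpy as np


def log_spectral_distance(reference, estimate, n_fft=512, hop=128, eps=1e-10):
    """Frame-averaged RMS difference between log power spectra.

    Illustrative only: all parameter defaults are assumptions, not values
    taken from the paper.
    """
    ref = np.asarray(reference, dtype=np.float64)
    est = np.asarray(estimate, dtype=np.float64)

    # Compare only the overlapping part if the lengths differ.
    length = min(len(ref), len(est))
    if length < n_fft:
        raise ValueError("signals must contain at least n_fft samples")
    ref, est = ref[:length], est[:length]

    window = np.hanning(n_fft)
    n_frames = 1 + (length - n_fft) // hop

    def log_power_spectrum(x):
        # Slice the signal into overlapping windowed frames.
        frames = np.stack(
            [x[i * hop : i * hop + n_fft] * window for i in range(n_frames)]
        )
        power = np.abs(np.fft.rfft(frames, axis=1)) ** 2
        # eps keeps the logarithm finite for silent bins.
        return np.log10(power + eps)

    diff = log_power_spectrum(ref) - log_power_spectrum(est)
    # RMS over frequency bins per frame, then mean over frames.
    per_frame = np.sqrt(np.mean(diff ** 2, axis=1))
    return float(np.mean(per_frame))
```

Lower values mean the two spectra are closer, and identical signals give a distance of zero. The absolute scale depends on the logarithm base and the STFT settings, so numbers from this sketch are comparable with the reported 0.48-0.60 range only if the paper uses the same conventions.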

Abstract

High-quality audio is essential in a wide range of applications, including online communication, virtual assistants, and the multimedia industry. However, degradation caused by noise, compression, and transmission artifacts remains a major challenge. While diffusion models have proven effective for audio restoration, they typically require significant computational resources and struggle to handle longer missing segments. This study introduces WaveLLDM (Wave Lightweight Latent Diffusion Model), an architecture that integrates an efficient neural audio codec with latent diffusion for audio restoration and denoising. Unlike conventional approaches that operate in the time or spectral domain, WaveLLDM processes audio in a compressed latent space, reducing computational complexity while preserving reconstruction quality. Empirical evaluations on the Voicebank+DEMAND test set demonstrate…

Taxonomy

Topics: Speech and Audio Processing · Hearing Loss and Rehabilitation · Speech Recognition and Synthesis