WaveLLDM: Design and Development of a Lightweight Latent Diffusion Model for Speech Enhancement and Restoration
Kevin Putra Santoso, Rizka Wakhidatus Sholikah, Raden Venantius Hari Ginardi

TL;DR
WaveLLDM introduces a lightweight latent diffusion approach for speech enhancement that reduces computational load while maintaining high-quality audio reconstruction, offering a promising foundation for future improvements.
Contribution
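The paper presents an architecture that pairs a neural audio codec with latent diffusion, so that restoration and denoising run in the codec's compressed latent space rather than in the time or spectral domain. This reduces computational cost relative to conventional diffusion models for audio restoration.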
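To make the idea of diffusing in a compressed latent space concrete, here is a minimal sketch. It is not the WaveLLDM architecture: `toy_encode` is a hypothetical stand-in for a learned codec encoder (plain average pooling), and the stride of 8, the linear noise schedule, and the 1000 steps are all assumptions chosen for illustration. The forward noising step is the standard DDPM formulation, applied to the latent instead of the waveform.

```python
import numpy as np

def toy_encode(wave, stride=8):
    """Hypothetical stand-in for a codec encoder: average-pool by `stride`."""
    usable = len(wave) - len(wave) % stride
    return wave[:usable].reshape(-1, stride).mean(axis=1)

def forward_diffuse(z0, t, alpha_bar, rng):
    """Sample z_t from q(z_t | z_0) = N(sqrt(a_bar_t) * z_0, (1 - a_bar_t) * I)."""
    noise = rng.standard_normal(z0.shape)
    z_t = np.sqrt(alpha_bar[t]) * z0 + np.sqrt(1.0 - alpha_bar[t]) * noise
    return z_t, noise

rng = np.random.default_rng(0)

# Assumed linear beta schedule with 1000 steps (not taken from the paper).
betas = np.linspace(1e-4, 0.02, 1000)
alpha_bar = np.cumprod(1.0 - betas)

# One second of a 220 Hz tone at an assumed 16 kHz sample rate.
wave = np.sin(2 * np.pi * 220.0 * np.arange(16000) / 16000.0)

z0 = toy_encode(wave)  # 16000 samples -> 2000 latent values
z_t, noise = forward_diffuse(z0, t=500, alpha_bar=alpha_bar, rng=rng)
```

In this toy setting the diffusion process touches 2000 values per step instead of 16000, which is the source of the efficiency argument. A real system would train a network to predict `noise` from `z_t` and would decode the denoised latent back to audio with the codec's decoder.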
Findings
Achieves low Log-Spectral Distance (LSD) scores of 0.48 to 0.60, indicating accurate spectral reconstruction.
Demonstrates good adaptability to unseen data in speech enhancement tasks.
Underperforms in perceptual quality metrics compared to state-of-the-art methods.
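For readers unfamiliar with Log-Spectral Distance, the sketch below shows one common way to compute it: the root-mean-square difference between log power spectra across frequency, averaged over frames. Conventions vary (some definitions scale the log by 10 to report decibels), and this summary does not state which variant or which STFT settings the paper uses, so the `n_fft`, `hop`, and `eps` values here are assumptions.

```python
import numpy as np

def stft_power(x, n_fft=512, hop=128):
    """Power spectrogram from a Hann-windowed STFT, shape (frames, bins)."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack(
        [x[i * hop : i * hop + n_fft] * window for i in range(n_frames)]
    )
    return np.abs(np.fft.rfft(frames, axis=1)) ** 2

def log_spectral_distance(reference, estimate, n_fft=512, hop=128, eps=1e-10):
    """Mean over frames of the RMS log10 power-spectrum difference."""
    ref_log = np.log10(stft_power(reference, n_fft, hop) + eps)
    est_log = np.log10(stft_power(estimate, n_fft, hop) + eps)
    per_frame = np.sqrt(np.mean((ref_log - est_log) ** 2, axis=1))
    return float(np.mean(per_frame))

# Synthetic check: a clean tone versus the same tone with added noise.
rng = np.random.default_rng(0)
t = np.arange(16000) / 16000.0
clean = np.sin(2 * np.pi * 440.0 * t)
noisy = clean + 0.1 * rng.standard_normal(clean.shape)

lsd_identical = log_spectral_distance(clean, clean)
lsd_noisy = log_spectral_distance(clean, noisy)
```

Identical signals give a distance of zero, and the value grows as the estimate's spectrum drifts from the reference, so lower is better. Because LSD compares spectra only, a low score does not guarantee high perceptual quality, which is consistent with the mixed findings listed above.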
Abstract
High-quality audio is essential in a wide range of applications, including online communication, virtual assistants, and the multimedia industry. However, degradation caused by noise, compression, and transmission artifacts remains a major challenge. While diffusion models have proven effective for audio restoration, they typically require significant computational resources and struggle to handle longer missing segments. This study introduces WaveLLDM (Wave Lightweight Latent Diffusion Model), an architecture that integrates an efficient neural audio codec with latent diffusion for audio restoration and denoising. Unlike conventional approaches that operate in the time or spectral domain, WaveLLDM processes audio in a compressed latent space, reducing computational complexity while preserving reconstruction quality. Empirical evaluations on the Voicebank+DEMAND test set demonstrate…
Taxonomy
Topics: Speech and Audio Processing · Hearing Loss and Rehabilitation · Speech Recognition and Synthesis
