DiffVQE: Hybrid Diffusion Voice Quality Enhancement Under Acoustic Echo and Noise
Haljan Lugo Girao, Ernst Seidel, Pejman Mowlaee, Ziyue Zhao, Tim Fingscheidt

TL;DR
This paper introduces DiffVQE, a diffusion-based model for joint acoustic echo control (AEC) and noise reduction that outperforms the discriminative DeepVQE model in quality and efficiency.
Contribution
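To the authors' knowledge, the first diffusion-based AEC model (still non-causal) that is reproducible in terms of topology, training data, and training framework. It is reported to outperform the discriminative DeepVQE model while being more efficient.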
Findings
DiffVQE outperforms DeepVQE in echo and noise control.
DiffVQE has lower computational complexity and smaller model size.
DiffVQE is trained on a diverse, high-quality dataset built from Interspeech 2025 URGENT Challenge data.
Abstract
Acoustic echo and background noise pose challenges for speech enhancement in hands-free systems and speakerphones. Discriminatively trained end-to-end methods represent a powerful solution for joint acoustic echo control (AEC) and denoising. However, with the advent of generative methods, diffusion-based approaches have shown remarkable performance in speech enhancement tasks. In this work, to the best of our knowledge, we provide the first (still non-causal) diffusion-based AEC model (DiffVQE) that is reproducible in terms of topology, training data, and training framework. So far, without employing diffusion, Microsoft's discriminative DeepVQE model has been shown to outperform any of the ICASSP 2023 AEC Challenge entries, achieving remarkable performance. Using data from the Interspeech 2025 URGENT Challenge for a diverse, high-quality training dataset, our DiffVQE outperforms DeepVQE both in…
