DeepFilterGAN: A Full-band Real-time Speech Enhancement System with GAN-based Stochastic Regeneration

Sanberk Serbest; Tijana Stojkovic; Milos Cernak; Andrew Harper

arXiv:2505.23515 · eess.AS · May 30, 2025

TL;DR

DeepFilterGAN is a real-time speech enhancement system that uses GAN-based stochastic regeneration to improve audio quality while maintaining low latency and computational efficiency.

Contribution

It introduces a lightweight, full-band GAN-based model for real-time speech enhancement, incorporating stochastic regeneration and noisy conditioning.
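The contribution above combines a predictive first stage with a generative second stage. The following is a minimal, non-authoritative Python sketch of that data flow under stated assumptions: frames are plain lists of magnitudes, `toy_predictive` and `toy_generator` are hypothetical placeholders for the trained networks, and stochasticity is assumed to enter by adding Gaussian noise (the hypothetical `noise_std` parameter) to the first-stage estimate. The paper's GAN may inject randomness differently. The `noisy_conditioning` flag toggles whether the unprocessed input also reaches the second stage, which is the factor examined in the ablation.

```python
import random
from typing import Callable, List, Optional, Tuple

Frame = List[float]  # toy stand-in for one frame of spectral magnitudes


def regenerate(
    noisy: Frame,
    predictive: Callable[[Frame], Frame],
    generator: Callable[[Frame, Optional[Frame]], Frame],
    noise_std: float = 0.1,
    noisy_conditioning: bool = True,
    seed: int = 0,
) -> Tuple[Frame, Frame]:
    """Toy two-stage stochastic regeneration; returns (first-stage, final) frames."""
    rng = random.Random(seed)
    # Stage 1: the predictive model gives a mean-like estimate of clean speech.
    first_stage = predictive(noisy)
    # Assumed noise injection: perturb the first-stage estimate with Gaussian noise.
    perturbed = [v + rng.gauss(0.0, noise_std) for v in first_stage]
    # Noisy conditioning: optionally pass the unprocessed input to stage 2 as well.
    condition = noisy if noisy_conditioning else None
    # Stage 2: the generative model regenerates the final output.
    return first_stage, generator(perturbed, condition)


# Hypothetical placeholders for the two trained networks (not the paper's models).
def toy_predictive(x: Frame) -> Frame:
    return [0.5 * v for v in x]  # e.g. a fixed spectral gain


def toy_generator(estimate: Frame, noisy: Optional[Frame]) -> Frame:
    if noisy is None:
        return list(estimate)
    # Blend the perturbed estimate with the noisy conditioning input.
    return [(e + n) / 2 for e, n in zip(estimate, noisy)]


if __name__ == "__main__":
    first, final = regenerate([1.0, 2.0, 4.0], toy_predictive, toy_generator)
    print(first, final)
```

Setting `noise_std=0.0` makes the sketch deterministic, which is convenient for checking the wiring; in a real streaming system both stages would be neural networks processing audio frame by frame.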

Findings

1. Improves NISQA-MOS over the first (predictive) stage
2. Low latency and 3.58M parameters, suitable for real-time use
3. Ablation study demonstrates the effectiveness of noisy conditioning
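To put the 3.58M figure in perspective, weight storage can be estimated as parameter count times bytes per parameter. The sketch below assumes the rounded count of 3.58 million and compares a few common numeric precisions; the precision actually used for deployment is not stated here, and activations and recurrent state are not counted.

```python
def weight_memory_mb(num_params: float, bytes_per_param: int) -> float:
    """Memory needed to store the weights alone, in megabytes (1 MB = 10**6 bytes)."""
    return num_params * bytes_per_param / 1e6


PARAMS = 3.58e6  # reported parameter count (rounded to two decimals in millions)

if __name__ == "__main__":
    # Assumed storage precisions; the paper's deployment format is not given.
    for name, nbytes in [("float32", 4), ("float16", 2), ("int8", 1)]:
        print(f"{name}: {weight_memory_mb(PARAMS, nbytes):.2f} MB")
```

Under these assumptions the weights occupy roughly 14.32 MB in float32, 7.16 MB in float16 and 3.58 MB in int8.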

Abstract

In this work, we propose a full-band real-time speech enhancement system with GAN-based stochastic regeneration. Predictive models focus on estimating the mean of the target distribution, whereas generative models aim to learn the full distribution. This behavior of predictive models may lead to over-suppression, i.e., the removal of speech content. In the literature, it was shown that combining a predictive model with a generative one within the stochastic regeneration framework can reduce the distortion in the output. We use this framework to obtain a real-time speech enhancement system. With 3.58M parameters and a low latency, our system is designed for real-time streaming with a lightweight architecture. Experiments show that our system improves over the first stage in terms of the NISQA-MOS metric. Finally, through an ablation study, we show the importance of noisy conditioning in our…
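The claim that predictive models estimate the mean follows from a standard fact: for a model trained with a squared-error loss, the estimate that minimizes expected error is the mean of the target distribution. The toy example below assumes a hypothetical case where, for the same noisy input, the clean magnitude is either 0 (no speech) or 1 (speech) with equal probability. A grid search over estimates in [0, 1] finds the minimizer 0.5, an attenuated value that matches neither plausible outcome, which illustrates how mean estimation can suppress speech content.

```python
from typing import List, Tuple

# Each outcome is (clean target value, probability) for one fixed noisy input.
Outcomes = List[Tuple[float, float]]


def expected_sq_error(estimate: float, outcomes: Outcomes) -> float:
    """Expected squared error of a single estimate over all possible targets."""
    return sum(p * (target - estimate) ** 2 for target, p in outcomes)


def mse_optimal(outcomes: Outcomes, grid_steps: int = 100) -> float:
    """Grid-search the estimate in [0, 1] with the lowest expected squared error."""
    candidates = [i / grid_steps for i in range(grid_steps + 1)]
    return min(candidates, key=lambda c: expected_sq_error(c, outcomes))


def mean(outcomes: Outcomes) -> float:
    """Mean of the target distribution."""
    return sum(target * p for target, p in outcomes)


if __name__ == "__main__":
    speech_or_silence = [(0.0, 0.5), (1.0, 0.5)]  # hypothetical bimodal target
    print(mse_optimal(speech_or_silence), mean(speech_or_silence))
```

A generative model would instead sample 0 or 1, each of which is a plausible clean signal; this is the motivation for placing a generative stage after the predictive one.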

Taxonomy

Topics: Speech and Audio Processing · Speech Recognition and Synthesis · Advanced Adaptive Filtering Techniques

MethodsFocus