GAN-Based Speech Enhancement for Low SNR Using Latent Feature Conditioning
Shrishti Saha Shetu, Emanuël A. P. Habets, Andreas Brendel

TL;DR
This paper introduces DisCoGAN, a GAN-based speech enhancement method conditioned on latent features from a pre-trained discriminative model, significantly improving speech quality in low SNR conditions over existing methods.
Contribution
The paper presents a novel GAN architecture conditioned on latent features from a pre-trained discriminative model for low SNR speech enhancement, outperforming existing discriminative and GAN models.
Findings
DisCoGAN outperforms state-of-the-art discriminative methods.
DisCoGAN surpasses end-to-end trained GAN models.
Conditioning configurations influence speech enhancement quality.
Abstract
Enhancing speech quality under adverse SNR conditions remains a significant challenge for discriminative deep neural network (DNN)-based approaches. In this work, we propose DisCoGAN, which is a time-frequency-domain generative adversarial network (GAN) conditioned by the latent features of a discriminative model pre-trained for speech enhancement in low SNR scenarios. Our proposed method achieves superior performance compared to state-of-the-art discriminative methods and also surpasses end-to-end (E2E) trained GAN models. We also investigate the impact of various configurations for conditioning the proposed GAN model with the discriminative model and assess their influence on enhancing speech quality.
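The abstract does not say how the latent features enter the generator, so the following is only a toy sketch of one possible conditioning scheme, not the authors' architecture. It assumes conditioning by frame-wise concatenation and a mask-based generator output. The names `discriminative_latent` and `conditioned_generator`, the sizes, and the random weights are all hypothetical stand-ins.

```python
import numpy as np

rng = np.random.default_rng(0)

N_FREQ = 8     # frequency bins per frame (hypothetical toy size)
N_LATENT = 4   # latent feature size (hypothetical)

# Stand-in for the encoder of a pre-trained discriminative enhancement model.
# In practice these weights would be loaded and kept frozen; here they are random.
W_enc = rng.standard_normal((N_FREQ, N_LATENT))

def discriminative_latent(noisy_mag):
    """Map noisy magnitude frames (T, F) to latent features (T, N_LATENT)."""
    return np.tanh(noisy_mag @ W_enc)

# Stand-in for generator weights; a real generator would be trained adversarially.
W_gen = rng.standard_normal((N_FREQ + N_LATENT, N_FREQ)) * 0.1

def conditioned_generator(noisy_mag, latent):
    """Predict a (0, 1) mask from noisy frames concatenated with the latent condition."""
    x = np.concatenate([noisy_mag, latent], axis=-1)  # (T, F + N_LATENT)
    mask = 1.0 / (1.0 + np.exp(-(x @ W_gen)))         # sigmoid mask, shape (T, F)
    return mask * noisy_mag                           # masked magnitude frames

# Five toy time-frequency frames of a noisy magnitude spectrogram.
noisy = np.abs(rng.standard_normal((5, N_FREQ)))
latent = discriminative_latent(noisy)
enhanced = conditioned_generator(noisy, latent)
print(latent.shape, enhanced.shape)
```

Concatenation is used here only because it is the simplest way to show a generator consuming features from a separate model. The abstract notes that several conditioning configurations were compared, and this sketch makes no claim about which of them it resembles.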
Taxonomy
Topics: Speech and Audio Processing · Speech Recognition and Synthesis
