GAN-Based Speech Enhancement for Low SNR Using Latent Feature Conditioning

Shrishti Saha Shetu; Emanuël A. P. Habets; Andreas Brendel

arXiv:2410.13599 · eess.AS · October 18, 2024

TL;DR

This paper introduces DisCoGAN, a GAN-based speech enhancement method conditioned on latent features from a pre-trained discriminative model, significantly improving speech quality in low SNR conditions over existing methods.

Contribution

The paper presents a novel GAN architecture for low SNR speech enhancement that is conditioned on latent features from a pre-trained discriminative model, outperforming both state-of-the-art discriminative models and end-to-end trained GAN models.
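
The summary does not say how the latent features enter the generator, and the paper compares several conditioning configurations. As a rough illustration only, the sketch below shows two common ways to inject features from a frozen, pre-trained model into a generator's intermediate time-frequency features: channel-wise concatenation and FiLM-style scale-and-shift. The function names, the (channels, frames, frequency bins) layout, and the random arrays standing in for generator and encoder outputs are hypothetical assumptions, not DisCoGAN's actual design.

```python
import numpy as np

# Hypothetical layout for all feature maps: (channels, frames, freq_bins).


def concat_condition(gen_feats: np.ndarray, latent: np.ndarray) -> np.ndarray:
    """Hypothetical config A: stack frozen latent features onto generator features."""
    if gen_feats.shape[1:] != latent.shape[1:]:
        raise ValueError("generator and latent features need the same (frames, freq_bins)")
    return np.concatenate([gen_feats, latent], axis=0)


def film_condition(gen_feats: np.ndarray, latent: np.ndarray,
                   w_gamma: np.ndarray, w_beta: np.ndarray) -> np.ndarray:
    """Hypothetical config B: FiLM-style per-channel, per-frame scale and shift.

    w_gamma, w_beta: (channels_g, channels_z) projections (learned in a real model).
    """
    pooled = latent.mean(axis=2)      # average over frequency -> (channels_z, frames)
    gamma = w_gamma @ pooled          # scale -> (channels_g, frames)
    beta = w_beta @ pooled            # shift -> (channels_g, frames)
    # Broadcast scale and shift across frequency bins.
    return gamma[:, :, None] * gen_feats + beta[:, :, None]


# Usage with random stand-ins for generator features and frozen-encoder latents.
rng = np.random.default_rng(0)
h = rng.standard_normal((16, 100, 257))   # generator features (stand-in)
z = rng.standard_normal((8, 100, 257))    # latents from the frozen discriminative model (stand-in)
print(concat_condition(h, z).shape)       # (24, 100, 257)
w_g = 0.1 * rng.standard_normal((16, 8))
w_b = 0.1 * rng.standard_normal((16, 8))
print(film_condition(h, z, w_g, w_b).shape)  # (16, 100, 257)
```

Concatenation grows the channel count and leaves it to later layers to decide how to use the latents, while FiLM modulates the existing channels without adding new ones; this trade-off is one illustration of why the conditioning configuration can affect enhancement quality.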

Findings

01. DisCoGAN outperforms state-of-the-art discriminative methods.
02. DisCoGAN surpasses end-to-end trained GAN models.
03. The choice of conditioning configuration influences speech enhancement quality.

Abstract

Enhancing speech quality under adverse SNR conditions remains a significant challenge for discriminative deep neural network (DNN)-based approaches. In this work, we propose DisCoGAN, a time-frequency-domain generative adversarial network (GAN) conditioned on the latent features of a discriminative model pre-trained for speech enhancement in low SNR scenarios. Our proposed method achieves superior performance compared to state-of-the-art discriminative methods and also surpasses end-to-end (E2E) trained GAN models. We also investigate the impact of various configurations for conditioning the proposed GAN model with the discriminative model and assess their influence on enhancing speech quality.
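
To make "low SNR" concrete: SNR in dB is 10 log10 of the ratio of speech power to noise power, so at negative SNR the noise carries more power than the speech. The minimal sketch below, assuming NumPy and a synthetic sine tone standing in for speech, scales a noise signal so that the mixture reaches a chosen SNR, a common way of constructing noisy/clean pairs. The -5 dB value and the helper names `snr_db` and `mix_at_snr` are illustrative assumptions; the SNR range used in the paper is not given here.

```python
import numpy as np


def snr_db(clean: np.ndarray, noise: np.ndarray) -> float:
    """SNR in dB: 10 * log10(mean speech power / mean noise power)."""
    return float(10.0 * np.log10(np.mean(clean ** 2) / np.mean(noise ** 2)))


def mix_at_snr(clean: np.ndarray, noise: np.ndarray, target_snr_db: float):
    """Scale `noise` so that clean + scaled noise has SNR `target_snr_db`.

    Returns (noisy mixture, scaled noise).
    """
    p_clean = np.mean(clean ** 2)
    p_noise = np.mean(noise ** 2)
    # Target noise power is p_clean / 10^(SNR/10); amplitude scales with sqrt(power).
    scale = np.sqrt(p_clean / (p_noise * 10.0 ** (target_snr_db / 10.0)))
    scaled_noise = scale * noise
    return clean + scaled_noise, scaled_noise


# Usage: 1 s of a 440 Hz tone (stand-in for speech) mixed with white noise at -5 dB.
fs = 16000
t = np.arange(fs) / fs
clean = 0.5 * np.sin(2 * np.pi * 440 * t)
noise = np.random.default_rng(0).standard_normal(fs)
noisy, scaled_noise = mix_at_snr(clean, noise, -5.0)
print(round(snr_db(clean, scaled_noise), 2))  # -5.0
```

At -5 dB the noise power is about 3.2 times the speech power (10^0.5 ≈ 3.16), which is the kind of regime where the abstract says discriminative models struggle.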

Taxonomy

Topics: Speech and Audio Processing · Speech Recognition and Synthesis