Conditional Generative Adversarial Networks for Speech Enhancement and Noise-Robust Speaker Verification
Daniel Michelsanti, Zheng-Hua Tan

TL;DR
This paper investigates conditional GANs (cGANs) for speech enhancement on noisy speech, reporting that the cGAN outperforms classical speech enhancement algorithms and performs comparably to a DNN-based approach.
Contribution
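It adapts the image-to-image cGAN framework of Isola et al. [1] to speech enhancement, learning a mapping from the spectrogram of noisy speech to an enhanced counterpart, and evaluates the result on speech quality, intelligibility, and noise-robust speaker verification.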
Findings
cGAN outperforms classical SE algorithms in PESQ and STOI scores
cGAN achieves comparable performance to DNN-based SE methods
Enhanced speech improves speaker verification accuracy
Abstract
Improving speech system performance in noisy environments remains a challenging task, and speech enhancement (SE) is one of the effective techniques to solve the problem. Motivated by the promising results of generative adversarial networks (GANs) in a variety of image processing tasks, we explore the potential of conditional GANs (cGANs) for SE, and in particular, we make use of the image processing framework proposed by Isola et al. [1] to learn a mapping from the spectrogram of noisy speech to an enhanced counterpart. The SE cGAN consists of two networks, trained in an adversarial manner: a generator that tries to enhance the input noisy spectrogram, and a discriminator that tries to distinguish between enhanced spectrograms provided by the generator and clean ones from the database using the noisy spectrogram as a condition. We evaluate the performance of the cGAN method in terms of…
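The adversarial setup described in the abstract can be sketched as a pair of loss functions. This is a minimal illustration, not the authors' implementation: it assumes the usual cross-entropy form of the cGAN objective and the additional L1 reconstruction term used by Isola et al. [1]. The function names, the `l1_weight` default, and the toy spectrogram values are all hypothetical. Here `d_real` stands for the discriminator's output on a (noisy, clean) pair and `d_fake` for its output on a (noisy, enhanced) pair.

```python
import math


def bce(prob, target):
    """Binary cross-entropy for a single probability and a 0/1 target."""
    eps = 1e-12  # keep log() away from zero
    prob = min(max(prob, eps), 1.0 - eps)
    return -(target * math.log(prob) + (1.0 - target) * math.log(1.0 - prob))


def l1_distance(a, b):
    """Mean absolute difference between two spectrograms given as lists of rows."""
    total = sum(abs(x - y) for row_a, row_b in zip(a, b) for x, y in zip(row_a, row_b))
    count = sum(len(row) for row in a)
    return total / count


def discriminator_loss(d_real, d_fake):
    """D should label (noisy, clean) pairs as 1 and (noisy, enhanced) pairs as 0."""
    return bce(d_real, 1.0) + bce(d_fake, 0.0)


def generator_loss(d_fake, enhanced, clean, l1_weight=100.0):
    """G tries to make D output 1 on its enhanced spectrogram.

    The L1 term and its weight are assumptions borrowed from the
    image-to-image setting, not values confirmed by this summary.
    """
    return bce(d_fake, 1.0) + l1_weight * l1_distance(enhanced, clean)


# Toy 2x2 "spectrograms" (hypothetical values, for illustration only)
clean = [[1.0, 2.0], [3.0, 4.0]]
enhanced = [[1.0, 2.0], [3.0, 6.0]]

d_loss = discriminator_loss(d_real=0.9, d_fake=0.1)
g_loss = generator_loss(d_fake=0.1, enhanced=enhanced, clean=clean)
```

In training, the two losses would be minimized in alternation: the discriminator is updated to separate clean from enhanced spectrograms given the noisy input, and the generator is updated to fool it while staying close to the clean target.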
