Conditional Generative Adversarial Networks for Speech Enhancement and Noise-Robust Speaker Verification

Daniel Michelsanti; Zheng-Hua Tan

arXiv:1709.01703·eess.AS·November 5, 2019

TL;DR

This paper investigates conditional GANs (cGANs) for speech enhancement (SE) in noisy environments. It reports that the cGAN outperforms classical SE algorithms in PESQ and STOI and performs comparably to deep neural network-based SE approaches.

Contribution

It adapts the image-to-image cGAN framework of Isola et al. [1] to speech enhancement, learning a mapping from the spectrogram of noisy speech to an enhanced counterpart in order to improve speech quality and noise robustness.

Findings

1. cGAN outperforms classical SE algorithms in PESQ and STOI scores

2. cGAN achieves comparable performance to DNN-based SE methods

3. Enhanced speech improves speaker verification accuracy

Abstract

Improving speech system performance in noisy environments remains a challenging task, and speech enhancement (SE) is one of the effective techniques to solve the problem. Motivated by the promising results of generative adversarial networks (GANs) in a variety of image processing tasks, we explore the potential of conditional GANs (cGANs) for SE, and in particular, we make use of the image processing framework proposed by Isola et al. [1] to learn a mapping from the spectrogram of noisy speech to an enhanced counterpart. The SE cGAN consists of two networks, trained in an adversarial manner: a generator that tries to enhance the input noisy spectrogram, and a discriminator that tries to distinguish between enhanced spectrograms provided by the generator and clean ones from the database using the noisy spectrogram as a condition. We evaluate the performance of the cGAN method in terms of…
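
The adversarial setup described in the abstract can be illustrated with a minimal loss sketch. It is not the authors' implementation. It assumes the discriminator outputs a probability that a (noisy, candidate) pair contains a clean spectrogram, that spectrograms are flattened into lists of floats, and that the generator objective adds an L1 term as in the framework of Isola et al. [1]. The function names, the `l1_weight` value, and the `eps` constant are hypothetical choices for illustration.

```python
import math


def discriminator_loss(d_clean, d_enhanced, eps=1e-12):
    """Binary cross-entropy for the discriminator.

    d_clean:    D(noisy, clean) outputs, which should approach 1.
    d_enhanced: D(noisy, G(noisy)) outputs, which should approach 0.
    Both are lists of probabilities; the noisy spectrogram is the condition
    that was fed to D alongside each candidate (assumed, not shown here).
    """
    real_term = -sum(math.log(p + eps) for p in d_clean) / len(d_clean)
    fake_term = -sum(math.log(1.0 - p + eps) for p in d_enhanced) / len(d_enhanced)
    return real_term + fake_term


def generator_loss(d_enhanced, enhanced, clean, l1_weight=100.0, eps=1e-12):
    """Adversarial term plus a weighted L1 distance to the clean target.

    The adversarial term rewards the generator when D rates its output as
    clean. The L1 term and its weight are illustrative assumptions.
    """
    adversarial = -sum(math.log(p + eps) for p in d_enhanced) / len(d_enhanced)
    l1 = sum(abs(e - c) for e, c in zip(enhanced, clean)) / len(clean)
    return adversarial + l1_weight * l1


# Toy values standing in for network outputs (hypothetical).
d_loss = discriminator_loss(d_clean=[0.9, 0.8], d_enhanced=[0.2, 0.3])
g_loss = generator_loss(d_enhanced=[0.2, 0.3],
                        enhanced=[0.5, 0.1, 0.4],
                        clean=[0.4, 0.1, 0.6])
print(d_loss, g_loss)
```

In a training loop the two losses would be minimized in alternation: the discriminator is updated to separate clean pairs from enhanced ones, then the generator is updated so that its output is both rated as clean and close to the clean target.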
