Exploring Speech Enhancement with Generative Adversarial Networks for Robust Speech Recognition

Chris Donahue; Bo Li; Rohit Prabhavalkar

arXiv:1711.05747·cs.SD·November 1, 2018

TL;DR

This paper explores GAN-based speech enhancement on log-Mel spectra to improve the noise robustness of ASR. Enhancement reduces the WER of a clean-trained system on noisy speech, but on its own it still trails multi-style training.

Contribution

It proposes operating GANs on log-Mel filterbank spectra, rather than raw waveforms, for speech enhancement in ASR. Working on spectra requires less computation and is more robust to reverberant noise.

Findings

01 GANs on log-Mel spectra improve ASR noise robustness

02 Appending GAN-enhanced features yields 7% WER reduction

03 GAN enhancement outperforms waveform-based methods in noisy conditions
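The feature-appending idea in finding 02 can be illustrated with a minimal sketch. It assumes features are stored as a (frames, mel bins) NumPy array and that the enhanced features are concatenated with the noisy ones along the feature axis before the recognizer is retrained. The names `append_enhanced` and `placeholder_enhancer` are hypothetical, and the placeholder is a simple per-bin mean subtraction standing in for a trained GAN generator, not the paper's model.

```python
import numpy as np


def placeholder_enhancer(feats):
    """Stand-in for a trained enhancement network (assumption, not a GAN).

    Subtracts the per-bin mean over time so the example runs end to end.
    """
    return feats - feats.mean(axis=0, keepdims=True)


def append_enhanced(noisy_feats, enhancer):
    """Concatenate noisy features with their enhanced version.

    noisy_feats: array of shape (frames, bins).
    Returns an array of shape (frames, 2 * bins).
    """
    enhanced = enhancer(noisy_feats)
    if enhanced.shape != noisy_feats.shape:
        raise ValueError("enhancer must preserve the feature shape")
    return np.concatenate([noisy_feats, enhanced], axis=-1)


# Usage: 97 frames of 40-dimensional features become 80-dimensional inputs.
noisy = np.random.default_rng(0).standard_normal((97, 40))
stacked = append_enhanced(noisy, placeholder_enhancer)
```

Keeping the noisy features alongside the enhanced ones lets the recognizer fall back on the original signal wherever enhancement introduces artifacts.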

Abstract

We investigate the effectiveness of generative adversarial networks (GANs) for speech enhancement, in the context of improving noise robustness of automatic speech recognition (ASR) systems. Prior work demonstrates that GANs can effectively suppress additive noise in raw waveform speech signals, improving perceptual quality metrics; however this technique was not justified in the context of ASR. In this work, we conduct a detailed study to measure the effectiveness of GANs in enhancing speech contaminated by both additive and reverberant noise. Motivated by recent advances in image processing, we propose operating GANs on log-Mel filterbank spectra instead of waveforms, which requires less computation and is more robust to reverberant noise. While GAN enhancement improves the performance of a clean-trained ASR system on noisy speech, it falls short of the performance achieved by…
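As a rough illustration of the log-Mel filterbank spectra mentioned in the abstract, the following sketch computes them from a waveform using only NumPy. The frame length, hop size, number of Mel bins, and the function names are assumptions chosen for illustration, not the paper's configuration.

```python
import numpy as np


def hz_to_mel(hz):
    return 2595.0 * np.log10(1.0 + hz / 700.0)


def mel_to_hz(mel):
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_filterbank(n_mels, n_fft, sample_rate):
    """Triangular Mel filters, shape (n_mels, n_fft // 2 + 1)."""
    n_bins = n_fft // 2 + 1
    fft_freqs = np.linspace(0.0, sample_rate / 2.0, n_bins)
    # n_mels + 2 edge points, evenly spaced on the Mel scale.
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sample_rate / 2.0), n_mels + 2)
    hz_points = mel_to_hz(mel_points)
    fb = np.zeros((n_mels, n_bins))
    for m in range(n_mels):
        left, center, right = hz_points[m], hz_points[m + 1], hz_points[m + 2]
        rising = (fft_freqs - left) / (center - left)
        falling = (right - fft_freqs) / (right - center)
        fb[m] = np.maximum(0.0, np.minimum(rising, falling))
    return fb


def log_mel(waveform, sample_rate=16000, n_fft=512, hop=160, n_mels=40, eps=1e-6):
    """Log-Mel spectra of shape (frames, n_mels); parameters are illustrative."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(waveform) - n_fft) // hop
    frames = np.stack(
        [waveform[i * hop : i * hop + n_fft] * window for i in range(n_frames)]
    )
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2
    mel_energy = power @ mel_filterbank(n_mels, n_fft, sample_rate).T
    # eps keeps the logarithm finite for silent frames.
    return np.log(mel_energy + eps)


# Usage: one second of synthetic noise at 16 kHz.
feats = log_mel(np.random.default_rng(0).standard_normal(16000))
```

With these assumed settings, one second of audio becomes a 97 x 40 feature matrix instead of 16,000 samples, which gives a sense of why operating on spectra is cheaper than operating on raw waveforms.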
