Investigating Generative Adversarial Networks based Speech Dereverberation for Robust Speech Recognition

Ke Wang; Junbo Zhang; Sining Sun; Yujun Wang; Fei Xiang; Lei Xie

arXiv:1803.10132·cs.SD·January 1, 2019

TL;DR

This paper explores the application of GANs for speech dereverberation to improve robustness in speech recognition, demonstrating significant CER reductions through optimized network architectures and training strategies.

Contribution

It provides a comprehensive analysis of GAN-based speech dereverberation, highlighting the effectiveness of LSTM generators, residual connections, and specific training practices.

Findings

01

LSTM generators outperform DNN and CNN in dereverberation tasks.

02

Residual connections enhance dereverberation performance.

03

Updating the generator and the discriminator on the same mini-batch data is important for GAN success.
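
Finding 02 can be illustrated with a minimal sketch of a residual (skip) connection, where a layer's input is added back to its output. This is only an illustration under stated assumptions: the summary above does not say where the residual connections sit inside the deep LSTM, so an identity skip around each layer is assumed here, and `halve` is a hypothetical stand-in for a recurrent layer rather than a real LSTM.

```python
from typing import Callable, List, Sequence

Vector = List[float]
Layer = Callable[[Sequence[float]], Vector]


def residual(layer: Layer) -> Layer:
    """Wrap a layer so that its output becomes x + layer(x)."""
    def wrapped(x: Sequence[float]) -> Vector:
        y = layer(x)
        if len(y) != len(x):
            # An identity skip only works when input and output sizes match.
            raise ValueError("residual connection needs matching dimensions")
        return [xi + yi for xi, yi in zip(x, y)]
    return wrapped


def stack(layers: Sequence[Layer]) -> Layer:
    """Apply the given layers one after another."""
    def run(x: Sequence[float]) -> Vector:
        out = list(x)
        for layer in layers:
            out = layer(out)
        return out
    return run


def halve(x: Sequence[float]) -> Vector:
    """Hypothetical stand-in for a trained layer (not an LSTM)."""
    return [0.5 * v for v in x]


# Two stacked layers, each with an assumed identity skip connection.
deep = stack([residual(halve), residual(halve)])
result = deep([2.0, 4.0])
```

Because each wrapped layer only has to model a correction to its input, the identity path keeps the input features available to deeper layers, which is the usual motivation for this pattern.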

Abstract

We investigate the use of generative adversarial networks (GANs) in speech dereverberation for robust speech recognition. GANs have recently been studied for speech enhancement to remove additive noise, but no work has yet examined their ability in speech dereverberation, and the advantages of using GANs have not been fully established. In this paper, we provide an in-depth investigation of a GAN-based dereverberation front-end for ASR. First, we study the effectiveness of different dereverberation networks (the generator in the GAN) and find that an LSTM leads to a significant improvement compared with a feed-forward DNN and a CNN on our dataset. Second, adding residual connections to the deep LSTMs further boosts performance. Finally, we find that, for the success of the GAN, it is important to update the generator and the discriminator using the same mini-batch data…
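
The last point of the abstract, updating both networks on the same mini-batch, can be sketched as a training-loop schedule. The following is a hedged illustration, not the authors' implementation: `train_epoch`, `d_step` and `g_step` are hypothetical names, the discriminator-then-generator order is an assumption, and the update functions are stubs that only record which batch they received.

```python
from typing import Callable, Iterable, List, Sequence, Tuple

# One mini-batch: (reverberant features, clean features). Layout is assumed.
Batch = Tuple[Sequence[float], Sequence[float]]


def train_epoch(
    batches: Iterable[Batch],
    d_step: Callable[[Batch], None],
    g_step: Callable[[Batch], None],
) -> List[Tuple[str, int]]:
    """Update D and G once per iteration, both on the same mini-batch."""
    log: List[Tuple[str, int]] = []
    for idx, batch in enumerate(batches):
        d_step(batch)  # discriminator update on this batch
        log.append(("D", idx))
        g_step(batch)  # generator update on the very same batch
        log.append(("G", idx))
    return log


# Stub updates: record the batches each network was trained on.
seen_by_d: List[Batch] = []
seen_by_g: List[Batch] = []
batches: List[Batch] = [
    ([0.1, 0.2], [0.0, 0.1]),
    ([0.3, 0.4], [0.2, 0.3]),
]
log = train_epoch(batches, seen_by_d.append, seen_by_g.append)
```

The contrasting schedule, which this sketch deliberately avoids, would draw a fresh mini-batch for the generator update after the discriminator has been updated on a different one.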

Code & Models

Repositories

wangkenpu/rsrgan
tf · Official

Taxonomy

Methods: Sigmoid Activation · Tanh Activation · Convolution · Long Short-Term Memory