TL;DR
This paper explores GANs for speech dereverberation as a front-end for robust speech recognition, reporting significant reductions in character error rate (CER) through optimized network architectures and training strategies.
Contribution
It provides a comprehensive analysis of GAN-based speech dereverberation, highlighting the effectiveness of LSTM generators, residual connections, and specific training practices.
Findings
LSTM generators outperform DNN and CNN in dereverberation tasks.
Residual connections enhance dereverberation performance.
Updating the generator and the discriminator on the same mini-batch is important for GAN training to succeed.
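The last finding can be sketched as a training schedule. This is a minimal illustration, not the authors' implementation: `train_epoch`, `d_step`, `g_step`, and the `(reverberant, clean)` batch layout are hypothetical names, and running the discriminator step before the generator step is an assumption. The only point it shows is that both updates consume the same mini-batch rather than drawing separate ones.

```python
from typing import Callable, Iterable, List, Tuple

# Hypothetical batch layout: (reverberant features, clean features).
Batch = Tuple[list, list]


def train_epoch(
    batches: Iterable[Batch],
    d_step: Callable[[Batch], None],
    g_step: Callable[[Batch], None],
) -> int:
    """Run one epoch, updating D and G on the same mini-batch each iteration."""
    num_updates = 0
    for batch in batches:
        d_step(batch)  # discriminator update (D-before-G order is an assumption)
        g_step(batch)  # generator update on the *same* mini-batch
        num_updates += 1
    return num_updates


# Toy usage: the step functions only record which batch they were given.
seen_by_d: List[Batch] = []
seen_by_g: List[Batch] = []
toy_batches = [([0.1, 0.2], [0.0, 0.1]), ([0.3, 0.4], [0.2, 0.3])]
num_updates = train_epoch(toy_batches, seen_by_d.append, seen_by_g.append)
```

In a real setup, `d_step` and `g_step` would wrap the optimizer updates of the two networks; here they are stand-ins so the batch sharing is easy to inspect.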
Abstract
We investigate the use of generative adversarial networks (GANs) in speech dereverberation for robust speech recognition. GANs have recently been studied for speech enhancement to remove additive noise, but their ability to perform speech dereverberation has not yet been examined, and the advantages of using GANs have not been fully established. In this paper, we investigate in depth the use of a GAN-based dereverberation front-end in ASR. First, we study the effectiveness of different dereverberation networks (the generator in the GAN) and find that an LSTM leads to a significant improvement over a feed-forward DNN and a CNN on our dataset. Second, adding residual connections to the deep LSTMs further boosts performance. Finally, we find that, for the GAN to succeed, it is important to update the generator and the discriminator using the same mini-batch data…
Taxonomy
Methods: Sigmoid Activation · Tanh Activation · Convolution · Long Short-Term Memory
