Improving Speech Recognition on Noisy Speech via Speech Enhancement with Multi-Discriminators CycleGAN
Chia-Yu Li, Ngoc Thang Vu

TL;DR
This paper introduces Multi-discriminators CycleGAN, a speech enhancement method that is trained without parallel data and uses multiple discriminators, each checking a different frequency area, to reduce noise in the input speech. On CHiME-3 it yields up to 14.09% relative WER improvement in noisy speech recognition.
Contribution
The paper proposes a new CycleGAN-based speech enhancement approach with multiple discriminators and demonstrates its effectiveness on noisy speech recognition tasks.
Findings
Up to 10.03% relative WER improvement on the CHiME-3 development set
Up to 14.09% relative WER improvement on the CHiME-3 evaluation set
Training multiple generators on homogeneous subsets of the training data outperforms training one generator on all of it
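The WER figures above are relative improvements, which is easy to confuse with absolute percentage-point differences. The sketch below shows only the arithmetic; the function name and the example WER values are hypothetical and are not taken from the paper.

```python
def relative_wer_improvement(baseline_wer: float, enhanced_wer: float) -> float:
    """Relative WER improvement in percent: 100 * (baseline - enhanced) / baseline."""
    if baseline_wer <= 0:
        raise ValueError("baseline_wer must be positive")
    return 100.0 * (baseline_wer - enhanced_wer) / baseline_wer


# Hypothetical WERs, chosen only to illustrate the arithmetic (not from the paper):
# a drop from 20.0% to 18.0% WER is 2.0 points absolute, but 10% relative.
example = relative_wer_improvement(baseline_wer=20.0, enhanced_wer=18.0)
print(f"{example:.2f}% relative improvement")
```

Under this definition, the same relative figure can correspond to very different absolute gains depending on the baseline WER.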
Abstract
This paper presents our latest investigations on improving automatic speech recognition for noisy speech via speech enhancement. We propose a novel method named Multi-discriminators CycleGAN to reduce the noise of input speech and therefore improve automatic speech recognition performance. Our proposed method leverages the CycleGAN framework for speech enhancement without any parallel data and improves it by introducing multiple discriminators that check different frequency areas. Furthermore, we show that training multiple generators on homogeneous subsets of the training data is better than training one generator on all the training data. We evaluate our method on the CHiME-3 data set and observe up to 10.03% relative WER improvement on the development set and up to 14.09% on the evaluation set.
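As a rough illustration of discriminators that each check a different frequency area, the following sketch splits a spectrogram into contiguous bands and scores each band separately with a least-squares GAN loss. It is a minimal sketch under stated assumptions, not the authors' implementation: the number of bands, the non-overlapping equal-width split, the 257-bin spectrogram shape, and the linear `toy_discriminator` stand-in are all hypothetical.

```python
import numpy as np


def split_frequency_bands(spec: np.ndarray, num_bands: int) -> list:
    """Split a (freq_bins, frames) spectrogram into contiguous frequency bands.

    Assumption: equal-width, non-overlapping bands; the paper may divide
    the frequency axis differently.
    """
    return np.array_split(spec, num_bands, axis=0)


def lsgan_discriminator_loss(real_scores: np.ndarray, fake_scores: np.ndarray) -> float:
    """Least-squares GAN discriminator loss: push real scores to 1, fake scores to 0."""
    return 0.5 * float(np.mean((real_scores - 1.0) ** 2) + np.mean(fake_scores ** 2))


def toy_discriminator(band: np.ndarray, weight: np.ndarray) -> np.ndarray:
    """Hypothetical stand-in for a learned discriminator: one linear score per frame."""
    return band.T @ weight  # shape: (frames,)


rng = np.random.default_rng(0)
clean = rng.standard_normal((257, 100))     # placeholder "clean" spectrogram
enhanced = rng.standard_normal((257, 100))  # placeholder generator output

num_bands = 3  # assumed value, for illustration only
total_loss = 0.0
for clean_band, enhanced_band in zip(
    split_frequency_bands(clean, num_bands),
    split_frequency_bands(enhanced, num_bands),
):
    # Each band gets its own discriminator parameters.
    weight = 0.01 * rng.standard_normal(clean_band.shape[0])
    total_loss += lsgan_discriminator_loss(
        toy_discriminator(clean_band, weight),
        toy_discriminator(enhanced_band, weight),
    )

print(f"summed discriminator loss over {num_bands} bands: {total_loss:.4f}")
```

In a real system each band discriminator would be a trained network and the generator would receive the adversarial signal from all of them; the sketch only shows how the frequency axis can be partitioned so that each discriminator sees a different region.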
Taxonomy
Topics: Speech and Audio Processing · Speech Recognition and Synthesis · Indoor and Outdoor Localization Technologies
Methods: Residual Connection · Sigmoid Activation · Batch Normalization · Residual Block · Tanh Activation · GAN Least Squares Loss · Instance Normalization · PatchGAN
