Improving noisy student training for low-resource languages in End-to-End ASR using CycleGAN and inter-domain losses
Chia-Yu Li, Ngoc Thang Vu

TL;DR
This paper improves semi-supervised end-to-end speech recognition for low-resource languages by enhancing the authors' CycleGAN and inter-domain losses with automatic hyperparameter tuning and integrating them into noisy student training, yielding WER reductions of 20% over the baseline teacher model and 10% over the baseline student model.
Contribution
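It introduces enhanced CycleGAN and inter-domain losses, which add automatic hyperparameter tuning to the authors' earlier method, and integrates them into noisy student training for low-resource ASR with limited paired speech-text, less than five hours of unlabeled speech, and abundant external text.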
Findings
20% WER reduction over baseline teacher model
10% WER reduction over baseline student model
Effective in six non-English languages
Abstract
Training a semi-supervised end-to-end speech recognition system using noisy student training has significantly improved performance. However, this approach requires a substantial amount of paired speech-text and unlabeled speech, which is costly for low-resource languages. Therefore, this paper considers a more extreme case of semi-supervised end-to-end automatic speech recognition where there are limited paired speech-text, unlabeled speech (less than five hours), and abundant external text. Firstly, we observe improved performance by training the model using our previous work on semi-supervised learning "CycleGAN and inter-domain losses" solely with external text. Secondly, we enhance "CycleGAN and inter-domain losses" by incorporating automatic hyperparameter tuning, calling it "enhanced CycleGAN inter-domain losses." Thirdly, we integrate it into the noisy student training approach…
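The noisy student loop referred to in the abstract can be sketched generically as follows. This is a toy illustration only: it assumes a nearest-centroid classifier on 1-D points in place of an ASR model and Gaussian input noise in place of speech augmentation. The function names (`fit_centroids`, `predict`, `noisy_student`) and all parameters are hypothetical and not taken from the paper, and the sketch does not include the CycleGAN and inter-domain losses.

```python
import numpy as np


def fit_centroids(x, y):
    """Train the toy 'model': one centroid per class (labels 0 and 1)."""
    return np.array([x[y == 0].mean(), x[y == 1].mean()])


def predict(centroids, x):
    """Label each point with the index of its nearest centroid."""
    return np.abs(x[:, None] - centroids[None, :]).argmin(axis=1)


def noisy_student(x_lab, y_lab, x_unlab, rounds=3, noise_std=0.1, seed=0):
    """Generic noisy student loop on toy data (assumed setup, not the paper's)."""
    rng = np.random.default_rng(seed)
    # 1. Train the initial teacher on the labeled (paired) data only.
    teacher = fit_centroids(x_lab, y_lab)
    for _ in range(rounds):
        # 2. The teacher pseudo-labels the unlabeled data.
        pseudo = predict(teacher, x_unlab)
        x_all = np.concatenate([x_lab, x_unlab])
        y_all = np.concatenate([y_lab, pseudo])
        # 3. The student trains on labeled + pseudo-labeled data with noisy inputs.
        x_noisy = x_all + rng.normal(0.0, noise_std, size=x_all.shape)
        student = fit_centroids(x_noisy, y_all)
        # 4. The student becomes the teacher for the next round.
        teacher = student
    return teacher
```

Calling `noisy_student(x_lab, y_lab, x_unlab)` with a handful of labeled points and a larger unlabeled set returns the final student's centroids, which `predict` can then apply to new points. In an ASR setting the same four steps would apply with transcripts as pseudo-labels and augmentation as the noise source.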
Taxonomy
Topics: Educational Technology and Assessment
Methods: Instance Normalization · Batch Normalization · PatchGAN · Residual Connection · RandAugment · Sigmoid Activation · Cycle Consistency Loss · GAN Least Squares Loss
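For reference, the last two methods listed are commonly written as below. These are the standard textbook forms for a pair of mappings G: X → Y and F: Y → X and a discriminator D on Y, using 0/1 target coding for the least-squares loss; they are given as an assumption about the general technique, and the paper may use a different variant or weighting.

```latex
% Cycle consistency loss (standard CycleGAN form)
\mathcal{L}_{\mathrm{cyc}}(G, F) =
  \mathbb{E}_{x \sim p_X}\big[\lVert F(G(x)) - x \rVert_1\big]
  + \mathbb{E}_{y \sim p_Y}\big[\lVert G(F(y)) - y \rVert_1\big]

% Least-squares GAN losses (targets: 1 for real, 0 for generated)
\mathcal{L}_{D} =
  \tfrac{1}{2}\,\mathbb{E}_{y \sim p_Y}\big[(D(y) - 1)^2\big]
  + \tfrac{1}{2}\,\mathbb{E}_{x \sim p_X}\big[D(G(x))^2\big]

\mathcal{L}_{G} =
  \tfrac{1}{2}\,\mathbb{E}_{x \sim p_X}\big[(D(G(x)) - 1)^2\big]
```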
