The RWTH ASR System for TED-LIUM Release 2: Improving Hybrid HMM with SpecAugment
Wei Zhou, Wilfried Michel, Kazuki Irie, Markus Kitza, Ralf Schlüter, Hermann Ney

TL;DR
This paper describes a complete training pipeline for a hybrid HMM-based ASR system on TED-LIUM Release 2 and uses SpecAugment data augmentation to improve accuracy without increasing model size or training time.
Contribution
The study shows that SpecAugment can be applied effectively to hybrid HMM models, improving performance without additional model size or training time, and combines it with sMBR fine-tuning and with LSTM and Transformer language models.
Findings
Achieved 5.6% WER on the test set
Outperformed the previous state of the art by 27% relative
Validated effectiveness of SpecAugment for hybrid HMM ASR
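The "27% relative" figure above is a relative WER reduction. The sketch below shows the arithmetic, assuming the usual definition (baseline minus new, divided by baseline). The function names are illustrative, and the implied baseline is back-calculated from the two numbers quoted here, not a value reported on this page.

```python
def relative_wer_reduction(baseline_wer, new_wer):
    """Relative reduction: the fraction of the baseline WER that was removed."""
    return (baseline_wer - new_wer) / baseline_wer


def implied_baseline_wer(new_wer, relative_reduction):
    """Invert the formula above to recover the baseline WER it implies."""
    return new_wer / (1.0 - relative_reduction)


# Back-calculation from the quoted 5.6% WER and 27% relative improvement.
# This is an inferred value (roughly 7.7% WER), not a reported one.
implied = implied_baseline_wer(5.6, 0.27)
print(f"implied previous WER: {implied:.1f}%")
```

Under this definition, a 27% relative gain corresponds to an absolute gain of about two WER points.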
Abstract
We present a complete training pipeline to build a state-of-the-art hybrid HMM-based ASR system on the 2nd release of the TED-LIUM corpus. Data augmentation using SpecAugment is successfully applied to improve performance on top of our best SAT model using i-vectors. By investigating the effect of different maskings, we achieve improvements from SpecAugment on hybrid HMM models without increasing model size and training time. A subsequent sMBR training is applied to fine-tune the final acoustic model, and both LSTM and Transformer language models are trained and evaluated. Our best system achieves a 5.6% WER on the test set, which outperforms the previous state-of-the-art by 27% relative.
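The masking idea behind SpecAugment can be sketched in a few lines. This is a minimal illustration of time and frequency masking on a frames-by-bins feature matrix, not the authors' implementation: the function name, the mask counts and widths, and the choice of zero as the fill value are all assumptions, and the abstract does not state which settings the paper uses.

```python
import random


def spec_augment(features, max_time_masks=2, max_time_width=10,
                 max_freq_masks=2, max_freq_width=5, rng=None):
    """Return a masked copy of a T x F feature matrix (a list of frames).

    All default values are hypothetical, chosen only for illustration.
    """
    rng = rng or random.Random()
    num_frames = len(features)
    num_bins = len(features[0]) if num_frames else 0
    masked = [list(frame) for frame in features]  # leave the input untouched

    # Time masking: zero out a few runs of consecutive frames.
    for _ in range(rng.randint(0, max_time_masks)):
        width = rng.randint(0, min(max_time_width, num_frames))
        start = rng.randint(0, num_frames - width)
        for t in range(start, start + width):
            masked[t] = [0.0] * num_bins

    # Frequency masking: zero out a few bands of adjacent feature bins.
    for _ in range(rng.randint(0, max_freq_masks)):
        width = rng.randint(0, min(max_freq_width, num_bins))
        start = rng.randint(0, num_bins - width)
        for frame in masked:
            for f in range(start, start + width):
                frame[f] = 0.0

    return masked


# Usage: 20 frames of 8-dimensional dummy features.
dummy = [[1.0] * 8 for _ in range(20)]
augmented = spec_augment(dummy, rng=random.Random(0))
```

The output has the same shape as the input, so frame-level targets remain aligned with the masked features. Because the masking acts only on the network input, it adds no model parameters, which is consistent with the abstract's claim about model size.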
Taxonomy
Topics: Speech Recognition and Synthesis · Natural Language Processing Techniques · Topic Modeling
