The RWTH ASR System for TED-LIUM Release 2: Improving Hybrid HMM with SpecAugment

Wei Zhou; Wilfried Michel; Kazuki Irie; Markus Kitza; Ralf Schlüter; Hermann Ney

arXiv:2004.00960 · eess.AS · April 3, 2020 · 1 citation

Open Access

TL;DR

This paper describes a complete training pipeline for a high-performance hybrid HMM-based ASR system on TED-LIUM Release 2, using SpecAugment data augmentation to improve accuracy substantially without increasing model size or training time.

Contribution

The study shows that SpecAugment can be applied effectively to hybrid HMM acoustic models, improving performance without adding model size or training time, and combines it with sMBR fine-tuning and LSTM and Transformer language models.

Findings

01 Achieved a 5.6% WER on the test set.
02 Outperformed the previous state of the art by 27% relative.
03 Validated the effectiveness of SpecAugment for hybrid HMM ASR.
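
The headline numbers are word error rates (WER), and the 27% figure is a relative reduction. Below is a minimal Python sketch, assuming the standard WER definition (substitutions, deletions and insertions divided by the number of reference words, found by word-level Levenshtein alignment); the helper names are illustrative and do not come from the paper. Inverting the relative reduction from the two rounded figures gives an implied previous best of about 5.6 / 0.73 ≈ 7.7% WER; this is a back-of-the-envelope estimate, not a number quoted from the paper.

```python
from typing import List


def word_error_rate(reference: List[str], hypothesis: List[str]) -> float:
    """WER = (S + D + I) / N, computed via word-level Levenshtein distance."""
    n, m = len(reference), len(hypothesis)
    if n == 0:
        raise ValueError("reference must contain at least one word")
    # dist[i][j] = minimum edits turning reference[:i] into hypothesis[:j]
    dist = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        dist[i][0] = i  # i deletions
    for j in range(m + 1):
        dist[0][j] = j  # j insertions
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = 0 if reference[i - 1] == hypothesis[j - 1] else 1
            dist[i][j] = min(
                dist[i - 1][j] + 1,        # deletion
                dist[i][j - 1] + 1,        # insertion
                dist[i - 1][j - 1] + sub,  # substitution (or match if sub == 0)
            )
    return dist[n][m] / n


def relative_reduction(baseline_wer: float, new_wer: float) -> float:
    """Fraction of the baseline WER removed by the new system."""
    return (baseline_wer - new_wer) / baseline_wer


def implied_baseline(new_wer: float, rel: float) -> float:
    """Baseline WER implied by a new WER and a relative reduction."""
    return new_wer / (1.0 - rel)
```

For example, `word_error_rate("the cat sat".split(), "the cat sit down".split())` is 2/3 (one substitution, one insertion over three reference words), and `implied_baseline(5.6, 0.27)` is about 7.67.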

Abstract

We present a complete training pipeline to build a state-of-the-art hybrid HMM-based ASR system on the 2nd release of the TED-LIUM corpus. Data augmentation using SpecAugment is successfully applied to improve performance on top of our best SAT model using i-vectors. By investigating the effect of different maskings, we achieve improvements from SpecAugment on hybrid HMM models without increasing model size and training time. A subsequent sMBR training is applied to fine-tune the final acoustic model, and both LSTM and Transformer language models are trained and evaluated. Our best system achieves a 5.6% WER on the test set, which outperforms the previous state-of-the-art by 27% relative.
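
The masking part of SpecAugment can be sketched as follows. This is a minimal Python illustration of frequency masking and time masking on a (frames × feature dimensions) matrix. The mask counts and maximum widths are hypothetical defaults, not the settings studied in the paper; masked entries are set to 0.0 on the assumption that the features are mean-normalized; and the time-warping step of the original SpecAugment recipe is omitted.

```python
import random
from typing import List, Optional


def spec_augment(
    features: List[List[float]],
    num_freq_masks: int = 2,   # hypothetical default
    max_freq_width: int = 8,   # hypothetical default
    num_time_masks: int = 2,   # hypothetical default
    max_time_width: int = 20,  # hypothetical default
    rng: Optional[random.Random] = None,
) -> List[List[float]]:
    """Return a masked copy of a (frames x feature_dims) matrix."""
    if rng is None:
        rng = random.Random()
    out = [row[:] for row in features]  # copy so the input is left untouched
    num_frames = len(out)
    num_dims = len(out[0]) if num_frames else 0

    # Frequency masking: zero one band of consecutive feature dims in every frame.
    for _ in range(num_freq_masks):
        width = rng.randint(0, min(max_freq_width, num_dims))
        start = rng.randint(0, num_dims - width)
        for row in out:
            for d in range(start, start + width):
                row[d] = 0.0

    # Time masking: zero all feature dims in one run of consecutive frames.
    for _ in range(num_time_masks):
        width = rng.randint(0, min(max_time_width, num_frames))
        start = rng.randint(0, num_frames - width)
        for t in range(start, start + width):
            out[t] = [0.0] * num_dims
    return out
```

Because masking changes neither the number nor the order of frames, frame-level alignment targets used to train a hybrid HMM acoustic model still line up with the augmented features.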

Taxonomy

Topics: Speech Recognition and Synthesis · Natural Language Processing Techniques · Topic Modeling