On lattice-free boosted MMI training of HMM and CTC-based full-context ASR models
Xiaohui Zhang, Vimal Manohar, David Zhang, Frank Zhang, Yangyang Shi, Nayan Singhal, Julian Chan, Fuchun Peng, Yatharth Saraf, Mike Seltzer

TL;DR
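This paper introduces a unified framework for hybrid ASR training with LF-MMI across modeling units (wordpiece, mono-char, bi-char, chenone) and label topologies (HMM, CTC), and proposes three training schemes that differ in training performance, decoding efficiency, and decoding time-stamp accuracy.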
Contribution
It generalizes LF-MMI training to full-context models and multiple modeling units, and proposes three new training schemes: ch-CTC-bMMI, wp-CTC-bMMI, and wp-HMM-bMMI.
Findings
LF-MMI is effective for both limited and full-context models.
Proposed schemes improve training performance and decoding efficiency.
Bi-char HMM-MMI models outperform traditional GMM-HMMs as alignment models.
Abstract
Hybrid automatic speech recognition (ASR) models are typically sequentially trained with CTC or LF-MMI criteria. However, they have vastly different legacies and are usually implemented in different frameworks. In this paper, by decoupling the concepts of modeling units and label topologies and building proper numerator/denominator graphs accordingly, we establish a generalized framework for hybrid acoustic modeling (AM). In this framework, we show that LF-MMI is a powerful training criterion applicable to both limited-context and full-context models, for wordpiece/mono-char/bi-char/chenone units, with both HMM/CTC topologies. From this framework, we propose three novel training schemes: chenone(ch)/wordpiece(wp)-CTC-bMMI, and wordpiece(wp)-HMM-bMMI with different advantages in training performance, decoding efficiency and decoding time-stamp accuracy. The advantages of different…
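As a rough illustration of the boosted MMI (bMMI) criterion named in the abstract, the sketch below computes the objective for one utterance over a small, explicitly enumerated hypothesis list. This is a toy under stated assumptions, not the paper's implementation: lattice-free training sums over a denominator graph rather than a list, and the names `boosted_mmi`, `unit_accuracy`, and `boost`, along with the per-position accuracy measure, are hypothetical choices made here for illustration.

```python
import math


def logsumexp(values):
    """Numerically stable log(sum(exp(v) for v in values))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def unit_accuracy(hyp, ref):
    """Toy accuracy (assumption): number of positions where hyp matches ref."""
    return sum(1 for h, r in zip(hyp, ref) if h == r)


def boosted_mmi(ref, log_scores, boost=0.0):
    """Boosted MMI objective for a single utterance.

    ref        -- reference unit sequence (tuple of labels)
    log_scores -- dict: hypothesis tuple -> combined acoustic + LM log score;
                  must contain the reference itself
    boost      -- boosting factor b; b = 0 reduces to plain MMI
    """
    numerator = log_scores[ref]
    # Each denominator path is scaled by exp(-b * accuracy), so paths with
    # more errors keep relatively more weight in the denominator.
    denominator = logsumexp(
        [score - boost * unit_accuracy(hyp, ref)
         for hyp, score in log_scores.items()]
    )
    return numerator - denominator


if __name__ == "__main__":
    ref = ("a", "b")
    scores = {("a", "b"): 0.0, ("a", "c"): 0.0}
    print(boosted_mmi(ref, scores, boost=0.0))  # plain MMI: log posterior of ref
    print(boosted_mmi(ref, scores, boost=1.0))  # boosted variant
```

With `boost=0.0` the value is the log posterior of the reference among the listed hypotheses. A positive `boost` shifts denominator mass toward hypotheses with more errors, which is the margin-like effect usually attributed to boosting.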
Taxonomy
Topics: Speech Recognition and Synthesis · Natural Language Processing Techniques · Music and Audio Processing
