Approximations to the MMI criterion and their effect on lattice-based MMI
Steven Wegmann

TL;DR
This paper analyzes lattice-based MMI in speech recognition, showing that its poor convergence behavior stems from the approximations it makes to the MMI criterion rather than from overfitting, and proposes modifications that improve its stability without losing accuracy.
Contribution
It provides a detailed analysis of lattice-based MMI's convergence issues and introduces methodological modifications that improve its stability while preserving recognition accuracy.
Findings
Lattice-based MMI does not truly converge asymptotically.
Overfitting is not the cause of performance degradation.
Modified methodology improves convergence without losing accuracy.
Abstract
Maximum mutual information (MMI) is a model selection criterion used for hidden Markov model (HMM) parameter estimation that was developed more than twenty years ago as a discriminative alternative to the maximum likelihood criterion for HMM-based speech recognition. It has been shown in the speech recognition literature that parameter estimation using the current MMI paradigm, lattice-based MMI, consistently outperforms maximum likelihood estimation, but this is at the expense of undesirable convergence properties. In particular, recognition performance is sensitive to the number of times that the iterative MMI estimation algorithm, extended Baum-Welch, is performed. In fact, too many iterations of extended Baum-Welch will lead to degraded performance, despite the fact that the MMI criterion improves at each iteration. This phenomenon is at variance with the analogous behavior of…
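The abstract names extended Baum-Welch (EBW) as the iterative MMI estimation algorithm. As a rough illustration only, here is a minimal sketch of the textbook EBW re-estimation of one diagonal-covariance Gaussian from numerator (reference transcription) and denominator (competing hypotheses, e.g. from a lattice) statistics, following the standard form in the discriminative-training literature. It is not taken from this paper, and the function name `ebw_update`, its argument layout, and the use of a single fixed smoothing constant `D` are assumptions made for the example.

```python
import numpy as np


def ebw_update(gamma_num, x_num, x2_num, gamma_den, x_den, x2_den, mu, var, D):
    """One extended Baum-Welch update of a diagonal Gaussian (illustrative sketch).

    gamma_num, gamma_den: scalar occupancy counts for this Gaussian.
    x_num, x_den: first-order statistics, sum over frames of gamma_t * o_t.
    x2_num, x2_den: second-order statistics, sum over frames of gamma_t * o_t**2.
    mu, var: current mean and variance vectors.
    D: smoothing constant (assumed fixed here); larger D means smaller steps.
    """
    x_num, x_den = np.asarray(x_num, float), np.asarray(x_den, float)
    x2_num, x2_den = np.asarray(x2_num, float), np.asarray(x2_den, float)
    mu, var = np.asarray(mu, float), np.asarray(var, float)

    # Shared denominator: net occupancy plus the smoothing constant.
    denom = gamma_num - gamma_den + D
    if denom <= 0:
        raise ValueError("D too small: denominator must be positive")

    # Mean: net first-order stats, pulled toward the old mean by D.
    new_mu = (x_num - x_den + D * mu) / denom
    # Variance: net second-order stats plus D times the old second moment,
    # minus the square of the new mean.
    new_var = (x2_num - x2_den + D * (var + mu ** 2)) / denom - new_mu ** 2
    if np.any(new_var <= 0):
        raise ValueError("D too small: variance must stay positive")
    return new_mu, new_var
```

With zero denominator statistics and D = 0 the update reduces to the maximum likelihood estimate, while a larger D keeps the new parameters closer to the old ones; in practice D is commonly chosen per Gaussian, large enough to keep every variance positive.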
Taxonomy
Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Bayesian Methods and Mixture Models
