Comparative Analysis of Viterbi Training and Maximum Likelihood Estimation for HMMs
Armen E. Allahverdyan, Aram Galstyan

TL;DR
This paper compares Viterbi Training (VT) with Maximum Likelihood (ML) estimation for Hidden Markov Models. On an exactly solvable model, VT converges faster and produces sparser, simpler models; in more general scenarios it can perform worse than ML.
Contribution
It provides an asymptotic analysis contrasting VT and ML, built on a generating function formalism and illustrated on an exactly solvable HMM with one unambiguous symbol, where VT shows advantages in convergence speed and model simplicity.
Findings
VT converges faster than ML in the studied model.
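The VT objective has only finite degeneracy, whereas the ML objective is continuously degenerate for the studied model.
VT yields sparser (simpler) models.
In more general scenarios VT can be worse than ML, but it can still correctly recover most of the parameters.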
Abstract
We present an asymptotic analysis of Viterbi Training (VT) and contrast it with a more conventional Maximum Likelihood (ML) approach to parameter estimation in Hidden Markov Models. While the ML estimator works by (locally) maximizing the likelihood of the observed data, VT seeks to maximize the probability of the most likely hidden state sequence. We develop an analytical framework based on a generating function formalism and illustrate it on an exactly solvable model of HMM with one unambiguous symbol. For this particular model the ML objective function is continuously degenerate. The VT objective, in contrast, is shown to have only finite degeneracy. Furthermore, VT converges faster and results in sparser (simpler) models, thus realizing an automatic Occam's razor for HMM learning. For a more general scenario, VT can be worse compared to ML but is still capable of correctly recovering most of the…
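The two objectives contrasted in the abstract can be made concrete with a small sketch. The code below is not taken from the paper: the two-state HMM, its parameter values, and the names `pi`, `A`, `B`, `obs`, `log_likelihood`, and `viterbi_log_prob` are all illustrative assumptions. It evaluates the ML objective (the log-probability of the observations, summed over all hidden paths) and the VT objective (the log-probability of the single most likely hidden path together with the observations) for fixed parameters. It does not perform the parameter updates of either training procedure.

```python
import math

# Hypothetical toy HMM (values chosen for illustration only).
pi = [0.5, 0.5]                      # initial state probabilities
A = [[0.9, 0.1], [0.2, 0.8]]         # A[r][s] = P(next state s | state r)
B = [[0.7, 0.3], [0.1, 0.9]]         # B[s][x] = P(symbol x | state s)
obs = [0, 1, 1, 0]                   # observed symbol sequence

def log_likelihood(obs, pi, A, B):
    """ML objective: log P(obs), summing over all hidden paths (forward recursion)."""
    n = len(pi)
    alpha = [pi[s] * B[s][obs[0]] for s in range(n)]
    for x in obs[1:]:
        alpha = [sum(alpha[r] * A[r][s] for r in range(n)) * B[s][x]
                 for s in range(n)]
    return math.log(sum(alpha))

def viterbi_log_prob(obs, pi, A, B):
    """VT objective: log max over hidden paths of P(path, obs) (Viterbi recursion)."""
    n = len(pi)
    delta = [pi[s] * B[s][obs[0]] for s in range(n)]
    for x in obs[1:]:
        delta = [max(delta[r] * A[r][s] for r in range(n)) * B[s][x]
                 for s in range(n)]
    return math.log(max(delta))

ml = log_likelihood(obs, pi, A, B)
vt = viterbi_log_prob(obs, pi, A, B)
print("ML objective (log-likelihood):", ml)
print("VT objective (best-path log-probability):", vt)
```

Because the best path is one term of the sum over all paths, the VT objective never exceeds the ML objective for the same parameters. The recursions here work with raw probabilities for brevity, so they would underflow on long sequences; a log-space or scaled variant would be needed in practice.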
Taxonomy
Topics: Machine Learning and Algorithms · Target Tracking and Data Fusion in Sensor Networks · Neural Networks and Applications
