TL;DR
This paper introduces methods for scaling hidden Markov models (HMMs) to large language modeling datasets, yielding models that are more accurate than previous HMM and n-gram-based methods while keeping exact inference efficient.
Contribution
It presents techniques for scaling HMMs to massive state spaces while maintaining efficient exact inference, a compact parameterization, and effective regularization, narrowing the gap with neural models.
Findings
Models are more accurate than previous HMM and n-gram-based methods
Models make progress towards the performance of state-of-the-art neural language models
Exact inference remains efficient with massive state spaces
Abstract
The hidden Markov model (HMM) is a fundamental tool for sequence modeling that cleanly separates the hidden state from the emission structure. However, this separation makes it difficult to fit HMMs to large datasets in modern NLP, and they have fallen out of use due to very poor performance compared to fully observed models. This work revisits the challenge of scaling HMMs to language modeling datasets, taking ideas from recent approaches to neural modeling. We propose methods for scaling HMMs to massive state spaces while maintaining efficient exact inference, a compact parameterization, and effective regularization. Experiments show that this approach leads to models that are more accurate than previous HMM and n-gram-based methods, making progress towards the performance of state-of-the-art neural models.
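For readers less familiar with the term, "exact inference" in an HMM usually means computing the sequence likelihood by summing over all hidden state paths, which the forward algorithm does in time quadratic in the number of states. The sketch below is a textbook log-space version in plain Python, not the paper's scaled implementation. The function names and the toy probabilities are illustrative assumptions, and the brute-force routine is included only to check the result on tiny inputs.

```python
import math
from itertools import product


def logsumexp(values):
    """Numerically stable log(sum(exp(v) for v in values))."""
    m = max(values)
    if m == float("-inf"):
        return m
    return m + math.log(sum(math.exp(v - m) for v in values))


def forward_log_likelihood(log_init, log_trans, log_emit, obs):
    """Exact log p(obs) under an HMM via the forward algorithm.

    log_init[i]     : log p(z_1 = i)
    log_trans[i][j] : log p(z_t = j | z_{t-1} = i)
    log_emit[i][x]  : log p(x_t = x | z_t = i)
    obs             : non-empty list of observed token ids
    Cost is O(T * K^2) for T tokens and K hidden states.
    """
    num_states = len(log_init)
    # alpha[j] holds log p(x_1..x_t, z_t = j) for the current position t.
    alpha = [log_init[i] + log_emit[i][obs[0]] for i in range(num_states)]
    for x in obs[1:]:
        alpha = [
            logsumexp([alpha[i] + log_trans[i][j] for i in range(num_states)])
            + log_emit[j][x]
            for j in range(num_states)
        ]
    return logsumexp(alpha)


def brute_force_log_likelihood(log_init, log_trans, log_emit, obs):
    """Same quantity by enumerating all K^T state paths (checking only)."""
    num_states = len(log_init)
    scores = []
    for path in product(range(num_states), repeat=len(obs)):
        s = log_init[path[0]] + log_emit[path[0]][obs[0]]
        for t in range(1, len(obs)):
            s += log_trans[path[t - 1]][path[t]] + log_emit[path[t]][obs[t]]
        scores.append(s)
    return logsumexp(scores)


def to_log(rows):
    """Convert a table of probabilities to log-probabilities."""
    return [[math.log(p) for p in row] for row in rows]


# Toy 2-state, 3-word model with made-up probabilities (assumption).
log_init = [math.log(0.6), math.log(0.4)]
log_trans = to_log([[0.7, 0.3], [0.2, 0.8]])
log_emit = to_log([[0.5, 0.4, 0.1], [0.1, 0.3, 0.6]])
obs = [0, 2, 1, 1]

fast = forward_log_likelihood(log_init, log_trans, log_emit, obs)
slow = brute_force_log_likelihood(log_init, log_trans, log_emit, obs)
```

The quadratic dependence on the number of states in the inner loop is the cost that becomes limiting as the state space grows, which is why the abstract stresses keeping exact inference efficient at scale.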
