Scaling Hidden Markov Language Models

Justin T. Chiu; Alexander M. Rush

arXiv:2011.04640·cs.CL·November 10, 2020

TL;DR

This paper introduces methods for scaling hidden Markov models (HMMs) to large language modeling datasets, yielding models that are more accurate than previous HMM and n-gram-based methods while keeping exact inference efficient.

Contribution

The paper presents techniques for scaling HMMs to massive state spaces while maintaining efficient exact inference, a compact parameterization, and effective regularization, narrowing the gap with neural language models.

Findings

01. The models outperform previous HMM and n-gram-based methods.

02. Performance moves closer to that of state-of-the-art neural language models.

03. Exact inference remains efficient with massive state spaces.

Abstract

The hidden Markov model (HMM) is a fundamental tool for sequence modeling that cleanly separates the hidden state from the emission structure. However, this separation makes it difficult to fit HMMs to large datasets in modern NLP, and they have fallen out of use due to very poor performance compared to fully observed models. This work revisits the challenge of scaling HMMs to language modeling datasets, taking ideas from recent approaches to neural modeling. We propose methods for scaling HMMs to massive state spaces while maintaining efficient exact inference, a compact parameterization, and effective regularization. Experiments show that this approach leads to models that are more accurate than previous HMM and n-gram-based methods, making progress towards the performance of state-of-the-art neural models.
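
For readers unfamiliar with what "exact inference" means for an HMM, the following is a minimal sketch of the textbook forward algorithm in log space, which computes the exact log-likelihood of a token sequence by summing over all hidden state paths. It is a generic illustration, not the paper's scaled method: the function names and the two-state toy parameters are assumptions made up for this example. The cost is O(T K^2) for T tokens and K hidden states, which is why very large state spaces call for the kind of efficiency the abstract refers to.

```python
import math


def logsumexp(values):
    """Numerically stable log(sum(exp(v) for v in values))."""
    m = max(values)
    if m == float("-inf"):
        return m
    return m + math.log(sum(math.exp(v - m) for v in values))


def forward_log_likelihood(log_start, log_trans, log_emit, tokens):
    """Exact log p(tokens) under an HMM via the forward algorithm.

    log_start[z]     : log p(z_1 = z)
    log_trans[z][z2] : log p(z_t = z2 | z_{t-1} = z)
    log_emit[z][x]   : log p(x_t = x | z_t = z)
    """
    num_states = len(log_start)
    # alpha[z] = log p(x_1..x_t, z_t = z), initialised at t = 1.
    alpha = [log_start[z] + log_emit[z][tokens[0]] for z in range(num_states)]
    for x in tokens[1:]:
        # Marginalise over the previous state, then emit the current token.
        alpha = [
            logsumexp([alpha[z] + log_trans[z][z2] for z in range(num_states)])
            + log_emit[z2][x]
            for z2 in range(num_states)
        ]
    # Sum over the final hidden state to get the sequence likelihood.
    return logsumexp(alpha)


# Hypothetical toy model: 2 hidden states, vocabulary of 2 token ids.
log = math.log
log_start = [log(0.6), log(0.4)]
log_trans = [[log(0.7), log(0.3)], [log(0.2), log(0.8)]]
log_emit = [[log(0.9), log(0.1)], [log(0.3), log(0.7)]]

print(forward_log_likelihood(log_start, log_trans, log_emit, [0, 1, 1]))
```

Because the recursion sums over every state path exactly, the probabilities of all sequences of a fixed length add up to 1, which is a convenient sanity check when experimenting with a sketch like this.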

Code & Models

Repositories

harvardnlp/hmm-lm
PyTorch · Official
