Aggregate and mixed-order Markov models for statistical language processing
Lawrence Saul, Fernando Pereira (AT&T Labs -- Research)

TL;DR
This paper studies language models whose size and accuracy fall between those of n-gram models of different order: aggregate and mixed-order Markov models, both trained with EM algorithms. Interposing them between n-grams of different order during smoothing significantly reduces the perplexity of unseen word combinations.
Contribution
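It introduces and evaluates two model types that bridge n-gram models of different order: aggregate Markov models (class-based bigram models with a probabilistic word-to-class mapping) and mixed-order Markov models (combinations of bigram models whose predictions are conditioned on different words).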
Findings
Significant reduction in perplexity for unseen word combinations.
EM training of both model types for maximum likelihood estimation.
Size and accuracy intermediate between n-gram models of different order.
Abstract
We consider the use of language models whose size and accuracy are intermediate between different order n-gram models. Two types of models are studied in particular. Aggregate Markov models are class-based bigram models in which the mapping from words to classes is probabilistic. Mixed-order Markov models combine bigram models whose predictions are conditioned on different words. Both types of models are trained by Expectation-Maximization (EM) algorithms for maximum likelihood estimation. We examine smoothing procedures in which these models are interposed between different order n-grams. This is found to significantly reduce the perplexity of unseen word combinations.
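The aggregate Markov model described in the abstract can be illustrated with a small sketch. It assumes the usual class-based factorization P(w2 | w1) = sum over c of P(w2 | c) P(c | w1), with both tables re-estimated by EM from bigram counts. The function names, the toy count matrix, and the `tiny` constant (a numerical guard against division by zero) are illustrative assumptions, not details taken from the paper.

```python
import numpy as np


def train_aggregate_markov(counts, n_classes, n_iters=50, seed=0, tiny=1e-12):
    """EM sketch for P(w2 | w1) = sum_c P(w2 | c) * P(c | w1).

    counts[w1, w2] holds bigram counts. `tiny` is a hypothetical numerical
    guard added for this sketch; it is not part of the model itself.
    """
    rng = np.random.default_rng(seed)
    vocab = counts.shape[0]

    # Random initialization; every row is a probability distribution.
    p_c_given_w = rng.random((vocab, n_classes)) + tiny
    p_c_given_w /= p_c_given_w.sum(axis=1, keepdims=True)
    p_w_given_c = rng.random((n_classes, vocab)) + tiny
    p_w_given_c /= p_w_given_c.sum(axis=1, keepdims=True)

    for _ in range(n_iters):
        # E-step: posterior over classes for each word pair, shape (w1, c, w2).
        joint = p_c_given_w[:, :, None] * p_w_given_c[None, :, :]
        posterior = joint / joint.sum(axis=1, keepdims=True)

        # Expected class-tagged bigram counts.
        expected = counts[:, None, :] * posterior

        # M-step: renormalize the expected counts into the two tables.
        c_w1 = expected.sum(axis=2) + tiny  # shape (w1, c)
        p_c_given_w = c_w1 / c_w1.sum(axis=1, keepdims=True)
        c_w2 = expected.sum(axis=0) + tiny  # shape (c, w2)
        p_w_given_c = c_w2 / c_w2.sum(axis=1, keepdims=True)

    return p_c_given_w, p_w_given_c


def perplexity(counts, p_c_given_w, p_w_given_c):
    """Per-bigram perplexity of the aggregate model on a count matrix."""
    p_bigram = p_c_given_w @ p_w_given_c  # P(w2 | w1), shape (w1, w2)
    log_likelihood = (counts * np.log(p_bigram)).sum()
    return float(np.exp(-log_likelihood / counts.sum()))


# Toy data (hypothetical): words 0-1 are followed by words 2-3 and vice versa.
toy_counts = np.array(
    [[0, 0, 5, 5],
     [0, 0, 5, 5],
     [5, 5, 0, 0],
     [5, 5, 0, 0]],
    dtype=float,
)
p_cw, p_wc = train_aggregate_markov(toy_counts, n_classes=2)
print("training perplexity:", perplexity(toy_counts, p_cw, p_wc))
```

With C classes and a vocabulary of V words, the two tables hold on the order of 2CV parameters, so the model sits between a unigram model (about V parameters) and a full bigram model (about V squared), which is the sense in which its size is intermediate.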
Taxonomy
Topics: Algorithms and Data Compression · Bayesian Methods and Mixture Models · Speech Recognition and Synthesis
