Aggregate and mixed-order Markov models for statistical language processing

Lawrence Saul; Fernando Pereira (AT&T Labs -- Research)

arXiv:cmp-lg/9706007 · cmp-lg · February 3, 2008 · 139 cites

Open Access

TL;DR

This paper studies aggregate and mixed-order Markov models, language models whose size and accuracy are intermediate between different order n-gram models. Both are trained with EM algorithms, and interposing them between n-gram orders during smoothing significantly reduces the perplexity of unseen word combinations.

Contribution

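It introduces and evaluates two model types: aggregate Markov models, which are class-based bigram models with a probabilistic mapping from words to classes, and mixed-order Markov models, which combine bigram models whose predictions are conditioned on different words.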

Findings

01

Interposing these models between different order n-grams during smoothing significantly reduces the perplexity of unseen word combinations.

02

Both model types are trained by EM algorithms for maximum likelihood estimation.

03

Model size and accuracy fall between those of different order n-gram models.

Abstract

We consider the use of language models whose size and accuracy are intermediate between different order n-gram models. Two types of models are studied in particular. Aggregate Markov models are class-based bigram models in which the mapping from words to classes is probabilistic. Mixed-order Markov models combine bigram models whose predictions are conditioned on different words. Both types of models are trained by Expectation-Maximization (EM) algorithms for maximum likelihood estimation. We examine smoothing procedures in which these models are interposed between different order n-grams. This is found to significantly reduce the perplexity of unseen word combinations.
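As an illustration of the aggregate model described in the abstract, here is a minimal sketch in Python. It assumes one reading of "class-based bigram with a probabilistic word-to-class mapping", namely P(w2 | w1) = sum over c of P(w2 | c) P(c | w1), and fits it to a bigram count matrix with a standard EM loop. The function names `fit_aggregate_markov` and `log_likelihood`, the random initialisation, and the dense-array layout are assumptions made for this example, not details taken from the paper.

```python
import numpy as np


def fit_aggregate_markov(counts, n_classes, n_iters=50, seed=0):
    """EM sketch for P(w2 | w1) = sum_c P(w2 | c) * P(c | w1).

    counts[i, j] is the number of times word j follows word i.
    Returns (a, b) with a[i, c] = P(c | w1=i) and b[c, j] = P(w2=j | c).
    """
    counts = np.asarray(counts, dtype=float)
    vocab = counts.shape[0]
    rng = np.random.default_rng(seed)
    eps = 1e-12

    # Random row-stochastic initialisation (an assumption of this sketch).
    a = rng.random((vocab, n_classes))
    a /= a.sum(axis=1, keepdims=True)
    b = rng.random((n_classes, vocab))
    b /= b.sum(axis=1, keepdims=True)

    for _ in range(n_iters):
        # E-step: posterior over classes for every (w1, w2) pair,
        # stored with shape (vocab, n_classes, vocab).
        joint = a[:, :, None] * b[None, :, :]
        post = joint / np.maximum(joint.sum(axis=1, keepdims=True), eps)

        # Expected number of times each pair was generated via each class.
        expected = counts[:, None, :] * post

        # M-step: renormalise the expected counts.
        a_new = expected.sum(axis=2)
        row_total = a_new.sum(axis=1, keepdims=True)
        # Words never seen as w1 keep their previous distribution.
        a = np.where(row_total > 0, a_new / np.maximum(row_total, eps), a)

        b_new = expected.sum(axis=0)
        b = b_new / np.maximum(b_new.sum(axis=1, keepdims=True), eps)

    return a, b


def log_likelihood(counts, a, b):
    """Log-likelihood of the bigram counts under the fitted model."""
    probs = a @ b  # probs[i, j] = P(w2=j | w1=i)
    return float((np.asarray(counts) * np.log(np.maximum(probs, 1e-300))).sum())
```

With C classes and a vocabulary of V words, this parameterisation stores roughly 2·C·V probabilities rather than the V² of a full bigram table, which is one way to see why such a model sits between a unigram and a bigram model in size. The dense (V, C, V) posterior array is only practical for toy vocabularies; a realistic implementation would iterate over observed bigrams instead.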

Taxonomy

Topics: Algorithms and Data Compression · Bayesian Methods and Mixture Models · Speech Recognition and Synthesis