Efficient MDI Adaptation for n-gram Language Models
Ruizhe Huang, Ke Li, Ashish Arora, Dan Povey, Sanjeev Khudanpur

TL;DR
This paper introduces a linear-time algorithm for efficient MDI-based n-gram language model adaptation, enabling practical application to large datasets with improved word error rates over simple interpolation.
Contribution
It presents a novel, scalable algorithm that reduces the cost of each MDI adaptation iteration to linear time in the size of the inputs by exploiting the backoff structure of n-gram models and hierarchical training, making MDI adaptation feasible for large-scale language models.
Findings
The algorithm achieves linear-time complexity per iteration.
MDI adaptation yields a better word error rate (WER) than simple linear interpolation.
Scalability confirmed on very large datasets.
Abstract
This paper presents an efficient algorithm for n-gram language model adaptation under the minimum discrimination information (MDI) principle, where an out-of-domain language model is adapted to satisfy marginal probability constraints estimated from in-domain data. The main challenge in MDI language model adaptation is its computational complexity. By taking advantage of the backoff structure of n-gram models and the hierarchical training method originally proposed for maximum entropy (ME) language models, we show that each iteration of MDI adaptation can be computed in time linear in the size of the inputs. The complexity remains the same as for ME models, although MDI is more general than ME. This makes MDI adaptation practical for large corpora and vocabularies. Experimental results confirm the scalability of our algorithm on very large datasets, while MDI adaptation gets slightly worse…
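To make the MDI principle in the abstract concrete, the following is a minimal sketch on a toy bigram table. It is not the paper's linear-time algorithm and uses no backoff structure or hierarchical training. It assumes a single set of unigram marginal constraints on a joint distribution over (history, word) pairs, in which case the distribution closest in KL divergence to the out-of-domain model has a closed form: each joint probability is rescaled by the ratio of the in-domain to the out-of-domain unigram probability. All names (`mdi_adapt_joint`, `q_joint`, `target_unigram`) and the toy numbers are hypothetical.

```python
from collections import defaultdict


def mdi_adapt_joint(q_joint, target_unigram):
    """Adapt a joint distribution q(h, w) so its word marginal equals target_unigram.

    Assumption (illustrative only): both distributions share the same vocabulary,
    and every word has nonzero probability under q.
    Minimizing KL(p || q) subject to sum_h p(h, w) = target(w) gives
    p(h, w) = q(h, w) * target(w) / q(w).
    """
    # Word marginal of the out-of-domain model.
    q_word = defaultdict(float)
    for (_, w), p in q_joint.items():
        q_word[w] += p

    # Rescale each entry by the in-domain / out-of-domain unigram ratio.
    return {
        (h, w): p * target_unigram[w] / q_word[w]
        for (h, w), p in q_joint.items()
    }


def word_marginal(joint):
    """Sum a joint distribution over histories to get the word marginal."""
    marginal = defaultdict(float)
    for (_, w), p in joint.items():
        marginal[w] += p
    return dict(marginal)


# Toy out-of-domain joint distribution over (history, word) pairs.
q_joint = {
    ("the", "cat"): 0.3,
    ("the", "dog"): 0.2,
    ("a", "cat"): 0.1,
    ("a", "dog"): 0.4,
}

# Toy in-domain unigram marginals that act as the constraints.
target_unigram = {"cat": 0.7, "dog": 0.3}

adapted = mdi_adapt_joint(q_joint, target_unigram)
adapted_marginal = word_marginal(adapted)
```

In this toy case the adapted table still sums to one and its word marginal matches `target_unigram`, while the distribution of histories given each word is inherited unchanged from the out-of-domain model. With several overlapping constraint sets (for example unigram, bigram, and trigram marginals together), no such closed form exists in general and an iterative procedure is needed, which is where the per-iteration cost discussed in the abstract becomes the bottleneck.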
Taxonomy
Topics: Natural Language Processing Techniques · Speech Recognition and Synthesis · Topic Modeling
