Unigram-Normalized Perplexity as a Language Model Performance Measure with Different Vocabulary Sizes
Jihyeon Roh, Sang-Hoon Oh, Soo-Young Lee

TL;DR
This paper introduces a new unigram-normalized Perplexity metric that enables fair comparison of language models across different vocabulary sizes, addressing limitations of traditional Perplexity.
Contribution
It proposes a novel metric for evaluating language models that remains consistent across varying vocabulary sizes, supported by theoretical and experimental validation.
Findings
The new metric is robust to vocabulary size changes.
It effectively measures performance improvements over unigram models.
Theoretical analysis confirms the metric's validity.
Abstract
Although Perplexity is a widely used performance metric for language models, its values depend heavily on the number of words in the corpus, so it is useful only for comparing performance on the same corpus. In this paper, we propose a new metric that can be used to evaluate language model performance with different vocabulary sizes. The proposed unigram-normalized Perplexity expresses the performance improvement of a language model over a simple unigram model and is robust to vocabulary size. Both theoretical analysis and computational experiments are reported.
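The abstract does not spell out the formula, so the following is only a minimal sketch of one natural reading of "unigram-normalized": each token's model probability is divided by that token's unigram probability before the usual geometric averaging. The function names and the toy probabilities are illustrative assumptions, not taken from the paper.

```python
import math


def perplexity(token_probs):
    """Standard perplexity: exp of the mean negative log-probability."""
    n = len(token_probs)
    return math.exp(-sum(math.log(p) for p in token_probs) / n)


def unigram_normalized_perplexity(model_probs, unigram_probs):
    """Assumed form: perplexity computed on the ratio p_model / p_unigram.

    model_probs[i]   -- probability the language model gave to token i in context
    unigram_probs[i] -- unigram (frequency-based) probability of the same token
    """
    if len(model_probs) != len(unigram_probs):
        raise ValueError("both sequences must cover the same tokens")
    n = len(model_probs)
    log_ratio = sum(math.log(p / u) for p, u in zip(model_probs, unigram_probs))
    return math.exp(-log_ratio / n)


# Toy values (hypothetical): three tokens scored by a model and by a unigram baseline.
model_probs = [0.2, 0.5, 0.1]
unigram_probs = [0.05, 0.1, 0.02]

ppl_model = perplexity(model_probs)
ppl_unigram = perplexity(unigram_probs)
ppl_u = unigram_normalized_perplexity(model_probs, unigram_probs)
```

Under this reading, the normalized value equals the model's perplexity divided by the unigram model's perplexity on the same tokens. A value of 1 means no improvement over the unigram baseline, and values below 1 mean the model does better than the baseline.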
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Speech Recognition and Synthesis
