Unigram-Normalized Perplexity as a Language Model Performance Measure with Different Vocabulary Sizes

Jihyeon Roh; Sang-Hoon Oh; Soo-Young Lee

arXiv:2011.13220 · cs.CL · November 30, 2020

TL;DR

This paper introduces a new unigram-normalized Perplexity metric that enables fair comparison of language models across different vocabulary sizes, addressing limitations of traditional Perplexity.

Contribution

It proposes a novel metric for evaluating language models that remains consistent across varying vocabulary sizes, supported by theoretical and experimental validation.

Findings

1. The new metric is robust to changes in vocabulary size.
2. It measures a language model's performance improvement over a unigram model.
3. Theoretical analysis supports the metric's validity.

Abstract

Although Perplexity is a widely used performance metric for language models, its values depend heavily on the number of words in the corpus, so it is useful only for comparing performance on the same corpus. In this paper, we propose a new metric that can be used to evaluate language model performance across different vocabulary sizes. The proposed unigram-normalized Perplexity represents the performance improvement of a language model over a simple unigram model and is robust to vocabulary size. Both theoretical analysis and computational experiments are reported.
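The page does not give the formula, so the following is a minimal sketch under an assumption: unigram-normalized Perplexity is taken to be the model's Perplexity divided by a unigram model's Perplexity on the same test tokens, which equals exp of the negative mean log-ratio log P(w|h) - log P_u(w). The helper names, the add-alpha smoothing, and the toy probabilities are illustrative choices, not the authors' code. It also shows why raw Perplexity tracks vocabulary size: a uniform model over V words has Perplexity exactly V.

```python
import math
from collections import Counter


def perplexity(log_probs):
    """Perplexity = exp of the negative mean natural-log probability per token."""
    return math.exp(-sum(log_probs) / len(log_probs))


def unigram_log_probs(train_tokens, test_tokens, alpha=1.0):
    """Hypothetical helper: add-alpha smoothed unigram log P_u(w) for each test token."""
    counts = Counter(train_tokens)
    vocab = set(train_tokens) | set(test_tokens)
    total = len(train_tokens) + alpha * len(vocab)
    return [math.log((counts[w] + alpha) / total) for w in test_tokens]


def unigram_normalized_perplexity(model_log_probs, unigram_lps):
    """Assumed form: exp(-(1/N) * sum(log P(w|h) - log P_u(w))) = PPL_model / PPL_unigram.

    Values below 1 mean the model assigns higher probability than the unigram model.
    """
    if len(model_log_probs) != len(unigram_lps):
        raise ValueError("both sequences must score the same test tokens")
    n = len(model_log_probs)
    diff = sum(m - u for m, u in zip(model_log_probs, unigram_lps))
    return math.exp(-diff / n)


if __name__ == "__main__":
    train = "the cat sat on the mat".split()
    test = "the cat sat".split()
    u = unigram_log_probs(train, test)
    # Hypothetical per-token probabilities from some contextual language model.
    m = [math.log(p) for p in (0.5, 0.4, 0.6)]
    print("PPL model:", perplexity(m))
    print("PPL unigram:", perplexity(u))
    print("unigram-normalized PPL:", unigram_normalized_perplexity(m, u))
    # A uniform model over V words has Perplexity V, so raw values grow with vocabulary size.
    print("uniform over 4 words:", perplexity([math.log(0.25)] * 4))
```

Under this assumed form, a model that is exactly the unigram model scores 1, and the score is a ratio of two Perplexities computed on the same tokenization, which is the intuition behind comparing models with different vocabularies.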


Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Speech Recognition and Synthesis