Perplexity Cannot Always Tell Right from Wrong
Petar Veličković, Federico Barbero, Christos Perivolaropoulos, Simon Osindero, Razvan Pascanu

TL;DR
This paper shows that perplexity, a common metric for evaluating language models, can mislead model selection: for Transformer models, a lower perplexity does not reliably indicate higher accuracy.
Contribution
The paper provides a rigorous theoretical analysis showing the limitations of perplexity as a model selection metric for Transformer-based language models.
Findings
Perplexity may not indicate the most accurate model.
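If a compact decoder-only Transformer predicts any sequence accurately and confidently, there must exist another sequence with very low perplexity that the same model predicts incorrectly.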
For perplexity to select a more confident model, the gain in confidence must be matched by a corresponding gain in accuracy.
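As a rough illustration of the first finding, the sketch below compares two made-up models using the standard definition of perplexity (the exponential of the mean negative log-probability assigned to the true tokens). The toy distributions, the three-token vocabulary, and the helper name `evaluate` are assumptions chosen for this example; they are not taken from the paper and do not reproduce its proof.

```python
import math

def evaluate(dists, targets):
    """Return (accuracy, perplexity) for per-position distributions and true token ids."""
    # Accuracy: fraction of positions where the argmax token equals the true token.
    hits = sum(
        max(range(len(d)), key=d.__getitem__) == t
        for d, t in zip(dists, targets)
    )
    acc = hits / len(targets)
    # Perplexity: exp of the mean negative log-probability of the true tokens.
    nll = -sum(math.log(d[t]) for d, t in zip(dists, targets)) / len(targets)
    return acc, math.exp(nll)

targets = [0] * 10  # hypothetical ground truth: token 0 at every position

# Hypothetical model A: always right, but never very confident.
model_a = [[0.4, 0.3, 0.3] for _ in range(10)]

# Hypothetical model B: very confident, right nine times and confidently wrong once.
model_b = [[0.99, 0.005, 0.005] for _ in range(9)] + [[0.005, 0.99, 0.005]]

acc_a, ppl_a = evaluate(model_a, targets)
acc_b, ppl_b = evaluate(model_b, targets)

print(f"A: accuracy={acc_a:.2f}, perplexity={ppl_a:.3f}")
print(f"B: accuracy={acc_b:.2f}, perplexity={ppl_b:.3f}")
```

In this toy setting model A is correct at every position yet has the higher perplexity (about 2.5 versus about 1.71 for model B), so selecting by perplexity alone would pick the less accurate model.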
Abstract
Perplexity -- a function measuring a model's overall level of "surprise" when encountering a particular output -- has gained significant traction in recent years, both as a loss function and as a simple-to-compute metric of model quality. Prior studies have pointed out several limitations of perplexity, often empirically. Here we leverage recent results on Transformer continuity to show in a rigorous manner how perplexity may be an unsuitable metric for model selection. Specifically, we prove that the existence of any sequence that a compact decoder-only Transformer model predicts accurately and confidently -- a necessary pre-requisite for strong generalisation -- implies the existence of another sequence with very low perplexity that the same model does not predict correctly. Further, by analytically studying iso-perplexity plots, we find that perplexity will not always…
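For reference, the "surprise" described in the abstract is usually formalised as follows. This is the standard textbook definition, written in notation assumed here (a sequence x_1, ..., x_N and a model distribution p_theta); the paper's own notation may differ.

```latex
\mathrm{PPL}(x_{1:N}) \;=\; \exp\!\left( -\frac{1}{N} \sum_{i=1}^{N} \log p_\theta\!\left(x_i \mid x_{<i}\right) \right)
```

Under this definition a model that assigns probability 1 to every true token attains the minimum perplexity of 1, and a single confidently wrong prediction raises the value through its large negative log-probability.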
Taxonomy
Topics: Model Reduction and Neural Networks · Generative Adversarial Networks and Image Synthesis · Explainable Artificial Intelligence (XAI)
