Perplexity Cannot Always Tell Right from Wrong

Petar Veličković; Federico Barbero; Christos Perivolaropoulos; Simon Osindero; Razvan Pascanu

arXiv:2601.22950·cs.LG·February 2, 2026

TL;DR

This paper demonstrates that perplexity, a common metric for evaluating language models, can be misleading for model selection because it does not reliably correlate with actual accuracy, especially in the context of Transformer models.

Contribution

The paper provides a rigorous theoretical analysis showing the limitations of perplexity as a model selection metric for Transformer-based language models.

Findings

01

Perplexity does not always select the more accurate model.

02

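If a compact decoder-only Transformer predicts any sequence accurately and confidently, there must exist another sequence with very low perplexity that the same model predicts incorrectly.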

03

For perplexity to favour a new model, any increase in that model's confidence must be matched by a corresponding improvement in its accuracy.
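
The interplay between confidence and accuracy in the last finding can be illustrated with a simplified sketch. It assumes a binary vocabulary and a model that always places a fixed probability (`confidence`) on its predicted token and is right a fixed fraction (`accuracy`) of the time. This parametrization, the function name `log_perplexity`, and the numbers are illustrative assumptions, not necessarily the setup used in the paper.

```python
import math

def log_perplexity(accuracy, confidence):
    """Mean negative log-likelihood under the simplified assumption above.

    With probability `accuracy` the true token receives `confidence`;
    otherwise it receives the remaining mass, 1 - confidence.
    """
    return -(accuracy * math.log(confidence)
             + (1 - accuracy) * math.log(1 - confidence))

# Hypothetical models: (accuracy, confidence) pairs chosen for illustration.
baseline = log_perplexity(0.90, 0.90)
more_confident = log_perplexity(0.90, 0.99)               # confidence up, accuracy unchanged
more_confident_and_accurate = log_perplexity(0.99, 0.99)  # both up

for name, value in [("baseline", baseline),
                    ("more confident", more_confident),
                    ("more confident and accurate", more_confident_and_accurate)]:
    print(f"{name}: perplexity = {math.exp(value):.3f}")
```

Under these assumptions, raising confidence alone makes perplexity worse than the baseline, and perplexity only improves once accuracy rises along with it.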

Abstract

Perplexity -- a function measuring a model's overall level of "surprise" when encountering a particular output -- has gained significant traction in recent years, both as a loss function and as a simple-to-compute metric of model quality. Prior studies have pointed out several limitations of perplexity, often from an empirical perspective. Here we leverage recent results on Transformer continuity to show in a rigorous manner how perplexity may be an unsuitable metric for model selection. Specifically, we prove that, if there is any sequence that a compact decoder-only Transformer model predicts accurately and confidently -- a necessary prerequisite for strong generalisation -- this implies the existence of another sequence with very low perplexity that is not predicted correctly by that same model. Further, by analytically studying iso-perplexity plots, we find that perplexity will not always…
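
As a concrete toy illustration of why perplexity and accuracy can disagree, the sketch below uses the standard definition of perplexity as the exponential of the mean negative log-probability assigned to the true tokens. The two models, their names, and their probabilities are hypothetical values chosen for illustration and do not come from the paper; a binary vocabulary is assumed so that a prediction is correct exactly when the true token receives probability above 0.5.

```python
import math

def perplexity(p_correct):
    """exp of the mean negative log-probability given to the true tokens."""
    nll = -sum(math.log(p) for p in p_correct) / len(p_correct)
    return math.exp(nll)

def accuracy(p_correct):
    """Binary vocabulary assumed: the argmax is right iff p(true token) > 0.5."""
    return sum(p > 0.5 for p in p_correct) / len(p_correct)

# Probability each hypothetical model assigns to the true token at 4 positions.
cautious = [0.55, 0.55, 0.55, 0.55]    # always right, never confident
confident = [0.99, 0.99, 0.99, 0.45]   # usually very confident, wrong once

for name, probs in [("cautious", cautious), ("confident", confident)]:
    print(f"{name}: accuracy={accuracy(probs):.2f}, "
          f"perplexity={perplexity(probs):.3f}")
```

In this toy case the confident model has the lower perplexity (about 1.23 versus about 1.82) even though it is the less accurate of the two, so selecting by perplexity would pick the wrong model.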

Taxonomy

Topics: Model Reduction and Neural Networks · Generative Adversarial Networks and Image Synthesis · Explainable Artificial Intelligence (XAI)