Extending Minimal Pairs with Ordinal Surprisal Curves and Entropy Across Applied Domains
Andrew Katz

TL;DR
This paper extends the minimal pairs paradigm by using surprisal curves and entropy to evaluate language models across multiple domains, providing richer insights into model uncertainty and response preferences beyond binary judgments.
Contribution
It introduces a novel surprisal-based evaluation framework that applies ordinal-scaled scoring and measures model uncertainty across diverse tasks, moving beyond binary grammaticality judgments.
Findings
Surprisal curves produce interpretable classification signals.
Entropy distinguishes ambiguous items from easier ones.
The framework is applicable across multiple domains.
Abstract
The minimal pairs paradigm, which compares model probabilities for contrasting completions, has proven useful for evaluating linguistic knowledge in language models, yet its application has largely been confined to binary grammaticality judgments over syntactic phenomena. In addition, standard prompting-based evaluation requires expensive text generation, may elicit post-hoc rationalizations rather than model judgments, and discards information about model uncertainty. We address both limitations by extending surprisal-based evaluation from binary grammaticality contrasts to ordinal-scaled classification and scoring tasks across multiple domains. Rather than asking models to generate answers, we measure the information-theoretic "surprise" (negative log probability) they assign to each position on a rating scale (e.g., 1-5 or 1-9), yielding full surprisal curves that reveal both the model's…
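To make the scale-position idea concrete, here is a minimal sketch of turning per-option log probabilities into a surprisal curve and an entropy value. It assumes the log probabilities of each scale token ("1" to "5") have already been obtained from a language model scoring the options after a prompt. The numbers below are made up for illustration, and the helper names (`surprisal_curve`, `normalized_distribution`, `entropy_bits`) are hypothetical. Reporting in bits and renormalizing over the scale options before computing entropy are also assumptions; the abstract does not specify either choice.

```python
import math

def surprisal_curve(logprobs):
    """Surprisal in bits for each scale position, from natural-log probabilities."""
    return {label: -lp / math.log(2) for label, lp in logprobs.items()}

def normalized_distribution(logprobs):
    """Renormalize probability mass over the scale options only (a stable softmax)."""
    m = max(logprobs.values())
    exps = {label: math.exp(lp - m) for label, lp in logprobs.items()}
    z = sum(exps.values())
    return {label: e / z for label, e in exps.items()}

def entropy_bits(dist):
    """Shannon entropy in bits; higher values mean the model is less certain."""
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)

# Hypothetical log-probs for a 1-5 scale; the remaining mass sits on other tokens.
logprobs = {
    "1": math.log(0.02),
    "2": math.log(0.05),
    "3": math.log(0.10),
    "4": math.log(0.50),
    "5": math.log(0.13),
}

curve = surprisal_curve(logprobs)
predicted = min(curve, key=curve.get)    # least surprising scale position
dist = normalized_distribution(logprobs)
uncertainty = entropy_bits(dist)         # at most log2(5) bits for 5 options
```

In this sketch the least-surprising position ("4", at 1 bit) serves as the classification signal. The entropy of the renormalized distribution summarizes how spread out the curve is, so a flatter curve yields a value closer to the maximum of log2(5) bits.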
Taxonomy
Topics: Language and cultural evolution · Syntax, Semantics, Linguistic Variation · Neurobiology of Language and Bilingualism
