Clozing the Gap: Exploring Why Language Model Surprisal Outperforms Cloze Surprisal

Sathvik Nair; Byung-Doh Oh

arXiv:2601.09886 · cs.CL · January 16, 2026

Open Access

TL;DR

This paper investigates why language model surprisal better predicts language processing effort than cloze surprisal, highlighting the importance of resolution, semantic distinction, and frequency sensitivity in probabilistic predictions.

Contribution

The study provides evidence for three reasons why LM probabilities outperform cloze data, emphasizing the need to improve cloze resolution and understand human prediction mechanisms.

Findings

01 LM probabilities do not suffer from low resolution

02 LMs distinguish semantically similar words better

03 LMs assign more accurate probabilities to low-frequency words

Abstract

How predictable a word is can be quantified in two ways: using human responses to the cloze task or using probabilities from language models (LMs). When used as predictors of processing effort, LM probabilities outperform probabilities derived from cloze data. However, it is important to establish that LM probabilities do so for the right reasons, since different predictors can lead to different scientific conclusions about the role of prediction in language comprehension. We present evidence for three hypotheses about the advantage of LM probabilities: not suffering from low resolution, distinguishing semantically similar words, and accurately assigning probabilities to low-frequency words. These results call for efforts to improve the resolution of cloze studies, coupled with experiments on whether human-like prediction is also as sensitive to the fine-grained distinctions made by LM…
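
To make the "low resolution" point concrete, here is a minimal Python sketch of how cloze surprisal could be computed from response counts. It is not the authors' code. The function name `cloze_surprisal`, the example responses, and the additive smoothing option are all hypothetical, and it assumes the standard definition of surprisal as the negative log probability of a word.

```python
import math
from collections import Counter


def cloze_surprisal(responses, target, smoothing=0.0):
    """Surprisal (in bits) of `target` estimated from cloze responses.

    `smoothing` is an additive constant applied to every word count
    (an illustrative choice, not a method taken from the paper).
    """
    counts = Counter(responses)
    vocab = set(counts) | {target}
    total = len(responses) + smoothing * len(vocab)
    prob = (counts[target] + smoothing) / total
    # A word that no participant produced has probability 0,
    # so its unsmoothed surprisal is infinite.
    return math.inf if prob == 0 else -math.log2(prob)


# Hypothetical completions from 40 participants for a single context.
responses = ["dog"] * 30 + ["cat"] * 8 + ["puppy"] * 2

for word in ["dog", "puppy", "hamster"]:
    print(word, cloze_surprisal(responses, word))
```

With 40 responses, the smallest nonzero probability the estimate can express is 1/40, so any word that no participant produced gets infinite surprisal unless some smoothing is applied. An LM, by contrast, assigns a nonzero probability to every word in its vocabulary.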

Taxonomy

Topics: Neurobiology of Language and Bilingualism · Language Development and Disorders · Language and cultural evolution