Deconstructing sentence disambiguation by joint latent modeling of reading paradigms: LLM surprisal is not enough

Dario Paape; Tal Linzen; Shravan Vasishth

arXiv:2602.04489 · cs.CL · February 5, 2026

Open Access

TL;DR

This paper introduces a latent-process mixture model of human reading behavior on temporarily ambiguous garden-path sentences. In cross-validation, the model predicts reading patterns and end-of-trial task data better than a mixture-free model based on GPT-2-derived surprisal.

Contribution

The study presents a latent-process mixture model that separates garden-path probability, garden-path cost, and reanalysis cost, and that accounts for trials with inattentive reading. It has better predictive fit to human reading behavior than a mixture-free model based on GPT-2-derived surprisal.

Findings

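01

The model reproduces empirical patterns in rereading behavior, comprehension question responses, and grammaticality judgments.

02

In cross-validation, it has better predictive fit to reading patterns and end-of-trial task data than a mixture-free model based on GPT-2-derived surprisal.

03

By taking trials with inattentive reading into account, it yields more realistic processing cost estimates.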

Abstract

Using temporarily ambiguous garden-path sentences ("While the team trained the striker wondered ...") as a test case, we present a latent-process mixture model of human reading behavior across four different reading paradigms (eye tracking, uni- and bidirectional self-paced reading, Maze). The model distinguishes between garden-path probability, garden-path cost, and reanalysis cost, and yields more realistic processing cost estimates by taking into account trials with inattentive reading. We show that the model is able to reproduce empirical patterns with regard to rereading behavior, comprehension question responses, and grammaticality judgments. Cross-validation reveals that the mixture model also has better predictive fit to human reading patterns and end-of-trial task data than a mixture-free model based on GPT-2-derived surprisal values. We discuss implications for future work.
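
To make the idea of a latent-process mixture concrete, here is a minimal sketch in Python. It is not the authors' model: the three-state structure (inattentive, attentive without a garden path, attentive with a garden path), the lognormal reading-time distribution, the additive costs on the log scale, and every function and parameter name below are assumptions made for illustration only.

```python
import math


def lognormal_pdf(rt, mu, sigma):
    """Density of a lognormal distribution at reading time rt (in ms)."""
    if rt <= 0:
        return 0.0
    z = (math.log(rt) - mu) / sigma
    return math.exp(-0.5 * z * z) / (rt * sigma * math.sqrt(2 * math.pi))


def component_densities(rt, mu_base, gp_cost, reanalysis_cost, sigma):
    """Densities of the baseline and the costly (garden-pathed) component.

    Assumption: both costs shift the mean of log reading time additively.
    """
    base = lognormal_pdf(rt, mu_base, sigma)
    costly = lognormal_pdf(rt, mu_base + gp_cost + reanalysis_cost, sigma)
    return base, costly


def mixture_density(rt, p_inattentive, p_garden_path,
                    mu_base, gp_cost, reanalysis_cost, sigma):
    """Marginal density of one reading time under the hypothetical mixture.

    Assumption: inattentive trials show no garden-path cost, so they
    contribute baseline reading times only.
    """
    base, costly = component_densities(rt, mu_base, gp_cost,
                                       reanalysis_cost, sigma)
    attentive = p_garden_path * costly + (1.0 - p_garden_path) * base
    return p_inattentive * base + (1.0 - p_inattentive) * attentive


def posterior_garden_path(rt, p_inattentive, p_garden_path,
                          mu_base, gp_cost, reanalysis_cost, sigma):
    """Posterior probability that the trial was attentive AND garden-pathed."""
    _, costly = component_densities(rt, mu_base, gp_cost,
                                    reanalysis_cost, sigma)
    joint = (1.0 - p_inattentive) * p_garden_path * costly
    total = mixture_density(rt, p_inattentive, p_garden_path,
                            mu_base, gp_cost, reanalysis_cost, sigma)
    return joint / total


# Illustrative parameter values only; they are not estimates from the paper.
params = dict(p_inattentive=0.2, p_garden_path=0.6,
              mu_base=math.log(300.0), gp_cost=0.3,
              reanalysis_cost=0.2, sigma=0.3)

short_rt = posterior_garden_path(300.0, **params)
long_rt = posterior_garden_path(600.0, **params)
```

Under these assumptions, a long reading time at the disambiguating region receives a higher posterior probability of having come from a garden-pathed trial than a short one (`long_rt > short_rt`). Separating out the inattentive component also keeps the cost parameters from being diluted by trials in which no cost was incurred, which is the intuition behind the abstract's remark about more realistic cost estimates.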

Taxonomy

Topics: Neurobiology of Language and Bilingualism · Text Readability and Simplification · Reading and Literacy Development