N-gram-like Language Models Predict Reading Time Best

James A. Michaelov; Roger P. Levy

arXiv:2603.09872 · cs.CL · March 11, 2026

TL;DR

This paper argues that reading time is best predicted by language models whose probabilities resemble simple n-gram statistics, rather than by the more complex statistics learned by state-of-the-art transformer models.

Contribution

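The study shows that the neural language models whose predictions correlate most strongly with n-gram probabilities are also those whose probabilities correlate most strongly with eye-tracking measures of reading time on naturalistic text. This suggests that simple statistical features are crucial for modeling reading behavior.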

Findings

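1. Neural language models whose predictions correlate more strongly with n-gram probabilities also produce probabilities that correlate more strongly with eye-tracking measures of reading time.

2. Probabilities from state-of-the-art transformer models, which learn more complex statistics, are less aligned with eye-tracking metrics than those of more n-gram-like models.

3. Reading time appears to be sensitive to simple n-gram statistics rather than to the more complex statistics that make transformers strong next-word predictors.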

Abstract

Recent work has found that contemporary language models such as transformers can become so good at next-word prediction that the probabilities they calculate become worse for predicting reading time. In this paper, we propose that this can be explained by reading time being sensitive to simple n-gram statistics rather than the more complex statistics learned by state-of-the-art transformer language models. We demonstrate that the neural language models whose predictions are most correlated with n-gram probability are also those that calculate probabilities that are the most correlated with eye-tracking-based metrics of reading time on naturalistic text.
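The analysis in the abstract pairs two correlations per model: how closely a model's predictions track n-gram probability, and how closely they track reading time. The sketch below illustrates that logic under stated assumptions, and is not the authors' method. It uses an add-one smoothed bigram model as a stand-in for "n-gram probability", works in surprisal (negative log probability, in bits), a common quantity in reading-time research, and uses plain Pearson correlation. The helper names (`train_bigram`, `bigram_surprisals`, `compare_models`), the models `lm_a` and `lm_b`, and every number are hypothetical; the paper's n-gram order, smoothing, corpora, and statistical analysis may all differ.

```python
import math
from collections import Counter

# Hypothetical helpers: a minimal sketch, not the paper's implementation.

def train_bigram(tokens):
    """Count contexts and bigrams from a token list."""
    contexts = Counter(tokens[:-1])              # how often each token precedes another
    bigrams = Counter(zip(tokens, tokens[1:]))   # (previous, next) pair counts
    return contexts, bigrams

def bigram_prob(prev, word, contexts, bigrams, vocab_size):
    """Add-one (Laplace) smoothed P(word | prev)."""
    return (bigrams[(prev, word)] + 1) / (contexts[prev] + vocab_size)

def bigram_surprisals(tokens, contexts, bigrams, vocab_size):
    """Surprisal in bits, -log2 P(w_i | w_{i-1}), for every token after the first."""
    return [-math.log2(bigram_prob(p, w, contexts, bigrams, vocab_size))
            for p, w in zip(tokens, tokens[1:])]

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def compare_models(model_surprisals, ngram_surp, reading_times):
    """For each model: (r with n-gram surprisal, r with reading time)."""
    return {name: (pearson(s, ngram_surp), pearson(s, reading_times))
            for name, s in model_surprisals.items()}

# Toy usage. Every number below is made up for illustration.
train = "the cat sat on the mat the dog sat on the rug".split()
text = "the cat sat on the rug".split()
contexts, bigrams = train_bigram(train)
vocab_size = len(set(train) | set(text))
ngram_surp = bigram_surprisals(text, contexts, bigrams, vocab_size)

reading_times = [245.0, 230.0, 210.0, 205.0, 260.0]  # hypothetical ms for words 2..6
model_surprisals = {                                   # hypothetical LM surprisals (bits)
    "lm_a": [2.9, 2.1, 1.6, 1.4, 2.8],
    "lm_b": [4.0, 0.5, 0.3, 3.5, 0.9],
}
results = compare_models(model_surprisals, ngram_surp, reading_times)
for name, (r_ngram, r_rt) in results.items():
    print(f"{name}: r(n-gram) = {r_ngram:.2f}, r(reading time) = {r_rt:.2f}")
```

With many models rather than two, the question the abstract poses becomes whether the first column and the second column rise together across models; with toy values like these, the printed numbers carry no evidential weight.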

Taxonomy

Topics: Text Readability and Simplification · Neurobiology of Language and Bilingualism · Natural Language Processing Techniques