N-gram-like Language Models Predict Reading Time Best
James A. Michaelov, Roger P. Levy

TL;DR
This paper argues that reading time is sensitive to simple n-gram statistics: the neural language models whose predictions correlate most with n-gram probability are also the ones whose probabilities correlate most with eye-tracking measures of reading time.
Contribution
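The study proposes an explanation for why language models that are better at next-word prediction can be worse at predicting reading time: reading time tracks simple n-gram statistics rather than the more complex statistics learned by state-of-the-art transformers. It supports this by showing that the most n-gram-like neural models produce the probabilities most correlated with eye-tracking metrics of reading time on naturalistic text.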
Findings
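Neural language models whose predictions correlate most with n-gram probability also yield the probabilities most correlated with eye-tracking-based reading-time metrics on naturalistic text.
Language models that become very good at next-word prediction can produce probabilities that are worse predictors of reading time, as recent work has reported.
Reading time appears to be sensitive to simple n-gram statistics rather than the more complex statistics learned by state-of-the-art transformers.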
Abstract
Recent work has found that contemporary language models such as transformers can become so good at next-word prediction that the probabilities they calculate become worse for predicting reading time. In this paper, we propose that this can be explained by reading time being sensitive to simple n-gram statistics rather than the more complex statistics learned by state-of-the-art transformer language models. We demonstrate that the neural language models whose predictions are most correlated with n-gram probability are also those that calculate probabilities that are the most correlated with eye-tracking-based metrics of reading time on naturalistic text.
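The comparison described in the abstract can be illustrated with a minimal sketch. It is not the authors' pipeline: the add-alpha bigram model, the toy corpus, and the values in `neural_logprobs` and `reading_times_ms` are all hypothetical placeholders chosen only to show the shape of the analysis, and plain Pearson correlation is assumed as the measure of agreement.

```python
import math
from collections import Counter


def bigram_logprobs(train_tokens, test_tokens, alpha=1.0):
    """Add-alpha smoothed bigram log-probabilities for test_tokens[1:]."""
    vocab_size = len(set(train_tokens) | set(test_tokens))
    # Count each token as a context (every position except the last one).
    contexts = Counter(train_tokens[:-1])
    bigrams = Counter(zip(train_tokens[:-1], train_tokens[1:]))
    logprobs = []
    for prev, word in zip(test_tokens[:-1], test_tokens[1:]):
        numerator = bigrams[(prev, word)] + alpha
        denominator = contexts[prev] + alpha * vocab_size
        logprobs.append(math.log(numerator / denominator))
    return logprobs


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y)


# Hypothetical toy data, for illustration only.
train_tokens = "the dog chased the cat and the cat chased the mouse".split()
test_tokens = "the cat chased the dog".split()

# Placeholder per-word log-probabilities from some neural language model
# and placeholder reading times (ms) for the same four words.
neural_logprobs = [-1.0, -1.6, -0.9, -2.1]
reading_times_ms = [210.0, 250.0, 205.0, 290.0]

ngram_lp = bigram_logprobs(train_tokens, test_tokens)

# Surprisal is the negative log-probability of a word in context.
ngram_surprisal = [-lp for lp in ngram_lp]
neural_surprisal = [-lp for lp in neural_logprobs]

# How n-gram-like is the neural model?
r_ngram_neural = pearson(ngram_surprisal, neural_surprisal)
# How well does the neural model's surprisal track reading time?
r_neural_rt = pearson(neural_surprisal, reading_times_ms)

print(f"n-gram vs neural surprisal: r = {r_ngram_neural:.3f}")
print(f"neural surprisal vs reading time: r = {r_neural_rt:.3f}")
```

Under this framing, the claim in the abstract corresponds to a positive relationship across models: models with a higher first correlation would also tend to show a higher second one. Repeating the two computations for several neural models and comparing the resulting pairs is left out of the sketch.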
Taxonomy
Topics: Text Readability and Simplification · Neurobiology of Language and Bilingualism · Natural Language Processing Techniques
