Probing for Reading Times
Eleftheria Tsipidi, Samuel Kiegeland, Francesco Ignazio Re, Tianyang Xu, Mario Giulianelli, Karolina Stanczak, Ryan Cotterell

TL;DR
This study probes language model representations for human reading times, comparing every model layer against scalar predictors such as surprisal on eye-tracking data in five languages (English, Greek, Hebrew, Russian, and Turkish).
Contribution
It shows that early-layer representations in language models predict early-pass reading measures better than surprisal does, pointing to a functional alignment with human reading stages.
Findings
Early layers outperform surprisal in predicting first fixation and gaze duration.
Early-layer representations capture human-like processing signatures.
Combining surprisal with early-layer representations improves prediction accuracy.
Abstract
Probing has shown that language model representations encode rich linguistic information, but it remains unclear whether they also capture cognitive signals about human processing. In this work, we probe language model representations for human reading times. Using regularized linear regression on two eye-tracking corpora spanning five languages (English, Greek, Hebrew, Russian, and Turkish), we compare the representations from every model layer against scalar predictors: surprisal, information value, and logit-lens surprisal. We find that the representations from early layers outperform surprisal in predicting early-pass measures such as first fixation and gaze duration. The concentration of predictive power in the early layers suggests that human-like processing signatures are captured by low-level structural or lexical representations, pointing to a functional alignment between…
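To make the probing setup in the abstract concrete, here is a minimal sketch of a regularized linear probe compared against a scalar baseline. It is not the authors' pipeline: the data are synthetic stand-ins generated in `make_toy_data`, and the helper names (`ridge_fit`, `heldout_mse`, `make_toy_data`), the ridge penalty `alpha=1.0`, and the single train/test split are all assumptions chosen for illustration.

```python
import numpy as np


def ridge_fit(X, y, alpha):
    """Closed-form ridge regression; the intercept is not penalized."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean
    # Solve (Xc^T Xc + alpha * I) w = Xc^T yc for the weight vector w.
    w = np.linalg.solve(Xc.T @ Xc + alpha * np.eye(X.shape[1]), Xc.T @ yc)
    return w, y_mean - x_mean @ w


def heldout_mse(X, y, alpha=1.0, train_frac=0.75):
    """Fit on the first train_frac of rows, return MSE on the remaining rows."""
    n_train = int(train_frac * len(y))
    w, b = ridge_fit(X[:n_train], y[:n_train], alpha)
    pred = X[n_train:] @ w + b
    return float(np.mean((y[n_train:] - pred) ** 2))


def make_toy_data(n_words=400, dim=16, seed=0):
    """Synthetic stand-ins (NOT real eye-tracking data or model states)."""
    rng = np.random.default_rng(seed)
    # Hypothetical per-word vectors from one model layer.
    hidden = rng.normal(size=(n_words, dim))
    # Toy "reading time" that depends linearly on the full vector.
    reading_time = hidden @ rng.normal(size=dim) + 0.1 * rng.normal(size=n_words)
    # Toy scalar predictor that only sees one noisy dimension.
    surprisal = hidden[:, 0] + rng.normal(size=n_words)
    return hidden, surprisal, reading_time


hidden, surprisal, reading_time = make_toy_data()
layer_mse = heldout_mse(hidden, reading_time)                # vector probe
scalar_mse = heldout_mse(surprisal[:, None], reading_time)   # scalar baseline
print(f"layer probe MSE: {layer_mse:.3f}  scalar baseline MSE: {scalar_mse:.3f}")
```

The toy data are constructed so that the vector probe wins, so the printed numbers say nothing about real models. With actual data, one would presumably repeat `heldout_mse` for each layer's representations and each reading measure, and select `alpha` by cross-validation rather than fixing it.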
