Individual corpora predict fast memory retrieval during reading
Markus J. Hofmann, Lara Müller, Andre Rölke, Ralph Radach, and Chris Biemann

TL;DR
This study shows that individual reading experiences, captured through personal corpora, better predict early eye movement patterns during reading than generic norm-based corpora, highlighting the importance of personalized language models.
Contribution
The paper demonstrates that word probabilities derived from a reader's own corpus predict reading behavior better than those derived from a traditional norm-based corpus, which argues for personalized language representations.
Findings
Word probabilities from individual corpora, but not from the norm corpus, account for first-fixation duration and first-pass gaze duration.
Recently acquired information is accessed rapidly during reading.
Personalized models outperform norm-based models in predicting reading behavior.
Abstract
The corpus, from which a predictive language model is trained, can be considered the experience of a semantic system. We recorded everyday reading of two participants for two months on a tablet, generating individual corpus samples of 300/500K tokens. Then we trained word2vec models from individual corpora and a 70 million-sentence newspaper corpus to obtain individual and norm-based long-term memory structure. To test whether individual corpora can make better predictions for a cognitive task of long-term memory retrieval, we generated stimulus materials consisting of 134 sentences with uncorrelated individual and norm-based word probabilities. For the subsequent eye tracking study 1-2 months later, our regression analyses revealed that individual, but not norm-corpus-based word probabilities can account for first-fixation duration and first-pass gaze duration. Word length additionally…
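The regression logic described in the abstract can be illustrated with a minimal sketch. It assumes an ordinary least-squares model with three predictors per word (an individual-corpus log probability, a norm-corpus log probability, and word length) and a fixation duration as the outcome. The data values, variable names, and helper functions below are invented for illustration and are not taken from the study; the excerpt does not give the paper's actual model specification.

```python
# Hypothetical illustration: all numbers below are synthetic, not study data.

def solve(A, b):
    """Solve the linear system A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x

def ols(rows, y):
    """Return [intercept, b1, b2, ...] from the normal equations (X'X) b = X'y."""
    X = [[1.0] + list(r) for r in rows]
    k = len(X[0])
    XtX = [[sum(x[i] * x[j] for x in X) for j in range(k)] for i in range(k)]
    Xty = [sum(x[i] * yi for x, yi in zip(X, y)) for i in range(k)]
    return solve(XtX, Xty)

# Synthetic predictors per word: (individual log prob, norm log prob, word length).
rows = [
    (-1.0, -2.0, 3),
    (-2.0, -1.0, 5),
    (-3.0, -2.5, 8),
    (-1.5, -3.0, 4),
    (-2.5, -0.5, 7),
    (-0.5, -1.5, 6),
]

# Synthetic durations (ms) built so that only the individual predictor and
# word length matter; the norm-based predictor has no effect by construction.
durations = [250.0 - 20.0 * ind + 0.0 * norm + 5.0 * length
             for ind, norm, length in rows]

coef = ols(rows, durations)
labels = ["intercept", "individual_logprob", "norm_logprob", "word_length"]
for name, value in zip(labels, coef):
    print(f"{name}: {value:.3f}")
```

Because the synthetic durations are generated without any contribution from the norm-based predictor, its fitted coefficient comes out near zero, while the individual predictor and word length recover the values used to build the data. With real data, a statistics package that reports standard errors and significance tests would be the natural choice over this hand-rolled solver.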
Taxonomy
Topics: Neurobiology of Language and Bilingualism · Topic Modeling · Text Readability and Simplification
