PoTeC: A German Naturalistic Eye-tracking-while-reading Corpus
Deborah N. Jakobi, Thomas Kern, David R. Reich, Patrick Haller, Lena A. Jäger

TL;DR
PoTeC is a comprehensive eye-tracking corpus capturing naturalistic reading behaviors of experts and novices across scientific texts, enabling diverse research on reading strategies and comprehension.
Contribution
It introduces the first naturalistic eye-tracking corpus with domain-expert and novice data, using a factorial design and detailed linguistic annotations.
Findings
Includes eye-movement data from 75 participants reading 12 texts
Contains annotations for linguistic features at multiple levels
Data and code are openly available for research use
Abstract
The Potsdam Textbook Corpus (PoTeC) is a naturalistic eye-tracking-while-reading corpus containing data from 75 participants reading 12 scientific texts. PoTeC is the first naturalistic eye-tracking-while-reading corpus that contains eye-movements from domain-experts as well as novices in a within-participant manipulation: It is based on a 2x2x2 fully-crossed factorial design which includes the participants' level of study and the participants' discipline of study as between-subject factors and the text domain as a within-subject factor. The participants' reading comprehension was assessed by a series of text comprehension questions and their domain knowledge was tested by text-independent background questions for each of the texts. The materials are annotated for a variety of linguistic features at different levels. We envision PoTeC to be used for a wide range of studies including but…
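The factorial design described in the abstract can be made concrete with a minimal sketch. It enumerates the cells of a fully crossed 2x2x2 design and separates the two between-subject factors from the within-subject factor. The abstract names the factors but not their levels, so the level labels below (`level_1`, `discipline_A`, `domain_A`, and so on) are hypothetical placeholders, and the function names are illustrative rather than part of any PoTeC tooling.

```python
from itertools import product

# Placeholder level labels: the abstract names the factors, not their levels.
LEVELS_OF_STUDY = ("level_1", "level_2")        # between-subject factor
DISCIPLINES = ("discipline_A", "discipline_B")  # between-subject factor
TEXT_DOMAINS = ("domain_A", "domain_B")         # within-subject factor


def design_cells():
    """Return every cell of the fully crossed 2x2x2 design."""
    return list(product(LEVELS_OF_STUDY, DISCIPLINES, TEXT_DOMAINS))


def participant_groups():
    """Return the between-subject groups (level of study x discipline)."""
    return list(product(LEVELS_OF_STUDY, DISCIPLINES))


if __name__ == "__main__":
    # Each participant group contributes data to both text domains,
    # because text domain is varied within participants.
    for group in participant_groups():
        for domain in TEXT_DOMAINS:
            print(group + (domain,))
```

Under these assumptions, each of the four participant groups reads texts from both domains, which gives the eight cells of the design.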
Taxonomy
Topics: Speech and dialogue systems
