EMTeC: A Corpus of Eye Movements on Machine-Generated Texts
Lena Sophia Bolliger, Patrick Haller, Isabelle Caroline Rose Cretton, David Robert Reich, Tannon Kew, Lena Ann Jäger

TL;DR
EMTeC is an eye-movement corpus of 107 native English speakers reading machine-generated texts. It releases the eye-tracking data at every pre-processing stage, the internals of the language models that generated the texts, and linguistic annotations of the stimuli, supporting research on both reading behavior and language models.
Contribution
This paper introduces EMTeC, a corpus that combines eye-tracking data, the internals of the language models that generated the stimuli, and linguistic annotations of the machine-generated texts.
Findings
Provides detailed eye movement data on machine-generated texts
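Includes the generating models' internals: transition scores, attention scores, and hidden states
Covers three large language models, five decoding strategies, and six text types, enabling comparisons across them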
Abstract
The Eye Movements on Machine-Generated Texts Corpus (EMTeC) is a naturalistic eye-movements-while-reading corpus of 107 native English speakers reading machine-generated texts. The texts are generated by three large language models using five different decoding strategies, and they fall into six different text type categories. EMTeC entails the eye movement data at all stages of pre-processing, i.e., the raw coordinate data sampled at 2000 Hz, the fixation sequences, and the reading measures. It further provides both the original and a corrected version of the fixation sequences, accounting for vertical calibration drift. Moreover, the corpus includes the language models' internals that underlie the generation of the stimulus texts: the transition scores, the attention scores, and the hidden states. The stimuli are annotated for a range of linguistic features both at text and at word…
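The abstract distinguishes fixation sequences from the reading measures derived from them. As a rough illustration of that last step, the sketch below computes three common word-level measures from a toy fixation sequence. It is not the corpus's own pre-processing pipeline: the `Fixation` class, its field names, and the output keys are hypothetical, and the definitions are simplified (for example, gaze duration is not restricted to words that were reached before any later word was fixated).

```python
from dataclasses import dataclass


@dataclass
class Fixation:
    # Hypothetical fields for illustration; not the corpus's column names.
    word_index: int   # index of the fixated word in the text
    duration_ms: int  # fixation duration in milliseconds


def reading_measures(fixations):
    """Compute simplified per-word reading measures from a fixation sequence.

    first_fixation_ms: duration of the first fixation on the word
    gaze_duration_ms:  sum of fixations from first entering the word
                       until the eyes first leave it
    total_fixation_ms: sum of all fixations on the word
    """
    measures = {}
    first_pass_done = set()  # words the eyes have already left once
    prev = None
    for fix in fixations:
        w = fix.word_index
        if w not in measures:
            measures[w] = {
                "first_fixation_ms": fix.duration_ms,
                "gaze_duration_ms": 0,
                "total_fixation_ms": 0,
            }
        # Moving to a different word ends the first pass on the previous one.
        if prev is not None and prev != w:
            first_pass_done.add(prev)
        if w not in first_pass_done:
            measures[w]["gaze_duration_ms"] += fix.duration_ms
        measures[w]["total_fixation_ms"] += fix.duration_ms
        prev = w
    return measures


# Toy sequence: word 0, word 1 twice, a regression to word 0, then word 2.
toy = [
    Fixation(0, 200),
    Fixation(1, 150),
    Fixation(1, 100),
    Fixation(0, 120),
    Fixation(2, 180),
]
result = reading_measures(toy)
```

In this toy sequence the regression back to word 0 adds to its total fixation time but not to its gaze duration, which is the kind of distinction that separates first-pass measures from total-time measures.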
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling
Methods: Softmax · Attention Is All You Need
