Assessing Cognitive Effort in L2 Idiomatic Processing: An Eye-Tracking Dataset
Eduardo Santos, Juliana Carvalho, César Rennó-Costa

TL;DR
This paper introduces and validates an eye-tracking dataset capturing cognitive effort in L2 idiomatic processing, useful for benchmarking human and AI language understanding across proficiency levels.
Contribution
It provides a novel, validated eye-tracking dataset for studying L2 idiomatic comprehension, enabling evaluation of models against human cognitive processing.
Findings
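Preliminary analysis shows a strong inverse correlation between L2 proficiency and regressive eye movements.
Entry-level 60 Hz hardware (Tobii Pro Spark) provides enough data density to detect macro-cognitive events such as fixations and regressions.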
The dataset supports benchmarking of language models against human cognitive data.
Abstract
This paper presents the development and validation of an eye-tracking dataset designed to investigate how second-language (L2) learners process idiomatic expressions. While native speakers often rely on direct retrieval of figurative meanings, L2 speakers frequently adopt a literal-first approach, which incurs measurable cognitive costs. This resource captures these costs through ocular metrics recorded from Portuguese L1 speakers of English across all CEFR proficiency levels (A1-C2). Although the study uses entry-level 60 Hz hardware (Tobii Pro Spark), we demonstrate that this sampling rate provides sufficient data density to detect macro-cognitive events such as fixations and regressions in reading. Preliminary analysis validates the dataset by revealing a strong inverse correlation between language proficiency and regressive eye movements. Integrated into the MIA (Modeling…
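To make the sampling-rate argument concrete, the following is a minimal sketch, not the authors' pipeline. It assumes a reading fixation of roughly 200 ms (a commonly cited ballpark, not a figure from this paper) and a hypothetical input format in which each fixation is labeled with the index of the word it landed on. The function names are illustrative only.

```python
SAMPLING_RATE_HZ = 60  # sampling rate stated in the abstract (Tobii Pro Spark)


def sample_interval_ms(rate_hz: int) -> float:
    """Time between consecutive gaze samples, in milliseconds."""
    return 1000.0 / rate_hz


def samples_per_event(duration_ms: int, rate_hz: int) -> int:
    """Number of whole sample intervals that fit inside an event of the given duration."""
    return duration_ms * rate_hz // 1000


def count_regressions(fixated_word_indices: list[int]) -> int:
    """Count fixations that land on an earlier word than the previous fixation.

    Assumption: the input is a hypothetical per-trial sequence of word
    indices, one per fixation, in temporal order.
    """
    pairs = zip(fixated_word_indices, fixated_word_indices[1:])
    return sum(1 for prev, cur in pairs if cur < prev)


if __name__ == "__main__":
    # Assumed 200 ms fixation: about a dozen samples at 60 Hz.
    print(round(sample_interval_ms(SAMPLING_RATE_HZ), 2))
    print(samples_per_event(200, SAMPLING_RATE_HZ))
    # Hypothetical scanpath with two backward jumps (2 -> 1 and 4 -> 2).
    print(count_regressions([0, 1, 2, 1, 2, 3, 4, 2, 5]))
```

Under these assumptions a single fixation spans about a dozen samples, which is the sense in which 60 Hz can resolve fixation-level and regression-level events even though it is too coarse for fine saccade dynamics.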
