Decoding Open-Ended Information Seeking Goals from Eye Movements in Reading
Cfir Avraham Hadar, Omer Shubi, Yoav Meiri, Amit Heshes, Yevgeni Berzak

TL;DR
This paper demonstrates that open-ended reading goals can be decoded from eye movements using large-scale data and multimodal models, enabling understanding of reader intentions and advancing educational and assistive technologies.
Contribution
It introduces goal decoding tasks and evaluation frameworks for reading, and develops multimodal models that successfully decode and reconstruct reader goals from eye movements.
Findings
High accuracy in goal selection among multiple options
Progress in reconstructing precise goal formulations
Potential for real-time goal decoding in educational tools
Abstract
When reading, we often have specific information that interests us in a text. For example, you might be reading this paper because you are curious about LLMs for eye movements in reading, the experimental design, or perhaps you wonder "This sounds like science fiction. Does it actually work?". More broadly, in daily life, people approach texts with any number of text-specific goals that guide their reading behavior. In this work, we ask, for the first time, whether open-ended reading goals can be automatically decoded solely from eye movements in reading. To address this question, we introduce goal decoding tasks and evaluation frameworks using large-scale eye tracking for reading data in English with hundreds of text-specific information seeking tasks. We develop and compare several discriminative and generative multimodal text and eye movements LLMs for these tasks. Our experiments…
Peer Reviews
Decision: ICLR 2026 Poster
The dataset (OneStop) and problem setup are well suited to the objective of recovering reading goals. The evaluation regimes, which split the data by new participant and new text, are well conceived, and the two tiers of difficulty are useful for comparing model performance in challenging settings. The authors experiment with different types of baselines: heuristics, discriminative models based on adaptations of prior work, and generative LLMs (DalEye-LLaVA, DalEye-Llama).
1. I would like to see which types of gaze features (e.g., fixation durations, word revisits) are most useful for recovering the information seeking goals. Stronger experiments are needed to investigate feature attributions, for example by removing these features one at a time from the eye movement data used to train the models. 2. It is not clear from the paper whether the question and the text span containing the corresponding answer have significant substring overlap. If so, the problem becomes more tr
1. This is the first work to systematically address the decoding of arbitrary, text-specific information goals from eye movements. It moves beyond previous work that classified pre-defined procedural reading tasks (e.g., reading vs. skim-reading) to a more challenging and practically relevant semantic decoding task. The contribution is significant and opens a new research direction. 2. The experimental design is exemplary. The data splits are carefully constructed to evaluate generalization to "N
1. As the results show, the generative task is exceptionally difficult, and model performance, especially on new texts, is still limited. The generated questions are not yet on par with human-composed ones. 2. The paper successfully demonstrates that goals can be decoded, but offers less insight into how or why the models make their decisions from a cognitive perspective. The models remain somewhat black-box. 3. While generalization is tested, the significant performance drop in the "New Text
- The authors claim that theirs is the first study to decode open-ended, text-specific reading goals from eye movements, framed as dual tasks of selection and reconstruction. - Integrating text and gaze features markedly improves target selection; RoBERTEye-Fixations achieves 49.3% accuracy vs. a 33% baseline. - DalEye-Llama attains 76.3% QA accuracy on unseen participants (vs. 68.1% for human distractors), validating the gaze–goal correspondence.
1. The paper lacks a cognitive interpretation of gaze behavior and its link to goal decoding. 2. The sharp performance drop on unseen texts (Kappa 0.478 → 0.069) is unexplained. 3. There is no comparison with fine-tuned multimodal LLMs (e.g., GPT-4o, LLaVA-1.5).
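The review above quotes kappa values (0.478 and 0.069) without saying how the statistic is computed. As a rough illustration only, the sketch below assumes it is Cohen's kappa between two sequences of categorical labels, i.e. observed agreement corrected for the agreement expected by chance. The function name `cohens_kappa` and the toy labels are hypothetical and are not taken from the paper.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two equal-length sequences of categorical labels.

    Assumption: this is the textbook two-rater formulation; the paper's
    exact evaluation protocol is not described on this page.
    """
    if len(labels_a) != len(labels_b) or len(labels_a) == 0:
        raise ValueError("label sequences must be non-empty and equal in length")

    n = len(labels_a)

    # Observed agreement: fraction of positions where the labels match.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n

    # Chance agreement: product of the two marginal label frequencies,
    # summed over all labels.
    count_a = Counter(labels_a)
    count_b = Counter(labels_b)
    expected = sum(count_a[k] * count_b[k] for k in count_a) / (n * n)

    if expected == 1.0:
        # Both sequences use a single identical label; agreement is trivial.
        return 1.0
    return (observed - expected) / (1.0 - expected)


# Toy example with hypothetical labels (not data from the paper).
perfect = cohens_kappa([0, 1, 2, 0, 1, 2], [0, 1, 2, 0, 1, 2])
chance_level = cohens_kappa([0, 0, 1, 1], [0, 1, 0, 1])
```

Under this definition, a kappa near 0 means agreement is about what chance alone would produce, which is why the reviewer flags the drop to 0.069 on unseen texts as needing explanation.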
Taxonomy
Topics: Educational Strategies and Epistemologies · Intelligent Tutoring Systems and Adaptive Learning · Visual and Cognitive Learning Processes
