EyeFormer: Predicting Personalized Scanpaths with Transformer-Guided Reinforcement Learning
Yue Jiang, Zixin Guo, Hamed Rezazadegan Tavakoli, Luis A. Leiva, Antti Oulasvirta

TL;DR
EyeFormer is a novel model that combines Transformer-based policy networks with reinforcement learning to predict personalized eye scanpaths, including fixation points and durations, for individual users across different stimuli.
Contribution
It introduces a Transformer-guided reinforcement learning approach for personalized scanpath prediction, filling a gap in individual-level eye movement modeling.
Findings
Predicts full scanpaths, including fixation positions and durations.
Produces personalized predictions from only a few scanpath samples of a given user.
Demonstrates applications in GUI layout optimization.
Abstract
From a visual perception perspective, modern graphical user interfaces (GUIs) comprise a complex graphics-rich two-dimensional visuospatial arrangement of text, images, and interactive objects such as buttons and menus. While existing models can accurately predict regions and objects that are likely to attract attention "on average", so far there is no scanpath model capable of predicting scanpaths for an individual. To close this gap, we introduce EyeFormer, which leverages a Transformer architecture as a policy network to guide a deep reinforcement learning algorithm that controls gaze locations. Our model has the unique capability of producing personalized predictions when given a few user scanpath samples. It can predict full scanpath information, including fixation positions and duration, across individuals and various stimulus types. Additionally, we demonstrate applications in…
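To make the idea of a policy that "controls gaze locations" concrete, the following is a minimal sketch of how a scanpath could be represented and rolled out one fixation at a time from a stochastic policy. It is not the authors' implementation: `Fixation`, `toy_policy`, `rollout`, the noise parameters, and the normalized coordinate range are all assumptions made for illustration, and the hand-written `toy_policy` merely stands in for the Transformer policy network described in the abstract.

```python
import random
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Fixation:
    x: float         # horizontal position, normalized to [0, 1] (assumed convention)
    y: float         # vertical position, normalized to [0, 1] (assumed convention)
    duration: float  # fixation duration in seconds


# A policy maps the fixations generated so far to Gaussian means
# for the next fixation's (x, y, duration).
Policy = Callable[[List[Fixation]], Tuple[float, float, float]]


def toy_policy(history: List[Fixation]) -> Tuple[float, float, float]:
    """Hypothetical stand-in for a learned policy: drift right and down."""
    if not history:
        return (0.2, 0.2, 0.25)
    last = history[-1]
    return (min(last.x + 0.15, 1.0), min(last.y + 0.1, 1.0), 0.25)


def clamp(value: float, low: float, high: float) -> float:
    return max(low, min(high, value))


def rollout(policy: Policy, n_fixations: int, rng: random.Random,
            pos_std: float = 0.05, dur_std: float = 0.05) -> List[Fixation]:
    """Sample a scanpath autoregressively: each fixation depends on the history."""
    path: List[Fixation] = []
    for _ in range(n_fixations):
        mean_x, mean_y, mean_dur = policy(path)
        path.append(Fixation(
            x=clamp(rng.gauss(mean_x, pos_std), 0.0, 1.0),
            y=clamp(rng.gauss(mean_y, pos_std), 0.0, 1.0),
            duration=max(0.05, rng.gauss(mean_dur, dur_std)),
        ))
    return path


scanpath = rollout(toy_policy, 5, random.Random(0))
for fixation in scanpath:
    print(f"({fixation.x:.2f}, {fixation.y:.2f}) for {fixation.duration:.2f}s")
```

In a reinforcement learning setup, sampled rollouts like these would be scored against recorded human scanpaths to produce a reward signal; the reward and the training procedure are omitted here because the abstract does not specify them.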
Taxonomy
Topics: Advanced Malware Detection Techniques · Gaze Tracking and Assistive Technology · Spam and Phishing Detection
Methods: Dropout · Adam · Attention Is All You Need · Position-Wise Feed-Forward Layer · Linear Layer · Layer Normalization · Byte Pair Encoding · Absolute Position Encodings · Multi-Head Attention · Dense Connections
