STARE: Predicting Decision Making Based on Spatio-Temporal Eye Movements
Moshe Unger, Alexander Tuzhilin, and Michel Wedel

TL;DR
This paper introduces STARE, a deep learning model that predicts consumer decision-making from eye-movement data. It maps gaze coordinates onto Regions of Interest (ROIs) with a novel tokenization strategy and passes the resulting tokens to a time-series foundation model.
Contribution
It presents a new deep learning architecture, STARE, that models spatio-temporal eye movements to predict consumer choices. This fills a gap, since no foundation models currently exist for this task.
Findings
STARE outperforms several state-of-the-art models on multiple datasets.
The tokenization strategy effectively captures spatial information from eye movements.
The model demonstrates the potential of deep learning in understanding visual attention and decision-making.
Abstract
The present work proposes a deep learning architecture for predicting various consumer choice behaviors from time series of raw gaze points or eye fixations on images of the decision environment, a task for which no foundation models are currently available. The architecture, called STARE (Spatio-Temporal Attention Representation for Eye Tracking), uses a new tokenization strategy that maps the x- and y-pixel coordinates of eye-movement time series onto predefined, contiguous Regions of Interest. This tokenization makes the spatio-temporal eye-movement data available to Chronos, a time-series foundation model based on the T5 architecture, to which co-attention and/or cross-attention is added to capture directional and/or interocular influences of eye movements. We compare STARE with several state-of-the-art alternatives on multiple datasets with the purpose of predicting…
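The abstract says only that pixel coordinates are mapped onto predefined, contiguous ROIs; it does not say how those ROIs are laid out. The sketch below illustrates the general idea of turning a continuous gaze trace into a discrete token sequence, under a loudly stated assumption: the ROIs form a uniform rectangular grid over the image, numbered row by row. The function name `roi_tokenize` and its parameters are hypothetical, not the authors' code.

```python
import math
from typing import List, Sequence, Tuple


def roi_tokenize(
    gaze: Sequence[Tuple[float, float]],
    width: float,
    height: float,
    n_cols: int,
    n_rows: int,
) -> List[int]:
    """Map (x, y) pixel coordinates to integer ROI tokens.

    Assumption (not from the paper): the ROIs are a uniform grid of
    n_cols x n_rows contiguous rectangles covering a width x height image,
    numbered row by row starting from the top-left cell.
    """
    tokens = []
    for x, y in gaze:
        # Grid cell that contains the gaze point.
        col = math.floor(x * n_cols / width)
        row = math.floor(y * n_rows / height)
        # Clamp samples on or beyond the image border into the nearest edge cell.
        col = min(max(col, 0), n_cols - 1)
        row = min(max(row, 0), n_rows - 1)
        tokens.append(row * n_cols + col)
    return tokens


if __name__ == "__main__":
    # Hypothetical gaze samples on an 800 x 600 image split into a 4 x 3 grid.
    samples = [(50, 50), (250, 50), (450, 350), (799, 599), (820, -5)]
    print(roi_tokenize(samples, 800, 600, 4, 3))  # [0, 1, 6, 11, 3]
```

With a 4 × 3 grid the vocabulary holds 12 tokens. A finer grid gives more spatial resolution but a larger vocabulary, with fewer observations per token. In a binocular setting, the left-eye and right-eye traces would each be tokenized this way to produce the two sequences that interocular attention relates.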
