Beyond Scanpaths: Graph-Based Gaze Simulation in Dynamic Scenes
Luke Palmer, Petar Palasek, Hazem Abdelkawy

TL;DR
This paper introduces a novel graph-based model for simulating human gaze in dynamic scenes, leveraging a transformer architecture and a new dataset to improve the realism of gaze trajectory predictions.
Contribution
The paper proposes a unified approach that combines a heterogeneous graph transformer (the Affinity Relation Transformer, ART) with an Object Density Network (ODN) to model gaze dynamics explicitly in complex driving scenes, and it releases a new gaze dataset, Focus100.
Findings
The model produces more natural gaze trajectories and scanpath dynamics.
It outperforms existing attention models in generating realistic saliency maps.
The Focus100 dataset enables training directly on raw gaze data without fixation filtering.
Abstract
Accurately modelling human attention is essential for numerous computer vision applications, particularly in the domain of automotive safety. Existing methods typically collapse gaze into saliency maps or scanpaths, treating gaze dynamics only implicitly. We instead formulate gaze modelling as an autoregressive dynamical system and explicitly unroll raw gaze trajectories over time, conditioned on both gaze history and the evolving environment. Driving scenes are represented as gaze-centric graphs processed by the Affinity Relation Transformer (ART), a heterogeneous graph transformer that models interactions between driver gaze, traffic objects, and road structure. We further introduce the Object Density Network (ODN) to predict next-step gaze distributions, capturing the stochastic and object-centric nature of attentional shifts in complex environments. We also release Focus100, a new…
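To make the autoregressive formulation concrete, the following is a minimal sketch of unrolling a gaze trajectory one step at a time from an object-centred mixture distribution. It is not the authors' ART or ODN: the function names, the distance-based softmax weighting, the extra "stay" component at the current gaze point, and the `temperature` and `sigma` parameters are all illustrative assumptions standing in for a learned network.

```python
import math
import random


def next_gaze_distribution(gaze, objects, temperature=50.0):
    """Return (weights, means) of a mixture over candidate gaze targets.

    Hypothetical stand-in for a learned density network: one component per
    object plus one "stay" component at the current gaze point, weighted by
    a softmax over negative distance from the current gaze.
    """
    means = [tuple(gaze)] + [tuple(o) for o in objects]
    logits = [-math.dist(gaze, m) / temperature for m in means]
    top = max(logits)  # subtract the max for numerical stability
    unnorm = [math.exp(l - top) for l in logits]
    total = sum(unnorm)
    weights = [u / total for u in unnorm]
    return weights, means


def rollout(initial_gaze, object_tracks, rng, sigma=10.0):
    """Autoregressively sample a gaze trajectory.

    object_tracks holds one list of (x, y) object positions per time step.
    Each sampled gaze point is fed back in as the history for the next
    step, so the trajectory depends on both past gaze and the moving scene.
    """
    gaze = (float(initial_gaze[0]), float(initial_gaze[1]))
    trajectory = [gaze]
    for objects in object_tracks:
        weights, means = next_gaze_distribution(gaze, objects)
        # Pick a mixture component, then add isotropic Gaussian noise.
        cx, cy = rng.choices(means, weights=weights, k=1)[0]
        gaze = (rng.gauss(cx, sigma), rng.gauss(cy, sigma))
        trajectory.append(gaze)
    return trajectory


rng = random.Random(0)
# Two hypothetical objects drifting toward each other over 20 frames.
tracks = [[(100.0 + t, 50.0), (300.0 - t, 80.0)] for t in range(20)]
trajectory = rollout((200.0, 60.0), tracks, rng)
```

The sketch samples raw gaze points directly instead of producing a saliency map or a list of fixations, which is the distinction the abstract draws. In a learned model, the mixture weights and spreads would come from the network conditioned on the scene graph, not from a fixed distance heuristic.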
