Infinite Gaze Generation for Videos with Autoregressive Diffusion

Jenna Kang; Colin Groth; Tong Wu; Finley Torrens; Patsorn Sangkloy; Gordon Wetzstein; Qi Sun

arXiv:2603.24938·cs.CV·March 27, 2026

TL;DR

This paper introduces an autoregressive diffusion model for generating continuous, long-range human gaze trajectories in videos, surpassing existing short-term models in accuracy and realism.

Contribution

It presents a novel generative framework for infinite-horizon gaze prediction, capturing long-term dependencies and detailed temporal dynamics in videos.

Findings

01 Outperforms existing models in long-range accuracy

02 Produces more realistic gaze trajectories

03 Handles videos of arbitrary length

Abstract

Predicting human gaze in video is fundamental to advancing scene understanding and multimodal interaction. While traditional saliency maps provide spatial probability distributions and scanpaths offer ordered fixations, both abstractions often collapse the fine-grained temporal dynamics of raw gaze. Furthermore, existing models are typically constrained to short-term windows ($\approx$ 3-5 s), failing to capture the long-range behavioral dependencies inherent in real-world content. We present a generative framework for infinite-horizon raw gaze prediction in videos of arbitrary length. By leveraging an autoregressive diffusion model, we synthesize gaze trajectories characterized by continuous spatial coordinates and high-resolution timestamps. Our model is conditioned on a saliency-aware visual latent space. Quantitative and qualitative evaluations demonstrate that our approach…
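To make the idea of autoregressive rollout concrete, here is a minimal toy sketch, not the authors' implementation. It generates gaze in fixed-size chunks, each produced by iteratively denoising Gaussian noise while conditioning on the most recent samples, so the trajectory can be extended to any length. Everything named here is an assumption for illustration: `toy_denoise_step` is a hand-written stand-in for a learned denoising network, the chunk and context sizes and the 60 Hz sample rate are arbitrary, and the conditioning on a saliency-aware visual latent described in the abstract is omitted entirely.

```python
import numpy as np


def toy_denoise_step(noisy, context, strength=0.3):
    """Hypothetical stand-in for a learned denoiser.

    Pulls every noisy sample a fraction of the way toward the last
    gaze position in the context window. A real model would instead
    predict the noise (or clean signal) with a neural network.
    """
    target = context[-1]
    return noisy + strength * (target - noisy)


def generate_gaze(num_chunks, chunk_len=8, context_len=4,
                  num_steps=10, sample_rate_hz=60.0, seed=0):
    """Roll out a gaze trajectory chunk by chunk (num_chunks >= 1).

    Returns an array of shape (num_chunks * chunk_len, 3) whose columns
    are (timestamp in seconds, x, y), with x and y in [0, 1].
    """
    rng = np.random.default_rng(seed)
    # Assumed starting point: gaze resting at the screen centre.
    history = np.full((context_len, 2), 0.5)
    chunks = []
    for _ in range(num_chunks):
        context = history[-context_len:]
        # Start each chunk from pure Gaussian noise, then denoise it.
        x = rng.standard_normal((chunk_len, 2))
        for _ in range(num_steps):
            x = toy_denoise_step(x, context)
        x = np.clip(x, 0.0, 1.0)  # keep coordinates on screen
        chunks.append(x)
        # Slide the context window so the next chunk sees this one.
        history = np.concatenate([history, x])[-context_len:]
    xy = np.concatenate(chunks)
    t = np.arange(len(xy)) / sample_rate_hz
    return np.column_stack([t, xy])


trajectory = generate_gaze(num_chunks=5)
```

Because each chunk depends only on a bounded context window, memory use stays constant as `num_chunks` grows, which is the property that lets this style of rollout run over videos of arbitrary length.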

Taxonomy

Topics: Visual Attention and Saliency Detection · Gaze Tracking and Assistive Technology · Multimodal Machine Learning Applications