StreamGaze: Gaze-Guided Temporal Reasoning and Proactive Understanding in Streaming Videos

Daeun Lee; Subhojyoti Mukherjee; Branislav Kveton; Ryan A. Rossi; Viet Dac Lai; Seunghyun Yoon; Trung Bui; Franck Dernoncourt; Mohit Bansal

arXiv:2512.01707 · cs.CV · May 14, 2026

TL;DR

StreamGaze introduces a new benchmark to evaluate how well Multimodal Large Language Models utilize gaze signals for temporal reasoning and proactive understanding in streaming videos, highlighting current limitations.

Contribution

This work presents the first benchmark specifically designed to assess gaze-guided reasoning in streaming video understanding with MLLMs, including a novel QA generation pipeline.

Findings

01. MLLMs lag behind humans in gaze-based streaming video tasks.
02. Gaze-guided reasoning reveals key limitations in current models.
03. Analysis suggests directions for improving gaze utilization in models.

Abstract

Streaming video understanding requires models not only to process temporally incoming frames, but also to anticipate user intention for realistic applications such as Augmented Reality (AR) glasses. While prior streaming benchmarks evaluate temporal reasoning, none measure whether Multimodal Large Language Models (MLLMs) can interpret or leverage human gaze signals within a streaming setting. To fill this gap, we introduce StreamGaze, the first benchmark designed to evaluate how effectively MLLMs utilize gaze for temporal and proactive reasoning in streaming videos. StreamGaze introduces gaze-guided past, present, and proactive tasks that comprehensively assess streaming video understanding. These tasks evaluate whether models can use real-time gaze signals to follow shifting attention and infer user intentions based only on past and currently observed frames. To build StreamGaze, we…

Code & Models

Repositories

daeunni/StreamGaze (GitHub)
