TL;DR
Gaze4HRI introduces a comprehensive benchmark and dataset for zero-shot 3D gaze estimation in human-robot interaction, showing that data diversity matters more than complex modeling.
Contribution
The paper presents a large-scale, HRI-specific gaze dataset and benchmark, and shows that training-data diversity, rather than model complexity, is the key to robustness.
Findings
All evaluated methods fail in at least one HRI condition.
PureGaze trained on ETH-XGaze maintains robustness across conditions.
Data diversity outweighs complex modeling for zero-shot gaze estimation.
Abstract
While zero-shot appearance-based 3D gaze estimation offers significant cost-efficiency by directly mapping RGB images to gaze vectors, its reliability in Human-Robot Interaction (HRI) settings remains uncertain. Existing benchmarks frequently overlook fundamental HRI conditions, such as dynamic camera viewpoints and moving targets in video. Furthermore, current cross-dataset evaluations often suffer from a complexity gap, where methods trained on diverse datasets are tested on significantly smaller and less varied sets, failing to assess true robustness. To bridge these gaps, we introduce Gaze4HRI, a large-scale dataset (50+ subjects, 3,000+ videos, 600,000+ frames) designed to evaluate state-of-the-art performance against critical HRI variables: illumination, head-gaze conflict, as well as the motion of camera and gaze target in video. Our benchmark reveals that all evaluated methods…
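The abstract describes methods that map an RGB image directly to a 3D gaze vector, but this summary does not say how Gaze4HRI scores those vectors. A common convention in 3D gaze estimation is the angle between the predicted and ground-truth gaze directions. The minimal sketch below assumes that metric. The per-condition grouping, with hypothetical labels such as "illumination" or "camera_motion", only illustrates how a "fails in at least one condition" comparison might be tabulated; it is not the benchmark's published protocol.

```python
import math
from collections import defaultdict
from typing import Dict, Iterable, Sequence, Tuple

Vec3 = Sequence[float]


def angular_error_deg(pred: Vec3, gt: Vec3) -> float:
    """Angle in degrees between a predicted and a ground-truth 3D gaze vector."""
    if len(pred) != 3 or len(gt) != 3:
        raise ValueError("expected 3D vectors")
    dot = sum(p * g for p, g in zip(pred, gt))
    norm_p = math.sqrt(sum(p * p for p in pred))
    norm_g = math.sqrt(sum(g * g for g in gt))
    if norm_p == 0.0 or norm_g == 0.0:
        raise ValueError("gaze vectors must be non-zero")
    # Clamp to [-1, 1] so floating-point drift cannot push acos out of its domain.
    cos_sim = max(-1.0, min(1.0, dot / (norm_p * norm_g)))
    return math.degrees(math.acos(cos_sim))


def mean_error_by_condition(
    records: Iterable[Tuple[str, Vec3, Vec3]],
) -> Dict[str, float]:
    """Mean angular error per condition label.

    Each record is (condition, predicted_vector, ground_truth_vector).
    Condition labels are hypothetical placeholders, not Gaze4HRI's own names.
    """
    sums: Dict[str, float] = defaultdict(float)
    counts: Dict[str, int] = defaultdict(int)
    for condition, pred, gt in records:
        sums[condition] += angular_error_deg(pred, gt)
        counts[condition] += 1
    return {c: sums[c] / counts[c] for c in sums}


if __name__ == "__main__":
    # Hypothetical predictions, for illustration only.
    records = [
        ("illumination", (0.0, 0.0, -1.0), (0.0, 0.0, -1.0)),
        ("camera_motion", (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)),
    ]
    print(mean_error_by_condition(records))
```

Reporting error per condition rather than as one pooled mean is what would expose a method that performs well on average but breaks down under a single HRI variable.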
