Gaze Beyond the Frame: Forecasting Egocentric 3D Visual Span
Heeseung Yun, Joonil Na, Jaeyeon Kim, Calvin Murdock, Gunhee Kim

TL;DR
This paper introduces EgoSpanLift, a novel 3D forecasting method that predicts where a person wearing an egocentric device will focus their visual attention next, leveraging 3D scene understanding and a large new benchmark dataset.
Contribution
EgoSpanLift transforms 2D gaze forecasting into 3D, integrating SLAM, volumetric analysis, and deep learning models, and provides a comprehensive egocentric 3D visual span benchmark.
Findings
Outperforms existing 2D gaze prediction baselines
Achieves accurate 3D visual span forecasting
Works effectively when projected onto 2D images
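As a rough illustration of what projecting a 3D forecast onto a 2D image can involve, the sketch below maps 3D points in the camera frame to pixel coordinates. It assumes an ideal pinhole camera; the function name `project_to_image` and the intrinsics `fx`, `fy`, `cx`, `cy` are hypothetical and not taken from the paper, and real egocentric cameras often need a fisheye or other distortion model instead.

```python
import numpy as np

def project_to_image(points_cam, fx, fy, cx, cy):
    """Project 3D points (camera frame, z forward) to pixel coordinates.

    Assumes an ideal pinhole model with no lens distortion.
    Returns an (M, 2) array of (u, v) pixels for the points in front of the camera.
    """
    points_cam = np.asarray(points_cam, dtype=float)
    # Points at or behind the image plane cannot be projected.
    in_front = points_cam[:, 2] > 0
    x, y, z = points_cam[in_front].T
    u = fx * x / z + cx
    v = fy * y / z + cy
    return np.stack([u, v], axis=1)
```

Under these assumptions, a point on the optical axis lands on the principal point `(cx, cy)`, and points behind the camera are dropped rather than projected.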
Abstract
People continuously perceive and interact with their surroundings based on underlying intentions that drive their exploration and behaviors. While research in egocentric user and scene understanding has focused primarily on motion and contact-based interaction, forecasting human visual perception itself remains less explored despite its fundamental role in guiding human actions and its implications for AR/VR and assistive technologies. We address the challenge of egocentric 3D visual span forecasting, predicting where a person's visual perception will focus next within their three-dimensional environment. To this end, we propose EgoSpanLift, a novel method that transforms egocentric visual span forecasting from 2D image planes to 3D scenes. EgoSpanLift converts SLAM-derived keypoints into gaze-compatible geometry and extracts volumetric visual span regions. We further combine…
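To make the idea of a volumetric visual span region more concrete, here is a minimal sketch, not the authors' implementation: it keeps the 3D keypoints that fall inside a cone around the gaze direction and marks the voxels they occupy. The function names, the cone half-angle, and the grid parameters are all assumptions chosen for illustration; the abstract does not specify how EgoSpanLift defines its geometry.

```python
import numpy as np

def points_in_gaze_cone(points, eye, gaze_dir, half_angle_deg):
    """Boolean mask of points lying inside a cone with apex `eye` and axis `gaze_dir`.

    The cone shape and half-angle are illustrative assumptions.
    """
    points = np.asarray(points, dtype=float)
    gaze_dir = np.asarray(gaze_dir, dtype=float)
    gaze_dir = gaze_dir / np.linalg.norm(gaze_dir)
    offsets = points - np.asarray(eye, dtype=float)
    dists = np.linalg.norm(offsets, axis=1)
    # Cosine of the angle between each eye-to-point vector and the gaze axis.
    cos_angle = (offsets @ gaze_dir) / np.maximum(dists, 1e-9)
    return cos_angle >= np.cos(np.deg2rad(half_angle_deg))

def voxelize(points, origin, voxel_size, grid_shape):
    """Mark the voxels of a regular grid that contain at least one point."""
    grid = np.zeros(grid_shape, dtype=bool)
    idx = np.floor((np.asarray(points, dtype=float) - origin) / voxel_size).astype(int)
    # Discard points that fall outside the grid bounds.
    inside = np.all((idx >= 0) & (idx < np.array(grid_shape)), axis=1)
    idx = idx[inside]
    grid[idx[:, 0], idx[:, 1], idx[:, 2]] = True
    return grid
```

A caller would first filter keypoints with `points_in_gaze_cone` and then pass the surviving points to `voxelize`; a forecasting model could then take a sequence of such occupancy grids as input, though that step is outside this sketch.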
Taxonomy
Topics: Visual Attention and Saliency Detection · Gaze Tracking and Assistive Technology · Face Recognition and Perception
