GazeShift: Unsupervised Gaze Estimation and Dataset for VR
Gil Shapira, Ishay Goldin, Evgeny Artyomov, Donghoon Kim, Yosi Keller, Niv Zehngut

TL;DR
GazeShift is an unsupervised, real-time gaze estimation framework for VR headsets. Trained on the large-scale VRGaze dataset, it reaches 1.84° mean error with few-shot calibration while using far fewer parameters and FLOPs than baseline methods.
Contribution
The paper presents GazeShift, a novel unsupervised gaze estimation method specifically designed for near-eye VR imagery, along with the VRGaze dataset for training and evaluation.
Findings
Achieves 1.84° mean error with few-shot calibration.
Runs inference in 5 ms on a VR headset GPU.
Uses 10x fewer parameters and 35x fewer FLOPs than baseline methods.
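
The mean error above is reported in degrees. This summary does not spell out the paper's evaluation protocol, so the following is only a minimal sketch of the common definition of angular error: the angle between a predicted and a reference 3D gaze direction, averaged over samples. The function names and the 3D unit-vector representation are assumptions made for illustration, not taken from the paper.

```python
import math


def angular_error_deg(pred, target):
    """Angle in degrees between two non-zero 3D gaze direction vectors."""
    dot = sum(p * t for p, t in zip(pred, target))
    norm = math.sqrt(sum(p * p for p in pred)) * math.sqrt(sum(t * t for t in target))
    if norm == 0.0:
        raise ValueError("gaze vectors must be non-zero")
    # Clamp to [-1, 1] so rounding error cannot push acos out of its domain.
    cos_angle = max(-1.0, min(1.0, dot / norm))
    return math.degrees(math.acos(cos_angle))


def mean_angular_error_deg(preds, targets):
    """Average angular error over paired predictions and references."""
    errors = [angular_error_deg(p, t) for p, t in zip(preds, targets)]
    if not errors:
        raise ValueError("need at least one sample")
    return sum(errors) / len(errors)
```

Because the vectors are normalized inside the function, predictions do not need to be unit length before they are compared.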
Abstract
Gaze estimation is instrumental in modern virtual reality (VR) systems. Despite significant progress in remote-camera gaze estimation, VR gaze research remains constrained by data scarcity, particularly the lack of large-scale, accurately labeled datasets captured with the off-axis camera configurations typical of modern headsets. Gaze annotation is difficult since fixation on intended targets cannot be guaranteed. To address these challenges, we introduce VRGaze, the first large-scale off-axis gaze estimation dataset for VR, comprising 2.1 million near-eye infrared images collected from 68 participants. We further propose GazeShift, an attention-guided unsupervised framework for learning gaze representations without labeled data. Unlike prior redirection-based methods that rely on multi-view or 3D geometry, GazeShift is tailored to near-eye imagery, achieving effective gaze-appearance…
Taxonomy
Topics: Gaze Tracking and Assistive Technology · Hand Gesture Recognition Systems · Visual Attention and Saliency Detection
