TL;DR
This paper introduces a new video dataset and a method for end-to-end eye tracking that combines information from on-screen visual stimuli and eye images to improve gaze estimation accuracy without requiring user-specific labeled data.
Contribution
The paper presents a novel dataset and a method that explicitly learn semantic and temporal eye-gaze relationships, enabling label-free refinement and improved accuracy in webcam-based eye tracking.
Findings
Achieved up to 28% improvement in Point-of-Gaze accuracy
Reduced angular error to 2.49 degrees
Demonstrated performance comparable to supervised personalization
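For readers unfamiliar with the metrics in these findings, the following is a minimal sketch of how angular error and relative improvement are conventionally computed in gaze estimation. It assumes gaze is represented as a 3D direction vector. The function names are hypothetical, and the paper's exact evaluation protocol is not given in this summary, so this may differ from what the authors used.

```python
import math


def angular_error_deg(pred, true):
    """Angle in degrees between a predicted and a ground-truth 3D gaze direction.

    Assumption: both inputs are 3-element sequences; they need not be unit length.
    """
    dot = sum(p * t for p, t in zip(pred, true))
    norm_p = math.sqrt(sum(p * p for p in pred))
    norm_t = math.sqrt(sum(t * t for t in true))
    if norm_p == 0 or norm_t == 0:
        raise ValueError("gaze vectors must be non-zero")
    # Clamp to [-1, 1] so rounding error cannot push acos out of its domain.
    cos_angle = max(-1.0, min(1.0, dot / (norm_p * norm_t)))
    return math.degrees(math.acos(cos_angle))


def relative_improvement_pct(baseline_error, refined_error):
    """Percentage reduction in error relative to a baseline (hypothetical helper)."""
    return 100.0 * (baseline_error - refined_error) / baseline_error
```

Under this convention, a lower angular error is better, and a positive relative improvement means the refined estimate has a smaller error than the baseline.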
Abstract
Estimating eye-gaze from images alone is a challenging task, in large part due to unobservable person-specific factors. Achieving high accuracy typically requires labeled data from test users, which may not be attainable in real applications. We observe that there exists a strong relationship between what users are looking at and the appearance of the user's eyes. In response to this understanding, we propose a novel dataset and accompanying method which aims to explicitly learn these semantic and temporal relationships. Our video dataset consists of time-synchronized screen recordings, user-facing camera views, and eye gaze data, which allows for new benchmarks in temporal gaze tracking as well as label-free refinement of gaze. Importantly, we demonstrate that the fusion of information from visual stimuli as well as eye images can lead towards achieving performance similar to…
