Rapidly deploying on-device eye tracking by distilling visual foundation models
Cheng Jiang, Jogendra Kundu, David Colmenares, Fengting Yang, Joseph Robinson, Yatong An, Ali Behrooz

TL;DR
This paper introduces DistillGaze, a framework that adapts visual foundation models for high-accuracy, on-device eye tracking using labeled synthetic data and unlabeled real data, so that gaze estimation can be redeployed quickly when camera placement, camera pose, or illumination change across device generations.
Contribution
The paper presents a two-stage distillation process that combines synthetic supervision with unlabeled real data to train a high-accuracy gaze estimator quickly for new device hardware.
Findings
Reduces median gaze error by 58.62% compared to synthetic-only baselines.
Maintains a lightweight 256K-parameter model suitable for real-time deployment.
Demonstrates effective adaptation to hardware changes using synthetic and unlabeled real data.
Abstract
Eye tracking (ET) plays a critical role in augmented and virtual reality applications. However, rapidly deploying high-accuracy, on-device gaze estimation for new products remains challenging because hardware configurations (e.g., camera placement, camera pose, and illumination) often change across device generations. Visual foundation models (VFMs) are a promising direction for rapid training and deployment, and they excel on natural-image benchmarks; yet we find that off-the-shelf VFMs still struggle to achieve high accuracy on specialized near-eye infrared imagery. To address this gap, we introduce DistillGaze, a framework that distills a foundation model by leveraging labeled synthetic data and unlabeled real data for rapid and high-performance on-device gaze estimation. DistillGaze proceeds in two stages. First, we adapt a VFM into a domain-specialized teacher using self-supervised…
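As a rough illustration of how labeled synthetic data and unlabeled real data could be combined when training a small student model, the sketch below sums a supervised term on synthetic samples and a teacher-matching term on real samples. This is an assumption-laden sketch, not the paper's objective: the abstract above is cut off before the training details, and the function names (`angular_error_deg`, `combined_loss`), the angular-error loss, and the `distill_weight` parameter are all hypothetical.

```python
import math
from statistics import fmean


def angular_error_deg(pred, target):
    """Angle in degrees between two 3D gaze direction vectors."""
    dot = sum(p * t for p, t in zip(pred, target))
    norm = math.sqrt(sum(p * p for p in pred)) * math.sqrt(sum(t * t for t in target))
    # Clamp to [-1, 1] to guard against floating-point drift before acos.
    cos = max(-1.0, min(1.0, dot / norm))
    return math.degrees(math.acos(cos))


def combined_loss(student, teacher, synthetic_batch, real_batch, distill_weight=1.0):
    """Hypothetical objective: supervised term plus weighted distillation term.

    student, teacher: callables mapping an image to a 3D gaze vector.
    synthetic_batch:  list of (image, ground_truth_gaze) pairs.
    real_batch:       list of unlabeled images.
    """
    # Supervised term: student prediction vs. synthetic ground-truth label.
    supervised = fmean(angular_error_deg(student(x), y) for x, y in synthetic_batch)
    # Distillation term: student prediction vs. teacher prediction on real images.
    distill = fmean(angular_error_deg(student(x), teacher(x)) for x in real_batch)
    return supervised + distill_weight * distill
```

In this sketch the real images need no labels because the teacher's prediction stands in as the target; how the paper actually weights or schedules its terms is not stated in the text available here.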
