Hybrid-Domain Adaptative Representation Learning for Gaze Estimation
Qida Tan, Hongyu Yang, Wenchao Du

TL;DR
This paper introduces HARL, a hybrid-domain adaptation framework for gaze estimation that disentangles gaze-relevant features and leverages head-pose constraints, achieving state-of-the-art accuracy across multiple datasets.
Contribution
The novel HARL framework effectively disentangles gaze features and incorporates head-pose information, improving cross-domain gaze estimation without significant computational costs.
Findings
Achieves state-of-the-art accuracy on EyeDiap, MPIIFaceGaze, and Gaze360 datasets.
Effectively disentangles gaze-relevant features from low-quality images.
Demonstrates strong cross-dataset generalization performance.
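Accuracy on gaze benchmarks such as EyeDiap, MPIIFaceGaze, and Gaze360 is conventionally reported as mean angular error in degrees between predicted and ground-truth 3D gaze directions (lower is better). The sketch below computes that metric. It assumes gaze is given as (pitch, yaw) in radians and uses one common conversion to a 3D unit vector. That convention is an assumption, and the function names are hypothetical rather than taken from the HARL code.

```python
import math
from typing import Sequence, Tuple

Vec3 = Tuple[float, float, float]


def pitchyaw_to_vector(pitch: float, yaw: float) -> Vec3:
    """Convert (pitch, yaw) in radians to a 3D unit gaze vector.

    Assumed convention: (0, 0) looks straight at the camera along -z.
    Datasets differ here, so check the convention before comparing numbers.
    """
    return (
        -math.cos(pitch) * math.sin(yaw),
        -math.sin(pitch),
        -math.cos(pitch) * math.cos(yaw),
    )


def angular_error_deg(a: Sequence[float], b: Sequence[float]) -> float:
    """Angle in degrees between two 3D vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    # Clamp to [-1, 1] so rounding noise cannot push acos out of its domain.
    cos_sim = max(-1.0, min(1.0, dot / (norm_a * norm_b)))
    return math.degrees(math.acos(cos_sim))


def mean_angular_error(
    preds: Sequence[Tuple[float, float]], gts: Sequence[Tuple[float, float]]
) -> float:
    """Mean angular error (degrees) over paired (pitch, yaw) predictions."""
    errors = [
        angular_error_deg(pitchyaw_to_vector(*p), pitchyaw_to_vector(*g))
        for p, g in zip(preds, gts)
    ]
    return sum(errors) / len(errors)
```

For example, a prediction of (0, 0) against a ground truth of (0, 10°) yields an error of 10 degrees.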
Abstract
Appearance-based gaze estimation, which aims to predict an accurate 3D gaze direction from a single facial image, has made promising progress in recent years. However, most methods suffer significant performance degradation in cross-domain evaluation because of interference from gaze-irrelevant factors such as expressions, wearables, and image quality. To alleviate this problem, we present a novel Hybrid-domain Adaptative Representation Learning (HARL) framework that exploits multi-source hybrid datasets to learn robust gaze representations. Specifically, we propose to disentangle the gaze-relevant representation of low-quality facial images by aligning it with features extracted from high-quality near-eye images in an unsupervised domain-adaptation manner, which adds almost no computational or inference cost. Additionally, we analyze the effect of head pose and design a simple yet…
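The abstract describes aligning features of low-quality facial images with features of high-quality near-eye images in an unsupervised domain-adaptation manner, without naming the alignment criterion. As a minimal sketch under that gap, the code below uses a swapped-in, standard unsupervised alignment objective: the biased Gaussian-kernel Maximum Mean Discrepancy (MMD) between a batch of face features and a batch of near-eye features. MMD is an assumption, not confirmed as HARL's loss, and `gaussian_kernel`, `mmd2`, and `sigma` are hypothetical names.

```python
import math
from typing import Sequence

Vector = Sequence[float]


def gaussian_kernel(x: Vector, y: Vector, sigma: float = 1.0) -> float:
    """RBF kernel k(x, y) = exp(-||x - y||^2 / (2 * sigma^2))."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


def _mean_kernel(xs: Sequence[Vector], ys: Sequence[Vector], sigma: float) -> float:
    """Average kernel value over all pairs (x, y) with x in xs and y in ys."""
    total = sum(gaussian_kernel(x, y, sigma) for x in xs for y in ys)
    return total / (len(xs) * len(ys))


def mmd2(
    face_feats: Sequence[Vector], eye_feats: Sequence[Vector], sigma: float = 1.0
) -> float:
    """Biased estimate of squared MMD between two feature batches.

    Hypothetical alignment loss: face_feats would come from the face encoder
    being trained, eye_feats from a near-eye branch used only during training.
    No labels are needed, which is what makes the alignment unsupervised.
    """
    return (
        _mean_kernel(face_feats, face_feats, sigma)
        + _mean_kernel(eye_feats, eye_feats, sigma)
        - 2.0 * _mean_kernel(face_feats, eye_feats, sigma)
    )


face = [[0.0, 0.0], [1.0, 0.0]]
print(mmd2(face, [[0.1, 0.0], [1.1, 0.0]]))  # small: distributions nearly match
print(mmd2(face, [[5.0, 5.0], [6.0, 5.0]]))  # large: distributions are far apart
```

In a sketch like this, the alignment term would be added to the supervised gaze loss during training. Because the near-eye branch is needed only to compute that term, inference would run the face encoder alone, which is consistent with the abstract's claim of negligible inference cost.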
Taxonomy
Topics: Gaze Tracking and Assistive Technology · Face recognition and analysis · Visual Attention and Saliency Detection
