TL;DR
This paper introduces a robust, real-time appearance-based gaze estimation framework that enhances generalization in unconstrained scenarios through data augmentation, multi-task learning, and new benchmark datasets, enabling mobile device deployment.
Contribution
It proposes an augmentation and multi-task learning approach that improves the robustness of appearance-based gaze estimation (AGE) without extra human annotations, and curates new, challenging datasets for evaluation.
Findings
Achieves competitive accuracy with less than 1% of the parameters of UniGaze-H.
Enhances gaze estimation robustness in unconstrained conditions.
Provides new benchmarks for evaluating gaze robustness.
Abstract
Appearance-based gaze estimation (AGE) has achieved remarkable performance in constrained settings, yet we reveal a significant generalization gap where existing AGE models often fail in practical, unconstrained scenarios, particularly those involving facial wearables and poor lighting conditions. We attribute this failure to two core factors: limited image diversity and inconsistent label fidelity across different datasets, especially along the pitch axis. To address these, we propose a robust AGE framework that enhances generalization without requiring additional human-annotated data. First, we expand the image manifold via an ensemble of augmentation techniques, including synthesis of eyeglasses, masks, and varied lighting. Second, to mitigate the impact of anisotropic inter-dataset label deviation, we reformulate gaze regression as a multi-task learning problem, incorporating…
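The abstract lists varied lighting among the augmentations used to expand the image manifold. As a rough illustration of that general idea only, the sketch below applies a random gamma curve and gain to a face crop. The function name `augment_lighting`, the parameter ranges, and the gamma/gain formulation are all assumptions made for this example; the paper's actual lighting synthesis is not described in the text above and may differ substantially.

```python
import numpy as np


def augment_lighting(image, rng, gamma_range=(0.5, 2.0), gain_range=(0.6, 1.2)):
    """Randomly relight a float image with values in [0, 1].

    Hypothetical helper: gamma > 1 darkens mid-tones, gamma < 1 brightens
    them, and gain scales overall exposure. The ranges are illustrative
    defaults, not values taken from the paper.
    """
    gamma = rng.uniform(*gamma_range)
    gain = rng.uniform(*gain_range)
    out = gain * np.power(image, gamma)
    # Clip so the result stays a valid normalized image.
    return np.clip(out, 0.0, 1.0)


rng = np.random.default_rng(0)
face = rng.random((64, 64, 3))  # stand-in for a normalized face crop
augmented = augment_lighting(face, rng)
```

A photometric change like this leaves the gaze label untouched, so it adds image diversity without any new annotation, which is the property the abstract emphasizes.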
