UniGaze: Towards Universal Gaze Estimation via Large-scale Pre-Training
Jiawei Qin, Xucong Zhang, Yusuke Sugano

TL;DR
UniGaze introduces a large-scale self-supervised pre-training approach for gaze estimation, significantly enhancing cross-domain generalization and reducing dependence on labeled data by leveraging diverse in-the-wild facial datasets.
Contribution
UniGaze is the first work to apply large-scale self-supervised pre-training to gaze estimation. It demonstrates improved cross-domain performance and identifies the factors that make pre-training effective for this task.
Findings
Self-supervised pre-training improves cross-domain gaze estimation.
Self-supervised pre-training approaches designed for semantic tasks fail when applied to gaze estimation.
UniGaze outperforms existing models in evaluations across diverse datasets.
Abstract
Despite decades of research on data collection and model architectures, current gaze estimation models encounter significant challenges in generalizing across diverse data domains. Recent advances in self-supervised pre-training have shown remarkable generalization performance across various vision tasks. However, their effectiveness in gaze estimation remains unexplored. We propose UniGaze, which for the first time leverages large-scale in-the-wild facial datasets for gaze estimation through self-supervised pre-training. Through systematic investigation, we clarify the critical factors that are essential for effective pre-training in gaze estimation. Our experiments reveal that self-supervised approaches designed for semantic tasks fail when applied to gaze estimation, while our carefully designed pre-training pipeline consistently improves cross-domain performance. Through comprehensive…
Taxonomy
Topics: Gaze Tracking and Assistive Technology · Hand Gesture Recognition Systems · Facial Nerve Paralysis Treatment and Research
Methods: Attention Is All You Need · Label Smoothing · Layer Normalization · Linear Layer · Byte Pair Encoding · Dense Connections · Residual Connection · Multi-Head Attention · Position-Wise Feed-Forward Layer · Adam
