Differential Contrastive Training for Gaze Estimation
Lin Zhang, Yi Tian, XiYun Wang, Wanru Xu, Yi Jin, Yaping Huang

TL;DR
This paper introduces Differential Contrastive Training, a CLIP-based training strategy for gaze estimation, together with a dual-branch network that combines appearance and semantic features to improve accuracy and cross-scenario generalization.
Contribution
It proposes a new training strategy and a dual-branch network architecture that effectively utilize CLIP's semantic understanding for gaze estimation.
Findings
Significant performance improvements on four challenging datasets.
Effective cross-domain gaze estimation demonstrated.
Enhanced semantic understanding improves gaze prediction accuracy.
Abstract
Complex application scenarios have raised critical requirements for precise and generalizable gaze estimation methods. Recently, the pre-trained CLIP model has achieved remarkable performance on various vision tasks, but its potential has not been fully exploited in gaze estimation. In this paper, we propose a novel Differential Contrastive Training strategy, which boosts gaze estimation performance with the help of CLIP. Accordingly, we introduce a Differential Contrastive Gaze Estimation network (DCGaze) composed of a Visual Appearance-aware branch and a Semantic Differential-aware branch. The Visual Appearance-aware branch is essentially a primary gaze estimation network; it incorporates an Adaptive Feature-refinement Unit (AFU) and a Double-head Gaze Regressor (DGR), both of which help the primary network extract informative, gaze-related appearance features. Moreover,…
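For background only: the abstract excerpt above does not specify the differential contrastive objective, so the following is not the DCGaze method. It is a minimal pure-Python sketch of the generic symmetric contrastive loss used in CLIP-style image-text pre-training. The function names, the toy feature vectors, and the fixed temperature of 0.07 are all assumptions made for illustration.

```python
import math

def l2_normalize(v):
    # Scale a vector to unit length so dot products become cosine similarities.
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def log_softmax(row):
    # Numerically stable log-softmax over one row of logits.
    m = max(row)
    lse = m + math.log(sum(math.exp(x - m) for x in row))
    return [x - lse for x in row]

def clip_style_contrastive_loss(image_feats, text_feats, temperature=0.07):
    """Symmetric cross-entropy over a similarity matrix (generic CLIP-style loss).

    Assumption: row k of image_feats and row k of text_feats form the
    matched (positive) pair; every other combination is a negative.
    """
    imgs = [l2_normalize(v) for v in image_feats]
    txts = [l2_normalize(v) for v in text_feats]
    n = len(imgs)
    # logits[i][j] = cosine similarity of image i and text j, scaled by 1/temperature.
    logits = [[sum(a * b for a, b in zip(i, t)) / temperature for t in txts]
              for i in imgs]
    # Image-to-text direction: each row should peak on its diagonal entry.
    loss_i2t = -sum(log_softmax(logits[k])[k] for k in range(n)) / n
    # Text-to-image direction: the same check applied to columns.
    cols = [[logits[r][c] for r in range(n)] for c in range(n)]
    loss_t2i = -sum(log_softmax(cols[k])[k] for k in range(n)) / n
    return 0.5 * (loss_i2t + loss_t2i)

# Toy usage: matched pairs should score a much lower loss than swapped pairs.
feats = [[1.0, 0.0], [0.0, 1.0]]
loss_aligned = clip_style_contrastive_loss(feats, feats)
loss_swapped = clip_style_contrastive_loss(feats, [[0.0, 1.0], [1.0, 0.0]])
```

In this toy case the aligned pairing drives the loss close to zero, while the swapped pairing is penalized heavily. How DCGaze adapts or replaces this objective would have to be taken from the full paper.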
Taxonomy
Methods: Contrastive Language-Image Pre-training
