Differential Contrastive Training for Gaze Estimation

Lin Zhang; Yi Tian; XiYun Wang; Wanru Xu; Yi Jin; Yaping Huang

arXiv:2502.20128 · cs.CV · July 31, 2025

TL;DR

This paper introduces Differential Contrastive Training, a strategy that uses CLIP to improve the accuracy and generalization of gaze estimation across diverse scenarios. It is implemented as a dual-branch network that combines appearance and semantic features.

Contribution

The paper proposes a new training strategy and a dual-branch network architecture (DCGaze) that use CLIP's semantic understanding for gaze estimation.

Findings

01. Significant performance improvements on four challenging datasets.

02. Effective cross-domain gaze estimation demonstrated.

03. Enhanced semantic understanding improves gaze prediction accuracy.

Abstract

Complex application scenarios demand gaze estimation methods that are both precise and generalizable. Recently, the pre-trained CLIP model has achieved remarkable performance on various vision tasks, but its potential has not been fully exploited in gaze estimation. In this paper, we propose a novel Differential Contrastive Training strategy, which boosts gaze estimation performance with the help of CLIP. Accordingly, we introduce a Differential Contrastive Gaze Estimation network (DCGaze) composed of a Visual Appearance-aware branch and a Semantic Differential-aware branch. The Visual Appearance-aware branch is essentially a primary gaze estimation network. It incorporates an Adaptive Feature-refinement Unit (AFU) and a Double-head Gaze Regressor (DGR), both of which help the primary network extract informative, gaze-related appearance features. Moreover,…

Taxonomy

Methods: Contrastive Language-Image Pre-training