Enhancing 3D Gaze Estimation in the Wild using Weak Supervision with Gaze Following Labels

Pierre Vuillecard; Jean-Marc Odobez

arXiv:2502.20249 · cs.CV · February 28, 2025

Open Access

TL;DR

This paper introduces a novel weakly-supervised framework and a modality-agnostic transformer architecture for improved 3D gaze estimation in real-world environments, leveraging diverse datasets and pseudo-labels.

Contribution

The paper proposes ST-WSGE and Gaze Transformer, enabling better generalization and cross-modal performance in 3D gaze estimation using weak supervision and combined image/video data.

Findings

01 Achieved state-of-the-art results on the Gaze360 and GFIE datasets.

02 Demonstrated superior cross-domain performance over existing methods.

03 Enhanced video gaze estimation accuracy with cross-modal data.

Abstract

Accurate 3D gaze estimation in unconstrained real-world environments remains a significant challenge due to variations in appearance, head pose, occlusion, and the limited availability of in-the-wild 3D gaze datasets. To address these challenges, we introduce a novel Self-Training Weakly-Supervised Gaze Estimation framework (ST-WSGE). This two-stage learning framework leverages diverse 2D gaze datasets, such as gaze-following data, which offer rich variations in appearances, natural scenes, and gaze distributions, and proposes an approach to generate 3D pseudo-labels and enhance model generalization. Furthermore, traditional modality-specific models, designed separately for images or videos, limit the effective use of available training data. To overcome this, we propose the Gaze Transformer (GaT), a modality-agnostic architecture capable of simultaneously learning static and dynamic…
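The abstract does not spell out how the 3D pseudo-labels are built from 2D gaze-following annotations, so the following is only an illustrative sketch under stated assumptions, not the authors' method. It assumes that a first-stage model outputs a 3D unit gaze vector, that the gaze-following label supplies a 2D direction in the image plane (head to gaze target), and that a pseudo-label can be formed by keeping the predicted depth component while taking the in-plane direction from the 2D label. The function names `pseudo_label_3d` and `angular_error_deg` are hypothetical.

```python
import numpy as np


def angular_error_deg(a, b):
    """Angle in degrees between two 3D gaze vectors (a common gaze metric)."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    a = a / np.linalg.norm(a)
    b = b / np.linalg.norm(b)
    cos = np.clip(np.dot(a, b), -1.0, 1.0)  # guard against rounding outside [-1, 1]
    return float(np.degrees(np.arccos(cos)))


def pseudo_label_3d(pred_3d, gaze_2d, eps=1e-8):
    """Hypothetical pseudo-label rule (an assumption, not taken from the paper).

    Keeps the z (depth) component of the model's predicted 3D gaze and
    replaces the x, y direction with the annotated 2D gaze direction.
    """
    pred = np.asarray(pred_3d, dtype=float)
    pred = pred / np.linalg.norm(pred)

    d2 = np.asarray(gaze_2d, dtype=float)
    norm_2d = np.linalg.norm(d2)
    if norm_2d < eps:
        # No usable 2D direction: fall back to the normalized prediction.
        return pred
    d2 = d2 / norm_2d

    z = pred[2]
    # Magnitude left for the in-plane part so the result stays unit length.
    xy_mag = np.sqrt(max(0.0, 1.0 - z * z))
    return np.array([xy_mag * d2[0], xy_mag * d2[1], z])


# Example: the prediction points right and away, the 2D label points along +y.
label = pseudo_label_3d([0.6, 0.0, -0.8], [0.0, 1.0])
```

Under these assumptions, a second training stage could use such vectors as regression targets on the 2D-labeled data alongside genuine 3D labels, with `angular_error_deg` serving as the evaluation measure.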

Taxonomy

Topics: Gaze Tracking and Assistive Technology · Visual Attention and Saliency Detection · Facial Nerve Paralysis Treatment and Research

Methods: Absolute Position Encodings · Dense Connections · Linear Layer · Layer Normalization · Byte Pair Encoding · Residual Connection · Label Smoothing · Attention Is All You Need · Multi-Head Attention · Position-Wise Feed-Forward Layer