TL;DR
This paper introduces a weakly-supervised method for gaze estimation that uses frame-level activity labels from videos of human interactions, specifically the "looking at each other" (LAEO) activity, to improve accuracy and generalization in unconstrained environments.
Contribution
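It proposes a training algorithm and several novel loss functions that derive 3D gaze supervision from "looking at each other" (LAEO) activity labels, addressing the previously unexplored problem of weakly-supervised gaze estimation from videos of human interactions.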
Findings
Significant accuracy improvements in semi-supervised gaze estimation.
Enhanced cross-domain generalization on in-the-wild benchmarks.
Open-sourced code for reproducibility and further research.
Abstract
A major challenge for physically unconstrained gaze estimation is acquiring training data with 3D gaze annotations for in-the-wild and outdoor scenarios. In contrast, videos of human interactions in unconstrained environments are abundantly available and can be much more easily annotated with frame-level activity labels. In this work, we tackle the previously unexplored problem of weakly-supervised gaze estimation from videos of human interactions. We leverage the insight that strong gaze-related geometric constraints exist when people perform the activity of "looking at each other" (LAEO). To acquire viable 3D gaze supervision from LAEO labels, we propose a training algorithm along with several novel loss functions especially designed for the task. With weak supervision from two large-scale activity datasets, CMU-Panoptic and AVA-LAEO, we show significant improvements in (a) the…
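The geometric insight behind LAEO supervision can be illustrated with a small sketch. It is not the paper's loss formulation. It assumes an idealized case in which both gaze vectors and both eye positions are expressed in one shared 3D coordinate frame, so that two people looking at each other have opposite gaze directions, each pointing at the other's eyes. The function names (`laeo_opposition_error_deg`, `laeo_target_error_deg`) are hypothetical and chosen only for illustration.

```python
import numpy as np


def unit(v):
    """Normalize a 3D vector to unit length."""
    v = np.asarray(v, dtype=float)
    return v / np.linalg.norm(v)


def angle_deg(u, v):
    """Angle in degrees between two 3D vectors."""
    cos = np.clip(np.dot(unit(u), unit(v)), -1.0, 1.0)
    return float(np.degrees(np.arccos(cos)))


def laeo_opposition_error_deg(gaze_a, gaze_b):
    """Angle between A's gaze and the reverse of B's gaze.

    Assumption (idealized LAEO): in a shared frame the two gaze
    directions are exactly opposite, so this error is 0.
    """
    return angle_deg(gaze_a, -np.asarray(gaze_b, dtype=float))


def laeo_target_error_deg(gaze_a, eye_a, eye_b):
    """Angle between A's gaze and the line from A's eyes to B's eyes.

    Assumption: 3D eye positions are known in the same frame as the gaze.
    """
    line_of_sight = np.asarray(eye_b, dtype=float) - np.asarray(eye_a, dtype=float)
    return angle_deg(gaze_a, line_of_sight)


if __name__ == "__main__":
    # Hypothetical predictions for two people facing each other along the x-axis.
    gaze_a = [1.0, 0.05, 0.0]
    gaze_b = [-1.0, 0.0, 0.0]
    print(laeo_opposition_error_deg(gaze_a, gaze_b))
    print(laeo_target_error_deg(gaze_a, eye_a=[0, 0, 0], eye_b=[2, 0, 0]))
```

Both errors are zero when the predicted gazes satisfy the idealized constraint, so either quantity could in principle act as a penalty on frames labeled LAEO. In real video, 3D eye positions and a shared frame are not directly available, which is part of what the proposed training algorithm and loss functions have to handle.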
