TL;DR
MTGLS is a multi-task gaze estimation framework that learns from limited supervision and non-annotated facial images, outperforming the unsupervised state of the art on CAVE by 6.43% and the supervised state of the art on Gaze360 by 6.59%.
Contribution
Introduces MTGLS, a multi-task learning approach that distills knowledge from off-the-shelf facial image analysis models and uses three auxiliary signals (pseudo-gaze, head pose, and eye-patch orientation) to improve gaze estimation with limited supervision.
Findings
Outperforms unsupervised state-of-the-art on CAVE by 6.43%.
Outperforms supervised state-of-the-art on Gaze360 by 6.59%.
Learns highly generalized eye feature representations.
Abstract
Robust gaze estimation is a challenging task, even for deep CNNs, due to the non-availability of large-scale labeled data. Moreover, gaze annotation is a time-consuming process and requires specialized hardware setups. We propose MTGLS: a Multi-Task Gaze estimation framework with Limited Supervision, which leverages abundantly available non-annotated facial image data. MTGLS distills knowledge from off-the-shelf facial image analysis models, and learns strong feature representations of human eyes, guided by three complementary auxiliary signals: (a) the line of sight of the pupil (i.e. pseudo-gaze) defined by the localized facial landmarks, (b) the head-pose given by Euler angles, and (c) the orientation of the eye patch (left/right eye). To overcome inherent noise in the supervisory signals, MTGLS further incorporates a noise distribution modelling approach. Our experimental results…
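The abstract names three auxiliary signals but does not give the loss formulation. As a minimal sketch, assuming a plain weighted sum of per-task losses (L1 for pseudo-gaze and head pose, binary cross-entropy for the left/right eye label), the combination might look like the following. All function names, dictionary keys, and weights here are hypothetical, and the noise distribution modelling step is deliberately left out because the abstract does not describe it.

```python
import math


def l1_loss(pred, target):
    """Mean absolute error between two equal-length sequences of floats."""
    return sum(abs(p - t) for p, t in zip(pred, target)) / len(pred)


def binary_cross_entropy(prob_right, is_right_eye, eps=1e-7):
    """Cross-entropy for the eye-patch label (1 = right eye, 0 = left eye)."""
    # Clamp the probability so log() never receives 0.
    p = min(max(prob_right, eps), 1.0 - eps)
    return -(is_right_eye * math.log(p) + (1 - is_right_eye) * math.log(1.0 - p))


def multi_task_loss(pred, target, weights=(1.0, 1.0, 1.0)):
    """Weighted sum of the three auxiliary losses (assumed form, not the paper's)."""
    w_gaze, w_pose, w_eye = weights
    # (a) pseudo-gaze derived from facial landmarks, e.g. (yaw, pitch)
    gaze = l1_loss(pred["pseudo_gaze"], target["pseudo_gaze"])
    # (b) head pose as Euler angles, e.g. (yaw, pitch, roll)
    pose = l1_loss(pred["head_pose"], target["head_pose"])
    # (c) orientation of the eye patch (left/right)
    eye = binary_cross_entropy(pred["eye_prob_right"], target["is_right_eye"])
    return w_gaze * gaze + w_pose * pose + w_eye * eye


# Hypothetical predictions and pseudo-labels for a single eye patch.
example_pred = {
    "pseudo_gaze": [0.1, -0.2],
    "head_pose": [0.0, 0.0, 0.0],
    "eye_prob_right": 0.5,
}
example_target = {
    "pseudo_gaze": [0.0, 0.0],
    "head_pose": [0.0, 0.0, 0.0],
    "is_right_eye": 1,
}
example_loss = multi_task_loss(example_pred, example_target)
```

In this sketch the pseudo-labels in `example_target` stand in for the outputs of the off-the-shelf facial analysis models; equal task weights are an arbitrary starting point, not a reported setting.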
