Recurrent CNN for 3D Gaze Estimation using Appearance and Shape Cues

Cristina Palmero; Javier Selva; Mohammad Ali Bagheri; Sergio Escalera

arXiv:1805.03064 · cs.CV · September 18, 2018 · 26 citations

TL;DR

This paper introduces a multi-modal recurrent CNN approach for 3D gaze estimation that integrates appearance and shape cues, improving on the state of the art on the EYEDIAP dataset by 14.6%, with a further 4% gain when temporal information is included.

Contribution

It presents a novel multi-modal recurrent CNN architecture that combines face, eyes, and landmarks for person- and head pose-independent 3D gaze estimation.

Findings

01. Achieved a 14.6% improvement over the state of the art on the EYEDIAP dataset.

02. Further improved accuracy by 4% using temporal information.

03. Effective across diverse head poses and gaze directions.

Abstract

Gaze behavior is an important non-verbal cue in social signal processing and human-computer interaction. In this paper, we tackle the problem of person- and head pose-independent 3D gaze estimation from remote cameras, using a multi-modal recurrent convolutional neural network (CNN). We propose to combine face, eyes region, and face landmarks as individual streams in a CNN to estimate gaze in still images. Then, we exploit the dynamic nature of gaze by feeding the learned features of all the frames in a sequence to a many-to-one recurrent module that predicts the 3D gaze vector of the last frame. Our multi-modal static solution is evaluated on a wide range of head poses and gaze directions, achieving a significant improvement of 14.6% over the state of the art on the EYEDIAP dataset, further improved by 4% when the temporal modality is included.
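
The data flow described in the abstract (three per-frame streams, fused, then a many-to-one recurrent step that outputs a gaze vector for the last frame) can be sketched roughly as follows. This is a minimal pure-Python illustration and not the authors' implementation. Several things here are assumptions: the per-frame feature vectors are random stand-ins for the CNN stream outputs, fusion is done by simple concatenation, the recurrent cell is a plain tanh cell (the abstract does not name the cell type), the output is normalized to unit length, and all names (`fuse_streams`, `many_to_one_gaze`, the toy sizes) are hypothetical.

```python
import math
import random


def matvec(W, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]


def rand_matrix(rows, cols, rng, scale=0.1):
    """Small random weights; a trained model would learn these."""
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]


def fuse_streams(face_feat, eyes_feat, landmark_feat):
    """Concatenate the per-frame features of the three streams (assumed fusion)."""
    return list(face_feat) + list(eyes_feat) + list(landmark_feat)


def many_to_one_gaze(frames, W_in, W_rec, W_out):
    """Run a tanh recurrent cell over all frames and decode only the last state."""
    h = [0.0] * len(W_rec)
    for face_feat, eyes_feat, landmark_feat in frames:
        x = fuse_streams(face_feat, eyes_feat, landmark_feat)
        pre = [a + b for a, b in zip(matvec(W_in, x), matvec(W_rec, h))]
        h = [math.tanh(p) for p in pre]
    g = matvec(W_out, h)  # 3D gaze vector for the last frame
    norm = math.sqrt(sum(c * c for c in g)) or 1.0
    return [c / norm for c in g]  # unit-length direction (assumption)


rng = random.Random(0)
FACE, EYES, LMK, HIDDEN = 8, 8, 6, 5  # toy sizes, chosen arbitrarily
W_in = rand_matrix(HIDDEN, FACE + EYES + LMK, rng)
W_rec = rand_matrix(HIDDEN, HIDDEN, rng)
W_out = rand_matrix(3, HIDDEN, rng)

# A 4-frame sequence of random stand-in features for the three streams.
sequence = [
    ([rng.random() for _ in range(FACE)],
     [rng.random() for _ in range(EYES)],
     [rng.random() for _ in range(LMK)])
    for _ in range(4)
]
gaze = many_to_one_gaze(sequence, W_in, W_rec, W_out)
print(gaze)
```

The point of the sketch is the many-to-one shape: every frame updates the hidden state, but only the final state is decoded, so the prediction for the last frame can draw on the frames before it. Passing a one-frame sequence reduces it to a still-image estimate on fused features.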

Taxonomy

Topics: Gaze Tracking and Assistive Technology · Hand Gesture Recognition Systems · Advanced Computing and Algorithms