On estimating gaze by self-attention augmented convolutions
Gabriel Lefundes, Luciano Oliveira

TL;DR
This paper introduces ARes-gaze, a gaze estimation network built on self-attention augmented convolutions. By capturing dependencies between distant regions of full-face images, it aims for higher accuracy with a shallower residual network.
Contribution
The paper proposes a new self-attention augmented convolutional architecture for gaze estimation, improving feature learning and accuracy over existing methods.
Findings
Decreased the average angular error by 2.38% on the MPIIFaceGaze dataset
Achieved second place on the EyeDiap dataset
First to perform well on both datasets simultaneously
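The angular error behind these findings is commonly computed as the angle between the predicted and ground-truth 3D gaze directions. The following is a minimal sketch of that metric, assuming gaze is given as 3D direction vectors; the function names are hypothetical and this is not the authors' evaluation code.

```python
import math


def angular_error_deg(pred, target):
    """Angle in degrees between two 3D gaze direction vectors."""
    dot = sum(p * t for p, t in zip(pred, target))
    norm_p = math.sqrt(sum(p * p for p in pred))
    norm_t = math.sqrt(sum(t * t for t in target))
    cos = dot / (norm_p * norm_t)
    # Clamp to [-1, 1] so rounding noise cannot push acos out of its domain.
    cos = max(-1.0, min(1.0, cos))
    return math.degrees(math.acos(cos))


def mean_angular_error_deg(preds, targets):
    """Average angular error over paired predictions and ground truths."""
    errors = [angular_error_deg(p, t) for p, t in zip(preds, targets)]
    return sum(errors) / len(errors)
```

Because both vectors are normalized inside the function, the inputs do not need to be unit length.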
Abstract
Estimation of 3D gaze is highly relevant to multiple fields, including but not limited to interactive systems, specialized human-computer interfaces, and behavioral research. Although deep learning methods have recently boosted the accuracy of appearance-based gaze estimation, there is still room for improvement in the network architectures for this particular task. Therefore, we propose a novel network architecture grounded on self-attention augmented convolutions to improve the quality of the learned features during the training of a shallower residual network. The rationale is that the self-attention mechanism can help outperform deeper architectures by learning dependencies between distant regions in full-face images. This mechanism can also create better and more spatially-aware feature representations derived from the face and eye images before gaze regression. We dubbed our…
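To illustrate the idea of augmenting a convolution with self-attention, here is a minimal NumPy sketch. It is a simplification under stated assumptions, not the ARes-gaze implementation: it uses a single attention head, omits positional encodings, and stands in a 1x1 convolution for the convolutional branch. All names, shapes, and weights are hypothetical.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def attention_augmented_layer(x, w_conv, w_q, w_k, w_v):
    """Concatenate a convolutional branch with a self-attention branch.

    x: feature map of shape (H, W, C).
    Returns the output map (H, W, C_conv + d_v) and the attention weights.
    """
    h, w, c = x.shape
    flat = x.reshape(h * w, c)  # one row per spatial position

    # Convolutional branch (assumption: a 1x1 convolution, which is a
    # per-position matrix multiply; a real layer would use a larger kernel).
    conv_out = flat @ w_conv  # (HW, C_conv)

    # Self-attention branch: every position attends to every other position,
    # so distant regions of the image can influence each other directly.
    q, k, v = flat @ w_q, flat @ w_k, flat @ w_v
    d_k = q.shape[-1]
    weights = softmax(q @ k.T / np.sqrt(d_k), axis=-1)  # (HW, HW)
    attn_out = weights @ v  # (HW, d_v)

    out = np.concatenate([conv_out, attn_out], axis=-1)
    return out.reshape(h, w, -1), weights


# Toy usage with random weights (illustrative sizes only).
rng = np.random.default_rng(0)
x = rng.standard_normal((4, 4, 8))
w_conv = rng.standard_normal((8, 6))
w_q = rng.standard_normal((8, 4))
w_k = rng.standard_normal((8, 4))
w_v = rng.standard_normal((8, 2))
out, weights = attention_augmented_layer(x, w_conv, w_q, w_k, w_v)
```

In this toy setup the output has 6 convolutional channels plus 2 attention channels per position, and each row of the attention matrix is a distribution over all 16 spatial positions.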
Taxonomy
Topics: Gaze Tracking and Assistive Technology · Visual Attention and Saliency Detection · Retinal Imaging and Analysis
Methods: 1x1 Convolution · Kaiming Initialization · Batch Normalization · Residual Connection · Max Pooling · Residual Block · Average Pooling · Bottleneck Residual Block · Convolution
