Gaze Estimation using Transformer

Yihua Cheng; Feng Lu

arXiv:2105.14424 · cs.CV · June 1, 2021 · 1 cites

TL;DR

This paper explores the use of transformer architectures for gaze estimation, comparing pure and hybrid models, and demonstrates that hybrid transformers achieve state-of-the-art results with fewer parameters.

Contribution

It introduces a hybrid transformer model combining CNNs and transformers for gaze estimation, showing superior performance over pure transformers.

Findings

01. The hybrid transformer outperforms the pure transformer on all evaluation datasets.

02. The hybrid transformer achieves state-of-the-art performance with fewer parameters.

03. The self-attention mechanism provides significant advantages in gaze estimation.

Abstract

Recent work has proven the effectiveness of transformers in many computer vision tasks. However, the performance of transformers in gaze estimation is still unexplored. In this paper, we employ transformers and assess their effectiveness for gaze estimation. We consider two forms of vision transformer: pure transformers and hybrid transformers. We first follow the popular ViT and employ a pure transformer to estimate gaze from images. For the hybrid form, we preserve the convolutional layers and integrate CNNs with transformers, so that the transformer serves as a component that complements the CNN. We compare the performance of the two transformers in gaze estimation. The hybrid transformer significantly outperforms the pure transformer on all evaluation datasets with fewer parameters. We further conduct experiments to assess the effectiveness of the hybrid transformer and explore the…
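The hybrid design described in the abstract (convolutional layers first, then a transformer over the resulting feature map) can be sketched roughly as follows. This is a minimal illustration in PyTorch, not the authors' GazeTR implementation: the class name `HybridGazeSketch`, the layer sizes, the 7x7 token grid, and the two-value output (assumed here to be yaw and pitch) are all assumptions chosen for brevity.

```python
import torch
import torch.nn as nn


class HybridGazeSketch(nn.Module):
    """Illustrative CNN + transformer regressor; every size here is an assumption."""

    def __init__(self, dim=32, heads=4, depth=2, grid=7):
        super().__init__()
        # Small convolutional stem standing in for a CNN backbone.
        self.cnn = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, stride=2, padding=1),
            nn.ReLU(),
            nn.Conv2d(16, dim, kernel_size=3, stride=2, padding=1),
            nn.ReLU(),
            nn.AdaptiveAvgPool2d(grid),  # fixed grid x grid feature map
        )
        # Learnable class token and position embeddings, in the style of ViT.
        self.cls_token = nn.Parameter(torch.zeros(1, 1, dim))
        self.pos_embed = nn.Parameter(torch.zeros(1, grid * grid + 1, dim))
        layer = nn.TransformerEncoderLayer(
            d_model=dim, nhead=heads, dim_feedforward=4 * dim, batch_first=True
        )
        self.encoder = nn.TransformerEncoder(layer, num_layers=depth)
        # Two outputs, assumed to be gaze yaw and pitch.
        self.head = nn.Linear(dim, 2)

    def forward(self, images):
        feats = self.cnn(images)                   # (B, dim, grid, grid)
        tokens = feats.flatten(2).transpose(1, 2)  # (B, grid*grid, dim)
        cls = self.cls_token.expand(tokens.size(0), -1, -1)
        tokens = torch.cat([cls, tokens], dim=1) + self.pos_embed
        encoded = self.encoder(tokens)             # self-attention over CNN features
        return self.head(encoded[:, 0])            # regress gaze from the class token
```

Calling `HybridGazeSketch()(torch.randn(2, 3, 224, 224))` returns a tensor of shape `(2, 2)`. A pure transformer in the ViT style would instead split the raw image into patches and embed them linearly, with no convolutional stem.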

Code & Models

Repositories

yihuacheng/GazeTR
PyTorch · Official

Taxonomy

Topics: Gaze Tracking and Assistive Technology · Advanced Computing and Algorithms · Retinal Imaging and Analysis

Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Residual Connection · Layer Normalization · Dense Connections · Softmax · Vision Transformer