Sharingan: A Transformer-based Architecture for Gaze Following

Samy Tafasca; Anshul Gupta; Jean-Marc Odobez

arXiv:2310.00816 · cs.CV · October 3, 2023 · 2 cites

Open Access

TL;DR

This paper introduces a transformer-based model for 2D gaze following that outperforms previous CNN-based methods, enabling accurate multi-person gaze prediction in images and videos.

Contribution

It presents a novel transformer architecture for gaze prediction, including variants for heatmap and point regression, advancing multi-person gaze following capabilities.

Findings

01. Achieves state-of-the-art results on the GazeFollow dataset

02. Enables multi-person gaze prediction with a single model

03. Outperforms CNN-based approaches in accuracy

Abstract

Gaze is a powerful form of non-verbal communication and social interaction that humans develop from an early age. As such, modeling this behavior is an important task that can benefit a broad set of application domains ranging from robotics to sociology. In particular, Gaze Following is defined as the prediction of the pixel-wise 2D location where a person in the image is looking. Prior efforts in this direction have focused primarily on CNN-based architectures to perform the task. In this paper, we introduce a novel transformer-based architecture for 2D gaze prediction. We experiment with 2 variants: the first one retains the same task formulation of predicting a gaze heatmap for one person at a time, while the second one casts the problem as a 2D point regression and allows us to perform multi-person gaze prediction with a single forward pass. This new architecture achieves…
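
To make the two task formulations in the abstract concrete, here is a minimal Python sketch. It is not the authors' implementation. The function names, the Gaussian target, the 64x64 grid size, the argmax decoding, and the L2 error are all illustrative assumptions, since the excerpt above does not specify how the model encodes or scores its outputs. The first part treats gaze as a heatmap for one person, and the second treats it as one normalized 2D point per person.

```python
import math

# Hypothetical helpers for illustration only; none of these names come from the paper.

def make_heatmap(height, width, target_xy, sigma=3.0):
    """Build a Gaussian heatmap centered on a normalized (x, y) gaze target."""
    tx = target_xy[0] * (width - 1)
    ty = target_xy[1] * (height - 1)
    return [
        [math.exp(-((x - tx) ** 2 + (y - ty) ** 2) / (2 * sigma ** 2))
         for x in range(width)]
        for y in range(height)
    ]

def heatmap_to_point(heatmap):
    """Decode a heatmap into a normalized (x, y) point by taking its argmax."""
    height, width = len(heatmap), len(heatmap[0])
    best_y, best_x = max(
        ((y, x) for y in range(height) for x in range(width)),
        key=lambda yx: heatmap[yx[0]][yx[1]],
    )
    return best_x / (width - 1), best_y / (height - 1)

def l2_distance(p, q):
    """Euclidean distance between two normalized 2D points."""
    return math.hypot(p[0] - q[0], p[1] - q[1])

# Formulation 1 (assumed encoding): one heatmap per person, decoded to a point.
target = (0.25, 0.75)
heatmap = make_heatmap(64, 64, target)
decoded = heatmap_to_point(heatmap)

# Formulation 2 (assumed encoding): one regressed point per person,
# so several people can be scored together from a single set of outputs.
predicted_points = [(0.30, 0.70), (0.80, 0.20)]
ground_truth = [(0.25, 0.75), (0.85, 0.25)]
errors = [l2_distance(p, g) for p, g in zip(predicted_points, ground_truth)]
```

In this sketch the heatmap path loses a little precision to the grid resolution, while the point path returns coordinates directly and handles any number of people in one list. That is one plausible reading of why a point-regression output suits multi-person prediction.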

Taxonomy

Topics: Gaze Tracking and Assistive Technology · Neonatal and fetal brain pathology · Hand Gesture Recognition Systems

Methods: Heatmap