A Novel Framework for Multi-Person Temporal Gaze Following and Social Gaze Prediction

Anshul Gupta; Samy Tafasca; Arya Farkhondeh; Pierre Vuillecard; Jean-Marc Odobez

arXiv:2403.10511 · cs.CV · March 18, 2024 · 1 citation

Open Access

TL;DR

This paper introduces a unified, transformer-based framework for joint multi-person gaze following and social gaze prediction, leveraging a new dataset to improve generalization and performance in understanding social interactions.

Contribution

The paper presents a novel joint model and a new dataset, VSGaze, enabling simultaneous prediction of gaze targets and social gaze labels for multiple individuals.

Findings

01

Achieves state-of-the-art results on multi-person gaze following

02

Successfully predicts social gaze labels in complex scenes

03

Demonstrates the effectiveness of a unified temporal transformer approach

Abstract

Gaze following and social gaze prediction are fundamental tasks providing insights into human communication behaviors, intent, and social interactions. Most previous approaches addressed these tasks separately, either by designing highly specialized social gaze models that do not generalize to other social gaze tasks or by treating social gaze inference as ad hoc post-processing of the gaze following output. Furthermore, the vast majority of gaze following approaches have proposed static models that can handle only one person at a time, and therefore fail to take advantage of social interactions and temporal dynamics. In this paper, we address these limitations and introduce a novel framework to jointly predict the gaze target and social gaze label for all people in the scene. The framework comprises: (i) a temporal, transformer-based architecture that, in addition to image…

Taxonomy

Topics: Video Surveillance and Tracking Methods · Gaze Tracking and Assistive Technology · Human Pose and Action Recognition