Gazeformer: Scalable, Effective and Fast Prediction of Goal-Directed Human Attention

Sounak Mondal; Zhibo Yang; Seoyoung Ahn; Dimitris Samaras; Gregory Zelinsky; Minh Hoai

arXiv:2303.15274 · cs.CV · July 4, 2023 · 1 citation

TL;DR

Gazeformer is a transformer-based model that predicts human gaze for unseen objects using natural language encoding, achieving superior accuracy and speed in goal-directed attention tasks, especially in zero-shot scenarios.

Contribution

The paper introduces Gazeformer, a novel zero-shot gaze prediction model that encodes targets via language, overcoming scalability issues of previous detector-based methods.
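The zero-shot setting described here can be illustrated with a small data split in which the searched-for target categories used for evaluation never appear in training. The snippet below is a minimal sketch under stated assumptions, not the authors' pipeline: the record layout (`target`, `fixations`), the function name `zero_shot_split`, and the toy categories are all hypothetical.

```python
def zero_shot_split(scanpaths, held_out_targets):
    """Split scanpath records so held-out target categories never appear in training."""
    held_out = set(held_out_targets)
    train = [s for s in scanpaths if s["target"] not in held_out]
    test = [s for s in scanpaths if s["target"] in held_out]
    return train, test


# Hypothetical toy records: a target category name plus a list of (x, y) fixations.
scanpaths = [
    {"target": "mug", "fixations": [(10, 20), (40, 42)]},
    {"target": "fork", "fixations": [(5, 5), (30, 18)]},
    {"target": "mug", "fixations": [(12, 25)]},
    {"target": "laptop", "fixations": [(60, 70), (64, 71)]},
]

train, test = zero_shot_split(scanpaths, held_out_targets=["fork"])
train_targets = {s["target"] for s in train}
test_targets = {s["target"] for s in test}

# In a zero-shot evaluation the two category sets must not overlap.
print(sorted(train_targets), sorted(test_targets))
```

A model that identifies the target through a trained per-category detector has nothing to apply to the held-out category, whereas a model that receives the target as a language embedding can still be queried with the unseen category name.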

Findings

01 Gazeformer outperforms existing models in ZeroGaze tasks.

02 It surpasses target-detection models on standard gaze prediction.

03 Gazeformer is over five times faster than previous models.

Abstract

Predicting human gaze is important in Human-Computer Interaction (HCI). However, to practically serve HCI applications, gaze prediction models must be scalable, fast, and accurate in their spatial and temporal gaze predictions. Recent scanpath prediction models focus on goal-directed attention (search). Such models are limited in their application due to a common approach relying on trained target detectors for all possible objects, and the availability of human gaze data for their training (both not scalable). In response, we pose a new task called ZeroGaze, a new variant of zero-shot learning where gaze is predicted for never-before-searched objects, and we develop a novel model, Gazeformer, to solve the ZeroGaze problem. In contrast to existing methods using object detector modules, Gazeformer encodes the target using a natural language model, thus leveraging semantic similarities in…

Code & Models

Repositories

cvlab-stonybrook/gazeformer (PyTorch, official)

Taxonomy

Topics: Gaze Tracking and Assistive Technology · Visual Attention and Saliency Detection · Neonatal and fetal brain pathology