MGTR: End-to-End Mutual Gaze Detection with Transformer

Hang Guo; Zhengxi Hu; Jingtai Liu

arXiv:2209.10930 · cs.CV · October 7, 2022 · 1 citation

TL;DR

MGTR introduces an end-to-end transformer-based framework for mutual gaze detection, improving speed and maintaining accuracy by jointly detecting heads and inferring gaze relationships in a single process.

Contribution

The paper presents a novel one-stage transformer-based approach for mutual gaze detection, streamlining the process and enhancing efficiency over traditional two-stage methods.

Findings

1. Accelerates mutual gaze detection without performance loss
2. Effectively captures semantic information at multiple levels
3. Demonstrates superior speed and accuracy on benchmark datasets

Abstract

People looking at each other, known as mutual gaze, is ubiquitous in daily interaction, and detecting mutual gaze is of great significance for understanding human social scenes. Current mutual gaze detection methods rely on two-stage pipelines, whose inference speed is limited by running two stages in sequence and whose second-stage performance depends on the first. In this paper, we propose a novel one-stage mutual gaze detection framework called Mutual Gaze TRansformer, or MGTR, to perform mutual gaze detection in an end-to-end manner. By designing mutual gaze instance triples, MGTR can detect each human head bounding box and simultaneously infer mutual gaze relationships based on global image information, which streamlines the whole process. Experimental results on two mutual gaze datasets show that our method is able to accelerate the mutual gaze detection process…
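To make the idea of a "mutual gaze instance triple" concrete, here is a minimal sketch. It assumes a triple pairs two head bounding boxes with a single mutual gaze score, and that predictions are filtered by a score threshold. The names `GazeTriple`, `decode_triples`, the box layout, and the 0.5 threshold are all hypothetical illustrations; they are not taken from the paper or from the gmbition/mgtr repository.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Assumed box layout: (x_min, y_min, x_max, y_max) in pixels.
Box = Tuple[float, float, float, float]


@dataclass
class GazeTriple:
    """Hypothetical container for one predicted instance."""
    head_a: Box          # first head bounding box
    head_b: Box          # second head bounding box
    mutual_score: float  # assumed probability that the two people look at each other


def decode_triples(raw: List[GazeTriple], threshold: float = 0.5) -> List[GazeTriple]:
    """Keep predictions scoring at or above the threshold, highest score first."""
    kept = [t for t in raw if t.mutual_score >= threshold]
    return sorted(kept, key=lambda t: t.mutual_score, reverse=True)


# Made-up predictions for three head pairs in one image.
predictions = [
    GazeTriple((10, 10, 50, 50), (200, 20, 240, 60), 0.91),
    GazeTriple((10, 10, 50, 50), (400, 30, 440, 70), 0.12),
    GazeTriple((200, 20, 240, 60), (400, 30, 440, 70), 0.67),
]
mutual = decode_triples(predictions)
```

In a one-stage setting, heads and their pairwise relationship would come out of the same forward pass, so a post-processing step like this would only need to filter and rank the joint predictions rather than run a second model on cropped heads.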

Code & Models

Repositories

gmbition/mgtr (PyTorch, official)

Taxonomy

Topics: Gaze Tracking and Assistive Technology · Indoor and Outdoor Localization Technologies · Hand Gesture Recognition Systems

Methods: SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings