Interaction-aware Joint Attention Estimation Using People Attributes
Chihiro Nakatani, Hiroaki Kawashima, Norimichi Ukita

TL;DR
This paper introduces a novel Transformer-based approach for joint attention estimation in images, explicitly modeling interactions among people attributes and improving accuracy over existing methods.
Contribution
It proposes a new interaction-aware Transformer model that explicitly encodes relationships among people attributes for more accurate joint attention estimation.
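The interaction modeling described here can be illustrated with a minimal sketch. It assumes that each person is represented by one attribute embedding (for example, a concatenation of head location, gaze direction, and action features) and that interactions are captured by a single head of scaled dot-product self-attention. The function name `encode_people`, the dimensions, and the mean pooling into one joint-attention feature are hypothetical choices for illustration, not the authors' architecture.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def encode_people(attrs, w_q, w_k, w_v):
    """Single-head self-attention over per-person attribute tokens (hypothetical).

    attrs: (num_people, d) array. Each row is an assumed embedding of one
    person's attributes (e.g. head location, gaze direction, action).
    Returns (num_people, d_v) interaction-aware tokens and the
    (num_people, num_people) attention matrix.
    """
    q = attrs @ w_q
    k = attrs @ w_k
    v = attrs @ w_v
    scores = q @ k.T / np.sqrt(k.shape[-1])  # pairwise person-to-person affinities
    weights = softmax(scores, axis=-1)        # each row sums to 1
    return weights @ v, weights

rng = np.random.default_rng(0)
num_people, d, d_k = 4, 8, 16
attrs = rng.normal(size=(num_people, d))
w_q, w_k, w_v = (rng.normal(size=(d, d_k)) for _ in range(3))
tokens, weights = encode_people(attrs, w_q, w_k, w_v)
# Assumed pooling step: average the person tokens into one low-dimensional feature.
joint_feature = tokens.mean(axis=0)
print(tokens.shape, weights.shape, joint_feature.shape)
```

Each row of `weights` states how strongly one person's token draws on every other person's attributes, which is the kind of pairwise relationship the model is meant to encode explicitly rather than treating each person independently.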
Findings
Outperforms state-of-the-art methods quantitatively
Uses a novel Transformer-based attention network with pixelwise confidence prediction
Integrates with image-based attention for improved results
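How the integration with image-based attention works is not specified in this summary, so the following is only a hypothetical sketch. It assumes that both branches output heatmaps of the same resolution and combines them with a convex weighted average after min-max normalization. `fuse_heatmaps` and the weight `alpha` are illustrative names, not taken from the paper.

```python
import numpy as np

def fuse_heatmaps(people_map, image_map, alpha=0.5):
    """Hypothetical fusion: convex combination of two normalized heatmaps."""
    if people_map.shape != image_map.shape:
        raise ValueError("heatmaps must share the same shape")

    def normalize(m):
        # Min-max normalize to [0, 1] so both maps are on a comparable scale.
        lo, hi = m.min(), m.max()
        return np.zeros_like(m, dtype=float) if hi == lo else (m - lo) / (hi - lo)

    fused = alpha * normalize(people_map) + (1.0 - alpha) * normalize(image_map)
    # (row, col) of the highest fused confidence, as plain Python ints.
    peak = tuple(int(i) for i in np.unravel_index(np.argmax(fused), fused.shape))
    return fused, peak

people_map = np.zeros((5, 5))
people_map[2, 3] = 1.0
image_map = np.zeros((5, 5))
image_map[2, 3] = 0.8
image_map[0, 0] = 1.0
fused, peak = fuse_heatmaps(people_map, image_map)
print(peak)  # (2, 3)
```

In this toy case the image branch alone would peak at (0, 0), but agreement between the two maps at (2, 3) gives that pixel the highest fused confidence (0.9 versus 0.5).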
Abstract
This paper proposes a method for joint attention estimation in a single image. Unlike related work, in which only people's gaze-related attributes are used, each independently, our method (i) also employs people's locations and actions as contextual cues for weighting their attributes and (ii) explicitly models the interactions among all of these attributes. For the interaction modeling, we propose a novel Transformer-based attention network that encodes joint attention as low-dimensional features. We add a specialized MLP head with positional embedding to the Transformer so that it predicts the pixelwise confidence of joint attention, from which the confidence heatmap is generated. This pixelwise prediction improves heatmap accuracy by avoiding the ill-posed problem of predicting a high-dimensional heatmap from low-dimensional features. The estimated joint attention is…
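The pixelwise prediction idea can be sketched as follows. The sketch assumes that the low-dimensional joint-attention feature is concatenated with a 2D sinusoidal positional embedding of each pixel and passed through a two-layer MLP with a sigmoid output. The sinusoidal embedding, the concatenation, the layer sizes, and the function names `sinusoidal_pos_embed` and `pixelwise_head` are assumptions for illustration; the paper's head may differ.

```python
import numpy as np

def sinusoidal_pos_embed(h, w, dim):
    """Assumed 2D sinusoidal positional embedding, shape (h*w, dim); dim divisible by 4."""
    assert dim % 4 == 0
    quarter = dim // 4
    freqs = 1.0 / (10000 ** (np.arange(quarter) / quarter))
    ys, xs = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    ys = ys.reshape(-1, 1) * freqs  # (h*w, quarter)
    xs = xs.reshape(-1, 1) * freqs
    return np.concatenate([np.sin(ys), np.cos(ys), np.sin(xs), np.cos(xs)], axis=1)

def pixelwise_head(feature, h, w, w1, b1, w2, b2, pos_dim):
    """Predict one confidence per pixel from (joint feature, pixel position)."""
    pos = sinusoidal_pos_embed(h, w, pos_dim)                   # (h*w, pos_dim)
    feat = np.broadcast_to(feature, (h * w, feature.shape[0]))  # same feature for every pixel
    x = np.concatenate([feat, pos], axis=1)                     # (h*w, d + pos_dim)
    hidden = np.maximum(x @ w1 + b1, 0.0)                       # ReLU
    logits = hidden @ w2 + b2                                   # (h*w, 1)
    conf = 1.0 / (1.0 + np.exp(-logits))                        # sigmoid -> (0, 1)
    return conf.reshape(h, w)

rng = np.random.default_rng(0)
d, pos_dim, hidden_dim, h, w = 16, 8, 32, 6, 6
feature = rng.normal(size=d)
w1 = rng.normal(scale=0.1, size=(d + pos_dim, hidden_dim))
b1 = np.zeros(hidden_dim)
w2 = rng.normal(scale=0.1, size=(hidden_dim, 1))
b2 = np.zeros(1)
heatmap = pixelwise_head(feature, h, w, w1, b1, w2, b2, pos_dim)
print(heatmap.shape)  # (6, 6)
```

Because the same small MLP is queried once per pixel, its parameter count does not grow with the heatmap size. The network therefore never has to decode an H x W map directly from the low-dimensional feature, which is the ill-posed mapping the abstract says this design avoids.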
Videos
Interaction-aware Joint Attention Estimation Using People Attributes (YouTube)
Taxonomy
Topics: Gaze Tracking and Assistive Technology · Visual Attention and Saliency Detection · Multimodal Machine Learning Applications
Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Softmax · Layer Normalization · Label Smoothing · Adam · Residual Connection · Dense Connections · Dropout
