TL;DR
LAEO-Net++ is a deep CNN that detects mutual gaze between people in videos by reasoning over spatio-temporal head tracks. The paper also introduces two new datasets, UCO-LAEO and AVA-LAEO, and reports state-of-the-art results on the existing TVHID-LAEO dataset.
Contribution
The paper presents LAEO-Net++, a new CNN architecture that reasons over entire head tracks to detect people Looking At Each Other (LAEO). It also contributes two new datasets, UCO-LAEO and AVA-LAEO, and applies LAEO detection to social relationship inference.
Findings
Achieves state-of-the-art results on the existing TVHID-LAEO video dataset.
Determines whether two people are LAEO and identifies the temporal window in which it occurs.
Enables inference of social relationships based on LAEO patterns.
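To make the "temporal window" finding concrete, here is a minimal sketch of one way per-frame LAEO scores could be turned into windows. It is not the paper's method: the function name `laeo_windows`, the per-frame score list, the 0.5 threshold, and the `min_len` filter are all assumptions introduced for illustration.

```python
from typing import List, Tuple


def laeo_windows(scores: List[float], threshold: float = 0.5,
                 min_len: int = 1) -> List[Tuple[int, int]]:
    """Return inclusive (start, end) frame windows where score >= threshold.

    Hypothetical post-processing step: `scores` is assumed to hold one
    LAEO confidence per frame for a single pair of people.
    """
    windows: List[Tuple[int, int]] = []
    start = None  # index of the first frame of the currently open window
    for i, s in enumerate(scores):
        if s >= threshold:
            if start is None:
                start = i
        elif start is not None:
            # The window just closed at frame i - 1; keep it if long enough.
            if i - start >= min_len:
                windows.append((start, i - 1))
            start = None
    # Close a window that runs to the final frame.
    if start is not None and len(scores) - start >= min_len:
        windows.append((start, len(scores) - 1))
    return windows


scores = [0.1, 0.7, 0.9, 0.8, 0.2, 0.6, 0.1]
print(laeo_windows(scores, threshold=0.5, min_len=2))  # [(1, 3)]
```

In this example the single above-threshold frame at index 5 is discarded because it is shorter than `min_len`, which is one simple way to suppress isolated spurious detections.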
Abstract
Capturing the 'mutual gaze' of people is essential for understanding and interpreting the social interactions between them. To this end, this paper addresses the problem of detecting people Looking At Each Other (LAEO) in video sequences. For this purpose, we propose LAEO-Net++, a new deep CNN for determining LAEO in videos. In contrast to previous works, LAEO-Net++ takes spatio-temporal tracks as input and reasons about the whole track. It consists of three branches, one for each character's tracked head and one for their relative position. Moreover, we introduce two new LAEO datasets: UCO-LAEO and AVA-LAEO. A thorough experimental evaluation demonstrates the ability of LAEO-Net++ to successfully determine if two people are LAEO and the temporal window where it happens. Our model achieves state-of-the-art results on the existing TVHID-LAEO video dataset, significantly outperforming…
