TL;DR
This survey reviews recent advances in automated recognition of human-human interactions in videos, emphasizing deep learning methods, challenges, datasets, and future research directions.
Contribution
It provides a comprehensive summary of challenges, datasets, and deep learning-based methods for recognizing human interactions from videos, highlighting recent progress and future outlook.
Findings
The survey focuses on recent methods based on deep learning, in particular convolutional neural networks (CNNs).
Datasets vary in recording settings and interaction types.
The main challenges stem from variation in recording setting, in the appearance of the people depicted, and in how they coordinate their interaction.
Abstract
Many videos depict people, and it is their interactions that inform us of their activities, relation to one another and the cultural and social setting. With advances in human action recognition, researchers have begun to address the automated recognition of these human-human interactions from video. The main challenges stem from dealing with the considerable variation in recording setting, the appearance of the people depicted and the coordinated performance of their interaction. This survey provides a summary of these challenges and datasets to address these, followed by an in-depth discussion of relevant vision-based recognition and detection methods. We focus on recent, promising work based on deep learning and convolutional neural networks (CNNs). Finally, we outline directions to overcome the limitations of the current state-of-the-art to analyze and, eventually, understand social…
