Joint Learning of Social Groups, Individuals Action and Sub-group Activities in Videos
Mahsa Ehsanpour, Alireza Abedin, Fatemeh Saleh, Javen Shi, Ian Reid, Hamid Rezatofighi

TL;DR
This paper introduces an end-to-end framework that simultaneously identifies social groups, individual actions, and group activities in videos, advancing understanding of complex social interactions.
Contribution
It proposes a novel trainable model for social grouping and activity recognition, achieving state-of-the-art results and extending existing datasets with new annotations.
Findings
State-of-the-art performance on social activity benchmarks
Effective social grouping and action prediction in videos
Enhanced dataset annotations for social activity analysis
Abstract
The state-of-the-art solutions for human activity understanding from a video stream formulate the task as a spatio-temporal problem which requires joint localization of all individuals in the scene and classification of their actions or group activity over time. Who is interacting with whom, e.g. not everyone in a queue is interacting with each other, is often not predicted. There are scenarios where people are best split into sub-groups, which we call social groups, and each social group may be engaged in a different social activity. In this paper, we solve the problem of simultaneously grouping people by their social interactions, predicting their individual actions and the social activity of each social group, which we call the social task. Our main contributions are: i) we propose an end-to-end trainable framework for the social task; ii) our proposed method also sets the…
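To make the three outputs of the social task concrete, the sketch below shows one possible way to represent a prediction for a single scene: a social group assignment and an individual action per person, plus one social activity per group. It is an illustration only, not the paper's model or data format; the `Person` class, the `group_members` helper, and the label strings are all hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Person:
    person_id: int
    action: str    # individual action label (hypothetical label set)
    group_id: int  # id of the social group this person is assigned to


def group_members(people):
    """Map each social group id to the sorted ids of its members."""
    groups = defaultdict(list)
    for p in people:
        groups[p.group_id].append(p.person_id)
    return {g: sorted(ids) for g, ids in groups.items()}


# Toy scene (assumed for illustration): two people walking together
# and a third person queueing on their own.
people = [
    Person(person_id=0, action="walking", group_id=0),
    Person(person_id=1, action="walking", group_id=0),
    Person(person_id=2, action="queueing", group_id=1),
]

# One social activity label per social group (hypothetical labels).
social_activity = {0: "walking", 1: "queueing"}

members = group_members(people)
for gid, ids in members.items():
    print(f"group {gid}: members={ids}, activity={social_activity[gid]}")
```

Keeping the group assignment on each person, rather than storing only a list of groups, makes it straightforward to hold all three predictions (grouping, individual action, social activity) side by side for the same scene.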
