Hierarchical Graph-RNNs for Action Detection of Multiple Activities
Sovan Biswas, Yaser Souri, Juergen Gall

TL;DR
This paper introduces a hierarchical graph-RNN framework that models both temporal context and inter-person action relations to improve multi-activity detection in videos, achieving state-of-the-art results on AVA.
Contribution
It presents a novel combination of temporal RNNs and graph RNNs for localized multi-activity detection in videos, integrating scene context and action relations.
Findings
Achieves state-of-the-art results on the AVA dataset.
Models both the temporal scene context and the relations between the actions of detected persons.
Improves spatial localization of activities when each person can perform several activities at once.
Abstract
In this paper, we propose an approach that spatially localizes the activities in a video frame where each person can perform multiple activities at the same time. Our approach takes the temporal scene context as well as the relations of the actions of detected persons into account. While the temporal context is modeled by a temporal recurrent neural network (RNN), the relations of the actions are modeled by a graph RNN. Both networks are trained together, and the proposed approach achieves state-of-the-art results on the AVA dataset.
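The abstract describes two coupled components: a temporal RNN over the scene context and a graph RNN over the detected persons. The following NumPy sketch illustrates one way such a pipeline could be wired for the forward pass. It is not the authors' implementation: the plain tanh recurrences, the fully connected person graph with mean-pooled messages, the use of the last temporal state as context, the number of graph iterations, and every name and dimension (`temporal_rnn`, `graph_rnn_step`, `detect`, `W_in`, `W_m`, and so on) are assumptions made for illustration, and the weights are random rather than trained. The only properties taken from the text are that temporal context and inter-person relations are both used, and that each person receives independent per-class scores because several activities can occur at once.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def temporal_rnn(scene_feats, W_in, W_t):
    """Plain tanh RNN over per-frame scene features (assumed form).

    scene_feats: (T, d_scene). Returns hidden states of shape (T, d_ctx).
    """
    h = np.zeros(W_t.shape[0])
    states = []
    for x in scene_feats:
        h = np.tanh(W_in @ x + W_t @ h)
        states.append(h)
    return np.stack(states)


def graph_rnn_step(person_h, person_feats, context, W_x, W_h, W_m, W_c):
    """One recurrent update of all person nodes (assumed form).

    Each node receives the mean hidden state of the other persons as its
    message, plus the shared temporal scene context.
    """
    n = person_h.shape[0]
    if n > 1:
        total = person_h.sum(axis=0, keepdims=True)
        messages = (total - person_h) / (n - 1)  # mean over the other nodes
    else:
        messages = np.zeros_like(person_h)       # a lone person gets no message
    return np.tanh(
        person_feats @ W_x.T + person_h @ W_h.T + messages @ W_m.T + context @ W_c.T
    )


def detect(scene_feats, person_feats, params, n_iters=2):
    """Return per-person, per-class scores in (0, 1) for the key frame."""
    # Assumption: the last temporal state summarizes the scene context.
    context = temporal_rnn(scene_feats, params["W_in"], params["W_t"])[-1]
    h = np.zeros((person_feats.shape[0], params["W_h"].shape[0]))
    for _ in range(n_iters):
        h = graph_rnn_step(
            h, person_feats, context,
            params["W_x"], params["W_h"], params["W_m"], params["W_c"],
        )
    # Independent sigmoids: a person may perform several activities at once.
    return sigmoid(h @ params["W_out"].T)


# Toy dimensions and random weights, chosen only to exercise the code.
rng = np.random.default_rng(0)
T, d_scene, d_ctx = 4, 8, 7
n_persons, d_person, d_hid, n_classes = 3, 6, 5, 10
shapes = {
    "W_in": (d_ctx, d_scene), "W_t": (d_ctx, d_ctx),
    "W_x": (d_hid, d_person), "W_h": (d_hid, d_hid),
    "W_m": (d_hid, d_hid), "W_c": (d_hid, d_ctx),
    "W_out": (n_classes, d_hid),
}
params = {name: 0.1 * rng.standard_normal(shape) for name, shape in shapes.items()}

scores = detect(
    rng.standard_normal((T, d_scene)),
    rng.standard_normal((n_persons, d_person)),
    params,
)
print(scores.shape)
```

In this sketch the two parts are coupled only through the `context` vector, so a loss on `scores` would reach both sets of weights, which is one way to read "trained together". How the paper actually combines the networks is not stated in this summary.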
Taxonomy
Topics: Human Pose and Action Recognition · Video Surveillance and Tracking Methods · Anomaly Detection Techniques and Applications
