Hierarchical Graph-RNNs for Action Detection of Multiple Activities

Sovan Biswas; Yaser Souri; Juergen Gall

arXiv:2101.08581 · cs.CV · January 22, 2021

TL;DR

This paper introduces a hierarchical graph-RNN framework that models both temporal context and inter-person action relations to improve multi-activity detection in videos, achieving state-of-the-art results on AVA.

Contribution

It presents a novel combination of temporal RNNs and graph RNNs for localized multi-activity detection in videos, integrating scene context and action relations.

Findings

01

Achieves state-of-the-art results on the AVA dataset.

02

Effectively models temporal and relational aspects of activities.

03

Improves multi-activity localization accuracy.

Abstract

In this paper, we propose an approach that spatially localizes the activities in a video frame, where each person can perform multiple activities at the same time. Our approach takes into account the temporal scene context as well as the relations between the actions of the detected persons. While the temporal context is modeled by a temporal recurrent neural network (RNN), the relations of the actions are modeled by a graph RNN. Both networks are trained together, and the proposed approach achieves state-of-the-art results on the AVA dataset.
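
To make the two-level idea in the abstract concrete, the following is a minimal NumPy sketch of a forward pass only, under assumptions that are not taken from the paper: a vanilla tanh RNN cell for the temporal context, a fully connected person graph with mean-aggregated messages for the graph RNN, random untrained weights, and independent sigmoid scores so that one person can receive several activity labels. All names (`temporal_context`, `graph_rnn`, `detect`, `make_params`) and all dimensions are hypothetical; the authors' actual architecture, features, and training procedure may differ.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def temporal_context(frame_feats, W_x, W_h):
    """Run a vanilla tanh RNN over per-frame scene features; return the last hidden state."""
    h = np.zeros(W_h.shape[0])
    for x in frame_feats:
        h = np.tanh(x @ W_x + h @ W_h)
    return h

def graph_rnn(person_feats, context, G_in, G_msg, G_h, steps=2):
    """Refine per-person states by recurrent message passing on a fully connected graph."""
    n = person_feats.shape[0]
    # Node input: person feature concatenated with the shared temporal context.
    inp = np.concatenate([person_feats, np.tile(context, (n, 1))], axis=1)
    h = np.tanh(inp @ G_in)
    for _ in range(steps):
        if n > 1:
            # Message to each person: mean hidden state of all *other* persons.
            msg = (h.sum(axis=0, keepdims=True) - h) / (n - 1)
        else:
            msg = np.zeros_like(h)
        h = np.tanh(msg @ G_msg + h @ G_h)
    return h

def detect(frame_feats, person_feats, p):
    """Return per-person, per-class scores in (0, 1); classes are scored independently."""
    ctx = temporal_context(frame_feats, p["W_x"], p["W_h"])
    h = graph_rnn(person_feats, ctx, p["G_in"], p["G_msg"], p["G_h"])
    return sigmoid(h @ p["W_out"])

def make_params(rng, d_frame, d_person, d_hidden, n_classes):
    """Random, untrained weights with hypothetical sizes, for illustration only."""
    s = 0.1
    return {
        "W_x": s * rng.standard_normal((d_frame, d_hidden)),
        "W_h": s * rng.standard_normal((d_hidden, d_hidden)),
        "G_in": s * rng.standard_normal((d_person + d_hidden, d_hidden)),
        "G_msg": s * rng.standard_normal((d_hidden, d_hidden)),
        "G_h": s * rng.standard_normal((d_hidden, d_hidden)),
        "W_out": s * rng.standard_normal((d_hidden, n_classes)),
    }

rng = np.random.default_rng(0)
params = make_params(rng, d_frame=8, d_person=6, d_hidden=5, n_classes=7)
frame_feats = rng.standard_normal((4, 8))   # 4 frames of scene features
person_feats = rng.standard_normal((3, 6))  # 3 detected persons in the key frame
scores = detect(frame_feats, person_feats, params)
print(scores.shape)
```

Because each class gets its own sigmoid rather than sharing a softmax, thresholding `scores` can assign several activities to the same person, which matches the multi-activity setting the abstract describes. In a trained system the two components would be optimized jointly; this sketch omits the loss and backpropagation entirely.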

Taxonomy

Topics: Human Pose and Action Recognition · Video Surveillance and Tracking Methods · Anomaly Detection Techniques and Applications