Videos as Space-Time Region Graphs
Xiaolong Wang, Abhinav Gupta

TL;DR
This paper represents videos as space-time region graphs that model temporal shape dynamics and human-object interactions, improving action recognition performance.
Contribution
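The paper proposes a graph-based video representation. Its nodes are object region proposals, and its edges capture long-range dependencies between correlated objects as well as interactions between nearby objects. Reasoning over this graph with Graph Convolutional Networks yields state-of-the-art action recognition results.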
Findings
Achieved state-of-the-art results on the Charades and Something-Something datasets.
Obtained a 4.4% gain on Charades when the model is applied in complex environments.
Models temporal shape dynamics and functional relationships between humans and objects.
Abstract
How do humans recognize the action "opening a book"? We argue that there are two important cues: modeling temporal shape dynamics and modeling functional relationships between humans and objects. In this paper, we propose to represent videos as space-time region graphs which capture these two important cues. Our graph nodes are defined by the object region proposals from different frames in a long range video. These nodes are connected by two types of relations: (i) similarity relations capturing the long range dependencies between correlated objects and (ii) spatial-temporal relations capturing the interactions between nearby objects. We perform reasoning on this graph representation via Graph Convolutional Networks. We achieve state-of-the-art results on both Charades and Something-Something datasets. Especially for Charades, we obtain a huge 4.4% gain when our model is applied in…
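The abstract describes the graph only at a high level. The NumPy sketch below shows one way the two relation types and a graph-convolution step could fit together. It rests on several assumptions. Region features and boxes are given directly, with no proposal detector. Similarity edges are a dot product of two projected features, normalized with a row-wise softmax. Spatial-temporal edges link regions in adjacent frames and are weighted by box IoU; only forward-in-time edges are modeled. Each GCN layer computes ReLU(G X W). The function names (`similarity_graph`, `spatial_temporal_graph`, `gcn_layer`) and the random weights are hypothetical, and this is not the authors' implementation.

```python
import numpy as np

# Illustrative sketch only: edge definitions and layer form are assumptions,
# not the paper's exact formulation.


def softmax_rows(a):
    # Numerically stable row-wise softmax.
    a = a - a.max(axis=1, keepdims=True)
    e = np.exp(a)
    return e / e.sum(axis=1, keepdims=True)


def similarity_graph(x, w, w_prime):
    # x: (N, d) region features; w, w_prime: (d, d) projections (assumed learned).
    # Edge weight = dot product of projected features, softmax-normalized per row,
    # so every region can attend to correlated regions anywhere in the video.
    return softmax_rows((x @ w) @ (x @ w_prime).T)


def iou(a, b):
    # Intersection-over-union of two boxes given as (x1, y1, x2, y2).
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)

    def area(r):
        return (r[2] - r[0]) * (r[3] - r[1])

    union = area(a) + area(b) - inter
    return inter / union if union > 0 else 0.0


def spatial_temporal_graph(boxes, frames):
    # Connect each region to regions in the next frame, weighted by IoU overlap
    # (an assumed notion of "nearby objects"; only forward edges are built here).
    n = len(boxes)
    g = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            if frames[j] == frames[i] + 1:
                g[i, j] = iou(boxes[i], boxes[j])
    # Row-normalize so outgoing weights sum to 1; rows with no edges stay zero.
    sums = g.sum(axis=1, keepdims=True)
    return np.divide(g, sums, out=np.zeros_like(g), where=sums > 0)


def gcn_layer(g, x, w):
    # One graph convolution: aggregate neighbor features via G, then project by W.
    return np.maximum(g @ x @ w, 0.0)


# Toy example: two regions in frame 0 and two in frame 1 (hypothetical data).
rng = np.random.default_rng(0)
boxes = [(0, 0, 10, 10), (20, 20, 30, 30), (1, 1, 11, 11), (19, 19, 29, 29)]
frames = [0, 0, 1, 1]
x = rng.standard_normal((4, 8))
w, w_prime, w_sim, w_st = (rng.standard_normal((8, 8)) * 0.1 for _ in range(4))

g_sim = similarity_graph(x, w, w_prime)
g_st = spatial_temporal_graph(boxes, frames)
# Sum the outputs of the two graphs, then average-pool nodes into one video vector.
z = gcn_layer(g_sim, x, w_sim) + gcn_layer(g_st, x, w_st)
video_feature = z.mean(axis=0)
```

Summing the two graph outputs and average-pooling the nodes is just one simple way to fuse them into a video-level feature for a classifier. Other fusion or pooling choices are equally plausible in this sketch.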
Taxonomy
Topics: Human Pose and Action Recognition · Video Surveillance and Tracking Methods · Multimodal Machine Learning Applications
Methods: Graph Convolutional Networks
