Video action detection by learning graph-based spatio-temporal interactions

Matteo Tomei, Lorenzo Baraldi, Simone Calderara, Simone Bronzin, Rita Cucchiara

arXiv:1912.04316 · cs.CV · March 2, 2021

TL;DR

This paper introduces a graph-based framework for video action detection that models high-level spatio-temporal interactions between people and objects. Self-attention over a multi-layer graph connects entities from consecutive clips and captures long-range dependencies, and the model reports state-of-the-art results on the AVA dataset.

Contribution

The paper proposes a graph-based module for learning spatio-temporal relationships in video action detection. The module is backbone-independent by design and does not require end-to-end training.

Findings

01 State-of-the-art results on the AVA dataset

02 Consistent improvements across various backbones

03 Effective modeling of long-range spatial and temporal dependencies

Abstract

Action Detection is a complex task that aims to detect and classify human actions in video clips. Typically, it has been addressed by processing fine-grained features extracted from a video classification backbone. Recently, thanks to the robustness of object and people detectors, a deeper focus has been added on relationship modelling. Following this line, we propose a graph-based framework to learn high-level interactions between people and objects, in both space and time. In our formulation, spatio-temporal relationships are learned through self-attention on a multi-layer graph structure which can connect entities from consecutive clips, thus considering long-range spatial and temporal dependencies. The proposed module is backbone independent by design and does not require end-to-end training. Extensive experiments are conducted on the AVA dataset, where our model demonstrates…
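
To make the idea in the abstract more concrete, the following is a minimal sketch of one round of self-attention over a small entity graph whose edges link detections in the same clip or in consecutive clips. It is not the authors' implementation (see the repository listed under Code & Models for that). The function names `build_adjacency` and `masked_self_attention`, the use of integer clip indices to decide connectivity, and the omission of learned query/key/value projections are all assumptions made for illustration.

```python
import math


def build_adjacency(clip_ids):
    """Hypothetical helper: connect entities in the same or consecutive clips.

    clip_ids[i] is the index of the clip that entity i was detected in.
    Every entity is connected to itself, so each row has at least one edge.
    """
    n = len(clip_ids)
    return [
        [1 if abs(clip_ids[i] - clip_ids[j]) <= 1 else 0 for j in range(n)]
        for i in range(n)
    ]


def masked_self_attention(features, adjacency):
    """Scaled dot-product self-attention restricted to graph edges.

    features: list of N entity feature vectors (lists of floats, length d).
    adjacency: N x N list of 0/1; adjacency[i][j] == 1 lets i attend to j.
    Assumption: queries, keys and values are the raw features (no learned
    projections), which keeps the sketch dependency-free.
    """
    d = len(features[0])
    scale = math.sqrt(d)
    updated = []
    for i, query in enumerate(features):
        neighbours = [j for j in range(len(features)) if adjacency[i][j]]
        scores = [
            sum(q * k for q, k in zip(query, features[j])) / scale
            for j in neighbours
        ]
        # Numerically stable softmax over the neighbours only.
        top = max(scores)
        exps = [math.exp(s - top) for s in scores]
        total = sum(exps)
        weights = [e / total for e in exps]
        # New feature = attention-weighted average of neighbour features.
        updated.append([
            sum(w * features[j][c] for w, j in zip(weights, neighbours))
            for c in range(d)
        ])
    return updated


if __name__ == "__main__":
    # Two entities in clip 0, one in clip 1, one in clip 2 (toy features).
    clip_ids = [0, 0, 1, 2]
    feats = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, 0.5]]
    adj = build_adjacency(clip_ids)
    for row in masked_self_attention(feats, adj):
        print(row)
```

In this toy setup the entities in clip 0 cannot attend directly to the entity in clip 2, but stacking several such rounds would let information travel across more distant clips, which is one plausible reading of how a multi-layer graph reaches long-range temporal context.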

Code & Models

Repositories

aimagelab/STAGE_action_detection (PyTorch, official)
