A Survey on Deep Learning-based Spatio-temporal Action Detection

Peng Wang; Fanwei Zeng; Yuntao Qian

arXiv:2308.01618 · cs.CV · August 4, 2023

Open Access

TL;DR

This paper reviews recent deep learning methods for spatio-temporal action detection in videos, discussing their taxonomy, linking algorithms, datasets, evaluation metrics, and future research directions.

Contribution

It provides a comprehensive taxonomy and comparison of state-of-the-art deep learning approaches for STAD, highlighting current challenges and future research directions.

Findings

01. Performance benchmarks of state-of-the-art models are summarized and compared.

02. Linking algorithms associate frame- or clip-level detection results over time to form action tubes.

03. Potential research directions for advancing STAD are discussed.

Abstract

Spatio-temporal action detection (STAD) aims to classify the actions present in a video and localize them in space and time. It has become a particularly active area of research in computer vision because of its rapidly emerging real-world applications, such as autonomous driving, visual surveillance, and entertainment. Many efforts have been devoted in recent years to building a robust and effective framework for STAD. This paper provides a comprehensive review of the state-of-the-art deep learning-based methods for STAD. First, a taxonomy is developed to organize these methods. Next, the linking algorithms, which aim to associate the frame- or clip-level detection results to form action tubes, are reviewed. Then, the commonly used benchmark datasets and evaluation metrics are introduced, and the performance of state-of-the-art models is compared. Finally, this paper is…
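To make the idea of linking concrete, here is a minimal sketch of one simple way that frame-level detections could be associated into action tubes: greedy matching on box overlap (IoU) between consecutive frames. It is an illustration under stated assumptions, not a method taken from the survey. The names `iou` and `link_detections`, the `iou_threshold` parameter, and the matching rule (overlap plus score) are all hypothetical choices, and the sketch handles a single action class with no gap filling or temporal trimming.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]   # (x1, y1, x2, y2)
Detection = Tuple[Box, float]             # (box, class score)
Tube = List[Tuple[int, Box, float]]       # (frame index, box, score)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def link_detections(frames: List[List[Detection]],
                    iou_threshold: float = 0.5) -> List[Tube]:
    """Greedily link per-frame detections of one class into tubes.

    Assumption: a tube is extended by the unused detection in the next
    frame that overlaps its last box by at least `iou_threshold` and
    maximizes (overlap + score). A tube with no match is closed.
    """
    finished: List[Tube] = []
    active: List[Tube] = []
    for t, detections in enumerate(frames):
        unused = list(range(len(detections)))
        still_active: List[Tube] = []
        # Let tubes with a higher mean score pick their match first.
        active.sort(key=lambda tube: sum(s for _, _, s in tube) / len(tube),
                    reverse=True)
        for tube in active:
            last_box = tube[-1][1]
            best, best_value = None, 0.0
            for i in unused:
                box, score = detections[i]
                overlap = iou(last_box, box)
                if overlap >= iou_threshold and overlap + score > best_value:
                    best, best_value = i, overlap + score
            if best is None:
                finished.append(tube)          # no match: close the tube
            else:
                box, score = detections[best]
                tube.append((t, box, score))
                unused.remove(best)
                still_active.append(tube)
        # Every unmatched detection starts a new tube.
        for i in unused:
            box, score = detections[i]
            still_active.append([(t, box, score)])
        active = still_active
    return finished + active
```

Calling `link_detections(frames)` on a list with one entry per frame, each holding that frame's `(box, score)` pairs, returns a list of tubes. Each tube is a sequence of `(frame index, box, score)` triples, giving both the spatial extent and the temporal span of one candidate action instance.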

Taxonomy

Topics: Human Pose and Action Recognition · Video Surveillance and Tracking Methods · Anomaly Detection Techniques and Applications