Deep Learning-based Action Detection in Untrimmed Videos: A Survey
Elahe Vahdani, Yingli Tian

TL;DR
This survey comprehensively reviews deep learning methods for temporal and spatio-temporal action detection in untrimmed videos, covering various supervision levels, datasets, evaluation metrics, and real-world applications.
Contribution
It provides an extensive overview of recent algorithms, benchmarks, and challenges in deep learning-based action detection in untrimmed videos, highlighting future research directions.
Findings
Deep learning methods for the task span fully-supervised, weakly-supervised, unsupervised, self-supervised, and semi-supervised settings.
Benchmark datasets and metrics are standardized.
State-of-the-art methods show promising results.
Abstract
Understanding human behavior and activity facilitates the advancement of numerous real-world applications and is critical for video analysis. Despite the progress of action recognition algorithms on trimmed videos, the majority of real-world videos are lengthy and untrimmed, with sparse segments of interest. The task of temporal activity detection in untrimmed videos aims to localize the temporal boundaries of actions and classify their categories. The temporal activity detection task has been investigated in both full and limited supervision settings, depending on the availability of action annotations. This paper provides an extensive overview of deep learning-based algorithms that tackle temporal action detection in untrimmed videos at different supervision levels, including fully-supervised, weakly-supervised, unsupervised, self-supervised, and semi-supervised. In addition, this paper also…
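As a rough illustration of the task described in the abstract (localizing an action's temporal boundaries and classifying it), the sketch below scores one predicted segment against one ground-truth segment using temporal intersection-over-union (tIoU). The names `Segment`, `temporal_iou`, and `is_correct`, and the 0.5 threshold, are assumptions chosen for illustration and are not taken from the survey.

```python
from typing import Tuple

# A segment is (start_sec, end_sec); this representation is an assumption.
Segment = Tuple[float, float]


def temporal_iou(pred: Segment, gt: Segment) -> float:
    """Overlap between two temporal segments divided by their union."""
    inter = max(0.0, min(pred[1], gt[1]) - max(pred[0], gt[0]))
    union = (pred[1] - pred[0]) + (gt[1] - gt[0]) - inter
    return inter / union if union > 0 else 0.0


def is_correct(pred: Segment, pred_label: str,
               gt: Segment, gt_label: str,
               threshold: float = 0.5) -> bool:
    """A detection counts as correct when the class matches and the
    temporal overlap reaches the (illustrative) tIoU threshold."""
    return pred_label == gt_label and temporal_iou(pred, gt) >= threshold


if __name__ == "__main__":
    # Hypothetical segments, in seconds, for a single action instance.
    print(temporal_iou((0.0, 10.0), (5.0, 15.0)))
    print(is_correct((2.0, 8.0), "jump", (3.0, 9.0), "jump"))
```

Under these assumptions, a prediction with the right label but poor boundaries is still counted as a miss, which is why localization and classification are evaluated jointly.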
Taxonomy
Topics: Human Pose and Action Recognition · Anomaly Detection Techniques and Applications · Video Surveillance and Tracking Methods
