Temporal Action Detection with Structured Segment Networks
Yue Zhao, Yuanjun Xiong, Limin Wang, Zhirong Wu, Xiaoou Tang, Dahua Lin

TL;DR
This paper introduces the structured segment network (SSN), a novel end-to-end framework for accurate temporal action detection in videos, utilizing a structured temporal pyramid and a discriminative model to distinguish complete actions from incomplete or background segments.
Contribution
The paper proposes the SSN framework with a structured temporal pyramid and a decomposed discriminative model, along with the temporal actionness grouping scheme, achieving state-of-the-art results in action detection.
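The summary does not describe how temporal actionness grouping works internally, so the following is only a minimal sketch of the general idea of grouping per-snippet actionness scores into temporal proposals. The function name `group_actionness`, the `threshold` and `max_gap` parameters, and the merging rule are all assumptions made for illustration. They are not the paper's procedure.

```python
from typing import List, Tuple


def group_actionness(
    scores: List[float], threshold: float = 0.5, max_gap: int = 1
) -> List[Tuple[int, int]]:
    """Group snippets into (start, end) proposals, end exclusive.

    Hypothetical rule: a snippet is "active" if its actionness score is at
    least `threshold`; active snippets separated by at most `max_gap`
    inactive snippets are merged into one proposal.
    """
    proposals: List[Tuple[int, int]] = []
    start = None      # index of the first active snippet in the current group
    last_hit = None   # index of the most recent active snippet
    for i, score in enumerate(scores):
        if score < threshold:
            continue
        if start is None:
            start = i
        elif i - last_hit - 1 > max_gap:
            # The gap of inactive snippets is too long: close the group.
            proposals.append((start, last_hit + 1))
            start = i
        last_hit = i
    if start is not None:
        proposals.append((start, last_hit + 1))
    return proposals


# Toy per-snippet actionness scores (made-up values).
scores = [0.1, 0.8, 0.9, 0.2, 0.7, 0.1, 0.1, 0.1, 0.9, 0.95]
proposals = group_actionness(scores, threshold=0.5, max_gap=1)
print(proposals)
```

With these toy scores, the single low-score snippet at index 3 is bridged, while the three low-score snippets at indices 5 to 7 split the sequence into two proposals.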
Findings
Outperforms previous methods on THUMOS14 and ActivityNet benchmarks.
Effectively distinguishes complete actions from incomplete or background segments.
Achieves superior accuracy and adapts well to actions with differing temporal structures.
Abstract
Detecting actions in untrimmed videos is an important yet challenging task. In this paper, we present the structured segment network (SSN), a novel framework which models the temporal structure of each action instance via a structured temporal pyramid. On top of the pyramid, we further introduce a decomposed discriminative model comprising two classifiers, respectively for classifying actions and determining completeness. This allows the framework to effectively distinguish positive proposals from background or incomplete ones, thus leading to both accurate recognition and localization. These components are integrated into a unified network that can be efficiently trained in an end-to-end fashion. Additionally, a simple yet effective temporal action proposal scheme, dubbed temporal actionness grouping (TAG), is devised to generate high-quality action proposals. On two challenging…
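As a rough illustration of the decomposed model described above, the sketch below pools snippet features over a proposal and scores it with two separate classifiers: one for the action class and one for completeness. The abstract does not give the pyramid layout or the scoring rule, so the three-part split (context before, the proposal itself with a two-level pyramid, context after), the `context` ratio, the random untrained weights, and the product used to combine the two outputs are all assumptions.

```python
import numpy as np


def stage_pool(features: np.ndarray, start: int, end: int) -> np.ndarray:
    """Average-pool snippet features in [start, end); zeros if the span is empty."""
    start, end = max(start, 0), min(end, len(features))
    if end <= start:
        return np.zeros(features.shape[1])
    return features[start:end].mean(axis=0)


def pyramid_feature(features: np.ndarray, start: int, end: int, context: float = 0.5):
    """Return (proposal-only feature, feature with surrounding context).

    Assumed layout: the proposal is pooled as a whole and as two halves
    (a two-level pyramid); context windows before and after it are pooled too.
    """
    length = end - start
    pad = int(round(length * context))
    mid = start + length // 2
    inside = [
        stage_pool(features, start, end),  # level 1: whole proposal
        stage_pool(features, start, mid),  # level 2: first half
        stage_pool(features, mid, end),    # level 2: second half
    ]
    before = stage_pool(features, start - pad, start)
    after = stage_pool(features, end, end + pad)
    return np.concatenate(inside), np.concatenate([before, *inside, after])


def softmax(x: np.ndarray) -> np.ndarray:
    e = np.exp(x - x.max())
    return e / e.sum()


def sigmoid(x: np.ndarray) -> np.ndarray:
    return 1.0 / (1.0 + np.exp(-x))


rng = np.random.default_rng(0)
num_snippets, dim, num_classes = 20, 8, 3
features = rng.normal(size=(num_snippets, dim))  # stand-in snippet features

inside_feat, global_feat = pyramid_feature(features, start=6, end=14)

# Random, untrained weights, used only to show the shapes involved.
W_action = rng.normal(size=(num_classes + 1, inside_feat.size))  # +1 for background
W_complete = rng.normal(size=(num_classes, global_feat.size))

class_probs = softmax(W_action @ inside_feat)        # which action, or background
completeness = sigmoid(W_complete @ global_feat)     # per class: is it complete?
detection_scores = class_probs[1:] * completeness    # assumed combination rule
print(detection_scores)
```

In this sketch the action classifier only sees the proposal itself, while the completeness classifier also sees the surrounding context, which is what would let it tell a complete instance from a fragment of one.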
Taxonomy
Topics: Human Pose and Action Recognition · Multimodal Machine Learning Applications · Video Surveillance and Tracking Methods
