Temporal Action Localization with Multi-temporal Scales
Zan Gao, Xinglei Cui, Tao Zhuo, Zhiyong Cheng, An-An Liu, Meng Wang, and Shenyong Chen

TL;DR
This paper introduces a multi-temporal scale approach for temporal action localization in videos, utilizing feature pyramids, a transformer encoder, and self-attention modules to improve accuracy and boundary detection.
Contribution
The paper proposes a novel multi-temporal scale feature pyramid, a spatial-temporal transformer encoder, and a frame-level self-attention module for enhanced action localization.
Findings
Outperforms state-of-the-art methods on the THUMOS14 dataset
Achieves comparable results on ActivityNet1.3
Improves localization accuracy significantly
Abstract
Temporal action localization, which aims to localize and classify actions in untrimmed videos, plays an important role in video analysis. Previous methods often predict actions on a feature space of a single temporal scale. However, temporal features at a low-level scale lack sufficient semantics for action classification, while those at a high-level scale cannot provide rich details of the action boundaries. To address this issue, we propose to predict actions on a feature space of multi-temporal scales. Specifically, we use refined feature pyramids of different scales to pass semantics from high-level scales to low-level scales. In addition, to establish the long temporal scale of the entire video, we use a spatial-temporal transformer encoder to capture the long-range dependencies of video frames. The refined features with long-range dependencies are then fed into a classifier for the coarse…
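The idea of passing semantics from high-level (coarse) temporal scales to low-level (fine) scales can be illustrated with a small sketch. This is not the authors' implementation: the function names `build_pyramid` and `top_down_refine`, the use of average pooling for downsampling, nearest-neighbor repetition for upsampling, and element-wise addition for fusion are all assumptions chosen for illustration. The input is assumed to be a `(T, C)` array of per-frame features.

```python
import numpy as np

def build_pyramid(features, num_levels=3):
    """Build coarser temporal levels by average-pooling pairs of frames.

    Hypothetical helper: pooling by a factor of 2 is an assumption,
    not a detail taken from the paper.
    """
    levels = [features]
    for _ in range(num_levels - 1):
        x = levels[-1]
        t = (x.shape[0] // 2) * 2  # drop a trailing odd frame, if any
        levels.append(x[:t].reshape(t // 2, 2, x.shape[1]).mean(axis=1))
    return levels

def top_down_refine(levels):
    """Add upsampled coarse features into each finer level (top-down pass)."""
    refined = [None] * len(levels)
    refined[-1] = levels[-1]  # the coarsest level is used as-is
    for i in range(len(levels) - 2, -1, -1):
        # Nearest-neighbor upsampling along the time axis.
        up = np.repeat(refined[i + 1], 2, axis=0)
        fine = levels[i]
        if up.shape[0] < fine.shape[0]:
            # The finer level had an odd length; repeat the last step once.
            up = np.concatenate([up, up[-1:]], axis=0)
        refined[i] = fine + up
    return refined

# Usage: 8 frames with 4-dimensional features.
features = np.ones((8, 4))
levels = build_pyramid(features)
refined = top_down_refine(levels)
shapes = [r.shape for r in refined]
```

In this sketch each refined level keeps its own temporal resolution, so fine levels retain boundary detail while also receiving the pooled context of the coarser levels.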
Taxonomy
Topics: Human Pose and Action Recognition · Anomaly Detection Techniques and Applications · Multimodal Machine Learning Applications
