Gaussian Temporal Awareness Networks for Action Localization
Fuchen Long, Ting Yao, Zhaofan Qiu, Xinmei Tian, Jiebo Luo, and Tao Mei

TL;DR
This paper introduces Gaussian Temporal Awareness Networks (GTAN), a novel approach for action localization in videos that dynamically models temporal structures using Gaussian kernels, improving robustness and accuracy over existing methods.
Contribution
The paper proposes GTAN, a new architecture that integrates Gaussian kernels to adaptively model temporal action proposals, addressing limitations of fixed-scale methods.
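To make the idea of a Gaussian kernel over a 1D temporal sequence concrete, the sketch below pools per-position features with Gaussian weights. It is an illustration only, not the authors' implementation: the function names `gaussian_weights` and `gaussian_pool`, the toy features, and the hand-set `center` and `sigma` values are all assumptions, and this summary does not specify how GTAN parameterizes or learns its kernels. The point is just that a small `sigma` focuses on a short temporal extent while a large `sigma` covers a longer one.

```python
import math

def gaussian_weights(num_positions, center, sigma):
    """Normalized Gaussian weights over temporal positions 0..num_positions-1."""
    raw = [math.exp(-((t - center) ** 2) / (2.0 * sigma ** 2))
           for t in range(num_positions)]
    total = sum(raw)
    return [w / total for w in raw]

def gaussian_pool(features, center, sigma):
    """Weighted average of per-position feature vectors under a Gaussian kernel."""
    weights = gaussian_weights(len(features), center, sigma)
    dim = len(features[0])
    return [sum(w * f[d] for w, f in zip(weights, features)) for d in range(dim)]

# Toy sequence (hypothetical): 8 temporal positions, 2 channels each.
features = [[float(t), 1.0] for t in range(8)]

# Same center, two hand-picked scales (assumed values, not learned).
narrow_w = gaussian_weights(len(features), center=3.0, sigma=0.5)
wide_w = gaussian_weights(len(features), center=3.0, sigma=3.0)
narrow = gaussian_pool(features, center=3.0, sigma=0.5)
wide = gaussian_pool(features, center=3.0, sigma=3.0)
```

With the narrow kernel nearly all weight sits on the positions next to the center, so the pooled feature reflects a short segment; the wide kernel spreads weight across most of the sequence. A fixed-scale method corresponds to choosing `sigma` in advance, whereas an adaptive one would fit it per proposal.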
Findings
GTAN outperforms state-of-the-art methods on THUMOS14 and ActivityNet v1.3 datasets.
Achieves mAP improvements of 1.9% and 1.1% on these two datasets.
Demonstrates robustness in localizing actions with complex temporal variations.
Abstract
Temporally localizing actions in a video is a fundamental challenge in video understanding. Most existing approaches draw inspiration from image object detection and extend its advances, e.g., SSD and Faster R-CNN, to produce temporal locations of an action in a 1D sequence. Nevertheless, the results can suffer from robustness problems due to the design of predetermined temporal scales, which overlooks the temporal structure of an action and limits the utility for detecting actions with complex variations. In this paper, we propose to address the problem by introducing Gaussian kernels to dynamically optimize the temporal scale of each action proposal. Specifically, we present Gaussian Temporal Awareness Networks (GTAN), a new architecture that integrates the exploitation of temporal structure into a one-stage action localization framework. Technically, GTAN models…
Taxonomy
Topics: Human Pose and Action Recognition · Video Surveillance and Tracking Methods · Multimodal Machine Learning Applications
Methods: Region Proposal Network · RoIPool · Softmax · Faster R-CNN · Convolution · Non Maximum Suppression · 1x1 Convolution · SSD
