Gaussian Temporal Awareness Networks for Action Localization

Fuchen Long; Ting Yao; Zhaofan Qiu; Xinmei Tian; Jiebo Luo; and Tao Mei

arXiv:1909.03877·cs.CV·September 10, 2019

TL;DR

This paper introduces Gaussian Temporal Awareness Networks (GTAN), a novel approach for action localization in videos that dynamically models temporal structures using Gaussian kernels, improving robustness and accuracy over existing methods.

Contribution

The paper proposes GTAN, a new architecture that integrates Gaussian kernels to adaptively model temporal action proposals, addressing limitations of fixed-scale methods.

Findings

01

GTAN outperforms state-of-the-art methods on the THUMOS14 and ActivityNet v1.3 datasets.

02

GTAN achieves mAP improvements of 1.9% and 1.1% on THUMOS14 and ActivityNet v1.3, respectively.

03

Demonstrates robustness in localizing actions with complex temporal variations.

Abstract

Temporally localizing actions in a video is a fundamental challenge in video understanding. Most existing approaches draw inspiration from image object detection and extend its advances, e.g., SSD and Faster R-CNN, to produce temporal locations of an action in a 1D sequence. Nevertheless, the results can suffer from a robustness problem due to the design of predetermined temporal scales, which overlooks the temporal structure of an action and limits the utility for detecting actions with complex variations. In this paper, we propose to address the problem by introducing Gaussian kernels to dynamically optimize the temporal scale of each action proposal. Specifically, we present Gaussian Temporal Awareness Networks (GTAN), a new architecture that integrates the exploitation of temporal structure into a one-stage action localization framework. Technically, GTAN models…
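The idea of letting a Gaussian kernel, rather than a predetermined window, set the temporal scale of a proposal can be illustrated with a small sketch. The abstract is truncated here, so this is not the authors' implementation: the helper names (`gaussian_weights`, `gaussian_pool`, `proposal_from_kernel`), the normalization, and the rule that maps a kernel to an interval (`center ± scale * sigma`) are all assumptions chosen for illustration. In a trained model the center and standard deviation would presumably be predicted per position; here they are passed in by hand.

```python
import math


def gaussian_weights(length, center, sigma):
    """Normalized Gaussian weights over positions 0..length-1.

    Hypothetical helper; sigma must be positive.
    """
    raw = [math.exp(-((t - center) ** 2) / (2.0 * sigma ** 2))
           for t in range(length)]
    total = sum(raw)
    return [w / total for w in raw]


def gaussian_pool(features, center, sigma):
    """Weighted average of per-position feature vectors under the kernel.

    A small sigma concentrates on a few positions (short action);
    a large sigma aggregates a wider temporal context (long action).
    """
    weights = gaussian_weights(len(features), center, sigma)
    dim = len(features[0])
    return [sum(w * f[d] for w, f in zip(weights, features))
            for d in range(dim)]


def proposal_from_kernel(center, sigma, length, scale=1.0):
    """Map a kernel to a (start, end) interval clipped to the sequence.

    The width rule center +/- scale * sigma is an assumption for
    illustration, not a formula taken from the paper.
    """
    start = max(0.0, center - scale * sigma)
    end = min(float(length - 1), center + scale * sigma)
    return start, end


if __name__ == "__main__":
    # Toy 1D sequence: nine positions, one feature channel each.
    features = [[float(t)] for t in range(9)]
    print(gaussian_pool(features, center=4.0, sigma=0.5))
    print(gaussian_pool(features, center=4.0, sigma=3.0))
    print(proposal_from_kernel(center=4.0, sigma=2.0, length=9))
```

Because the interval width follows the kernel's standard deviation, two proposals centered at the same position can cover very different durations, which is the contrast with fixed-scale anchors that the abstract draws.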

Code & Models

Repositories

812618101/TAL-Demo

Taxonomy

Topics: Human Pose and Action Recognition · Video Surveillance and Tracking Methods · Multimodal Machine Learning Applications

Methods: Region Proposal Network · RoIPool · Softmax · Faster R-CNN · Convolution · Non Maximum Suppression · 1x1 Convolution · SSD