Prior-enhanced Temporal Action Localization using Subject-aware Spatial Attention

Yifan Liu; Youbao Tang; Ning Zhang; Ruei-Sung Lin; Haoqian Wang

arXiv:2211.05299·cs.CV·November 11, 2022

Open Access

TL;DR

This paper introduces PETAL, a novel method for temporal action localization that uses subject priors and spatial attention to improve boundary detection in videos, achieving competitive results with only RGB input.

Contribution

The paper proposes a subject-aware spatial attention module (SA-SAM) that incorporates action subject priors into temporal action localization, enhancing boundary detection without additional modalities.
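
As a rough illustration of how a subject prior could steer spatial attention, the sketch below uses a CBAM-style gate: channel-wise average and max pooling, a convolution, and a sigmoid, with a subject mask stacked in as a third input channel. This is not the authors' SA-SAM. The function names, the `subject_mask` input, the kernel shape, and the attention-weighted pooling step are all assumptions made for illustration.

```python
import numpy as np


def sigmoid(x):
    """Element-wise logistic function."""
    return 1.0 / (1.0 + np.exp(-x))


def conv2d_same(x, kernel):
    """Naive zero-padded cross-correlation: (C, H, W) input, (C, k, k) kernel -> (H, W)."""
    _, h, w = x.shape
    k = kernel.shape[-1]
    pad = k // 2
    xp = np.pad(x, ((0, 0), (pad, pad), (pad, pad)))
    out = np.zeros((h, w))
    for i in range(h):
        for j in range(w):
            out[i, j] = np.sum(xp[:, i:i + k, j:j + k] * kernel)
    return out


def subject_aware_spatial_attention(features, subject_mask, kernel):
    """Hypothetical subject-guided spatial gate (illustrative only).

    features:     (C, H, W) frame feature map
    subject_mask: (H, W) prior in [0, 1], e.g. from a person detector (assumed input)
    kernel:       (3, k, k) weights over [avg map, max map, subject mask]
    """
    avg_map = features.mean(axis=0)   # average pooling over channels
    max_map = features.max(axis=0)    # max pooling over channels
    stacked = np.stack([avg_map, max_map, subject_mask])
    attn = sigmoid(conv2d_same(stacked, kernel))  # (H, W), values in (0, 1)
    # Aggregate spatially, giving more weight to high-attention locations.
    pooled = (features * attn).sum(axis=(1, 2)) / (attn.sum() + 1e-8)
    return attn, pooled


# Usage: a 4-channel 5x5 feature map with a subject occupying a 2x2 region.
features = np.random.default_rng(0).normal(size=(4, 5, 5))
subject_mask = np.zeros((5, 5))
subject_mask[1:3, 1:3] = 1.0
kernel = np.zeros((3, 3, 3))
kernel[2, 1, 1] = 4.0  # toy weights: attend purely to the subject prior
attn, pooled = subject_aware_spatial_attention(features, subject_mask, kernel)
```

With these toy weights the gate depends only on the mask, so locations inside the subject region receive higher attention than the background. In a trained module the kernel would be learned jointly with the rest of the network.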

Findings

01. PETAL outperforms existing methods on THUMOS-14 with a 2.41% mAP increase.

02. The approach achieves competitive results using only RGB features.

03. Experimental results validate the effectiveness of subject priors in TAL.

Abstract

Temporal action localization (TAL) aims to detect the boundary and identify the class of each action instance in a long untrimmed video. Current approaches treat video frames homogeneously, and tend to give background and key objects excessive attention. This limits their sensitivity when localizing action boundaries. To this end, we propose a prior-enhanced temporal action localization method (PETAL), which takes in only RGB input and incorporates action subjects as priors. This proposal leverages action subjects' information with a plug-and-play subject-aware spatial attention module (SA-SAM) to generate an aggregated and subject-prioritized representation. Experimental results on THUMOS-14 and ActivityNet-1.3 datasets demonstrate that the proposed PETAL achieves competitive performance using only RGB features, e.g., boosting mAP by 2.41% or 0.25% over the state-of-the-art approach that…
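
Boundary quality in TAL is usually scored by temporal IoU (tIoU) between predicted and ground-truth segments, with mAP computed at one or more tIoU thresholds. The sketch below shows that overlap measure and a simplified matching step. It assumes segments are (start, end) pairs in seconds and that each prediction carries a confidence score. The helper names, the greedy matching rule, and the 0.5 threshold are illustrative choices, not details taken from the paper or from any benchmark's official evaluation code.

```python
def temporal_iou(pred, gt):
    """Intersection over union of two (start, end) segments."""
    inter = max(0.0, min(pred[1], gt[1]) - max(pred[0], gt[0]))
    union = (pred[1] - pred[0]) + (gt[1] - gt[0]) - inter
    return inter / union if union > 0 else 0.0


def match_predictions(preds, gts, threshold=0.5):
    """Flag each prediction as a true positive (True) or false positive (False).

    preds: list of (start, end, score); gts: list of (start, end).
    Predictions are visited in descending score order, and each ground-truth
    segment can be claimed by at most one prediction (simplified greedy rule).
    """
    matched = set()
    flags = []
    for start, end, _score in sorted(preds, key=lambda p: p[2], reverse=True):
        best_iou, best_idx = 0.0, None
        for idx, gt in enumerate(gts):
            if idx in matched:
                continue
            iou = temporal_iou((start, end), gt)
            if iou > best_iou:
                best_iou, best_idx = iou, idx
        if best_idx is not None and best_iou >= threshold:
            matched.add(best_idx)
            flags.append(True)
        else:
            flags.append(False)
    return flags


# Usage: the second prediction duplicates the first action and is rejected.
preds = [(0.0, 10.0, 0.9), (1.0, 9.0, 0.8), (20.0, 30.0, 0.7)]
gts = [(0.0, 10.0), (21.0, 29.0)]
flags = match_predictions(preds, gts, threshold=0.5)
```

Precision and recall at a given threshold follow directly from these flags, and averaging precision over classes and thresholds gives an mAP-style score.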

Taxonomy

Topics: Human Pose and Action Recognition · Multimodal Machine Learning Applications · Stroke Rehabilitation and Recovery

Methods: Convolution · Sigmoid Activation · Max Pooling · Average Pooling