PcmNet: Position-Sensitive Context Modeling Network for Temporal Action Localization

Xin Qin; Hanbin Zhao; Guangchen Lin; Hao Zeng; Songcen Xu; Xi Li

arXiv:2103.05270·cs.CV·March 10, 2021

TL;DR

PcmNet introduces a position-sensitive context modeling approach that combines positional and semantic cues to localize actions in untrimmed videos more precisely, achieving state-of-the-art results on THUMOS-14 and ActivityNet-1.3.

Contribution

The paper proposes a novel temporal-position-sensitive context modeling method that incorporates directed temporal positional encoding and attention mechanisms for better action boundary detection.

Findings

01. Achieves state-of-the-art performance on the THUMOS-14 and ActivityNet-1.3 datasets.

02. Effectively encodes position-aware context for improved localization.

03. Enhances boundary detection and proposal evaluation accuracy.

Abstract

Temporal action localization is an important and challenging task that aims to locate temporal regions in real-world untrimmed videos where actions occur and recognize their classes. It is widely acknowledged that video context is a critical cue for video understanding, and exploiting the context has become an important strategy to boost localization performance. However, previous state-of-the-art methods focus more on exploring semantic context which captures the feature similarity among frames or proposals, and neglect positional context which is vital for temporal localization. In this paper, we propose a temporal-position-sensitive context modeling approach to incorporate both positional and semantic information for more precise action localization. Specifically, we first augment feature representations with directed temporal positional encoding, and then conduct attention-based…
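The two steps named in the abstract (augmenting features with positional encoding, then applying attention) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract is truncated here and does not give the exact formulation. The sketch assumes a standard sinusoidal encoding, plain dot-product self-attention, and a hypothetical signed-offset bias (`direction_bias`) standing in for the "directed" aspect. All function and parameter names are illustrative.

```python
import math


def sinusoidal_encoding(num_positions, dim):
    """Standard sinusoidal positional encoding table (assumed, not from the paper)."""
    table = []
    for pos in range(num_positions):
        row = []
        for i in range(dim):
            angle = pos / (10000 ** (2 * (i // 2) / dim))
            # Even channels use sine, odd channels use cosine.
            row.append(math.sin(angle) if i % 2 == 0 else math.cos(angle))
        table.append(row)
    return table


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def position_aware_attention(features, direction_bias=0.5):
    """Toy position-aware self-attention over T snippet features of size D.

    direction_bias is a hypothetical knob: it shifts the attention score
    by the sign of the temporal offset, so past and future snippets are
    treated differently. The paper's actual mechanism may differ.
    """
    num_steps = len(features)
    dim = len(features[0])
    encoding = sinusoidal_encoding(num_steps, dim)

    # Step 1: augment each snippet feature with its positional encoding.
    x = [[f + p for f, p in zip(features[t], encoding[t])]
         for t in range(num_steps)]

    # Step 2: attention over all snippets, with a signed-offset bias.
    context = []
    for i in range(num_steps):
        scores = []
        for j in range(num_steps):
            dot = sum(a * b for a, b in zip(x[i], x[j])) / math.sqrt(dim)
            offset = j - i
            sign = (offset > 0) - (offset < 0)  # -1 past, 0 self, +1 future
            scores.append(dot + direction_bias * sign)
        weights = softmax(scores)
        context.append([sum(weights[j] * x[j][d] for j in range(num_steps))
                        for d in range(dim)])
    return context
```

Calling `position_aware_attention` on a list of T feature vectors returns T context vectors of the same dimension. Setting `direction_bias=0.0` reduces the sketch to ordinary self-attention over position-augmented features.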

Taxonomy

Topics: Human Pose and Action Recognition · Multimodal Machine Learning Applications · Video Surveillance and Tracking Methods