Adaptive Perception Transformer for Temporal Action Localization

Yizheng Ouyang, Tianjin Zhang, Weibo Gu, and Hongfa Wang

arXiv:2208.11908 · cs.CV · September 16, 2022 · 5 cites

Open Access

TL;DR

This paper introduces AdaPerFormer, an end-to-end adaptive perception transformer that effectively models global and local contexts for accurate temporal action localization in videos.

Contribution

The paper proposes a novel dual-branch attention mechanism within an end-to-end transformer framework for improved action boundary and category prediction.

Findings

01. Achieves competitive results on the THUMOS14 dataset

02. Effectively models global and local video contexts

03. Demonstrates the benefits of end-to-end design

Abstract

Temporal action localization aims to predict the boundary and category of each action instance in untrimmed long videos. Most previous methods based on anchors or proposals neglect the global-local context interaction across entire video sequences. Moreover, their multi-stage designs cannot generate action boundaries and categories directly. To address these issues, this paper proposes an end-to-end model called the Adaptive Perception Transformer (AdaPerFormer for short). Specifically, AdaPerFormer explores a dual-branch attention mechanism. One branch handles global perception attention, which models entire video sequences and aggregates globally relevant contexts, while the other branch concentrates on a local convolutional shift that aggregates intra-frame and inter-frame information through our bidirectional shift operation. The end-to-end nature produces the…
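
The dual-branch idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not give the details of the bidirectional shift or of how the branches are fused, so the sketch assumes a channel-wise temporal shift (some channels moved one step forward in time, some one step backward, zero-padded at the ends) and a plain element-wise sum as fusion. The function names `global_attention`, `bidirectional_shift`, and `dual_branch`, and the `fold` parameter, are hypothetical, and no learned weights are used.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def global_attention(x):
    """Global branch (assumed form): scaled dot-product self-attention
    in which every time step attends to all T time steps.
    x is a list of T feature vectors, each of length C."""
    dim = len(x[0])
    out = []
    for q in x:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(dim) for k in x]
        weights = softmax(scores)
        out.append([sum(w * v[c] for w, v in zip(weights, x)) for c in range(dim)])
    return out


def bidirectional_shift(x, fold=1):
    """Local branch (assumed form): the first `fold` channels are shifted
    one step forward in time, the next `fold` channels one step backward,
    and the remaining channels are left unchanged. Requires 2 * fold <= C."""
    steps = len(x)
    out = [row[:] for row in x]
    for t in range(steps):
        for c in range(fold):
            # value from the previous step; zero at the first step
            out[t][c] = x[t - 1][c] if t > 0 else 0.0
        for c in range(fold, 2 * fold):
            # value from the next step; zero at the last step
            out[t][c] = x[t + 1][c] if t < steps - 1 else 0.0
    return out


def dual_branch(x, fold=1):
    """Fuse the two branches by element-wise sum (an assumption)."""
    g = global_attention(x)
    s = bidirectional_shift(x, fold)
    return [[a + b for a, b in zip(g_row, s_row)] for g_row, s_row in zip(g, s)]


# Toy input: T = 3 time steps, C = 3 channels.
features = [[1.0, 10.0, 0.5], [2.0, 20.0, 0.5], [3.0, 30.0, 0.5]]
shifted = bidirectional_shift(features)
fused = dual_branch(features)
```

On the toy input, `shifted` shows channel 0 carrying the previous step's value and channel 1 carrying the next step's value, so each time step mixes information from its two neighbours, while the attention branch lets every step draw on the whole sequence.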

Taxonomy

Topics: Human Pose and Action Recognition · Advanced Vision and Imaging · Multimodal Machine Learning Applications