Towards High-Quality Temporal Action Detection with Sparse Proposals

Jiannan Wu; Peize Sun; Shoufa Chen; Jiewen Yang; Zihao Qi; Lan Ma; Ping Luo

arXiv:2109.08847 · cs.CV · September 21, 2021 · 6 cites

TL;DR

This paper introduces SP-TAD, a sparse proposal-based Transformer method for temporal action detection that effectively handles variable action durations and ambiguous boundaries, achieving state-of-the-art results on THUMOS14.

Contribution

The paper proposes Sparse Proposals within a Transformer framework to improve multi-scale feature utilization and boundary precision in temporal action detection.

Findings

01. Achieves state-of-the-art performance on THUMOS14 at high tIoU thresholds.

02. Effectively handles large variance in action durations.

03. Utilizes local segment interactions to preserve action details.
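
The findings above are stated in terms of tIoU (temporal Intersection over Union) thresholds. As a minimal sketch of the standard tIoU definition, assuming segments are given as (start, end) pairs in seconds: the function names `temporal_iou` and `is_true_positive` are hypothetical and are not taken from the SP-TAD repository.

```python
def temporal_iou(seg_a, seg_b):
    """Temporal IoU between two (start, end) segments, e.g. in seconds."""
    (a_start, a_end), (b_start, b_end) = seg_a, seg_b
    # Overlap length; zero when the segments do not intersect.
    inter = max(0.0, min(a_end, b_end) - max(a_start, b_start))
    # Union length = sum of both durations minus the overlap.
    union = (a_end - a_start) + (b_end - b_start) - inter
    return inter / union if union > 0 else 0.0


def is_true_positive(pred, gt, threshold):
    """A prediction matches a ground-truth segment if tIoU >= threshold."""
    return temporal_iou(pred, gt) >= threshold


if __name__ == "__main__":
    gt = (10.0, 20.0)    # illustrative ground-truth action segment
    pred = (14.0, 20.0)  # illustrative prediction with an imprecise start
    print(temporal_iou(pred, gt))
    print(is_true_positive(pred, gt, 0.5), is_true_positive(pred, gt, 0.7))
```

In this illustrative case the same prediction counts as correct at a threshold of 0.5 but not at 0.7, which is why results at high tIoU thresholds are read as a measure of boundary precision.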

Abstract

Temporal Action Detection (TAD) is an essential and challenging topic in video understanding, aiming to localize the temporal segments containing human action instances and predict the action categories. Previous works rely heavily on dense candidates, either by designing varying anchors or by enumerating all combinations of boundaries on video sequences; they are therefore tied to complicated pipelines and sensitive hand-crafted designs. Recently, with the resurgence of the Transformer, query-based methods have emerged as a rising solution thanks to their simplicity and flexibility. However, a performance gap remains between query-based methods and well-established methods. In this paper, we identify that the main challenge lies in the large variance of action durations and the ambiguous boundaries of short action instances; nevertheless, quadratic-computational global…

Code & Models

Repositories

wjn922/sp-tad · Official

Taxonomy

Topics: Human Pose and Action Recognition · Anomaly Detection Techniques and Applications · Video Surveillance and Tracking Methods

Methods: Attention Is All You Need · Linear Layer · Absolute Position Encodings · Position-Wise Feed-Forward Layer · Layer Normalization · Dense Connections · Label Smoothing · Multi-Head Attention · Byte Pair Encoding · Softmax