Temporal Action Localization with Enhanced Instant Discriminability

Dingfeng Shi; Qiong Cao; Yujie Zhong; Shan An; Jian Cheng; Haogang Zhu; Dacheng Tao

arXiv:2309.05590 · cs.CV · September 12, 2023 · 1 citation



TL;DR

This paper introduces TriDet, a one-stage framework for temporal action detection (TAD) that improves boundary modeling and instant discriminability using a Trident-head, a scalable-granularity perception (SGP) layer, and large pretrained video backbones, and reports state-of-the-art results on multiple TAD benchmarks.

Contribution

The paper proposes a one-stage TAD framework that combines a Trident-head, an SGP layer, and large pretrained models to improve boundary localization and instant discriminability.
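This page names the SGP layer but does not describe it. The sketch below follows our reading of the full paper's formulation, which is not reproduced here, so treat it as an assumption to check against the paper: an instant-level branch gates a per-instant linear map with a video-level summary, phi(x) = ReLU(FC(AvgPool(x))), and a window-level branch multiplies psi(x) = Conv_w(x) with the sum of two temporal convolutions of window w and k*w, all added to a residual. The class `SGPSketch`, the depthwise convolutions, the random initialization, and the defaults `window=3, k=4` are hypothetical choices; normalization and any feed-forward sublayers are omitted.

```python
import random


def depthwise_conv1d(x, kernels):
    """Per-channel 1-D convolution over time with zero 'same' padding.

    x is a T x C list of lists; kernels[c] is an odd-length kernel for channel c.
    """
    T, C = len(x), len(x[0])
    out = [[0.0] * C for _ in range(T)]
    for c in range(C):
        k = kernels[c]
        half = len(k) // 2
        for t in range(T):
            acc = 0.0
            for i, w in enumerate(k):
                s = t + i - half
                if 0 <= s < T:
                    acc += w * x[s][c]
            out[t][c] = acc
    return out


def linear(v, w, b):
    """Fully connected layer: w is C_out x C_in, b has length C_out."""
    return [sum(wi * vi for wi, vi in zip(row, v)) + bi for row, bi in zip(w, b)]


class SGPSketch:
    """Hypothetical SGP-style block: instant-level branch + window-level branch + residual."""

    def __init__(self, channels, window=3, k=4, seed=0):
        assert window % 2 == 1, "odd window keeps 'same' padding symmetric"
        rng = random.Random(seed)

        def mat(rows, cols):
            return [[rng.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]

        self.w_phi, self.b_phi = mat(channels, channels), [0.0] * channels
        self.w_fc, self.b_fc = mat(channels, channels), [0.0] * channels
        # Simplification of this sketch: round k*window up to an odd length.
        big = k * window + (1 - (k * window) % 2)
        self.k_psi = mat(channels, window)  # psi(x) = Conv_w(x)
        self.k_w = mat(channels, window)    # Conv_w(x)
        self.k_kw = mat(channels, big)      # Conv_kw(x), wider temporal context

    def __call__(self, x):
        T, C = len(x), len(x[0])
        # Video-level average over time, then ReLU(FC(.)) gives a per-channel gate.
        pooled = [sum(x[t][c] for t in range(T)) / T for c in range(C)]
        phi = [max(0.0, v) for v in linear(pooled, self.w_phi, self.b_phi)]
        psi = depthwise_conv1d(x, self.k_psi)
        conv_w = depthwise_conv1d(x, self.k_w)
        conv_kw = depthwise_conv1d(x, self.k_kw)
        out = []
        for t in range(T):
            fc = linear(x[t], self.w_fc, self.b_fc)
            out.append([
                phi[c] * fc[c]                                 # instant-level branch
                + psi[t][c] * (conv_w[t][c] + conv_kw[t][c])   # window-level branch
                + x[t][c]                                      # residual
                for c in range(C)
            ])
        return out
```

Setting the FC weights and the psi kernels to zero reduces the block to the identity, which is a quick check that the residual path is wired correctly.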

Findings

01

TriDet achieves state-of-the-art performance on multiple TAD datasets.

02

The SGP layer effectively mitigates rank-loss in transformer-based methods.
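The rank-loss problem in this finding (which the abstract glosses as instant discriminability deterioration) can be illustrated with a toy experiment. Assumption: pure softmax self-attention with Q = K = V = x and no projections, residual connections, or feed-forward layers; this is not TriDet's architecture, and the helper names are hypothetical. Because every softmax row is strictly positive and sums to one, each output instant is a convex combination of all input instants, so the per-channel spread across instants cannot grow and shrinks from layer to layer.

```python
import math
import random


def softmax(row):
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    s = sum(exps)
    return [e / s for e in exps]


def self_attention(x):
    """Pure softmax self-attention with Q = K = V = x (no projections, residual, or MLP)."""
    d = len(x[0])
    scale = 1.0 / math.sqrt(d)
    out = []
    for q in x:
        scores = [scale * sum(a * b for a, b in zip(q, k)) for k in x]
        weights = softmax(scores)  # strictly positive, sums to 1
        out.append([sum(w * v[c] for w, v in zip(weights, x)) for c in range(d)])
    return out


def channel_spread(x):
    """Largest per-channel range across instants; 0 means all instants are identical."""
    d = len(x[0])
    return max(max(row[c] for row in x) - min(row[c] for row in x) for c in range(d))


def spread_per_layer(x, num_layers):
    spreads = [channel_spread(x)]
    for _ in range(num_layers):
        x = self_attention(x)
        spreads.append(channel_spread(x))
    return spreads


if __name__ == "__main__":
    rng = random.Random(0)
    feats = [[rng.gauss(0.0, 1.0) for _ in range(8)] for _ in range(16)]  # 16 instants, 8 channels
    for layer, s in enumerate(spread_per_layer(feats, 6)):
        print(f"layer {layer}: spread {s:.4f}")
```

Real transformer blocks add residual paths and feed-forward layers that change these dynamics; the toy only isolates why attention-style averaging erodes differences between instants.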

03

Large pretrained models improve the feature representations available for TAD.

Abstract

Temporal action detection (TAD) aims to detect all action boundaries and their corresponding categories in an untrimmed video. The unclear boundaries of actions in videos often result in imprecise predictions of action boundaries by existing methods. To resolve this issue, we propose a one-stage framework named TriDet. First, we propose a Trident-head to model the action boundary via an estimated relative probability distribution around the boundary. Then, we analyze the rank-loss problem (i.e., instant discriminability deterioration) in transformer-based methods and propose an efficient scalable-granularity perception (SGP) layer to mitigate this issue. To further push the limit of instant discriminability in the video backbone, we leverage the strong representation capability of pretrained large models and investigate their performance on TAD. Last, considering the adequate…
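The abstract states that the Trident-head models each boundary through an estimated relative probability distribution around it, rather than regressing a single offset. A minimal sketch of that idea, under stated assumptions: for an instant t, each candidate distance i = 0..B gets a logit formed by adding the boundary-head response at instant t - i (start) or t + i (end) to an offset score predicted at t; a softmax turns the logits into a distribution, and its expectation is the boundary distance. The additive combination of two heads follows our reading of the full paper and should be checked against it; the function names, `num_bins` (B), and the masking of candidates outside the video are hypothetical, and the learned heads that produce the scores are not shown.

```python
import math


def masked_softmax(logits):
    """Softmax where -inf logits (invalid candidates) receive zero probability."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def expected_boundary_distance(boundary_scores, offset_scores, t, num_bins, direction):
    """Expected distance (in instants) from instant t to a boundary.

    boundary_scores: per-instant response of the start or end head, length T.
    offset_scores: num_bins + 1 scores predicted at instant t, one per candidate distance.
    direction: -1 searches backwards (start boundary), +1 forwards (end boundary).
    """
    T = len(boundary_scores)
    logits = []
    for i in range(num_bins + 1):
        j = t + direction * i
        if 0 <= j < T:
            logits.append(boundary_scores[j] + offset_scores[i])
        else:
            logits.append(float("-inf"))  # candidate falls outside the video
    probs = masked_softmax(logits)
    return sum(i * p for i, p in enumerate(probs))


def decode_segment(start_scores, end_scores, start_offsets, end_offsets, t, num_bins):
    """Turn the two expected distances into a (start, end) segment around instant t."""
    d_start = expected_boundary_distance(start_scores, start_offsets, t, num_bins, -1)
    d_end = expected_boundary_distance(end_scores, end_offsets, t, num_bins, +1)
    return t - d_start, t + d_end


if __name__ == "__main__":
    start = [0.0, 0.0, 10.0, 0.0, 0.0, 0.0, 0.0, 0.0]  # sharp start response at instant 2
    end = [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 10.0, 0.0]    # sharp end response at instant 6
    zeros = [0.0] * 4
    print(decode_segment(start, end, zeros, zeros, t=4, num_bins=3))
```

With sharp responses the segment lands close to (2, 6); with flat responses the expectation drifts toward the mean candidate distance, which is how the distribution expresses boundary uncertainty.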


Taxonomy

Topics: Human Pose and Action Recognition · Video Surveillance and Tracking Methods · Multimodal Machine Learning Applications