BasicTAD: an Astounding RGB-Only Baseline for Temporal Action Detection

Min Yang; Guo Chen; Yin-Dong Zheng; Tong Lu; Limin Wang

arXiv:2205.02717 · cs.CV · April 12, 2023

TL;DR

BasicTAD is a simple, efficient RGB-only baseline for temporal action detection that achieves near state-of-the-art performance and can be improved further by preserving more temporal and spatial information.

Contribution

The paper introduces BasicTAD, a straightforward and effective RGB-only baseline for TAD, emphasizing simplicity and end-to-end training, and proposes PlusTAD for enhanced performance.

Findings

1. BasicTAD achieves near state-of-the-art results.
2. PlusTAD significantly outperforms previous methods.
3. The approach is efficient and suitable for real-time applications.

Abstract

Temporal action detection (TAD) is extensively studied in the video understanding community by generally following the object detection pipeline in images. However, complex designs are not uncommon in TAD, such as two-stream feature extraction, multi-stage training, complex temporal modeling, and global context fusion. In this paper, we do not aim to introduce any novel technique for TAD. Instead, we study a simple, straightforward, yet must-known baseline given the current status of complex design and low detection efficiency in TAD. In our simple baseline (termed BasicTAD), we decompose the TAD pipeline into several essential components: data sampling, backbone design, neck construction, and detection head. We extensively investigate the existing techniques in each component for this baseline, and more importantly, perform end-to-end training over the entire pipeline thanks to the…

Taxonomy

Topics: Human Pose and Action Recognition · Video Surveillance and Tracking Methods · Multimodal Machine Learning Applications