BasicTAD: an Astounding RGB-Only Baseline for Temporal Action Detection
Min Yang, Guo Chen, Yin-Dong Zheng, Tong Lu, Limin Wang

TL;DR
BasicTAD is a simple, efficient RGB-only baseline for temporal action detection (TAD) that achieves near state-of-the-art performance; it can be improved further by preserving more temporal and spatial information.
Contribution
The paper introduces BasicTAD, a straightforward and effective RGB-only baseline for TAD, emphasizing simplicity and end-to-end training, and proposes PlusTAD for enhanced performance.
Findings
BasicTAD achieves near state-of-the-art results.
PlusTAD significantly outperforms previous methods.
The approach is efficient and suitable for real-time applications.
Abstract
Temporal action detection (TAD) is extensively studied in the video understanding community by generally following the object detection pipeline in images. However, complex designs are not uncommon in TAD, such as two-stream feature extraction, multi-stage training, complex temporal modeling, and global context fusion. In this paper, we do not aim to introduce any novel technique for TAD. Instead, we study a simple, straightforward, yet must-known baseline given the current status of complex design and low detection efficiency in TAD. In our simple baseline (termed BasicTAD), we decompose the TAD pipeline into several essential components: data sampling, backbone design, neck construction, and detection head. We extensively investigate the existing techniques in each component for this baseline, and more importantly, perform end-to-end training over the entire pipeline thanks to the…
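The four-component decomposition named in the abstract (data sampling, backbone, neck, detection head) can be pictured as a simple composition of stages. The following is only a toy sketch under stated assumptions, not the authors' implementation: the class `TADPipeline`, the stage functions, the stride, and the threshold are all hypothetical stand-ins, and a "video" is reduced to one scalar per frame so the example runs without any deep-learning library.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

Features = List[float]                 # toy stand-in: one scalar per time step
Segment = Tuple[int, int, float]       # (start index, end index exclusive, score)


@dataclass
class TADPipeline:
    """Hypothetical container chaining the four stages named in the abstract."""
    sample: Callable[[Features], Features]
    backbone: Callable[[Features], Features]
    neck: Callable[[Features], Features]
    head: Callable[[Features], List[Segment]]

    def __call__(self, video: Features) -> List[Segment]:
        return self.head(self.neck(self.backbone(self.sample(video))))


def uniform_sample(video: Features, stride: int = 2) -> Features:
    # Data sampling: keep every `stride`-th frame (stride is an assumption).
    return video[::stride]


def toy_backbone(frames: Features) -> Features:
    # Backbone: identity here; a real system would use a video network.
    return list(frames)


def toy_neck(feats: Features) -> Features:
    # Neck: window-3 moving average as a stand-in for temporal aggregation.
    out = []
    for i in range(len(feats)):
        window = feats[max(0, i - 1): i + 2]
        out.append(sum(window) / len(window))
    return out


def toy_head(feats: Features, thr: float = 0.5) -> List[Segment]:
    # Head: turn each contiguous run above `thr` into one scored segment.
    segments: List[Segment] = []
    start = None
    for i, v in enumerate(feats + [0.0]):  # sentinel closes a trailing run
        if v > thr and start is None:
            start = i
        elif v <= thr and start is not None:
            run = feats[start:i]
            segments.append((start, i, sum(run) / len(run)))
            start = None
    return segments


pipeline = TADPipeline(uniform_sample, toy_backbone, toy_neck, toy_head)
video = [0.0, 0.0, 0.0, 0.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 0.0, 0.0]
detections = pipeline(video)  # segment indices are in sampled-frame units
```

Keeping each stage behind a plain callable mirrors the idea of swapping techniques per component; the end-to-end training that the abstract emphasizes is not modeled in this sketch.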
Taxonomy
Topics: Human Pose and Action Recognition · Video Surveillance and Tracking Methods · Multimodal Machine Learning Applications
