OpenTAD: A Unified Framework and Comprehensive Study of Temporal Action Detection
Shuming Liu, Chen Zhao, Fatimah Zohra, Mattia Soldan, Alejandro Pardo, Mengmeng Xu, Lama Alssum, Merey Ramazanova, Juan León Alcázar, Anthony Cioppa, Silvio Giancola, Carlos Hinojosa, Bernard Ghanem

TL;DR
OpenTAD is a unified, modular framework for temporal action detection that consolidates multiple methods and datasets, enabling fair benchmarking and revealing effective design choices through extensive experiments.
Contribution
It introduces OpenTAD, a comprehensive, flexible framework that standardizes evaluation of TAD methods and facilitates systematic analysis of network components.
Findings
Identified key design choices that improve detection performance.
Achieved new state-of-the-art results on multiple datasets.
Provided an open-source toolkit for fair benchmarking.
Abstract
Temporal action detection (TAD) is a fundamental video understanding task that aims to identify human actions and localize their temporal boundaries in videos. Although this field has achieved remarkable progress in recent years, further progress and real-world applications are impeded by the absence of a standardized framework. Currently, different methods are compared under different implementation settings, evaluation protocols, etc., making it difficult to assess the real effectiveness of a specific technique. To address this issue, we propose OpenTAD, a unified TAD framework consolidating 16 different TAD methods and 9 standard datasets into a modular codebase. In OpenTAD, minimal effort is required to replace one module with a different design, train a feature-based TAD model in end-to-end mode, or switch between the two. OpenTAD also facilitates straightforward…
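One common way to let a single module be replaced "with minimal effort" is a registry of components plus a configuration dictionary that names which component to build. The sketch below is a hypothetical illustration of that general pattern only; it is not OpenTAD's code or API. The names `REGISTRY`, `register`, `build`, `Detector`, and the toy `IdentityNeck`, `AvgPoolNeck`, and `ThresholdHead` components are invented here, and features are plain Python lists of per-timestep scores rather than tensors.

```python
from typing import Any, Callable, Dict, List, Tuple

# Hypothetical registry keyed by component kind, then by name.
# Illustrates config-driven module swapping; NOT OpenTAD's actual API.
REGISTRY: Dict[str, Dict[str, Callable[..., Any]]] = {"neck": {}, "head": {}}


def register(kind: str, name: str):
    """Decorator that records a class under REGISTRY[kind][name]."""
    def deco(cls):
        REGISTRY[kind][name] = cls
        return cls
    return deco


@register("neck", "identity")
class IdentityNeck:
    """Passes the temporal feature sequence through unchanged."""
    def __call__(self, feats: List[float]) -> List[float]:
        return feats


@register("neck", "avg_pool")
class AvgPoolNeck:
    """Downsamples a 1-D temporal sequence by non-overlapping averaging."""
    def __init__(self, stride: int = 2):
        self.stride = stride

    def __call__(self, feats: List[float]) -> List[float]:
        s = self.stride
        return [sum(feats[i:i + s]) / len(feats[i:i + s])
                for i in range(0, len(feats), s)]


@register("head", "threshold")
class ThresholdHead:
    """Toy head: returns [start, end) index ranges whose score exceeds thresh."""
    def __init__(self, thresh: float = 0.5):
        self.thresh = thresh

    def __call__(self, scores: List[float]) -> List[Tuple[int, int]]:
        segments: List[Tuple[int, int]] = []
        start = None
        for t, s in enumerate(scores):
            if s > self.thresh and start is None:
                start = t                      # a segment opens
            elif s <= self.thresh and start is not None:
                segments.append((start, t))    # the open segment closes
                start = None
        if start is not None:                  # segment runs to the end
            segments.append((start, len(scores)))
        return segments


def build(kind: str, cfg: Dict[str, Any]) -> Any:
    """Instantiate REGISTRY[kind][cfg['type']] with the remaining cfg keys."""
    cfg = dict(cfg)
    return REGISTRY[kind][cfg.pop("type")](**cfg)


class Detector:
    """Composes a neck and a head, each chosen purely by its config entry."""
    def __init__(self, cfg: Dict[str, Dict[str, Any]]):
        self.neck = build("neck", cfg["neck"])
        self.head = build("head", cfg["head"])

    def __call__(self, feats: List[float]) -> List[Tuple[int, int]]:
        return self.head(self.neck(feats))
```

Swapping the neck then means editing one config entry, with no change to `Detector`: `Detector({"neck": {"type": "identity"}, "head": {"type": "threshold", "thresh": 0.5}})` versus `Detector({"neck": {"type": "avg_pool", "stride": 2}, "head": {"type": "threshold", "thresh": 0.5}})`. A real TAD framework would register far richer components (backbones, necks, dense or proposal-based heads, losses), but the decoupling idea is the same.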
Taxonomy
Topics: Human Pose and Action Recognition · Multimodal Machine Learning Applications · Anomaly Detection Techniques and Applications
