OpenTAD: A Unified Framework and Comprehensive Study of Temporal Action Detection

Shuming Liu; Chen Zhao; Fatimah Zohra; Mattia Soldan; Alejandro Pardo; Mengmeng Xu; Lama Alssum; Merey Ramazanova; Juan León Alcázar; Anthony Cioppa; Silvio Giancola; Carlos Hinojosa; Bernard Ghanem

arXiv:2502.20361·cs.CV·February 28, 2025

TL;DR

OpenTAD is a unified, modular framework for temporal action detection that consolidates multiple methods and datasets, enabling fair benchmarking and revealing effective design choices through extensive experiments.

Contribution

The paper introduces OpenTAD, a comprehensive and flexible framework that standardizes the evaluation of TAD methods and supports systematic analysis of individual network components.

Findings

01 Identified key design choices that improve detection performance.

02 Achieved new state-of-the-art results on multiple datasets.

03 Provided an open-source toolkit for fair benchmarking.

Abstract

Temporal action detection (TAD) is a fundamental video understanding task that aims to identify human actions and localize their temporal boundaries in videos. Although this field has achieved remarkable progress in recent years, further progress and real-world applications are impeded by the absence of a standardized framework. Currently, different methods are compared under different implementation settings, evaluation protocols, etc., making it difficult to assess the real effectiveness of a specific technique. To address this issue, we propose OpenTAD, a unified TAD framework consolidating 16 different TAD methods and 9 standard datasets into a modular codebase. In OpenTAD, minimal effort is required to replace one module with a different design, train a feature-based TAD model in end-to-end mode, or switch between the two. OpenTAD also facilitates straightforward…
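
As a rough illustration of what "localizing temporal boundaries" means at evaluation time, the sketch below computes the temporal intersection-over-union (tIoU) between a predicted and a ground-truth segment, the overlap measure commonly used to decide whether a detection matches an annotation. The function name `temporal_iou` and the `(start, end)` tuple layout are assumptions made for this example and are not taken from the OpenTAD codebase.

```python
def temporal_iou(pred, gt):
    """Temporal IoU of two segments given as (start, end) in seconds.

    Hypothetical helper for illustration; not part of OpenTAD's API.
    """
    # Length of the overlap between the two segments (0 if disjoint).
    inter = max(0.0, min(pred[1], gt[1]) - max(pred[0], gt[0]))
    # Union = sum of both lengths minus the overlap counted twice.
    union = (pred[1] - pred[0]) + (gt[1] - gt[0]) - inter
    return inter / union if union > 0 else 0.0


if __name__ == "__main__":
    # A prediction covering 0-10 s against a ground truth at 5-15 s.
    print(temporal_iou((0.0, 10.0), (5.0, 15.0)))
```

Under a protocol of this kind, a prediction typically counts as correct only if its tIoU with a same-class annotation exceeds a chosen threshold, which is one reason results are hard to compare when methods use different protocols.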

Code & Models

Repositories

sming256/OpenTAD (PyTorch, official)

Taxonomy

Topics: Human Pose and Action Recognition · Multimodal Machine Learning Applications · Anomaly Detection Techniques and Applications