TadML: A fast temporal action detection with Mechanics-MLP

Bowen Deng; Dongchang Liu

arXiv:2206.02997 · cs.CV · February 5, 2024

TL;DR

TadML introduces a fast, one-stage, anchor-free temporal action detection method built on a Mechanics-MLP. On untrimmed videos it achieves accuracy comparable to state-of-the-art methods with significantly faster inference, and it does not require computing optical flow.

Contribution

The paper proposes a novel Mechanics-MLP architecture for real-time, one-stage temporal action detection using only RGB data, improving inference speed over existing methods.
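
This page does not describe the internals of the Mechanics-MLP, so the following is only a generic stand-in, not the authors' architecture. It is a pure-Python MLP-Mixer-style block (MLP-Mixer, Layer Normalization and Residual Connection are among the methods listed in this paper's taxonomy) that mixes information first across the time steps and then across the channels of a (T, C) feature sequence. The class name `MixerBlock`, the hidden size, the random initialization and the parameter-free LayerNorm are assumptions for illustration. Dropout is omitted because the sketch models inference only.

```python
import math
import random
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]  # row-major list of rows
MLPParams = Tuple[Matrix, Vector, Matrix, Vector]  # (W1, b1, W2, b2)


def linear(x: Vector, w: Matrix, b: Vector) -> Vector:
    """Affine map W x + b; w has shape (out_dim, in_dim)."""
    return [sum(wi * xi for wi, xi in zip(row, x)) + bi for row, bi in zip(w, b)]


def gelu(v: float) -> float:
    """Exact GELU activation, as used in the original MLP-Mixer."""
    return 0.5 * v * (1.0 + math.erf(v / math.sqrt(2.0)))


def mlp(x: Vector, params: MLPParams) -> Vector:
    """Two-layer MLP: Linear -> GELU -> Linear."""
    w1, b1, w2, b2 = params
    return linear([gelu(h) for h in linear(x, w1, b1)], w2, b2)


def layer_norm(row: Vector, eps: float = 1e-5) -> Vector:
    """Parameter-free LayerNorm (no learned gain/bias, a simplification)."""
    mean = sum(row) / len(row)
    var = sum((v - mean) ** 2 for v in row) / len(row)
    return [(v - mean) / math.sqrt(var + eps) for v in row]


def transpose(m: Matrix) -> Matrix:
    return [list(col) for col in zip(*m)]


def init_mlp(in_dim: int, hidden: int, rng: random.Random) -> MLPParams:
    """Small random weights; a stand-in for trained parameters (hypothetical)."""
    w1 = [[rng.gauss(0.0, 0.02) for _ in range(in_dim)] for _ in range(hidden)]
    w2 = [[rng.gauss(0.0, 0.02) for _ in range(hidden)] for _ in range(in_dim)]
    return w1, [0.0] * hidden, w2, [0.0] * in_dim


class MixerBlock:
    """Hypothetical MLP-Mixer-style block over a (T, C) temporal feature sequence."""

    def __init__(self, num_steps: int, channels: int, hidden: int = 16, seed: int = 0):
        rng = random.Random(seed)
        self.time_mlp = init_mlp(num_steps, hidden, rng)    # mixes across time steps
        self.channel_mlp = init_mlp(channels, hidden, rng)  # mixes across channels

    def __call__(self, x: Matrix) -> Matrix:
        # Temporal (token) mixing: apply the MLP to each channel's sequence over T.
        normed = [layer_norm(r) for r in x]
        mixed = transpose([mlp(col, self.time_mlp) for col in transpose(normed)])
        x = [[a + b for a, b in zip(r, m)] for r, m in zip(x, mixed)]  # residual
        # Channel mixing: apply the MLP to each time step's feature vector.
        mixed = [mlp(layer_norm(r), self.channel_mlp) for r in x]
        return [[a + b for a, b in zip(r, m)] for r, m in zip(x, mixed)]  # residual


features = [[math.sin(t + c) for c in range(4)] for t in range(6)]  # T=6, C=4
out = MixerBlock(num_steps=6, channels=4)(features)
print(len(out), len(out[0]))  # → 6 4
```

Because each sub-layer adds its output back to its input, the block preserves the (T, C) shape, so several such blocks can be stacked before a detection head.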

Findings

01

Achieves 4.44 videos per second inference speed on THUMOS14

02

Maintains accuracy comparable to state-of-the-art models

03

Eliminates the need for optical flow computation
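
Read as an average throughput (an assumption; this page does not say how the figure was measured), 4.44 videos per second corresponds to an average processing time of roughly 0.23 s per THUMOS14 video:

```latex
t_{\text{video}} \approx \frac{1}{4.44\ \text{videos/s}} \approx 0.225\ \text{s/video}
```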

Abstract

Temporal Action Detection (TAD) is a crucial but challenging task in video understanding. It aims to detect both the type and the start and end frames of each action instance in a long, untrimmed video. Most current models adopt both RGB and Optical-Flow streams for the TAD task. Thus, original RGB frames must be converted manually into Optical-Flow frames with additional computation and time cost, which is an obstacle to achieving real-time processing. At present, many models adopt two-stage strategies, which slow down inference and require complicated tuning of proposal generation. By comparison, we propose a one-stage, anchor-free temporal localization method that uses only the RGB stream and is built on a novel Newtonian Mechanics-MLP architecture. It achieves accuracy comparable to all existing state-of-the-art models while surpassing the inference speed of these methods by a large…
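
The abstract names a one-stage, anchor-free localizer but does not spell out its output head here. A common anchor-free design, assumed for this sketch and not confirmed as TadML's, predicts at every temporal feature step a score per action class plus the distances from that step back to the action start and forward to the action end. The sketch decodes such predictions into segments and then applies greedy per-class 1-D non-maximum suppression based on temporal IoU, a typical post-processing step. The function names, the `stride` scaling and both thresholds are hypothetical.

```python
from typing import List, Tuple

# (start_s, end_s, score, class_id): one detected action instance.
# Hypothetical output format chosen for this sketch.
Segment = Tuple[float, float, float, int]


def decode_segments(
    cls_scores: List[List[float]],
    offsets: List[Tuple[float, float]],
    stride: float,
    score_thresh: float = 0.1,
) -> List[Segment]:
    """Convert per-step predictions into candidate segments.

    cls_scores[t][c]: score that feature step t lies inside an action of class c.
    offsets[t] = (d_start, d_end): predicted distances, in feature steps, from
    step t back to the action start and forward to the action end (assumed).
    stride: seconds covered by one feature step (assumed).
    """
    segments: List[Segment] = []
    for t, (scores, (d_start, d_end)) in enumerate(zip(cls_scores, offsets)):
        center = t * stride  # time of step t in seconds
        for c, s in enumerate(scores):
            if s >= score_thresh:
                segments.append((center - d_start * stride, center + d_end * stride, s, c))
    return segments


def temporal_iou(a: Segment, b: Segment) -> float:
    """Intersection over union of the time intervals [a0, a1] and [b0, b1]."""
    inter = max(0.0, min(a[1], b[1]) - max(a[0], b[0]))
    union = (a[1] - a[0]) + (b[1] - b[0]) - inter
    return inter / union if union > 0 else 0.0


def nms_1d(segments: List[Segment], iou_thresh: float = 0.5) -> List[Segment]:
    """Greedy per-class non-maximum suppression over time intervals."""
    kept: List[Segment] = []
    for seg in sorted(segments, key=lambda s: s[2], reverse=True):
        # Keep seg unless a higher-scoring segment of the same class overlaps it too much.
        if all(k[3] != seg[3] or temporal_iou(k, seg) < iou_thresh for k in kept):
            kept.append(seg)
    return kept


# Toy predictions: 4 feature steps, 2 classes, 0.5 s per step.
scores = [[0.9, 0.0], [0.8, 0.05], [0.05, 0.0], [0.0, 0.7]]
offsets = [(0.0, 2.0), (1.0, 1.0), (0.5, 0.5), (1.0, 0.0)]
detections = nms_1d(decode_segments(scores, offsets, stride=0.5))
print(detections)  # → [(0.0, 1.0, 0.9, 0), (1.0, 1.5, 0.7, 1)]
```

In the toy run, steps 0 and 1 both predict the same class-0 interval [0.0, 1.0] s, so NMS keeps only the higher-scoring one, while the class-1 segment survives because suppression is applied per class. No anchors or proposals are involved: every segment comes directly from one feature step.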

Code & Models

Repositories

boneddeng/tadml
pytorch · Official

Taxonomy

Topics: Human Pose and Action Recognition · Anomaly Detection Techniques and Applications · Gait Recognition and Analysis

Methods: Average Pooling · Layer Normalization · Residual Connection · Dropout · Global Average Pooling · Dense Connections · MLP-Mixer · SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings